15. Median
The value which occupies the middle position when all the observations are arranged in ascending or descending order.
It divides the frequency distribution exactly into two halves. Median is the 50th
percentile.
Median is also known as ‘positional average’.
If the number of observations is odd, then the ((n + 1)/2)th observation (in the ordered set) is the median.
When the total number of observations is even, it is given by the mean of the (n/2)th and (n/2 + 1)th observations.
It is not distorted by outliers/skewed data.
It does not take into account the precise value of each observation and hence does not
use all information available in the data.
Median of the pooled group cannot be expressed in terms of the individual medians of
the pooled groups.
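The odd/even positional rule above can be sketched in R as follows. This is a minimal illustration on made-up values (not the Sachin data set); `positional_median` is a hypothetical helper name, and R's built-in `median()` is shown only for comparison.

```r
# Toy data (hypothetical values, chosen only for illustration)
odd_runs  <- c(12, 3, 45, 7, 28)        # n = 5 (odd)
even_runs <- c(12, 3, 45, 7, 28, 60)    # n = 6 (even)

# Hypothetical helper: median by position in the ordered set
positional_median <- function(x) {
  x <- sort(x[!is.na(x)])               # drop NAs, then order the observations
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                      # odd n: the ((n + 1)/2)th observation
  } else {
    mean(x[c(n / 2, n / 2 + 1)])        # even n: mean of the two middle observations
  }
}

positional_median(odd_runs)             # 12
positional_median(even_runs)            # 20

# An outlier moves the mean a lot but leaves the median unchanged
with_outlier <- c(12, 3, 4500, 7, 28)
mean(odd_runs);   mean(with_outlier)    # 19 and 910
median(odd_runs); median(with_outlier)  # 12 and 12
```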
16. Categorical Variables
The mean cannot be used as a measure of central tendency for categorical variables.
Mode is probably the best way to represent the central tendency for these variables.
Mode: defined as the value that occurs most frequently in the data. A data set can have more than one mode.
Not a good representative for small samples.
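Base R has no built-in function for the statistical mode (its `mode()` function reports the storage type of an object instead), so one possible sketch is to count values with `table()`. The helper name `stat_mode` and the sample vector are assumptions made for this illustration.

```r
# Toy categorical data (hypothetical values)
opposition <- c("Australia", "Pakistan", "Sri Lanka",
                "Australia", "Sri Lanka", "Kenya")

# Hypothetical helper: return every value tied for the highest count
stat_mode <- function(x) {
  counts <- table(x)                      # frequency of each category (NAs dropped)
  names(counts)[counts == max(counts)]    # may return more than one value
}

stat_mode(opposition)                     # "Australia" "Sri Lanka"
```

Returning all tied values, rather than only the first, reflects the point that the mode can have more than one value.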
18. Measurement of spread of data (variability)
The first step in assessing spread of data is to examine it in either a table
or an appropriate graphical form.
A graph often makes clear any symmetry (or lack of it) in the spread of
data, whether there are obvious atypical values (outliers) and whether the
data is skewed in one direction or the other (a tendency for more values
to fall in the upper or lower tail of the distribution).
Range: highest value - lowest value
Percentiles: if Q% of the data is <= x, then x is the Qth percentile
Variance / Standard Deviation:
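The range and percentile definitions above can be tried out in R roughly as follows. The vector is made up for illustration; `quantile()` is used with its default interpolation method, which is one of several conventions for computing percentiles.

```r
# Toy data with one missing value (hypothetical values)
runs <- c(0, 8, 15, 28, 40, 63, 200, NA)

# Range as a single number: highest value - lowest value
spread <- max(runs, na.rm = TRUE) - min(runs, na.rm = TRUE)
spread                                   # 200

# 25th, 50th and 75th percentiles (default interpolation method)
q <- quantile(runs, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
q                                        # 11.5, 28.0, 51.5
```

Note that base R's `range()` returns the pair (minimum, maximum) rather than their difference.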
19. Percentile
Percentile     Meaning
1st Quartile   25% of values are less than or equal to this
2nd Quartile   50% of values are less than or equal to this (the median)
3rd Quartile   75% of values are less than or equal to this
4th Quartile   100% of values are less than or equal to this (the maximum)
> summary(Sachin$Runs)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   0.00    8.00   28.50   40.77   63.00  200.00      11
20. Variance & Standard Deviation
The average of the squared differences from the Mean.
SD = Square root of variance
> var(Sachin$Runs)
NA
> var(Sachin$Runs, na.rm=TRUE)
1603.16
> sd(Sachin$Runs)
NA
> sd(Sachin$Runs, na.rm=TRUE)
40.03948
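One detail worth checking by hand: R's `var()` and `sd()` compute the sample versions, which divide the sum of squared differences by n - 1 rather than by n. The sketch below compares both forms on a small made-up vector; the variable names are assumptions for illustration.

```r
# Toy data (hypothetical values); the mean is 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

sq_diff <- (x - mean(x))^2               # squared differences from the mean

pop_var  <- sum(sq_diff) / n             # divide by n: 32 / 8 = 4
samp_var <- sum(sq_diff) / (n - 1)       # divide by n - 1: 32 / 7

all.equal(samp_var, var(x))              # TRUE: var() uses n - 1
all.equal(sqrt(samp_var), sd(x))         # TRUE: sd() is the square root of var()
sqrt(pop_var)                            # 2
```

For a large number of observations the two versions differ very little.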
25. Describing data by tables and graphs
Qualitative variable (Categorical Variables)
The number of observations that fall into a particular class (or category) of the qualitative variable is called the frequency (or count) of that class. A table listing all classes and their frequencies is called a frequency distribution.
27. Describing data by tables and graphs
Relative Frequency Distribution
> as.data.frame(table(Sachin$Opposition)/nrow(Sachin))
Var1 Freq
1 Australia 0.153347732
2 Bangladesh 0.025917927
3 Bermuda 0.002159827
4 England 0.079913607
5 Ireland 0.004319654
6 Kenya 0.021598272
7 Namibia 0.002159827
8 Netherlands 0.004319654
9 New Zealand 0.090712743
10 Pakistan 0.149028078
11 South Africa 0.123110151
12 Sri Lanka 0.181425486
13 U.A.E. 0.004319654
14 West Indies 0.084233261
15 Zimbabwe 0.073434125
> as.data.frame(table(Sachin$Opposition))
Var1 Freq
1 Australia 71
2 Bangladesh 12
3 Bermuda 1
4 England 37
5 Ireland 2
6 Kenya 10
7 Namibia 1
8 Netherlands 2
9 New Zealand 42
10 Pakistan 69
11 South Africa 57
12 Sri Lanka 84
13 U.A.E. 2
14 West Indies 39
15 Zimbabwe 34
Frequency Distribution (the table of counts above)
The number of observations that fall into each particular class.
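As an alternative sketch for the relative frequencies, base R's `prop.table()` divides each count by the total of the table itself. The vector below is made up for illustration. One difference from dividing by `nrow()`: `table()` drops `NA` values by default, so `prop.table()` uses only the non-missing observations as the denominator.

```r
# Toy categorical data (hypothetical values)
opposition <- c("Australia", "Pakistan", "Australia", "Kenya")

freq     <- table(opposition)            # frequency distribution (counts)
rel_freq <- prop.table(freq)             # relative frequency distribution

as.data.frame(freq)                      # Australia 2, Kenya 1, Pakistan 1
as.data.frame(rel_freq)                  # Australia 0.50, Kenya 0.25, Pakistan 0.25
```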
30. Quantitative Variables
Cumulative frequency
Cumulative relative frequency
Exercise 2: Find the cumulative frequency distribution of runs of a player (?cumsum)
Exercise 3: Find the relative cumulative frequency distribution of runs of a player
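As a starting point for the exercises, here is one possible approach on a small made-up vector: group the values into classes with `cut()`, count them with `table()`, and accumulate with `cumsum()`. The class boundaries and variable names are assumptions chosen for illustration, not a prescribed solution.

```r
# Toy quantitative data (hypothetical values)
runs <- c(0, 8, 15, 28, 40, 63, 120, 200)

# Group into classes; the break points are an arbitrary choice
bins <- cut(runs, breaks = c(0, 50, 100, 150, 200), include.lowest = TRUE)

freq         <- table(bins)              # frequency per class: 5, 1, 1, 1
cum_freq     <- cumsum(freq)             # cumulative frequency: 5, 6, 7, 8
cum_rel_freq <- cum_freq / sum(freq)     # 0.625, 0.750, 0.875, 1.000

data.frame(class = names(freq),
           frequency = as.vector(freq),
           cumulative = as.vector(cum_freq),
           cumulative_relative = as.vector(cum_rel_freq))
```

The last cumulative relative frequency is always 1, which is a quick sanity check on the result.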