Statistics
Statistics is the science which deals with the
methods of collecting, classifying, presenting,
comparing and interpreting numerical data
collected to throw some light on any sphere of
enquiry. (Seligman)
Statistics, very generally, is the branch of
mathematics, pure and applied, which deals with
collecting, classifying and analyzing data (Reber,
1987)
Types of Statistics
On the basis of function
On the basis of data distribution
On the basis of function
1. Descriptive statistics (mean, median, mode, SD,
range, frequency)
2. Correlational statistics
3. Inferential statistics
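The descriptive measures listed above can be illustrated with a short sketch. The scores are made up for illustration, and Python's standard `statistics` module is only one of several ways to compute them.

```python
import statistics

# Hypothetical test scores, invented for this example.
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

mean = statistics.mean(scores)            # arithmetic average
median = statistics.median(scores)        # middle value of the sorted scores
mode = statistics.mode(scores)            # most frequent score
sd = statistics.stdev(scores)             # sample standard deviation (n - 1 divisor)
value_range = max(scores) - min(scores)   # largest score minus smallest score
# Frequency table: how many times each distinct score occurs.
frequency = {v: scores.count(v) for v in sorted(set(scores))}

print(mean, median, mode, sd, value_range, frequency)
```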
On the basis of data distribution
1. Parametric statistics
2. Non-parametric statistics
Population: The terms population and universe mean all
the members of any well-defined class of people,
events or objects (Kerlinger, 1986).
Sample: A sample is a part of the population selected (usually
according to some procedure and with some purpose in
mind) such that it is considered to be representative
of the population as a whole.
Parametric statistics:
• Normal distribution
• Independence of samples
• Arithmetic operation
• Method of random sampling
• Size of sample
• Interval data
(Mean, standard deviation, t-test, F-test, Pearson
correlation)
Non-parametric statistics
• Distribution free
• Unbiased observation
• Sample size
• Ordinal data, Nominal data
(Chi-square, Mann-Whitney U test, Spearman's rank
difference method of correlation, Friedman two-way
ANOVA by ranks, etc.)
Definition of Skewness
Skewness is the absence of symmetry of a distribution about
its mean. In a skewed distribution the deviations from the mean
are greater on one side than on the other, i.e. one tail of the
distribution is longer or heavier than the other. Skewness is used
to indicate the shape of the distribution of data.
In a skewed distribution, the curve is stretched to either the left
or the right. When the tail is stretched more towards the
right, the distribution has positive skewness, and typically
mode < median < mean. When the tail
is stretched more towards the left, the distribution has
negative skewness, and typically mean < median < mode.
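As a rough illustration of the sign of skewness, the sketch below uses the moment-based coefficient (third central moment divided by the 1.5 power of the second). That particular coefficient is an assumption of this example, since the slides do not name one, and the data are invented.

```python
import statistics

def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data with a long right tail, and its mirror image.
right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
left_tailed = [-x for x in right_tailed]

for name, data in (("right-tailed", right_tailed), ("left-tailed", left_tailed)):
    print(name,
          "skewness:", round(skewness(data), 3),
          "mode:", statistics.mode(data),
          "median:", statistics.median(data),
          "mean:", statistics.mean(data))
```

For the right-tailed data the coefficient is positive and mode < median < mean (2 < 3 < 4.5); for the mirrored data the sign and the ordering are reversed.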
Definition of Kurtosis
In statistics, kurtosis is a measure of the relative
sharpness of the peak of a probability distribution curve. It
describes how observations are clustered around the centre
of the distribution. It is used to indicate the flatness or
peakedness of the frequency distribution curve and reflects the
weight of the tails (outliers) of the distribution.
Positive kurtosis, measured relative to the normal distribution, means that
the distribution is more peaked and heavier tailed than the normal distribution,
whereas negative kurtosis means that it is flatter and lighter tailed.
There are three types of distributions:
• Leptokurtic: Sharply peaked with fat tails.
• Mesokurtic: Medium peaked, like the normal distribution.
• Platykurtic: Flat peaked with thin tails.
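The three types can be told apart numerically. The sketch below assumes the moment-based "excess kurtosis" (fourth central moment over the squared second moment, minus 3, so that a normal distribution scores 0); the slides do not give a formula, and the two datasets are invented.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population moments)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

def kurtosis_type(data, tolerance=0.1):
    """Label a dataset; the tolerance around zero is an arbitrary choice."""
    k = excess_kurtosis(data)
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"

# Hypothetical data: one set with a sharp centre and two extreme values,
# one set spread evenly with no extreme values.
peaked_heavy_tails = [-6, -1, -1, 0, 0, 0, 0, 1, 1, 6]
flat_light_tails = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]

print(kurtosis_type(peaked_heavy_tails), excess_kurtosis(peaked_heavy_tails))
print(kurtosis_type(flat_light_tails), excess_kurtosis(flat_light_tails))
```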
Significance of the difference between means: t-test
A t-test is a statistical test used to compare the means of two groups. It is often
used in hypothesis testing to determine whether a process or treatment actually has
an effect on the population of interest, or whether two groups are different from one
another.
When to use a t-test
A t-test can only be used when comparing the means of two groups (a.k.a. pairwise
comparison). If you want to compare more than two groups, or if you want to do
multiple pairwise comparisons, use an ANOVA test followed by a post-hoc test.
The t-test is a parametric test of difference, meaning that it makes the same
assumptions about your data as other parametric tests. The t-test assumes your data:
• are independent
• are (approximately) normally distributed.
• have a similar amount of variance within each group being compared (a.k.a.
homogeneity of variance)
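If your data do not fit these assumptions, you can try a nonparametric alternative to
the t-test, such as the Mann-Whitney U test for independent groups or the Wilcoxon
Signed-Rank test for paired data. For data with unequal variances, Welch's t-test is
a common choice.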
1 – One-Sample T-Test
In this test, the mean of one group is compared
against a set value, which is either a theoretical value or the mean of the
population. For example, a teacher wishes to find the average weight of
the students of class 5 and compare it against a set value of
45 kg.
The teacher first randomly selects a group of students and records their individual
weights. Next, she finds the mean weight for that group
and checks whether it differs significantly from the set value of 45 kg. The formula used to
obtain one-sample t-test results is:
\[ t = \frac{m - \mu}{s / \sqrt{n}} \]
Where,
• t = t-statistic
• m = mean of the group
• μ = theoretical mean value of the population
• s = standard deviation of the group
• n = sample size
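A minimal sketch of the one-sample statistic, t = (m − μ) / (s / √n), applied to the teacher example. The eight weights are invented for illustration, and the function name `one_sample_t` is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)                   # sample size
    m = statistics.mean(sample)       # mean of the group
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    t = (m - mu) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical weights (kg) of eight randomly selected class 5 students.
weights_kg = [44, 46, 47, 45, 48, 46, 49, 43]

t, df = one_sample_t(weights_kg, mu=45)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t is then compared with the critical value from a t table for n − 1 degrees of freedom at the chosen significance level.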
2 – Independent Two-Sample T-Test
This is the test conducted when samples from two different
groups, species, or populations are studied and compared. It is
also known as an independent T-test. For example, if a teacher
wants to compare the height of male students and female
students in class 5, she would use the independent two-sample
test.
The T-test formula used to calculate this is:
\[ t = \frac{m_A - m_B}{\sqrt{\frac{s^2}{n_A} + \frac{s^2}{n_B}}} \]
Where,
• mA, mB = means of the samples from the two groups or
populations
• nA, nB = respective sample sizes
• s² = pooled estimate of the common variance of the two samples
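A sketch of the independent two-sample statistic with a pooled variance, t = (mA − mB) / √(s²/nA + s²/nB). It assumes equal variances in the two groups; the heights are invented and `independent_t` is a hypothetical name.

```python
import math

def independent_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(a), len(b)
    m_a, m_b = sum(a) / n_a, sum(b) / n_b
    # Sums of squared deviations within each group.
    ss_a = sum((x - m_a) ** 2 for x in a)
    ss_b = sum((x - m_b) ** 2 for x in b)
    df = n_a + n_b - 2
    s2 = (ss_a + ss_b) / df                        # pooled (common) variance
    t = (m_a - m_b) / math.sqrt(s2 / n_a + s2 / n_b)
    return t, df

# Hypothetical heights (cm) of male and female students in class 5.
male_heights = [140, 142, 144, 146, 148]
female_heights = [138, 140, 142, 144, 146]

t, df = independent_t(male_heights, female_heights)
print(f"t = {t:.3f} with {df} degrees of freedom")
```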
3 – Paired Sample T-Test
This test is conducted when the two sets of measurements
come from the same group. The group is
studied either at two different times or under two different
conditions. The formula used to obtain the t-value is:
\[ t = \frac{m - \mu}{s / \sqrt{n}} \]
Where,
• t = t-statistic
• m = mean of the paired differences
• μ = theoretical mean difference in the population (usually 0)
• s = standard deviation of the paired differences
• n = number of pairs
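A sketch of the paired test, which works on the differences within each pair: t = (m − μ) / (s / √n), where m and s are the mean and standard deviation of the differences, n is the number of pairs, and μ is taken as 0 here. The before/after scores are invented and `paired_t` is a hypothetical name.

```python
import math
import statistics

def paired_t(first, second, mu=0.0):
    """Return (t, degrees of freedom) for a paired-sample t-test."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(first, second)]   # one difference per pair
    n = len(diffs)
    m = statistics.mean(diffs)                       # mean difference
    s = statistics.stdev(diffs)                      # SD of the differences
    t = (m - mu) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical scores of the same five students before and after a treatment.
before = [60, 62, 58, 65, 70]
after = [62, 66, 61, 70, 71]

t, df = paired_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```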
Correlation: Correlation means association; more precisely, it is a measure of the extent
to which two variables are related. There are three possible results of a correlational study: a
positive correlation, a negative correlation, and no correlation.
A scattergram indicates the strength and direction of the correlation between the
co-variables.
Some uses of Correlations
Prediction
• If there is a relationship between two variables, we can
make predictions about one from another.
Validity
• Concurrent validity (correlation between a new measure
and an established measure).
• Predictive validity (correlation between a measure and a later outcome).
Reliability
• Test-retest reliability (are measures consistent?).
• Inter-rater reliability (are observers consistent?).
Theory verification
Correlation Coefficients: Determining Correlation Strength
Instead of drawing a scattergram, a correlation can be expressed numerically as a coefficient
ranging from -1 to +1. The sign gives the direction of the relationship and the absolute value
its strength. When working with continuous variables, the correlation
coefficient to use is Pearson's r.
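As an illustration, Pearson's r can be computed as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The data below are invented and `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: hours of study and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
falling = [10, 8, 6, 4, 2]    # decreases exactly linearly as hours increase

print(round(pearson_r(hours, scores), 3))    # positive correlation
print(round(pearson_r(hours, falling), 3))   # perfect negative correlation
```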
1. The product-moment correlation coefficient
2. The rank difference correlation method
3. The biserial correlation method
4. The point-biserial correlation method
5. The tetrachoric correlation method
6. The phi coefficient method
7. The coefficient of contingency method
8. The partial correlation method
9. The multiple correlation method
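The rank difference method (Spearman's rho) from the list above can be sketched with the usual formula rho = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of each case. This simple form assumes there are no tied scores; the data are invented and the function names are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank difference correlation for untied data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("this simple formula assumes no tied scores")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores given to five essays by two judges.
judge_a = [10, 20, 30, 40, 50]
judge_b = [1, 3, 2, 5, 4]

print(spearman_rho(judge_a, judge_b))
```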
