Basic Statistical Knowledge
(For Bio Science Student)
Dipika Patra
Department of Statistics
S.A.Jaipuria College, Kolkata
Email: dipika.patra1988@gmail.com
Image of a Frequency table
Basic Statistics
Measure of Central Tendency
(how to describe a set of data by identifying
the central position within that set of data)
Mean
The mean (or average) is the most popular and best-known measure of
central tendency. It can be used with both discrete and continuous
data, although it is most often used with continuous data. For n
observations x_1, ..., x_n it is represented as
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median
The median is the middle score of a data set that has been arranged in
order of magnitude. It is less affected by outliers and skewed data
than the mean.
To calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first rearrange the data in order of magnitude:
14 35 45 55 55 56 56 65 87 89 92
Here 56 is the middle mark, because there are 5 scores before it
and 5 scores after it. This works when the number of scores is odd.
With an even number of scores, take the two middle scores and
average them.
Different Measure of Central Tendency
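Median for Grouped Data
$\text{Median} = L + \frac{N/2 - cf}{f} \times h$
Where
L = lower boundary of the median class
N = total frequency
cf = cumulative frequency of the class before the median class
f = frequency of the median class
h = class width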
Example
(LL = lower limit, UL = upper limit, f = frequency, <cf = less-than cumulative frequency)
Class interval
LL    UL     f    <cf
11    22     3     3
23    34     5     8
35    46    11    19
47    58    19    38
59    70    14    52
71    82     6    58
83    94     2    60
Total       60
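As a hedged sketch of the calculation, the snippet below applies the usual grouped-median formula, Median = L + ((N/2 - cf)/f) × h, to the table above. The function name `grouped_median` is hypothetical, and the half-unit boundary adjustment (lower boundary = LL - 0.5, because the class limits are integers one unit apart) is an assumption made for illustration.

```python
# Class limits (LL, UL) and frequencies f from the example table.
classes = [(11, 22, 3), (23, 34, 5), (35, 46, 11), (47, 58, 19),
           (59, 70, 14), (71, 82, 6), (83, 94, 2)]


def grouped_median(classes):
    """Median = L + ((N/2 - cf) / f) * h, evaluated for the median class."""
    n_total = sum(f for _, _, f in classes)
    half = n_total / 2
    cf_before = 0  # cumulative frequency of all classes below the current one
    for ll, ul, f in classes:
        if cf_before + f >= half:
            # Assumption: integer limits one unit apart, so boundary = LL - 0.5.
            lower_boundary = ll - 0.5
            width = ul - ll + 1
            return lower_boundary + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("empty frequency table")


median = grouped_median(classes)
print(round(median, 2))  # 53.45
```

Here N/2 = 30 falls in the class 47-58 (cumulative frequency 38), so L = 46.5, cf = 19, f = 19 and h = 12.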
Mode
The mode is the most frequent score in the data set. On a
histogram or bar chart it corresponds to the highest bar.
However, the mode is not necessarily unique, which causes problems
when two or more values share the highest frequency.
Empirical relation (for moderately skewed distributions):
Mean - Mode = 3(Mean - Median)
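Mode for Grouped Data
Formula:
$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h$
where L is the lower boundary of the modal class, ∆1 is the frequency
of the modal class minus the frequency of the class before it, ∆2 is
the frequency of the modal class minus the frequency of the class
after it, and h is the class width.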
Steps in Computing the Mode for Grouped Data
1. Determine the modal class.
2. Get the value of ∆1.
3. Get the value of ∆2.
4. Get the lower boundary of the modal class.
5. Apply the formula by substituting the values
obtained in the preceding steps.
Example
Class interval
LL    UL     f
11    22     3
23    34     5
35    46    11
47    58    19
59    70    14
71    82     6
83    94     2
Total       60
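The steps listed earlier can be sketched in a few lines. This is a minimal illustration, assuming the usual formula Mode = L + (∆1/(∆1 + ∆2)) × h and, as an assumption about this table, class boundaries half a unit below the integer lower limits. The name `grouped_mode` is hypothetical.

```python
# Class limits (LL, UL) and frequencies f from the example table.
classes = [(11, 22, 3), (23, 34, 5), (35, 46, 11), (47, 58, 19),
           (59, 70, 14), (71, 82, 6), (83, 94, 2)]


def grouped_mode(classes):
    """Mode = L + d1 / (d1 + d2) * h for the modal class."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                    # step 1: modal class
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - f_prev                         # step 2: delta 1
    d2 = freqs[m] - f_next                         # step 3: delta 2
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    ll, ul, _ = classes[m]
    lower_boundary = ll - 0.5                      # step 4 (assumed boundary)
    width = ul - ll + 1
    return lower_boundary + d1 / (d1 + d2) * width  # step 5


mode = grouped_mode(classes)
print(round(mode, 2))  # 53.88
```

For this table the modal class is 47-58, with ∆1 = 19 - 11 = 8 and ∆2 = 19 - 14 = 5.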
Example 1:
Ms. Sulit collects data on the ages of Mathematics teachers
at Santa Rosa School, and her study yields the following:
38 35 28 36 35 33 40
Solution:
Mean = (38 + 35 + 28 + 36 + 35 + 33 + 40)/7 = 245/7 = 35
Based on the computed mean, 35 is the average age of the Mathematics
teachers.
Example
Below are Amaya's subjects with the corresponding number of
units and the grades she got for the previous grading period.
Compute her grade point average.
Subject           Units   Grade
Filipino           0.9     86
English            1.5     85
Mathematics        1.5     88
Science            1.8     87
Social Studies     0.9     86
TLE                1.2     83
MAPEH              1.2     87
Total              9.0
Weighted mean = Σ(units × grade)/Σ(units) = 774.9/9.0 = 86.1
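The grade point average is a weighted mean with the units as weights. A minimal sketch of that arithmetic, using the table's figures (the helper name `weighted_mean` is hypothetical):

```python
units = [0.9, 1.5, 1.5, 1.8, 0.9, 1.2, 1.2]
grades = [86, 85, 88, 87, 86, 83, 87]


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


gpa = weighted_mean(grades, units)
print(round(gpa, 1))  # 86.1
```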
Percentile
Percentiles provide information about how the data are spread over the
interval from the smallest value to the largest value.
The p-th percentile is a value such that at least p% of the items
take on this value or less and at least (100-p)% of the items take on
this value or more.
Quartile
a. First Quartile (Q1) = 25th percentile
b. Second Quartile (Q2) = 50th percentile (median)
c. Third Quartile (Q3) = 75th percentile
Computation: with the data sorted in ascending order, the p-th
percentile is the value in the i-th position, where
i = (p/100)n
and n is the number of observations.
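The position rule can be tried on the eleven marks from the median example. The slide does not say what to do when i is not a whole number, so this sketch assumes one common textbook convention: round i up when it is fractional, and average the i-th and (i+1)-th values when it is whole. Other conventions give slightly different quartiles. The function name `percentile` is hypothetical.

```python
import math


def percentile(data, p):
    """p-th percentile via the position i = (p/100) * n (1-based)."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    values = sorted(data)
    n = len(values)
    i = p * n / 100
    if i.is_integer():
        k = int(i)
        # Assumed convention: average the k-th and (k+1)-th sorted values.
        return (values[k - 1] + values[k]) / 2
    # Assumed convention: round a fractional position up.
    return values[math.ceil(i) - 1]


marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
q1, q2, q3 = (percentile(marks, p) for p in (25, 50, 75))
print(q1, q2, q3)  # 45 56 87
```

With n = 11 the positions are 2.75, 5.5 and 8.25, which round up to the 3rd, 6th and 9th sorted values. The 50th percentile agrees with the median of 56 found earlier.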
Measure of Dispersion
(how "spread out" a group of scores is)
• Range: difference between the largest and smallest values of the data set.
• Interquartile range: difference between the third and first
quartiles, Q3 - Q1.
• Variance: average of the squared differences between each data
value and the mean.
For a sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. For a population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$.
• Standard deviation: positive square root of the variance.
For a sample: $s = \sqrt{s^2}$. For a population: $\sigma = \sqrt{\sigma^2}$.
• Quartile deviation: half of the distance between the first and third
quartiles, (Q3 - Q1)/2.
• Mean absolute deviation: average of the absolute deviations of the
values from a central value a (such as the mean or the median).
• Coefficient of variation: how large the standard deviation is in
relation to the mean, defined as (standard deviation/mean) × 100.
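As an illustration, these measures can be computed for the seven teacher ages from Example 1 using Python's standard `statistics` module. The sketch takes the mean as the central value for the mean absolute deviation and uses the sample standard deviation in the coefficient of variation; both are choices made here, not stated on the slide.

```python
import statistics

ages = [38, 35, 28, 36, 35, 33, 40]

data_range = max(ages) - min(ages)          # largest minus smallest
sample_var = statistics.variance(ages)      # divides by n - 1
pop_var = statistics.pvariance(ages)        # divides by N
sample_sd = statistics.stdev(ages)          # square root of sample variance
mean = statistics.mean(ages)

# Mean absolute deviation about the mean (assumed choice of central value).
mad = sum(abs(x - mean) for x in ages) / len(ages)

# Coefficient of variation in percent, using the sample standard deviation.
cv = sample_sd / mean * 100

print(data_range, round(sample_var, 2), round(pop_var, 2))
print(round(sample_sd, 2), round(mad, 2), round(cv, 2))
```

The squared deviations from the mean of 35 add up to 88, so the sample variance is 88/6 and the population variance is 88/7.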
Distribution Shape
• Skewness: a measure of the asymmetry of a frequency
distribution.
Zero indicates perfect symmetry; the normal distribution has
a skewness of zero.
Positive skewness indicates that the tail of the
distribution is more stretched on the side above the mean.
Negative skewness indicates that the tail of the
distribution is more stretched on the side below the mean.
Symmetric: Mean = Median = Mode
Positively skewed: Mode < Median < Mean
Negatively skewed: Mean < Median < Mode
Pearson's skewness = 3(mean - median)/standard deviation, which lies in the interval [-3, 3]
Bowley's (quartile) skewness = (Q3 + Q1 - 2 median)/(Q3 - Q1)
• Kurtosis: a measure of the flatness or peakedness of a
distribution, measured by
$\beta_2 = \mu_4 / \mu_2^2$, where $\mu_2$ and $\mu_4$ are the second and fourth central moments.
The normal distribution has a kurtosis of 3 (mesokurtic).
Kurtosis greater than 3 indicates a relatively peaked
distribution (leptokurtic).
Kurtosis less than 3 indicates a relatively flat
distribution (platykurtic).
Another measure of kurtosis is the excess kurtosis $\gamma_2 = \beta_2 - 3$,
which is positive for leptokurtic, zero for mesokurtic, and negative
for platykurtic distributions.
[Figure: platykurtic, mesokurtic, and leptokurtic curves]
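A short sketch of the two skewness coefficients, applied to the eleven marks from the median example. Pearson's coefficient is 3(mean - median)/standard deviation and Bowley's is (Q3 + Q1 - 2 median)/(Q3 - Q1). The quartiles Q1 = 45 and Q3 = 87 are assumed values, obtained by rounding the position i = (p/100)n up; other quartile conventions would change the Bowley figure. The function names are hypothetical.

```python
import statistics


def pearson_skewness(mean, median, sd):
    """Pearson's coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's quartile coefficient: (Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
mean = statistics.mean(marks)        # 59
median = statistics.median(marks)    # 56
sd = statistics.stdev(marks)         # sample standard deviation

sk_pearson = pearson_skewness(mean, median, sd)
sk_bowley = bowley_skewness(45, median, 87)  # assumed Q1 and Q3
print(round(sk_pearson, 3), round(sk_bowley, 3))
```

Both coefficients come out positive here, which matches a mean (59) that lies above the median (56).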
Probability Distribution
(how probabilities are distributed over the values of a random variable)
Probability is represented by a real number in the range from 0
to 1. An impossible event has a probability of 0 and a certain
event has a probability of 1.
Probability refers to the chance or likelihood that a particular
event takes place.
For equally likely outcomes it is defined as
(number of favourable outcomes)/(total number of possible outcomes).
Discrete Distribution
(Binomial, Poisson…)
• Binomial
When does a variable follow a binomial distribution?
  • Failure and success in an exam
  • Probabilities of occurrence of a certain
    combination of heads and tails
  • Defective and non-defective items from a manufacturing
    process
Probability Mass Function of Binomial:
$p(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$
where
p(x) = probability of x successes in n trials
n = number of trials
p = probability of success in one trial
q = 1 - p = probability of failure in one trial
  • Mean = np, Variance = npq, Standard Deviation = √(npq)
  • If p = 0.5, skewness is 0
  • If p < 0.5, positively skewed
  • If p > 0.5, negatively skewed
• Poisson
A binomial distribution tends to a Poisson distribution when the
number of trials n → ∞ and the probability of success p → 0,
while np = λ remains finite.
Probability Mass Function of Poisson:
$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$
  • Mean = Variance = λ
  • Positively skewed, coefficient of skewness = 1/√λ
  • Leptokurtic, as γ2 = 1/λ, which is > 0
Real-life examples:
  • Number of printing mistakes per page in a large text
  • Number of telephone calls per unit interval of time
  • Number of deaths from a rare disease
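To make the two discrete distributions concrete, the sketch below evaluates the standard binomial and Poisson probability mass functions, checks the binomial mean np and variance npq numerically, and compares a binomial with large n and small p against the Poisson with λ = np. The parameter values (n = 10, p = 0.5, λ = 2, n = 1000) are arbitrary illustrations, and the function names are hypothetical.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# Binomial with n = 10 trials and p = 0.5.
n, p = 10, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))                    # np
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))  # npq
print(round(mean, 6), round(variance, 6))  # 5.0 2.5

# Poisson limit: n large, p small, np = lam held fixed.
lam, big_n = 2.0, 1000
max_gap = max(
    abs(binomial_pmf(x, big_n, lam / big_n) - poisson_pmf(x, lam))
    for x in range(11)
)
print(max_gap < 0.01)  # True
```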
Continuous Distribution
(Normal)
The normal distribution is the most important and most widely
used distribution in statistics. It is sometimes called the
"bell curve."
Probability Density Function of Normal distribution:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$
Six features of normal distributions are listed below.
• Normal distributions are symmetric around their mean.
• The mean, median, and mode of a normal distribution are equal.
• The area under the normal curve is equal to 1.0.
• Normal distributions are defined by two parameters, the mean
(μ) and the standard deviation (σ).
• About 68% of the area of a normal distribution is within one
standard deviation of the mean.
• Approximately 95% of the area of a normal distribution is
within two standard deviations of the mean.
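The 68% and 95% figures can be checked numerically from the normal cumulative distribution function, written here with the standard library's error function. The values μ = 100 and σ = 15 are arbitrary illustrations, and the helper names are hypothetical.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a Normal(mu, sigma) variable."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the curve within k standard deviations of the mean."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)


one_sd = area_within(1, mu=100, sigma=15)
two_sd = area_within(2, mu=100, sigma=15)
print(round(one_sd, 4), round(two_sd, 4))  # 0.6827 0.9545
```

The result does not depend on the chosen μ and σ, because the areas are measured in units of the standard deviation.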
Sampling Distribution
What is a sampling distribution?
• A sampling distribution is created by repeated sampling from a
population.
• It is defined as the frequency distribution of a statistic over
many samples.
• If the statistic is the mean, it is called the
sampling distribution of the mean.
Sampling distribution of Mean
• Unbiased: its mean equals the population mean μ.
• Variance of the sampling distribution of means based on N
observations: $\sigma_M^2 = \sigma^2 / N$
• Large samples tend to produce sample estimates very close to the
parameter.
• If independent random samples are drawn from each of two normal
populations, then the sampling distribution of the difference
between the two sample means is normally distributed.
Difference between non parametric and
parametric Statistics
Parametric statistics – inferential test that assumes certain
characteristics are true of an underlying population,
especially the shape of its distribution. Commonly used for
normally distributed interval or ratio dependent variables.
Non-parametric statistics – inferential test that makes few or no
assumptions about the population from which observations were
drawn (distribution-free tests).
There is generally at least one non-parametric equivalent test
for each type of parametric test. Non-parametric tests are generally
less powerful than parametric tests when the parametric assumptions hold.
Non-parametric tests are generally used when assumptions about
the underlying population are questionable (e.g., non-normality).
They are commonly used to analyse dependent variables (DVs) that are
non-normal, nominal, or ordinal.
Statistical Inference
Inference
•Use a sample to learn something about a population.
Hypothesis Testing
•A hypothesis is an assertion about the population.
•A statistical method that uses sample data to evaluate a
hypothesis about a population parameter.
•We answer a question such as, “If the hypothesis were true,
would it be unlikely to get data such as we obtained?”
Why Test?
•Statistics is an experimental science, not really a branch of
mathematics.
•It’s a tool that can tell you whether data are accidentally or
really similar.
•It does not give you certainty.
Five Parts of a Test
• Assumptions about type of data (quantitative, categorical),
population distribution (e.g., normal, binomial)
•Hypotheses:
Null hypothesis(H0): A statement indicating “no effect”
Alternative hypothesis(Ha): A statement indicating “an effect”
•Test Statistic:
A function of the data that measures how far the sample departs from
what the null hypothesis predicts.
•P-value (p):
A measure of evidence about the null hypothesis H0. The smaller
the P-value, the stronger the evidence against H0.
•Conclusion:
Select a significance level (such as 0.05 or 0.01) and reject H0 if
P-value ≤ significance level. Otherwise, we fail to reject H0,
i.e. H0 is not necessarily true, but it is plausible.
Hypothesis Testing: P-value approach
The P-value approach involves determining "likely" or "unlikely"
by determining the probability, assuming the null hypothesis
were true, of observing a test statistic more extreme than the one
obtained. If the P-value is small, say less than (or equal to)
α, then it is "unlikely." And if the P-value is large, say more
than α, then it is "likely."
Four steps involved in the P-value approach:
1. Specify the null and alternative hypotheses.
2. Calculate the value of the test statistic using the sample data,
assuming the null hypothesis is true.
3. Using the known distribution of the test statistic, calculate
the P-value:
i.e. "If the null hypothesis is true, what is the probability
that we'd observe a more extreme test statistic in the direction
of the alternative hypothesis than we did?"
4. Set the significance level α, the probability of making a
Type I error, to be small (0.01, 0.05, or 0.10). Compare the
P-value to α.
An Example:
Suppose we want to know whether a new drug has an influence on
IQ.
Null hypothesis: the average IQ of the people who use the
drug is 100.
Alternative hypothesis: the average IQ of the people who
use the drug is not 100.
We need the following data in order to perform a z test:
a) Population mean b) hypothesis mean c) sample size
d) sample mean e) population standard deviation
Then we calculate the one-sample z test statistic:
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized mean,
$\sigma$ the population standard deviation, and n the sample size.
Suppose our z statistic is 2.17.
Under the null hypothesis the z statistic follows a normal probability
distribution, so we find the P-value as the area in both tails
beyond ±2.17:
[Figure: normal curve with both tails beyond ±2.17 shaded in red]
The area in red is 0.015 + 0.015 = 0.030, i.e. 3 percent.
If we had chosen a significance level of 5 percent (0.025 in each
tail), this would mean that we had achieved statistical significance.
We would reject the null hypothesis in favour of the alternative
hypothesis.
We conclude that we have evidence that the drug caused the average
IQ to deviate from 100 IQ points.
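The two-tailed area for z = 2.17 can be reproduced from the standard normal distribution. This is a minimal sketch using the standard library's complementary error function; the function name `two_sided_p_value` is hypothetical.

```python
import math


def two_sided_p_value(z):
    """Area in both tails of the standard normal beyond |z|."""
    return math.erfc(abs(z) / math.sqrt(2))


p_value = two_sided_p_value(2.17)
print(round(p_value, 3))   # 0.03
print(p_value <= 0.05)     # True
```

Each tail contributes about 0.015, which gives the 3 percent quoted in the example.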
Single Sample problems: Student's t test
If data are available on a single variable, then an
appropriate test is the t test, provided the following assumptions
are satisfied:
•The data are quantitative in nature
•Normality is satisfied
•Observations are independent
Example:
•Cereal is sold in the market in packets of 300 grams. A
sample of 10 packs revealed the following weights:
296, 298, 300.1, 299, 302, 290, 289, 297, 296, 299.5
•The objective is to know whether the average weight is
actually maintained at 300 grams.
•Null Hypothesis: Mean weight is 300 grams
•Alternative Hypothesis: Mean weight is lower than 300 grams
•The data are quantitative and hence Student's t test is
appropriate.
•P value: 0.016
•Since 0.016 < 0.05, we reject the null hypothesis: the data support
the conclusion that the mean weight is not maintained at 300 grams.
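The test statistic behind this example is t = (x̄ - μ0)/(s/√n) with n - 1 degrees of freedom. The sketch below computes it with the standard library only; converting t into a P value needs the t distribution, which would come from a t table or a statistics package and is not reproduced here.

```python
import math
import statistics

weights = [296, 298, 300.1, 299, 302, 290, 289, 297, 296, 299.5]
mu0 = 300  # hypothesized mean weight in grams

n = len(weights)
sample_mean = statistics.mean(weights)
sample_sd = statistics.stdev(weights)        # divides by n - 1
standard_error = sample_sd / math.sqrt(n)

t_stat = (sample_mean - mu0) / standard_error
df = n - 1
print(round(sample_mean, 2), round(t_stat, 2), df)  # 296.66 -2.51 9
```

The negative t value points in the direction of the alternative hypothesis (mean weight below 300 grams).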
Two Sample Problems: An example
•Consider the data on cadmium levels of persons categorized as
smokers and non-smokers
Smokers: 10, 8.4, 3.5, 8.9, 9.0, 8.8, 7.9
Non-Smokers: 3.1, 3.5, 4.5, 4.3, 2.2, 2.7
•Query: Is the mean cadmium level higher among the smokers?
•Fisher's t-test is appropriate
•Null Hypothesis: Mean cadmium levels are equal in the two groups
•Alternative Hypothesis: Mean cadmium level is higher among the
smokers
•p-value = 0.0003045
•Conclusion: P value < .05 implies the rejection of the null
hypothesis.
Example
Suppose the scores of 6 students given special coaching are as
follows:
Before: 8, 3, 4, 5, 6, 7   After: 9, 6, 5, 5, 6, 8
Query: Is the special coaching beneficial?
Observe: The experimental units (i.e. students) are the same;
only the situations are different.
The assumption of independence is violated.
The paired t test is appropriate instead of Fisher's t test.
• Null Hypothesis: Mean scores are the same in the two situations
• Alternative Hypothesis: Mean score increases after the
special coaching.
Result: t = -2.2361, df = 5, p-value = 0.03779
What will the conclusion be here?
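The reported t value can be reproduced by treating the before-minus-after differences as a single sample, which is what the paired t test does. This is a sketch of the statistic only; the P value again needs the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

before = [8, 3, 4, 5, 6, 7]
after = [9, 6, 5, 5, 6, 8]

# Paired test: work with the per-student differences (before - after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
df = n - 1
print(round(t_stat, 4), df)  # -2.2361 5
```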
Inference in Categorical data:
Sometimes both the independent and dependent variables are
categorical
(e.g. treatment (drug/placebo) versus survival (alive/dead),
or smoking (Y/N) versus lung cancer (Y/N)).
In these situations, we usually count the number (or
proportion) of patients or subjects that fall into each
possible category.
For categorical data, the statistical tests we commonly use are:
•Test for proportions
•Chi-square test
An Example
Suppose that 39 out of 80 people contacted in a survey of city
residents oppose a new tax.
Test whether the data are consistent with the hypothesis that
people accept new taxes.
•Number of successes x = 39
•Number of data points n = 80
•Null hypothesis: the theoretical proportion of people accepting
the new tax is 50%, against the alternative that it is higher
•Sample proportion x/n = 39/80 = 0.4875, which gives a P value well
above .05
Conclusion: Since the P value is higher than .05, we fail to
reject the null hypothesis.
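One way to carry out this test is the normal approximation z = (p̂ - p0)/√(p0(1 - p0)/n). The sketch below uses it without a continuity correction and takes the upper tail for the "higher" alternative; both are assumptions, since the example does not say which procedure produced its result.

```python
import math

x, n, p0 = 39, 80, 0.5

p_hat = x / n                                   # sample proportion
standard_error = math.sqrt(p0 * (1 - p0) / n)   # under the null hypothesis
z = (p_hat - p0) / standard_error

# Upper-tail P value P(Z >= z) for the alternative "proportion > 50%".
p_upper = 0.5 * math.erfc(z / math.sqrt(2))
print(p_hat, round(z, 4), round(p_upper, 4))
print(p_upper > 0.05)  # True
```

The sample proportion of 0.4875 lies just below 0.5, so z is slightly negative and the data give no evidence that the proportion exceeds 50%.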
Another Example:
Consider the following data on the smoking and drinking habits of
500 individuals:
                    Heavy Smoker   Moderate Smoker   Non-Smoker
Heavy Drinker            20              62               6
Moderate Drinker         40               8             159
Non-Drinker              10              20             175
Query: Are smoking and drinking habits associated?
Null Hypothesis: Smoking and drinking are independent.
Alternative Hypothesis: Smoking and drinking are associated.
The chi-square test is appropriate.
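A sketch of the chi-square test of independence for this table: each expected count is (row total × column total)/grand total, and the statistic sums (observed - expected)²/expected over all cells, with (rows - 1)(columns - 1) degrees of freedom. The function name is hypothetical. The comparison value 9.488 is the standard 5% critical value of the chi-square distribution with 4 degrees of freedom.

```python
observed = [
    [20, 62, 6],     # heavy drinkers: heavy, moderate, non-smokers
    [40, 8, 159],    # moderate drinkers
    [10, 20, 175],   # non-drinkers
]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a two-way count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = chi_square_independence(observed)
CRITICAL_5_PERCENT_DF4 = 9.488
print(dof, stat > CRITICAL_5_PERCENT_DF4)  # 4 True
```

Because the statistic exceeds the critical value, independence would be rejected at the 5% level for these counts.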
ANOVA
• An extension of Fisher's t test
• Used when the number of groups is more than two
One Way Data
(variation in one direction only)
Categories: Dose 1, Dose 2, Dose 3
Sources of variation in the data are
A. Variation within each dose group
B. Variation between dose groups
Main Question: Do the responses depend on which dose group the subject
is in?
The higher the variation between the dose groups relative to the
variation within them, the stronger the influence of the dose.
[Figure: within-group and between-group variation]
ANOVA Assumptions:
•Normality
•Independence
•Homoscedasticity
Hypothesis: Is there any effect
of the doses?
Two Way ANOVA
Two-way ANOVA is a type of study design with one numerical
outcome variable and two categorical explanatory variables.
Example:
Consider yield figures of three paddy varieties when three
pesticides are used.
•Pesticide and variety are both categorical
•The data are two way.
             Variety
Pesticide    A     B     C
1           33    99    67
2           56    87    65
3           78    89    77
Two hypotheses of interest:
• Whether there is any effect of the pesticides.
• Whether there is any effect of the paddy varieties.
Analysis of Variance Table
            Df    Sum Sq    Mean Sq   F value    Pr(>F)
variety      2   2225.17    1112.58    44.063   0.000259
pesticide    3   1191.00     397.00    15.723   0.003008
Residuals    6    151.50      25.25
Conclusion: Both P values (.000259 and .003008) are < .05, hence we
reject the null hypotheses, i.e. both variety and pesticide have a
significant effect on yield.
Summary of single-sample tests:
Which variable are we looking at?
• Qualitative: frequency chi-square test / test of proportion
• Quantitative: Student's t test
Summary of two-sample tests:
Which variables are we looking at?
• Qualitative × Qualitative: chi-square test
• Qualitative × Quantitative: ANOVA
• Quantitative × Quantitative: paired t test / Fisher's t test
ANCOVA
•A covariate (CV), an additional independent variable (IV), is added to an ANOVA
(it can be dichotomous or metric)
•The effect of the covariate on the dependent variable (DV) is removed.
•Of interest are:
Main effects of IVs and interaction terms
Contribution of the CV (akin to Step 1 in hierarchical multiple linear regression, HMLR)
•e.g., GPA is used as a CV, when analysing whether there is a
difference in Educational Satisfaction between Males and
Females.
•Reduces variance associated with covariate (CV) from the DV
error (unexplained variance) term
•Increases power of F-test
•May not be able to achieve experimental control over a
variable (e.g., randomisation), but can measure it and
statistically control for its effect.
Assumption of ANCOVA
•Normality
•Homogeneity of Variance
•Independence of observations
•Independence of IV and CV
•Multicollinearity - if more than one CV, they should not
be highly correlated - eliminate highly correlated CVs
•Reliability of CVs - not measured with error - only use
reliable CVs
Thanking You

More Related Content

What's hot

Statistics
StatisticsStatistics
Statistics
itutor
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Aileen Balbido
 

What's hot (20)

Two-way Repeated Measures ANOVA
Two-way Repeated Measures ANOVATwo-way Repeated Measures ANOVA
Two-way Repeated Measures ANOVA
 
Regression
RegressionRegression
Regression
 
Statistics
StatisticsStatistics
Statistics
 
Understanding statistics in research
Understanding statistics in researchUnderstanding statistics in research
Understanding statistics in research
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Measures of Central Tendency and Dispersion
Measures of Central Tendency and DispersionMeasures of Central Tendency and Dispersion
Measures of Central Tendency and Dispersion
 
What is Statistics
What is StatisticsWhat is Statistics
What is Statistics
 
Scales of Measurement - Thiyagu
Scales of Measurement - ThiyaguScales of Measurement - Thiyagu
Scales of Measurement - Thiyagu
 
Two-way Mixed Design with SPSS
Two-way Mixed Design with SPSSTwo-way Mixed Design with SPSS
Two-way Mixed Design with SPSS
 
Measures of central tendency and dispersion
Measures of central tendency and dispersionMeasures of central tendency and dispersion
Measures of central tendency and dispersion
 
Descriptive statistics ii
Descriptive statistics iiDescriptive statistics ii
Descriptive statistics ii
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Lesson 2 percentiles
Lesson 2   percentilesLesson 2   percentiles
Lesson 2 percentiles
 
INTRODUCTION TO BIO STATISTICS
INTRODUCTION TO BIO STATISTICS INTRODUCTION TO BIO STATISTICS
INTRODUCTION TO BIO STATISTICS
 
Data Analysis and Statistics
Data Analysis and StatisticsData Analysis and Statistics
Data Analysis and Statistics
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
 
Measures of Variability
Measures of VariabilityMeasures of Variability
Measures of Variability
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Measures of Central tendency
Measures of Central tendencyMeasures of Central tendency
Measures of Central tendency
 

Viewers also liked

Hypothesis testing ppt final
Hypothesis testing ppt finalHypothesis testing ppt final
Hypothesis testing ppt final
piyushdhaker
 
Outskewer: Using Skewness to Spot Outliers in Samples and Time Series
Outskewer: Using Skewness to Spot Outliers in Samples and Time SeriesOutskewer: Using Skewness to Spot Outliers in Samples and Time Series
Outskewer: Using Skewness to Spot Outliers in Samples and Time Series
Sébastien
 
Transporters and Their Role in Drug Interactions
Transporters and Their Role in Drug InteractionsTransporters and Their Role in Drug Interactions
Transporters and Their Role in Drug Interactions
shabeel pn
 

Viewers also liked (20)

Introduction to Statistics - Basic Statistical Terms
Introduction to Statistics - Basic Statistical TermsIntroduction to Statistics - Basic Statistical Terms
Introduction to Statistics - Basic Statistical Terms
 
Introduction to statistics...ppt rahul
Introduction to statistics...ppt rahulIntroduction to statistics...ppt rahul
Introduction to statistics...ppt rahul
 
Hypothesis testing ppt final
Hypothesis testing ppt finalHypothesis testing ppt final
Hypothesis testing ppt final
 
Assignment in regression1
Assignment in regression1Assignment in regression1
Assignment in regression1
 
Multivariate1
Multivariate1Multivariate1
Multivariate1
 
Outskewer: Using Skewness to Spot Outliers in Samples and Time Series
Outskewer: Using Skewness to Spot Outliers in Samples and Time SeriesOutskewer: Using Skewness to Spot Outliers in Samples and Time Series
Outskewer: Using Skewness to Spot Outliers in Samples and Time Series
 
Transporters and Their Role in Drug Interactions
Transporters and Their Role in Drug InteractionsTransporters and Their Role in Drug Interactions
Transporters and Their Role in Drug Interactions
 
An Overview of Basic Statistics
An Overview of Basic StatisticsAn Overview of Basic Statistics
An Overview of Basic Statistics
 
Gravetter & Wallnau 7e Ch 08 PPt Handout
Gravetter & Wallnau 7e Ch 08 PPt HandoutGravetter & Wallnau 7e Ch 08 PPt Handout
Gravetter & Wallnau 7e Ch 08 PPt Handout
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
 
Binary Logistic Regression
Binary Logistic RegressionBinary Logistic Regression
Binary Logistic Regression
 
MD Paediatrics (Part 1) - Overview of Basic Statistics
MD Paediatrics (Part 1) - Overview of Basic StatisticsMD Paediatrics (Part 1) - Overview of Basic Statistics
MD Paediatrics (Part 1) - Overview of Basic Statistics
 
Basic Statistics
Basic  StatisticsBasic  Statistics
Basic Statistics
 
Basic Statistics & Data Analysis
Basic Statistics & Data AnalysisBasic Statistics & Data Analysis
Basic Statistics & Data Analysis
 
Mean, median, mode, Standard deviation for grouped data for Statistical Measu...
Mean, median, mode, Standard deviation for grouped data for Statistical Measu...Mean, median, mode, Standard deviation for grouped data for Statistical Measu...
Mean, median, mode, Standard deviation for grouped data for Statistical Measu...
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statistics
 
Basic statistics for algorithmic trading
Basic statistics for algorithmic tradingBasic statistics for algorithmic trading
Basic statistics for algorithmic trading
 
Basic business statistics 2
Basic business statistics 2Basic business statistics 2
Basic business statistics 2
 
Math 102- Statistics
Math 102- StatisticsMath 102- Statistics
Math 102- Statistics
 
Degrees of freedom
Degrees of freedomDegrees of freedom
Degrees of freedom
 

Similar to Basic statistics

QUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docxQUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
makdul
 
Statistics and permeability engineering reports
Statistics and permeability engineering reportsStatistics and permeability engineering reports
Statistics and permeability engineering reports
wwwmostafalaith99
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersion
Gilbert Joseph Abueg
 
Descriptive And Inferential Statistics for Nursing Research
Descriptive And Inferential Statistics for Nursing ResearchDescriptive And Inferential Statistics for Nursing Research
Descriptive And Inferential Statistics for Nursing Research
enamprofessor
 

Similar to Basic statistics (20)

Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
 
Data analysis
Data analysisData analysis
Data analysis
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptx
 
Review & Hypothesis Testing
Review & Hypothesis TestingReview & Hypothesis Testing
Review & Hypothesis Testing
 
Basics of biostatistic
Basics of biostatisticBasics of biostatistic
Basics of biostatistic
 
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docxQUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
 
M.Ed Tcs 2 seminar ppt npc to submit
M.Ed Tcs 2 seminar ppt npc   to submitM.Ed Tcs 2 seminar ppt npc   to submit
M.Ed Tcs 2 seminar ppt npc to submit
 
Review of Chapters 1-5.ppt
Review of Chapters 1-5.pptReview of Chapters 1-5.ppt
Review of Chapters 1-5.ppt
 
Statistics and permeability engineering reports
Statistics and permeability engineering reportsStatistics and permeability engineering reports
Statistics and permeability engineering reports
 
Basic statistics 1
Basic statistics  1Basic statistics  1
Basic statistics 1
 
Soni_Biostatistics.ppt
Soni_Biostatistics.pptSoni_Biostatistics.ppt
Soni_Biostatistics.ppt
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersion
 
Ds vs Is discuss 3.1
Ds vs Is discuss 3.1Ds vs Is discuss 3.1
Ds vs Is discuss 3.1
 
Probability & Samples
Probability & SamplesProbability & Samples
Probability & Samples
 
250Lec5INFERENTIAL STATISTICS FOR RESEARC
250Lec5INFERENTIAL STATISTICS FOR RESEARC250Lec5INFERENTIAL STATISTICS FOR RESEARC
250Lec5INFERENTIAL STATISTICS FOR RESEARC
 
Descriptive And Inferential Statistics for Nursing Research
Descriptive And Inferential Statistics for Nursing ResearchDescriptive And Inferential Statistics for Nursing Research
Descriptive And Inferential Statistics for Nursing Research
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
Advanced statistics
Advanced statisticsAdvanced statistics
Advanced statistics
 
Basic Statistical Concepts in Machine Learning.pptx
Basic Statistical Concepts in Machine Learning.pptxBasic Statistical Concepts in Machine Learning.pptx
Basic Statistical Concepts in Machine Learning.pptx
 

Recently uploaded

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Recently uploaded (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Accessible Digital Futures project (20/03/2024)

Basic statistics

  • 1. Basic Statistical Knowledge (For Bio Science Student) Dipika Patra Department of Statistics S.A.Jaipuria College, Kolkata Email: dipika.patra1988@gmail.com
  • 2. Image of a Frequency table
  • 3. Basic Statistics Measure of Central Tendency (how to describe a set of data by identifying the central position within that set of data)
  • 4. Different Measures of Central Tendency. Mean: The mean (or average) is the most popular and best-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data, and is written as $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Median: The middle score of a data set that has been arranged in order of magnitude. The median is less affected by outliers and skewed data than the mean. To calculate the median, suppose we have the data below: 65 55 89 56 35 14 56 55 87 45 92. We first rearrange the data in order of magnitude: 14 35 45 55 55 56 56 65 87 89 92. Here 56 is the middle mark, because there are 5 scores before it and 5 after it. This works when the number of scores is odd. When the number of scores is even, take the middle two scores and average them.
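  • 5. Median for Grouped Data: $\text{Median} = L + \frac{N/2 - cf}{f} \times h$, where L is the lower boundary of the median class, N is the total frequency, cf is the cumulative frequency of the class preceding the median class, f is the frequency of the median class, and h is the class width.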
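  • 6. Example (class interval given by lower limit LL and upper limit UL; f = frequency; <cf = less-than cumulative frequency)
      LL   UL    f   <cf
      11   22    3     3
      23   34    5     8
      35   46   11    19
      47   58   19    38
      59   70   14    52
      71   82    6    58
      83   94    2    60
      Total     60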
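The grouped-data median for this table can be worked out with a short script. This is a minimal sketch, assuming the usual formula Median = L + ((N/2 - cf)/f) × h and assuming class boundaries lie 0.5 below each lower limit (so the class 47-58 starts at 46.5); the function name `grouped_median` is chosen here for illustration.

```python
# Grouped-data median for the frequency table above.
# Assumption: class boundaries lie 0.5 below each lower limit.
classes = [(11, 22, 3), (23, 34, 5), (35, 46, 11), (47, 58, 19),
           (59, 70, 14), (71, 82, 6), (83, 94, 2)]  # (LL, UL, f)


def grouped_median(classes):
    n = sum(f for _, _, f in classes)          # total frequency, 60 here
    width = classes[1][0] - classes[0][0]      # class width, 12 here
    cf_before = 0                              # cumulative frequency so far
    for lower, _, f in classes:
        if cf_before + f >= n / 2:             # first class that reaches n/2
            boundary = lower - 0.5
            return boundary + (n / 2 - cf_before) / f * width
        cf_before += f


median = grouped_median(classes)
print(round(median, 2))  # 53.45
```

The median class is 47-58, because its cumulative frequency (38) is the first to reach N/2 = 30, so the median is 46.5 + (30 - 19)/19 × 12.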
  • 7. Different Measures of Central Tendency. Mode: The mode is the most frequent score in the data set. On a bar chart or histogram it corresponds to the highest bar. One problem with the mode is that it is not unique, so it is ambiguous when two or more values share the highest frequency. Empirical relation for moderately skewed distributions: Mean − Mode = 3(Mean − Median).
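  • 8. Mode for Grouped Data. Formula: $\text{Mode} = LB + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$, where LB is the lower boundary of the modal class, ∆1 is the frequency of the modal class minus the frequency of the class before it, ∆2 is the frequency of the modal class minus the frequency of the class after it, and i is the class width. Steps in Computing the Mode for Grouped Data: 1. Determine the modal class. 2. Get the value of ∆1. 3. Get the value of ∆2. 4. Get the lower boundary of the modal class. 5. Apply the formula by substituting the values obtained in the preceding steps.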
  • 9. Example (class interval given by lower limit LL and upper limit UL; f = frequency)
      LL   UL    f
      11   22    3
      23   34    5
      35   46   11
      47   58   19
      59   70   14
      71   82    6
      83   94    2
      Total     60
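The five steps for the grouped-data mode can be sketched in code for this table. The sketch assumes the formula Mode = LB + ∆1/(∆1 + ∆2) × i, with ∆1 and ∆2 taken as the differences between the modal frequency and the frequencies of the neighbouring classes, and class boundaries 0.5 below each lower limit; `grouped_mode` is an illustrative name.

```python
# Grouped-data mode for the frequency table above.
# Assumption: class boundaries lie 0.5 below each lower limit.
lower_limits = [11, 23, 35, 47, 59, 71, 83]
freqs = [3, 5, 11, 19, 14, 6, 2]


def grouped_mode(lower_limits, freqs):
    width = lower_limits[1] - lower_limits[0]            # class width
    m = freqs.index(max(freqs))                          # step 1: modal class
    prev_f = freqs[m - 1] if m > 0 else 0
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - prev_f                               # step 2: delta 1
    d2 = freqs[m] - next_f                               # step 3: delta 2
    lb = lower_limits[m] - 0.5                           # step 4: lower boundary
    return lb + d1 / (d1 + d2) * width                   # step 5: apply formula


mode = grouped_mode(lower_limits, freqs)
print(round(mode, 2))  # 53.88
```

Here the modal class is 47-58 (f = 19), ∆1 = 19 - 11 = 8 and ∆2 = 19 - 14 = 5, giving 46.5 + 8/13 × 12.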
  • 10. Example 1: Ms. Sulit collects data on the ages of Mathematics teachers in Santa Rosa School, and her study yields the following: 38 35 28 36 35 33 40. Solution: $\bar{x} = \frac{38 + 35 + 28 + 36 + 35 + 33 + 40}{7} = \frac{245}{7} = 35$. Based on the computed mean, 35 is the average age of the Mathematics teachers.
  • 11. Example: Below are Amaya's subjects and the corresponding number of units and grades she got for the previous grading period. Compute her grade point average.
      Subject          Units   Grade
      Filipino          0.9     86
      English           1.5     85
      Mathematics       1.5     88
      Science           1.8     87
      Social Studies    0.9     86
      TLE               1.2     83
      MAPEH             1.2     87
      Weighted mean = Σ(units × grade) / Σ(units) = 86.1
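The grade point average above is a weighted mean, with units as weights. A minimal sketch of the arithmetic, using only the units and grades listed on the slide:

```python
# Weighted mean of grades, weighted by units.
units = [0.9, 1.5, 1.5, 1.8, 0.9, 1.2, 1.2]
grades = [86, 85, 88, 87, 86, 83, 87]

weighted_sum = sum(u * g for u, g in zip(units, grades))  # 774.9
gpa = weighted_sum / sum(units)                           # divide by 9.0 units
print(round(gpa, 1))  # 86.1
```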
  • 12. Percentile: Provides information about how the data are spread over the interval from the smallest value to the largest value. The pth percentile is a value such that at least p% of the items take on this value or less and at least (100 − p)% take on this value or more. Quartiles: a. First Quartile (Q1) = 25th percentile; b. Second Quartile (Q2) = 50th percentile (median); c. Third Quartile (Q3) = 75th percentile. Computation: the pth percentile is the value in the ith position of the ordered data, where i = (p/100)n.
  • 13. Measure of Dispersion (how "spread out" a group of scores is). Range: difference between the largest and smallest values of the data set. Interquartile range: difference between the third and first quartiles. Variance: average of the squared differences between each data value and the mean. For a sample, $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$; for a population, $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$. Standard deviation: positive square root of the variance. For a sample, $s = \sqrt{s^2}$; for a population, $\sigma = \sqrt{\sigma^2}$.
  • 14. Measure of Dispersion (how "spread out" a group of scores is), continued. Quartile deviation: half of the distance between the first and third quartiles, (Q3 − Q1)/2. Mean absolute deviation: average of the absolute deviations of the values from a central value a. Coefficient of variation: how large the standard deviation is in relation to the mean, defined as (standard deviation / mean) × 100.
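The dispersion measures can be tried on the eleven scores used earlier for the median. This is a sketch with Python's standard `statistics` module; taking the mean as the central value "a" for the mean absolute deviation is an assumption made here for illustration.

```python
import math
import statistics

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

data_range = max(scores) - min(scores)        # largest minus smallest
mean = statistics.mean(scores)
sample_var = statistics.variance(scores)      # divides by n - 1
pop_var = statistics.pvariance(scores)        # divides by n
sample_sd = math.sqrt(sample_var)
# Mean absolute deviation about the mean (the central value is assumed).
mad = sum(abs(x - mean) for x in scores) / len(scores)
cv = sample_sd / mean * 100                   # coefficient of variation, %

print(data_range, mean, round(sample_sd, 2), round(mad, 2), round(cv, 1))
```

The sample variance is always slightly larger than the population variance computed from the same numbers, because it divides by n − 1 instead of n.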
  • 15. Distribution Shape. Skewness: a measure of the asymmetry of a frequency distribution. Zero indicates perfect symmetry; the normal distribution has a skewness of zero. Positive skewness indicates that the tail of the distribution is more stretched on the side above the mean. Negative skewness indicates that the tail is more stretched on the side below the mean. Symmetric: Mean = Median = Mode. Positively skewed: Mode < Median < Mean. Negatively skewed: Mean < Median < Mode. Skewness = 3(mean − median)/standard deviation, which lies in the interval [−3, 3]. Skewness = (Q3 + Q1 − 2 median)/(Q3 − Q1).
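Both skewness coefficients can be computed for the eleven scores used in the median example. This is a sketch only: the quartiles come from `statistics.quantiles` with its default "exclusive" method, which is one of several quartile conventions and is an assumption here, and the sample standard deviation is used in the first coefficient.

```python
import statistics

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

mean = statistics.mean(scores)
median = statistics.median(scores)
sd = statistics.stdev(scores)                      # sample standard deviation

# Coefficient based on mean and median: 3(mean - median) / sd
pearson_skew = 3 * (mean - median) / sd

# Quartile-based coefficient: (Q3 + Q1 - 2*median) / (Q3 - Q1)
q1, q2, q3 = statistics.quantiles(scores, n=4)     # default "exclusive" method
bowley_skew = (q3 + q1 - 2 * q2) / (q3 - q1)

print(round(pearson_skew, 3), round(bowley_skew, 3))
```

Both coefficients come out positive for these scores, which agrees with the mean (59) lying above the median (56).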
  • 16. Distribution Shape. Kurtosis: a measure of flatness or peakedness, measured by $\beta_2 = \mu_4 / \mu_2^2$, where $\mu_2$ and $\mu_4$ are the second and fourth central moments. The normal distribution has a kurtosis of 3 (MESOKURTIC). Kurtosis above 3 indicates a relatively peaked distribution (LEPTOKURTIC). Kurtosis below 3 indicates a relatively flat distribution (PLATYKURTIC). Another measure of kurtosis is $\gamma_2 = \beta_2 - 3$, which is zero for the normal distribution. [Figure: platykurtic, mesokurtic and leptokurtic curves]
  • 17. Probability Distribution (how probabilities are distributed over the values of a random variable). Probability is represented by a real number in the range from 0 to 1. An impossible event has a probability of 0 and a certain event has a probability of 1. Probability refers to the chance or likelihood that a particular event takes place. For equally likely outcomes it is defined as (number of favorable outcomes)/(total number of possible outcomes).
  • 18. Discrete Distribution (Binomial, Poisson, …). Binomial: Where does a variable follow a binomial distribution? Failure and success in an exam; determination of the probabilities of certain combinations of heads and tails; defective and non-defective items from a manufacturing process. Probability Mass Function of Binomial: $p(x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, …, n, where p(x) = probability of x successes in n trials, n = number of trials, p = probability of success in one trial, and q = 1 − p. Mean = np, Variance = npq, Standard Deviation = √(npq). If p = 0.5, the skewness is 0; if p < 0.5, the distribution is positively skewed; if p > 0.5, it is negatively skewed.
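The binomial probability mass function, p(x) = C(n, x) p^x q^(n−x) with q = 1 − p, and the stated mean and variance can be checked numerically. A minimal sketch; the values n = 10 and p = 0.3 are arbitrary choices for illustration.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3                      # illustrative values, not from the slides
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                        # should be 1
mean = sum(x * px for x, px in enumerate(pmf))          # should equal n*p
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))  # n*p*q

print(round(total, 6), round(mean, 6), round(variance, 6))
```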
  • 19. Discrete Distribution (Binomial, Poisson, …). Poisson: A binomial distribution tends to a Poisson distribution if the number of trials n → ∞ and the probability of success p → 0 while np = λ stays finite. Probability Mass Function of Poisson: $p(x) = \frac{e^{-\lambda} \lambda^x}{x!}$, x = 0, 1, 2, …. Mean = Variance = λ. Positively skewed, with coefficient of skewness = 1/√λ. Leptokurtic, as γ2 = 1/λ, which is > 0. Real-life examples: number of printing mistakes per page in a large text; number of telephone calls per unit interval of time; number of deaths from a rare disease.
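The limiting relationship between the binomial and the Poisson distribution can be seen by comparing the two probability mass functions for large n and small p. A sketch under assumed illustrative values n = 1000 and p = 0.003, so that λ = np = 3:

```python
import math


def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)


n, p = 1000, 0.003                  # illustrative: many trials, rare success
lam = n * p                         # lambda = 3

# Largest pointwise gap between the two distributions for x = 0..10.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
              for x in range(11))

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))
```

The two columns printed for each x agree closely, which is why the Poisson distribution is used for counts of rare events.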
  • 20. Continuous Distribution (Normal, …). The normal distribution is the most important and most widely used distribution in statistics. It is sometimes called the "bell curve." Probability Density Function of the Normal distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, −∞ < x < ∞. Several features of normal distributions are listed below. Normal distributions are symmetric around their mean. The mean, median, and mode of a normal distribution are equal. The area under the normal curve is equal to 1.0. Normal distributions are defined by two parameters, the mean (μ) and the standard deviation (σ). Approximately 68% of the area of a normal distribution is within one standard deviation of the mean. Approximately 95% of the area of a normal distribution is within two standard deviations of the mean.
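The 68% and 95% figures for the normal curve can be reproduced from the cumulative distribution function. A minimal sketch using `statistics.NormalDist` from the Python standard library, with the standard normal (μ = 0, σ = 1) as the example:

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)   # standard normal distribution

within_1_sd = z.cdf(1) - z.cdf(-1)  # area within one standard deviation
within_2_sd = z.cdf(2) - z.cdf(-2)  # area within two standard deviations

print(round(within_1_sd, 4), round(within_2_sd, 4))  # 0.6827 0.9545
```

The same areas hold for any mean and standard deviation, since they depend only on the distance from the mean measured in standard deviations.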
  • 21. Sampling Distribution. What is a sampling distribution? A sampling distribution is created by repeated sampling. It is defined as the frequency distribution of a statistic over many samples. If the statistic is the sample mean, it is the distribution of means and is called the sampling distribution of the mean.
  • 22. Sampling Distribution of the Mean. Unbiased: the mean of the sampling distribution equals the population mean. Variance of the sampling distribution of means based on N observations: $\sigma_{\bar{x}}^2 = \sigma^2 / N$. Large samples produce sample estimates very close to the parameter. If independent random samples are drawn from each of two normal populations, then the sampling distribution of the difference between the two sample means is normally distributed.
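The properties of the sampling distribution of the mean can be illustrated by simulation. This is a sketch under assumed values (a normal population with mean 100 and standard deviation 15, samples of size 25, 5000 repetitions); none of these numbers come from the slides.

```python
import random
import statistics

random.seed(0)                       # fixed seed so the run is repeatable
mu, sigma, n = 100.0, 15.0, 25       # assumed population and sample size
n_samples = 5000

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(n_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = statistics.mean(sample_means)        # close to mu
var_of_means = statistics.pvariance(sample_means)    # close to sigma**2 / n

print(round(mean_of_means, 2), round(var_of_means, 2), sigma**2 / n)
```

The average of the sample means sits near the population mean (unbiasedness), and their variance is near σ²/N = 9, which shrinks as the sample size grows.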
  • 23. Difference between Parametric and Non-parametric Statistics. Parametric statistics: inferential tests that assume certain characteristics are true of an underlying population, especially the shape of its distribution. Commonly used for normally distributed interval or ratio dependent variables. Non-parametric statistics: inferential tests that make few or no assumptions about the population from which observations were drawn (distribution-free tests). There is generally at least one non-parametric equivalent for each type of parametric test. Non-parametric tests are less powerful than parametric tests. They are generally used when assumptions about the underlying population are questionable (e.g., non-normality), and are commonly used to analyse dependent variables (DVs) that are non-normal, nominal or ordinal.
  • 24. Statistical Inference Inference •Use a sample to learn something about a population. Hypothesis Testing •A hypothesis is an assertion about the population. •A statistical method that uses sample data to evaluate a hypothesis about a population parameter. •We answer a question such as, “If the hypothesis were true, would it be unlikely to get data such as we obtained?” Why Test? •Statistics is an experimental science, not really a branch of mathematics. •It’s a tool that can tell you whether data are accidentally or really similar. •It does not give you certainty.
  • 25. Statistical Inference: Five Parts of a Test. • Assumptions about the type of data (quantitative, categorical) and the population distribution (e.g., normal, binomial). • Hypotheses: Null hypothesis (H0): a statement indicating "no effect". Alternative hypothesis (Ha): a statement indicating "an effect". • Test statistic: a function of the data that measures the discrepancy between the data and the null hypothesis. • P-value (p): a measure of evidence about the null hypothesis H0. The smaller the P-value, the stronger the evidence against H0. • Conclusion: select a significance level (such as 0.05 or 0.01) and reject H0 if P-value ≤ significance level. Otherwise, we fail to reject H0, i.e. H0 is not necessarily true, but it is plausible.
  • 26. Hypothesis Testing: P-value approach. The P-value approach decides between "likely" and "unlikely" by computing the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one obtained. If the P-value is small, say less than or equal to α, the result is "unlikely" under the null hypothesis. If the P-value is large, say more than α, it is "likely." Four steps are involved in the P-value approach: 1. Specify the null and alternative hypotheses. 2. Calculate the value of the test statistic using the sample data, assuming the null hypothesis is true. 3. Using the known distribution of the test statistic, calculate the P-value: "If the null hypothesis is true, what is the probability that we'd observe a more extreme test statistic in the direction of the alternative hypothesis than we did?" 4. Set the significance level α, the probability of making a Type I error, to be small (0.01, 0.05, or 0.10), and compare the P-value to α.
  • 27. An Example: Suppose we want to know whether a new drug has an influence on IQ. Null hypothesis: the average IQ of the people who use the drug is 100. Alternative hypothesis: the average IQ of the people who use the drug is not 100. We need the following to perform a z test: a) population mean, b) hypothesized mean, c) sample size, d) sample mean, e) population standard deviation. We then calculate the one-sample z test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$. Suppose our z statistic is 2.17.
  • 28. Example: The z statistic follows a normal probability distribution, so we find the P-value as the area in both tails beyond ±2.17. The area in red is 0.015 + 0.015 = 0.030, or 3 percent. With a significance level of 5 percent (0.025 in each tail), this means that we have achieved statistical significance. We reject the null hypothesis in favour of the alternative hypothesis, and conclude that there is evidence that the drug caused the average IQ to deviate from 100 IQ points.
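The two-tailed P-value for z = 2.17 can be reproduced from the standard normal distribution. A minimal sketch using `statistics.NormalDist`; the 5% significance level is the one used in the example.

```python
from statistics import NormalDist

z_stat = 2.17
alpha = 0.05

upper_tail = 1 - NormalDist().cdf(z_stat)   # area above +2.17
p_value = 2 * upper_tail                    # both tails, since Ha is "not 100"

reject_h0 = p_value <= alpha
print(round(upper_tail, 3), round(p_value, 3), reject_h0)  # 0.015 0.03 True
```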
  • 29. Single Sample Problems: Student's t test. If data are available on a single variable, an appropriate test is the t test, provided the following assumptions are satisfied: • The data are quantitative in nature • Normality is satisfied • Observations are independent
  • 30. Example: • Cereal is sold in the market in packets of 300 grams. A sample of 10 packs revealed the following weights: 296, 298, 300.1, 299, 302, 290, 289, 297, 296, 299.5 • The objective is to know whether the average weight is actually maintained at 300 grams. • Null hypothesis: mean weight is 300 grams • Alternative hypothesis: mean weight is lower than 300 grams • The data are quantitative and hence Student's t test is appropriate. • P value 0.016 • Since the P value is below 0.05, the data support the conclusion that the mean weight is not maintained at 300 grams.
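The t statistic behind this example can be computed with the standard library. This sketch stops at the test statistic and compares it with the tabulated one-sided 5% critical value for 9 degrees of freedom (1.833), because the Python standard library has no t distribution from which to compute the P value directly.

```python
import math
import statistics

weights = [296, 298, 300.1, 299, 302, 290, 289, 297, 296, 299.5]
mu0 = 300                                   # hypothesized mean weight (grams)

n = len(weights)
xbar = statistics.mean(weights)
s = statistics.stdev(weights)               # sample standard deviation
t_stat = (xbar - mu0) / (s / math.sqrt(n))  # one-sample t statistic
df = n - 1

t_crit = 1.833                              # tabulated t(0.05, df=9), one-sided
reject_h0 = t_stat < -t_crit                # lower-tailed test

print(round(xbar, 2), round(t_stat, 3), df, reject_h0)
```

The statistic falls below −1.833, so the null hypothesis is rejected at the 5% level, in line with the reported P value of 0.016.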
  • 31. Two Sample Problems: An example. • Consider the data on cadmium level in persons categorized as smokers and non-smokers. Smokers: 10, 8.4, 3.5, 8.9, 9.0, 8.8, 7.9. Non-smokers: 3.1, 3.5, 4.5, 4.3, 2.2, 2.7 • Query: Is the mean cadmium level higher among the smokers? • Fisher's t-test is appropriate • Null hypothesis: mean cadmium levels are equal in the two groups • Alternative hypothesis: mean cadmium level is higher among the smokers • p-value = 0.0003045 • Conclusion: P value < 0.05 implies rejection of the null hypothesis.
  • 32. Example: Suppose the scores of 6 students given special coaching are as follows: Before: 8, 3, 4, 5, 6, 7. After: 9, 6, 5, 5, 6, 8. Query: Is the special coaching beneficial? Observe: the experimental units (i.e. students) are the same; only the situations are different. The assumption of independence is violated, so the paired t test is appropriate instead of Fisher's t test. • Null hypothesis: mean scores are the same in the two situations • Alternative hypothesis: mean score increases after the special coaching. Result: t = -2.2361, df = 5, p-value = 0.03779. What will the conclusion be here?
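The paired t statistic reported above can be reproduced from the before-minus-after differences. This sketch uses only the standard library and compares the statistic with the tabulated one-sided 5% critical value for 5 degrees of freedom (2.015), rather than computing the P value itself.

```python
import math
import statistics

before = [8, 3, 4, 5, 6, 7]
after = [9, 6, 5, 5, 6, 8]

# Work on the paired differences (before - after), one per student.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_stat = d_bar / (s_d / math.sqrt(n))       # paired t statistic
df = n - 1

t_crit = 2.015                              # tabulated t(0.05, df=5), one-sided
reject_h0 = t_stat < -t_crit                # Ha: scores increase after coaching

print(round(t_stat, 4), df, reject_h0)  # -2.2361 5 True
```

Because the statistic lies beyond the critical value (and the reported p-value 0.03779 is below 0.05), the null hypothesis is rejected at the 5% level, which suggests the coaching raised the mean score.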
  • 33. Inference in Categorical data: Sometimes both the independent and dependent variables are categorical. (e.g. Treatment (Drug/placebo) versus survival (alive/dead) or Smoke (Y/N) versus lung cancer (Y/N)) In these situations, we usually count the number (or proportion) of patients or subjects that fall into each possible category. For categorical data, the statistical tests we commonly use •Test for proportions •chi-square test
  • 34. An Example: Suppose that 39 out of 80 people contacted in a survey of city residents oppose a new tax. Test whether the data are consistent with the hypothesis that people accept new taxes. For the data, number of successes x = 39 • Number of data points n = 80 • Null hypothesis: the theoretical proportion of people accepting the new tax is 50%, against the alternative that it is higher • P value 0.4875 Conclusion: Since the P value is higher than 0.05, we fail to reject the null hypothesis.
  • 35. Another Example: Consider the following data on the smoking and drinking habits of 500 individuals. Query: Are smoking and drinking habits associated?
                          Heavy Smoker   Moderate Smoker   Non-Smoker
      Heavy Drinker            20              62                6
      Moderate Drinker         40               8              159
      Non-Drinker              10              20              175
      Null hypothesis: smoking and drinking are independent. Alternative hypothesis: they are associated. Test: chi-square test.
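The chi-square statistic for this table compares each observed count with the count expected under independence, (row total × column total)/grand total. A minimal sketch with the standard library; the 5% critical value for 4 degrees of freedom (9.488) is taken from standard tables.

```python
# Observed counts: rows = heavy, moderate, non-drinker;
# columns = heavy, moderate, non-smoker.
observed = [
    [20, 62, 6],
    [40, 8, 159],
    [10, 20, 175],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_crit = 9.488                    # tabulated 5% critical value, df = 4
reject_h0 = chi2 > chi2_crit

print(round(chi2, 2), df, reject_h0)
```

The statistic is far above the critical value, so the hypothesis of independence is rejected for these data.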
  • 36. ANOVA • An extension of Fisher's t test • The number of groups is more than two. One-way data (variation in one direction only): categories Dose 1, Dose 2, Dose 3. Sources of variation in the data are: A. variation within each dose group; B. variation between dose groups. Main question: Do the responses depend on which dose group the subject is in? The higher the variation between the dose groups relative to the variation within them, the stronger the influence of dose group.
  • 38. Two Way ANOVA: Two-way ANOVA is a type of study design with one numerical outcome variable and two categorical explanatory variables. Example: Consider yield figures of three paddy varieties when three pesticides are used. • Pesticide and variety are both categorical • The data are two-way.
      Pesticide \ Variety     A     B     C
      1                      33    99    67
      2                      56    87    65
      3                      78    89    77
      Two hypotheses of interest: • Whether there is any effect of the pesticides. • Whether there is any effect of the paddy varieties.
      Analysis of Variance Table
                   Df    Sum Sq    Mean Sq   F value    Pr(>F)
      variety       2   2225.17    1112.58    44.063   0.000259
      pesticide     3   1191.00     397.00    15.723   0.003008
      Residuals     6    151.50      25.25
      Conclusion: both P values (0.000259 and 0.003008) are < 0.05, hence reject the null hypotheses, i.e. varieties and pesticides both have a significant effect.
  • 39. Summary of Single Sample Tests: Which variable are we looking at? Qualitative (frequency): chi-square test / test of proportion. Quantitative: Student's t test.
  • 40. Summary of Two Sample Tests: Which variables are we looking at? Qualitative × Qualitative: chi-square test. Qualitative × Quantitative: ANOVA. Quantitative × Quantitative: paired t test / Fisher's t test.
  • 41. ANCOVA •A covariate Independent Variable is added to an ANOVA (can be dichotomous or metric) •Effect of the covariate on the Dependent Variable is removed . •Of interest are: Main effects of IVs and interaction terms Contribution of CV (akin to Step 1 in HMLR) •e.g., GPA is used as a CV, when analysing whether there is a difference in Educational Satisfaction between Males and Females. •Reduces variance associated with covariate (CV) from the DV error (unexplained variance) term •Increases power of F-test •May not be able to achieve experimental control over a variable (e.g., randomisation), but can measure it and statistically control for its effect.
  • 42. Assumption of ANCOVA •Normality •Homogeneity of Variance •Independence of observations •Independence of IV and CV •Multicollinearity - if more than one CV, they should not be highly correlated - eliminate highly correlated CVs •Reliability of CVs - not measured with error - only use reliable CVs