Assistant Professor,
Anthropology & Tribal Development,
Guru Ghasidas Vishwavidyalaya
(A Central University),C.G, India
E-mail: dsubalvu@gmail.com
Mobile: 9753117388
Statistics presents a rigorous scientific method for gaining insight
into data. For example, suppose we measure the weight of 100
patients in a study. With so many measurements, simply looking at
the data fails to provide an informative account. However, statistics
can give an instant overall picture of the data through graphical
presentation or numerical summarization, irrespective of the number
of data points. Besides data summarization, another important task
of statistics is to make inferences and predict relations between variables.
Definition: Science of collection, presentation,
analysis, and reasonable interpretation of data.
Basics of Statistics
A Taxonomy of Statistics
TYPES OF STATISTICS:
• Descriptive statistics
• Inferential statistics
DESCRIPTIVE STATISTICS:
• Descriptive statistics is the discipline of quantitatively describing the main features of a
collection of data. Descriptive statistics are techniques that take raw scores and summarize,
organize and simplify them into a form that is more manageable. Often the scores
are organized in a table or a graph so that it is possible to see the entire set of scores.
Descriptive statistics are not developed on the basis of probability theory. Different measures
are used to describe the data.
INFERENTIAL STATISTICS:
• Inferential statistics consists of techniques that allow us to study samples and then make
generalizations about the population from which they were selected.
• It is usually not possible to measure everyone in the population. Because the population is
typically very large, a sample that represents the population is selected from it. By
analyzing the results from the sample, we hope to make general statements about the population.
• Statistics describes a numeric set of data by its
• Center
• Variability
• Shape
• Statistics describes a categorical set of data by
• Frequency, percentage or proportion of each category
Statistical Description of Data
Variable
• A variable is any kind of attribute or characteristic that you are trying to
measure, manipulate and control in statistics and research. All studies
analyze a variable, which can describe a person, place, thing or idea. A
variable's value can change between groups or over time.
Researchers organize variables into a variety of categories, the most common
of which include:
• Independent variables
• Dependent variables
• Quantitative variables
• Qualitative variables
• Intervening variables
• Moderating variables
• Extraneous variables
• Confounding variables
• Control variables
• Composite variables
Independent Vs. dependent variables
Quantitative vs. qualitative variables
• Researchers can further
categorize Quantitative variables into two
types:
• Discrete: Any numerical variables you can
realistically count, such as the coins in your
wallet or the money in your savings account.
• Continuous: Numerical variables that you
could never finish counting, such as time.
• Researchers can further categorize Qualitative, or
categorical, variables into three types:
• Binary: Variables with only two categories, such
as male or female, red or blue.
• Nominal: Variables you can organize in more than
two categories that do not follow a particular
order. Take, for example, housing types: Single-
family home, condominium, tiny home.
• Ordinal: Variables you can organize in more than
two categories that follow a particular order.
Take, for example, level of satisfaction:
Unsatisfied, neutral, satisfied.
Intervening Vs. Moderating variables
Extraneous Vs. Confounding variables
Scales of Measurements
• Nominal - Categorical variables with no inherent order or ranking
sequence, such as names or classes (e.g., gender). Values may be labelled with
numerals, but the labels carry no numerical value (e.g., I, II, III). The only operation
that can be applied to Nominal variables is enumeration (counting).
• Ordinal - Variables with an inherent rank or order, e.g. mild, moderate,
severe. Can be compared for equality, or greater or less, but not how
much greater or less.
• Interval - Values of the variable are ordered as in Ordinal, and
additionally, differences between values are meaningful. However, the
scale is not absolutely anchored. Calendar dates and temperatures on the
Fahrenheit scale are examples. Addition and subtraction, but not
multiplication and division, are meaningful operations.
• Ratio - Variables with all properties of Interval plus an absolute, non-
arbitrary zero point, e.g. age, weight, temperature (Kelvin). Addition,
subtraction, multiplication, and division are all meaningful operations.
• Distribution - (of a variable) tells us what values the variable takes and
how often it takes these values.
▪ Unimodal - having a single peak
▪ Bimodal - having two distinct peaks
▪ Symmetric - left and right halves are mirror images.
• What is the highest mark?
• What is the lowest mark?
• What is the average mark?
• How did the students perform?
Central tendency
Dispersion
Central tendency:
• An average value within the range of
entire data that is used to represent all
the values of the series.
• Such indexes are called measures of
central tendency.
Central Tendency
• Mean: Arithmetic, Geometric, Harmonic
• Median
• Mode
Arithmetic Mean is the most widely used measure of
central tendency.
Mean is more commonly referred to as the
average. To calculate the mean, find the sum of the data
and then divide by the number of data points.
• Best applied to normally distributed
continuous data.
Scales of measurement: Nominal (N), Ordinal (O), Interval (I), Ratio (R)
12, 15, 11, 11, 7, 13
First, find the sum of the data.
12 + 15 +11 + 11 + 7 + 13 = 69
Then divide by the number of data.
69 / 6 = 11.5
From grouped data
Age (years) = x | No. of students = f | x f
16 | 35 | 560
17 | 31 | 527
18 | 20 | 360
19 | 14 | 266
Total | Ʃ f = 100 | Ʃ x f = 1713
Arithmetic mean = Ʃ x f / Ʃ f = 1713 / 100 = 17.13
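The two worked examples can be reproduced with a short Python sketch. The function names `arithmetic_mean` and `grouped_mean` are illustrative choices, not part of the slides; the data are the marks and the age table shown here.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def grouped_mean(frequency_table):
    """Mean from a {value: frequency} table: sum(x * f) / sum(f)."""
    total_f = sum(frequency_table.values())
    total_xf = sum(x * f for x, f in frequency_table.items())
    return total_xf / total_f


marks = [12, 15, 11, 11, 7, 13]
ages = {16: 35, 17: 31, 18: 20, 19: 14}  # age in years -> number of students

print(arithmetic_mean(marks))  # 69 / 6 = 11.5
print(grouped_mean(ages))      # 1713 / 100 = 17.13
```

The grouped version never expands the table into 100 individual ages; it works directly from the frequencies, as the hand calculation does.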
Merits:
• Easy to compute
• Affected by each item
Demerits:
• Strongly affected by very small or very large items (outliers)
Outlier:
• A value that lies at least 1.5 IQR (interquartile
range) below the first
quartile (Q1), or
• At least 1.5 IQR above the third
quartile (Q3)
Geometric mean:
• Used for fractional or reciprocal values
Harmonic mean:
• Used for averaging ratios of two
different units
Both are less affected by large
outliers than the arithmetic mean
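As a rough illustration, Python's standard `statistics` module provides both means. The data below are hypothetical and chosen only for easy arithmetic: two growth factors for the geometric mean, and two speeds over equal distances for the harmonic mean.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical growth factors (x2, then x8): the geometric mean is the
# single constant factor that gives the same overall growth.
factors = [2, 8]
print(geometric_mean(factors))  # approximately 4.0

# Hypothetical speeds in km/h over two equal distances: the harmonic
# mean is the overall average speed for the whole trip.
speeds = [40, 60]
print(harmonic_mean(speeds))    # 48.0, versus an arithmetic mean of 50

# A very large value pulls the arithmetic mean up much more than
# it pulls the geometric or harmonic mean.
with_outlier = [2, 4, 8, 1000]
print(mean(with_outlier), geometric_mean(with_outlier), harmonic_mean(with_outlier))
```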
Median:
• Median is the middle number in a set of data when
the data is arranged in either ascending or descending
order.
• Divides the distribution in two equal parts
• One half is lower and other half is greater than that
value
• Robust measure
12, 15, 11, 11, 7, 13, 14
First, arrange the data in numerical order.
7, 11, 11, 12, 13, 14, 15
Median = 12
12, 15, 11, 11, 7, 13
First, arrange the data in numerical order.
7, 11, 11, 12, 13, 15
Then find the number in the middle or the
average of the two numbers in the middle.
11 + 12 = 23
23 / 2 = 11.5
Median value for odd number of
observations:
• Median for odd number of
observations = [(n + 1) / 2]th value,
where n is the total number of observations.
• If n = 9, then (9 + 1) / 2 = 5th value
Median value for even number of
observations:
• Average of the (n / 2)th and [(n / 2) + 1]th
values in the series
• If n = 10, then the average of the 5th and 6th
values will be the median
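The odd and even rules can be written as one small function. This is a minimal sketch; the name `median_value` is an illustrative choice, and the positions in the comments are 1-based to match the slide.

```python
def median_value(values):
    """Median by sorting, then applying the odd or even position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the [(n + 1) / 2]th value, which is index n // 2 (0-based).
        return ordered[mid]
    # Even n: average of the (n / 2)th and [(n / 2) + 1]th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_value([12, 15, 11, 11, 7, 13, 14]))  # 12
print(median_value([12, 15, 11, 11, 7, 13]))      # 11.5
```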
Scales of measurement: Nominal (N), Ordinal (O), Interval (I), Ratio (R)
Merits:
• Can be used when outliers are present
• Can be calculated for incomplete
longitudinal data, unlike the mean.
Demerits:
• The data have to be arranged in order first
Mode:
• The mode is the number that occurs most often.
Advantage – Like the median, it is not
affected by extreme values.
Disadvantage – Its exact location is uncertain and
not clearly defined.
• The mode is rarely used; one use is determining the
peak of disease in an epidemic or
outbreak.
12, 15, 11, 11, 7, 13
The mode is 11.
Sometimes a set of data will have more than
one mode.
For example, in the following set
both the numbers 5 and 7 appear
twice.
2, 9, 5, 7, 8, 6, 4, 7, 5
5 and 7 are both modes, and this set is said
to be bimodal.
Sometimes there is no mode in a set of data.
3, 8, 7, 6, 12, 11, 2, 1
All the numbers in this set occur only once,
therefore there is no mode in this set.
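The three cases (one mode, two modes, no mode) can be sketched with a frequency count. The function name `modes` is illustrative, and returning an empty list when every value occurs once follows the convention used in these slides; some libraries instead report every value as a mode in that case.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once: treated here as "no mode".
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([12, 15, 11, 11, 7, 13]))       # [11]
print(modes([2, 9, 5, 7, 8, 6, 4, 7, 5]))   # [5, 7] (bimodal)
print(modes([3, 8, 7, 6, 12, 11, 2, 1]))    # [] (no mode)
```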
Scales of measurement: Nominal (N), Ordinal (O), Interval (I), Ratio (R)
Mean: The average
Median: The number, or the average of the numbers, in the middle
Mode: The number that occurs most
Calculation of mean, median and mode
in Excel: Afterwards
MEASURES OF DISPERSION
Range
Quartiles
Standard Deviation
Coefficient of Variation
Variance
R
V
Red Orange Yellow Green Blue Indigo Violet
ROYG BIV
Range
▪ Simplest measure of dispersion
▪ Difference between the largest and the smallest values:
Range = Xlargest – Xsmallest
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13
Range = 13 - 1 = 12
Measures of Dispersion:
Why The Range Can Be Misleading
• Ignores the way in which data are distributed
[Figure: two dot plots over the values 7 8 9 10 11 12]
Range = 12 - 7 = 5
Range = 12 - 7 = 5
▪ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
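The outlier example is easy to check in code. In this minimal sketch the helper name `value_range` is illustrative; the two lists are the data sets above, which differ only in their last value.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# 11 ones, 8 twos, 4 threes, then 4 and a final value of 5 or 120.
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = common + [5]
with_outlier = common + [120]

print(value_range(without_outlier))  # 5 - 1 = 4
print(value_range(with_outlier))     # 120 - 1 = 119
```

Changing one observation out of 25 moves the range from 4 to 119, which is why the range alone is a fragile summary of spread.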
Inter-quartile range (IQR) / Quartile deviation (Q):
• The IQR is the distance between Q3 and Q1: IQR = Q3 – Q1. It measures the
spread in the middle 50% of the
data.
• The quartile deviation is half of the IQR: Q = (Q3 – Q1) / 2.
• Measures like Q1, Q3, and IQR
are not influenced by outliers
Box & Whisker Plot
Diagram
[Figure: box and whisker plot marking Xsmallest, Q1, Median, Q3, Xlargest]
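The five values marked on a box and whisker plot, and the 1.5 IQR outlier rule, can be sketched with the standard `statistics.quantiles` function. Quartile conventions differ between textbooks and software; the `"inclusive"` method used here is one choice among several and is an assumption, as are the function names.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def iqr_outliers(values, k=1.5):
    """Values at least k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x <= low or x >= high]


data = [12, 15, 11, 11, 7, 13, 14]
print(five_number_summary(data))  # (7, 11.0, 12.0, 13.5, 15)
print(iqr_outliers(data))         # [7], since the lower fence is 11 - 3.75 = 7.25
```

With a different quartile convention the fences shift slightly, so a borderline value such as 7 may or may not be flagged.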
Standard deviation:
• Standard deviation is the most frequently used measure
of dispersion of data.
• It is defined as the 'root-mean-square deviation' from the mean.
• It is denoted by the Greek letter σ (sigma) or by the initials
SD or S.
• The variable and its standard deviation are expressed
in the same units.
Applications:
N = 100 students, mean weight = 62 kg, SD = 5 kg
Weight is normally distributed:
68.3% of students lie within Mean ± 1 SD = (62 ± 1 × 5) = 57 to 67 kg
95.4% of students lie within Mean ± 2 SD = (62 ± 2 × 5) = 52 to 72 kg
99.7% of students lie within Mean ± 3 SD = (62 ± 3 × 5) = 47 to 77 kg
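A short sketch of both ideas: the standard deviation as a root-mean-square deviation, and the Mean ± k SD intervals for the weight example. The function names and the eight-value data set are illustrative assumptions. The function divides by n (the population form); the sample form divides by n − 1 instead.

```python
import math


def population_sd(values):
    """Root of the mean of squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


def sd_interval(mean, sd, k):
    """Interval mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd


# Hypothetical data: mean 5, squared deviations sum to 32, so SD = sqrt(32 / 8) = 2.
print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0

# Weight example from the slide: mean 62 kg, SD 5 kg.
for k, share in [(1, "68.3%"), (2, "95.4%"), (3, "99.7%")]:
    low, high = sd_interval(62, 5, k)
    print(f"{share}: {low} to {high} kg")
```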
Coefficient of variation:
• Coefficient of variation is used to compare the
relative variability or spread.
• It can measure variation of same variable in two
or more different series having different
magnitude of values (weight in children in two
different sections)
OR
• two different variables in the same group (weight
& height of children in same class).
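The slide does not give the formula, but the usual definition is CV = (SD / mean) × 100, which expresses the spread as a percentage of the mean. The sketch below assumes that definition. The weight figures reuse the 62 kg mean and 5 kg SD of the weight example; the height figures are hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: 100 * SD / mean (mean must be non-zero)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean


weight_cv = coefficient_of_variation(sd=5, mean=62)    # kg
height_cv = coefficient_of_variation(sd=8, mean=160)   # cm, hypothetical values

print(round(weight_cv, 2))  # 8.06
print(height_cv)            # 5.0
```

Because the units cancel, the two percentages can be compared directly even though weight and height are measured on different scales: in this made-up class, weight varies relatively more than height.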
Variance
• Variance is principally used in the
calculation of
standard deviation.
• Variance (V) is the square of the standard
deviation, often symbolized by $s^2$,
i.e., $V = (SD)^2$.
Hands on
• t-tests and Chi-square tests
Both t-tests and chi-square tests are statistical
tests, designed to test, and possibly reject, a null
hypothesis. The null hypothesis is usually a
statement that something is zero, or that
something does not exist. For example, you could
test the hypothesis that the difference between
two means is zero, or you could test the
hypothesis that there is no relationship between
two variables.
Test of Inference
• Null Hypothesis Tested
A t-test tests a null hypothesis about two means; most often, it tests the
hypothesis that two means are equal, or that the difference between
them is zero. For example, we could test whether boys and girls in fourth
grade have the same average height.
A chi-square test tests a null hypothesis about the relationship between
two variables. For example, you could test the hypothesis that men and
women are equally likely to vote "Democratic," "Republican," "Other" or
"not at all."
Types of Data
A t-test requires two variables; one must be categorical and have exactly
two levels, and the other must be quantitative and be estimable by a
mean. For example, the two groups could be Republicans and Democrats,
and the quantitative variable could be age. A chi-square test requires
categorical variables, usually only two, but each may have any number of
levels. For example, the variables could be ethnic group (White, Black,
Asian, American Indian/Alaskan native, Native Hawaiian/Pacific Islander,
other, multiracial) and presidential choice in 2008 (Obama, McCain,
other, did not vote).
T-test
• A t-test is used to compare the means of two given samples.
Like a z-test, a t-test also assumes a normal distribution of the
sample. A t-test is used when the population parameters (mean and
standard deviation) are not known.
There are three versions of the t-test:
1. Independent samples t-test, which compares the means of two groups
2. Paired sample t-test, which compares means from the same group at
different times
3. One sample t-test, which tests the mean of a single group against a
known mean.
The statistic for this hypothesis test is called the t-statistic. For two
independent samples with a common standard deviation it is calculated as
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$, where
$\bar{x}_1$ = mean of sample 1
$\bar{x}_2$ = mean of sample 2
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$s_p$ = pooled standard deviation of the two samples
• There are multiple variations of the t-test, which
are explained in detail in "T Test (Student's
T-Test): Definition and Examples": "The
t test (also called Student's T Test) compares
two averages (means) and tells you if they are
different…"
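For the independent samples case, a minimal sketch of the pooled (equal-variance) two-sample t statistic is given below. It assumes both groups share one standard deviation, estimated by pooling the two samples; the function name and the two small data sets are hypothetical. Turning t into a p-value needs the t distribution with n1 + n2 − 2 degrees of freedom, which is left to a statistics package.

```python
import math


def pooled_t_statistic(sample1, sample2):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    # Sums of squared deviations within each sample.
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


group_a = [1, 2, 3, 4, 5]  # hypothetical measurements
group_b = [3, 4, 5, 6, 7]
t, df = pooled_t_statistic(group_a, group_b)
print(t, df)  # t is about -2.0 with 8 degrees of freedom
```

When the two groups clearly have different spreads, Welch's version, which does not pool the variances, is the more common choice.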
Chi-Square Test
The chi-square test is used to compare categorical
variables. There are two types of chi-square test:
1. Goodness of fit test, which determines if a sample
matches the population.
2. Chi-square test of independence, which is
used to compare two variables in a contingency table
to check if the data fit.
a. A small chi-square value means that the data fit.
b. A high chi-square value means that the data don't fit.
The hypotheses being tested for chi-square are:
• Null: Variable A and Variable B are independent.
• Alternate: Variable A and Variable B are not independent.
The statistic used to measure significance, in this case, is
called the chi-square statistic. The formula used for calculating
the statistic is
• $\chi^2 = \sum_{r,c} \dfrac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$, where
$O_{r,c}$ = observed frequency count at level r of Variable A and
level c of Variable B
$E_{r,c}$ = expected frequency count at level r of Variable A and
level c of Variable B
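The chi-square statistic for a contingency table can be sketched in a few lines. The expected count in each cell is taken as (row total × column total) / grand total, which is the standard choice under independence but is not stated on the slide; the function name and the 2 × 2 table are hypothetical.

```python
def chi_square_statistic(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[r] * col_totals[c] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = two groups, columns = two answers.
observed = [[10, 20],
            [20, 10]]
chi2, dof = chi_square_statistic(observed)
print(chi2, dof)  # every expected count is 15, so chi2 = 4 * 25 / 15, about 6.67
```

The statistic is then compared with the critical value of the chi-square distribution for the given degrees of freedom.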
Note: As one can see from the above examples, in all the
tests a statistic is being compared with a critical value to
accept or reject a hypothesis. However, the statistic and
way to calculate it differ depending on the type of variable,
the number of samples being analyzed and if the
population parameters are known. Thus depending upon
such factors a suitable test and null hypothesis is chosen.
Correlation is a process for establishing the relationship between two variables. It is
plotted on a “scatter plot”. Correlation is the most commonly used
Correlation coefficient
Methods (measure) of correlation summarize the relationship
between two variables in a single number called the
correlation coefficient.
The correlation coefficient is usually represented using the
symbol r, and it ranges from -1 to +1.
• A correlation coefficient quite close to 0, but either positive or
negative, implies little or no relationship between the two
variables.
• A correlation coefficient close to plus 1 means a positive
relationship between the two variables, with increases in one
of the variables being associated with increases in the other
variable.
Correlation
• A correlation coefficient close to -1 indicates a
negative relationship between two variables, with
an increase in one of the variables being
associated with a decrease in the other variable.
• A correlation coefficient can be produced for
ordinal, interval or ratio level variables, but has
little meaning for nominal.
• For ordinal scales, the correlation coefficient can
be calculated by using Spearman’s rho.
• For interval or ratio level scales, the most
commonly used correlation coefficient is
Pearson’s r, ordinarily referred to as simply the
correlation coefficient.
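Both coefficients can be sketched without external libraries. Pearson's r is computed from deviations about the means, and Spearman's rho is taken here as Pearson's r applied to the ranks, with tied values sharing their average rank. The function names and the small data set are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # rises with x, but not along a straight line
print(pearson_r(x, y))     # close to, but below, 1
print(spearman_rho(x, y))  # approximately 1, because the ordering agrees exactly
```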
ANOVA, also known as analysis of variance, is used to compare multiple (three or
more) samples with a single test. There are 2 major flavors of ANOVA:
1. One-way ANOVA: It is used to compare the difference between the three or more
samples/groups of a single independent variable.
2. MANOVA: MANOVA allows us to test the effect of one or more independent
variables on two or more dependent variables. In addition, MANOVA can also
detect the difference in correlation between dependent variables given the groups
of independent variables.
• The hypotheses being tested in ANOVA are
• Null: All pairs of samples are the same, i.e. all sample means are equal
• Alternate: At least one pair of samples is significantly different
• The statistic used to measure the significance, in this case, is called the
F-statistic. The F value is calculated using
the formula
• $F = \dfrac{(SSE_1 - SSE_2)/m}{SSE_2/(n - k)}$, where
SSE = residual sum of squares
m = number of restrictions
k = number of independent variables
n = number of observations
There are multiple tools available such as SPSS, R packages, Excel etc. to carry out
ANOVA on a given sample.
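For one-way ANOVA specifically, the F value is often computed as the ratio of the between-group mean square to the within-group mean square. The sketch below uses that textbook formulation rather than the residual-sum-of-squares form; the function name and the three small groups are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, (df_between, df_within)) for a list of groups of values."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, (df_between, df_within)


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical scores
f_value, dfs = one_way_anova_f(groups)
print(f_value, dfs)  # F is about 13.0 with (2, 6) degrees of freedom
```

A large F means the group means differ by more than the scatter within the groups would suggest; the value is judged against the F distribution with the two degrees of freedom shown.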
ANOVA
• Skewness is usually described as a measure of
a dataset’s symmetry – or lack of symmetry. A
perfectly symmetrical data set will have a
skewness of 0. The normal distribution has a
skewness of 0.
The skewness is defined as (Advanced Topics in
Statistical Process Control, Dr. Donald
Wheeler, www.spcpress.com):
SKEWNESS
So, when is the skewness too much? The rule of thumb seems to be:
• If the skewness is between -0.5 and 0.5, the data are fairly symmetrical
• If the skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed
• If the skewness is less than -1 or greater than 1, the data are highly skewed
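The rule of thumb can be applied in code once a skewness value is available. The sketch below uses the adjusted sample skewness found in many spreadsheet and statistics packages, n / ((n − 1)(n − 2)) × Σ((x − mean) / s)³ with s the sample standard deviation. Whether this matches the definition cited on the slide is an assumption; the function names and data are illustrative.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness; needs at least 3 values that are not all equal."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def describe_skewness(skewness):
    """Apply the rule-of-thumb bands to a skewness value."""
    size = abs(skewness)
    if size <= 0.5:
        return "fairly symmetrical"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]
print(describe_skewness(sample_skewness(symmetric)))     # fairly symmetrical
print(describe_skewness(sample_skewness(right_tailed)))  # highly skewed
```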
How to define kurtosis? If you search for definitions of
kurtosis, you will see some definitions that include the
word “peakedness” or other similar terms. For
example,
• “Kurtosis is the degree of peakedness of a distribution”
– Wolfram MathWorld
• “We use kurtosis as a measure of peakedness (or
flatness)” – Real Statistics Using Excel
“Kurtosis tells you virtually nothing about the shape of
the peak – its only unambiguous interpretation is in
terms of tail extremity.”
KURTOSIS
Figure 5 shows a dataset with more weight in the tails. The kurtosis of this dataset is
1.86.
Citation:
• A citation appears in the main text of the paper. It is a way of giving credit to the
information that you have specifically mentioned in your research paper by leading the
reader to the original source of information. You will need to use citations in research
papers whenever you are using information to elaborate a particular concept in the
paper, either in the introduction or discussion sections or as a way to support your
research findings in the results section.
Reference:
• A reference is a detailed description of the source of information that you want to give
credit to via a citation. The references in research papers are usually in the form of a list
at the end of the paper. The essential difference between citations and references is
that citations lead a reader to the source of information, while references provide the
reader with detailed information regarding that particular source.
Bibliography:
• A bibliography in a research paper is a list of sources that appears at the end of a research
paper or an article, and contains information that may or may not be directly
mentioned in the research paper. The difference between a reference and a bibliography in
research is that an individual source in the list of references can be linked to an in-text
citation, while an individual source in the bibliography may not necessarily be linked to
an in-text citation.
Citations vs. References vs. Bibliography
Purpose
• Citations: To lead a reader toward a source of information included in the text
• References: To elaborate on a particular source of information cited in the research paper
• Bibliography: To provide a list of all relevant sources of information on the research topic
Placement
• Citations: In the main text
• References: At the end of the text; necessarily linked to an in-text citation
• Bibliography: At the end of the text; not necessarily linked to an in-text citation
Information
• Citations: Minimal; denoting only the essential components of the source, such as numbering, names of the first and last authors, etc.
• References: Descriptive; gives complete details about a particular source that can be used to find and read the original paper if needed
• Bibliography: Descriptive; gives all the information regarding a particular source for those who want to refer to it
THANK YOU

More Related Content

Similar to STATISTICS.pptx for the scholars and students

Medical Statistics.ppt
Medical Statistics.pptMedical Statistics.ppt
Medical Statistics.pptssuserf0d95a
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxAnusuya123
 
Topic 2 Measures of Central Tendency.pptx
Topic 2   Measures of Central Tendency.pptxTopic 2   Measures of Central Tendency.pptx
Topic 2 Measures of Central Tendency.pptxCallplanetsDeveloper
 
Measure of Variability Report.pptx
Measure of Variability Report.pptxMeasure of Variability Report.pptx
Measure of Variability Report.pptxCalvinAdorDionisio
 
Introduction to statistics RSS6 2014
Introduction to statistics RSS6 2014Introduction to statistics RSS6 2014
Introduction to statistics RSS6 2014RSS6
 
Stats !.pdf
Stats !.pdfStats !.pdf
Stats !.pdfphweb
 
Business statistics (Basics)
Business statistics (Basics)Business statistics (Basics)
Business statistics (Basics)AhmedToheed3
 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptxVanmala Buchke
 
Chapter 12 Data Analysis Descriptive Methods and Index Numbers
Chapter 12 Data Analysis Descriptive Methods and Index NumbersChapter 12 Data Analysis Descriptive Methods and Index Numbers
Chapter 12 Data Analysis Descriptive Methods and Index NumbersInternational advisers
 
measures of central tendency.pptx
measures of central tendency.pptxmeasures of central tendency.pptx
measures of central tendency.pptxManish Agarwal
 
Measures of central tendancy
Measures of central tendancy Measures of central tendancy
Measures of central tendancy Pranav Krishna
 
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxMMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxPETTIROSETALISIC
 
Statistics for Medical students
Statistics for Medical studentsStatistics for Medical students
Statistics for Medical studentsANUSWARUM
 

Similar to STATISTICS.pptx for the scholars and students (20)

Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Medical Statistics.ppt
Medical Statistics.pptMedical Statistics.ppt
Medical Statistics.ppt
 
Statistics
StatisticsStatistics
Statistics
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptx
 
Topic 2 Measures of Central Tendency.pptx
Topic 2   Measures of Central Tendency.pptxTopic 2   Measures of Central Tendency.pptx
Topic 2 Measures of Central Tendency.pptx
 
Measure of Variability Report.pptx
Measure of Variability Report.pptxMeasure of Variability Report.pptx
Measure of Variability Report.pptx
 
Introduction to statistics RSS6 2014
Introduction to statistics RSS6 2014Introduction to statistics RSS6 2014
Introduction to statistics RSS6 2014
 
Understanding statistics in research
Understanding statistics in researchUnderstanding statistics in research
Understanding statistics in research
 
Stats !.pdf
Stats !.pdfStats !.pdf
Stats !.pdf
 
Business statistics (Basics)
Business statistics (Basics)Business statistics (Basics)
Business statistics (Basics)
 
Descriptive
DescriptiveDescriptive
Descriptive
 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptx
 
Chapter 12 Data Analysis Descriptive Methods and Index Numbers
Chapter 12 Data Analysis Descriptive Methods and Index NumbersChapter 12 Data Analysis Descriptive Methods and Index Numbers
Chapter 12 Data Analysis Descriptive Methods and Index Numbers
 
Statistics
StatisticsStatistics
Statistics
 
measures of central tendency.pptx
measures of central tendency.pptxmeasures of central tendency.pptx
measures of central tendency.pptx
 
Measures of central tendancy
Measures of central tendancy Measures of central tendancy
Measures of central tendancy
 
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxMMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
 
Data collection
Data collectionData collection
Data collection
 
Statistics for Medical students
Statistics for Medical studentsStatistics for Medical students
Statistics for Medical students
 
Intro to Biostat. ppt
Intro to Biostat. pptIntro to Biostat. ppt
Intro to Biostat. ppt
 

Recently uploaded

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
STATISTICS.pptx for the scholars and students

  • 1. Assistant Professor, Anthropology & Tribal Development, Guru Ghasidas Vishwavidyalaya (A Central University),C.G, India E-mail: dsubalvu@gmail.com Mobile: 9753117388
  • 2. Basics of Statistics: Statistics presents a rigorous scientific method for gaining insight into data. For example, suppose we measure the weight of 100 patients in a study. With so many measurements, simply looking at the data fails to provide an informative account. However, statistics can give an instant overall picture of the data through graphical presentation or numerical summarization, irrespective of the number of data points. Besides data summarization, another important task of statistics is to make inferences and predict relations between variables. Definition: the science of collection, presentation, analysis, and reasonable interpretation of data.
  • 3. A Taxonomy of Statistics
  • 4. TYPES OF STATISTICS: • Descriptive statistics • Inferential statistics DESCRIPTIVE STATISTICS: • Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data, or the quantitative description itself. Descriptive statistics are used to summarize, organize and simplify data. Descriptive statistics are techniques that take raw scores and organize or summarize them in a form that is more manageable. Often the scores are organized in a table or a graph so that it is possible to see the entire set of scores. Descriptive statistics are not developed on the basis of probability theory. Different measures are used in descriptive statistics. INFERENTIAL STATISTICS: • Inferential statistics consists of techniques that allow us to study samples and then make generalizations about the population from which they were selected. • It is usually not possible to measure everyone in the population. Because the population is typically very large, a sample that represents the population is selected from it. By analyzing the results from the sample, we hope to make general statements about the population.
  • 5. Statistical Description of Data • Statistics describes a numeric set of data by its • Center • Variability • Shape • Statistics describes a categorical set of data by • Frequency, percentage or proportion of each category
  • 6. Variable • A variable is any kind of attribute or characteristic that you are trying to measure, manipulate and control in statistics and research. All studies analyze a variable, which can describe a person, place, thing or idea. A variable's value can change between groups or over time. Researchers organize variables into a variety of categories, the most common of which include: • Independent variables • Dependent variables • Quantitative variables • Qualitative variables • Intervening variables • Moderating variables • Extraneous variables • Confounding variables • Control variables • Composite variables
  • 9. • Researchers can further categorize Quantitative variables into two types: • Discrete: Any numerical variables you can realistically count, such as the coins in your wallet or the money in your savings account. • Continuous: Numerical variables that you could never finish counting, such as time.
  • 10. • Researchers can further categorize Qualitative, or categorical, variables into three types: • Binary: Variables with only two categories, such as male or female, red or blue. • Nominal: Variables you can organize in more than two categories that do not follow a particular order. Take, for example, housing types: Single-family home, condominium, tiny home. • Ordinal: Variables you can organize in more than two categories that follow a particular order. Take, for example, level of satisfaction: Unsatisfied, neutral, satisfied.
  • 14. Scales of Measurements • Nominal - Categorical variables with no inherent order or ranking sequence, such as names or classes (e.g., gender). Values may be written as numerals but carry no numerical meaning (e.g., I, II, III). The only operation that can be applied to Nominal variables is enumeration. • Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe. Can be compared for equality, or greater or less, but not how much greater or less. • Interval - Values of the variable are ordered as in Ordinal, and additionally, differences between values are meaningful; however, the scale is not absolutely anchored. Calendar dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division, are meaningful operations. • Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero point, e.g. age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations. • Distribution - (of a variable) tells us what values the variable takes and how often it takes these values. • Unimodal - having a single peak • Bimodal - having two distinct peaks • Symmetric - left and right halves are mirror images.
  • 19. • What is the highest mark? • What is the lowest mark? • What is the average mark? • How did the students perform? Central tendency Dispersion
  • 20. Central tendency: • An average value within the range of entire data that is used to represent all the values of the series. • Such indexes are called measures of central tendency.
  • 22. Arithmetic Mean is the most widely used measure of central tendency. Mean is more commonly referred to as average. Mean is the average of a set of data. To calculate the mean, find the sum of the data and then divide by the number of data.
  • 23. • Best applied in normally distributed continuous data. Nominal (N) Ordinal (O) Interval (I) Ratio (R)
  • 24. 12, 15, 11, 11, 7, 13 First, find the sum of the data. 12 + 15 +11 + 11 + 7 + 13 = 69 Then divide by the number of data. 69 / 6 = 11.5
  • 25. From grouped data
      Age in years (x)   No. of students (f)   x × f
      16                 35                    560
      17                 31                    527
      18                 20                    360
      19                 14                    266
      Total              Ʃ f = 100             Ʃ x f = 1713
      Arithmetic mean = Ʃ x f / Ʃ f = 1713 / 100 = 17.13
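The two mean calculations above (raw data and grouped data) can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function names `arithmetic_mean` and `grouped_mean` are hypothetical.

```python
def arithmetic_mean(values):
    """Sum of the data divided by the number of data points."""
    return sum(values) / len(values)


def grouped_mean(table):
    """Mean from (value, frequency) pairs: sum(x * f) / sum(f)."""
    total_xf = sum(x * f for x, f in table)
    total_f = sum(f for _, f in table)
    return total_xf / total_f


marks = [12, 15, 11, 11, 7, 13]
ages = [(16, 35), (17, 31), (18, 20), (19, 14)]  # (age in years, no. of students)

print(arithmetic_mean(marks))  # 11.5
print(grouped_mean(ages))      # 17.13
```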
  • 26. Merits: Easy to compute Affected by each item Demerits:  Very small / large items affect - outlier
  • 27. Outlier: • A value that lies at least 1.5 IQR (inter-quartile range) below the first quartile (Q1) or • At least 1.5 IQR above the third quartile (Q3)
  • 28. Geometric mean: • Used for fractional or reciprocal values Harmonic mean: • Used for averaging ratios expressed in two different units Both of them are less affected by outliers
  • 29. Median: • Median is the middle number in a set of data when the data is arranged in either ascending or descending order. • Divides the distribution in two equal parts • One half is lower and other half is greater than that value • Robust measure
  • 30. 12, 15, 11, 11, 7, 13, 14 First, arrange the data in numerical order. 7, 11, 11, 12, 13, 14, 15 Median = 12
  • 31. 12, 15, 11, 11, 7, 13 First, arrange the data in numerical order. 7, 11, 11, 12, 13, 15 Then find the number in the middle or the average of the two numbers in the middle. 11 + 12 = 23 23 / 2 = 11.5
  • 32. Median value for odd number of observations: • Median for odd number of observations = [(n + 1) / 2]th value, where n is the total number of observations. • If n = 9, the median is the (9 + 1) / 2 = 5th value
  • 33. Median value for even number of observations: • Average of (n / 2)th and [(n / 2) + 1]th value in the series • If n = 10, then average of 5th and 6th value will be the median
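The odd and even position rules above can be sketched as a small Python function. This is an illustrative sketch only; the name `median` is chosen here for clarity, and Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the [(n + 1) / 2]th value (1-based), so subtract 1 for the index.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)th and [(n / 2) + 1]th values (1-based).
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median([12, 15, 11, 11, 7, 13, 14]))  # 12
print(median([12, 15, 11, 11, 7, 13]))      # 11.5
```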
  • 35. Merits: • Robust in the presence of outliers • Can be calculated from incomplete longitudinal data, unlike the mean. Demerits: • The data have to be arranged in order first
  • 36. Mode: • The mode is the number that occurs the most. Advantage - Like the median, it is not affected by extreme values. Disadvantage - Its exact location is uncertain and not clearly defined. • The mode is rarely used; one application is determining the peak of disease occurrence in an epidemic or outbreak.
  • 37. 12, 15, 11, 11, 7, 13 The mode is 11.
  • 38. Sometimes a set of data will have more than one mode. For example, in the following set both the numbers 5 and 7 appear twice. 2, 9, 5, 7, 8, 6, 4, 7, 5 5 and 7 are both modes, and this set is said to be bimodal.
  • 39. Sometimes there is no mode in a set of data. 3, 8, 7, 6, 12, 11, 2, 1 All the numbers in this set occur only once therefore there is no mode in this set.
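The three mode cases above (one mode, two modes, no mode) can be sketched as follows. This is a minimal sketch; the function name `modes` and the choice to return an empty list when every value occurs once are assumptions made to match the slides' convention.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs only once, so the set has no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([12, 15, 11, 11, 7, 13]))       # [11]
print(modes([2, 9, 5, 7, 8, 6, 4, 7, 5]))   # [5, 7]
print(modes([3, 8, 7, 6, 12, 11, 2, 1]))    # []
```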
  • 41. Mean: The average. Median: The number in the middle, or the average of the two numbers in the middle. Mode: The number that occurs most. Calculation of mean, median and mode from Excel: Afterwards
  • 45. Red Orange Yellow Green Blue Indigo Violet (ROYGBIV, from R to V)
  • 46. Range ▪ Simplest measure of dispersion ▪ Difference between the largest and the smallest values: Range = Xlargest – Xsmallest Example: 1 2 3 4 5 6 7 8 9 10 11 12 13 Range = 13 - 1 = 12
  • 47. Measures of Dispersion: Why The Range Can Be Misleading • Ignores the way in which data are distributed: two differently distributed data sets over the values 7 8 9 10 11 12 both have Range = 12 - 7 = 5 ▪ Sensitive to outliers: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 has Range = 5 - 1 = 4, while 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 has Range = 120 - 1 = 119
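The sensitivity of the range to a single outlier can be checked with a short Python sketch. The helper name `value_range` is hypothetical; the data are the two sets from the slide.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Shared part of both data sets: eleven 1s, eight 2s, four 3s and one 4.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]

print(value_range(base + [5]))    # 4
print(value_range(base + [120]))  # 119
```

Changing only the last value from 5 to 120 moves the range from 4 to 119, even though the other 24 values are identical.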
  • 48. Inter-quartile range (IQR) / Quartile deviation (Q): • The IQR spans from Q1 to Q3, so IQR = Q3 - Q1. The quartile deviation is half of the IQR: Q = (Q3 - Q1) / 2.
  • 49. • The IQR is Q3 – Q1 and measures the spread in the middle 50% of the data • Measures like Q1, Q3, and IQR are not influenced by outliers
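As a rough sketch, the IQR and the 1.5 × IQR outlier rule mentioned earlier can be computed with Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software; the `method="inclusive"` choice here is an assumption, and other conventions can give slightly different Q1 and Q3. The function name `iqr_summary` is hypothetical.

```python
import statistics


def iqr_summary(values):
    """Return Q1, Q3, IQR and the values outside the 1.5 * IQR fences."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return q1, q3, iqr, outliers


data = list(range(1, 14))          # 1, 2, ..., 13
print(iqr_summary(data))           # no value falls outside the fences
print(iqr_summary(data + [40]))    # 40 is flagged as an outlier
```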
  • 50. Box & Whisker Plot Diagram: Xsmallest, Q1, Median, Q3, Xlargest
  • 51. Standard deviation: • Standard deviation is the most frequently used measure of dispersion of data. • It is defined as the ‘Root-Mean-Square Deviation’. • It is denoted by the Greek letter σ (sigma) or by the initials SD or S. • The variable and its standard deviation are expressed in the same units.
  • 53. N = 100 students, mean weight = 62 kg, SD = 5 kg. If weight is normally distributed: about 68.3% of values lie within Mean ± 1 SD = (62 ± 1 × 5) = 57 to 67 kg; about 95.4% within Mean ± 2 SD = (62 ± 2 × 5) = 52 to 72 kg; about 99.7% within Mean ± 3 SD = (62 ± 3 × 5) = 47 to 77 kg
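The intervals in the weight example can be reproduced with a few lines of Python. This is only an arithmetic sketch of the slide's numbers; the function name `sd_intervals` is hypothetical, and the coverage percentages apply only when the data are normally distributed.

```python
def sd_intervals(mean, sd):
    """Map k = 1, 2, 3 to (lower, upper, approximate % coverage) for mean ± k SD."""
    coverage = {1: 68.3, 2: 95.4, 3: 99.7}  # normal-distribution percentages
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}


for k, (low, high, pct) in sd_intervals(62, 5).items():
    print(f"Mean ± {k} SD: {low} to {high} kg (about {pct}% of students)")
```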
  • 54. Coefficient of variation: • Coefficient of variation is used to compare the relative variability or spread. • It can measure variation of same variable in two or more different series having different magnitude of values (weight in children in two different sections) OR • two different variables in the same group (weight & height of children in same class).
  • 55. Variance • Variance is principally used in the calculation of standard deviation. • Variance (V) is (Standard Deviation)², often symbolized by s², i.e., V = (SD)².
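The relationship between variance, standard deviation and coefficient of variation can be sketched as below. Two assumptions are made that the slides do not state: the sample form of the variance (dividing by n - 1; the population form divides by n) and the usual definition CV = (SD / mean) × 100%. The function names are hypothetical.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def standard_deviation(values):
    """SD is the square root of the variance, so V = SD ** 2."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """Relative spread: SD as a percentage of the mean."""
    mean = sum(values) / len(values)
    return standard_deviation(values) / mean * 100


marks = [12, 15, 11, 11, 7, 13]
print(sample_variance(marks))
print(standard_deviation(marks))
print(coefficient_of_variation(marks))
```

Because the coefficient of variation has no units, it can be used to compare series with different magnitudes or different units, such as weight and height.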
  • 58. Test of Inference • t-tests and Chi-square tests: Both t-tests and chi-square tests are statistical tests designed to test, and possibly reject, a null hypothesis. The null hypothesis is usually a statement that something is zero, or that something does not exist. For example, you could test the hypothesis that the difference between two means is zero, or you could test the hypothesis that there is no relationship between two variables.
  • 59. • Null Hypothesis Tested A t-test tests a null hypothesis about two means; most often, it tests the hypothesis that two means are equal, or that the difference between them is zero. For example, we could test whether boys and girls in fourth grade have the same average height. A chi-square test tests a null hypothesis about the relationship between two variables. For example, you could test the hypothesis that men and women are equally likely to vote "Democratic," "Republican," "Other" or "not at all." Types of Data A t-test requires two variables; one must be categorical and have exactly two levels, and the other must be quantitative and be estimable by a mean. For example, the two groups could be Republicans and Democrats, and the quantitative variable could be age. A chi-square test requires categorical variables, usually only two, but each may have any number of levels. For example, the variables could be ethnic group — White, Black, Asian, American Indian/Alaskan native, Native Hawaiian/Pacific Islander, other, multiracial; and presidential choice in 2008 — (Obama, McCain, other, did not vote).
  • 60. T-test • A t-test is used to compare the means of two given samples. Like a z-test, a t-test assumes a normal distribution of the sample. A t-test is used when the population parameters (mean and standard deviation) are not known. There are three versions of the t-test: 1. Independent samples t-test, which compares the means of two groups 2. Paired sample t-test, which compares means from the same group at different times 3. One sample t-test, which tests the mean of a single group against a known mean. The statistic for this hypothesis test is called the t-statistic; for two independent samples it is calculated as $t = \dfrac{x_1 - x_2}{\sigma \sqrt{1/n_1 + 1/n_2}}$, where $x_1$ = mean of sample 1, $x_2$ = mean of sample 2, $n_1$ = size of sample 1, $n_2$ = size of sample 2, and $\sigma$ = common standard deviation, estimated from the two samples.
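A minimal sketch of the independent samples t-statistic is given below. It assumes the pooled-variance (equal variances) form, which is one of several variants; the function name `pooled_t_statistic` and the example data are hypothetical. To decide significance, the result would still have to be compared with a critical value from the t distribution at the returned degrees of freedom.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """Return (t, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


group_a = [1, 2, 3, 4, 5]  # made-up illustrative scores
group_b = [3, 4, 5, 6, 7]
t, dof = pooled_t_statistic(group_a, group_b)
print(t, dof)
```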
  • 61. • There are multiple variations of the t-test, which are explained in detail here: T Test (Student's T-Test): Definition and Examples. Contents: The t test (also called Student's T Test) compares two averages (means) and tells you if they are different…
  • 62. Chi-Square Test: The chi-square test is used to compare categorical variables. There are two types of chi-square test: 1. Goodness of fit test, which determines if a sample matches the population. 2. A chi-square test for two independent variables, which compares two variables in a contingency table to check if the data fit. a. A small chi-square value means that the data fit b. A high chi-square value means that the data don’t fit. The hypothesis being tested for chi-square is Null: Variable A and Variable B are independent
  • 63. • Alternate: Variable A and Variable B are not independent. The statistic used to measure significance, in this case, is called the chi-square statistic. The formula used for calculating the statistic is • $\chi^2 = \sum_{r,c} \dfrac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$, where $O_{r,c}$ = observed frequency count at level r of Variable A and level c of Variable B, and $E_{r,c}$ = expected frequency count at level r of Variable A and level c of Variable B. Note: As one can see from the above examples, in all the tests a statistic is being compared with a critical value to accept or reject a hypothesis. However, the statistic and the way to calculate it differ depending on the type of variable, the number of samples being analyzed and whether the population parameters are known. Thus, depending upon such factors, a suitable test and null hypothesis is chosen.
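The chi-square statistic for a contingency table can be sketched as follows. The expected counts are computed under the null hypothesis of independence as (row total × column total) / grand total, which is the usual convention but is not spelled out on the slide; the function name `chi_square_statistic` and the example table are hypothetical.

```python
def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(observed):
        for c, o in enumerate(row):
            # Expected count at level r of Variable A and level c of Variable B.
            expected = row_totals[r] * col_totals[c] / grand_total
            chi2 += (o - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


table = [[10, 20],   # made-up counts: rows are levels of Variable A,
         [20, 10]]   # columns are levels of Variable B
print(chi_square_statistic(table))
```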
  • 64. Correlation: It is a process for establishing the relationship between two variables, which can be plotted on a “scatter plot”. Correlation coefficient: Measures of correlation summarize the relationship between two variables in a single number called the correlation coefficient. The correlation coefficient is usually represented by the symbol r, and it ranges from -1 to +1. • A correlation coefficient close to 0, whether positive or negative, implies little or no relationship between the two variables. • A correlation coefficient close to +1 means a positive relationship between the two variables, with increases in one of the variables being associated with increases in the other variable.
  • 65. • A correlation coefficient close to -1 indicates a negative relationship between two variables, with an increase in one of the variables being associated with a decrease in the other variable. • A correlation coefficient can be produced for ordinal, interval or ratio level variables, but has little meaning for nominal. • For ordinal scales, the correlation coefficient can be calculated by using Spearman’s rho. • For interval or ratio level scales, the most commonly used correlation coefficient is Pearson’s r, ordinarily referred to as simply the correlation coefficient.
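Both coefficients named above can be sketched in plain Python. This is an illustrative sketch with hypothetical function names: Pearson's r is computed from deviations about the means, and Spearman's rho is taken here as Pearson's r applied to the ranks of the data, with tied values given their average rank.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y scaled by both spreads."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))     # perfectly linear, increasing
print(pearson_r(x, [1, 4, 9, 16, 25]))    # increasing but not linear
print(spearman_rho(x, [1, 4, 9, 16, 25])) # ranks agree exactly
```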
  • 66. ANOVA: ANOVA, also known as analysis of variance, is used to compare multiple (three or more) samples with a single test. There are 2 major flavors of ANOVA: 1. One-way ANOVA: It is used to compare the difference between three or more samples/groups of a single independent variable. 2. MANOVA: MANOVA allows us to test the effect of one or more independent variables on two or more dependent variables. In addition, MANOVA can also detect differences in correlation between dependent variables given the groups of independent variables. • The hypothesis being tested in ANOVA is • Null: All pairs of samples are the same, i.e. all sample means are equal. Alternate: At least one pair of samples is significantly different. The statistic used to measure the significance, in this case, is called the F-statistic. The F value is calculated using the formula • $F = \dfrac{(SSE_1 - SSE_2)/m}{SSE_2/(n - k)}$, where SSE = residual sum of squares, m = number of restrictions, k = number of independent variables, and n = number of observations. There are multiple tools available, such as SPSS, R packages, Excel etc., to carry out ANOVA on a given sample.
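For one-way ANOVA, the F-statistic is often written as the ratio of the between-group mean square to the within-group mean square. The sketch below uses that textbook form rather than the SSE-difference form quoted above; the function name `one_way_anova_f` and the example groups are hypothetical.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]  # made-up data for three groups
print(one_way_anova_f(samples))
```

A large F suggests that the group means differ by more than the within-group scatter would explain; the value is compared with a critical value from the F distribution.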
  • 68. SKEWNESS • Skewness is usually described as a measure of a dataset’s symmetry, or lack of symmetry. A perfectly symmetrical data set will have a skewness of 0. The normal distribution has a skewness of 0. The skewness is defined as in Advanced Topics in Statistical Process Control (Dr. Donald Wheeler, www.spcpress.com).
  • 70. So, when is the skewness too much? The rule of thumb seems to be: • If the skewness is between -0.5 and 0.5, the data are fairly symmetrical • If the skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed • If the skewness is less than -1 or greater than 1, the data are highly skewed
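The rule of thumb above can be applied in code once a skewness value is available. The skewness formula did not survive in this transcript, so the sketch below assumes one common definition, the adjusted sample skewness n / ((n - 1)(n - 2)) × Σ((x - mean) / s)³ with s the sample standard deviation; other definitions give somewhat different values. The function names are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness (assumed definition; needs n >= 3 and s > 0)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def describe_skewness(skew):
    """Classify a skewness value using the rule of thumb."""
    size = abs(skew)
    if size <= 0.5:
        return "fairly symmetrical"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"


for data in ([1, 2, 3, 4, 5], [1, 1, 1, 1, 10]):
    skew = sample_skewness(data)
    print(data, round(skew, 2), describe_skewness(skew))
```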
  • 71. KURTOSIS: How to define kurtosis? If you search for definitions of kurtosis, you will see some definitions that include the word “peakedness” or other similar terms. For example, • “Kurtosis is the degree of peakedness of a distribution” - Wolfram MathWorld • “We use kurtosis as a measure of peakedness (or flatness)” - Real Statistics Using Excel. However: “Kurtosis tells you virtually nothing about the shape of the peak – its only unambiguous interpretation is in terms of tail extremity.”
  • 72. Figure 5 shows a dataset with more weight in the tails. The kurtosis of this dataset is 1.86.
  • 74. • Citation: A citation appears in the main text of the paper. It is a way of giving credit to the information that you have specifically mentioned in your research paper by leading the reader to the original source of information. You will need to use citations in research papers whenever you are using information to elaborate a particular concept in the paper, either in the introduction or discussion sections or as a way to support your research findings in the results section. • Reference: A reference is a detailed description of the source of information that you want to give credit to via a citation. The references in research papers are usually in the form of a list at the end of the paper. The essential difference between citations and references is that citations lead a reader to the source of information, while references provide the reader with detailed information regarding that particular source. • Bibliography: A bibliography in a research paper is a list of sources that appears at the end of a research paper or an article, and contains information that may or may not be directly mentioned in the research paper. The difference between reference and bibliography in research is that an individual source in the list of references can be linked to an in-text citation, while an individual source in the bibliography may not necessarily be linked to an in-text citation.
  • 75. Citations, References and Bibliography compared
      Purpose
        Citations: To lead a reader toward a source of information included in the text
        References: To elaborate on a particular source of information cited in the research paper
        Bibliography: To provide a list of all relevant sources of information on the research topic
      Placement
        Citations: In the main text
        References: At the end of the text; necessarily linked to an in-text citation
        Bibliography: At the end of the text; not necessarily linked to an in-text citation
      Information
        Citations: Minimal; denoting only the essential components of the source, such as numbering, names of the first and last authors, etc.
        References: Descriptive; gives complete details about a particular source that can be used to find and read the original paper if needed
        Bibliography: Descriptive; gives all the information regarding a particular source for those who want to refer to it