PG STAT 531 lecture 1 introduction about statistics and collection, compilation, tabulation of data

STAT-531
Data Analysis using Statistical Packages
Dr. Ashish. C. Patel
Assistant Professor,
Dept. of Animal Genetics & Breeding,
Veterinary College, Anand
Lecture 1
History and Introduction to Statistics

Lecture 1
Introduction about statistics and
Collection, Compilation, Tabulation
of data

• Statistics: Statistics is the study of the collection,
compilation, organization (classification), analysis
and interpretation of numerical data.
• (In the plural sense, statistics means a set of
numerical data; in the singular sense, it means the
science of statistical methods and techniques used
for various statistical procedures. A statistic is a
single measure of some character of a sample,
e.g. the arithmetic mean.)

History of Statistics
• The word statistics is derived from the
Latin word “Status” or the Italian word
“Statista”; both words mean “Political
State”.
• Shakespeare used the word Statist in his drama
Hamlet (1602). In the past, statistics was
used mainly by rulers.

• The Roman Empire was one of the first states
to extensively gather data on the size of the
empire's population, geographical area and
wealth.

Ancient Greece
• Philosophers: ideas on quantitative analyses
17th Century
• John Graunt, William Petty: studied affairs of state, vital
statistics of populations
• Pascal, Bernoulli: studied probability through games of
chance, gambling
18th Century
• Laplace, Gauss: normal curve, regression through study of
astronomy

19th Century
• Adolphe Quetelet: astronomer who first applied statistical
analyses to human biology
• Francis Galton: studied genetic variation in humans (used
regression)
20th Century (early)
• Karl Pearson: studied natural selection using correlation,
formed the first academic department of statistics, founded the
journal Biometrika, helped develop the Chi-square analysis
• Gosset (Student): studied the process of brewing, alerted the
statistics community to problems with small sample sizes,
developed Student's t-test
• R. A. Fisher: evolutionary biologist who developed ANOVA and
stressed the importance of experimental design

20th Century (later)
• Wilcoxon: biochemist who studied pesticides, developed the
non-parametric equivalent of the two-sample test
• Kruskal, Wallis: economists who developed the non-parametric
equivalent of the ANOVA
• Spearman: psychologist who developed a non-parametric
equivalent of the correlation coefficient
• Kendall: statistician who developed another non-parametric
equivalent of the correlation coefficient

20th Century (later)
• Tukey: statistician who developed a multiple comparisons
procedure
• Dunnett: biochemist who studied pesticides, developed a
multiple comparisons procedure for control groups
• Keuls: agronomist who developed a multiple comparisons
procedure

Terminology of Statistics
• Data: Data may be qualitative or quantitative.
1. Qualitative data: describe a quality or attribute, e.g. body coat
colour, types of hair, hair colour etc.
2. Quantitative data: numerical information (numbers). They may be
• A. Discrete data: take only whole-number values, e.g. number of
students present in the class, number of animals on a farm.
• B. Continuous data: can take any value (within a range), e.g.
height of students, body weight, marks obtained in the
examination etc.
Data may also be described by the number of variables involved:
• Univariate data: data on only one variable, i.e. we are
working with a single variable.
• Bivariate data: data on two variables, i.e. we are working
with two variables (e.g. height and weight).

• Population: A population is the whole group under study.
Populations are not just people; they may include animals,
businesses, buildings, motor vehicles, farms, objects or events etc.
• Sample: A sample is a subset of the population that represents
the whole population.
• Parameter: A parameter is any numerical quantity that
characterizes a given population, e.g. mean, median,
mode, standard deviation, standard error, variance etc.
• Variable: A variable is the character under study, which shows
variation from individual to individual and may also vary from
time to time.
• Variate: A value that a variable takes on a measurement
scale is called a variate. e.g. Body weight is a variable; if
the body weight of an animal is 60.5 kg, then 60.5 is a variate.
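
The difference between a population parameter and the same quantity computed from a sample can be illustrated with a minimal Python sketch. The body weights below are hypothetical values made up for illustration, and the random seed is fixed only so that the draw is repeatable.

```python
import random
import statistics

# Hypothetical population: body weights (kg) of all 10 animals on a small farm.
population = [60.5, 58.0, 62.3, 61.1, 59.4, 63.0, 57.8, 60.0, 61.7, 59.2]

# Parameter: a numerical quantity that characterizes the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, drawn at random without replacement.
rng = random.Random(1)  # fixed seed (an arbitrary choice) for a repeatable draw
sample = rng.sample(population, 4)

# The same quantity computed from the sample only (a sample statistic).
sample_mean = statistics.mean(sample)

print("Population mean (parameter):", round(population_mean, 2))
print("Sample values (variates):   ", sample)
print("Sample mean (statistic):    ", round(sample_mean, 2))
```

Each number in the list is a variate of the variable "body weight"; the sample mean will usually differ a little from the population mean, which is why sampling error has to be considered in analysis.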

• A variable that contains quantitative data is a
quantitative variable; a variable that contains
categorical data is a categorical variable.
• Quantitative variables
• When you collect quantitative data, the numbers you
record represent real amounts that can be added,
subtracted, divided, etc. There are two types of
quantitative variables: discrete and continuous.

• Categorical variables
• Categorical variables represent groupings of some kind. They
are sometimes recorded as numbers, but the numbers
represent categories rather than actual amounts of things.
• There are three types of categorical variables: binary,
nominal, and ordinal variables.

• An ordinal variable can also be used as a quantitative variable
if the scale is numeric and does not need to be kept as discrete
integers. For example, star ratings on product reviews are
ordinal (1 to 5 stars), but the average star rating is quantitative.
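
The star-rating example can be sketched in Python as follows. The ratings are hypothetical values; the point is only that the same data can be summarized as categories (counts per star level) or as a quantity (the average).

```python
from collections import Counter
import statistics

# Hypothetical product reviews, each rated from 1 to 5 stars (ordinal data).
ratings = [5, 4, 4, 3, 5, 1, 4]

# Treated as a categorical (ordinal) variable: count reviews per star level.
counts = Counter(ratings)
for stars in range(1, 6):
    print(f"{stars} star(s): {counts[stars]} review(s)")

# Treated as a quantitative variable: the average need not be a whole number.
average = statistics.mean(ratings)
print("Average rating:", round(average, 2))
```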

Other common types of variables
• Confounding variable: an extra variable that has a hidden effect
on your experimental results.
• Control variable: a factor in an experiment which must be held
constant. For example, in an experiment to determine whether
light makes plants grow faster, you would have to control for
soil quality and water.
• Dependent variable: the outcome of an experiment. As you
change the independent variable, you watch what happens to
the dependent variable.
• Independent variable: a variable that you, the researcher, set
or change, and that is not affected by the other variables in the
experiment. Usually plotted on the x-axis.

Collection of data:
• The process of counting, measuring or listing, together
with the systematic recording of the results, is called collection of
data.
Primary data: Data that are originally collected by an
investigator for the first time. Methods of primary data
collection:
1. By Enquiry:
a. Official or unofficial enquiry
b. Initial (first time enquiry) or Repetitive (more than one time
enquiry)
c. Direct or Indirect enquiry
d. Census or sample (a census is when we collect data for
every member of the group, i.e. the whole "population"; a sample
covers only a part of it).
2. Direct personal investigation
3. Information from local agencies
4. Using Social media/E-mail through Internet

Secondary data: Data that have already been collected
and analyzed by one person or agency and that are taken over
and used by some other agency or person. The data thus
become secondary data for the second agency or person.
• Sources of secondary data:
1. Official publications of central or state government
2. Publications of semi-government or private agencies
3. Publications of regional research stations
4. Newspapers, periodicals, magazines, scientific journals,
books etc.

• Compilation of data: Compilation of data is the process of
condensing information by classifying and tabulating it
into various categories or groups.
• Classification of data: The process of arranging data
into classes or groups according to their similarities is
called classification.
• Purposes of classification:
a. For reduction of data
b. For comparison between groups or individuals
c. For studying the relationship between different criteria of
a group

Bases of classification:
a. Geographical (includes Area-wise or region wise) means
classification based on countries, states, cities, regions etc.
b. Chronological (time wise) means classification based on the
differences in time viz. year wise, month wise etc.

c. Qualitative means classification based on attributes
such as body coat colour, type of hair etc.
d. Quantitative means classification based on
quantitative measurement like income,
expenditure, height, weight, marks etc.
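
The four bases of classification can be sketched in Python as grouping the same records by different keys. The records, the region names, and the `classify` helper below are all hypothetical and serve only as an illustration; the 10-unit class width for the quantitative basis is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical records: (region, year, coat colour, body weight in kg).
records = [
    ("Anand", 2021, "black", 412.0),
    ("Anand", 2022, "brown", 405.5),
    ("Kheda", 2021, "black", 431.0),
    ("Kheda", 2022, "black", 418.2),
    ("Anand", 2022, "white", 427.9),
]

def classify(rows, key):
    """Arrange rows into groups that share the same value of key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)

by_region = classify(records, key=lambda r: r[0])   # geographical
by_year = classify(records, key=lambda r: r[1])     # chronological
by_colour = classify(records, key=lambda r: r[2])   # qualitative
# Quantitative: 10 kg weight classes, labelled by their lower limit.
by_weight = classify(records, key=lambda r: int(r[3] // 10) * 10)

for name, groups in [("Region", by_region), ("Year", by_year),
                     ("Colour", by_colour), ("Weight class", by_weight)]:
    print(name, {k: len(v) for k, v in sorted(groups.items())})
```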

• TYPES OF TABLES:
• Simple or one-way Table: A simple or one-way table presents data
on a single characteristic. It is the simplest type of table, easy to
construct and simple to follow.
• Two-way Table: A table that contains data on two characteristics
is called a two-way table.

• Manifold Table: A table that has more than two
characteristics of data is called a manifold
table. Manifold tables make it possible to present
full information on related facts in one place.
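
As a rough illustration of the difference between a one-way and a two-way table, the following Python sketch tabulates hypothetical observations on two characteristics (sex and coat colour). The data and layout are assumptions made for the example.

```python
from collections import Counter

# Hypothetical observations: (sex, coat colour) recorded for each animal.
animals = [
    ("male", "black"),
    ("female", "black"),
    ("female", "brown"),
    ("male", "brown"),
    ("female", "black"),
]

# One-way table: counts for a single characteristic (coat colour).
one_way = Counter(colour for _, colour in animals)

# Two-way table: counts for each combination of two characteristics.
two_way = Counter(animals)

sexes = sorted({sex for sex, _ in animals})
colours = sorted({colour for _, colour in animals})

print("One-way table (coat colour):")
for colour in colours:
    print(f"  {colour:<8}{one_way[colour]}")

print("Two-way table (sex by coat colour):")
print("  " + "sex".ljust(8) + "".join(c.ljust(8) for c in colours))
for sex in sexes:
    row = "".join(str(two_way[(sex, c)]).ljust(8) for c in colours)
    print("  " + sex.ljust(8) + row)
```

A manifold table would extend the same idea by counting combinations of three or more characteristics.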

• Frequency distribution:
• Frequency is a count of the occurrence of values
within a particular group or interval. The process
in which the observations are classified and
distributed into the proper class intervals, and
the number of observations is recorded against
each class, is known as frequency distribution.
• A frequency distribution is a classification of data
on the basis of the type of variable, whether
continuous or discrete.
• Thus, a frequency distribution may be
i) discrete frequency distribution and
ii) continuous frequency distribution.

Discrete frequency distribution
Number of lactations    No. of cows
1-2                     25
3-4                     20
5-6                     10
7-8                     5

Continuous frequency distribution
Weight of cow           No. of cows
400-410                 8
410-420                 20
420-430                 25
430-440                 10
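
A continuous frequency distribution like the one above can be built from raw observations with a short Python sketch. The weights below are hypothetical, and the convention that a value equal to a class boundary (e.g. 410) falls in the higher class (410-420) is an assumption, since the table does not state which limit is inclusive.

```python
from collections import Counter

# Hypothetical raw observations: weights of 10 cows.
weights = [402.5, 415.0, 418.3, 421.7, 409.9, 433.2, 425.0, 410.0, 428.4, 436.1]

lower = 400     # lower limit of the first class
width = 10      # class width
n_classes = 4   # classes 400-410, 410-420, 420-430, 430-440

def class_label(value):
    """Return the class interval a value falls in (lower limit inclusive)."""
    index = int((value - lower) // width)
    start = lower + index * width
    return f"{start}-{start + width}"

# Frequency: number of observations recorded against each class.
freq = Counter(class_label(w) for w in weights)

print("Weight of cow   No. of cows")
for i in range(n_classes):
    start = lower + i * width
    label = f"{start}-{start + width}"
    print(f"{label:<16}{freq[label]}")
```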

Common Statistical Tools/ Packages

• It will ask whether you want to install now.
• Click Yes.
• It will take some time and will then be installed
automatically.
• You can then see the Data Analysis option in the Data menu
of Excel.

SISA (https://www.quantitativeskills.com/sisa/)

• StatsCalculator.com (https://statscalculator.com)

The Statistics Calculator (https://www.statpac.com/statistics-calculator/free-version.htm)

Other freely available statistical
packages
