2. Statistical Analysis
Analysis of data refers to the critical examination of the
assembled and grouped data in order to study the characteristics of the object
under study and to determine the patterns of relationship among the variables
relating to it.
Statistical analysis summarizes data into understandable and meaningful forms
and helps in identifying the causal factors underlying complex
phenomena. It also helps in making estimates or generalizations from the
results of sample surveys.
3. Univariate Analysis
Univariate analysis is a method for analyzing data on a single variable at a
time, where we observe only one aspect of a phenomenon at a time. With
single-variable data, we can put all our observations into a list of numbers.
Answering statistical problems by collecting and analyzing data on one variable
is known as univariate analysis.
Univariate analysis explores each variable in a data set separately.
4. For example,
If a researcher records the income of all employed residents of a particular
area and tabulates that data, it would depict just one variable, the income of
employed people in that area.
The statistics used to summarize univariate data describe the data’s
center and spread. There are many options for displaying such summaries. The
most frequently used illustrations of univariate data are:
Frequency distributions
Histograms
Stem and leaf plots
Box and whisker plots
Pie charts
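As a rough illustration of what "center and spread" mean in practice, the sketch below uses Python's standard statistics module on a made-up list of incomes. The figures and variable names are assumptions chosen for illustration, not data from the survey described above.

```python
import statistics

# Hypothetical monthly incomes (in thousands of rupees) of seven residents.
incomes = [12, 15, 15, 18, 20, 22, 45]

# Measures of center: where the typical value lies.
center = {
    "mean": statistics.mean(incomes),
    "median": statistics.median(incomes),
    "mode": statistics.mode(incomes),
}

# Measures of spread: how far the values scatter around the center.
spread = {
    "range": max(incomes) - min(incomes),
    "sample_stdev": statistics.stdev(incomes),
}

print(center)
print(spread)
```

Here the single large income (45) pulls the mean above the median, which is one reason both measures are usually reported together.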
5. Cont.
Frequency distributions:-
A frequency distribution shows the number of times each event or value occurs
within the topic being researched.
For example, if one were to ask students about the mode of transport they
take to come to college, the answers could be tabulated as follows:
6. Cont.
Student’s mode of transport to college    Frequency
Train                                     60
Bus                                       20
Bike                                      10
Walk                                      10
Total no. of students surveyed            100
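The table above can be reproduced from raw survey answers with a simple tally. The sketch below is one possible way to do it in Python using collections.Counter; the list of individual responses is reconstructed from the table's counts and is an assumption about how the raw data might look.

```python
from collections import Counter

# Raw answers reconstructed from the table's counts (assumed form of the data).
responses = ["Train"] * 60 + ["Bus"] * 20 + ["Bike"] * 10 + ["Walk"] * 10

# Tally how many times each mode of transport was reported.
frequency = Counter(responses)
total = sum(frequency.values())

for mode, count in frequency.most_common():
    # Relative frequency is the count divided by the number of students.
    print(f"{mode:<6} {count:>3}  ({count / total:.0%})")
print(f"Total  {total:>3}")
```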
7. Cont.
Histogram:-
The bars convey the relationship of one group or class of the variable to
the other(s).
For example, income earned (represented on the Y-axis in lakhs of rupees) and
types of grain sown (series, represented on the X-axis) in four states in India.
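For numeric data, the classes behind the bars are commonly built by grouping values into equal-width intervals and counting how many fall in each. The sketch below assumes that approach; the function name histogram_counts, the bin width and the income figures are all hypothetical.

```python
def histogram_counts(values, bin_width, start=0):
    """Count values per equal-width class; keys are the lower class limits."""
    counts = {}
    for v in values:
        # Lower limit of the half-open class [lower, lower + bin_width).
        lower = start + ((v - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical incomes in lakhs of rupees.
incomes = [3, 7, 8, 12, 14, 15, 19, 22, 27]

for lower, n in histogram_counts(incomes, bin_width=10).items():
    # Each '#' stands for one observation in the class.
    print(f"{lower:>2} to <{lower + 10:<2} | {'#' * n}")
```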
8. Cont.
Stem and Leaf plots:-
A plot where each data value is split into a “leaf” (usually the last digit)
and a “stem” (the other digits).
For example, “32” is split into “3” (stem) and “2” (leaf).
The stem values are listed down a column, and the leaf values are listed next
to them.
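The splitting rule described above can be sketched in a few lines of Python. This is a minimal version that assumes non-negative whole numbers and takes the last digit as the leaf; the function name stem_and_leaf and the sample scores are illustrative assumptions.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 32 -> stem 3, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical data set.
scores = [32, 35, 41, 47, 47, 53, 38, 29]

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```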
9. Cont.
Pie chart:-
In a Pie chart, each “slice” represents the proportion of the total phenomenon
that is due to each of the classes or groups.
For example, sales revenue in a year
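The size of each slice follows directly from each class's share of the total. As a small sketch, the code below converts the transport counts from the frequency table earlier in this deck into percentages and slice angles; using that table here, rather than sales revenue, is a choice made only because its figures are available.

```python
# Counts taken from the mode-of-transport frequency table.
counts = {"Train": 60, "Bus": 20, "Bike": 10, "Walk": 10}
total = sum(counts.values())

# A full circle is 360 degrees, so each slice gets its share of 360.
angles = {mode: n * 360 / total for mode, n in counts.items()}

for mode, n in counts.items():
    print(f"{mode:<6} {n / total:>4.0%}  {angles[mode]:>5.1f} degrees")
```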
10. Bivariate analysis
Often researchers are interested in gathering information about more
than one variable.
For example, in a study on the education levels of populations, researchers also
obtain data on other variables such as age, sex, family income, distance to
educational institutions, etc.
When only two variables are under consideration, we are studying
bivariate data.
11. Cont.
In bivariate data, two values are recorded for each observation. For
example, data on the income and weight of individuals:
Income of the respondents (in ‘000 rupees) (Y)    Weight (in kg) (X)
1000                                              50
2000                                              55
3000                                              60
4000                                              62
5000                                              68
12. Cont.
There are two important characteristics of the data revealed in this table:
We can clearly observe that as one variable (income, Y) increases, the second
variable (weight, X) also increases.
If we graph the data, we will see that the points cluster along a straight line.
When this occurs, the relationship between the two variables is called a linear
relationship.
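How closely the points follow a straight line can be quantified. One common choice, assumed here, is the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below computes it for the income and weight table; the function name pearson_r is illustrative.

```python
import math

# Data from the income/weight table.
weight_x = [50, 55, 60, 62, 68]            # kg
income_y = [1000, 2000, 3000, 4000, 5000]  # '000 rupees

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(weight_x, income_y)
print(round(r, 3))  # about 0.992, i.e. a strong positive linear relationship
```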
13. Types of variables
Nominal variables:
A nominal variable can be described as a set of categories that vary in quality
but not quantity. There is no order, distance or origin between the
categories, e.g., race (white, black, Asian).
Ordinal variables:
The variables are ranked on certain characteristics or attributes of the
objects, but the distances between the values do not have precise numerical
meaning; the distance between the categories is unknown.
14. Cont.
Interval variables: include those variables whose attributes are separated by
a uniform distance. The numbers associated with interval
variables usually have real meaning. Interval measures have an arbitrary
zero point and a constant unit of measurement, e.g., temperature in degrees
Celsius; calendar year.
Ratio measures: are the same as interval measures except that ratio measures
are based on a true zero point (e.g., age; city population; income). All
statistical operations can be performed on ratio measures.
15. Association
With bivariate analysis, we are testing hypotheses of “association”
and causality. Association is useful because it can be used to predict the
value of the dependent variable once we know the value of the independent
variable.
One measure of association is the correlation coefficient.
Regression analysis: Regression analysis is used to measure the degree of
relationship between two or more ratio variables. In regression analysis, the
dependent variable is generically denoted by Y, and the independent variable is
denoted by X.
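As a sketch of how Y can be predicted from X, the code below fits a straight line Y = a + bX to the income and weight table from the bivariate example. Ordinary least squares is assumed as the fitting method, since the text does not name one, and the function name fit_line is hypothetical.

```python
# Data from the income/weight table: X is weight, Y is income.
weight_x = [50, 55, 60, 62, 68]            # independent variable, kg
income_y = [1000, 2000, 3000, 4000, 5000]  # dependent variable, '000 rupees

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(weight_x, income_y)

# Predict the dependent variable for a new value of the independent variable.
predicted = intercept + slope * 65
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted income at 65 kg: {predicted:.0f}")
```

The fitted line only summarizes the pattern in these five observations; by itself it does not show that one variable causes the other.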
16. Multivariate analysis
Multivariate analysis is the simultaneous study of several variables. It is more
informative than univariate analysis, but also more complex.
Multivariate statistics test three or more variables together to check for all
kinds of effects that occur together.
Application areas
Social science: (gender, age, nationality) of an individual
Climatology: (minimum temperature, maximum temperature, rainfall, humidity) on
a day
Econometrics: (input costs, production, profit) of a firm
17. Cont.
Socio-demographic: (GDP, life expectancy, literacy rate) of a country
Medical: (systolic BP, diastolic BP, pulse rate) of persons
Pathological: (blood sugar, uric acid, hemoglobin count) of patients
Administrative: (admissions, operations, discharges, deaths) per day in
hospitals
Pharmaceutical: (drug A, drug B, ..., drug Z) sales per day in a pharmacy
18. Conclusion
• Statistical analysis summarizes data into understandable and meaningful
forms and helps in making estimates or generalizations from the results
of sample surveys. Answering statistical problems by collecting and
analyzing data on one variable is known as univariate analysis. When
only two variables are under consideration, we are studying bivariate
data. With bivariate analysis, we are testing hypotheses of “association”
and causality. The simultaneous study of several variables is known as
multivariate analysis.