Research Methodology
PART 8
Statistical Techniques for Processing &
Analysis of Data
M S Sridhar
Head, Library & Documentation
ISRO Satellite Centre
Bangalore 560017
E-mail: sridhar@isac.gov.in & mirlesridhar@gmail.com
Research Methodology 8 M S Sridhar, ISRO 2
Statistical techniques for processing & analysis of data
Synopsis
1. Introduction to Research
& Research methodology
2. Selection and formulation
of research problem
3. Research design and
plan
4. Experimental designs
5. Sampling and sampling
strategy or plan
6. Measurement and scaling
techniques
7. Data collection methods
and techniques
8. Testing of hypotheses
9. Statistical techniques for
processing & analysis of
data
10. Analysis, interpretation
and drawing inferences
11. Report writing
1. Introduction
Statistics: what, why and
characteristics
2. Statistic Types
Quantitative & Qualitative (Variable
& Attribute) data
Descriptive & Inferential statistics
3. Processing & Analysis of data
• Processing:
1. Editing
2. Coding
3. Classification
4. Tabulation
• Analysis:
1. Descriptive & inferential
2. Correlational, causal & multivariate …contd.
Statistical Techniques for Processing & Analysis of Data …contd.
4. Some processing techniques
• Tally sheet / chart
• Presentation of data
- Textual or descriptive
- Tabular
- Diagrammatic / graphical
5. Univariate analysis / measures
• Central tendency
• Dispersion
• Asymmetry (skewness)
6. Bivariate & multivariate analysis / measures
Statistics
• The science of statistics cannot be ignored by a researcher
• "Statistics" is both singular and plural. As a plural it means numerical facts systematically collected; as a singular it is the science of collecting, classifying and using statistics
• It is a tool for designing research, processing & analysing data and drawing inferences / conclusions
• It is also a double-edged tool that easily lends itself to abuse and misuse
Abuse ⇒ Poor data + Sophisticated techniques = Unreliable result
Misuse ⇒ Honest facts (hard data) + Poor techniques = Impressions
Examples:
Percentage for very small sample
Using wrong average
Playing with probability
Scale & origin and proportion between ordinate & abscissa
Funny correlation
One-dimensional figure
Unmentioned base
Characteristics of Statistics
1. Aggregates of facts
2. Affected by multiple causes
3. Numerically expressed
4. Collected in a systematic manner
5. Collected for a predetermined purpose
6. Enumerated or estimated according to reasonable
standard of accuracy
7. Statistics must be placed in relation to each other
(context)
What does statistics do?
1. Enables facts to be presented in a precise, definite form that helps in proper comprehension of what is stated. Exact facts are more convincing than vague statements
2. Helps to condense a mass of data into a few numerical measures, i.e., it summarises data and presents meaningful overall information about the mass of data
3. Helps in finding relationships between different factors and in testing the validity of assumed relationships
4. Helps in predicting the changes in one factor due to changes in another
5. Helps in the formulation of plans and policies, which require knowledge of future trends; hence statistics plays a vital role in decision making
Statistic types
• Deductive statistics describe a complete set of data
• Inductive statistics deal with a limited amount of data
like a sample
• Descriptive statistics ( & causal analysis) is concerned
with development of certain indices from the raw data
and causal analysis. Measures of central tendency and
measures of dispersion are typical descriptive
statistical measures
• Inferential (sampling / statistical) analysis: Inferential
statistics is used for (a) estimation of parameter values
(point and interval estimates) (b) testing of hypothesis
(using parametric / standard tests and non-parametric /
distribution-free tests) and (c) drawing inferences
Descriptive Statistics (Techniques)
1. Uni-dimension analysis (mostly one variable)
(i) Central tendency - mean, median, mode, geometric mean (GM) & harmonic mean (HM)
(ii) Dispersion - variance, standard deviation, mean deviation & range
(iii) Asymmetry (skewness) & kurtosis
(iv) Relationship - Pearson's product moment correlation, Spearman's rank order correlation, Yule's coefficient of association
(v) Others - one-way ANOVA, index numbers, time series analysis, simple correlation & regression analysis
Descriptive Statistics (Techniques) …contd.
2. Bivariate analysis
(i) Simple regression & correlation
(ii) Association of attributes
(iii) Two-way ANOVA
3. Multivariate analysis
(i) Multiple regression & correlation / partial correlation
(ii) Multiple discriminant analysis: predicting an entity's possibility of belonging to a particular group based on several predictors
(iii) Multi-ANOVA: extension of two-way ANOVA; ratio of among-group variance to within-group variance
(iv) Canonical analysis: simultaneously predicting a set of dependent variables (both measurable & non-measurable)
(v) Factor analysis, cluster analysis, etc.
Quantitative and Qualitative (Variable and Attribute) Data
• Quantitative (or numerical) data
an expression of a property or quality in numerical terms
data measured and expressed in quantity
enables (i) precise measurement, (ii) knowing trends or changes over time, and (iii) comparison of trends or individual units
• Qualitative (or categorical) data
involves quality or kind, with subjectivity
Variable data are quality characteristics that are measurable, normally continuous, and may take on any value
Attribute data are quality characteristics that are observed to be either present or absent, conforming or not conforming, i.e., they are countable, normally discrete and integer-valued
Processing and Analysis of Qualitative Data
When the feel & flavour of the situation become important, researchers resort to qualitative data (sometimes called attribute data)
Qualitative data describe attributes of a single person or a group of persons that are important to record as accurately as possible even though they cannot be measured in quantitative terms.
More time & effort are needed to collect & process qualitative data. Such data are not amenable to statistical rules & manipulations. However, scaling techniques help in converting qualitative data into quantitative data. The usual data reduction, synthesis and plotting of trends are required but differ substantially, and extrapolation of findings is difficult. It calls for sensitive interpretation & creative presentation.
Examples: Quotations from interviews, open remarks in questionnaires, case histories bringing evidence, content analysis of verbatim material, etc. …contd.
Processing and Analysis of Qualitative Data …contd.
Note: Identifying & coding recurring answers to open-ended questions helps categorise key concepts & behaviour. One may even count & cross-analyse them (this requires pattern-discerning skill); even unstructured depth interviews can be coded to summarise key concepts & present them in the form of master charts
Qualitative coding involves classifying data which (i) were not originally created for research purposes and (ii) have very little order
STEPS:
1. Initial formalisation with issues arising (build themes &
issues)
2. Systematically describing the contents (compiling a list of
key themes)
3. Indexing the data (note reflections for patterns, links, etc.) in
descriptions; interpreting in relation to objective; checking
the interpretation
4. Charting the data themes
5. Refining the charted material
6. Describing & discussing the emerging story
Processing and Analysis of Quantitative Data
• Quantitative data are numbers representing counts, ordering or measurements that can be described, summarised (data reduction), aggregated, compared and manipulated arithmetically & statistically
• Levels of measurement (i.e., nominal, ordinal, interval & ratio) determine the kind of statistical techniques to be used
• Use of a computer is necessary in many situations
1. Organisation and classification of data
2. Presentation of data
3. Analysis of data
Inferential (Sampling / Statistical) Analysis is concerned with
process of generalisation through estimation of parameter values
and testing of hypotheses
4. Interpretation of data
Inference: Data processing, analysis, presentation (presenting in
table, chart or graph) & interpretation (interpreting is to expound the
meaning) should lead to drawing inference, i.e., (i) Validation of
hypotheses and (ii) Realisation of objectives with respect to (a)
Relationship between variables (b) Discovering a fact (c)
Establishing a general or universal law
Processing and Analysis of Quantitative Data …contd.
STEPS:
1. Data reduction: reduce large batches & data sets (a) to numerical summaries, tabular & graphical form and (b) to enable one to ask questions about observed patterns
2. Data presentation
3. Exploratory data analysis
4. Looking for relationships & trends
5. Graphical presentation
PROCESSING (Aggregation & compression):
1. Editing: (i) Field editing (ii) Central editing
2. Coding: Assigning data to a limited number of mutually exclusive but exhaustive categories or classes
3. Classification: arranging data in groups or classes on the basis of common characteristics (i) By attributes (statistics of attributes) (ii) By class intervals (statistics of variables)
Note: class limits, class intervals, magnitude, determination of frequencies & number of classes (normally 5-15; size of class interval, $i = R / (1 + 3.3 \log N)$, where R = range & N = no. of items to be grouped) are discussed later
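The class-interval rule of thumb in the note above can be tried out with a short Python sketch. It assumes that "log" means the base-10 logarithm, which is the usual reading of this rule; the function name and the sample data are hypothetical and serve only as illustration.

```python
import math


def class_interval_size(values):
    """Suggested class-interval size i = R / (1 + 3.3 * log10(N)).

    R is the range of the data and N the number of items.
    Base-10 logarithm is an assumption made for this sketch.
    """
    n = len(values)                          # N, number of items to be grouped
    data_range = max(values) - min(values)   # R, largest minus smallest value
    return data_range / (1 + 3.3 * math.log10(n))


# Hypothetical data: 50 evenly spread observations between 10 and 100
sample = [10 + (90 * k) / 49 for k in range(50)]
size = class_interval_size(sample)
print(round(size, 1))  # roughly 13.6, which would normally be rounded to a convenient width
```

In practice the result is only a starting point; a convenient width such as 10 or 15 would usually be chosen near the computed value.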
Processing and Analysis of Quantitative Data
4. Tabulation/ Tabular Presentation :
To make voluminous data readily usable and easily comprehensible
three forms of presentation are possible
A. Textual (descriptive) presentation: When the quantity of data is not too large and there is no difficulty in comprehending it while reading, textual presentation helps to emphasise certain points. E.g., there are 30 students in the class, of which 10 (one-third) are female students.
B. Tabular presentation: Summarising and displaying data in a concise / compact and logical order for further analysis is the purpose of tabulation. It is a statistical representation, presented as a simple or complex table, for summarising and comparing frequencies, determining bases for and computing percentages, etc. Note: While tabulating responses to a questionnaire, problems concerning responses like 'Don't know' and unanswered items, computation of percentages, etc. have to be handled carefully.
C. Diagrammatic presentation
Tabular Presentation of Data
A table organises data in rows and columns, with cells containing data for further statistical treatment and decision making. Four kinds of classification used in tabulation are:
i) Qualitative classification based on qualitative
characteristics like status, nationality and gender
ii) Quantitative classification based on characteristics
measured quantitatively like age, height and income
(assigning class limits for the values forms classes)
iii) Temporal classification : Categorised according to
time (with time as classifying variable). E.g., hours,
days, weeks, months, years
iv) Spatial classification: Place as a classifying variable.
E.g. Village, town, block, district, state, country
Parts of Table
Table is conceptualised as data presented in rows and
columns along with some explanatory notes.
Tabulation can be one-way, two-way, or three-way
classification depending upon the number of
characteristics involved
i) Table number for identification purpose at the top or at
the beginning of the title of the table; Whole numbers
are used in ascending order; Subscripted numbers are
used if there are many tables
ii) Title, usually placed at the head, narrates about the
contents of the table; Clearly, briefly and carefully
worded so as to make interpretations from the table
clear and free from ambiguity
iii) Captions or column heading: are column
designations to explain figures of the column
Parts of Table …contd.
iv) Stub or row headings (stub column) are designations of the rows
v) Body of the table contains the actual data
vi) Unit of measurement: stated along with the title; does not change throughout the table unless stated; must be stated when different units are used for rows and columns; if figures are large, they are rounded off and the rounding is indicated
vii)Source Note at the bottom of the table to indicate
the source of data presented
viii)Foot Note is the last part of the table; explains the
specific feature of the data content, which is not
self explanatory and has not been explained earlier
Preparation of Frequency Distribution Table
1. Deciding the number of classes: The rule of thumb is to have 5 to 15 classes. Know the range and variations in the variable's value. Range is the difference between the largest and the smallest value of the variable (i.e., it is the sum of all class intervals, or the number of classes multiplied by the class interval). (Class intervals are the various intervals of the variable chosen for classifying data)
2. Deciding the size of each class: steps 1 and 2 are inter-linked
3. Determining the class limits: Choose a value less than the minimum value of the variable as the lower limit of the first class and a value greater than the maximum value of the variable as the upper limit of the last class.
Note: It is important to choose class limits in such a way that the mid-point or class mark of each class coincides, as far as possible, with a value around which the data tend to be concentrated, i.e., class limits are chosen so that the mid-point is close to the average of the class
Class Intervals in Frequency Tables
[Figure A: Even class interval and its mid-point. Lower limit 10, mid-point 15, upper limit 20; values 11 to 14 and 16 to 19 lie 5 units on either side of the mid-point.]
[Figure B: Odd class interval and its mid-point. Lower limit 10, mid-point 12.5, upper limit 15; values 11 to 14 lie within 2.5 units on either side of the mid-point.]
Preparation of Frequency Distribution contd.
Two methods for class limits: exclusive and inclusive type class intervals for determination of the frequency of each class (see the tally sheet examples given later)
(i) Exclusive method: The upper class limit of one class equals the lower class limit of the next class. Suitable for data of a continuous variable; here the upper class limit is excluded from, but the lower class limit is included in, the interval
(ii) Inclusive method: Both class limits are part of the class interval.
An adjustment in class intervals is made if there is a 'gap' or discontinuity between the upper limit of a class and the lower limit of the next class.
Divide the difference between the upper limit of the first class and the lower limit of the second class by 2; subtract the result from all lower limits and add it to all upper limits.
Adjusted class mark = (Adjusted upper limit + Adjusted lower limit) / 2
This adjustment restores continuity of data in the frequency distribution
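The continuity adjustment described above can be sketched in Python as follows. The class limits used here (11-20, 21-30, 31-40) are hypothetical inclusive classes chosen for illustration, and the function name is likewise an assumption.

```python
def make_continuous(classes):
    """Turn inclusive class limits into continuous class boundaries.

    Half of the gap between the upper limit of the first class and the
    lower limit of the second class is subtracted from every lower limit
    and added to every upper limit.
    """
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


# Hypothetical inclusive classes with a gap of 1 between neighbours
inclusive = [(11, 20), (21, 30), (31, 40)]
adjusted = make_continuous(inclusive)

# Adjusted class mark = (adjusted upper limit + adjusted lower limit) / 2
marks = [(lower + upper) / 2 for lower, upper in adjusted]

print(adjusted)  # [(10.5, 20.5), (20.5, 30.5), (30.5, 40.5)]
print(marks)     # [15.5, 25.5, 35.5]
```

After the adjustment the upper boundary of each class equals the lower boundary of the next, so the classes can be treated like exclusive-type intervals.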
Preparation of Frequency Distribution contd.
4. Find the frequency of each class (i.e., how many times observations falling in that class occur in the raw data) by tally marking. The frequency of an observation is the number of times that observation occurs. A frequency table gives the class intervals and the frequencies associated with them
Loss of information: A frequency distribution summarises raw data to make them concise and comprehensible, but does not show the details that are found in the raw data.
Bivariate frequency distribution is a frequency distribution of two variables (e.g., no. of books in stock and budget of 10 libraries)
Frequency distribution with unequal classes: When some classes are either densely or sparsely populated, the observations in them deviate more from their respective class marks than those in other classes. In such cases unequal class intervals are more appropriate; they are formed in such a way that class marks coincide, as far as possible, with a value around which the observations in a class tend to concentrate.
Frequency array: For a discrete variable, the classification of its data is known as a frequency array (e.g., no. of books in 10 libraries)
Analysis of Data
Computation of certain indices or measures, searching for patterns
of relationships, estimating values of unknown parameters, & testing
of hypothesis for inferences
1. Descriptive analysis: Largely the study of distributions of one variable (uni-dimension)
Univariate analysis → One variable
Bivariate analysis → Two variables
Multivariate analysis → More than two variables
2. Inferential or statistical analysis:
• Correlation & causal analysis:
- Joint variation of two or more variables is studied by correlation analysis
- How one or more variables affect another variable is studied by causal analysis
- The functional relation existing between two or more variables is studied by regression analysis
• Multivariate analysis: Simultaneously analysing more than two variables
• Multiple regression analysis: Predicting a dependent variable based on its covariance with all concerned independent variables
Tally (tabular) sheets /charts for frequency distribution of
qualitative, quantitative and grouped/ interval data
I. Single variable (Univariate measures)
1. Quantitative
(i) Simple data
(ii) Frequency distribution of grouped / interval data
2. Qualitative (Attributes)
II. Two or more variables (Bivariate & multivariate measures)
1. Quantitative / Quantitative
(i) Simple (ii) Frequency distribution
2. Quantitative / Qualitative (Attributes)
(i) Simple (ii) Frequency distribution
3. Qualitative / Qualitative (Attributes)
Examples of tabulation and tabular presentation follow
Table 8.1 (Quantitative data): Frequency distribution of citations in technical reports
No. of citations    Tally         Frequency (No. of tech. reports)
0                   ||            2
1                   ||||          4
2                   |||||         5
3                   ||||          4
4                   ||||| ||      7
5                   ||||| |||     8
Total                             30
Table 8.2 (Qualitative data): Frequency distribution of qualification (educational level) of users
Qualification       Tally         Frequency (No. of users)
Undergraduates      ||||| |       6
Graduates           ||||| ||||    9
Postgraduates       ||||| ||      7
Doctorates          |||           3
Total                             25
Table 8.3: Frequency distribution of age of 66 users who used a
branch public library during an hour (Grouped/ interval data of
single variable) (Note that the raw data of age of individual users is
already grouped here)
Age in years (groups / classes)    Frequency (No. of users)
< 11                               11
11 - 20                            14
21 - 30                            16
31 - 40                            12
41 - 50                            6
51 - 60                            3
> 60                               4
Total                              66
Table 8.4: No. of books
acquired by a library over last
six years
Year No. of Books
acquired
(Qualitative) (Quantitative)
2000 772
2001 910
2002 873
2003 747
2004 832
2005 891
Total 5025
Table 8.5: The daily visits of users
to a library during a week are
recorded and summarised
Day Number of
users
(Qualitative) (Quantitative)
Monday 391
Tuesday 247
Wednesday 219
Thursday 278
Friday 362
Saturday 96
Total 1593
Table 8.6: The frequency distribution of number of authors
per paper of 224 sample papers
No. of Authors No. of Papers
1 43
2 51
3 53
4 30
5 19
6 15
7 6
8 4
9 2
10 1
Total 224
Table 8.7: Total books (B), journals (J) and reports ( R)
issued out from a library counter in one hour are recorded
as below:
B B B J B B
B B J B B B
B B B B B B
B B B B B J
B R B B B J
A frequency table can be worked out for above data as shown below:
Document Tally Frequency Relative Cumulative Cumulative
Type (Number) frequency frequency relative
frequency
Books 20 0.8 20 0.8
Journal 4 0.16 24 0.96
Reports 1 0.04 25 1.0
Total 25 1.0
Table 8.7 contd.
Note: If the proportion of each type of document
(category) are of interest rather than actual numbers,
the same can be expressed in percentages or as
proportions as shown below:
Proportions of books, journals and reports issued from
a library in one hour is 20:4:1
OR
Type of
document
Proportion of each type
of document (%)
Books 80
Journal 16
Reports 4
Total 100
Table 8.8: Given below is a summarized table of the relevant
records retrieved from a database in response to six queries
Search No.    Total documents retrieved    Relevant documents retrieved    % of relevant records retrieved
1             79                           21                              26.6
2             18                           10                              55.6
3             20                           11                              55.0
4             123                          48                              39.0
5             6                            8                               50.0
6             109                          48                              44.0
Total         375                          146
Note:Percentage of relevant records retrieved for each query gives better
picture about which query is more efficient than observing just
frequencies.
Table 8.9: Frequency distribution of borrowed use of books of a library over
four years
No. of times borrowed (Quantitative)    No. of books (Quantitative)    Percentage    Cumulative percentage
0 19887 57.12 57.12
1 4477 12.56 69.68
2 4047 11.93 81.61
3 1328 3.81 85.42
4 897 2.57 87.99
5 726 2.02 90.01
6 557 1.58 91.68
7 447 1.28 92.96
8 348 1.00 93.96
9 286 0.92 94.78
10 290 0.84 95.62
>10 1524 4.38 100.00
Table 8.10: The raw data of self-citations in a sample of 25 technical reports are given below:
5 0 1 4 0
3 8 2 3 0
4 2 1 0 7
3 1 2 6 0
2 2 5 7 2
Frequency distribution of self-citations of technical reports:
No. of self- Frequency Less than (or equal) More than (or equal)
citations (No. of reports) cumulative frequency cumulative frequency
No. % % %
0 5 20 20 100
1 3 12 32 80
2 6 24 56 68
3 3 12 68 44
4 2 8 76 32
5 2 8 84 24
6 1 4 88 16
7 2 8 96 12
8 1 4 100 4
Total 25
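The frequency table above can be reproduced from the raw data of Table 8.10 with a few lines of Python. This is a minimal sketch; the variable names are illustrative only.

```python
from collections import Counter

# Raw data of self-citations from Table 8.10
raw = [5, 0, 1, 4, 0,
       3, 8, 2, 3, 0,
       4, 2, 1, 0, 7,
       3, 1, 2, 6, 0,
       2, 2, 5, 7, 2]

counts = Counter(raw)          # tally: value -> number of reports
values = sorted(counts)        # distinct numbers of self-citations
n = len(raw)                   # total number of reports

freq = [counts[v] for v in values]
percent = [100 * f / n for f in freq]

# "Less than (or equal)" cumulative frequency in %
less_than = []
running = 0
for f in freq:
    running += f
    less_than.append(100 * running / n)

# "More than (or equal)" cumulative frequency in %
more_than = []
remaining = n
for f in freq:
    more_than.append(100 * remaining / n)
    remaining -= f

for row in zip(values, freq, percent, less_than, more_than):
    print(*row)
```

Each printed row corresponds to one line of the table: number of self-citations, frequency, percentage, and the two cumulative percentages.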
Table 8.11: (Qualitative Data) Responses in the form of
True (T) or False (F) to a questionnaire (opinionnaire) is
tabulated and given along with qualitative raw data
Response       Frequency
True           17
False          8
No response    5
Total          30
Raw data:
T T T F F
F T T T T
T T F F T
F T F T T
T T T T F
Grouped or Interval Data
• So far (except in Table 8.3) only discrete data have been presented, and the number of cases / items has also been limited
• As against discrete data, continuous data like heights of people have to be collected in groups or intervals, like height between 5' and 5'5", for a meaningful analysis
• Even a large quantity of discrete data requires compression and reduction for meaningful observation, analysis and inference
• Table 8.12 in the next slide presents 50 observations; a frequency table of these discrete data will have 22 lines, as there are 22 different values (ranging from Rs.10/- to Rs.100/-). Such large tables are undesirable: they not only take more time, but the resulting frequency table is also less appealing. In such situations, we transform discrete data into grouped or interval data by creating a manageable number of classes or groups. Such data compression and reduction are inevitable and worthwhile despite some loss of accuracy (or data)
Table 8.12 : (Grouped or interval data) Raw data of prices
(in Rs.) of a set of 50 popular science books in Kannada
30 80 100 12 40
50 60 40 30 45
40 30 70 43 40
25 50 10 30 35
18 35 60 35 25
27 25 25 30 30
35 35 14 32 35
25 30 40 15 30
20 16 13 30 60
20 65 60 40 10
Frequency distribution of the discrete (ungrouped) data of Table 8.12
Price in Rs.    No. of books
10              2
12              1
13              1
14              1
15              1
16              1
18              1
20              2
25              5
27              1
30              9
32              1
35              6
40              6
43              1
45              1
50              2
60              4
65              1
70              1
80              1
100             1
Total           50
Mean = Rs. 35.9; Median = Rs. 31; Mode = Rs. 30
Frequency distribution of grouped or interval data of
price (in Rs.) of popular science books in Kannada
(Table 8.12) :
Price (in Rs.) (class) Frequency (f) (No. of books)
1 - 10 2
11 - 20 8
21 - 30 15
31 - 40 13
41 - 50 4
51 - 60 4
61 - 70 2
71 - 80 1
81 - 90 0
91 -100 1
Total 50
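The grouping of the raw prices of Table 8.12 into classes of width 10 can be sketched in Python as below. The helper name `class_of` is hypothetical; the class limits follow the inclusive classes 1-10, 11-20, … used in the table.

```python
from collections import Counter

# Raw prices (in Rs.) from Table 8.12
prices = [30, 80, 100, 12, 40,
          50, 60, 40, 30, 45,
          40, 30, 70, 43, 40,
          25, 50, 10, 30, 35,
          18, 35, 60, 35, 25,
          27, 25, 25, 30, 30,
          35, 35, 14, 32, 35,
          25, 30, 40, 15, 30,
          20, 16, 13, 30, 60,
          20, 65, 60, 40, 10]

WIDTH = 10  # size of each class interval


def class_of(price):
    """Return the inclusive class (lower, upper) that contains the price."""
    lower = ((price - 1) // WIDTH) * WIDTH + 1
    return (lower, lower + WIDTH - 1)


grouped = Counter(class_of(p) for p in prices)

# Print every class from 1-10 to 91-100, including empty ones
for lower in range(1, 100, WIDTH):
    interval = (lower, lower + WIDTH - 1)
    print(f"{interval[0]:>3} - {interval[1]:<3} {grouped[interval]}")
```

A class with no observations (here 81-90) does not appear in the counter itself, which is why the loop walks over all class limits explicitly.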
Home work
Work out a frequency table with less than cumulative
and more than cumulative frequencies for the raw
data of number of words per line in a book given
below :
12 10 12 09 11 10 13 13
07 11 10 10 09 10 12 11
01 10 13 10 15 13 11 12
08 13 11 10 08 12 13 11
09 11 14 12 07 12 11 10
Diagrammatic/ Graphical Presentation
• gives the quickest understanding of the actual situation to be explained by data, compared to textual or tabular presentation
• translates quite effectively the highly abstract ideas contained in numbers into a more concrete and easily comprehensible form
• may be less accurate but more effective than a table
• tables and diagrams may be suitable to illustrate discrete data, while continuous data are better represented by graphs
Note: Sample charts are constructed and presented using data from previously presented tables. Different types of data may require different modes of diagrammatic representation
Three important kinds of diagrams:
i) Geometric diagram
(a) Bar (column) chart: simple, multiple, and component
(b) Pie
ii) Frequency diagram
(a) Histogram
(b) Frequency polygon
(c) Frequency curve
(d) Ogive or cumulative frequency curve
iii) Arithmetic line graph
[Figure: Simple column chart for data in Table 8.2: Qualification of users. X-axis: qualification; Y-axis: No. of users (0 to 10). Values: Undergraduates 6, Graduates 9, Postgraduates 7, Doctorates 3.]
[Figure: Bar chart for data from Table 8.9: Frequency distribution of borrowed use of books of a library over four years. X-axis: No. of times borrowed (0 to 10, >10); Y-axis: No. of books (0 to 25000).]
[Figure: Bar chart for data in Table 8.1: Frequency distribution of citations in technical reports. Horizontal bars; vertical axis: No. of citations (0 to 5); horizontal axis: No. of reports (0 to 10). Values: 2, 4, 5, 4, 7, 8.]
Component bar chart
100% component column chart
Grouped column chart
Comparative 100% columnar chart
Chart with figures / symbols
[Figure: Histogram with frequency polygon for data in Table 8.6: No. of authors per paper. X-axis: No. of authors (1 to 10); Y-axis: No. of papers (0 to 60). Values: 43, 51, 53, 30, 19, 15, 6, 4, 2, 1.]
Line graphs
[Figure: Line graph for data in Table 8.6: No. of authors per paper. X-axis: No. of authors (1 to 10); Y-axis: No. of papers (0 to 60). Values: 43, 51, 53, 30, 19, 15, 6, 4, 2, 1.]
[Figure: Frequency distribution of no. of words per line of a book (Home work). Y-axis: less than (or equal) cumulative frequency in % (0 to 120). Plotted values: 2.5, 7.5, 12.5, 20, 42.5, 62.5, 80, 95, 97.5, 100.]
[Figure: Cumulative frequency graph of reduction in no. of journals subscribed and no. of reports added over the years. X-axis: year (1980 to 2002); Y-axis: 0 to 1200; two series: Reports (annual intake) and Journals (subscribed).]
Year    Reports (annual intake)    Journals (subscribed)
1980    1063                       533
1985    936                        519
1990    523                        444
1995    288                        416
2000    67                         326
2002    29                         300
[Figure: Line graph of less than or equal cumulative frequency of self-citations in technical reports (Table 8.10). X-axis: No. of self-citations; Y-axis: cumulative % of reports (0 to 120). Plotted values: 20, 32, 56, 68, 76, 84, 88, 96, 100.]
[Figure: Line graph of more than or equal cumulative frequency of self-citations in technical reports (Table 8.10). X-axis: No. of self-citations; Y-axis: cumulative % of reports (0 to 100). Labelled values: 80, 68, 44, 32, 24, 16, 12, 4.]
[Figure: Pie diagram / chart for Table 8.7: No. of books, journals and reports issued per hour. Books 80%, Journals 16%, Reports 4%.]
Univariate Measures: A. Central Tendency
Measures of central tendency, or averages, are used to summarise data. An average specifies a single most representative value to describe the data set.
Properties of the arithmetic mean:
1. The sum of the deviations of individual values of x from the mean will always add up to zero
2. The positive deviations must balance the negative deviations
3. It is very sensitive to extreme values
4. The sum of squares of the deviations about the mean is minimum
A good measure of central tendency should meet the following requisites:
- easy to calculate and understand
- rigidly defined
- representative of data
- should have sampling stability
- should not be affected by extreme values
Univariate Measures: A. Central Tendency
1. MEAN: Arithmetic mean (called statistical / arithmetic
average) is the most commonly used measure. By dividing
the total of items by total number of items we get mean.
Characteristics of Mean
• most representative figure for the entire mass of data
• tells the point about which items have a tendency to cluster
• unduly affected by extreme items (very sensitive to extreme values)
• The positive deviations must balance the negative deviations (the sum of the deviations of individual values of x from the mean will always add up to zero)
• The sum of squares of the deviations about the mean is minimum
Univariate Measures: A. Central Tendency
1. MEAN
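$$\bar{X} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
EX: 4 6 7 8 9 10 11 11 11 12 13
$$\bar{X} = \frac{102}{11} = 9.27$$
For grouped or interval data
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n \; (= n)}$$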
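As a quick check of the arithmetic mean, the slide's example series can be worked out in Python both directly and from a frequency count of the same values. This is only an illustrative sketch; the variable names are not from the original.

```python
from collections import Counter

# Example series from the slide
values = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]

# Simple mean: total of items divided by number of items
mean = sum(values) / len(values)

# Same mean from a frequency distribution: sum(f * x) / sum(f)
freq = Counter(values)                      # x -> f
mean_from_freq = sum(f * x for x, f in freq.items()) / sum(freq.values())

print(sum(values), len(values))  # 102 11
print(round(mean, 2))            # 9.27
```

Both routes give the same value because the frequency form merely groups identical items before summing.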
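Mean for grouped or interval data
$$\bar{X} = \frac{\sum f_i x_i}{n}, \quad \text{where } n = \sum f_i$$
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n}$$
Formula for Weighted Mean:
$$\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i}$$
Formula for Mean of Combined Sample:
$$\bar{Z} = \frac{n\bar{X} + m\bar{Y}}{n + m}$$
Moving Average
Formula for Shortcut or Assumed Average Method (A = assumed average):
$$\bar{X} = A + \frac{\sum f_i (X_i - A)}{n}, \quad \text{where } n = \sum f_i$$
NOTE: Step deviation method takes a common factor out to enable simple working and uses the formula
$$\bar{X} = g + \left[\frac{\sum f d}{n}\right](i)$$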
Calculation of the mean ($\bar{X}$) from a frequency distribution of grouped or interval data of price (in Rs.) of popular science books in Kannada (Table 8.12) using the step deviation method is shown below:
Price (in Rs.) (class)    Frequency (f) (No. of books)    Cumulative less than or equal frequency (cf)    Distance of class from the assumed average class (d)    fd     d²    fd²
1 - 10                    2                               2                                               -4                                                      -8     16    32
11 - 20                   8                               10                                              -3                                                      -24    9     72
21 - 30                   15                              25                                              -2                                                      -30    4     60
31 - 40                   13                              38                                              -1                                                      -13    1     13
41 - 50                   4                               42                                              0                                                       0      0     0
51 - 60                   4                               46                                              1                                                       4      1     4
61 - 70                   2                               48                                              2                                                       4      4     8
71 - 80                   1                               49                                              3                                                       3      9     9
81 - 90                   0                               49                                              4                                                       0      16    0
91 - 100                  1                               50                                              5                                                       5      25    25
Total                     50                                                                                                                                      -59          223
g = 46 ; ∑fd = -59 ; n = 50 ; i = 10
$$\bar{X} = g + \left[\frac{\sum f d}{n}\right](i) = 46 + \left[\frac{-59}{50}\right](10) = 34.2$$
Note: Compare the answer with the mean calculated from the discrete data in Table 8.12
Assumed average (shortcut) method & step deviation method
Table: Calculation of the mean ($\bar{X}$) from a frequency distribution. Data represent
weights of 265 male freshman students at the University of Washington
Class interval (weight)     ƒ     d     ƒd
 90 --  99                  1    -5     -5
100 -- 109                  1    -4     -4
110 -- 119                  9    -3    -27
120 -- 129                 30    -2    -60
130 -- 139                 42    -1    -42
140 -- 149                 66     0      0
150 -- 159                 47     1     47
160 -- 169                 39     2     78
170 -- 179                 15     3     45
180 -- 189                 11     4     44
190 -- 199                  1     5      5
200 -- 209                  3     6     18
N = 265 ; ∑ƒd = 237 - 138 = 99
$\bar{X} = g + \frac{\sum f d}{N}\, i = 145 + \frac{99}{265}(10) = 145 + (0.3736)(10) = 145 + 3.74 = 148.74$
In terms of the class values $X_i$ and the assumed average A:
$\bar{X} = A + \frac{\sum f_i (X_i - A)}{\sum f_i}$
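As a rough check of the step deviation working for the weights data, the calculation can be sketched as below. The function name and its parameters are hypothetical; `g = 145` and the class width of 10 are taken from the slide as given.

```python
def step_deviation_mean(freqs, assumed_index, g, width):
    """Mean = g + (sum of f*d / n) * i, where d counts classes away
    from the assumed average class (index assumed_index)."""
    n = sum(freqs)
    sum_fd = sum(f * (k - assumed_index) for k, f in enumerate(freqs))
    return g + (sum_fd / n) * width


# Frequencies of the weight classes 90-99, 100-109, ..., 200-209
weight_freqs = [1, 1, 9, 30, 42, 66, 47, 39, 15, 11, 1, 3]

# The class 140-149 (index 5) is the assumed average class
mean_weight = step_deviation_mean(weight_freqs, assumed_index=5, g=145, width=10)
print(round(mean_weight, 2))  # 148.74
```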
Univariate Measures: A. Central Tendency contd..
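WEIGHTED MEAN
$\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i}$
MEAN OF COMBINED SAMPLE
$\bar{Z} = \frac{N \bar{X} + M \bar{Y}}{N + M}$
MOVING AVERAGE
SHORTCUT OR ASSUMED AVERAGE METHOD
$\bar{X} = A + \frac{\sum (X_i - A)}{n}$ (ungrouped data)
$\bar{X} = A + \frac{\sum f_i (X_i - A)}{\sum f_i}$ (frequency distribution)
NOTE: Step deviation method takes common factor out to enable
simple working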
Univariate Measures: A.Central Tendency
2. Median: Middle item of a series when arranged in ascending or
descending order of magnitude
M = value of the $\left(\frac{N+1}{2}\right)$th item
EX: 11 7 13 4 11 9 6 11 10 12 8
Value:    4  6  7  8  9 10 11 11 11 12 13
Position: 1  2  3  4  5  6  7  8  9 10 11
With N = 11 the median is the 6th item, i.e., M = 10
FOR FREQUENCY DISTRIBUTION
$M = L + \frac{N/2 - C_f}{f} \times i$
L = lower limit of the median class
Cf = cum. freq. of the class preceding the median class
f = simple freq. of the median class
i = width of the class interval of the median class
Note: As a positional average, the median does not involve the values of
all items and is more useful for qualitative phenomena
Median
In layman's language the median is a divider, like the 'divider' on a
road that divides the road into two halves
It is a positional value of the variable which divides the distribution
into two equal parts, i.e., the median of a set of observations is a
value that divides the set into two halves so that one half of the
observations are less than or equal to the median value and the other
half are greater than or equal to the median value
Extreme items do not affect the median, i.e., the median is a useful
measure as it is not unduly affected by extreme values and is
specially useful in open ended frequencies
For discrete data, mean and median do not change if all the
measurements are multiplied by the same positive number and
the result is divided later by the same constant
As a positional average, the median does not involve the values of all
items and it is more useful for qualitative phenomena
For a unimodal, moderately skewed distribution the median usually lies
between the arithmetic mean and the mode
Median of grouped or interval data
$M = L + \frac{W}{F}\, i$
Where, W = [n/2] – Cf (No. of observations to be added
to the cumulative total of the preceding class in order to
reach the middle observation in the array)
L = Lower limit of the median class (the class in which the
middle observation of the array lies)
Cf = Cumulative frequency of the class preceding the
median class
i = Width of the class interval of the median class
F = Frequency of the median class
Calculation of the median (M) from a frequency distribution of grouped or interval
data of price (in Rs.) of popular science books in Kannada (Table 8.12) is given
below
Price (in Rs.) (class)    Frequency (f) (No. of books)    Cumulative less than or equal frequency (cf)
 1 -- 10                        2                               2
11 -- 20                        8                              10
21 -- 30                       15                              25
31 -- 40                       13                              38
41 -- 50                        3                              41
51 -- 60                        3                              44
61 -- 70                        2                              46
71 -- 80                        1                              47
81 -- 90                        0                              47
91 -- 100                       1                              48
> 100                           2                              50
Total                          50
L = 21 ; Cf = 10 ; i = 10 ; F = 15 ;
W = [n/2] – Cf = [50/2] – 10 = 15
$M = L + \frac{W}{F}\, i = 21 + \frac{15}{15}(10) = 31$
Note: Compare answer with median calculated as discrete data in
Table 8.12
Median for grouped or interval data
TABLE: Calculation of the median (M). Data represent weights of 265 male
freshman students at the University of Washington
Class interval (weight)     ƒ     Cumulative ƒ ("less than")
 90 --  99                  1       1
100 -- 109                  1       2
110 -- 119                  9      11
120 -- 129                 30      41
130 -- 139                 42      83
140 -- 149                 66     149
150 -- 159                 47     196
160 -- 169                 39     235
170 -- 179                 15     250
180 -- 189                 11     261
190 -- 199                  1     262
200 -- 209                  3     265
N = 265 ; N/2 = 265/2 = 132.5 ; W = N/2 – Cf
$M = L + \frac{W}{f}\, i = 140 + \frac{132.5 - 83}{66}(10) = 140 + \frac{49.5}{66}(10) = 140 + (0.750)(10) = 140 + 7.50 = 147.5$
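The grouped-median formula used in the two worked examples can be sketched as follows. The function name is hypothetical, and the sketch follows the slides in using the stated lower class limit (for example 140, not the class boundary 139.5); the lower limit 101 for the open class "> 100" is an assumption made only to give that class a label.

```python
def grouped_median(lower_limits, freqs, width):
    """M = L + ((n/2 - Cf) / F) * i for the class that contains
    the middle observation."""
    half = sum(freqs) / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= half:
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("empty frequency distribution")


# Weights of 265 students: classes 90-99, ..., 200-209
weight_lowers = list(range(90, 210, 10))
weight_freqs = [1, 1, 9, 30, 42, 66, 47, 39, 15, 11, 1, 3]
print(grouped_median(weight_lowers, weight_freqs, 10))  # 147.5

# Book prices: classes 1-10, ..., 91-100, > 100 (lower limit 101 assumed)
price_lowers = list(range(1, 102, 10))
price_freqs = [2, 8, 15, 13, 3, 3, 2, 1, 0, 1, 2]
print(grouped_median(price_lowers, price_freqs, 10))  # 31.0
```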
Univariate Measures: A. Central Tendency 3. Mode
MODE is the most commonly or frequently occurring value in a
series
EX.: 4 6 7 8 9 10 11 11 11 12 13 (11 occurs three times, so the mode is 11)
For Frequency Distribution
$Z = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$ OR $Z = L + \frac{f_2}{f_1 + f_2} \times i$
L = Lower limit of the modal class.
Δ1 = Difference in freq. between the modal class and the
preceding class.
Δ2 = Difference in freq. between the modal class and the
succeeding class.
i = Width of the class interval of the modal class.
f1 = Freq. of the class preceding the modal class.
f2 = Freq. of the class succeeding the modal class.
Mode
Mode is the most commonly or frequently occurring value/ observed
data or the most typical value of a series or the value around which
maximum concentration of items occur. In other words, the mode of
a categorical or a discrete numerical variable is that value of the
variable which occurs maximum number of times
The mode is not affected by extreme values in the data and can easily
be obtained from an ordered set of data
The mode does not necessarily describe the ‘most’ ( for example, more
than 50 %) of the cases
Like the median, the mode is a positional average and is not affected by
the values of extreme items. Hence the mode is useful in eliminating the
effect of extreme variations and in studying the most popular (most
frequently occurring) case (used in qualitative data)
The mode is usually a good indicator of the centre of the data only if
there is one dominating frequency. However, it does not give relative
importance and, like the median, is not amenable to algebraic treatment
For a unimodal, moderately skewed distribution the median usually lies
between the mean and the mode.
For a normal distribution, mean, median and mode are equal (one and
the same)
Mode for grouped or interval data
For a frequency distribution with grouped (or interval) quantitative data,
the modal class is the class interval with the highest frequency.
This is more useful when we measure a continuous variable, where
almost every observed value is distinct and no single value has a
dominating frequency. The modal class in Table 8.3 is the age group 21-30.
Please note that since the notion of location or central tendency
requires order, the mode of nominal data only identifies the most
frequent category and is not meaningful as a measure of location.
$Z = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i$ OR $Z = L + \frac{f_2}{f_1 + f_2}\, i$
Where,
L = Lower limit of the modal class
Δ1 = Difference in frequency between the modal class and the
preceding class
Δ2 = Difference in frequency between the modal class and the
succeeding class
i = Width of the class interval of the modal class
f1 = Frequency of the class preceding the modal class
f2 = Frequency of the class succeeding the modal class
Calculation of the mode (Z) from a frequency distribution of grouped or
interval data of price (in Rs.) of popular science books in Kannada (Table
8.12) is shown below:
Price (in Rs.) (class)    Frequency (f) (No. of books)    Cumulative less than or equal frequency (cf)
 1 -- 10                        2                               2
11 -- 20                        8                              10
21 -- 30                       15                              25
31 -- 40                       13                              38
41 -- 50                        4                              42
51 -- 60                        4                              46
61 -- 70                        2                              48
71 -- 80                        1                              49
81 -- 90                        1                              50
Total                          50
The class with the highest frequency (15) is 21-30, so the modal class is 21-30.
L = 21 ; i = 10 ; f1 = 8 ; f2 = 13 ; Δ1 = 15 - 8 = 7 ; Δ2 = 15 - 13 = 2
$Z = L + \frac{f_2}{f_1 + f_2}\, i = 21 + \frac{13}{8 + 13}(10) = 27.19$
OR $Z = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i = 21 + \frac{7}{7 + 2}(10) = 28.78$
Both values lie in the modal class 21-30 of the grouped data.
Note: Compare answer with mode calculated as discrete data in Table 8.12
Mode for grouped or interval data
Table: Calculation of the mode (Z).
Data represent weights of 265 freshman students at the University of Washington
Class interval (weight)     ƒ
 90 --  99                  1
100 -- 109                  1
110 -- 119                  9
120 -- 129                 30
130 -- 139                 42
140 -- 149                 66
150 -- 159                 47
160 -- 169                 39
170 -- 179                 15
180 -- 189                 11
190 -- 199                  1
200 -- 209                  3
$Z = L + \frac{f_2}{f_1 + f_2}\, i = 140 + \frac{47}{42 + 47}(10) = 140 + \frac{47}{89}(10) = 140 + 5.3 = 145.3$
$Z = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i = 140 + \frac{24}{24 + 19}(10) = 140 + \frac{24}{43}(10) = 145.6$
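Both grouped-mode formulas can be sketched side by side for the weights data. The function name is hypothetical; the sketch assumes a single class with the highest frequency and treats a missing neighbouring class as frequency 0.

```python
def grouped_mode(lower_limits, freqs, width):
    """Return (Z by differences, Z by neighbouring frequencies) for the
    modal class, i.e. the class with the highest frequency."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # modal class index
    f_prev = freqs[k - 1] if k > 0 else 0               # f1
    f_next = freqs[k + 1] if k + 1 < len(freqs) else 0  # f2
    d1 = freqs[k] - f_prev                              # delta 1
    d2 = freqs[k] - f_next                              # delta 2
    by_differences = lower_limits[k] + d1 / (d1 + d2) * width
    by_neighbours = lower_limits[k] + f_next / (f_prev + f_next) * width
    return by_differences, by_neighbours


weight_lowers = list(range(90, 210, 10))
weight_freqs = [1, 1, 9, 30, 42, 66, 47, 39, 15, 11, 1, 3]
z_diff, z_neigh = grouped_mode(weight_lowers, weight_freqs, 10)
print(round(z_diff, 2), round(z_neigh, 2))  # 145.58 145.28
```

The two formulas generally give slightly different values; both fall inside the modal class.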
Univariate Measures: A. Central Tendency
4. GM & 5. HM
4. Geometric Mean
nth root of the product of the values of n items
$G.M. = \sqrt[n]{\prod x_i} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
Ex. 4 6 9: $GM = \sqrt[3]{4 \times 6 \times 9} = \sqrt[3]{216} = 6$
NOTE: 1. Logarithms are used to simplify the calculation
2. GM is used in the preparation of indexes (i.e., determining the average
percent of change) and in dealing with ratios
5. Harmonic Mean
Reciprocal of the average of reciprocals of the values of items in a series
$H.M. = \frac{n}{1/x_1 + 1/x_2 + \dots + 1/x_n}$
For frequency distribution: $H.M. = \frac{\sum f_i}{\sum f_i / x_i}$
Ex.: 4 5 10: $HM = \frac{3}{1/4 + 1/5 + 1/10} = \frac{60}{11} = 5.45$
Harmonic Mean: 1. Has limited application as it gives the largest weight to the
smallest item and the smallest weight to the largest item 2. Used in cases where
time and rate are involved (ex: time and motion study)
Note: 1. Median and mode could also be used in qualitative data 2. For a unimodal,
moderately skewed distribution the median usually lies between mean & mode
3. For normal distribution mean = median = mode
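The two examples can be reproduced with a short sketch. The function names are hypothetical; the geometric mean is computed through logarithms, in line with the note that logs simplify the working, and it assumes all values are positive.

```python
import math


def geometric_mean(values):
    """nth root of the product, computed as exp(mean of the logs)."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


def harmonic_mean(values):
    """Reciprocal of the average of the reciprocals."""
    return len(values) / sum(1 / x for x in values)


print(round(geometric_mean([4, 6, 9]), 2))  # 6.0
print(round(harmonic_mean([4, 5, 10]), 2))  # 5.45
```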
Univariate Measures: B. Dispersion
Central tendency measures do not reveal the variability present
in the data. To understand the data better, we need to know
the spread of the values and quantify the variability of the
data.
Dispersion is the scatter of the values of items in the series
around the true value of average. Dispersion is the extent to
which values in a distribution differ from the average of the
distribution.
1. Range: The difference between the values of the extreme
items of a series, i.e., the difference between the smallest and
largest observations
Example: 4 6 7 8 9 10 11 11 11 12 13
Range = 13 - 4 = 9
• Simplest and most crude measure of dispersion
• As it is based only on the two extreme values and not on all the values, it
is greatly/unduly affected by those two values and by fluctuations of sampling.
The range may increase with the size of the set of observations but can never decrease
• Gives an idea of the variability very quickly
Univariate Measures: B. Dispersion
2. Mean Deviation: The average of the differences of the values of items
from some average of the series (ignoring negative signs), i.e., the
arithmetic mean of the absolute differences of the values from their average
Note: 1. MD is based on all values and hence cannot be calculated for open-ended
distributions. It uses average but ignores signs and hence appears unmethodical.
2. MD is calculated from the mean as well as from the median, for ungrouped data
using the direct method and for continuous distributions using the assumed mean
method and the short-cut method
3. The average used is either the arithmetic mean or median
$\delta_x = \frac{\sum |x_i - \bar{x}|}{n}$
Example: 4 6 7 8 9 10 11 11 11 12 13
$\delta_x = \frac{|4 - 9.27| + |6 - 9.27| + \dots + |13 - 9.27|}{11} = \frac{24.73}{11} = 2.25$
Coefficient of mean deviation: Mean deviation divided by the average. It is a
relative measure of dispersion and is comparable to similar measure of other series,
i.e., Coeff. of MD = $\delta_x / \bar{x}$ (Ex: 2.25/9.27 = 0.24). M.D. & its coefficient are used to
judge the variability and they are better measures than range
For grouped data: $\delta_x = \frac{\sum f_i |x_i - \bar{x}|}{n}$
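A minimal sketch of the mean deviation and its coefficient for the example series follows. The function name and the optional `center` argument are hypothetical; the last line shows the deviation taken about the median (10) instead of the mean.

```python
def mean_deviation(values, center=None):
    """Average absolute deviation from `center` (the mean by default)."""
    if center is None:
        center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)


data = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]
md = mean_deviation(data)
coeff_md = md / (sum(data) / len(data))
print(round(md, 2), round(coeff_md, 2))    # 2.25 0.24
print(round(mean_deviation(data, 10), 2))  # 2.18 (about the median)
```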
Univariate Measures: B. Dispersion
3. Standard Deviation: The square root of the average of squares of
deviations (based on mean), i.e., the positive square root of the
mean of squared deviations from the mean
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
For grouped data: $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$
Example: 4 6 7 8 9 10 11 11 11 12 13
$\sigma = \sqrt{\frac{(4 - 9.27)^2 + (6 - 9.27)^2 + \dots + (13 - 9.27)^2}{11}} = \sqrt{\frac{76.18}{11}} = 2.63$
Coefficient of S D is S D divided by mean.
Example: 2.63 / 9.27 = 0.28
Variance: Square of S D, i.e., VAR = $\sum (x_i - \bar{x})^2 / n$
Example: 76.18 / 11 = 6.93
Coefficient of variation is Coefficient of SD multiplied by 100
Example: 0.28 x 100 = 28
Note: Coefficient of SD is a relative measure and is often used for
comparing with similar measure of other series
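The standard deviation, variance and coefficient of variation of the example series can be checked with the sketch below. The function name is hypothetical, and the divisor is n (the population form given in the formula), not n - 1.

```python
import math


def population_variance(values):
    """Mean of the squared deviations from the arithmetic mean (divisor n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]
variance = population_variance(data)
sd = math.sqrt(variance)
mean = sum(data) / len(data)
coeff_sd = sd / mean    # coefficient of SD
cv = 100 * coeff_sd     # coefficient of variation
print(round(variance, 2), round(sd, 2), round(coeff_sd, 2), round(cv))
# 6.93 2.63 0.28 28
```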
Univariate Measures: B. Dispersion 3. Standard Deviation
- SD is a very satisfactory and the most widely used measure of dispersion
- amenable to mathematical manipulation
- it is independent of origin, but not of scale
- if SD is small, there is a high probability of getting a value close
to the mean; if it is large, values tend to lie farther away from the
mean
- does not ignore the algebraic signs and it is less affected by
fluctuations of sampling
- SD is calculated using (i) Actual mean method, (ii) Assumed mean
method, (iii) Direct method, (iv) Step deviation method
For frequency of grouped or interval data
$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$
Indirect method uses assumed average formula
$\sigma = \sqrt{\frac{\sum f d^2}{n} - \frac{(\sum f d)^2}{n^2}}$, where d = distance of class
from the assumed average class and $n = \sum f_i$ (when d is counted in class
widths, the result is multiplied by the class width i),
i.e., $\sigma = \sqrt{\frac{\sum f_i (x_i - A)^2}{\sum f_i} - \left(\frac{\sum f_i (x_i - A)}{\sum f_i}\right)^2}$
For discrete data assumed average formula is
$\sigma = \sqrt{\frac{\sum (x_i - A)^2}{n} - \left(\frac{\sum (x_i - A)}{n}\right)^2}$
Calculation of the SD (σ) from a frequency distribution of grouped or interval
data of price (in Rs.) of popular science books in Kannada (Table 8.12) using
assumed average method is shown below
(f = frequency, i.e., no. of books; cf = cumulative less than or equal
frequency; d = distance of class from the assumed average class):
Price (in Rs.) (class)     f    cf     d     fd    d2    fd2
 1 -- 10                   2     2    -4     -8    16     32
11 -- 20                   8    10    -3    -24     9     72
21 -- 30                  15    25    -2    -30     4     60
31 -- 40                  13    38    -1    -13     1     13
41 -- 50                   4    42     0      0     0      0
51 -- 60                   4    46     1      4     1      4
61 -- 70                   2    48     2      8     4      8
71 -- 80                   1    49     3      3     9      9
81 -- 90                   1    50     4      4    16     16
Total                     50                -56          214
n = 50 ; ∑ƒd = - 56 ; ∑ƒd2 = 214 ; i = 10
$\sigma = \sqrt{\frac{\sum f d^2}{n} - \frac{(\sum f d)^2}{n^2}}\; i = \sqrt{\frac{214}{50} - \frac{(-56)^2}{50^2}}\,(10)$
$= \sqrt{4.28 - 1.2544}\,(10)$
$= \sqrt{3.0256}\,(10)$
$= (1.7394)(10) = 17.394$
SD for grouped data (indirect method using assumed average)
TABLE: Calculation of the standard deviation (σ)
Data represent weights of 265 male freshman students at the University of
Washington
Class interval (weight)     ƒ     d     ƒd    ƒd2
 90 --  99                  1    -5     -5     25
100 -- 109                  1    -4     -4     16
110 -- 119                  9    -3    -27     81
120 -- 129                 30    -2    -60    120
130 -- 139                 42    -1    -42     42
140 -- 149                 66     0      0      0
150 -- 159                 47     1     47     47
160 -- 169                 39     2     78    156
170 -- 179                 15     3     45    135
180 -- 189                 11     4     44    176
190 -- 199                  1     5      5     25
200 -- 209                  3     6     18    108
N = 265 ; ∑ƒd = 99 ; ∑ƒd2 = 931 ; d = (Xi – A) / i ; N = ∑fi
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}\; i = \sqrt{\frac{931}{265} - \left(\frac{99}{265}\right)^2}\,(10)$
$= \sqrt{3.5132 - 0.1396}\,(10) = (1.8367)(10) = 18.37$ or 18.4
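The assumed average (step deviation) working for the weights data can be sketched as below. The function name and parameters are hypothetical; the frequencies are those of the table above, with the class 140-149 as the assumed average class.

```python
import math


def step_deviation_sd(freqs, assumed_index, width):
    """sigma = sqrt(sum(f*d^2)/n - (sum(f*d)/n)^2) * i, where d counts
    classes away from the assumed average class."""
    n = sum(freqs)
    devs = [k - assumed_index for k in range(len(freqs))]
    sum_fd = sum(f * d for f, d in zip(freqs, devs))
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, devs))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width


weight_freqs = [1, 1, 9, 30, 42, 66, 47, 39, 15, 11, 1, 3]
sd_weight = step_deviation_sd(weight_freqs, assumed_index=5, width=10)
print(round(sd_weight, 2))  # 18.37
```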
SD for grouped data (indirect method using assumed average) …contd.
TABLE: Means, standard deviations, and coefficients of
variation of the age distributions of four groups of mothers who
gave birth to one or more children in the city of Minneapolis:
1931 to 1935.
CLASSIFICATION              Mean     σ      C V
Resident married            28.2    6.0    21.3
Non-resident married        29.5    6.0    20.3
Resident unmarried          23.4    5.8    24.8
Non-resident unmarried      21.7    3.7    17.1
Absolute and relative measures of dispersion
The absolute measures give the answers in the units in which
original values are expressed. They may give misleading ideas
about the extent of variation especially when the averages differ
significantly
The relative measures (usually expressed in percentages) overcome
the above drawbacks. Some of them are:
i) Coefficient of range = (L – S) / (L + S) (L is largest value and S is
smallest value)
ii) Coefficient of quartile deviation
iii) Coefficient of MD
iv) Co-efficient of variation
Note: Relative measures are free from the units in which the values
have been expressed. They can be compared even across
different groups having different units of measurement
Lorenz curve is a graphical measure of dispersion. It uses the
information expressed in a cumulative manner to indicate the
degree of variability. It is specially useful in comparing the
variability of two or more distributions
Univariate Measures: B. Dispersion 4. Quartiles
There are some positional measures of non-central location where it
is necessary to divide the data into equal parts. They are quartiles,
deciles and percentiles (The quartiles & the median divide the array
into four equal parts, deciles into ten equal groups, and percentiles
into one hundred equal groups)
Quartiles : Measures dispersion when median is used as average
Lower quartile: Value in the array below which there are one quarter
of the observations
Upper quartile: Value in the array below which there are three
quarters of the observations
Interquartile range: Difference between the quartiles
Interquartile range can be called a positional measure of variability
While range is overly sensitive to the number of observations, the
interquartile range can either decrease or increase when further
observations are added to the sample
Useful as a measure of dispersion to study special collections of
data like salaries of employees
Example: 4 6 7 8 9 10 11 11 11 12 13
Lower quartile is 7 (3rd item); Upper quartile is 11 (9th item); Interquartile range is 11 - 7 = 4
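The quartiles of the 11-item example series can be located as sketched below. The slides do not state which positional rule is used; the sketch assumes the common (n + 1) × p rule with linear interpolation, and the function name is hypothetical.

```python
def value_at_fraction(sorted_values, p):
    """Value at 1-based position (n + 1) * p, interpolating linearly
    between neighbouring items when the position is fractional."""
    n = len(sorted_values)
    pos = (n + 1) * p
    lower = int(pos)
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    frac = pos - lower
    return sorted_values[lower - 1] + frac * (sorted_values[lower] - sorted_values[lower - 1])


data = sorted([4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13])
q1 = value_at_fraction(data, 0.25)  # 3rd item
q3 = value_at_fraction(data, 0.75)  # 9th item
print(q1, q3, q3 - q1)  # 7.0 11.0 4.0
```

Other quartile conventions exist and may give slightly different values for small samples.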
Normal Distribution
To understand skewness (asymmetry), testing of hypotheses (Part 9)
and interpretation of data (part 10) it is necessary to know about
normal distribution.
The normal frequency distribution is developed from frequency
histogram with large sample size and small cell intervals. The
normal curve being a perfect symmetrical curve (symmetrical about
µ), the mean, median and the mode of the distribution are one and
the same (µ = M = Z). The curve is uni-modal and bell-shaped and
the data values concentrate around the mean. The sampling
distributions based on a parent normal distribution are
manageable analytically.
The normal curve is not just one curve but a family of curves which
differ only with regard to the values of μ and σ , but have the same
characteristics in all other respects.
Height is maximum at the mean value and declines as we go in either
direction from the mean and tails extend indefinitely on both sides.
The first and the third quartiles are equidistant from the mean. The
height is given by the equation
$Y = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2}$
Normal Distribution …contd.
It is a special continuous distribution. A great many techniques used in
applied statistics are based on it. Many populations encountered
in the course of research in many fields seem to have a normal
distribution to a good degree of approximation (in other words, nearly
normal distributions are encountered quite frequently). Sampling
distributions based on a parent normal distribution are
manageable analytically
Definition: The random variable x is said to be normally distributed if
its density function is given by
$f(x)$ OR $n(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$
where $\int_{-\infty}^{\infty} n(x)\,dx = 1$ and $-\infty < x < \infty$
(Since n(x) is given to be a density function, it is implied that $\int n(x)\,dx = 1$)
When the function is plotted for several values of σ (standard
deviation) , a bell shaped curve as shown below can be seen.
Changing µ (mean) merely shifts the curves to the right or left without
changing their shapes. The function given actually represents a two-
parameter family of distributions, the parameters being µ and σ2
(mean and variance)
Normal Distribution …contd.
The experimenter must know, at least approximately, the general
form of the distribution function which his data follow. If it is
normal, he may use the methods directly; if it is not, he may
transform his data so that the transformed observations follow a
normal distribution. When the experimenter does not know the form
of his population distribution, then he must use other more general
but usually less powerful methods of analysis called non-parametric
methods
An important property of the normal distribution for researchers is that if
x follows a normal distribution and the area under the normal curve
is taken as 1, then the probability that x is within
1 standard deviation of the mean is 68%
2 standard deviations of the mean is 95%
3 standard deviations of the mean is 99.7%
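These coverage figures can be verified numerically. The sketch below uses the error function from Python's standard `math` module; the function name `prob_within` is hypothetical.

```python
import math


def prob_within(k):
    """Probability that a normal variable lies within k standard
    deviations of its mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 3))
# 1 0.683
# 2 0.954
# 3 0.997
```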
[Figure: Normal curves]
Z-score or standardised normal deviation
The area under the normal curve bounded by the class interval for any
given class represents the relative frequency of that class. The area
under the curve lying between any two vertical lines at points A and B
along the X-axis represents the probability that the random variable x
takes on value in that interval bounded by A and B. By finding the area
under the curve between any two points along the X-axis we can find the
percentage of data occurring within these two points.
The computed value Z is also known as the Z-score or standardised
normal deviation. Actually, the value of Z follows a normal probability
distribution with a mean of zero and standard deviation of one. This
probability distribution is known as the standard normal probability
distribution. This allows us to use only one table of areas for all types of
normal distributions.
The standard table of Z scores gives the areas under the curve between
the standardised mean zero and the points to the right of the mean for all
points that are at a distance from the mean in multiples of 0.01σ. It
should be noted that only the areas are to be subtracted or added. Do
not add or subtract the Z scores and then find the area for the resulting
value.
Z-score or standardised normal deviation …contd.
TABLE – Normal Distribution
Z Prob. Z Prob. Z Prob.
3.0 .999 0.8 .788 -1.4 .081
2.8 .997 0.6 .726 -1.6 .055
2.6 .995 0.4 .655 -1.8 .036
2.4 .992 0.2 .579 -2.0 .023
2.2 .986 0.0 .500 -2.2 .014
2.0 .977 -.2 .421 -2.4 .008
1.8 .964 -.4 .345 -2.6 .005
1.6 .945 -.6 .274 -2.8 .003
1.4 .919 -.8 .212 -3.0 .001
1.2 .885 -1 .159
1.0 .841 -1.2 .115
The Standardised normal often used is obtained by assuming mean as
zero (µ = 0) and SD as one (σ = 1). Then,
x scale µ-3σ µ -2σ µ-σ µ µ+σ µ+2σ µ +3σ
z scale -3 -2 -1 0 +1 +2 +3
z = (xi - µ) / σ
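The standardisation and the cumulative probabilities in the table above (area to the left of z) can be sketched as follows. The function names are hypothetical, and the mean of 100 and SD of 15 in the last lines are made-up values used only to illustrate subtracting areas.

```python
import math


def z_score(x, mu, sigma):
    """Standardised normal deviate z = (x - mu) / sigma."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


print(round(standard_normal_cdf(1.0), 3))   # 0.841
print(round(standard_normal_cdf(-2.0), 3))  # 0.023

# Probability of a value between 85 and 115 when mu = 100, sigma = 15
# (illustrative numbers): subtract the two areas, not the z scores
area = standard_normal_cdf(z_score(115, 100, 15)) - standard_normal_cdf(z_score(85, 100, 15))
print(round(area, 3))  # 0.683
```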
Univariate Measures: C. Measure of Asymmetry (Skewness)
Normal distribution of items in a series is perfectly symmetrical. The curve drawn
from a normal distribution, which is bell shaped, shows no asymmetry (skewness),
i.e., $\bar{X}$ = M = Z for a normal curve.
An asymmetrical distribution which has skewness to the right, i.e., a curve distorted
on the right, has positive skewness (Z < M < $\bar{X}$) and a curve distorted to the left
has negative skewness (Z > M > $\bar{X}$) (see figure)
Skewness: The difference between the mean and the mode or the median, i.e.,
Skewness = $\bar{X}$ – Z OR $\bar{X}$ – M
Coeff. of skewness (J) = ($\bar{X}$ – Z) / σ OR 3 ($\bar{X}$ – M) / σ
Example: 4 6 7 8 9 10 11 11 11 12 13
Skewness = 9.27 - 11 = -1.73 (w.r.t. mode) or 9.27 - 10 = -0.73 (w.r.t. median)
J = -1.73 / 2.63 = -0.66 or (-0.73) x 3 / 2.63 = -0.83. Hence negatively skewed.
Check the following for positive skewness: 7, 8, 8, 9, 9, 10, 12, 14, 15, 16, 18
Skewness shows the manner in which the items are clustered around the average;
useful in the study of formation of series and gives an idea about the shape of the
curve
Kurtosis is a measure of the flat-toppedness of a curve, i.e., its humpedness. It indicates
the nature of distribution of items in the middle of a series (Mesokurtic: kurtic in
the centre, i.e., normal curve; Leptokurtic: more peaked than the normal curve;
Platykurtic: more flat than the normal curve)
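The two coefficients of skewness can be computed with Python's standard `statistics` module, as sketched below. The function names are hypothetical, and the SD is the population form (divisor n). The second series has two values tied for the mode, so only the median-based coefficient is computed for it.

```python
import statistics


def skew_by_mode(values):
    """J = (mean - mode) / SD; assumes a single, clear mode."""
    return (statistics.mean(values) - statistics.mode(values)) / statistics.pstdev(values)


def skew_by_median(values):
    """J = 3 * (mean - median) / SD."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.pstdev(values)


data = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]
print(round(skew_by_mode(data), 2), round(skew_by_median(data), 2))  # -0.66 -0.83

homework = [7, 8, 8, 9, 9, 10, 12, 14, 15, 16, 18]
print(skew_by_median(homework) > 0)  # True, i.e. positively skewed
```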
[Figure: Normal Curve and Skewness]
[Figure: Relationship Between Measures of Variability (M D, S D and Semi-interquartile Range)]
Summary of Examples
4 6 7 8 9 10 11 11 11 12 13
Univariate Measures:
A. Central Tendency
1. Mean 9.27
2. Median M 10
3. Mode Z 11
4. G.M.
5. H.M
B. Dispersion
1. Range 9
2. Mean deviation 2.25
3. Coefficient of MD 0.24
4. Standard deviation 2.63
5. Coefficient of SD 0.28
6. Coefficient of variation 28
7. Variance 6.93
8. Lower quartile 7
9. Upper quartile 11
10. Inter quartile range 4
C. Asymmetry
1. Skewness
w.r.t. Mode -1.73
w.r.t. Median -0.73
2. Coefficient of Skewness
w.r.t. Mode -0.66
w.r.t. Median -0.83
Home work:
7, 8, 8, 9, 9, 10, 12, 14, 15, 16, 18
Bivariate & Multivariate Measures
A. Relationship
- To find the relation between 2 or more variables
- If related, whether directly or inversely & the degree of relation
- Is it a cause and effect relationship?
- If so, degree and direction
1. Association (Attributes)
(i) Cross tabulation
(ii) Yule's co-efficient of association
(iii) Chi-square test
(iv) Co-efficient of mean square contingency
2. Correlation (Quantitative)
(i) Spearman's (Rank) coefficient of correlation (ordinal)
(ii) Pearson's coefficient of correlation
(iii) Cross tabulation and scatter diagram
3. Cause and Effect (Quantitative)
(i) Simple (linear) correlation & regression
(ii) Multiple (complex) correlation & regression
(iii) Partial correlation
B. Other Measures / Techniques
1. Index number
2. Time series analysis
3. ANOVA
4. ANCOVA
5. Discriminant analysis
6. Factor analysis
7. Cluster analysis
8. Model building
Common Measures of Relationship
Measure | Nature of Variables | Comment
1. Pearson product moment | Two continuous variables; interval or ratio scale | Relationship linear
2. Rank order or Kendall's tau | Two continuous variables; ordinal scale |
3. Correlation ratio (eta) | One variable continuous, other either continuous or discrete | Relationship nonlinear
4. Intraclass | One variable continuous, other discrete; interval or ratio scale | Purpose: to determine within-group similarity
5. Biserial, Point biserial | One variable continuous, other a) continuous but dichotomised, or b) true dichotomy | Index of item discrimination (used in item analysis)
6. Phi coefficient | Two true dichotomies; nominal or ordinal series |
7. Partial correlation | Three or more continuous variables | Purpose: to determine the relationship between two variables, with the effect of the third held constant
8. Multiple correlation | Three or more continuous variables | Purpose: to predict one variable from a linear weighted combination of two or more independent variables
9. Kendall's coefficient of concordance | Three or more continuous variables; ordinal series | Purpose: to determine the degree of (say, interrater) agreement
Measures / Tests of Association
1. Cross Tabulation
- Useful in finding relationship in nominal data
- But not a powerful form of measure / test
- Classify each variable into two or more categories
- Begin with a two-way table to see whether there is interrelationship
between variables
- Then cross classify the variables in subcategories to look for interaction
between them
(i) Symmetrical relationship: Two variables vary together, but neither is
due to the other (assumed)
(ii) Reciprocal relationship: Two variables mutually influence or reinforce
each other
(iii) Asymmetrical relationship: One (independent) variable is responsible
for change in the other (dependent) variable
- An attempt can also be made to find the conditional relationships by
introducing a third factor and cross-classifying the three variables,
i.e., to see whether X affects Y only when Z is held constant
- Cross tabulate a dependent variable (of importance) against one or
more independent variables
- Show the percentages in the cells of the cross tabulation
- Look for valid (not spurious) explanations
- Ask whether the differences are statistically significant
Example: Given below are the data regarding reference queries received
by a library. Is there a significant association between gender of user and
type of query?
Observed frequencies:
                 L R query    S R query    Total
Male users          17           18          35
Female users         3           12          15
Total               20           30          50
Expected frequencies are worked out like E11 = 20 X 35 / 50 = 14
Expected frequencies are:
                 L R query    S R query    Total
Male users          14           21          35
Female users         6            9          15
Total               20           30          50
Cells    Oij    Eij    (Oij - Eij)    (Oij - Eij)2 / Eij
1,1      17     14         3          9/14 = 0.64
1,2      18     21        -3          9/21 = 0.43
2,1       3      6        -3          9/6  = 1.50
2,2      12      9         3          9/9  = 1.00
Total (∑): χ2 = 3.57 ; df = (c-1) (r-1) = (2-1) (2-1) = 1
Table value of χ2 for 1 df at 5 % significance is 3.841. As 3.57 < 3.841, the
association is not significant.
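The chi-square working above can be reproduced with a short sketch. The function name is hypothetical; the critical value 3.841 is the one quoted on the slide for 1 df at the 5 % level.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom, expected table)
    for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of a cell = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


queries = [[17, 18],   # male users: L R, S R
           [3, 12]]    # female users: L R, S R
stat, df, expected = chi_square(queries)
print(round(stat, 2), df, expected)  # 3.57 1 [[14.0, 21.0], [6.0, 9.0]]
print(stat > 3.841)                  # False: not significant at 5 %
```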
2. Association: Yule's Coefficient of Association
$Q_{AB} = \frac{(AB)(ab) - (Ab)(aB)}{(AB)(ab) + (Ab)(aB)}$
(AB) = Freq. of class AB in which A and B are present
(Ab) = Freq. of class Ab in which A is present but B is absent
(aB) = Freq. of class aB in which A is absent but B is present
(ab) = Freq. of class ab in which both A and B are absent
QAB takes values between +1 and –1 and indicates the degree of association.
If (AB) > (A)(B)/N (the expected freq.), then A and B are positively associated.
If (AB) < (A)(B)/N, then A and B are negatively associated.
If (AB) = (A)(B)/N, then A and B are independent, i.e., QAB = 0
Ex: A = Inoculation, B = Immunity
                           Immunity present (B)    Immunity absent (b)    Total
Inoculation present (A)         5 (AB)                  2 (Ab)              7
Inoculation absent (a)          1 (aB)                  4 (ab)              5
Total                           6                       6                  12
$Q_{AB} = \frac{5 \times 4 - 2 \times 1}{5 \times 4 + 2 \times 1} = \frac{18}{22} = 0.82$
$(AB) = 5 > \frac{(A)(B)}{N} = \frac{7 \times 6}{12} = 3.5$
The association of A and B in the population may be due to attribute C. In such
a case partial association (as against total association) between A and B is
determined by
$Q_{AB.C} = \frac{(ABC)(abC) - (AbC)(aBC)}{(ABC)(abC) + (AbC)(aBC)}$
Illusory Association: there is no real association between A & B but both
are associated with a third attribute. Other reasons: (i) A and B are not properly
defined (ii) A and B are not properly / correctly recorded
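The inoculation and immunity example for Yule's coefficient can be checked with the sketch below. The function and argument names are hypothetical; the four arguments are the cell frequencies (AB), (Ab), (aB) and (ab).

```python
def yules_q(n_AB, n_Ab, n_aB, n_ab):
    """Yule's coefficient of association for a 2x2 table."""
    return (n_AB * n_ab - n_Ab * n_aB) / (n_AB * n_ab + n_Ab * n_aB)


# Inoculation (A) against immunity (B): (AB)=5, (Ab)=2, (aB)=1, (ab)=4
q = yules_q(5, 2, 1, 4)
print(round(q, 2))  # 0.82

# Expected frequency of (AB) under independence: (A)(B)/N
total_A, total_B, n = 5 + 2, 5 + 1, 12
expected_AB = total_A * total_B / n
print(expected_AB, 5 > expected_AB)  # 3.5 True -> positive association
```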
2. Association: Yule's Coefficient of Association contd.
4 X 4 Contingency Table
                              Attribute A
                 A1        A2        A3        A4        Total
Attribute B  B1  (A1B1)    (A2B1)    (A3B1)    (A4B1)    (B1)
             B2  (A1B2)    (A2B2)    (A3B2)    (A4B2)    (B2)
             B3  (A1B3)    (A2B3)    (A3B3)    (A4B3)    (B3)
             B4  (A1B4)    (A2B4)    (A3B4)    (A4B4)    (B4)
Total            (A1)      (A2)      (A3)      (A4)      N
Reduced to 2X2 Table
                     Attribute A
                 A         a         Total
Attribute B  B   (AB)      (aB)      (B)
             b   (Ab)      (ab)      (b)
Total            (A)       (a)       N
Note: Larger than 2X2 tables have to be reduced to 2X2 by combining
some classes to use this method
Yule’s Coefficient of Association contd.
Example 1 : The number of
books issued on random
sample of days in 2005 and
2006 are as follows
2005 2006
36 37 34 78
28 97 89 89
32 37 22 34
39 33 44 22
27 114 49 33
114 35 33 17
Example 2 : Data on the number of
books issued from a library during
the course of a week (both actual and
expected)
Day Actual Expected
Mon 39 42.17
Tue 14 42.17
Wed 21 42.17
Thu 47 42.17
Fri 36 42.17
Sat 96 42.17
Total 253 253(.02)
Yule’s Coefficient of Association contd.
Example 3 : In 1984-5, a library authority spent Rs.550 000 on staff, Rs.230 000 on books and Rs.140 000 on other items. In 1987-8, the authority spent Rs.810 000 on staff, Rs.330 000 on books and Rs.210 000 on other items. Did the pattern of expenditure change significantly between 1984-5 and 1987-8 ?
The observed data can be compiled into a contingency table as shown :
Contingency table of observed frequencies
Expenditure ('000s)
Year       Staff      Books      Other      Total
1984-5     550        230        140        920
1987-8     810        330        210        1350
Total      1360       560        350        2270

A table of expected frequencies (row total x column total / grand total) can be deduced as shown :
Contingency table of expected frequencies
Expenditure ('000s)
Year       Staff      Books      Other      Total
1984-5     551.19     226.96     141.85     920
1987-8     808.81     333.04     208.15     1350
Total      1360       560        350        2270
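The expected frequencies above follow from the row and column totals, and a chi-square statistic can then be formed from the observed and expected tables. The sketch below assumes the usual chi-square formula, sum of (O - E)^2 / E, with (rows - 1) x (columns - 1) degrees of freedom; the variable names are illustrative.

```python
observed = [
    [550, 230, 140],  # 1984-5: staff, books, other (Rs. '000s)
    [810, 330, 210],  # 1987-8: staff, books, other (Rs. '000s)
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of a cell = row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

for row in expected:
    print([round(e, 2) for e in row])
print(round(chi_square, 3), degrees_of_freedom)
```

The statistic comes out at roughly 0.11 on 2 degrees of freedom, far below the 5% table value of 5.991, which suggests no significant change in the pattern of expenditure.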
2. Correlation : i. Cross Tabulation, Correlation Table & Scatter Diagram

Frequency of use of a number of documents of different ages

Doc. No.    Age of doc. (years)    Frequency of use (times / year)
1           1                      40
2           3                      18
3           2                      30
4           4                      21
5           3                      26
6           5                      10
7           4                      13
8           3                      35

Correlation table of age and frequency of use of documents

Freq. of use        Age of doc. (yrs)
(times / year)      1     2     3     4     5     Total
1-10                -     -     -     -     1     1
11-20               -     -     1     1     -     2
21-30               -     1     1     1     -     3
31-40               1     -     1     -     -     2
Total               1     1     3     2     1     8

Monthly totals of books, journals and reports issued from a library

Month    Reps    Bks     Jls    Total
Jan      465     3216    713    4394
Feb      513     3215    686    4414
Mar      425     3126    996    4547
2. Correlation : i. Cross Tabulation, Correlation Table & Scatter Diagram contd.
[Figures: correlation scatter diagrams]
ii. Spearman's Coefficient of (Rank Order) Correlation
Applies only between two variables which are ordinal in nature; helps to decide whether two sets of ranks differ and the extent to which they differ
\[ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]
d_i = difference between the ranks of the i-th pair of the two variables
n = No. of pairs of observations
Example:
Boys and girls were questioned about their reading interest and asked to put various types of novel into their order of preference, with the following results:

Type of novel        Rank orders          di      di^2
                     Boys     Girls
Animal stories       4        2           2       4
Historical novels    3        3           0       0
Romances             5        1           4       16
War stories          1        5           -4      16
Westerns             2        4           -2      4
                                          Σ di^2 = 40

\[ r_s = 1 - \frac{6 \times 40}{5 \times 24} = 1 - 2 = -1 \] (that means perfect negative correlation)
r_s varies from +1 to -1
r_s = 0 indicates that the two sets of rankings are dissimilar / independent
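The rank correlation for the reading-interest example can be reproduced with a short function. This is a minimal sketch that assumes there are no tied ranks; the name `spearman_rs` is illustrative.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's rank correlation from two equal-length lists of ranks (no ties assumed)."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Ranks for animal stories, historical novels, romances, war stories, westerns
boys = [4, 3, 5, 1, 2]
girls = [2, 3, 1, 5, 4]
rs = spearman_rs(boys, girls)
print(rs)  # -1.0
```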
ii. Spearman’s Coefficient of (Rank Order) Correlation contd.
Homework: Given below are the mean scores on a 5 point scale about the nature & type of information required by a group of physicists and another group of mechanical engineers. Find the correlation of their rankings. (Carry out a t-test at the 5% significance level.)
Physicists Mech. Engrs.
A. State of the art 2.60 1.17
B. Theoretical background 2.98 2.71
C. Experimental results 2.67 2.34
D. Methods, processes & procedures 2.62 2.07
E. Product, material & equipment information 2.45 2.23
F. Computer programs & model building info. 2.00 0.85
G. Standard & patent spec. 0.93 2.15
H. Physical, technical & design data 3.05 2.65
I. S & T news 2.29 2.53
J. General information 1.21 0.92
iii. Pearson's (Product Moment) Coefficient of Correlation (Simple Correlation)
\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n \, \sigma_x \, \sigma_y} \]
ASSUMING ZERO AS MEAN
\[ r = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sqrt{\sum X_i^2 - n \bar{X}^2} \; \sqrt{\sum Y_i^2 - n \bar{Y}^2}} \]
WITH ASSUMED AVERAGES A_x and A_y
\[ r = \frac{\dfrac{\sum d_{xi} d_{yi}}{n} - \dfrac{\sum d_{xi}}{n} \cdot \dfrac{\sum d_{yi}}{n}}{\sqrt{\dfrac{\sum d_{xi}^2}{n} - \left( \dfrac{\sum d_{xi}}{n} \right)^2} \; \sqrt{\dfrac{\sum d_{yi}^2}{n} - \left( \dfrac{\sum d_{yi}}{n} \right)^2}} \]
where
\[ \sum d_{xi} = \sum (X_i - A_x), \quad \sum d_{yi} = \sum (Y_i - A_y) \]
\[ \sum d_{xi}^2 = \sum (X_i - A_x)^2, \quad \sum d_{yi}^2 = \sum (Y_i - A_y)^2 \]
\[ \sum d_{xi} d_{yi} = \sum (X_i - A_x)(Y_i - A_y) \]
iii. Pearson's (Product Moment) Coefficient of Correlation (Simple Correlation)
Most widely used method; it assumes (i) a linear relationship (ii) that the variables are causally related (iii) a normal distribution of observations of both variables
\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \; \sqrt{\sum (Y_i - \bar{Y})^2}} \]
EXAMPLE : The following table gives the approximate number of abstracts (in thousands) in a selection of volumes of abstracts together with the cost of each volume:

Number of abstracts (X)    Cost (Y)
(thousands)                (£)
36.7                       115
8.5                        52
12.5                       75
3.9                        31
0.5                        9
1.3                        12
4.1                        20
19.4                       56
4.3                        24
Total 91.2                 394
X̄ = 10.1                   Ȳ = 43.8
iii. Pearson's Coefficient of Correlation (Simple Correlation) Example contd.
GIVEN BELOW ARE THE AVERAGE NO. OF YEARS OF EXPERIENCE (X) AND THE AVERAGE NO. OF BOOKS BORROWED PER MONTH (Y). FIND THE PRODUCT MOMENT CORRELATION BETWEEN THE TWO

X        Y        XY       X^2      Y^2
1        2        2        1        4
2        4        8        4        16
4        5        20       16       25
5        7        35       25       49
5        8        40       25       64
Total 17 26       105      71       158

X̄ = 17/5 = 3.4, Ȳ = 26/5 = 5.2, ∑X^2 = 71, ∑Y^2 = 158, ∑XY = 105
\[ r = \frac{105 - 5 \times 3.4 \times 5.2}{\sqrt{71 - 5(3.4)^2} \; \sqrt{158 - 5(5.2)^2}} = \frac{105 - 88.4}{3.63 \times 4.77} = \frac{16.6}{17.315} = +0.96 \]
NOTE : 1. Most widely used method
2. It assumes (i) a linear relationship (ii) a normal distribution (iii) that the variables are causally related, although r by itself does not indicate a cause and effect relationship (iv) it is independent of change of scale and origin
3. Value of r varies from +1 (perfect positive correlation) to -1 (perfect negative correlation). Zero indicates the absence of association / relationship.
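The worked example can be verified with a small function built on the same sums (∑XY, ∑X^2, ∑Y^2 and the two means). A minimal sketch; the function and variable names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product moment correlation from raw sums, as in the computational formula above."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * mean_x * mean_y
    denominator = sqrt(sum_x2 - n * mean_x ** 2) * sqrt(sum_y2 - n * mean_y ** 2)
    return numerator / denominator


experience = [1, 2, 4, 5, 5]  # X: average years of experience
borrowed = [2, 4, 5, 7, 8]    # Y: average books borrowed per month
r = pearson_r(experience, borrowed)
print(round(r, 2))  # 0.96
```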
iii. Pearson's Coefficient of Correlation For Grouped Data contd.
FOR GROUPED DATA
\[ r = \frac{\sum \sum f_{ij} (X_i - \bar{X})(Y_j - \bar{Y})}{\sqrt{\sum f_i (X_i - \bar{X})^2} \; \sqrt{\sum f_j (Y_j - \bar{Y})^2}} \]
SHORT CUT APPROACH WITH ASSUMED MEANS
\[ r = \frac{\dfrac{\sum \sum f_{ij} d_{xi} d_{yj}}{n} - \dfrac{\sum f_i d_{xi}}{n} \cdot \dfrac{\sum f_j d_{yj}}{n}}{\sqrt{\dfrac{\sum f_i d_{xi}^2}{n} - \left( \dfrac{\sum f_i d_{xi}}{n} \right)^2} \; \sqrt{\dfrac{\sum f_j d_{yj}^2}{n} - \left( \dfrac{\sum f_j d_{yj}}{n} \right)^2}} \]
HOMEWORK : The following are the average no. of ref. queries asked (X) and the average no. of books indented (Y) during a study of library users for three months. Find the correlation coefficient between the two. Carry out a t-test at the 5% significance level.

X    Y
5    4
6    3
1    2
4    6
2    3
(ANS r = +0.41)

t-TEST FOR SIGNIFICANCE
\[ t = r \sqrt{\frac{n - 2}{1 - r^2}} \qquad t = r_s \sqrt{\frac{n - 2}{1 - r_s^2}} \]
EX:
\[ t = 0.96 \times \frac{\sqrt{5 - 2}}{\sqrt{1 - (0.96)^2}} = 5.939 \]
df = n - 2 = 5 - 2 = 3; the tabulated value of t for 3 d.f. for a two-tailed test at the 5% significance level is 3.182. Hence r is significant at the 5% significance level.
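The t statistic for a correlation coefficient is easy to compute directly. The sketch below assumes the standard form with n - 2 degrees of freedom; the function name `correlation_t` is illustrative.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for testing a correlation coefficient r from n pairs (n - 2 d.f.)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


t = correlation_t(0.96, 5)
print(round(t, 2))  # 5.94
```

The same function applies to Spearman's r_s, so it can also be used for the two homework exercises once r or r_s has been found.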
[Figure: The Correlation Between a Preference for Violent Television and Peer-rated Aggression for 211 Boys Over a 10-Year Lag. The diagram links preference for violent TV and aggression in the 3rd grade with the same two measures in the 13th grade; the correlations shown are 0.05, 0.01, 0.21, -0.05, 0.38 and 0.31.]
3. Cause & Effect Relationships
• Visual inspection of the correlation table and scatter diagram indicates the existence and direction of a relationship
• Correlation coefficient shows the magnitude as well as the direction of the relationship
• Regression analysis shows the cause and effect relationship, i.e., the independent variable (X) is the cause and the dependent variable (Y) is the effect
i. Regression Analysis (Simple / Linear)
– Describes in quantitative terms the underlying (cause & effect) relationship or correlation between two sets of data (two variables)
– Helps in predicting the value of the dependent variable for a given / known value of the independent variable
Regression equation of Y on X (simple / linear):
Ŷ = a + bX
Ŷ = Estimated value of Y for a given value of X (a and b are constants)
a = Parameter which tells at what value the straight line cuts the Y axis
b = Slope or gradient of the regression line, i.e., a unit change in X produces a change of b in Y
Note : 1. The relationship between X & Y may take any form but here it is assumed to be linear, i.e., a straight line
2. Practical data may fit near or close to a straight line
3. The objective is to fit a regression line with minimum error values (difference between the observed values and expected values)
4. To find the 'best' fit the least squares method is used
3. Cause & Effect Relationships : i. Regression Analysis contd.
TO FIND THE 'BEST' FIT THE LEAST SQUARES METHOD IS USED
[Figure: three scatter plots of Y against X, titled NO RELATIONSHIP, ACTUAL VALUES and BEST FIT]
THE LEAST SQUARES METHOD PROVIDES TWO NORMAL EQNS. TO DETERMINE CONSTANTS a AND b
\[ \sum Y = na + b \sum X \]
\[ \sum XY = a \sum X + b \sum X^2 \]
THE BASIC EQN IS Y = a + bX + E
\[ TSS = \sum (Y - \bar{Y})^2 \qquad RSS = \sum (\hat{Y} - \bar{Y})^2 \qquad ESS = \sum (Y - \hat{Y})^2 \]
TSS = RSS + ESS
\[ F = \frac{RSS / 1}{ESS / (n - 2)} \]
EXAMPLE : Given below is the estimated use of a library (Y) for a corresponding expenditure on promotion and user orientation (X). Fit the best regression line. Estimate (i.e., predict) the use for an expenditure of Rs.8000. If the library would like to reach a level of use of 70,000, what should be the expenditure on promotion and user orientation? …contd.
X (in thousands    Y (in ten     XY     X^2    Ŷ (EXPECTED)    ERROR (Y - Ŷ)
of Rs.)            thousands)
5                  4             20     25     4.02            -0.02
6                  3             18     36     4.32            -1.32
1                  2             2      1      2.82            -0.82
4                  6             24     16     3.72            +2.28
2                  3             6      4      3.12            -0.12
Tot 18             18            70     82     18              0
(∑X)               (∑Y)          (∑XY)  (∑X^2)
NOTE: IF r IS CALCULATED, ∑X, ∑Y, ∑XY & ∑X^2 ARE READILY AVAILABLE
18 = 5a + 18b
70 = 18a + 82b
⇒ a = 2.52, b = 0.3 ⇒ Ŷ = 2.52 + 0.3X
(i) X = 8 (Rs. 8,000) ⇒ Ŷ = 2.52 + 0.3 x 8 = 4.92, i.e., an estimated use of 49,200
(ii) Y = 7 (70,000) ⇒ 7 = 2.52 + 0.3X ⇒ X = 14.93, i.e., Rs. 14,930
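The two normal equations can be solved directly in code. The following sketch uses the closed-form solution for a and b; the names `fit_line`, `spend` and `use` are illustrative. Because it keeps b unrounded (about 0.302 rather than 0.3), its results differ slightly from the slide's hand calculation.

```python
def fit_line(xs, ys):
    """Solve the two least squares normal equations for a and b in Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


spend = [5, 6, 1, 4, 2]  # X: expenditure, thousands of Rs.
use = [4, 3, 2, 6, 3]    # Y: library use, ten thousands
a, b = fit_line(spend, use)

use_at_8 = a + b * 8       # predicted use for Rs. 8,000
spend_for_7 = (7 - a) / b  # expenditure needed for a use level of 70,000
print(round(a, 2), round(b, 2), round(use_at_8, 2), round(spend_for_7, 2))
```

With unrounded coefficients the predictions come to about 4.93 (use of 49,300) and 14.85 (Rs. 14,850), close to the 4.92 and 14.93 obtained above with b rounded to 0.3.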
REGRESSION COEFFICIENT & COEFFICIENT OF DETERMINATION
CONSTANT b IS CALLED THE REGRESSION COEFFICIENT
REGRESSION OF X ON Y IS X = α + βY
\[ b \beta = r^2 \]
COEFFICIENT OF DETERMINATION
\[ r^2 = \frac{RSS}{TSS} \]
r^2 VARIES BETWEEN ZERO AND +1
AS r^2 TENDS CLOSER TO +1, THE IND. VAR. EXPLAINS MORE OF THE MOVEMENTS IN THE DEP. VAR.
r IS THE CORRELATION COEFFICIENT & VARIES FROM -1 TO +1
Homework :
No. of Books (X)    Age (Y)
5                   23
6                   35
8                   41
12                  58
15                  75
3. Cause & Effect Relationships : i. Regression Analysis contd..
ii. Multiple Correlations and (Non-Linear or Complex) Regressions
INVOLVES TWO OR MORE INDEPENDENT VARIABLES
\[ \hat{Y} = a + b_1 X_1 + b_2 X_2 \]
NORMAL EQUATIONS ARE
\[ \sum Y_i = na + b_1 \sum X_{1i} + b_2 \sum X_{2i} \]
\[ \sum X_{1i} Y_i = a \sum X_{1i} + b_1 \sum X_{1i}^2 + b_2 \sum X_{1i} X_{2i} \]
\[ \sum X_{2i} Y_i = a \sum X_{2i} + b_1 \sum X_{1i} X_{2i} + b_2 \sum X_{2i}^2 \]
PROBLEM OF MULTICOLLINEARITY
REGRESSION COEFFICIENTS b1 AND b2 BECOME LESS RELIABLE IF THERE IS A HIGH DEGREE OF CORRELATION BETWEEN IND. VAR. X1 AND X2. THE COLLECTIVE EFFECT OF IND. VAR. X1 AND X2 IS GIVEN BY THE COEFFICIENT OF MULTIPLE CORRELATION
\[ R_{y.x_1 x_2} = \sqrt{\frac{b_1 \left( \sum Y_i X_{1i} - n \bar{Y} \bar{X}_1 \right) + b_2 \left( \sum Y_i X_{2i} - n \bar{Y} \bar{X}_2 \right)}{\sum Y_i^2 - n \bar{Y}^2}} \]
OR
\[ R_{y.x_1 x_2} = \sqrt{\frac{b_1 \sum x_{1i} y_i + b_2 \sum x_{2i} y_i}{\sum y_i^2}} \]
where
\[ x_{1i} = (X_{1i} - \bar{X}_1), \quad x_{2i} = (X_{2i} - \bar{X}_2), \quad y_i = (Y_i - \bar{Y}) \]
iii. Partial Correlation
PARTIAL CORRELATION measures, separately, the relationship between two variables (i.e., the dep. variable and a particular ind. variable) by holding all other variables constant
FIRST, SIMPLE COEFFICIENTS OF CORRELATION BETWEEN EACH PAIR OF VARIABLES HAVE TO BE CALCULATED
FOR EXAMPLE, THE FIRST ORDER COEFFICIENT (OF PARTIAL CORRELATION) MEASURING THE EFFECT OF X1 ON Y, HOLDING X2 CONSTANT, IS GIVEN BY
\[ r_{yx_1.x_2}^2 = \frac{R_{y.x_1 x_2}^2 - r_{yx_2}^2}{1 - r_{yx_2}^2} \]
OR
\[ r_{yx_1.x_2} = \frac{r_{yx_1} - r_{yx_2} \, r_{x_1 x_2}}{\sqrt{1 - r_{yx_2}^2} \; \sqrt{1 - r_{x_1 x_2}^2}} \]
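The second form of the partial correlation formula needs only the three simple correlation coefficients, so it is straightforward to sketch in code. The input values below are hypothetical and chosen purely for illustration; they do not come from the slides.

```python
from math import sqrt


def partial_r(r_yx1, r_yx2, r_x1x2):
    """First order partial correlation of Y and X1, holding X2 constant."""
    numerator = r_yx1 - r_yx2 * r_x1x2
    denominator = sqrt(1 - r_yx2 ** 2) * sqrt(1 - r_x1x2 ** 2)
    return numerator / denominator


# Hypothetical simple correlations (illustrative values only)
r_partial = partial_r(r_yx1=0.8, r_yx2=0.6, r_x1x2=0.5)
print(round(r_partial, 3))
```

When X1 and X2 are uncorrelated with each other and X2 is uncorrelated with Y, the function simply returns the simple correlation between Y and X1.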
4. Other Measures : A. Index numbers
A. Index numbers
An index number is a device to measure the magnitude of
(i) Change in the price, quantity or value of an item or, more usually, a group of items over time or
(ii) Difference between two similarly measured quantities
Example
Total number of issues of volumes of non-fiction by a library in a number of
years
Year 1960 1961 1962 1963 1964
Number of issues 8094 9288 8416 9271 8233
i. Fixed Base Index
Simple indexes for issues of volumes of non-fiction by a library in a
number of years (1960 = 100)
Year 1960 1961 1962 1963 1964
Index 100 114.75 103.98 114.54 101.72
4. Other Measures A. Index numbers contd.
ii. Chain Base Index
Chain base index for issues of volumes of non-fiction by a library in a
number of years
Year 1961 1962 1963 1964
Index 114.75 90.61 110.16 88.80
The change or difference is expressed as a ratio or % of a stated base or starting date, period or quantity, which is given a value of 100 points
\[ \text{Fixed base index} = \frac{\text{Value in given year}}{\text{Value in base year}} \times 100 \]
\[ \text{Chain base index} = \frac{\text{Value in given year}}{\text{Value in previous year}} \times 100 \]
Weighted index: an item in the index is given its due weight in accordance with its importance in the whole index
\[ \text{Base year weighted index} = \frac{\text{Price in given year} \times \text{Qty in base year}}{\text{Price in base year} \times \text{Qty in base year}} \times 100 \]
\[ \text{Given year weighted index} = \frac{\text{Price in given year} \times \text{Qty in given year}}{\text{Price in base year} \times \text{Qty in given year}} \times 100 \]
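The fixed base and chain base indexes for the non-fiction issues example can be reproduced from the raw yearly figures. A minimal sketch; the dictionary and variable names are illustrative.

```python
# Issues of non-fiction volumes per year, from the example
issues = {1960: 8094, 1961: 9288, 1962: 8416, 1963: 9271, 1964: 8233}
years = sorted(issues)

# Fixed base index: every year relative to the base year (1960 = 100)
base_value = issues[years[0]]
fixed_base = {y: round(issues[y] / base_value * 100, 2) for y in years}

# Chain base index: every year relative to the year before it
chain_base = {
    y: round(issues[y] / issues[prev] * 100, 2)
    for prev, y in zip(years, years[1:])
}

print(fixed_base)
print(chain_base)
```

The printed values match the two index tables: 100, 114.75, 103.98, 114.54, 101.72 for the fixed base and 114.75, 90.61, 110.16, 88.80 for the chain base.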
4. Other Measures A. Index numbers contd.
• An index number is a special type of average used to measure the level of a given phenomenon as compared to the level of the same phenomenon at some standard date. In other words, the figures are reduced to a common base (e.g., by converting the series into a series of index numbers) to study the changes in the effect of such factors which are incapable of being measured directly
• They are approximate indicators & give only a fair idea of changes.
• Index numbers prepared for one purpose cannot be used for other purposes or for the same purpose at other places. Chances of error also remain in them.
Examples:
1. Library use index = 1/100 of the no. of pages of xerox copies of reading material taken during a year + 2 times the no. of documents borrowed through ILL + 5 times the no. of visits to the library during a 3-month sample seat occupancy study + mean no. of documents borrowed during the year (both circulation sample and collection sample)
2. Library interaction index = no. of documents suggested + no. of documents indented + no. of documents reserved + 2 times the no. of literature search services availed + no. of short range ref. queries placed
Converting a chain base index to a fixed base index:

YEAR          1      2                      3                        4                          5
Chain base    100    103                    107                      110                        115
Fixed base    100    100 x 103 / 100        103 x 107 / 100          110.2 x 110 / 100          121.2 x 115 / 100
                     = 103                  = 110.2                  = 121.2                    = 139.4
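The conversion from chain base to fixed base is a running product: each fixed base value is the previous fixed base value multiplied by the current chain link and divided by 100. A minimal sketch with illustrative variable names:

```python
chain = [100, 103, 107, 110, 115]  # chain base indexes for years 1 to 5

fixed = [100.0]  # year 1 is the base
for link in chain[1:]:
    # previous fixed base value x current chain index / 100
    fixed.append(fixed[-1] * link / 100)

print([round(value, 1) for value in fixed])  # [100.0, 103.0, 110.2, 121.2, 139.4]
```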
4. Other Measures B. Time series analysis contd.
B. Time series analysis
Time series: A series of successive observations of a phenomenon over a period of time
– When the independent variable is time in a cause and effect relationship of the regression analysis type, it is time series analysis
– It helps to estimate / predict the future
Components of time series
1. Secular or long term trend (T)
2. Short term oscillations : (i) Cyclical variations(C) (usually more than a
year) (ii) Seasonal variations(S) (usually within a year)
3. Irregular or erratic variations (I) Random fluctuations & completely
unpredictable like riots, natural calamities, etc.
4. Other Measures B. Time series analysis contd.
Methods of isolating and
measuring trend
1. Free hand method
2. Semi-average method
3. Method of moving average
4. Method of least squares
Method of moving averages
– By smoothing out fluctuations, it helps to detect the trend
– By choosing an appropriate period, the method can also help to find out short term variations (i.e., cyclical & seasonal) as well
– In addition, use of a seasonal index helps to account for seasonal variations
– A moving average helps to reduce seasonal variations while finding the trend
Example: 1. The following are daily
issues of junior non-fiction from a
library (public library)
Day Week 1 Week 2 Week 3
Mon 36 46 66
Tue 31 55 76
Wed 25 37 40
Thu 55 80 74
Fri 45 66 90
Sat 90 115 150
METHOD OF LEAST SQUARES
TAKING 't' AS IND. VAR., THE EQN. FOR SECULAR TREND IS Ŷ = a + bt
NORMAL EQNS. ARE
\[ \sum Y = na + b \sum t \]
\[ \sum tY = a \sum t + b \sum t^2 \]
n = NO. OF YEARS
ENABLES FORECASTING FUTURE VALUES OF Y FROM Ŷ = a + bt
4. Other Measures B. Time series analysis contd.
Times series analysis of data of example 1
Week Day Number of issues Moving average Cyclical
variation
1. M 36
T 31
W 25
T 55 47.9 +7.1
F 45 50.7 -5.7
S 90 53.7 +36.3
2. M 46 56.8 -10.8
T 55 60.6 -5.6
W 37 64.4 -27.4
T 80 68.2 +11.8
F 66 71.6 -5.6
S 115 73.6 +41.4
3. M 66 73.3 -7.3
T 76 74.8 +1.2
W 40 79.8 -39.8
T 74
F 90
S 150
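The moving average column above appears to be a centred 6-day moving average: the mean of each pair of adjacent 6-day averages, placed against the 4th day of the first window. The sketch below works on that assumption; the function name `centered_moving_average` is illustrative.

```python
def centered_moving_average(values, period):
    """Centred moving average for an even period.

    Takes the mean of every run of `period` values, then averages each
    pair of adjacent means so the result lines up with an actual observation.
    """
    means = [
        sum(values[i:i + period]) / period
        for i in range(len(values) - period + 1)
    ]
    return [(first + second) / 2 for first, second in zip(means, means[1:])]


issues = [36, 31, 25, 55, 45, 90,    # week 1, Mon to Sat
          46, 55, 37, 80, 66, 115,   # week 2
          66, 76, 40, 74, 90, 150]   # week 3

trend = centered_moving_average(issues, period=6)

# trend[0] lines up with the 4th observation (Thursday of week 1)
deviations = [actual - average for actual, average in zip(issues[3:], trend)]

for actual, average, deviation in zip(issues[3:], trend, deviations):
    print(actual, round(average, 1), round(deviation, 1))
```

Under this assumption the sketch reproduces the tabulated averages to within rounding (the first value comes to about 47.8 rather than 47.9), with the largest positive deviations on Saturdays and the largest negative ones on Wednesdays.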
4. Other Measures B. Time series analysis contd.
Home work
Daily visitors to a
public library
Day Week 1 Week 2
Sun 900 800
Mon 400 500
Tue 500 300
Wed 600 300
Thu 300 400
Fri 700 600
Sat 1100 900
Solution (Example 1):
Trend : Upward, i.e., increasing daily issues
Cyclic variation: The difference between the actual number of issues and the corresponding moving average (expected) is markedly high on Saturdays and very low on Wednesdays
Monthly statistics of the no. of searches executed on a CD-ROM database by PG students are as follows

          Year 1                    Year 2                    Year 3
Month     No. of      12 month      No. of      12 month      No. of      12 month
          searches    moving Av     searches    moving Av     searches    moving Av
JAN       50          -             60          68.3          50          71.7
FEB       50          -             50          68.3          40          71.7
MAR       50          -             60          68.3          70          72.5
APR       60          -             70          69.2          80          71.7
MAY       70          -             80          70.0          90          70.8
JUN       80          64.2          90          69.2          100         71.7
JUL       90          65.0          90          68.3          100         -
AUG       90          65.0          90          67.5          90          -
SEP       60          65.8          60          68.3          70          -
OCT       60          66.7          70          69.2          60          -
NOV       50          67.5          60          70.0          50          -
DEC       60          68.3          50          70.8          60          -
Secular trend: Increasing
Seasonal variation: Maximum during Jun-Aug (maybe due to exams?). A seasonal index is more useful in accounting for seasonal variations
Quarterly statistics of user visits to a special library

             Year 1                   Year 2                   Year 3
Quarter      No. of      % of         No. of      % of         No. of      % of         Average
             user        quarterly    user        quarterly    user        quarterly    index
             visits      average      visits      average      visits      average
1            2000        80           3000        100          4000        100          93.3
2            3000        120          3500        116.7        5000        125          120.6
3            2000        80           2000        66.7         3000        75           73.9
4            3000        120          3500        116.7        4000        100          112.2
Total        10000                    12000                    16000
Q Average    2500                     3000                     4000

If the first quarter of the 4th year records 6000 user visits, estimate the average quarterly visits for that year
Ans. 6000 / 93.3 x 100 = 6431
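The seasonal (quarterly) index is the average, across years, of each quarter's visits expressed as a percentage of that year's quarterly average. A minimal sketch with illustrative names:

```python
# Quarterly user visits (Q1 to Q4) for each year, from the table
visits = {
    "Year 1": [2000, 3000, 2000, 3000],
    "Year 2": [3000, 3500, 2000, 3500],
    "Year 3": [4000, 5000, 3000, 4000],
}

# Each quarter as a % of that year's quarterly average
percent_of_average = {
    year: [v / (sum(quarters) / len(quarters)) * 100 for v in quarters]
    for year, quarters in visits.items()
}

# Seasonal index of a quarter = mean of its percentages over the years
seasonal_index = [
    sum(p[q] for p in percent_of_average.values()) / len(percent_of_average)
    for q in range(4)
]

# Estimated average quarterly visits in year 4, given 6000 visits in quarter 1
estimate = 6000 / seasonal_index[0] * 100

print([round(i, 1) for i in seasonal_index])
print(round(estimate))
```

The indexes print as 93.3, 120.6, 73.9 and 112.2. With the unrounded first-quarter index (93.33...) the estimate is about 6429; the slide's 6431 comes from dividing by the rounded 93.3.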
Method of least squares
Assumes a linear relation and that the past behaviour continues to persist in future
Trend line: Y = a + bt
Normal equations
∑Y = na + b∑t
∑tY = a∑t + b∑t^2
Example: ∑t = 0, ∑Y = 177, ∑tY = 171, ∑t^2 = 280, n = 15
⇒ 177 = 15a + b x 0 ⇒ a = 11.8
171 = a x 0 + b x 280 ⇒ b = 0.61
Hence the trend regression line is Y = 11.8 + 0.61t
To estimate Y at t = 9 (say, sales for 1986):
Y = 11.8 + 0.61 x 9 = 17.29
Note: For the variable 't' the midpoint of the time period is taken as the origin, so that ∑t = 0.
Ex: -2, -1, 0, 1, 2 (odd no. of periods)
-3, -2, -1, 1, 2, 3 (even no. of periods)
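Because the time codes are centred so that ∑t = 0, the normal equations decouple into a = ∑Y / n and b = ∑tY / ∑t^2. The sketch below follows the coding convention in the note; the helper name `centred_time_codes` is illustrative.

```python
def centred_time_codes(n):
    """Time codes with the midpoint as origin, following the note's convention."""
    half = n // 2
    if n % 2 == 1:
        return list(range(-half, half + 1))               # e.g. -2, -1, 0, 1, 2
    return [t for t in range(-half, half + 1) if t != 0]  # e.g. -3, -2, -1, 1, 2, 3


# Sums from the example (15 periods, so t runs from -7 to 7)
n, sum_y, sum_ty = 15, 177, 171
codes = centred_time_codes(n)
sum_t2 = sum(t * t for t in codes)

a = sum_y / n        # valid because sum of t is zero
b = sum_ty / sum_t2
forecast = a + b * 9

print(sum(codes), sum_t2, round(a, 1), round(b, 2), round(forecast, 2))
```

With b kept unrounded the forecast at t = 9 is about 17.30, against the 17.29 obtained above with b rounded to 0.61.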
4. Other Measures B. Time series analysis contd.
Measurement of seasonal variations:
1. Ratio to trend method
2. Ratio to moving averages method
3. Link relative method
Measurement of cyclic variations:
1. Harmonic analysis
2. Spectrum analysis
Note: Residuals remaining after elimination of the seasonal and trend components should be recorded & plotted graphically for visual comparison of the residual variations, which are attributed to the cyclic and irregular / erratic components
ANOVA
• Testing the difference among different groups of data
for homogeneity.
• Useful to investigate
(i) Any number of factors which are hypothesised or
said to influence the dependent variable
(ii)The differences amongst various categories within
each of these factors which may have a large number
of values
Example (one way ANOVA)
The number of books stored per shelf in a library may be
of interest. If a random sample of shelves is selected
and the number of books on each shelf are counted,
the quantitative data collected can be presented in a
frequency table as shown in the figure …contd.
Frequency table showing variation of no. of books stored per shelf according to
subject category (example of one way ANOVA) contd.
Number of shelves
Books per shelf   Geography X1   Law X2   Production X3   Total
16 1 1
17 0
18 0
19 0
20 0
21 3 3
22 0
23 1 1 4
24 0
25 4 4
26 3 3
27 0
28 1 1 2
29 1 1
30 2 2 4
31 0
32 1 1
33 1 1 3
34 1 1
35 0
36 2 2
37 0
38 0
39 0
40 0
41 0
42 0
43 1 1
ANOVA (Example of one way ANOVA) contd.
Frequency table showing variation of number of books stored per
shelf in a random sample of shelves
Books per Number of Books per No. of shelves
shelf shelves shelf
16 1 30 4
17 0 31 0
18 0 32 1
19 0 33 3
20 0 34 1
21 3 35 0
22 0 36 2
23 4 37 0
24 0 38 0
25 4 39 0
26 3 40 0
27 0 41 0
28 2 42 0
29 1 43 1
One-way ANOVA considers one factor, i.e., No. of books per shelf
Steps (example of one way ANOVA) contd.
1. OBTAIN THE MEAN OF EACH SAMPLE, I.E., \( \bar{X}_1, \bar{X}_2, \bar{X}_3 \)
2. FIND THE MEAN OF THE SAMPLE MEANS, I.E.,
\[ \bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{3} \quad (3 = k, \text{ the number of samples}) \]
3. FIND THE SUM OF SQUARES FOR VARIANCE BETWEEN THE SAMPLES, I.E.,
\[ SS_{BETWEEN} = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + n_3 (\bar{X}_3 - \bar{\bar{X}})^2 \]
4. CALCULATE THE VARIANCE OR MEAN SQUARE BETWEEN SAMPLES, I.E.,
\[ MS_{BETWEEN} = \frac{SS_{BETWEEN}}{k - 1}, \quad DF = k - 1 = 3 - 1 = 2 \]
5. SUM OF SQUARES FOR VARIANCE WITHIN SAMPLES
\[ SS_{WITHIN} = \sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \sum (X_{3i} - \bar{X}_3)^2 \]
6. VARIANCE OR MEAN SQUARE WITHIN SAMPLES
\[ MS_{WITHIN} = \frac{SS_{WITHIN}}{n - k}, \quad DF = n - k, \quad n = n_1 + n_2 + n_3 + \dots \]
Frequency table showing variation of number of books
stored per shelf according to subject category
(example of one way ANOVA) …contd.
7. CHECK SS FOR TOTAL VARIATION
\[ \sum (X_{ij} - \bar{\bar{X}})^2 = SS_{BETWEEN} + SS_{WITHIN} \]
AND (n - 1) = (k - 1) + (n - k)
8. F RATIO
\[ F = \frac{MS_{BETWEEN}}{MS_{WITHIN}} \]
Note: Compare with the table value of F. If it is equal to or more than the table value, the difference is significant and hence
1. the samples could not have come from the same universe, or
2. the independent variable has a significant effect on the dependent variable.
The larger the value of the F ratio, the more definite and sure the conclusions
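Steps 1 to 8 can be sketched as a single function. The sample data below are hypothetical books-per-shelf counts invented for illustration, because the column layout of the slide's frequency table is not recoverable from the transcript. The sketch uses the overall mean of all observations as the grand mean, which equals the mean of the sample means when the samples are of equal size.

```python
def one_way_anova(samples):
    """Return (SS between, SS within, F ratio) for a list of samples."""
    k = len(samples)                     # number of samples (groups)
    n = sum(len(s) for s in samples)     # total number of observations
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n

    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    ms_between = ss_between / (k - 1)    # d.f. = k - 1
    ms_within = ss_within / (n - k)      # d.f. = n - k
    return ss_between, ss_within, ms_between / ms_within


# Hypothetical books-per-shelf counts for three subject categories
geography = [21, 23, 25]
law = [26, 28, 30]
production = [31, 33, 35]

ss_between, ss_within, f_ratio = one_way_anova([geography, law, production])
print(ss_between, ss_within, f_ratio)
```

For these made-up figures SS between is 150, SS within is 24 and F is 18.75 on (2, 6) degrees of freedom; the F ratio would then be compared with the table value as described in the note.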
About the Author
Dr. M. S. Sridhar is a postgraduate in Mathematics and Business Management and holds a Doctorate in Library and Information Science. He has been in the profession for the last 36 years. Since 1978, he has headed the Library and Documentation Division of ISRO Satellite Centre, Bangalore. Earlier he worked in the libraries of National Aeronautical Laboratory (Bangalore), Indian Institute of Management (Bangalore) and University of Mysore. Dr. Sridhar has published 4 books, 81 research articles and 22 conference papers, written 19 course materials for BLIS and MLIS, made over 25 seminar presentations and contributed 5 chapters to books. E-mail: sridharmirle@yahoo.com, mirlesridhar@gmail.com, sridhar@isac.gov.in; Phone: 91-80-25084451; Fax: 91-80-25084476.

Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 

Recently uploaded (20)

Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 

Statistical Techniques for Processing & Analysis of Data Part 9.pdf

  • 1. Research Methodology PART 8 Statistical Techniques for Processing & Analysis of Data M S Sridhar Head, Library & Documentation ISRO Satellite Centre Bangalore 560017 E-mail: sridhar@isac.gov.in & mirlesridhar@gmail.com
  • 2. Research Methodology 8 M S Sridhar, ISRO 2 Statistical techniques for processing & analysis of data. Synopsis: 1. Introduction to Research & Research methodology 2. Selection and formulation of research problem 3. Research design and plan 4. Experimental designs 5. Sampling and sampling strategy or plan 6. Measurement and scaling techniques 7. Data collection methods and techniques 8. Testing of hypotheses 9. Statistical techniques for processing & analysis of data 10. Analysis, interpretation and drawing inferences 11. Report writing. 1. Introduction: Statistics: what, why and characteristics 2. Statistic Types: Quantitative & Qualitative (Variable & Attribute) data; Descriptive & Inferential statistics 3. Processing & Analysis of data: Processing: (1) Editing (2) Coding (3) Classification (4) Tabulation; Analysis: (1) Descriptive & inferential (2) Correlational, causal & multivariate …contd.
  • 3. Statistical Techniques for Processing & Analysis of Data: contd. 4. Some processing techniques: Tally sheet / chart; Presentation of data (textual or descriptive, tabular, diagrammatic / graphical) 5. Univariate analysis / measures: Central tendency; Dispersion; Asymmetry (skewness) 6. Bivariate & Multivariate analysis / measures
  • 4. Research Methodology 8 M S Sridhar, ISRO 4 Statistics •Science of statistics cannot be ignored by researcher •Statistics is both singular and plural. As plural it means numerical facts systematically collected and as singular it is the science of collecting, classifying and using statistics •It is a tool for designing research, processing & analysing data and drawing inferences / conclusions •It is also a double edged tool easily lending itself for abuse and misuse Abuse⇒ Poor data + Sophisticated techniques = Unreliable Result Misuse⇒Honest facts (Hard data) + Poor techniques = Impressions Examples: Percentage for very small sample Using wrong average Playing with probability Scale & origin and proportion between ordinate & abscissa Funny correlation One-dimensional figure Unmentioned base
  • 5. Research Methodology 8 M S Sridhar, ISRO 5 Characteristics of Statistics 1. Aggregates of facts 2. Affected by multiple causes 3. Numerically expressed 4. Collected in a systematic manner 5. Collected for a predetermined purpose 6. Enumerated or estimated according to reasonable standard of accuracy 7. Statistics must be placed in relation to each other (context)
  • 6. Research Methodology 8 M S Sridhar, ISRO 6 What statistics does? 1. Enables to present facts on a precise definite form that helps in proper comprehension of what is stated. Exact facts are more convincing than vague statements 2. Helps to condense the mass of data into a few numerical measures, i.e., summarises data and presents meaningful overall information about a mass of data 3. Helps in finding relationship between different factors in testing the validity of assumed relationship 4. Helps in predicting the changes in one factor due to the changes in another 5. Helps in formulation of plans and policies which require the knowledge of further trends and hence statistics plays vital role in decision making
  • 7. Research Methodology 8 M S Sridhar, ISRO 7 Statistic types • Deductive statistics describe a complete set of data • Inductive statistics deal with a limited amount of data like a sample • Descriptive statistics ( & causal analysis) is concerned with development of certain indices from the raw data and causal analysis. Measures of central tendency and measures of dispersion are typical descriptive statistical measures • Inferential (sampling / statistical) analysis: Inferential statistics is used for (a) estimation of parameter values (point and interval estimates) (b) testing of hypothesis (using parametric / standard tests and non-parametric / distribution-free tests) and (c) drawing inferences
  • 8. Descriptive Statistics (Techniques) 1. Uni-dimension analysis (mostly one variable) (i) Central tendency: mean, median, mode, GM & HM (ii) Dispersion: variance, standard deviation, mean deviation & range (iii) Asymmetry (skewness) & kurtosis (iv) Relationship: Pearson's product moment correlation, Spearman's rank order correlation, Yule's coefficient of association (v) Others: one-way ANOVA, index numbers, time series analysis, simple correlation & regression analysis
  • 9. Descriptive Statistics (Techniques) …contd. 2. Bivariate analysis (i) Simple regression & correlation (ii) Association of attributes (iii) Two-way ANOVA 3. Multivariate analysis (i) Multiple regression & correlation / partial correlation (ii) Multiple discriminant analysis: predicting an entity's possibility of belonging to a particular group based on several predictors (iii) Multivariate ANOVA (MANOVA): extension of two-way ANOVA; ratio of among-group variance to within-group variance (iv) Canonical analysis: simultaneously predicting a set of dependent variables (both measurable & non-measurable) (v) Factor analysis, cluster analysis, etc.
  • 10. Quantitative and Qualitative (Variable and Attribute) Data • Quantitative (or numerical) data are an expression of a property or quality in numerical terms, i.e., data measured and expressed in quantity. They enable (i) precise measurement, (ii) knowing trends or changes over time, and (iii) comparison of trends or individual units. • Qualitative (or categorical) data, on the other hand, involve quality or kind, with subjectivity. Variables data are quality characteristics that are measurable, normally continuous, and may take on any value. Attribute data are quality characteristics that are observed to be either present or absent, conforming or not conforming, i.e., they are countable, normally discrete and integer.
  • 11. Processing and Analysis of Qualitative Data. When the feel & flavour of the situation become important, researchers resort to qualitative data (sometimes called attribute data). Qualitative data describe attributes of a single person or a group of persons that are important to record as accurately as possible even though they cannot be measured in quantitative terms. More time & effort are needed to collect & process qualitative data. Such data are not amenable to statistical rules & manipulations; however, scaling techniques help in converting qualitative data into quantitative data. The usual data reduction, synthesis and plotting of trends are required but differ substantially, and extrapolation of findings is difficult. It calls for sensitive interpretation & creative presentation. Examples: quotations from interviews, open remarks in questionnaires, case histories bringing evidence, content analysis of verbatim material, etc. …contd.
  • 12. Research Methodology 8 M S Sridhar, ISRO 12 Process and Analysis of Qualitative Data …contd. Note: Identifying & coding recurring answers to open ended questions help categorise key concepts & behaviour. May even count & cross analyse (requires pattern discerning skill); even unstructured depth interviews can be coded to summarise key concepts & present in the form of master charts Qualitative coding involves classifying data which are (i) not originally created for research purpose and (ii) having very little order STEPS: 1. Initial formalisation with issues arising (build themes & issues) 2. Systematically describing the contents (compiling a list of key themes) 3. Indexing the data (note reflections for patterns, links, etc.) in descriptions; interpreting in relation to objective; checking the interpretation 4. Charting the data themes 5. Refining the charted material 6. Describing & discussing the emerging story
  • 13. Processing and Analysis of Quantitative Data • Quantitative data are numbers representing counts, ordering or measurements; they can be described, summarised (data reduction), aggregated, compared and manipulated arithmetically & statistically • Levels of measurement (i.e., nominal, ordinal, interval & ratio) determine the kind of statistical techniques to be used • Use of a computer is necessary in many situations 1. Organisation and classification of data 2. Presentation of data 3. Analysis of data: inferential (sampling / statistical) analysis is concerned with the process of generalisation through estimation of parameter values and testing of hypotheses 4. Interpretation of data. Inference: data processing, analysis, presentation (presenting in table, chart or graph) & interpretation (interpreting is to expound the meaning) should lead to drawing inferences, i.e., (i) validation of hypotheses and (ii) realisation of objectives with respect to (a) relationship between variables (b) discovering a fact (c) establishing a general or universal law
  • 14. Processing and Analysis of Quantitative Data …contd. STEPS: 1. Data reduction: reduce large batches & data sets (a) to numerical summaries, tabular & graphical form (b) to enable asking questions about observed patterns 2. Data presentation 3. Exploratory data analysis 4. Looking for relationships & trends 5. Graphical presentation PROCESSING (Aggregation & compression): 1. Editing: (i) Field editing (ii) Central editing 2. Coding: assigning data to a limited number of mutually exclusive but exhaustive categories or classes 3. Classification: arranging data in groups or classes on the basis of common characteristics (i) By attributes (statistics of attributes) (ii) By class intervals (statistics of variables) Note: class limits, class intervals, magnitude, determination of frequencies & number of classes (normally 5-15; size of class interval, i = R / (1 + 3.3 log N), where R = range & N = number of items to be grouped) are discussed later
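The class-interval rule in the note above can be tried out with a short Python sketch. It is illustrative only: the function name `class_interval` is hypothetical, the logarithm is assumed to be base 10 (the usual reading of this rule), and the example values (50 items ranging from 10 to 100) are chosen to mirror the price data used later in Table 8.12.

```python
import math

def class_interval(data_range, n_items):
    """Suggested class-interval size i = R / (1 + 3.3 log10 N).

    data_range: R, the largest value minus the smallest value.
    n_items:    N, the number of items to be grouped.
    The base-10 logarithm is an assumption of this sketch.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return data_range / (1 + 3.3 * math.log10(n_items))

# Example: 50 observations ranging from 10 to 100, so R = 90.
i = class_interval(100 - 10, 50)
print(round(i, 1))  # suggested interval size; in practice rounded to a convenient value
```

The rule gives only a starting point; the result is normally rounded to a convenient width (such as 10) that keeps the number of classes within the 5 to 15 range.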
  • 15. Processing and Analysis of Quantitative Data 4. Tabulation / Tabular Presentation: to make voluminous data readily usable and easily comprehensible, three forms of presentation are possible. A. Textual (descriptive) presentation: when the quantity of data is not too large and there is no difficulty in comprehending it while reading, textual presentation helps to emphasise certain points. E.g., there are 30 students in the class, of which 10 (one-third) are female students. B. Tabular presentation: the purpose of tabulation is to summarise and display data in a concise, compact and logical order for further analysis. It is a statistical representation, presented as a simple or complex table, for summarising and comparing frequencies, determining bases for and computing percentages, etc. Note: while tabulating responses to a questionnaire, problems concerning responses such as ‘Don’t know’ and unanswered items, computation of percentages, etc. have to be handled carefully. C. Diagrammatic presentation
  • 16. Research Methodology 8 M S Sridhar, ISRO 16 Tabular Presentation of Data Table organises data presenting in rows and columns with cells containing data for further statistical treatment and decision making. Four kinds of classification used in tabulation are: i) Qualitative classification based on qualitative characteristics like status, nationality and gender ii) Quantitative classification based on characteristics measured quantitatively like age, height and income (assigning class limits for the values forms classes) iii) Temporal classification : Categorised according to time (with time as classifying variable). E.g., hours, days, weeks, months, years iv) Spatial classification: Place as a classifying variable. E.g. Village, town, block, district, state, country
  • 17. Research Methodology 8 M S Sridhar, ISRO 17 Parts of Table Table is conceptualised as data presented in rows and columns along with some explanatory notes. Tabulation can be one-way, two-way, or three-way classification depending upon the number of characteristics involved i) Table number for identification purpose at the top or at the beginning of the title of the table; Whole numbers are used in ascending order; Subscripted numbers are used if there are many tables ii) Title, usually placed at the head, narrates about the contents of the table; Clearly, briefly and carefully worded so as to make interpretations from the table clear and free from ambiguity iii) Captions or column heading: are column designations to explain figures of the column
  • 18. Parts of Table …contd. iv) Stub or row headings (stub column) are designations of the rows v) Body of the table contains the actual data vi) Unit of measurement: stated along with the title; does not change throughout the table unless stated otherwise, as when different units are used for rows and columns; if the figures are large, they are rounded off and the rounding is indicated vii) Source note at the bottom of the table indicates the source of the data presented viii) Footnote is the last part of the table; it explains a specific feature of the data content that is not self-explanatory and has not been explained earlier
  • 19. Research Methodology 8 M S Sridhar, ISRO 19 Preparation of Frequency Distribution Table 1. Deciding number of classes: The rule of thumb is to have 5 to 15 classes. Know the range and variations in variable’s value. Range is the difference between the largest and the smallest value of the variable (i.e., It is the sum of all class intervals or the number of classes multiplied by class interval) (Class interval is the various intervals of the variable chosen for classifying data) 2. Deciding size of each class : 1 and 2 are inter-linked 3. Determining the class limit : Choose a value less than the minimum value of the variable as the lower limit of the first class and a value greater than the maximum value of the variable is the upper class limit for the last class. Note: It is important to choose class limit in such a way that mid-point or class mark of each class coincides, as far as possible, with any value around which the data tend to be concentrated, i.e., Class limits are chosen in such a way that midpoint is close to average
  • 20. Class Intervals in Frequency Tables [Figure: two number lines showing a class interval and its mid-point. (A) Even class interval and its mid-point: lower limit 10, mid-point 15, upper limit 20, with 5 units on either side of the mid-point. (B) Odd class interval and its mid-point: lower limit 10, mid-point 12.5, upper limit 15, with 2.5 units on either side of the mid-point.]
  • 21. Preparation of Frequency Distribution …contd. Two methods for class limits: exclusive and inclusive type class intervals for determining the frequency of each class (see the tally sheet example given later). (i) Exclusive method: the upper class limit of one class equals the lower class limit of the next class. It is suitable for data of a continuous variable; the upper class limit is excluded from the interval but the lower class limit is included. (ii) Inclusive method: both class limits are part of the class interval. An adjustment is made if there is a gap or discontinuity between the upper limit of a class and the lower limit of the next class: divide the difference between the upper limit of the first class and the lower limit of the second class by 2, subtract it from all lower limits and add it to all upper limits. Adjusted class mark = (Adjusted upper limit + Adjusted lower limit) / 2. This adjustment restores continuity of data in the frequency distribution.
  • 22. Preparation of Frequency Distribution …contd. 4. Find the frequency of each class (i.e., how many times an observation occurs in the raw data) by tally marking. The frequency of an observation is the number of times that observation occurs. A frequency table gives the class intervals and the frequencies associated with them. Loss of information: a frequency distribution summarises raw data to make it concise and comprehensible, but does not show the details found in the raw data. Bivariate frequency distribution: a frequency distribution of two variables (e.g., number of books in stock and budget of 10 libraries). Frequency distribution with unequal classes: when some classes are densely populated and others sparsely populated, the observations in those classes deviate more from their class marks than the observations in other classes. In such cases unequal class intervals are more appropriate; they are formed so that each class mark coincides, as far as possible, with a value around which the observations in that class tend to concentrate. Frequency array: for a discrete variable, the classification of its data is known as a frequency array (e.g., number of books in 10 libraries).
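The adjustment described for the inclusive method can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `adjust_inclusive_limits` and the three example classes are hypothetical, and equal gaps between all consecutive classes are assumed.

```python
def adjust_inclusive_limits(classes):
    """Convert inclusive class limits into continuous (adjusted) limits.

    classes: list of (lower, upper) tuples in ascending order.
    Returns a list of (adjusted_lower, adjusted_upper, adjusted_class_mark).
    """
    # Half of the gap between the upper limit of the first class
    # and the lower limit of the second class.
    half_gap = (classes[1][0] - classes[0][1]) / 2
    adjusted = []
    for lower, upper in classes:
        adj_lower = lower - half_gap   # subtract from every lower limit
        adj_upper = upper + half_gap   # add to every upper limit
        class_mark = (adj_upper + adj_lower) / 2
        adjusted.append((adj_lower, adj_upper, class_mark))
    return adjusted

# Hypothetical inclusive classes with a gap of 1 between neighbours.
inclusive = [(11, 20), (21, 30), (31, 40)]
adjusted = adjust_inclusive_limits(inclusive)
for row in adjusted:
    print(row)
```

After the adjustment the upper limit of each class equals the lower limit of the next, so the classes can be treated as exclusive-type intervals.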
  • 23. Analysis of Data: computation of certain indices or measures, searching for patterns of relationships, estimating values of unknown parameters, & testing of hypotheses for inferences. 1. Descriptive analysis: largely the study of distributions of one variable (uni-dimension). Univariate analysis → one variable; Bivariate analysis → two variables; Multivariate analysis → more than two variables. 2. Inferential or statistical analysis: • Correlation & causal analysis: joint variation of two or more variables is correlation analysis; how one or more variables affect another variable is causal analysis; the functional relation existing between two or more variables is regression analysis • Multivariate analysis: simultaneously analysing more than two variables • Multiple regression analysis: predicting the dependent variable based on its covariance with all concerned independent variables
  • 24. Tally (tabular) sheets / charts for frequency distribution of qualitative, quantitative and grouped / interval data I. Single variable (Univariate measures) 1. Quantitative (i) Simple data (ii) Frequency distribution of grouped / interval data 2. Qualitative (Attributes) II. Two or more variables (Bivariate & multivariate measures) 1. Quantitative / Quantitative (i) Simple (ii) Frequency distribution 2. Quantitative / Qualitative (Attributes) (i) Simple (ii) Frequency distribution 3. Qualitative / Qualitative (Attributes). Examples of tabulation and tabular presentation follow.
  • 25. Table 8.1 (Quantitative data): Frequency distribution of citations in technical reports
      No. of citations   Tally        Frequency (No. of tech. reports)
      0                  ||           2
      1                  ||||         4
      2                  |||||        5
      3                  ||||         4
      4                  ||||| ||     7
      5                  ||||| |||    8
      Total                           30
    Table 8.2 (Qualitative data): Frequency distribution of qualification (educational level) of users
      Qualification      Tally        Frequency (No. of users)
      Undergraduates     ||||| |      6
      Graduates          ||||| ||||   9
      Postgraduates      ||||| ||     7
      Doctorates         |||          3
      Total                           25
  • 26. Research Methodology 8 M S Sridhar, ISRO 26 Table 8.3: Frequency distribution of age of 66 users who used a branch public library during an hour (Grouped/ interval data of single variable) (Note that the raw data of age of individual users is already grouped here) Age in years (Groups/Classes) Tally Frequency (No. of users) < 11 11 11 – 20 14 21 – 30 16 31 – 40 12 41 – 50 6 51 - 60 3 > 60 4 Total 66
  • 27. Research Methodology 8 M S Sridhar, ISRO 27 Table 8.4: No. of books acquired by a library over last six years Year No. of Books acquired (Qualitative) (Quantitative) 2000 772 2001 910 2002 873 2003 747 2004 832 2005 891 Total 5025 Table 8.5: The daily visits of users to a library during a week are recorded and summarised Day Number of users (Qualitative) (Quantitative) Monday 391 Tuesday 247 Wednesday 219 Thursday 278 Friday 362 Saturday 96 Total 1593
  • 28. Research Methodology 8 M S Sridhar, ISRO 28 Table 8.6: The frequency distribution of number of authors per paper of 224 sample papers No. of Authors No. of Papers 1 43 2 51 3 53 4 30 5 19 6 15 7 6 8 4 9 2 10 1 Total 224
  • 29. Research Methodology 8 M S Sridhar, ISRO 29 Table 8.7: Total books (B), journals (J) and reports ( R) issued out from a library counter in one hour are recorded as below: B B B J B B B B J B B B B B B B B B B B B B B J B R B B B J A frequency table can be worked out for above data as shown below: Document Tally Frequency Relative Cumulative Cumulative Type (Number) frequency frequency relative frequency Books 20 0.8 20 0.8 Journal 4 0.16 24 0.96 Reports 1 0.04 25 1.0 Total 25 1.0
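The relative, cumulative and cumulative relative frequency columns of Table 8.7 can be derived from the class frequencies alone. The sketch below is illustrative: it takes the frequencies 20, 4 and 1 from the table as given, and the variable names are assumptions of this example.

```python
from itertools import accumulate

# Frequencies as stated in Table 8.7.
counts = {"Books": 20, "Journals": 4, "Reports": 1}

total = sum(counts.values())
cumulative = list(accumulate(counts.values()))  # running totals

rows = []
for (doc_type, freq), cum in zip(counts.items(), cumulative):
    rows.append((doc_type, freq, freq / total, cum, cum / total))

print("Type      f   rel.f  cum.f  cum.rel.f")
for doc_type, freq, rel, cum, cum_rel in rows:
    print(f"{doc_type:<9} {freq:<3} {rel:<6.2f} {cum:<6} {cum_rel:.2f}")
```

The last cumulative relative frequency is always 1.0, which is a quick check that no category has been missed.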
  • 30. Research Methodology 8 M S Sridhar, ISRO 30 Table 8.7 contd. Note: If the proportion of each type of document (category) are of interest rather than actual numbers, the same can be expressed in percentages or as proportions as shown below: Proportions of books, journals and reports issued from a library in one hour is 20:4:1 OR Type of document Proportion of each type of document (%) Books 80 Journal 16 Reports 4 Total 100
  • 31. Research Methodology 8 M S Sridhar, ISRO 31 Table 8.8: Given below is a summarized table of the relevant records retrieved from a database in response to six queries Search Total Relevant % of relevant No. Documents Documents records Retrieved Retrieved Retrieved 1 79 21 26.6 2 18 10 55.6 3 20 11 55.0 4 123 48 39.0 5 6 8 50.0 6 109 48 44.0 Total 375 146 Note:Percentage of relevant records retrieved for each query gives better picture about which query is more efficient than observing just frequencies.
  • 32. Research Methodology 8 M S Sridhar, ISRO 32 Table 8.9: Frequency distribution of borrowed use of books of a library over four years No. Times borrowed No. of Books Percentage Cumulative borrowed (Quantitative) (Quantitative) Percentage 0 19887 57.12 57.12 1 4477 12.56 69.68 2 4047 11.93 81.61 3 1328 3.81 85.42 4 897 2.57 87.99 5 726 2.02 90.01 6 557 1.58 91.68 7 447 1.28 92.96 8 348 1.00 93.96 9 286 0.92 94.78 10 290 0.84 95.62 >10 1524 4.38 100.00
  • 33. Table 8.10: The raw data of self-citations in a sample of 25 technical reports are given below:
      5 0 1 4 0 3 8 2 3 0 4 2 1 0 7 3 1 2 6 0 2 2 5 7 2
    Frequency distribution of self-citations of technical reports:
      No. of self-   Frequency (No. of reports)   Less than (or equal)       More than (or equal)
      citations      No.        %                 cumulative frequency (%)   cumulative frequency (%)
      0              5          20                20                         100
      1              3          12                32                         80
      2              6          24                56                         68
      3              3          12                68                         44
      4              2          8                 76                         32
      5              2          8                 84                         24
      6              1          4                 88                         16
      7              2          8                 96                         12
      8              1          4                 100                        4
      Total          25
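The less-than and more-than cumulative percentages of Table 8.10 can be reproduced from the 25 raw values listed above. This is a minimal sketch using only the Python standard library; the variable names are assumptions of the example.

```python
from collections import Counter

# Raw self-citation counts as listed in Table 8.10.
raw = [5, 0, 1, 4, 0, 3, 8, 2, 3, 0, 4, 2, 1,
       0, 7, 3, 1, 2, 6, 0, 2, 2, 5, 7, 2]

freq = Counter(raw)
n = len(raw)

table = []
running = 0  # number of observations seen so far (values below the current one)
for value in sorted(freq):
    f = freq[value]
    more_than = n - running   # observations greater than or equal to value
    running += f              # observations less than or equal to value
    table.append((value, f,
                  100 * f / n,            # percentage of reports
                  100 * running / n,      # less than (or equal) cumulative %
                  100 * more_than / n))   # more than (or equal) cumulative %

for row in table:
    print(row)
```

The two cumulative columns run in opposite directions: the less-than column ends at 100% and the more-than column starts at 100%.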
  • 34. Table 8.11 (Qualitative data): Responses in the form of True (T) or False (F) to a questionnaire (opinionnaire), tabulated and given along with the qualitative raw data
      Raw data (recorded responses):
      T T T F F
      F T T T T
      T T F F T
      F T F T T
      T T T T F
      Response       Frequency
      True           17
      False          8
      No response    5
      Total          30
  • 35. Grouped or Interval Data. So far (except in Table 8.3) only discrete data have been presented, and the number of cases / items has also been limited. As against discrete data, continuous data, like heights of people, have to be collected in groups or intervals, like height between 5’ and 5’5”, for a meaningful analysis. Even a large quantity of discrete data requires compression and reduction for meaningful observation, analysis and inference. Table 8.12 in the next slide presents 50 observations; a frequency table of these discrete data would have 22 lines, as there are 22 different values (ranging from Rs. 10/- to Rs. 100/-). Such large tables are undesirable: they take more time to prepare and the resulting frequency table is less appealing. In such situations, we transform discrete data into grouped or interval data by creating a manageable number of classes or groups. Such data compression and reduction are inevitable and worthwhile despite some loss of accuracy (or data).
  • 36. Table 8.12: (Grouped or interval data) Raw data of prices (in Rs.) of a set of 50 popular science books in Kannada
    30 80 100 12 40 50 60 40 30 45
    40 30  70 43 40 25 50 10 30 35
    18 35  60 35 25 27 25 25 30 30
    35 35  14 32 35 25 30 40 15 30
    20 16  13 30 60 20 65 60 40 10
  • 37. Frequency distribution of the ungrouped (discrete) data of Table 8.12
    Mean = Rs. 35.9 ; Median = Rs. 31 (average of the 25th and 26th items, Rs. 30 and Rs. 32) ; Mode = Rs. 30
    Price in Rs.   No. of books
    10             2
    12             1
    13             1
    14             1
    15             1
    16             1
    18             1
    20             2
    25             5
    27             1
    30             9
    32             1
    35             6
    40             6
    43             1
    45             1
    50             2
    60             4
    65             1
    70             1
    80             1
    100            1
    Total          50
  • 38. Frequency distribution of grouped or interval data of price (in Rs.) of popular science books in Kannada (Table 8.12):
    Price (in Rs.) (class)   Frequency (f) (No. of books)
    1 - 10                   2
    11 - 20                  8
    21 - 30                  15
    31 - 40                  13
    41 - 50                  4
    51 - 60                  4
    61 - 70                  2
    71 - 80                  1
    81 - 90                  0
    91 - 100                 1
    Total                    50
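The reduction of the 50 raw prices to ten classes of width 10 can be sketched in Python as below. This is an illustration only, not part of the original slides; the function name `group_into_classes` and its `width` and `start` parameters are assumptions made for the example.

```python
# Prices (in Rs.) of the 50 popular science books from Table 8.12.
prices = [30, 80, 100, 12, 40, 50, 60, 40, 30, 45,
          40, 30, 70, 43, 40, 25, 50, 10, 30, 35,
          18, 35, 60, 35, 25, 27, 25, 25, 30, 30,
          35, 35, 14, 32, 35, 25, 30, 40, 15, 30,
          20, 16, 13, 30, 60, 20, 65, 60, 40, 10]

def group_into_classes(values, width=10, start=1):
    """Count values per class (start..start+width-1, then the next class, ...)."""
    classes = {}
    lower = start
    while lower <= max(values):          # create every class, even empty ones
        classes[(lower, lower + width - 1)] = 0
        lower += width
    for v in values:
        lower = start + ((v - start) // width) * width
        classes[(lower, lower + width - 1)] += 1
    return classes

grouped = group_into_classes(prices)
for (low, high), freq in grouped.items():
    print("%3d - %3d  %2d" % (low, high, freq))
```

Empty classes such as 81 - 90 are kept on purpose, so that the class scale stays continuous when the table is later used for a histogram or for the grouped mean, median and mode.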
  • 39. Homework
    Work out a frequency table with less than cumulative and more than cumulative frequencies for the raw data of number of words per line in a book given below:
    12 10 12 09 11 10 13 13 07 11
    10 10 09 10 12 11 01 10 13 10
    15 13 11 12 08 13 11 10 08 12
    13 11 09 11 14 12 07 12 11 10
  • 40. Diagrammatic / Graphical Presentation
    - gives the quickest understanding of the actual situation described by the data, compared to textual or tabular presentation
    - translates quite effectively the highly abstract ideas contained in numbers into a more concrete and easily comprehensible form
    - may be less accurate but is more effective than a table
    - tables and diagrams may be suitable to illustrate discrete data, while continuous data are better represented by graphs
    Note: Sample charts are constructed and presented using data from previously presented tables. Different types of data may require different modes of diagrammatic representation
    Three important kinds of diagrams:
    i) Geometric diagram: (a) Bar (column) chart: simple, multiple, and component (b) Pie
    ii) Frequency diagram: (a) Histogram (b) Frequency polygon (c) Frequency curve (d) Ogive or cumulative frequency curve
    iii) Arithmetic line graph
  • 41. Simple column chart for data in Table 8.2: Qualification of users
    [Column chart. Y-axis: No. of users. Undergraduates 6, Graduates 9, Postgraduates 7, Doctorates 3]
  • 42. Bar chart for data from Table 8.9: Frequency distribution of borrowed use of books of a library over four years
    [Bar chart. X-axis: No. of times borrowed (0 to 10, >10). Y-axis: No. of books. Bar values as in Table 8.9]
  • 43. Bar chart for data in Table 8.1: Frequency distribution of citations in technical reports
    [Bar chart. X-axis: No. of citations (0 to 5). Y-axis: No. of reports. Bar values: 2, 4, 5, 4, 7, 8]
  • 44. Component bar chart
  • 45. 100% component column chart
  • 46. Grouped column chart
  • 47. Comparative 100% columnar chart; Chart with figures / symbols
  • 48. Histogram and frequency polygon for data in Table 8.6: No. of authors per paper
    [Histogram with frequency polygon. X-axis: No. of authors (1 to 10). Y-axis: No. of papers. Values: 43, 51, 53, 30, 19, 15, 6, 4, 2, 1]
  • 49. Line graphs
  • 50. Line graph for data in Table 8.6: No. of authors per paper
    [Line graph. X-axis: No. of authors (1 to 10). Y-axis: No. of papers. Values: 43, 51, 53, 30, 19, 15, 6, 4, 2, 1]
  • 51. Frequency Distribution of No. of Words per Line of a Book (Homework)
    [Line graph. Y-axis: Less than (or equal) cumulative frequency in %. Values: 2.5, 7.5, 12.5, 20, 42.5, 62.5, 80, 95, 97.5, 100]
  • 52. Cumulative frequency graph of reduction in no. of journals subscribed and no. of reports added over the years
    [Line graph. X-axis: Year. Y-axis: 0 to 1200. Series: Reports (annual intake), Journals (subscribed)]
    Year   Reports (annual intake)   Journals (subscribed)
    1980   1063                      533
    1985   936                       519
    1990   523                       444
    1995   288                       416
    2000   67                        326
    2002   29                        300
  • 53. Line graph of less than or equal cumulative frequency of self-citations in technical reports (Table 8.10)
    [Line graph. X-axis: No. of self-citations. Y-axis: cumulative percentage of reports. Values: 20, 32, 56, 68, 76, 84, 88, 96, 100]
  • 54. Line graph of more than or equal cumulative frequency of self-citations in technical reports (Table 8.10)
    [Line graph. X-axis: No. of self-citations. Y-axis: cumulative percentage of reports. Values: 80, 68, 44, 32, 24, 16, 12, 4]
  • 55. Pie Diagram / Chart for Example 8.7: No. of books, journals and reports issued per hour
    [Pie chart. Books 80%, Journals 16%, Reports 4%]
  • 56. Univariate Measures: A. Central Tendency
    Measures of central tendency (averages) are used to summarise data. Each specifies a single most representative value to describe the data set.
    Properties of the arithmetic mean:
    1. The sum of the deviations of individual values of x from the mean always adds up to zero
    2. The positive deviations balance the negative deviations
    3. It is very sensitive to extreme values
    4. The sum of squares of the deviations about the mean is a minimum
    A good measure of central tendency should meet the following requisites:
    - easy to calculate and understand
    - rigidly defined
    - representative of the data
    - should have sampling stability
    - should not be affected by extreme values
  • 57. Univariate Measures: A. Central Tendency
    1. MEAN: The arithmetic mean (also called the statistical or arithmetic average) is the most commonly used measure. It is obtained by dividing the total of the values of the items by the number of items.
    Characteristics of the mean:
    - most representative figure for the entire mass of data
    - tells the point about which items have a tendency to cluster
    - unduly affected by extreme items (very sensitive to extreme values)
    - the positive deviations balance the negative deviations (the sum of the deviations of individual values of x from the mean always adds up to zero)
    - the sum of squares of the deviations about the mean is a minimum
  • 58. Univariate Measures: A. Central Tendency
    1. MEAN
    $\bar{X} = \frac{\sum x_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$
    EX: 4 6 7 8 9 10 11 11 11 12 13
    $\bar{X} = \frac{102}{11} = 9.27$
    For grouped or interval data:
    $\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n}$, where $\sum f_i = n$
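  • 59. Mean for grouped or interval data
    $\bar{X} = \frac{\sum f_i x_i}{n} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_n X_n}{f_1 + f_2 + \dots + f_n}$, where $n = \sum f_i$
    Formula for weighted mean: $\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i}$
    Formula for mean of combined sample: $\bar{Z} = \frac{n\bar{X} + m\bar{Y}}{n + m}$
    Formula for the shortcut (assumed average) method: $\bar{X} = A + \frac{\sum f_i (X_i - A)}{n}$, where A is the assumed average and $n = \sum f_i$
    NOTE: The step deviation method takes the common factor out to enable simple working and uses the formula $\bar{X} = g + \left[\frac{\sum f d}{n}\right] i$, where g is the assumed average, d is the distance (in classes) of each class from the assumed average class and i is the class width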
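The four formulas above can be checked against each other on the running example. The following Python sketch is illustrative only; the function names are assumptions of this example, not names used in the slides.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their number."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Sum of W_i * X_i divided by the sum of the weights W_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def combined_mean(mean_x, n, mean_y, m):
    """Mean of two samples pooled together, from their means and sizes."""
    return (n * mean_x + m * mean_y) / (n + m)

def assumed_average_mean(values, freqs, assumed):
    """Shortcut method: A + sum(f_i * (X_i - A)) / n, with n = sum(f_i)."""
    n = sum(freqs)
    return assumed + sum(f * (x - assumed) for x, f in zip(values, freqs)) / n

data = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]
print(round(mean(data), 2))

# The same data as a frequency distribution (11 occurs three times).
distinct = [4, 6, 7, 8, 9, 10, 11, 12, 13]
freqs = [1, 1, 1, 1, 1, 1, 3, 1, 1]
print(round(weighted_mean(distinct, freqs), 2))
print(round(assumed_average_mean(distinct, freqs, assumed=10), 2))
print(round(combined_mean(mean(data[:5]), 5, mean(data[5:]), 6), 2))
```

Any assumed average A gives the same result; a value near the centre of the data merely keeps the deviations small when working by hand.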
  • 60. Calculation of the mean ($\bar{X}$) from a frequency distribution of grouped or interval data of price (in Rs.) of popular science books in Kannada (Table 8.12) using the step deviation method is shown below:
    (cf = cumulative less than or equal frequency; d = distance of class from the assumed average class)
    Price (in Rs.) (class)   Frequency (f) (No. of books)   cf    d    fd    d²   fd²
    1 - 10                   2                              2     -4   -8    16   32
    11 - 20                  8                              10    -3   -24   9    72
    21 - 30                  15                             25    -2   -30   4    60
    31 - 40                  13                             38    -1   -13   1    13
    41 - 50                  4                              42    0    0     0    0
    51 - 60                  4                              46    1    4     1    4
    61 - 70                  2                              48    2    4     4    8
    71 - 80                  1                              49    3    3     9    9
    81 - 90                  0                              49    4    0     16   0
    91 - 100                 1                              50    5    5     25   25
    Total                    50                                        -59        223
    g = 45.5 (midpoint of the assumed average class 41 - 50) ; ∑ƒd = -59 ; n = 50 ; i = 10
    $\bar{X} = g + \left[\frac{\sum f d}{n}\right] i = 45.5 + \left[\frac{-59}{50}\right](10) = 45.5 - 11.8 = 33.7$
    Note: Compare the answer with the mean calculated as discrete data in Table 8.12 (Rs. 35.9)
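The step deviation formula $\bar{X} = g + [\sum fd / n]\,i$ can be sketched in Python as follows. This is a hedged illustration, not the author's working: it assumes that g is taken as the midpoint of the assumed average class, and it uses the class frequencies of the grouped distribution of Table 8.12 (2, 8, 15, 13, 4, 4, 2, 1, 0, 1). The function name is hypothetical.

```python
# Classes and frequencies of the grouped price data (Table 8.12).
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50),
           (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
freqs = [2, 8, 15, 13, 4, 4, 2, 1, 0, 1]

def step_deviation_mean(classes, freqs, assumed_index):
    """Mean = g + (sum(f*d) / n) * i, with d counted in whole classes."""
    width = classes[0][1] - classes[0][0] + 1          # class width i
    g = sum(classes[assumed_index]) / 2                # assumption: g is the class midpoint
    n = sum(freqs)
    sum_fd = sum(f * (k - assumed_index) for k, f in enumerate(freqs))
    return g + (sum_fd / n) * width

mean_price = step_deviation_mean(classes, freqs, assumed_index=4)  # class 41 - 50
print(round(mean_price, 2))
```

The choice of assumed class does not affect the answer; calling the function with any other `assumed_index` returns the same mean.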
  • 61. Assumed average (shortcut) method & step deviation method
    Table: Calculation of the mean ($\bar{X}$) from a frequency distribution. Data represent weights of 265 male freshman students at the University of Washington
    Class interval (weight)   ƒ     d    ƒd
    90 - 99                   1     -5   -5
    100 - 109                 1     -4   -4
    110 - 119                 9     -3   -27
    120 - 129                 30    -2   -60
    130 - 139                 42    -1   -42
    140 - 149                 66    0    0
    150 - 159                 47    1    47
    160 - 169                 39    2    78
    170 - 179                 15    3    45
    180 - 189                 11    4    44
    190 - 199                 1     5    5
    200 - 209                 3     6    18
    N = 265 ; ∑ƒd = 237 - 138 = 99
    $\bar{X} = g + \frac{\sum f d}{N}\, i = 145 + \frac{99}{265}(10) = 145 + (0.3736)(10) = 145 + 3.74 = 148.74$
    In terms of the class values $X_i$ and the assumed average A: $\bar{X} = A + \frac{\sum f_i (X_i - A)}{\sum f_i}$
  • 62. Univariate Measures: A. Central Tendency contd.
    WEIGHTED MEAN: $\bar{X}_w = \frac{\sum W_i X_i}{\sum W_i}$
    MEAN OF COMBINED SAMPLE: $\bar{Z} = \frac{N\bar{X} + M\bar{Y}}{N + M}$
    MOVING AVERAGE
    SHORTCUT OR ASSUMED AVERAGE METHOD: $\bar{X} = A + \frac{\sum (X_i - A)}{n}$ ; for a frequency distribution, $\bar{X} = A + \frac{\sum f_i (X_i - A)}{\sum f_i}$
    NOTE: The step deviation method takes the common factor out to enable simple working
  • 63. Univariate Measures: A. Central Tendency
    2. Median: Middle item of a series when arranged in ascending or descending order of magnitude
    M = value of the $\left(\frac{N+1}{2}\right)$th item
    EX: 11 7 13 4 11 9 6 11 10 12 8
    Arranged:  4  6  7  8  9  10  11  11  11  12  13
    Position:  1  2  3  4  5  6   7   8   9   10  11
    The 6th item is the median, i.e., M = 10
    FOR FREQUENCY DISTRIBUTION: $M = L + \frac{N/2 - Cf}{f} \times i$
    L = lower limit of the median class
    Cf = cumulative frequency of the class preceding the median class
    f = simple frequency of the median class
    i = width of the class interval of the median class
    Note: As a positional average, the median does not involve the values of all items and is more useful for qualitative phenomena
  • 64. Median
    - In layman's language the median is a divider, like the 'divider' on a road that splits the road into two halves
    - It is a positional value of the variable which divides the distribution into two equal parts, i.e., the median of a set of observations is a value that divides the set into two halves so that one half of the observations are less than or equal to the median value and the other half are greater than or equal to it
    - Extreme items do not affect the median, i.e., the median is a useful measure as it is not unduly affected by extreme values, and it is specially useful in open-ended frequency distributions
    - For discrete data, mean and median do not change if all the measurements are multiplied by the same positive number and the result is later divided by the same constant
    - As a positional average, the median does not involve the values of all items and it is more useful for qualitative phenomena
    - For moderately skewed unimodal distributions, the median usually lies between the arithmetic mean and the mode
  • 65. Median of grouped or interval data
    $M = L + \frac{W}{F}\, i$
    Where,
    W = [n/2] – Cf (the number of observations to be added to the cumulative total of the previous class in order to reach the middle observation in the array)
    L = Lower limit of the median class (the class in which the middle observation of the array lies)
    Cf = Cumulative frequency of the class preceding the median class
    i = Width of the class interval of the median class
    F = Frequency of the median class
  • 66. Calculation of the median (M) from a frequency distribution of grouped or interval data of price (in Rs.) of popular science books in Kannada (Table 8.12) is given below
    Price (in Rs.) (class)   Frequency (f) (No. of books)   Cumulative less than or equal frequency (cf)
    1 - 10                   2                              2
    11 - 20                  8                              10
    21 - 30                  15                             25
    31 - 40                  13                             38
    41 - 50                  4                              42
    51 - 60                  4                              46
    61 - 70                  2                              48
    71 - 80                  1                              49
    81 - 90                  0                              49
    91 - 100                 1                              50
    Total                    50
    The median class is 21 - 30, the first class whose cumulative frequency reaches n/2 = 25
    L = 21 ; Cf = 10 ; i = 10 ; F = 15 ; W = [n/2] – Cf = [50/2] – 10 = 15
    $M = L + \frac{W}{F}\, i = 21 + \frac{15}{15}(10) = 31$
    Note: Compare the answer with the median calculated as discrete data in Table 8.12
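The grouped median formula $M = L + (W/F)\,i$ lends itself to a short routine. The sketch below is an illustration under stated assumptions: classes have equal width, lower limits are used exactly as printed in the tables, and the function name `grouped_median` is hypothetical. The weights data are those of the 265 freshman students used elsewhere in these slides.

```python
def grouped_median(lower_limits, freqs, width):
    """M = L + ((n/2 - Cf) / F) * i for the first class whose cum. freq. reaches n/2."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    cum = 0                                   # Cf: cumulative frequency so far
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= n / 2:        # this is the median class
            return lower + (n / 2 - cum) / f * width
        cum += f

# Price data of Table 8.12 grouped into classes 1-10, 11-20, ..., 91-100.
price_median = grouped_median(list(range(1, 92, 10)),
                              [2, 8, 15, 13, 4, 4, 2, 1, 0, 1], 10)

# Weights of 265 freshman students, classes 90-99, ..., 200-209.
weight_median = grouped_median(list(range(90, 201, 10)),
                               [1, 1, 9, 30, 42, 66, 47, 39, 15, 11, 1, 3], 10)
print(price_median, weight_median)
```

The interpolation assumes that observations are spread evenly within the median class, which is why the grouped answer can differ slightly from the median of the raw data.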
  • 67. Median for grouped or interval data
    TABLE: Calculation of the median (M). Data represent weights of 265 male freshman students at the University of Washington
    Class interval (weight)   ƒ     Cumulative ƒ ("less than")
    90 - 99                   1     1
    100 - 109                 1     2
    110 - 119                 9     11
    120 - 129                 30    41
    130 - 139                 42    83
    140 - 149                 66    149
    150 - 159                 47    196
    160 - 169                 39    235
    170 - 179                 15    250
    180 - 189                 11    261
    190 - 199                 1     262
    200 - 209                 3     265
    N = 265 ; N/2 = 265/2 = 132.5 ; W = N/2 – Cf
    $M = L + \frac{W}{f}\, i = 140 + \frac{132.5 - 83}{66}(10) = 140 + \frac{49.5}{66}(10) = 140 + (0.750)(10) = 140 + 7.50 = 147.5$
  • 68. Univariate Measures: A. Central Tendency
    3. Mode
    MODE is the most commonly or frequently occurring value in a series
    EX.: 4 6 7 8 9 10 11 11 11 12 13 (the mode is 11, which occurs three times)
    For frequency distribution:
    $Z = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$ OR $Z = L + \frac{f_2}{f_2 + f_1} \times i$
    L = Lower limit of the modal class
    Δ1 = Difference in frequency between the modal class and the preceding class
    Δ2 = Difference in frequency between the modal class and the succeeding class
    i = Width of the class interval of the modal class
    f1 = Frequency of the class preceding the modal class
    f2 = Frequency of the class succeeding the modal class
  • 69. Mode
    - Mode is the most commonly or frequently occurring value (observed data), or the most typical value of a series, or the value around which the maximum concentration of items occurs. In other words, the mode of a categorical or a discrete numerical variable is the value of the variable which occurs the maximum number of times
    - The mode is not affected by extreme values in the data and can easily be obtained from an ordered set of data
    - The mode does not necessarily describe 'most' (for example, more than 50%) of the cases
    - Like the median, the mode is a positional average and is not affected by the values of extreme items. Hence the mode is useful in eliminating the effect of extreme variations and in studying the most popular (highest occurring) case (used in qualitative data)
    - The mode is usually a good indicator of the centre of the data only if there is one dominating frequency. However, it does not give relative importance and, like the median, is not amenable to algebraic treatment
    - For moderately skewed distributions the median usually lies between the mean and the mode. For a normal distribution, mean, median and mode are equal (one and the same)
  • 70. Mode for grouped or interval data
    For a frequency distribution with grouped (or interval) quantitative data, the modal class is the class interval with the highest frequency. This is more useful when we measure a continuous variable, where almost every observed value is distinct and no single value has a dominating frequency. The modal class in Table 8.3 is the age group 21-30. Please note that the notion of location or central tendency requires order; for nominal data the mode only identifies the most frequent category and does not indicate a location.
    $Z = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i$ OR $Z = L + \frac{f_2}{f_2 + f_1}\, i$
    Where,
    L = Lower limit of the modal class
    Δ1 = Difference in frequency between the modal class and the preceding class
    Δ2 = Difference in frequency between the modal class and the succeeding class
    i = Width of the class interval of the modal class
    f1 = Frequency of the class preceding the modal class
    f2 = Frequency of the class succeeding the modal class
  • 71. Calculation of the mode (Z) from a frequency distribution of grouped or interval data of price (in Rs.) of popular science books in Kannada (Table 8.12) is shown below:
    Price (in Rs.) (class)   Frequency (f) (No. of books)   Cumulative less than or equal frequency (cf)
    1 - 10                   2                              2
    11 - 20                  8                              10
    21 - 30                  15                             25
    31 - 40                  13                             38
    41 - 50                  4                              42
    51 - 60                  4                              46
    61 - 70                  2                              48
    71 - 80                  1                              49
    81 - 90                  0                              49
    91 - 100                 1                              50
    Total                    50
    The modal class is the class with the highest frequency, i.e., 21 - 30 (f = 15)
    L = 21 ; i = 10 ; f1 = 8 ; f2 = 13 ; Δ1 = 15 - 8 = 7 ; Δ2 = 15 - 13 = 2
    $Z = L + \frac{f_2}{f_1 + f_2}\, i = 21 + \frac{13}{8 + 13}(10) = 27.19$
    OR $Z = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i = 21 + \frac{7}{7 + 2}(10) = 28.78$
    Note: Compare the answer with the mode calculated as discrete data in Table 8.12
  • 72. Mode for grouped or interval data
    Table: Calculation of the mode (Z). Data represent weights of 265 male freshman students at the University of Washington
    Class interval (weight)   ƒ
    90 - 99                   1
    100 - 109                 1
    110 - 119                 9
    120 - 129                 30
    130 - 139                 42
    140 - 149                 66
    150 - 159                 47
    160 - 169                 39
    170 - 179                 15
    180 - 189                 11
    190 - 199                 1
    200 - 209                 3
    $Z = L + \frac{f_2}{f_1 + f_2}\, i = 140 + \frac{47}{47 + 42}(10) = 140 + \frac{47}{89}(10) = 140 + 5.3 = 145.3$
    OR $Z = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i = 140 + \frac{24}{43} \times 10 = 145.6$
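Both versions of the grouped mode formula can be compared with a small routine. The following Python sketch is illustrative only; the function name `grouped_mode` and the `use_differences` switch are assumptions of this example. It treats a missing neighbouring class as having frequency zero and does not handle the degenerate case where the chosen denominator is zero.

```python
def grouped_mode(lower_limits, freqs, width, use_differences=True):
    """Interpolated mode of a grouped distribution with equal class widths.

    use_differences=True : Z = L + d1 / (d1 + d2) * i
    use_differences=False: Z = L + f2 / (f1 + f2) * i
    """
    m = max(range(len(freqs)), key=freqs.__getitem__)    # modal class (first one if tied)
    f1 = freqs[m - 1] if m > 0 else 0                    # preceding class frequency
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0       # succeeding class frequency
    if use_differences:
        d1, d2 = freqs[m] - f1, freqs[m] - f2
        return lower_limits[m] + d1 / (d1 + d2) * width
    return lower_limits[m] + f2 / (f1 + f2) * width

# Weights of 265 freshman students, classes 90-99, ..., 200-209.
lowers = list(range(90, 201, 10))
freqs = [1, 1, 9, 30, 42, 66, 47, 39, 15, 11, 1, 3]
z_diff = grouped_mode(lowers, freqs, 10)
z_freq = grouped_mode(lowers, freqs, 10, use_differences=False)
print(round(z_diff, 2), round(z_freq, 2))
```

The two formulas generally give slightly different values because they weight the neighbouring classes differently; both place the mode inside the modal class.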
  • 73. Univariate Measures: A. Central Tendency
    4. Geometric Mean: nth root of the product of the values of n items
    $G.M. = \sqrt[n]{\prod x_i} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
    Ex.: 4 6 9 ; $GM = \sqrt[3]{4 \times 6 \times 9} = \sqrt[3]{216} = 6$
    NOTE: 1. Logarithms are used to simplify the calculation 2. GM is used in the preparation of indexes (i.e., determining the average percent of change) and in dealing with ratios
    5. Harmonic Mean: Reciprocal of the average of the reciprocals of the values of items in a series
    $H.M. = \frac{n}{1/x_1 + 1/x_2 + \dots + 1/x_n}$ ; for a frequency distribution, $H.M. = \frac{\sum f_i}{\sum f_i / x_i}$
    Ex.: 4 5 10 ; $HM = \frac{3}{1/4 + 1/5 + 1/10} = \frac{60}{11} = 5.45$
    Harmonic Mean: 1. Has limited application as it gives the largest weight to the smallest item and the smallest weight to the largest item 2. Used in cases where time and rate are involved (ex: time and motion study)
    Note: 1. Median and mode could also be used with qualitative data 2. For moderately skewed distributions the median usually lies between the mean and the mode 3. For a normal distribution mean = median = mode
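The two worked examples above (GM of 4, 6, 9 and HM of 4, 5, 10) can be verified with a few lines of Python. This is a minimal sketch; the hand-written functions are illustrative, and the cross-check uses the standard library functions `statistics.geometric_mean` (available from Python 3.8) and `statistics.harmonic_mean`.

```python
import math
import statistics

def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values, freqs=None):
    """Sum of frequencies divided by the sum of f/x (every f = 1 if freqs is omitted)."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

gm = geometric_mean([4, 6, 9])
hm = harmonic_mean([4, 5, 10])
print(round(gm, 2), round(hm, 2))

# Cross-check against the standard library.
print(round(statistics.geometric_mean([4, 6, 9]), 2),
      round(statistics.harmonic_mean([4, 5, 10]), 2))
```

Working through logarithms mirrors the note that logs are used to simplify the calculation, and it avoids very large intermediate products for long series.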
  • 74. Univariate Measures: B. Dispersion
    Central tendency measures do not reveal the variability present in the data. To understand the data better, we need to know the spread of the values and quantify the variability of the data. Dispersion is the scatter of the values of items in the series around the average, i.e., the extent to which values in a distribution differ from the average of the distribution.
    1. Range: The difference between the values of the extreme items of a series, i.e., the difference between the smallest and largest observations
    Example: 4 6 7 8 9 10 11 11 11 12 13 ; Range = 13 - 4 = 9
    • Simplest and most crude measure of dispersion
    • As it is not based on all the values, it is greatly/unduly affected by the two extreme values and by fluctuations of sampling. The range may increase with the size of the set of observations, but it cannot decrease
    • Gives an idea of the variability very quickly
  • 75. Univariate Measures: B. Dispersion
    2. Mean Deviation: The average of the differences of the values of items from some average of the series (ignoring negative signs), i.e., the arithmetic mean of the absolute differences of the values from their average
    $\delta_x = \frac{\sum |x_i - \bar{x}|}{n}$ ; for grouped data, $\delta_x = \frac{\sum f_i |x_i - \bar{x}|}{n}$
    Example: 4 6 7 8 9 10 11 11 11 12 13
    $\delta_x = \frac{|4 - 9.27| + |6 - 9.27| + \dots + |13 - 9.27|}{11} = \frac{24.73}{11} = 2.25$
    Coefficient of mean deviation: Mean deviation divided by the average. It is a relative measure of dispersion and is comparable to the similar measure of other series, i.e., Coeff. of MD = $\delta_x / \bar{x}$ (Ex: 2.25/9.27 = 0.24). M.D. and its coefficient are used to judge the variability and they are better measures than the range
    Note: 1. MD is based on all values and hence cannot be calculated for open-ended distributions. It uses the average but ignores signs and hence appears unmethodical. 2. MD is calculated from the mean as well as from the median, for ungrouped data using the direct method and for continuous distributions using the assumed mean method and the short-cut method 3. The average used is either the arithmetic mean or the median
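The mean deviation and its coefficient for the running example can be reproduced as follows. This is a small Python sketch for illustration; the function name `mean_deviation` and its optional `center` argument (to allow deviations from the median instead of the mean) are assumptions of this example.

```python
import statistics

def mean_deviation(values, center=None):
    """Average absolute deviation from `center` (the arithmetic mean by default)."""
    if center is None:
        center = sum(values) / len(values)
    return sum(abs(x - center) for x in values) / len(values)

data = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]
md_mean = mean_deviation(data)                              # about the mean
md_median = mean_deviation(data, statistics.median(data))   # about the median
coeff_md = md_mean / (sum(data) / len(data))
print(round(md_mean, 2), round(md_median, 2), round(coeff_md, 2))
```

The mean deviation about the median is never larger than the mean deviation about the mean, which is one reason the median is sometimes preferred as the reference average.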
  • 76. Univariate Measures: B. Dispersion
    3. Standard Deviation: The square root of the average of the squares of deviations (based on the mean), i.e., the positive square root of the mean of the squared deviations from the mean
    $\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$ ; for grouped data, $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$
    Example: 4 6 7 8 9 10 11 11 11 12 13
    $\sigma = \sqrt{\frac{(4 - 9.27)^2 + (6 - 9.27)^2 + \dots + (13 - 9.27)^2}{11}} = \sqrt{\frac{76.18}{11}} = 2.63$
    Coefficient of SD is SD divided by the mean. Example: 2.63 / 9.27 = 0.28
    Variance: Square of SD, i.e., VAR = $\sum (x_i - \bar{x})^2 / n$. Example: 76.18 / 11 = 6.93
    Coefficient of variation is the coefficient of SD multiplied by 100. Example: 0.28 x 100 = 28
    Note: Coefficient of SD is a relative measure and is often used for comparing with the similar measure of other series
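The standard deviation, variance and coefficient of variation defined above use the divisor n (population form). A minimal Python sketch using the standard library is given below; `statistics.pstdev` and `statistics.pvariance` are the population versions, whereas `statistics.stdev` would divide by n - 1 and give a slightly larger value.

```python
import statistics

data = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]

mean = statistics.fmean(data)
sd = statistics.pstdev(data)            # square root of sum((x - mean)^2) / n
variance = statistics.pvariance(data)   # sum((x - mean)^2) / n
coeff_sd = sd / mean                    # relative measure of dispersion
cv = 100 * coeff_sd                     # coefficient of variation, in percent

print(round(sd, 2), round(variance, 2), round(coeff_sd, 2), round(cv))
```

Which divisor is appropriate depends on whether the data are treated as a complete population or as a sample from one; the slides use n throughout.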
  • 77. Univariate Measures: B. Dispersion
    3. Standard Deviation
    - SD is a very satisfactory and the most widely used measure of dispersion
    - amenable to mathematical manipulation
    - it is independent of origin, but not of scale
    - if SD is small, there is a high probability of getting a value close to the mean; if it is large, values tend to lie farther away from the mean
    - does not ignore the algebraic signs and is less affected by fluctuations of sampling
    - SD is calculated using (i) Actual mean method (ii) Assumed mean method (iii) Direct method (iv) Step deviation method
    For a frequency distribution of grouped or interval data: $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$
    The indirect method uses the assumed average formula $\sigma = \sqrt{\frac{\sum f d^2}{n} - \left(\frac{\sum f d}{n}\right)^2} \times i$
    where d = distance (in classes) of a class from the assumed average class, i = class width and $n = \sum f_i$ ; in terms of the class values, $\sigma = \sqrt{\frac{\sum f_i (x_i - A)^2}{\sum f_i} - \left(\frac{\sum f_i (x_i - A)}{\sum f_i}\right)^2}$
    For discrete data the assumed average formula is $\sigma = \sqrt{\frac{\sum (x_i - A)^2}{n} - \left(\frac{\sum (x_i - A)}{n}\right)^2}$
  • 78. Calculation of the SD (σ) from a frequency distribution of grouped or interval data of price (in Rs.) of popular science books in Kannada (Table 8.12) using the assumed average method is shown below:
    (cf = cumulative less than or equal frequency; d = distance of class from the assumed average class)
    Price (in Rs.) (class)   Frequency (f) (No. of books)   cf    d    fd    d²   fd²
    1 - 10                   2                              2     -4   -8    16   32
    11 - 20                  8                              10    -3   -24   9    72
    21 - 30                  15                             25    -2   -30   4    60
    31 - 40                  13                             38    -1   -13   1    13
    41 - 50                  4                              42    0    0     0    0
    51 - 60                  4                              46    1    4     1    4
    61 - 70                  2                              48    2    4     4    8
    71 - 80                  1                              49    3    3     9    9
    81 - 90                  0                              49    4    0     16   0
    91 - 100                 1                              50    5    5     25   25
    Total                    50                                        -59        223
    n = 50 ; ∑ƒd = -59 ; ∑ƒd² = 223 ; i = 10
    $\sigma = \sqrt{\frac{\sum f d^2}{n} - \frac{(\sum f d)^2}{n^2}} \times i = \sqrt{\frac{223}{50} - \frac{(-59)^2}{50^2}} \times 10 = \sqrt{4.46 - 1.3924} \times 10 = \sqrt{3.0676} \times 10 = 1.7515 \times 10 = 17.51$
  • 79. SD for grouped data (indirect method using assumed average)
    TABLE: Calculation of the standard deviation (σ). Data represent weights of 265 male freshman students at the University of Washington
    Class interval (weight)   ƒ     d    ƒd    ƒd²
    90 - 99                   1     -5   -5    25
    100 - 109                 1     -4   -4    16
    110 - 119                 9     -3   -27   81
    120 - 129                 30    -2   -60   120
    130 - 139                 42    -1   -42   42
    140 - 149                 66    0    0     0
    150 - 159                 47    1    47    47
    160 - 169                 39    2    78    156
    170 - 179                 15    3    45    135
    180 - 189                 11    4    44    176
    190 - 199                 1     5    5     25
    200 - 209                 3     6    18    108
    N = 265 ; Σƒd = 99 ; Σƒd² = 931 ; d = (Xi – A) / i ; N = Σ fi
    $\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i = \sqrt{\frac{931}{265} - \left(\frac{99}{265}\right)^2} \times 10 = \sqrt{3.5132 - 0.1396} \times 10 = (1.8367)(10) = 18.37$ or 18.4
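The indirect (assumed average, step deviation) calculation of the SD above can be sketched in Python. The function name `step_deviation_sd` is an assumption of this illustration; d is counted in whole classes from the assumed class and the result is scaled back by the class width i.

```python
import math

def step_deviation_sd(freqs, assumed_index, width):
    """sigma = sqrt(sum(f*d^2)/n - (sum(f*d)/n)^2) * i, with d in whole classes."""
    n = sum(freqs)
    sum_fd = sum(f * (k - assumed_index) for k, f in enumerate(freqs))
    sum_fd2 = sum(f * (k - assumed_index) ** 2 for k, f in enumerate(freqs))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width

# Weights of 265 freshman students, classes 90-99, ..., 200-209.
weight_freqs = [1, 1, 9, 30, 42, 66, 47, 39, 15, 11, 1, 3]
sd_weights = step_deviation_sd(weight_freqs, assumed_index=5, width=10)  # class 140-149
print(round(sd_weights, 2))
```

Because the SD is independent of origin, the choice of assumed class does not change the result; only the class width (scale) matters.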
  • 80. SD for grouped data (indirect method using assumed average) …contd.
    TABLE: Means, standard deviations, and coefficients of variation of the age distributions of four groups of mothers who gave birth to one or more children in the city of Minneapolis: 1931 to 1935
    Classification           Mean   σ     CV
    Resident married         28.2   6.0   21.3
    Non-resident married     29.5   6.0   20.3
    Resident unmarried       23.4   5.8   24.8
    Non-resident unmarried   21.7   3.7   17.1
  • 81. Absolute and relative measures of dispersion
    The absolute measures give the answers in the units in which the original values are expressed. They may give misleading ideas about the extent of variation, especially when the averages differ significantly
    The relative measures (usually expressed in percentages) overcome the above drawbacks. Some of them are:
    i) Coefficient of range = (L – S) / (L + S) (L is the largest value and S is the smallest value)
    ii) Coefficient of quartile deviation
    iii) Coefficient of MD
    iv) Coefficient of variation
    Note: Relative measures are free from the units in which the values have been expressed. They can be compared even across different groups having different units of measurement
    The Lorenz curve is a graphical measure of dispersion. It uses the information expressed in a cumulative manner to indicate the degree of variability. It is specially useful in comparing the variability of two or more distributions
  • 82. Univariate Measures: B. Dispersion
    4. Quartiles
    There are some positional measures of non-central location where it is necessary to divide the data into equal parts. They are quartiles, deciles and percentiles (the quartiles, together with the median, divide the array into four equal parts, deciles into ten equal groups, and percentiles into one hundred equal groups)
    Quartiles: Measure dispersion when the median is used as the average
    Lower quartile: Value in the array below which there are one quarter of the observations
    Upper quartile: Value in the array below which there are three quarters of the observations
    Interquartile range: Difference between the quartiles
    - The interquartile range can be called a positional measure of variability
    - While the range is overly sensitive to the number of observations, the interquartile range can either decrease or increase when further observations are added to the sample
    - Useful as a measure of dispersion to study special collections of data like salaries of employees
    Example: 4 6 7 8 9 10 11 11 11 12 13
    Lower quartile is 7 (3rd item); Upper quartile is 11 (9th item); Interquartile range is 4
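The quartiles of the running example can be obtained with the Python standard library. This is a sketch for illustration: `statistics.quantiles` with `method="exclusive"` places the k-th quartile at position k(n+1)/4 of the ordered data, which is an assumption about the convention intended in the slide; other conventions can give slightly different quartiles.

```python
import statistics

data = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]

# Cut points dividing the ordered data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

The middle cut point q2 is the median, so the same call also confirms the value M = 10 found earlier for this series.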
  • 83. Normal Distribution
    To understand skewness (asymmetry), testing of hypotheses (Part 9) and interpretation of data (Part 10) it is necessary to know about the normal distribution. The normal frequency distribution is developed from a frequency histogram with large sample size and small cell intervals. The normal curve being a perfectly symmetrical curve (symmetrical about µ), the mean, median and mode of the distribution are one and the same (µ = M = Z). The curve is uni-modal and bell-shaped and the data values concentrate around the mean. Sampling distributions based on a parent normal distribution are manageable analytically. The normal curve is not just one curve but a family of curves which differ only with regard to the values of µ and σ, but have the same characteristics in all other respects. The height is maximum at the mean value and declines as we go in either direction from the mean, and the tails extend indefinitely on both sides. The first and the third quartiles are equidistant from the mean. The height is given by the equation
    $Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - \mu}{\sigma}\right)^2}$
  • 84. Normal Distribution …contd.
    It is a special continuous distribution. A great many techniques used in applied statistics are based on it. Many populations encountered in the course of research in many fields seem to have a normal distribution to a good degree of approximation (in other words, nearly normal distributions are encountered quite frequently). Sampling distributions based on a parent normal distribution are manageable analytically
    Definition: The random variable x is said to be normally distributed if its density function is given by
    $f(x) = n(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$, where $-\infty < x < \infty$ and $\int_{-\infty}^{\infty} n(x)\,dx = 1$
    (Since n(x) is given to be a density function, it is implied that $\int_{-\infty}^{\infty} n(x)\,dx = 1$)
    When the function is plotted for several values of σ (standard deviation), a bell shaped curve as shown below can be seen. Changing µ (mean) merely shifts the curves to the right or left without changing their shapes. The function given actually represents a two-parameter family of distributions, the parameters being µ and σ² (mean and variance)
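The density function can be evaluated directly, and the condition that the total area under the curve equals 1 can be checked numerically. The Python sketch below is illustrative: the function names `normal_pdf` and `area_under_curve` are assumptions of this example, the area is approximated by the trapezoidal rule over µ ± 8σ, and `statistics.NormalDist` from the standard library is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, steps=10000):
    """Trapezoidal approximation of the integral of the density over mu +/- 8 sigma."""
    a, b = mu - 8 * sigma, mu + 8 * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + k * h, mu, sigma) for k in range(1, steps))
    return total * h

peak = normal_pdf(0.0)                 # maximum height of the standard normal curve
area = area_under_curve(9.27, 2.63)    # illustrative mu and sigma
print(round(peak, 4), round(area, 6))
print(round(NormalDist(9.27, 2.63).pdf(9.27), 4))
```

Changing `mu` only shifts the curve, while a larger `sigma` lowers and widens it; the area stays at 1 in every case.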
  • 85. Normal Distribution …contd.
    The experimenter must know, at least approximately, the general form of the distribution function which the data follow. If it is normal, the methods may be used directly; if it is not, the data may be transformed so that the transformed observations follow a normal distribution. When the experimenter does not know the form of the population distribution, other more general but usually less powerful methods of analysis, called non-parametric methods, must be used
    An important property of the normal distribution for researchers is that if x follows a normal distribution and the area under the normal curve is taken as 1, then the probability that x is within
    1 standard deviation of the mean is 68%
    2 standard deviations of the mean is 95%
    3 standard deviations of the mean is 99.7%
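The three percentages follow from the normal curve itself: the probability of falling within k standard deviations of the mean is erf(k/√2). A minimal Python sketch (the function name `prob_within` is an assumption of this example):

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 1))
```

The result does not depend on µ or σ, which is why the same rule applies to every member of the normal family.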
  • 86. Normal Curves
  • 87. Z-score or standardised normal deviation
    The area under the normal curve bounded by the class interval for any given class represents the relative frequency of that class. The area under the curve lying between any two vertical lines at points A and B along the X-axis represents the probability that the random variable x takes on a value in the interval bounded by A and B. By finding the area under the curve between any two points along the X-axis we can find the percentage of data occurring between these two points.
    The computed value $Z = (x - \mu) / \sigma$ is known as the Z-score or standardised normal deviation. The value of Z follows a normal probability distribution with a mean of zero and a standard deviation of one. This probability distribution is known as the standard normal probability distribution. This allows us to use only one table of areas for all types of normal distributions. The standard table of Z scores gives the areas under the curve between the standardised mean zero and the points to the right of the mean, for all points that are at a distance from the mean in multiples of 0.01σ. It should be noted that only the areas are to be subtracted or added. Do not add or subtract the Z scores and then find the area for the resulting value.
  • 88. Z-score or standardised normal deviation …contd.
    TABLE: Normal Distribution
    Z      Prob.      Z      Prob.      Z      Prob.
    3.0    .999       0.8    .788       -1.4   .081
    2.8    .997       0.6    .726       -1.6   .055
    2.6    .995       0.4    .655       -1.8   .036
    2.4    .992       0.2    .579       -2.0   .023
    2.2    .986       0.0    .500       -2.2   .014
    2.0    .977       -0.2   .421       -2.4   .008
    1.8    .964       -0.4   .345       -2.6   .005
    1.6    .945       -0.6   .274       -2.8   .003
    1.4    .919       -0.8   .212       -3.0   .001
    1.2    .885       -1.0   .159
    1.0    .841       -1.2   .115
    The standardised normal distribution often used is obtained by assuming the mean as zero (µ = 0) and the SD as one (σ = 1). Then,
    x scale:   µ-3σ   µ-2σ   µ-σ   µ   µ+σ   µ+2σ   µ+3σ
    z scale:   -3     -2     -1    0   +1    +2     +3
    $z = (x_i - \mu) / \sigma$
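The probabilities in the table above are cumulative areas to the left of each Z value, and they can be regenerated with `statistics.NormalDist` from the Python standard library. The sketch below is illustrative; the helper name `z_score` and the example values of µ and σ are assumptions.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardise x: distance from the mean in units of the standard deviation."""
    return (x - mu) / sigma

std = NormalDist()   # standard normal: mean 0, SD 1

# Cumulative probability P(Z <= z) for z = -3.0, -2.8, ..., 3.0.
table = {z / 10: round(std.cdf(z / 10), 3) for z in range(-30, 31, 2)}

# Area between two points: subtract the areas, not the Z scores.
area_between = std.cdf(1.0) - std.cdf(-1.0)

z = z_score(13, mu=9.27, sigma=2.63)   # illustrative values
print(round(z, 2), round(std.cdf(z), 3), round(area_between, 3))
```

Subtracting the two cumulative areas, as in `area_between`, follows the rule in the slides that only areas may be added or subtracted.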
  • 89. Univariate Measures: C. Measure of Asymmetry (Skewness)
    The distribution of items in a normal series is perfectly symmetrical. The curve drawn from a normal distribution is bell shaped and shows no asymmetry (skewness), i.e., $\bar{X} = M = Z$ for a normal curve.
    An asymmetrical distribution skewed to the right, i.e., with the curve distorted on the right, has positive skewness ($Z < M < \bar{X}$); a curve distorted to the left has negative skewness ($Z > M > \bar{X}$) (see figure)
    Skewness: The difference between the mean and the mode or the median, i.e., Skewness = $\bar{X} - Z$ OR $\bar{X} - M$
    Coeff. of skewness (J) = $(\bar{X} - Z) / \sigma$ OR $3(\bar{X} - M) / \sigma$
    Skewness shows the manner in which the items are clustered around the average. It is useful in the study of the formation of series and gives an idea about the shape of the curve
    Example: 4 6 7 8 9 10 11 11 11 12 13
    Skewness = 9.27 - 11 = -1.73 or 9.27 - 10 = -0.73
    J = -1.73 / 2.63 = -0.66 or 3 × (-0.73) / 2.63 = -0.83
    Hence negatively skewed. Check the following for positive skewness: 7, 8, 8, 9, 9, 10, 12, 14, 15, 16, 18
    Kurtosis is a measure of the flat-toppedness of a curve, i.e., its humpedness. It indicates the nature of the distribution of items in the middle of a series (Mesokurtic: kurtic in the centre, i.e., the normal curve; Leptokurtic: more peaked than the normal curve; Platykurtic: more flat than the normal curve)
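The two coefficients of skewness can be computed from the mean, median, mode and SD. The Python sketch below is illustrative only; the function name `pearson_skewness` is an assumption, the SD uses the divisor n, and for a series with more than one most frequent value `statistics.mode` returns the first one encountered, which is a simplification.

```python
import statistics

def pearson_skewness(values):
    """Return ((mean - mode) / sd, 3 * (mean - median) / sd) with sd using divisor n."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)      # first most frequent value if there are ties
    sd = statistics.pstdev(values)
    return (mean - mode) / sd, 3 * (mean - median) / sd

example = [4, 6, 7, 8, 9, 10, 11, 11, 11, 12, 13]
homework = [7, 8, 8, 9, 9, 10, 12, 14, 15, 16, 18]

j_mode, j_median = pearson_skewness(example)
print(round(j_mode, 2), round(j_median, 2))   # both negative: tail to the left

h_mode, h_median = pearson_skewness(homework)
print(h_mode > 0, h_median > 0)               # sign check only, values left as homework
```

A negative coefficient indicates that the mean has been pulled below the median and mode by low values, and a positive one indicates the opposite.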
  • 90. Normal Curve and Skewness (figure)
  • 91. Relationship Between Measures of Variability (M D, S D and Semi-interquartile Range) (figure)
  • 92. Summary of Examples
    Summary of examples: 4 6 7 8 9 10 11 11 11 12 13
    Univariate Measures:
    A. Central Tendency
       1. Mean X̄                      9.27
       2. Median M                    10
       3. Mode Z                      11
       4. G.M.
       5. H.M.
    B. Dispersion
       1. Range                       9
       2. Mean deviation              2.25
       3. Coefficient of MD           0.24
       4. Standard deviation          2.64
       5. Coefficient of SD           0.28
       6. Coefficient of variation    28
       7. Variance                    6.97
       8. Lower quartile              7
       9. Upper quartile              11
       10. Inter quartile range       4
    C. Asymmetry
       1. Skewness                    w.r.t. mode –1.73    w.r.t. median –0.73
       2. Coefficient of skewness     w.r.t. mode –0.66    w.r.t. median –0.83
    Home work: 7, 8, 8, 9, 9, 10, 12, 14, 15, 16, 18
  • 93. Bivariate & Multivariate Measures
    A. Relationship
       ¾ To find the relation between 2 or more variables
       ¾ If related, whether directly or inversely, and the degree of relation
       ¾ Is it a cause and effect relationship?
       ¾ If so, its degree and direction
       1. Association (Attributes)
          (i) Cross tabulation
          (ii) Yule’s coefficient of association
          (iii) Chi-square test
          (iv) Coefficient of mean square contingency
       2. Correlation (Quantitative)
          (i) Spearman’s (rank) coefficient of correlation (ordinal)
          (ii) Pearson’s coefficient of correlation
          (iii) Cross tabulation and scatter diagram
       3. Cause and Effect (Quantitative)
          (i) Simple (linear) correlation & regression
          (ii) Multiple (complex) correlation & regression
          (iii) Partial correlation
    B. Other Measures / Techniques
       1. Index number  2. Time series analysis  3. ANOVA  4. ANCOVA  5. Discriminant analysis  6. Factor analysis  7. Cluster analysis  8. Model building
  • 94. Common Measures of Relationship
    Measure | Nature of variables | Comment
    1. Pearson product moment | Two continuous variables; interval or ratio scale | Relationship linear
    2. Rank order or Kendall’s tau | Two continuous variables; ordinal scale |
    3. Correlation ratio (eta) | One variable continuous, other either continuous or discrete | Relationship nonlinear
    4. Intraclass | One variable continuous, other discrete; interval or ratio scale | Purpose: to determine within-group similarity
    5. Biserial, point biserial | One variable continuous, other (a) continuous but dichotomised, or (b) true dichotomy | Index of item discrimination (used in item analysis)
    6. Phi coefficient | Two true dichotomies; nominal or ordinal series |
    7. Partial correlation | Three or more continuous variables | Purpose: to determine the relationship between two variables, with the effect of the third held constant
    8. Multiple correlation | Three or more continuous variables | Purpose: to predict one variable from a linear weighted combination of two or more independent variables
    9. Kendall’s coefficient of concordance | Three or more continuous variables; ordinal series | Purpose: to determine the degree of (say, interrater) agreement
  • 95. Measures / Tests of Association
    1. Cross Tabulation
       9 Useful in finding relationships in nominal data
       9 But not a powerful form of measure / test
       9 Classify each variable into two or more categories
       9 Begin with a two-way table to see whether there is an interrelationship between the variables
       9 Then cross classify the variables in subcategories to look for interaction between them:
          (i) Symmetrical relationship: two variables vary together, but neither is (assumed to be) due to the other
          (ii) Reciprocal relationship: two variables mutually influence or reinforce each other
          (iii) Asymmetrical relationship: one (independent) variable is responsible for change in the other (dependent) variable
       9 Conditional relationships can also be sought by introducing a third factor and cross-classifying the three variables, i.e., to see whether X affects Y only when Z is held constant
       9 Cross tabulate a dependent variable (of importance) against one or more independent variables
       9 Show the percentages in the cells of the cross tabulation
       9 Look for valid (not spurious) explanations
       9 Ask whether the differences are statistically significant
  • 96. Example: Given below is the data regarding reference queries received by a library. Is there a significant association between gender of user and type of query?
                       L R query    S R query    Total
       Male users         17           18          35
       Female users        3           12          15
       Total              20           30          50
    Expected frequencies are worked out like E11 = 20 × 35 / 50 = 14
    Expected frequencies are:
                 L      S      Total
       M        14     21       35
       W         6      9       15
       Total    20     30       50
       Cells    Oij    Eij    (Oij – Eij)    (Oij – Eij)² / Eij
       1,1      17     14         3          9/14 = 0.64
       1,2      18     21        -3          9/21 = 0.43
       2,1       3      6        -3          9/6  = 1.50
       2,2      12      9         3          9/9  = 1.00
       Total (∑)                             χ² = 3.57
    df = (c – 1)(r – 1) = (2 – 1)(2 – 1) = 1
    Table value of χ² for 1 df at 5% significance is 3.841. Hence the association is not significant.
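The chi-square working above can be reproduced with a small Python sketch. The function name `chi_square` is illustrative, and the critical value 3.841 is taken from the slide rather than computed.

```python
def chi_square(observed):
    """Return (chi2, expected, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of a cell = row total x column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, expected, df

# Rows: male, female users; columns: L R query, S R query
observed = [[17, 18], [3, 12]]
chi2, expected, df = chi_square(observed)

CRITICAL_5PCT_DF1 = 3.841  # table value quoted on the slide
significant = chi2 > CRITICAL_5PCT_DF1
print(round(chi2, 2), df, significant)
```

Because the function works on any r × c table, the same call can be reused for larger contingency tables, with the critical value looked up for the corresponding degrees of freedom.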
  • 97. 2. Association: Yule’s Coefficient of Association
    $Q_{AB} = \frac{(AB)(ab) - (Ab)(aB)}{(AB)(ab) + (Ab)(aB)}$
    (AB) = frequency of class AB, in which A and B are both present
    (Ab) = frequency of class Ab, in which A is present but B is absent
    Q_AB takes values between +1 and –1 and indicates the degree of association.
    If (AB) > (A)(B)/N, the expected frequency, then A and B are positively associated.
    If (AB) < (A)(B)/N, then A and B are negatively associated.
    If (AB) = (A)(B)/N, then A and B are independent, i.e., Q_AB = 0.
    Example:
                                       B. Immunity
       A. Inoculation        Present        Absent        Total
       Present               (AB) 5         (Ab) 2           7
       Absent                (aB) 1         (ab) 4           5
       Total                      6              6          12
    Q_AB = (5 × 4 – 2 × 1) / (5 × 4 + 2 × 1) = 18 / 22 = 0.82
    (AB) = 5 > (A)(B)/N = (7 × 6) / 12 = 3.5
    The association of A and B in the population may be due to a third attribute C. In such a case the partial association (as against total association) between A and B is determined by
    $Q_{AB.C} = \frac{(ABC)(abC) - (AbC)(aBC)}{(ABC)(abC) + (AbC)(aBC)}$
    Illusory association: there is no real association between A and B, but both are associated with a third attribute.
    Reasons: (i) A and B are not properly defined (ii) A and B are not properly / correctly recorded
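A minimal Python sketch of Yule's coefficient for the inoculation example follows. The function and argument names are illustrative; the four cell frequencies are those of the 2 × 2 table on this slide.

```python
def yules_q(ab, a_not_b, not_a_b, not_a_not_b):
    """Yule's coefficient of association for a 2 x 2 table.

    ab          -- (AB): A and B both present
    a_not_b     -- (Ab): A present, B absent
    not_a_b     -- (aB): A absent, B present
    not_a_not_b -- (ab): A and B both absent
    """
    cross_same = ab * not_a_not_b
    cross_diff = a_not_b * not_a_b
    return (cross_same - cross_diff) / (cross_same + cross_diff)

# Inoculation (A) against immunity (B)
q = yules_q(5, 2, 1, 4)

# Expected frequency of (AB) under independence: (A)(B)/N
total_a, total_b, n = 5 + 2, 5 + 1, 12
expected_ab = total_a * total_b / n

print(round(q, 2), expected_ab)
```

The observed (AB) of 5 exceeds the expected frequency, which agrees with the positive value of Q.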
  • 98. 2. Association: Yule’s Coefficient of Association contd.
    4 × 4 Contingency Table
                                    Attribute A
       Attribute B      A1        A2        A3        A4        Total
       B1             (A1B1)    (A2B1)    (A3B1)    (A4B1)     (B1)
       B2             (A1B2)    (A2B2)    (A3B2)    (A4B2)     (B2)
       B3             (A1B3)    (A2B3)    (A3B3)    (A4B3)     (B3)
       B4             (A1B4)    (A2B4)    (A3B4)    (A4B4)     (B4)
       Total           (A1)      (A2)      (A3)      (A4)       N
    Reduced to 2 × 2 Table
                            Attribute A
       Attribute B       A        a       Total
       B               (AB)     (aB)      (B)
       b               (Ab)     (ab)      (b)
       Total            (A)      (a)       N
    Note: Tables larger than 2 × 2 have to be reduced to 2 × 2 by combining some classes in order to use this method.
  • 99. Yule’s Coefficient of Association contd.
    Example 1: The number of books issued on a random sample of days in 2005 and 2006 are as follows
       2005    2006
         36      37
         34      78
         28      97
         89      89
         32      37
         22      34
         39      33
         44      22
         27     114
         49      33
        114      35
         33      17
    Example 2: Data on the number of books issued from a library during the course of a week (both actual and expected)
       Day      Actual    Expected
       Mon        39       42.17
       Tue        14       42.17
       Wed        21       42.17
       Thu        47       42.17
       Fri        36       42.17
       Sat        96       42.17
       Total     253      253(.02)
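The slide gives the actual and expected issues for Example 2 but does not show a test. One plausible use, assumed here, is a chi-square goodness-of-fit test against equal issues on each day (expected = total / 6 ≈ 42.17). The critical value 11.070 is the standard table value of χ² for 5 degrees of freedom at the 5% level.

```python
# Actual issues, Monday to Saturday
actual = [39, 14, 21, 47, 36, 96]

# Expected issues if every day were equally busy (assumed null hypothesis)
expected = sum(actual) / len(actual)

chi2 = sum((o - expected) ** 2 / expected for o in actual)
df = len(actual) - 1

CRITICAL_5PCT_DF5 = 11.070  # standard chi-square table value, 5 df, 5% level
print(round(expected, 2), round(chi2, 2), df, chi2 > CRITICAL_5PCT_DF5)
```

Under that assumption the statistic is far above the critical value, so the daily pattern of issues would not be regarded as uniform.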
  • 100. Yule’s Coefficient of Association contd.
    Example 3: In 1984-5, a library authority spent Rs. 550 000 on staff, Rs. 230 000 on books and Rs. 140 000 on other items. In 1987-8, the authority spent Rs. 810 000 on staff, Rs. 330 000 on books and Rs. 210 000 on other items. Did the pattern of expenditure change significantly between 1984-5 and 1987-8?
    The observed data can be compiled into a contingency table as shown:
    Contingency table of observed frequencies, Expenditure (‘000s)
       Year       Staff     Books     Other     Total
       1984-5      550       230       140       920
       1987-8      810       330       210      1350
       Total      1360       560       350      2270
    A table of expected frequencies can be deduced as shown:
    Contingency table of expected frequencies, Expenditure (‘000s)
       Year       Staff      Books      Other      Total
       1984-5     551.19     226.96     141.85      920
       1987-8     808.81     333.04     208.15     1350
       Total     1360        560        350        2270
  • 101. 2. Correlation: i. Cross Tabulation, Correlation Table & Scatter Diagram
    Frequency of use of a number of documents of different ages
       Doc. No.    Age of doc. (years)    Frequency of use (times / year)
          1                1                         40
          2                3                         18
          3                2                         30
          4                4                         21
          5                3                         26
          6                5                         10
          7                4                         13
          8                3                         35
    Correlation table of age and frequency of use of documents
       Freq. of use             Age of doc. (yrs)
       (times / year)      1     2     3     4     5     Total
       1-10                                        1       1
       11-20                           1     1             2
       21-30                     1     1     1             3
       31-40               1           1                   2
       Total               1     1     3     2     1       8
    Monthly totals of books, journals and reports issued from a library
       Month    Reps     Bks     Jls     Total
       Jan       465    3216     713     4394
       Feb       513    3215     686     4414
       Mar       425    3126     996     4547
  • 102. 2. Correlation: i. Cross Tabulation, Correlation Table & Scatter Diagram contd. (figure)
  • 103. Correlation Scatter Diagrams (figure)
  • 104. ii. Spearman’s Coefficient of (Rank Order) Correlation
    Used only between two variables which are ordinal in nature; helps to decide whether two sets of ranks differ and the extent to which they differ.
    $r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$
    d_i = difference between the ranks of the i-th pair of the two variables
    n = number of pairs of observations
    Example: Boys and girls were questioned about their reading interests and asked to put various types of novel into their order of preference, with the following results:
       Type of novel         Rank orders        d_i     d_i²
                             Boys    Girls
       Animal stories          4       2         2        4
       Historical novels       3       3         0        0
       Romances                5       1         4       16
       War stories             1       5        -4       16
       Westerns                2       4        -2        4
                                                 ∑ d_i² = 40
    r_s = 1 – (6 × 40) / (5 × 24) = 1 – 2 = –1 (that means perfect negative correlation)
    r_s varies from +1 to –1; r_s = 0 indicates that the two sets of rankings are dissimilar / independent.
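The rank correlation example above can be verified with a short Python sketch. The function name is illustrative, and the formula used is the simple one from this slide, which assumes there are no tied ranks.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's rank correlation from two lists of ranks (no ties assumed)."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks given by boys and girls to the five types of novel
boys = [4, 3, 5, 1, 2]
girls = [2, 3, 1, 5, 4]

rs = spearman_rs(boys, girls)
print(rs)
```

For the homework on mean scores, the scores would first have to be converted into ranks within each group before being passed to the function.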
  • 105. ii. Spearman’s Coefficient of (Rank Order) Correlation contd.
    Homework: Given below are the mean scores on a 5 point scale about the nature & type of information required by a group of physicists and another group of mechanical engineers. Find the correlation of their rankings. (Carry out a t-test at the 5% significance level.)
                                                          Physicists    Mech. Engrs.
       A. State of the art                                   2.60           1.17
       B. Theoretical background                             2.98           2.71
       C. Experimental results                               2.67           2.34
       D. Methods, processes & procedures                    2.62           2.07
       E. Product, material & equipment information          2.45           2.23
       F. Computer programs & model building info.           2.00           0.85
       G. Standard & patent spec.                            0.93           2.15
       H. Physical, technical & design data                  3.05           2.65
       I. S & T news                                         2.29           2.53
       J. General information                                1.21           0.92
  • 106. iii. Pearson’s (Product Moment) Coefficient of Correlation (Simple Correlation)
    $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n \, \sigma_x \, \sigma_y}$
    Assuming zero as mean:
    $r = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sqrt{\sum X_i^2 - n \bar{X}^2} \, \sqrt{\sum Y_i^2 - n \bar{Y}^2}}$
    With assumed averages A_x and A_y:
    $r = \frac{\frac{\sum d_{xi} d_{yi}}{n} - \frac{\sum d_{xi}}{n} \cdot \frac{\sum d_{yi}}{n}}{\sqrt{\frac{\sum d_{xi}^2}{n} - \left(\frac{\sum d_{xi}}{n}\right)^2} \, \sqrt{\frac{\sum d_{yi}^2}{n} - \left(\frac{\sum d_{yi}}{n}\right)^2}}$
    where
       $\sum d_{xi} = \sum (X_i - A_x)$
       $\sum d_{yi} = \sum (Y_i - A_y)$
       $\sum d_{xi}^2 = \sum (X_i - A_x)^2$
       $\sum d_{yi}^2 = \sum (Y_i - A_y)^2$
       $\sum d_{xi} d_{yi} = \sum (X_i - A_x)(Y_i - A_y)$
  • 107. iii. Pearson’s (Product Moment) Coefficient of Correlation (Simple Correlation)
    Most widely used method; it assumes (i) a linear relationship (ii) variables are not causally related (iii) a normal distribution of observations of both variables
    $r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \, \sqrt{\sum (Y_i - \bar{Y})^2}}$
    EXAMPLE: The following table gives the approximate number of abstracts (in thousands) in a selection of volumes of abstracts together with the cost of each volume:
       Number of abstracts (X)    Cost (Y)
            (thousands)             (£)
               36.7                 115
                8.5                  52
               12.5                  75
                3.9                  31
                0.5                   9
                1.3                  12
                4.1                  20
               19.4                  56
                4.3                  24
       Total   91.2                 394
       X̄ = 10.1                 Ȳ = 43.8
  • 108. iii. Pearson’s Coefficient of Correlation (Simple Correlation) Example contd.
    Given below are the average no. of years of experience (X) and the average no. of books borrowed per month (Y). Find the product moment correlation between the two.
                 X      Y      XY      X²      Y²
                 1      2       2       1       4
                 2      4       8       4      16
                 4      5      20      16      25
                 5      7      35      25      49
                 5      8      40      25      64
       Total    17     26     105      71     158
    X̄ = 17/5 = 3.4    Ȳ = 26/5 = 5.2    ∑X² = 71    ∑Y² = 158    ∑XY = 105
    r = (105 – 5 × 3.4 × 5.2) / (√(71 – 5 (3.4)²) × √(158 – 5 (5.2)²))
      = (105 – 88.4) / (3.63 × 4.77)
      = 16.6 / 17.315
      = +0.96
    NOTE:
    1. Most widely used method
    2. It assumes (i) linear relationship (ii) normal distribution (iii) variables are causally related, but r does not indicate a cause and effect relationship (iv) it is independent of change of scale and origin
    3. The value of r varies from +1 (perfect positive correlation) to –1 (perfect negative correlation). Zero indicates the absence of association / relationship.
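The worked example above uses the computational form of the product moment formula, which can be sketched in Python as follows. The function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation using sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * mean_x * mean_y
    denominator = (math.sqrt(sum_x2 - n * mean_x ** 2)
                   * math.sqrt(sum_y2 - n * mean_y ** 2))
    return numerator / denominator

# Years of experience (X) and books borrowed per month (Y)
experience = [1, 2, 4, 5, 5]
borrowed = [2, 4, 5, 7, 8]

r = pearson_r(experience, borrowed)
print(round(r, 2))
```

The same function can be applied to any paired data, for instance the abstracts and cost figures of the previous example.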
  • 109. iii. Pearson’s Coefficient of Correlation For Grouped Data contd.
    For grouped data:
    $r = \frac{\sum \sum f_{ij} (X_i - \bar{X})(Y_j - \bar{Y})}{\sqrt{\sum f_i (X_i - \bar{X})^2} \, \sqrt{\sum f_j (Y_j - \bar{Y})^2}}$
    Short cut approach with assumed means:
    $r = \frac{\frac{\sum \sum f_{ij} d_{xi} d_{yj}}{n} - \frac{\sum f_i d_{xi}}{n} \cdot \frac{\sum f_j d_{yj}}{n}}{\sqrt{\frac{\sum f_i d_{xi}^2}{n} - \left(\frac{\sum f_i d_{xi}}{n}\right)^2} \, \sqrt{\frac{\sum f_j d_{yj}^2}{n} - \left(\frac{\sum f_j d_{yj}}{n}\right)^2}}$
    t-test for significance:
    $t = r \sqrt{\frac{n - 2}{1 - r^2}}$ and, for rank correlation, $t = r_s \sqrt{\frac{n - 2}{1 - r_s^2}}$
    Example: t = 0.96 × √(5 – 2) / √(1 – (0.96)²) = 5.939
    df = n – 2 = 5 – 2 = 3; the tabulated value of t for 3 d.f. for a two-tailed test at the 5% significance level is 3.182. Hence r is significant at the 5% significance level.
    HOMEWORK: Following are the average no. of ref. queries asked (X) and the average no. of books indented (Y) during a study of library users for three months. Find the correlation coefficient between the two. Carry out a t-test at the 5% significance level.
       X    Y
       5    4
       6    3
       1    2
       4    6
       2    3
    (Ans. r = +0.41) …contd
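The significance test for a correlation coefficient can be sketched in Python as below. The sketch assumes n – 2 degrees of freedom, which is the standard choice for this statistic, and uses 3.182, the standard two-tailed 5% table value of t for 3 degrees of freedom. The function name is illustrative.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a correlation r from n pairs is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.96, 5
t = t_for_r(r, n)
df = n - 2  # assumption: standard degrees of freedom for this test

CRITICAL_T_5PCT_DF3 = 3.182  # two-tailed, 5% level, 3 d.f. (standard table)
print(round(t, 3), df, t > CRITICAL_T_5PCT_DF3)
```

The same function applies to Spearman's r_s, and to the homework value of r once its own n is supplied.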
  • 110. (Figure: The Correlation Between a Preference for Violent Television and Peer-rated Aggression for 211 Boys Over a 10-Year Lag. Box labels: Preference for violent TV in the 3rd grade (shown twice); Aggression in the 3rd grade; Aggression in the 13th grade. Correlation coefficients shown on the paths: 0.05, 0.01, 0.21, -0.05, 0.38, 0.31)
  • 111. 3. Cause & Effect Relationships
    • Visual inspection of the correlation table and scatter diagram indicates the existence and direction of a relation
    • The correlation coefficient shows the magnitude as well as the direction of the relationship
    • Regression analysis shows the cause and effect relationship, i.e., the independent variable (X) is the cause and the dependent variable (Y) is the effect
    i. Regression Analysis (Simple / Linear)
    ¾ Describes in quantitative terms the underlying (cause & effect) relationship or correlation between two sets of data (two variables)
    ¾ Helps in predicting the value of the dependent variable for a given / known value of the independent variable
    Regression equation of Y on X (simple / linear): Ŷ = a + bX
       Ŷ = estimated value of Y for a given value of X (a and b are constants)
       a = parameter which tells at what value the straight line cuts the Y axis
       b = slope or gradient of the regression line, i.e., a unit change in X produces a change of b in Y
    Note:
    1. The relationship between X & Y may take any form, but here it is assumed to be linear, i.e., a straight line
    2. Practical data may fit near or close to a straight line
    3. The objective is to fit a regression line with minimum error values (differences between the observed values and expected values)
    4. To find the ‘best’ fit the least squares method is used
  • 112. 3. Cause & Effect Relationships: i. Regression Analysis contd.
    To find the ‘best’ fit the least squares method is used.
    (Figure: three scatter plots of Y against X, titled No relationship, Actual values and Best fit)
    The least squares method provides two normal equations to determine the constants a and b:
       $\sum Y = n a + b \sum X$
       $\sum XY = a \sum X + b \sum X^2$
    The basic equation is $Y = a + bX + E$
       $TSS = \sum (Y - \bar{Y})^2$    $RSS = \sum (\hat{Y} - \bar{Y})^2$    $ESS = \sum (Y - \hat{Y})^2$
       $TSS = RSS + ESS$
       $F = \frac{RSS / 1}{ESS / (n - 2)}$
    EXAMPLE: Given below are the estimated use of a library (Y) for a corresponding expenditure on promotion and user orientation (X). Fit the best regression line. Estimate (i.e., predict) the use for an expenditure of Rs. 8000. If the library would like to reach a level of use of 70,000, what should be the expenditure on promotion and user orientation? …contd.
  • 113. 3. Cause & Effect Relationships: i. Regression Analysis contd.
       X (in thousands    Y (in ten       XY      X²      Ŷ (expected)    Error
       of Rs.)            thousands)
             5                4           20      25         4.02         -0.02
             6                3           18      36         4.32         -1.32
             1                2            2       1         2.82         -0.82
             4                6           24      16         3.72         +2.28
             2                3            6       4         3.12         -0.12
       Total 18              18           70      82        18              0
            (∑X)            (∑Y)        (∑XY)   (∑X²)
    Note: If r has been calculated, ∑X, ∑Y, ∑XY & ∑X² are readily available.
       18 = 5a + 18b
       70 = 18a + 82b
       ⇒ a = 2.52, b = 0.3 ⇒ Ŷ = 2.52 + 0.3X
    (i) X = 8 (Rs. 8000) ⇒ Ŷ = 2.52 + 0.3 × 8 = 4.92, i.e., a use of 49,200
    (ii) Y = 7 (70,000) ⇒ 7 = 2.52 + 0.3X ⇒ X = 14.93, i.e., Rs. 14,930
    Regression coefficient & coefficient of determination
       The constant b is called the regression coefficient.
       The regression of X on Y is X = α + βY, and b β = r².
       Coefficient of determination r² = RSS / TSS; it varies between zero and +1.
       As r² tends closer to +1, the independent variable explains more of the movements in the dependent variable.
       r is the correlation coefficient and varies from –1 to +1.
    Homework:
       No. of books (X)    Age (Y)
              5              23
              6              35
              8              41
             12              58
             15              75
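Solving the two normal equations for the example can be sketched in Python as follows. The function name is illustrative. The slide rounds the constants to a = 2.52 and b = 0.3, so predictions from the unrounded coefficients differ slightly from the figures shown above.

```python
def fit_line(xs, ys):
    """Least squares constants a, b for Y = a + bX, from the normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eliminating a from the two normal equations gives b, then a follows
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

expenditure = [5, 6, 1, 4, 2]   # X, in thousands of Rs.
use = [4, 3, 2, 6, 3]           # Y, in ten thousands

a, b = fit_line(expenditure, use)

predicted_use = a + b * 8            # use expected for Rs. 8000
required_spend = (7 - a) / b         # expenditure needed for a use of 70,000

print(round(a, 2), round(b, 2))
print(round(predicted_use, 2), round(required_spend, 2))
```

The same function can be used for the homework data by passing the number of books as X and age as Y.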
  • 114. ii. Multiple Correlations and (Non-Linear or Complex) Regressions
    Involves two or more independent variables:
       $\hat{Y} = a + b_1 x_1 + b_2 x_2$
    The normal equations are
       $\sum y_i = n a + b_1 \sum x_{1i} + b_2 \sum x_{2i}$
       $\sum x_{1i} y_i = a \sum x_{1i} + b_1 \sum x_{1i}^2 + b_2 \sum x_{1i} x_{2i}$
       $\sum x_{2i} y_i = a \sum x_{2i} + b_1 \sum x_{1i} x_{2i} + b_2 \sum x_{2i}^2$
    Problem of multicollinearity: the regression coefficients b1 and b2 become less reliable if there is a high degree of correlation between the independent variables X1 and X2.
    The collective effect of the independent variables X1 and X2 is given by the coefficient of multiple correlation
       $R_{y.x_1 x_2} = \sqrt{\frac{b_1 \left(\sum y_i x_{1i} - n \bar{y} \bar{x}_1\right) + b_2 \left(\sum y_i x_{2i} - n \bar{y} \bar{x}_2\right)}{\sum y_i^2 - n \bar{y}^2}}$
    or, with the variables measured as deviations from their means, i.e., $x_{1i} = (X_{1i} - \bar{X}_1)$, $x_{2i} = (X_{2i} - \bar{X}_2)$, $y_i = (Y_i - \bar{Y})$,
       $R_{y.x_1 x_2} = \sqrt{\frac{b_1 \sum x_{1i} y_i + b_2 \sum x_{2i} y_i}{\sum y_i^2}}$
  • 115. iii. Partial Correlation
    Partial correlation measures, separately, the relationship between two variables (i.e., the dependent and a particular independent variable) by holding all other variables constant.
    First, the simple coefficients of correlation between each pair of variables have to be calculated.
    For example, the first order coefficient (of partial correlation) measuring the effect of X1 on Y is given by
       $r_{yx_1.x_2}^2 = \frac{R_{y.x_1 x_2}^2 - r_{yx_2}^2}{1 - r_{yx_2}^2}$
    or
       $r_{yx_1.x_2} = \frac{r_{yx_1} - r_{yx_2} \, r_{x_1 x_2}}{\sqrt{1 - r_{yx_2}^2} \, \sqrt{1 - r_{x_1 x_2}^2}}$
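The second form of the partial correlation formula needs only the three simple correlation coefficients, and can be sketched in Python as below. The function name is illustrative and the input values are hypothetical, chosen only to show the calculation.

```python
import math

def partial_r(r_yx1, r_yx2, r_x1x2):
    """First order partial correlation of Y and X1, holding X2 constant."""
    numerator = r_yx1 - r_yx2 * r_x1x2
    denominator = math.sqrt(1 - r_yx2 ** 2) * math.sqrt(1 - r_x1x2 ** 2)
    return numerator / denominator

# Hypothetical simple correlations between each pair of variables
r_partial = partial_r(r_yx1=0.5, r_yx2=0.5, r_x1x2=0.5)
print(round(r_partial, 3))
```

When X2 is uncorrelated with both Y and X1, the partial coefficient reduces to the simple coefficient between Y and X1.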
  • 116. 4. Other Measures: A. Index numbers
    An index number is a device to measure the magnitude of
       (i) Change in the price, quantity or value of an item or, more usually, a group of items over time, or
       (ii) Difference between two similarly measured quantities
    Example: Total number of issues of volumes of non-fiction by a library in a number of years
       Year                1960    1961    1962    1963    1964
       Number of issues    8094    9288    8416    9271    8233
    i. Fixed Base Index
    Simple indexes for issues of volumes of non-fiction by a library in a number of years (1960 = 100)
       Year     1960    1961      1962      1963      1964
       Index    100     114.75    103.98    114.54    101.72
  • 117. 4. Other Measures: A. Index numbers contd.
    ii. Chain Base Index
    Chain base index for issues of volumes of non-fiction by a library in a number of years
       Year     1961      1962     1963      1964
       Index    114.75    90.61    110.16    88.80
    The change or difference is expressed as a ratio or % of a stated base or starting date, period or quantity, which is given a value of 100 points.
       Fixed base index = (Value in given year / Value in base year) × 100
       Chain base index = (Value in given year / Value in previous year) × 100
    An item in the index is given its due weight in accordance with its importance in the whole index.
       Base year weighted index = (Price in given year × Qty in base year) / (Price in base year × Qty in base year) × 100
       Given year weighted index = (Price in given year × Qty in given year) / (Price in base year × Qty in given year) × 100
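The fixed base and chain base formulas can be applied to the non-fiction issues data with a short Python sketch. The function names are illustrative.

```python
def fixed_base_index(values, base_position=0):
    """Each value as a percentage of the value in the base period."""
    base = values[base_position]
    return [100 * v / base for v in values]

def chain_base_index(values):
    """Each value as a percentage of the value in the previous period."""
    return [100 * values[i] / values[i - 1] for i in range(1, len(values))]

# Issues of non-fiction volumes, 1960 to 1964
issues = [8094, 9288, 8416, 9271, 8233]

fixed = [round(v, 2) for v in fixed_base_index(issues)]
chain = [round(v, 2) for v in chain_base_index(issues)]

print(fixed)
print(chain)
```

The chain base list starts at 1961 because the first year has no previous year to compare with.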
  • 118. 4. Other Measures: A. Index numbers contd.
    • An index number is a special type of average used to measure the level of a given phenomenon as compared to the level of the same phenomenon at some standard date. In other words, figures are reduced to a common base (e.g., by converting the series into a series of index numbers) to study the changes in the effect of factors which are incapable of being measured directly.
    • They are approximate indicators and give only a fair idea of changes.
    • Index numbers prepared for one purpose cannot be used for other purposes, or for the same purpose at other places. Chances of error also remain in them.
    Examples:
    1. Library use index = 1/100 of the no. of pages of xerox copies of reading material taken during a year + 2 times the no. of documents borrowed through ILL + 5 times the no. of visits to the library during a 3-month sample seat occupancy study + mean no. of documents borrowed during the year (both circulation sample and collection sample)
    2. Library interaction index = no. of documents suggested + no. of documents indented + no. of documents reserved + 2 times the no. of literature search services availed + no. of short range ref. queries placed
    Converting a chain base index to a fixed base index:
       Year          1      2                    3                      4                        5
       Chain base    100    103                  107                    110                      115
       Fixed base    100    100 × 103 / 100      103 × 107 / 100        110.2 × 110 / 100        121.2 × 115 / 100
                            = 103                = 110.2                = 121.2                  = 139.4
  • 119. 4. Other Measures: B. Time series analysis
    Time series: a series of successive observations of a phenomenon over a period of time
       – When the independent variable is time in a cause and effect relationship of the regression analysis type, it is time series analysis
       – It helps to estimate / predict the future
    Components of a time series
    1. Secular or long term trend (T)
    2. Short term oscillations:
       (i) Cyclical variations (C) (usually more than a year)
       (ii) Seasonal variations (S) (usually within a year)
    3. Irregular or erratic variations (I): random fluctuations, completely unpredictable, like riots, natural calamities, etc.
  • 120. 4. Other Measures: B. Time series analysis contd.
    Methods of isolating and measuring trend
    1. Free hand method 2. Semi-average method 3. Method of moving average 4. Method of least squares
    Method of moving averages
    ¾ By smoothing out fluctuations, helps to detect the trend
    ¾ With an appropriately chosen period, the method can also help to find short term variations (i.e., cyclical & seasonal)
    ¾ In addition, use of a seasonal index helps to account for seasonal variations
    ¾ The moving average helps to reduce seasonal variations while finding the trend
    Example 1: The following are daily issues of junior non-fiction from a public library
       Day    Week 1    Week 2    Week 3
       Mon      36        46        66
       Tue      31        55        76
       Wed      25        37        40
       Thu      55        80        74
       Fri      45        66        90
       Sat      90       115       150
    Method of least squares
    Taking ‘t’ as the independent variable, the equation for the secular trend is Ŷ = a + bt
    The normal equations are
       ∑Y = na + b∑t
       ∑tY = a∑t + b∑t²
    where n = no. of years. This enables forecasting future values of Y from Ŷ = a + bt.
  • 121. 4. Other Measures: B. Time series analysis contd.
    Time series analysis of data of Example 1
       Week    Day    Number of issues    Moving average    Cyclical variation
       1.       M           36
                T           31
                W           25
                T           55                47.9               +7.1
                F           45                50.7               -5.7
                S           90                53.7              +36.3
       2.       M           46                56.8              -10.8
                T           55                60.6               -5.6
                W           37                64.4              -27.4
                T           80                68.2              +11.8
                F           66                71.6               -5.6
                S          115                73.6              +41.4
       3.       M           66                73.3               -7.3
                T           76                74.8               +1.2
                W           40                79.8              -39.8
                T           74
                F           90
                S          150
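The moving average column above appears to be a centred six-day moving average (the mean of two adjacent six-day averages), since the library opens six days a week. Under that assumption, a Python sketch reproduces the tabulated values to within about 0.1; the function name is illustrative.

```python
def centred_moving_average(values, period=6):
    """Centred moving average for an even period.

    Returns a dict mapping the position in `values` to the average of the
    two adjacent `period`-length windows that straddle that position.
    """
    half = period // 2
    result = {}
    for i in range(half, len(values) - half):
        first = sum(values[i - half:i + half]) / period
        second = sum(values[i - half + 1:i + half + 1]) / period
        result[i] = (first + second) / 2
    return result

# Daily issues, Monday to Saturday, weeks 1 to 3
issues = [36, 31, 25, 55, 45, 90,
          46, 55, 37, 80, 66, 115,
          66, 76, 40, 74, 90, 150]

ma = centred_moving_average(issues)
# Variation = actual figure minus the moving average for the same day
variation = {i: issues[i] - avg for i, avg in ma.items()}

for i in sorted(ma):
    print(i, round(ma[i], 1), round(variation[i], 1))
```

The first and last three days have no moving average because a full window is not available on both sides of them.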
  • 122. 4. Other Measures: B. Time series analysis contd.
    Solution to Example 1:
       Trend: upward, i.e., increasing daily issues
       Cyclic variation: the differences between the moving average (expected) and the corresponding actual figures of issues are markedly high on Saturday and very low on Wednesday
    Home work: Daily visitors to a public library
       Day    Week 1    Week 2
       Sun      900       800
       Mon      400       500
       Tue      500       300
       Wed      600       300
       Thu      300       400
       Fri      700       600
       Sat     1100       900
  • 123. Monthly statistics of no. of searches executed on a CD-ROM database by PG students is as follows
                     Year 1                    Year 2                    Year 3
       Month    No. of     12 month       No. of     12 month       No. of     12 month
                searches   moving av.     searches   moving av.     searches   moving av.
       JAN         50                        60        68.3            50        71.7
       FEB         50                        50        68.3            40        71.7
       MAR         50                        60        68.3            70        72.5
       APR         60                        70        69.2            80        71.7
       MAY         70                        80        70.0            90        70.8
       JUN         80        64.2            90        69.2           100        71.7
       JUL         90        65.0            90        68.3           100
       AUG         90        65.0            90        67.5            90
       SEP         60        65.8            60        68.3            70
       OCT         60        66.7            70        69.2            60
       NOV         50        67.5            60        70.0            50
       DEC         60        68.3            50        70.8            60
    Secular trend: Increasing
    Seasonal variation: Maximum during Mar-May (may be exam? A seasonal index is more useful in accounting for seasonal variations)
  • 124. Quarterly statistics of user visits to a special library
                        Year 1                  Year 2                  Year 3              Average %
       Quarter     No. of     % of         No. of     % of         No. of     % of         of quarterly
                   user       quarterly    user       quarterly    user       quarterly    average
                   visits     average      visits     average      visits     average      (index)
       1            2000        80          3000       100          4000       100            93.3
       2            3000       120          3500       116.7        5000       125           120.6
       3            2000        80          2000        66.7        3000        75            73.9
       4            3000       120          3500       116.7        4000       100           112.2
       Total       10000                   12000                   16000
       Q average    2500                    3000                    4000
    If the first quarter of the 4th year records 6000 user visits, estimate the average quarterly visits for that year.
    Ans. 6000 / 93.3 × 100 = 6431
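The seasonal index calculation in the table can be sketched in Python as follows. The function name is illustrative. The slide's answer of 6431 uses the index rounded to 93.3, so the unrounded estimate is slightly lower.

```python
def seasonal_index(years):
    """Average, over years, of each quarter as a % of that year's quarterly average."""
    percentages = []
    for year in years:
        quarterly_average = sum(year) / len(year)
        percentages.append([100 * v / quarterly_average for v in year])
    # Average the percentages quarter by quarter
    return [sum(col) / len(col) for col in zip(*percentages)]

# User visits per quarter for years 1 to 3
visits = [
    [2000, 3000, 2000, 3000],
    [3000, 3500, 2000, 3500],
    [4000, 5000, 3000, 4000],
]

index = seasonal_index(visits)

# Estimated average quarterly visits in year 4, given 6000 visits in quarter 1
estimate = 6000 / index[0] * 100

print([round(v, 1) for v in index])
print(round(estimate))
```

Dividing an observed quarterly figure by its index removes the seasonal effect, which is why the first-quarter figure alone gives an estimate for the whole year's quarterly average.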
  • 125. Method of least squares
    Assumes a linear relation and that past behaviour will continue to persist in future.
    Trend line: Y = a + bt
    Normal equations:
       ∑Y = na + b∑t
       ∑tY = a∑t + b∑t²
    Example: ∑t = 0, ∑Y = 177, ∑tY = 171, ∑t² = 280, n = 15
       ⇒ 177 = 15a + b × 0
         171 = a × 0 + b × 280
       ⇒ a = 11.8, b = 0.61
    Hence the trend regression line is Y = 11.8 + 0.61t
    To find the value at t = 9 (i.e., say, sales for 1986): Y = 11.8 + 0.61 × 9 = 17.29
    Note: For the variable ‘t’ the midpoint of time is taken as the origin. Ex: -2, -1, 0, 1, 2 (odd no. of periods); -3, -2, -1, 1, 2, 3 (even no. of periods)
  • 126. 4. Other Measures: B. Time series analysis contd.
    Measurement of seasonal variations:
    1. Ratio to trend method
    2. Ratio to moving averages method
    3. Link relative method
    Measurement of cyclic variations:
    1. Harmonic analysis
    2. Spectrum analysis
    Note: Residuals remaining after elimination of the seasonal and trend components should be recorded and plotted graphically for visual comparison of the residual variations, which are attributed to the cyclic and irregular / erratic components.
  • 127. ANOVA
    • Tests the difference among different groups of data for homogeneity.
    • Useful to investigate
       (i) Any number of factors which are hypothesised or said to influence the dependent variable
       (ii) The differences amongst various categories within each of these factors, which may have a large number of values
    Example (one way ANOVA): The number of books stored per shelf in a library may be of interest. If a random sample of shelves is selected and the number of books on each shelf is counted, the quantitative data collected can be presented in a frequency table as shown in the figure. …contd.
  • 128. Frequency table showing variation of no. of books stored per shelf according to subject category (example of one way ANOVA) contd.
    Columns: Books per shelf; Number of shelves in Geography (X1), Law (X2), Production (X3); Total
    Rows (books per shelf: non-blank cell entries as extracted, with the row total last):
       16: 1 1;  17: 0;  18: 0;  19: 0;  20: 0;  21: 3 3;  22: 0;  23: 1 1 4;  24: 0;  25: 4 4;
       26: 3 3;  27: 0;  28: 1 1 2;  29: 1 1;  30: 2 2 4;  31: 0;  32: 1 1;  33: 1 1 3;  34: 1 1;  35: 0;
       36: 2 2;  37: 0;  38: 0;  39: 0;  40: 0;  41: 0;  42: 0;  43: 1 1
  • 129. ANOVA (Example of one way ANOVA) contd.
    Frequency table showing variation of number of books stored per shelf in a random sample of shelves
       Books per    Number of        Books per    Number of
       shelf        shelves          shelf        shelves
          16            1               30            4
          17            0               31            0
          18            0               32            1
          19            0               33            3
          20            0               34            1
          21            3               35            0
          22            0               36            2
          23            4               37            0
          24            0               38            0
          25            4               39            0
          26            3               40            0
          27            0               41            0
          28            2               42            0
          29            1               43            1
    One-way ANOVA considers one factor, i.e., no. of books per shelf
  • 130. Steps (example of one way ANOVA) contd.
    1. Obtain the mean of each sample, i.e., $\bar{X}_1, \bar{X}_2, \bar{X}_3$
    2. Find the mean of the sample means, i.e., $\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3}{k}$, with k = 3
    3. Find the sum of squares for variance between the samples, i.e.,
       $SS_{between} = n_1 (\bar{X}_1 - \bar{\bar{X}})^2 + n_2 (\bar{X}_2 - \bar{\bar{X}})^2 + n_3 (\bar{X}_3 - \bar{\bar{X}})^2$
    4. Calculate the variance or mean square between samples, i.e.,
       $MS_{between} = \frac{SS_{between}}{k - 1}$, with df = k – 1 = 3 – 1 = 2
    5. Find the sum of squares for variance within samples:
       $SS_{within} = \sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2 + \sum (X_{3i} - \bar{X}_3)^2$
    6. Calculate the variance or mean square within samples:
       $MS_{within} = \frac{SS_{within}}{n - k}$, with df = n – k and n = n1 + n2 + n3 + …
  • 131. Frequency table showing variation of number of books stored per shelf according to subject category (example of one way ANOVA) …contd.
    7. Check: SS for total variation = $\sum (X_{ij} - \bar{\bar{X}})^2$ = SS between + SS within, and (n – 1) = (k – 1) + (n – k)
    8. $F \text{ ratio} = \frac{MS_{between}}{MS_{within}}$
    Note: Compare with the table value of F. If the F ratio is equal to or more than the table value, the difference is significant and hence
       1. the samples could not have come from the same universe, or
       2. the independent variable has a significant effect on the dependent variable.
    The larger the value of the F ratio, the more definite and sure the conclusions.
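The eight steps can be followed in a short Python sketch. The three samples below are hypothetical (they are not the shelf data of the example), and the grand mean is taken over all observations, which equals the mean of the sample means when the samples are of equal size. The function name is illustrative.

```python
def one_way_anova(samples):
    """Return (F ratio, df between, df within) for a list of samples."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]             # step 1
    grand_mean = sum(sum(s) for s in samples) / n          # step 2
    ss_between = sum(len(s) * (m - grand_mean) ** 2        # step 3
                     for s, m in zip(samples, means))
    ms_between = ss_between / (k - 1)                      # step 4
    ss_within = sum((x - m) ** 2                           # step 5
                    for s, m in zip(samples, means) for x in s)
    ms_within = ss_within / (n - k)                        # step 6
    ss_total = sum((x - grand_mean) ** 2                   # step 7 (check)
                   for s in samples for x in s)
    assert abs(ss_total - (ss_between + ss_within)) < 1e-9
    return ms_between / ms_within, k - 1, n - k            # step 8

# Hypothetical samples for three subject categories
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_between, df_within = one_way_anova(samples)
print(round(f_ratio, 2), df_between, df_within)
```

The resulting F ratio is then compared with the table value of F for (k – 1, n – k) degrees of freedom at the chosen significance level.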
  • 134. About the Author
    Dr. M. S. Sridhar is a post graduate in Mathematics and Business Management and holds a Doctorate in Library and Information Science. He has been in the profession for the last 36 years. Since 1978, he has headed the Library and Documentation Division of ISRO Satellite Centre, Bangalore. Earlier he worked in the libraries of National Aeronautical Laboratory (Bangalore), Indian Institute of Management (Bangalore) and University of Mysore. Dr. Sridhar has published 4 books, 81 research articles and 22 conference papers, written 19 course materials for BLIS and MLIS, made over 25 seminar presentations and contributed 5 chapters to books.
    E-mail: sridharmirle@yahoo.com, mirlesridhar@gmail.com, sridhar@isac.gov.in; Phone: 91-80-25084451; Fax: 91-80-25084476.