Data Collection Preparation

DATA PREPARATION | FREQUENCY DISTRIBUTION | CROSS-TABULATION
DATA COLLECTION PREPARATION

DATA
• Data is anything that has been produced or
created during research. Primary data is data that
you have created yourself, but your data sets can
also contain data that has been created by
other researchers (secondary data).

WHAT IS DATA COLLECTION?
• It is the process of gathering and measuring
information on variables of interest, in an
established systematic fashion that enables
one to answer stated research questions,
test hypotheses, and evaluate outcomes.

METHODS OF DATA COLLECTION
• A. Interview (Direct) Method – a
method of person-to-person exchange
between the interviewer and the
interviewee.

POSITIVE
• 1) It provides consistent and more precise information, since clarification may be given by the interviewee.
• 2) Questions may be repeated or modified to suit the interviewee’s level of understanding.

NEGATIVE
• 1) Time-consuming
• 2) Expensive
• 3) Limited field coverage

• B. Questionnaire (Indirect) Method – in this method, written responses are given to prepared questions. A questionnaire is used to elicit answers to the problems of the study. Questionnaires may be mailed or hand-carried.

POSITIVE
• 1) Inexpensive
• 2) Can cover a wide area in a shorter span of time.
• 3) Respondents may feel a greater sense of
freedom to express views and opinions because
their anonymity is maintained.

NEGATIVE
• 1) There’s a strong possibility of non-response,
especially when questionnaires are mailed.
• 2) Questions not easily understood may not be
answered.

• C. Registration Method – this method of gathering information is enforced by law. Examples include the registration of:
• births
• deaths
• vehicles
• licenses
• the number of tourists in a city

POSITIVE
• 1) Information is kept systematized.
• 2) Information is always made available to
the public.

• D. Observation Method – the investigator observes the behavior of the subject/respondent. It is used when the subjects cannot talk or write.

POSITIVE
• 1) The recording of behavior at the appropriate time and situation is made possible.

• E. Experiment Method – this method is used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions. It is usually used by scientific researchers.

1. MAKE LOGISTICS ARRANGEMENTS.
• In order to make logistics arrangements, you will have to (1) set up a central local headquarters and (2) contact the local authorities in the area where the survey will be carried out.

2. PREPARE THE QUESTIONNAIRE AND
TRAINING MATERIALS.
• You must pre-test the translated
questionnaire in the field.

Specifically, the pre-test should answer the following
questions:
• Are respondents willing to answer questions in the way
you have asked them?
• Are any of the questions particularly difficult to answer or
do they address sensitive issues?
• Are the questions well understood by the respondents?
• Is it necessary to create new codes for common answers
which were not included in the original questionnaire?

3. CHOOSING AND PREPARING THE EQUIPMENT
• Equipment must be purchased well in advance of the survey. Examples include weighing scales and length/height boards.

4. QUESTIONNAIRE CHECKING AGAIN
• Questionnaire checking involves eliminating unacceptable questionnaires. A questionnaire may be unacceptable because it is incomplete, the instructions were not followed, the responses show little variance, pages are missing, it was returned after the cut-off date, or the respondent does not qualify.
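Several of these checks can be partly automated once questionnaires are entered. The sketch below is a minimal illustration, assuming a hypothetical `Questionnaire` record with answer, page-count, return-date and respondent-age fields; the qualification rule (a minimum age) and the little-variance rule (every answer identical) are assumptions chosen for illustration. Whether the instructions were followed usually still needs a human reviewer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Questionnaire:
    # Hypothetical record layout, assumed only for this sketch.
    respondent_id: str
    answers: Dict[str, Optional[int]]  # question id -> rating, None if unanswered
    pages_returned: int
    received_on: date
    respondent_age: int


def rejection_reasons(q: Questionnaire, required: List[str], total_pages: int,
                      cutoff: date, min_age: int) -> List[str]:
    """Return the reasons (if any) for treating a questionnaire as unacceptable."""
    reasons = []
    if any(q.answers.get(k) is None for k in required):
        reasons.append("incomplete")
    if q.pages_returned < total_pages:
        reasons.append("missing pages")
    given = [v for v in q.answers.values() if v is not None]
    # Assumed rule: identical answers to every item signal "little variance".
    if len(given) > 1 and len(set(given)) == 1:
        reasons.append("little variance")
    if q.received_on > cutoff:
        reasons.append("past cut-off date")
    # Assumed qualification rule: respondents must meet a minimum age.
    if q.respondent_age < min_age:
        reasons.append("respondent not qualified")
    return reasons


ok = Questionnaire("R001", {"Q1": 4, "Q2": 2, "Q3": 5}, 3, date(2024, 1, 5), 25)
flat = Questionnaire("R002", {"Q1": 4, "Q2": 4, "Q3": 4}, 3, date(2024, 1, 5), 30)
for q in (ok, flat):
    print(q.respondent_id, rejection_reasons(q, ["Q1", "Q2", "Q3"], 3, date(2024, 1, 10), 18))
```

Returning a list of reasons, rather than a simple yes/no, keeps a record of why each questionnaire was set aside.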

5. COLLECTING DATA AND ANALYSIS
• Editing: Editing aims to correct illegible, incomplete, inconsistent, and ambiguous answers.
• Coding: Coding typically assigns alphabetic or numeric codes to answers that do not already have them, so that statistical techniques can be applied.
• Transcribing: Transcribing data involves transferring the data so as to make it accessible to people or applications for further processing.

• Cleaning: Cleaning reviews the data for consistency. Inconsistencies may arise from faulty logic or from out-of-range or extreme values.
• Statistical adjustments: Statistical adjustments apply to data that require weighting and scale transformations.
• Analysis strategy selection: Finally, selection of a data
analysis strategy is based on earlier work in designing
the research project but is finalized after consideration
of the characteristics of the data that has been
gathered.
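Coding, cleaning and weighting can each be sketched in a few lines. This is a minimal sketch under stated assumptions: the code book (`CODE_BOOK`), the missing-value code 9 and the valid age range used in the example are hypothetical, and weighting is shown only as a simple weighted mean.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical code book: answer text -> numeric code.
CODE_BOOK: Dict[str, int] = {"yes": 1, "no": 2, "don't know": 8}
MISSING = 9  # hypothetical code for blank or unrecognized answers


def code_answer(text: Optional[str]) -> int:
    """Coding: assign a numeric code to a free-text answer."""
    if text is None:
        return MISSING
    return CODE_BOOK.get(text.strip().lower(), MISSING)


def out_of_range(values: List[float], low: float, high: float) -> List[Tuple[int, float]]:
    """Cleaning: list (position, value) pairs outside [low, high] for review."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


def weighted_mean(values: List[float], weights: List[float]) -> float:
    """Statistical adjustment: mean in which each case counts according to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


print([code_answer(a) for a in ["Yes", " no", None, "maybe"]])  # [1, 2, 9, 9]
print(out_of_range([34, 29, 340, 41, -2], 0, 120))               # [(2, 340), (4, -2)]
print(weighted_mean([1, 2], [3, 1]))                             # 1.25
```

Flagged values are only listed, not changed, so that an editor can decide whether each one is an entry error or a genuine extreme value.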

WHAT IS DATA PREPARATION?
• Data preparation is about constructing a dataset
from one or more data sources to be used for
exploration and modeling. It is good practice to start with an initial dataset to get familiar with the data, gain first insights into it, and develop a good understanding of any possible data quality issues.

DATA PREPARATION
• Organizing the data correctly can save a lot of time
and prevent mistakes.
• Most researchers organize their data in a database or statistical analysis program (e.g., Microsoft Excel, SPSS) that they can format to fit their needs.
• Once the data has been entered, it is crucial that
the researcher check the data for accuracy.

STEPS IN DATA PREPARATION
• 1. Checking the Data For Accuracy
• As soon as data is received you should screen it for
accuracy. In some circumstances doing this right away will
allow you to go back to the sample to clarify any problems or
errors.
• Are the responses legible/readable?
• Are all important questions answered?
• Are the responses complete?
• Is all relevant contextual information included (e.g., date, time, place, researcher)?
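Part of this screening can be done by a program as soon as records arrive. A minimal sketch, assuming each record is a dictionary keyed by question id plus the hypothetical context fields `date`, `time`, `place` and `researcher`; legibility still has to be judged by a person.

```python
from typing import Dict, List, Optional

CONTEXT_FIELDS = ("date", "time", "place", "researcher")  # assumed field names


def screen_record(record: Dict[str, Optional[str]], important: List[str]) -> List[str]:
    """Return a list of problems to clarify with the respondent or researcher."""
    problems = []
    for question in important:
        value = record.get(question)
        if value is None or not str(value).strip():
            problems.append(f"unanswered: {question}")
    for name in CONTEXT_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            problems.append(f"missing context: {name}")
    return problems


record = {"Q1": "yes", "Q2": "", "date": "2024-03-01", "time": "10:30", "place": "Site A"}
print(screen_record(record, ["Q1", "Q2"]))
# ['unanswered: Q2', 'missing context: researcher']
```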

• 2. EDITING
• Editing detects errors and omissions and corrects them as far as possible.
• Purpose:
• To ensure accuracy
• To bring about consistency with other information
• To make sure the data is uniformly entered
• To make sure the data is complete and arranged to simplify coding and tabulation

• 3. ENTERING THE DATA INTO THE COMPUTER
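• Ideally, the analyst should use a procedure called double entry, in which the data is entered twice (often by two different people) and the two versions are compared; any discrepancy is checked against the original record.
• This double entry procedure significantly reduces entry errors.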
• An alternative is to enter the data once and set up a
procedure for checking the data for accuracy.
• EXAMPLE: You might spot-check records on a random basis.
• In either case, you can use programs that summarize the data to check that all values are within acceptable limits and boundaries.
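The two checking strategies in this step can be sketched as follows. This is an illustration only: records are assumed to be dictionaries, and the function names `compare_entries` and `spot_check_indices` are hypothetical. Mismatches found by double entry still have to be resolved against the original questionnaires.

```python
import random
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def compare_entries(first: List[Record], second: List[Record]) -> List[Tuple[int, str, Any, Any]]:
    """Double entry: list every (record, field) where the two passes disagree."""
    if len(first) != len(second):
        raise ValueError("both entry passes must contain the same number of records")
    mismatches = []
    for i, (a, b) in enumerate(zip(first, second)):
        for key in sorted(set(a) | set(b)):
            if a.get(key) != b.get(key):
                mismatches.append((i, key, a.get(key), b.get(key)))
    return mismatches


def spot_check_indices(n_records: int, k: int, seed: int = 0) -> List[int]:
    """Single-entry alternative: pick k record positions at random to verify by hand."""
    return sorted(random.Random(seed).sample(range(n_records), k))


first_pass = [{"id": 1, "age": 34}, {"id": 2, "age": 29}]
second_pass = [{"id": 1, "age": 34}, {"id": 2, "age": 92}]
print(compare_entries(first_pass, second_pass))  # [(1, 'age', 29, 92)]
print(spot_check_indices(100, 5))
```

A fixed `seed` makes the random spot-check sample reproducible, which helps when a second person needs to verify the same records.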

• 4. DATA TRANSFORMATIONS
• Once the data have been entered it is almost always
necessary to transform the raw data into variables that are
usable in the analyses.
• Missing values
• Many analysis programs automatically treat blank values as missing.
In others, you need to designate specific values to represent missing
values.
• Item reversals
• On scales and surveys, we sometimes use reversed items to help reduce the possibility of a response set (a tendency to answer in the same way regardless of item content). When you analyze the data, you want all scores for scale items to run in the same direction, so that high scores mean the same thing on every item and low scores do as well. In these cases, you have to reverse the ratings for some of the scale items.

• Scale totals
• Once you've transformed any individual scale items
you will often want to add or average across
individual items to get a total score for the scale.
• Categories
• For many variables you will want to collapse them
into categories. For instance, you may want to
collapse income estimates (in dollar amounts) into
income ranges.
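The four transformations can be sketched together. A minimal sketch, assuming a 1-to-5 rating scale, the hypothetical missing-value code -9, a policy of leaving the scale total undefined when any item is missing, and hypothetical income cut points; a real study would set these in its code book.

```python
from bisect import bisect_right
from statistics import mean
from typing import List, Optional

MISSING_CODE = -9  # hypothetical value designated to mean "missing"


def mark_missing(values: List[Optional[int]]) -> List[Optional[int]]:
    """Missing values: turn blanks and the designated code into None."""
    return [None if v is None or v == MISSING_CODE else v for v in values]


def reverse_item(score: Optional[int], low: int = 1, high: int = 5) -> Optional[int]:
    """Item reversal on a low..high scale: (low + high) - score, so 1 <-> 5, 2 <-> 4."""
    return None if score is None else (low + high) - score


def scale_total(items: List[Optional[int]], average: bool = False) -> Optional[float]:
    """Scale total: sum (or mean) of the items; None if any item is missing (assumed policy)."""
    if not items or any(s is None for s in items):
        return None
    return mean(items) if average else sum(items)


# Hypothetical income cut points (dollars) and the ranges they define.
INCOME_CUTS = [20_000, 50_000, 100_000]
INCOME_LABELS = ["under 20,000", "20,000-49,999", "50,000-99,999", "100,000 and over"]


def income_range(amount: float) -> str:
    """Categories: collapse a dollar amount into an income range."""
    return INCOME_LABELS[bisect_right(INCOME_CUTS, amount)]


raw = [4, 2, 5]                    # item 2 is assumed to be reverse-worded
items = mark_missing(raw)
items[1] = reverse_item(items[1])  # 2 becomes 4
print(items, scale_total(items), income_range(42_500))  # [4, 4, 5] 13 20,000-49,999
```

`bisect_right` places a value equal to a cut point in the higher range, so exactly 20,000 counts as "20,000-49,999".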

FREQUENCY DISTRIBUTION TABLE
• Frequency tells you how often something occurs.
The frequency of an observation in statistics tells you
the number of times the observation occurs in the
data.
• Frequency distribution tables can show
either categorical variables (sometimes called
qualitative variables) or quantitative
variables (sometimes called numeric variables). You
can think of categorical variables as being
categories (like eye color or brand of dog food)
and quantitative variables as being numbers.

GROUPED AND UNGROUPED DATA
• UNGROUPED FREQUENCY DISTRIBUTION
• The data obtained in original form are called raw data or ungrouped data.
• In an ungrouped frequency distribution, each distinct value is listed individually, in order, together with its frequency.

UNGROUPED DATA
• In each of 20 homes, people were asked how many
cars were registered to their households. The results
were recorded as follows:
• 1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0
Number of cars (x)   Tally      Frequency (f)
0                    ||||       4
1                    ||||| |    6
2                    |||||      5
3                    |||        3
4                    ||         2
Total                           20
Table 1. Frequency table for the number of cars registered in each household
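Table 1 can be reproduced from the raw responses with Python's standard `collections.Counter`. The helper names are illustrative, and the tally is drawn with plain `|` strokes in groups of five rather than the crossed gate used on paper.

```python
from collections import Counter
from typing import List, Tuple

cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]


def frequency_table(data: List[int]) -> List[Tuple[int, int]]:
    """Ungrouped frequency distribution: each distinct value, in order, with its count."""
    counts = Counter(data)
    return [(value, counts[value]) for value in sorted(counts)]


def tally(n: int) -> str:
    """Tally marks in groups of five, e.g. 6 -> '||||| |'."""
    return " ".join("|" * min(5, n - i) for i in range(0, n, 5))


for value, freq in frequency_table(cars):
    print(f"{value:<4}{tally(freq):<10}{freq}")
print("Total", len(cars))
```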


GROUPED DATA
• GROUPED DATA
• The observations are gathered into class intervals of moderate, equal width, and the frequency of each interval is compared with that of the other intervals.
• CLASS FREQUENCY
• Number of observations belonging to a class interval.
• CLASS INTERVAL
• Refers to the grouping defined by a lower limit and an upper limit.
• CLASS BOUNDARIES
• The lower and upper true limits of a class interval. For data recorded to the nearest whole number, they are obtained by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit (e.g., the class 360–369 has boundaries 359.5–369.5).
• CLASS MARKS
• The midpoint of each class interval, obtained by averaging the lower class limit and the upper class limit.
• CLASS SIZE
• The difference between the upper class boundary and the lower class boundary of a class interval.

GROUPED DATA
• Thirty AA batteries were tested to determine how
long they would last. The results, to the nearest
minute, were recorded as follows:
• 423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392,
408, 431, 401, 363, 391, 405, 382, 400, 381, 399, 415,
428, 422, 396, 372, 410, 419, 386, 390

GROUPED DATA
• The lowest value is 363 and the highest is 431.
• Using the given data and a class size of 10, the first class is 360 to 369, which includes 363 (the lowest value). Remember, there should always be enough class intervals for the highest value to be included; here the last class, 430 to 439, includes 431.
• The ideal number of class intervals (nc) is 5 to 20.

GROUPED DATA
Battery life, minutes (x)   Tally       Frequency (f)
360–369                     ||          2
370–379                     |||         3
380–389                     |||||       5
390–399                     ||||| ||    7
400–409                     |||||       5
410–419                     ||||        4
420–429                     |||         3
430–439                     |           1
Total                                   30
Table 3. Life of AA batteries, in minutes
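The grouping behind Table 3 can be sketched as below. This assumes whole-number data and, as in the example, a first class that starts at the lowest value rounded down to a multiple of the class size; the function name is illustrative.

```python
from typing import List, Tuple

battery_minutes = [423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392,
                   408, 431, 401, 363, 391, 405, 382, 400, 381, 399, 415,
                   428, 422, 396, 372, 410, 419, 386, 390]


def grouped_frequencies(data: List[int], width: int) -> List[Tuple[Tuple[int, int], int]]:
    """Group whole-number data into equal classes starting at a multiple of width."""
    start = min(data) // width * width            # 363 -> 360
    n_classes = (max(data) - start) // width + 1  # enough classes to include the highest value
    freqs = [0] * n_classes
    for x in data:
        freqs[(x - start) // width] += 1
    limits = [(start + k * width, start + (k + 1) * width - 1) for k in range(n_classes)]
    return list(zip(limits, freqs))


for (low, high), f in grouped_frequencies(battery_minutes, 10):
    print(f"{low}-{high}  {f}")
```

With a class size of 10, this gives eight classes, which falls inside the recommended range of 5 to 20.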

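CLASS BOUNDARY
CB = class boundaries, CM = class mark, <CF = less-than cumulative frequency, >CF = greater-than cumulative frequency. <CF counts the observations below each upper class boundary; >CF counts the observations above each lower class boundary.

Class      f    CB             CM      <CF   >CF
360–369    2    359.5–369.5    364.5   2     30
370–379    3    369.5–379.5    374.5   5     28
380–389    5    379.5–389.5    384.5   10    25
390–399    7    389.5–399.5    394.5   17    20
400–409    5    399.5–409.5    404.5   22    13
410–419    4    409.5–419.5    414.5   26    8
420–429    3    419.5–429.5    424.5   29    4
430–439    1    429.5–439.5    434.5   30    1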
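The boundary, class mark and cumulative frequency columns follow directly from Table 3. A minimal sketch, assuming data recorded to the nearest whole minute (so each boundary lies half a minute beyond its class limit); the dictionary keys mirror the column labels and are only illustrative.

```python
from typing import Dict, List, Tuple

# Class limits and frequencies from Table 3 (battery life, minutes).
limits = [(360 + 10 * k, 369 + 10 * k) for k in range(8)]
freqs = [2, 3, 5, 7, 5, 4, 3, 1]


def class_summary(limits: List[Tuple[int, int]], freqs: List[int],
                  unit: float = 1.0) -> List[Dict]:
    """Class boundaries, class marks, class size and cumulative frequencies per class."""
    half = unit / 2  # data recorded to the nearest unit (here, 1 minute)
    total = sum(freqs)
    below = 0        # observations in all earlier classes
    rows = []
    for (low, high), f in zip(limits, freqs):
        rows.append({
            "CB": (low - half, high + half),
            "CM": (low + high) / 2,
            "size": (high + half) - (low - half),
            "<CF": below + f,      # observations below the upper boundary
            ">CF": total - below,  # observations above the lower boundary
        })
        below += f
    return rows


for row in class_summary(limits, freqs):
    print(row)
```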

CROSS-TABULATION
• Cross tabulation is a tool that allows you to compare the relationship between two variables.
• A cross-tabulation is a two (or more) dimensional
table that records the number (frequency) of
respondents that have the specific characteristics
described in the cells of the table.
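Building a two-way cross-tabulation amounts to counting respondents for each combination of categories. The sketch below uses Python's `collections.Counter`; the responses are hypothetical, invented only to show the mechanics, and `cross_tab` is an illustrative name.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def cross_tab(pairs: List[Tuple[Hashable, Hashable]]):
    """Return row labels, column labels and the table of cell frequencies."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical survey responses: (area of residence, answer to a yes/no question).
responses = [("urban", "yes"), ("urban", "no"), ("rural", "yes"),
             ("urban", "yes"), ("rural", "no"), ("rural", "no")]

rows, cols, table = cross_tab(responses)
print("", *cols, sep="\t")
for label, counts in zip(rows, table):
    print(label, *counts, sep="\t")
```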

CROSS-TABULATION WITH CHI-SQUARE ANALYSIS
• The chi-square statistic is the primary statistic used for testing the statistical significance of the cross-tabulation table. Chi-square tests whether or not the two variables are independent.

CHI SQUARE ANALYSIS
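• The chi-square statistic is computed by first computing a chi-square value for each individual cell of the table and then summing them to form a total chi-square value for the table. The chi-square value for a cell is computed as:
$$\chi^2_{\text{cell}} = \frac{(O - E)^2}{E}, \qquad \chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$
where O is the observed value (frequency) in the cell and E is its expected value if the variables were independent, $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$.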
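The cell-by-cell computation can be written directly. A minimal sketch, assuming every row and column total is non-zero (otherwise an expected value would be zero); it also returns the degrees of freedom, (rows - 1) × (columns - 1), which are needed to look up the probability. The table in the example is hypothetical.

```python
from typing import List, Tuple


def chi_square(observed: List[List[float]]) -> Tuple[float, int]:
    """Return the chi-square statistic and degrees of freedom for a cross-tabulation."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (o - expected) ** 2 / expected  # cell value (O - E)^2 / E
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 table of observed frequencies.
print(chi_square([[20, 30], [30, 20]]))  # (4.0, 1)
```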

REMEMBER
• The chi-square statistic, along with the associated probability of observing such a result by chance, may be computed for any table. If the observed relationship would occur with very low probability (say, 5% or less) when the variables are actually independent, we say that the results are “statistically significant” at the “.05 or 5% level”. This means that, if the variables were independent, a relationship at least as strong as the one observed would be expected less than 5% of the time, so we reject the hypothesis that the variables are independent.
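For a 2 × 2 table (1 degree of freedom), the probability can be computed with the standard library alone, because a chi-square variable with 1 degree of freedom is the square of a standard normal variable. This sketch is limited to that case; larger tables need the general chi-square distribution, for example from a statistics package. For reference, a statistic of about 3.841 corresponds to the .05 level with 1 degree of freedom.

```python
import math


def p_value_df1(stat: float) -> float:
    """P(chi-square with 1 df >= stat), computed as erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


def significant_at(stat: float, alpha: float = 0.05) -> bool:
    """True when the result is statistically significant at level alpha (1 df only)."""
    return p_value_df1(stat) <= alpha


for stat in (2.0, 3.841, 4.0):
    print(stat, round(p_value_df1(stat), 4), significant_at(stat))
```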

SPSS TUTORIAL FOR CROSS-TABULATION
What is SPSS?
"SPSS is a comprehensive system for analyzing data. SPSS is an acronym for Statistical Package for the Social Sciences. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts, and plots of distributions and trends, descriptive statistics, and complex statistical analysis."
