2. INTRODUCTION
• Data preparation is carried out through the following steps:
• Editing,
• Coding, and
• Tabulating
• These steps make the data ready for any kind of statistical analysis, so that the
research objectives set earlier can be achieved.
3. DATA EDITING
• Data editing is the process that involves detecting and correcting errors (logical
inconsistencies) in data.
• After collection, the data is subjected to processing.
• Processing requires that the researcher go over all the raw data forms and check
them for errors.
• While carrying out the editing the researcher needs to ensure that:
• The data obtained is complete in all respects.
• It is accurate in terms of information recorded and responses sought.
• Questionnaires are legible and are correctly understood.
• The response format is in the form that was instructed.
• The data is structured in a manner that entering the information will not be a problem.
• The editing process is carried out at two levels: the first is field editing and
the second is central editing.
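
The completeness and response-format checks listed above could be sketched in Python roughly as follows. This is only an illustration: the records, the field names (`age`, `satisfaction`), and the assumed five-point scale are hypothetical and do not come from any particular questionnaire.

```python
# Hypothetical questionnaire records; None marks a blank answer.
responses = [
    {"id": 1, "age": 34, "satisfaction": 4},
    {"id": 2, "age": None, "satisfaction": 5},
    {"id": 3, "age": 29, "satisfaction": 7},  # 7 is outside the 1-5 scale
]

REQUIRED_FIELDS = ("age", "satisfaction")  # assumed field names
VALID_SATISFACTION = range(1, 6)           # assumed five-point scale


def find_problems(record):
    """Return a list of editing problems found in one record."""
    problems = []
    # Completeness: every required field must have an answer.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"{field}: missing")
    # Response format: the answer must be one of the instructed options.
    score = record.get("satisfaction")
    if score is not None and score not in VALID_SATISFACTION:
        problems.append("satisfaction: outside response format")
    return problems


# Map each respondent id to the problems an editor would need to resolve.
edit_report = {r["id"]: find_problems(r) for r in responses}
print(edit_report)
```

A report of this kind only flags forms for attention; deciding how to correct them remains the editor's job.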
4. DATA EDITING
• Field Editing (During Data Collection)
• Usually, the preliminary editing of the information obtained is done by the field
investigators or supervisors who review the filled forms for any inconsistencies,
non-response, illegible responses or incomplete questionnaires.
• Thus, errors can be corrected immediately and, if need be, the respondent who
filled in the form can be contacted again.
5. DATA EDITING
• Centralized in-house Editing (After Collection of Data)
• The second level of editing takes place at the researcher’s end.
• Backtracking: The best and most efficient way of handling unsatisfactory
responses is to return to the field and go back to the respondents. This technique
works best for industrial surveys but is somewhat difficult in individual surveys.
• Allocating missing values: This is a contingency plan that the researcher might need
to adopt in case going back to the field is not possible. Then the option might be to
assign a missing value to the blanks or the unsatisfactory responses.
• Plug value: When the variable being studied is a key variable, the researcher
might insert a plug value, such as an average or a neutral value, for example
a 3 on a five-point scale.
• Discarding unsatisfactory responses: If the response sheet has too many
blanks, illegible entries, or multiple responses for a single question, the form is
not worth correcting and editing and is discarded.
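
As a rough illustration of the last three options, the sketch below discards sheets with too many blanks and then fills the remaining blanks either with a missing-value code or with a neutral plug value. The answer sheets, the code `-9`, and the threshold of two blanks are hypothetical choices made for this example, not rules from the text.

```python
# Hypothetical answer sheets on a five-point scale; None marks a blank.
sheets = [
    [4, 5, None, 3],
    [None, None, None, 2],  # too many blanks to be worth editing
    [2, 3, 4, None],
]

MISSING = -9      # assumed missing-value code
NEUTRAL = 3       # neutral point of a five-point scale
MAX_BLANKS = 2    # assumed threshold for discarding a sheet

# Discarding unsatisfactory responses: drop sheets with too many blanks.
kept = [s for s in sheets if s.count(None) <= MAX_BLANKS]

# Allocating missing values: mark each blank with the missing-value code.
with_missing_code = [[MISSING if a is None else a for a in s] for s in kept]

# Plug value: replace each blank with the neutral value instead.
with_plug = [[NEUTRAL if a is None else a for a in s] for s in kept]

print(with_missing_code)
print(with_plug)
```

The two fill strategies are alternatives: a missing-value code keeps the blank visible to later analysis, while a plug value hides it.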
6. CODING
• The process of identifying the responses given by a respondent and assigning a
numeral to each is called coding.
• This is essentially done to help the researcher record the data in a
tabular form later.
• It is advisable to assign a numeric code even for the categorical data (e.g., gender).
• In fact, even for open-ended questions, which are answered in statement form, the
responses are categorized and assigned numbers.
• The reason for doing this is that the graphic representation of data into charts and
figures becomes easier.
• For example, the gender of a person is one field and the codes used could be 0 for
males and 1 for females.
• The data that is entered in a spreadsheet, such as Excel, is in the form of a
data matrix.
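
A minimal sketch of this idea in Python: each respondent becomes one row and each field one column, with gender stored as the numeric code given above (0 for males, 1 for females). The respondents and the `age` column are hypothetical.

```python
# Codes from the example above: 0 for males, 1 for females.
GENDER_CODES = {"male": 0, "female": 1}

# Hypothetical raw answers as they might be read off the forms.
raw = [
    {"respondent": 1, "gender": "female", "age": 27},
    {"respondent": 2, "gender": "male", "age": 41},
    {"respondent": 3, "gender": "female", "age": 35},
]

# Data matrix: one row per respondent, one column per field.
header = ["respondent", "gender", "age"]
data_matrix = [
    [r["respondent"], GENDER_CODES[r["gender"]], r["age"]] for r in raw
]

print(header)
for row in data_matrix:
    print(row)
```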
7. CODING
• When the questions are structured and the response categories are prescribed then
one does what is called pre-coding, i.e., giving numeral codes to the designed
responses before administration.
• However, if the questions are structured and the answers are open-ended, one
needs to decide on the codes after the administration of the survey. This is called
post-coding.
• Coding Closed-ended Structured Questions
• Dichotomous questions: For dichotomous questions, which are on a nominal scale, the
responses can be binary, for example:
• Do you eat ready-to-eat food? Yes = 1; no = 0
• With more than two categories, assign 1, 2, 3, ... to the categories, e.g., 10th = 1,
12th = 2, +3 = 3, ...
• Scaled questions: For questions that are on a scale, usually an interval scale, the
question/statement will have a single column, and the coding instruction indicates which
number is to be allocated to each response option given in the scale.
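
Pre-coding can be pictured as a codebook that is fixed before the survey is administered. The sketch below is one possible way to write it in Python; the question names and the wording of the five-point agreement scale are assumptions made for illustration, while the yes/no and education codes follow the examples above.

```python
# Hypothetical codebook, fixed before administration (pre-coding).
CODEBOOK = {
    # Dichotomous question: yes = 1, no = 0.
    "eats_ready_to_eat": {"no": 0, "yes": 1},
    # More than two categories: 1, 2, 3, ...
    "education": {"10th": 1, "12th": 2, "+3": 3},
    # Scaled question: assumed five-point agreement scale.
    "agreement": {
        "strongly disagree": 1,
        "disagree": 2,
        "neutral": 3,
        "agree": 4,
        "strongly agree": 5,
    },
}


def encode(answers):
    """Translate one respondent's text answers into numeric codes."""
    return {q: CODEBOOK[q][a.strip().lower()] for q, a in answers.items()}


coded = encode(
    {"eats_ready_to_eat": "Yes", "education": "12th", "agreement": "Agree"}
)
print(coded)
```

An answer that is not in the codebook raises a `KeyError` here, which is a convenient signal that the form needs editing before it is coded.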
8. CODING
• Coding Open-ended Structured Questions
• The coding of open-ended questions is quite difficult as the respondents’
exact answers are noted on the questionnaire.
• Then the researcher (either individually or as a team) looks for patterns
and assigns a category code.
• The following is an example of an open-ended question:
• If you think SIP is important for a student, please specify three most
significant benefits of SIP.
• Find most repeated words and assign codes to them.
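
One simple way to find repeated words and turn them into codes is sketched below. The three answers, the stopword list, and the rule "a word used at least twice becomes a category" are all hypothetical choices for this example; in practice the researcher would review the word list and merge synonyms by hand.

```python
import re
from collections import Counter

# Hypothetical answers to the open-ended question about benefits of SIP.
answers = [
    "Industry exposure and practical learning",
    "Practical skills, networking",
    "Exposure to industry; networking",
]
STOPWORDS = {"and", "to", "the", "of"}  # assumed list of filler words


def tokens(text):
    """Lowercase the answer and keep only the meaningful words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


# Count how often each word appears across all answers.
counts = Counter(w for a in answers for w in tokens(a))

# Post-coding: words used at least twice become categories with codes 1, 2, ...
categories = sorted(w for w, c in counts.items() if c >= 2)
codes = {w: i for i, w in enumerate(categories, start=1)}

# Each answer is then recorded as the set of category codes it mentions.
coded_answers = [sorted({codes[w] for w in tokens(a) if w in codes}) for a in answers]

print(codes)
print(coded_answers)
```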
9. CLASSIFICATION AND TABULATION OF DATA
• Sometimes, the data obtained from the primary instrument is so voluminous that it
becomes difficult to interpret.
• In such cases, the researcher might decide to reduce the information into
homogeneous categories.
• This method of arrangement is called classification of data.
• The method of arranging data into homogeneous classes according to the common
features present in the data is known as classification.
• The main types are geographical classification, chronological classification,
qualitative classification, and quantitative classification.
10. CLASSIFICATION AND TABULATION OF DATA
• Quantitative classification: This type of classification is made on the basis of
some measurable characteristics like height, weight,
age, income, marks of students, etc.
• Qualitative classification: The population can be divided on the basis of
marital status (as married or unmarried).
• Chronological classification: In such a classification, data are classified either in
ascending or in descending order with reference to
time such as years, quarters, months, weeks, etc.
• Geographical classification: Data are classified with reference to
geographical locations such as countries, states,
cities, districts, etc.
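
The four kinds of classification can be illustrated on a small set of records. The sketch below is hypothetical throughout: the records, the ten-year age classes, and the helper names `group_by` and `age_class` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical survey records.
records = [
    {"name": "A", "age": 23, "marital_status": "unmarried", "city": "Pune", "year": 2021},
    {"name": "B", "age": 37, "marital_status": "married", "city": "Delhi", "year": 2019},
    {"name": "C", "age": 29, "marital_status": "married", "city": "Pune", "year": 2020},
    {"name": "D", "age": 41, "marital_status": "unmarried", "city": "Delhi", "year": 2021},
]


def group_by(rows, key_func):
    """Collect respondent names into classes chosen by key_func."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_func(row)].append(row["name"])
    return dict(groups)


def age_class(row):
    """Assumed ten-year class intervals, e.g. 20-29, 30-39."""
    lower = (row["age"] // 10) * 10
    return f"{lower}-{lower + 9}"


by_age = group_by(records, age_class)                          # quantitative
by_status = group_by(records, lambda r: r["marital_status"])   # qualitative
by_city = group_by(records, lambda r: r["city"])               # geographical
# Chronological: order the records by time, here ascending by year.
chronological = [r["name"] for r in sorted(records, key=lambda r: r["year"])]

print(by_age, by_status, by_city, chronological, sep="\n")
```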
11. CLASSIFICATION AND TABULATION OF DATA
• Once the categories and codes have been decided upon, the researcher needs to
arrange the same according to some logical pattern.
• This is referred to as tabulation of data.
• This involves an orderly arrangement of data into an array that is suitable for a
statistical analysis.
• Usually, this is an orderly arrangement of the rows and columns.
• When data is entered for a single variable, the process is a simple
tabulation; when two or more variables are involved, one carries out a
cross-tabulation of data.
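
The difference between the two can be sketched with a few coded responses. The data below is hypothetical; it reuses the codes from the coding examples (gender: 0 = male, 1 = female; eats ready-to-eat food: 0 = no, 1 = yes).

```python
from collections import Counter

# Hypothetical coded responses: (gender, eats_ready_to_eat).
coded = [(0, 1), (1, 1), (1, 0), (0, 1), (1, 1)]

# Simple tabulation: frequency of one variable (gender).
simple = Counter(gender for gender, _ in coded)

# Cross-tabulation: joint frequency of two variables.
cross = Counter(coded)

# Arrange the joint counts in rows (gender) and columns (eats: no, yes).
table = {gender: [cross[(gender, eats)] for eats in (0, 1)] for gender in (0, 1)}

print("gender counts:", dict(simple))
print("gender x eats (no, yes):", table)
```

A `Counter` returns 0 for a combination that never occurs, so empty cells in the cross-tabulation need no special handling.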