Dr. Amitabh Mishra
Editing of Data
Editing
• “Editing is a step whereby researchers eliminate errors
or points of confusion in the raw data.”
• “Editing detects errors, corrects them when possible,
and certifies that minimum data quality standards have
been achieved.”
Objectives of Editing
• The purpose of editing is to guarantee that the data are:
1. Accurate
2. Complete
3. Uniformly entered
4. Consistent with intent of questions
5. Arranged to simplify coding and tabulation.
NEED FOR EDITING
Editing is needed because:
1. Parts of the questionnaire may be incomplete.
2. The pattern of responses may indicate that the respondent did
not understand or follow the instructions.
3. The responses show little variance (for example, the same rating given to every item).
4. One or more pages are missing.
5. The questionnaire is answered by someone who does not
qualify for participation.
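Some of the checks listed above can be run mechanically before an editor reads the forms. The following is a minimal sketch, assuming each questionnaire is stored as a dictionary of ratings with None for an unanswered item. The item names (q1 to q4), the `screen` function, and the zero-variance rule are hypothetical choices for illustration and do not come from the slides.

```python
from statistics import pvariance

# Hypothetical rating items expected on every questionnaire.
EXPECTED_ITEMS = ["q1", "q2", "q3", "q4"]

def screen(questionnaire, qualifies=True):
    """Return the reasons why a questionnaire needs an editor's attention."""
    problems = []
    answers = [questionnaire.get(item) for item in EXPECTED_ITEMS]
    given = [a for a in answers if a is not None]

    # Incomplete parts or missing pages show up as unanswered items.
    if len(given) < len(EXPECTED_ITEMS):
        problems.append("incomplete")

    # Assumed rule: identical ratings throughout (zero variance) count as "little variance".
    if len(given) > 1 and pvariance(given) == 0:
        problems.append("little variance")

    # The respondent does not qualify for participation.
    if not qualifies:
        problems.append("unqualified respondent")

    return problems

print(screen({"q1": 4, "q2": 2, "q3": 5, "q4": 3}))
print(screen({"q1": 3, "q2": 3, "q3": 3, "q4": None}))
```

A questionnaire that returns an empty list passes these checks; anything else still needs a human editor to decide how to treat it.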
Stages of Editing
• Editing can be done in two stages:
1. Field editing
2. Central editing
• Field editing is the responsibility of the field supervisor. During data
collection, field workers and respondents often use abbreviations
and special symbols, so the interviewer must review the questionnaire
soon after the data have been gathered.
• After the field work is done, trained and experienced
editors check and edit each questionnaire
thoroughly (central editing).
• Editors identify inconsistencies between the
answers.
• The editor’s task is also to identify fake interviews (these
can be detected by checking the responses to open-ended
questions).
Treatment of Unsatisfactory Results
1. Returning to the Field
(Questionnaires with unsatisfactory responses may be returned to the field,
where the interviewers re-contact the respondents.)
2. Assigning Missing Values
(If returning the questionnaires to the field is not feasible, the editor may assign
missing values to the unsatisfactory responses.)
3. Discarding Unsatisfactory Respondents
(In this approach, respondents with unsatisfactory responses are simply
dropped from the data set.)
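The second and third treatments can be sketched in code. This is an illustration only, assuming answers are ratings on a 1 to 5 scale and that None marks a missing value. The `treat` function, the `out_of_range` rule, and the `max_missing` threshold are hypothetical; the slides do not say when to discard rather than assign missing values.

```python
MISSING = None  # assumed marker for an assigned missing value

def treat(responses, is_unsatisfactory, max_missing=1):
    """Assign missing values to unsatisfactory answers, or discard the respondent.

    responses: dict mapping question id -> answer
    is_unsatisfactory: function(question id, answer) -> bool
    Returns the cleaned dict, or None when the respondent is discarded.
    """
    cleaned = {q: (MISSING if is_unsatisfactory(q, a) else a)
               for q, a in responses.items()}
    n_missing = sum(1 for a in cleaned.values() if a is MISSING)
    # Hypothetical rule: too many unsatisfactory answers means the respondent is discarded.
    if n_missing > max_missing:
        return None
    return cleaned

def out_of_range(question, answer):
    # Example rule: a rating must lie on the assumed 1-5 scale.
    return answer is None or not 1 <= answer <= 5

print(treat({"q1": 4, "q2": 9, "q3": 2}, out_of_range))  # q2 becomes a missing value
print(treat({"q1": 0, "q2": 9, "q3": 2}, out_of_range))  # two bad answers: discarded
```

Returning to the field has no code equivalent here; it would simply mean replacing the flagged answers with the re-collected ones before calling `treat`.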
Coding of Data
Coding
“Coding means assigning a code, usually a number, to each
possible response to each question.”
“Coding involves assigning numbers or other symbols to
answers so that the responses can be grouped into a limited
number of classes or categories.” - Cooper & Schindler
Example
S.N.  Category  Code
1     Male      1
2     Female    2

S.N.  Category  Code
1     Male      M
2     Female    F

S.N.  Category  Code
1     Male
2     Female
Rules of Coding
1. Appropriateness: categories should be
appropriate to the research problem and objectives.
2. Exhaustiveness: there should be a class for
every data item. Researchers often add an
“other” option to achieve this.
3. Mutual exclusivity: each specific answer should be placed in
one and only one category.
Example: in an occupation survey, a classification that is not mutually
exclusive may be:
a) Professional
b) Managerial
c) Sales
d) Clerical
e) Craft
f) Operative
g) Unemployed
(A salesperson who is currently out of work, for instance, fits both c and g.)
Coding close-ended questions
• Dichotomous and multiple-choice questions have predefined response
categories.
• When coding such questions, a numerical code is assigned to
each response category.

Response category  Code      Response category  Code
Yes                1         Male               1
Do not know        2         Female             2
No                 3
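The two small code tables above amount to a code book: a lookup from response category to numeric code. A minimal sketch follows, assuming the code book is kept as a nested dictionary. The question names "opinion" and "gender" and the `encode` function are hypothetical labels added for illustration.

```python
# Code book built from the tables above: one numeric code per response category.
CODEBOOK = {
    "opinion": {"Yes": 1, "Do not know": 2, "No": 3},
    "gender": {"Male": 1, "Female": 2},
}

def encode(question, response):
    """Return the numeric code for a response, or raise if no category matches."""
    try:
        return CODEBOOK[question][response]
    except KeyError:
        raise ValueError(f"no code for {response!r} in question {question!r}") from None

raw = [("gender", "Female"), ("opinion", "Do not know"), ("opinion", "No")]
print([encode(q, r) for q, r in raw])
```

Raising an error for an unknown response, rather than silently assigning a code, makes violations of the exhaustiveness rule visible during data entry.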
Coding open-ended questions
• The researcher should review the answers to each open-ended question and
establish meaningful categories.
Example: How many cups of coffee/tea do you drink in a day?

If respondents answered  Response category  Code
More than 5 cups/day     Heavy consumer     1
Between 2-5 cups/day     Moderate consumer  2
Less than 2 cups/day     Light consumer     3
0 cups/day               Non consumer       4
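The coding scheme above can be written as a small function. This is a sketch under two assumptions that the table does not state: zero cups is tested first so that non-consumers are not also counted as light consumers, and the "2-5" band includes both 2 and 5. The function name `code_consumption` is hypothetical.

```python
def code_consumption(cups_per_day):
    """Return (response category, code) for a reported number of cups per day."""
    if cups_per_day < 0:
        raise ValueError("cups per day cannot be negative")
    if cups_per_day == 0:          # checked first so the categories stay mutually exclusive
        return ("Non consumer", 4)
    if cups_per_day < 2:
        return ("Light consumer", 3)
    if cups_per_day <= 5:          # assumption: 2 and 5 both count as moderate
        return ("Moderate consumer", 2)
    return ("Heavy consumer", 1)

for cups in [0, 1, 2, 5, 6]:
    print(cups, code_consumption(cups))
```

Ordering the tests this way gives every possible answer exactly one code, which is what the exhaustiveness and mutual exclusivity rules require.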
Tabulation of Data
• “A table is a systematic arrangement of statistical data
in columns and rows.”
• “Tabulation is a process whereby raw data on
completed questionnaires are transformed into the ‘list
of needed information’.”
• The purpose of a table is to simplify presentation and
facilitate comparison.
[Example table: U.S. Auto Sales, 2003-2007]
Significance of Tabulation
1. It simplifies complex data.
2. It facilitates comparison.
3. It gives identity to the data.
4. It reveals patterns.
Parts of Table
1. Table number
2. Title of table
3. Caption
4. Stub
5. Body of table
6. Head notes
7. Foot notes
Layout of a table:

Table number
Title of Table
Head note

Stub heading  |  Caption
              |  Column heading | Column heading | Column heading | Column heading
Stub entries  |
Stub entries  |  Body
Stub entries  |

Footnote
Types of Tabulation
1. Uni-variate Tabulation
2. Bi-variate Tabulation or Multivariate tabulation
Univariate Tabulation
• “Univariate tabulation counts the answers to one question.”
• Such a tabulation results in a frequency distribution of
answers, for example:
– No. of people who answered in the first response category
– No. of people who answered in the second response category, and so on.
• What is your opinion regarding the mandatory fitting of airbags,
GPRS systems, and seat belts in all vehicles in the country?
– In favor of
– Indifferent towards
– Opposed to

                      Number   Percent (%)
In favor of           55       34.4
Indifferent towards   31       19.4
Opposed to            74       46.2
Total                 160      100.0
Approach B
                      Number
In favor of           55
Indifferent towards   31
Opposed to            74
Total                 160

Approach C
                      Percent (%)
In favor of           34.4
Indifferent towards   19.4
Opposed to            46.2
Total                 100.0
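The frequency distribution above can be reproduced from the individual answers. The sketch below rebuilds the 160 answers from the reported counts and computes both the number and the percent columns; 55 out of 160 is 34.4 percent after rounding to one decimal. The `frequency_table` function is a hypothetical helper written for this illustration.

```python
from collections import Counter

def frequency_table(answers, categories):
    """Return rows of (category, number, percent) followed by a total row."""
    counts = Counter(answers)
    total = sum(counts[c] for c in categories)
    rows = [(c, counts[c], round(100 * counts[c] / total, 1)) for c in categories]
    rows.append(("Total", total, 100.0))
    return rows

categories = ["In favor of", "Indifferent towards", "Opposed to"]
# Rebuild one answer per respondent from the counts reported in the table.
answers = ["In favor of"] * 55 + ["Indifferent towards"] * 31 + ["Opposed to"] * 74

for label, number, percent in frequency_table(answers, categories):
    print(f"{label:<22}{number:>6}{percent:>8.1f}")
```

Dropping the percent column from the output gives approach B, and dropping the number column gives approach C.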
Bi-variate Tabulation or Multivariate Tabulation
• In bi-variate or multivariate tabulation, the
researcher simultaneously tabulates the responses to
two or more questions.
EXAMPLE
1. What is your gender?
a) Male
b) Female
2. How often do you use credit cards when purchasing pizza at Dominos?
a) Regularly
b) Occasionally
c) Never
Use of credit cards for purchase of Dominos pizza

               Male                   Female
Usage rate     Number  Percent (%)    Number  Percent (%)
Regularly      20      10             100     50
Occasionally   60      30             80      40
Never          120     60             20      10
Total          200     100            200     100
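A bi-variate table like the one above is a count of answer pairs, one pair per respondent, with each count also expressed as a percent of its column total. The following sketch rebuilds the pairs from the reported counts; the `bivariate_table` function and the variable names are hypothetical and chosen for this illustration.

```python
from collections import Counter

def bivariate_table(pairs, row_levels, col_levels):
    """Count (row, column) pairs and attach the percent of each column total."""
    counts = Counter(pairs)
    col_totals = {c: sum(counts[(r, c)] for r in row_levels) for c in col_levels}
    table = {}
    for r in row_levels:
        table[r] = {c: (counts[(r, c)], 100 * counts[(r, c)] / col_totals[c])
                    for c in col_levels}
    return table, col_totals

usage = ["Regularly", "Occasionally", "Never"]
genders = ["Male", "Female"]
reported = {
    ("Regularly", "Male"): 20, ("Occasionally", "Male"): 60, ("Never", "Male"): 120,
    ("Regularly", "Female"): 100, ("Occasionally", "Female"): 80, ("Never", "Female"): 20,
}
# Expand the reported counts into one (usage, gender) pair per respondent.
pairs = [pair for pair, n in reported.items() for _ in range(n)]

table, totals = bivariate_table(pairs, usage, genders)
for r in usage:
    print(r, table[r])
print("Total", totals)
```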
General rules for tabulation
• There are no hard and fast rules for preparing a statistical table.
• “In collection and tabulation, common sense is the chief requisite
and experience is the chief teacher.” - Prof. Bowley
• However, the following points should be borne in mind while
preparing a table-
1. A table must contain all the essential parts, such as the table number,
title, head note, and caption.
2. A table should be simple to understand.
3. It should also be compact, complete and self-explanatory.
4. A table should be of proper size.
5. Indicate a zero quantity by a zero; do not use zero for
information that is not available.
6. Where information is not available, write N.A. or
indicate it by a dash (-).
7. Ditto marks (,,) should be avoided in a table. Similarly, the expression
‘etc’ should not be used in a table.
8. A table should not be overloaded with details.
9. Abbreviations should be avoided, particularly in titles and sub-titles.
10. In all tables the captions and stubs should be arranged in some
systematic manner. (The order may be alphabetical
or chronological, depending on the requirement.)
11. The unit of measurement should be mentioned in the head
note.
12. Figures should be rounded off to the nearest hundred,
thousand, or lakh.
13. Each table should have a proper title that tells
exactly what the table presents.
Cross-Tabulation
• “Cross tabulation (or crosstabs) is a statistical process that
summarizes categorical data to create a contingency table”.
• “Cross tabulation is a technique for comparing data from two
or more categorical variables”.
• Cross tabulation provides a basic picture of the interrelation
between two variables and can help find interactions between
them.
• While a frequency distribution describes one variable at a
time, a cross-tabulation describes two or more variables
simultaneously.
• It helps us to understand how one variable (such as brand) is
related to another variable (such as gender). For example, the answers to the
following questions can be determined by cross tabulation:
1. How many brand-loyal users are male?
2. Is familiarity with a new product related to age and education level?
3. Is product use (heavy users, medium users, light users, and non-users) related
to interest in outdoor activities (high, medium and low)?
Significance of Cross tabulation
1. A cross-tabulation gives you a basic picture of how two
variables inter-relate.
2. It can be easily interpreted and understood by managers who
are not statistically oriented.
3. A series of cross tabulations can provide greater insight into a
complex phenomenon.
Example- Gender and Internet Usage
• Suppose we are interested in determining whether internet
usage is related to gender.
• For the purpose of cross-tabulation, respondents can be
classified as:
– Male and female.
– Light users (whose reported use is less than 5 hrs.) and heavy users
(whose reported use is more than 5 hrs.)
Gender & Internet Usage

                  Gender
Internet Usage    Male    Female    Row Total
Light (1)         5       10        15
Heavy (2)         10      5         15
Column Total      15      15        30
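The cross-tabulation above can be built from one record per respondent. The sketch below classifies each respondent by reported hours and then counts the cells and the marginal totals. The hour values are hypothetical and chosen only to reproduce the cell counts in the table, and treating exactly 5 hours as light use is an assumption, since the slide leaves that case open.

```python
from collections import Counter

def usage_class(hours):
    # Assumption: exactly 5 hours counts as light use.
    return "Light" if hours <= 5 else "Heavy"

def crosstab(records):
    """records: iterable of (gender, reported hours). Returns cells and marginal totals."""
    cells = Counter((usage_class(h), g) for g, h in records)
    row_totals = Counter()
    col_totals = Counter()
    for (usage, gender), n in cells.items():
        row_totals[usage] += n
        col_totals[gender] += n
    return cells, row_totals, col_totals

# Hypothetical hours, chosen only so the counts match the table.
records = ([("Male", 3)] * 5 + [("Male", 8)] * 10 +
           [("Female", 2)] * 10 + [("Female", 9)] * 5)

cells, row_totals, col_totals = crosstab(records)
print(cells[("Light", "Male")], cells[("Light", "Female")], row_totals["Light"])
print(cells[("Heavy", "Male")], cells[("Heavy", "Female")], row_totals["Heavy"])
print(col_totals["Male"], col_totals["Female"], sum(row_totals.values()))
```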
Types of Cross tabulation
• Cross tabulation can be:
1. Cross tabulation with two variables (bi-variate
cross tabulation)
2. Cross tabulation with three variables
Two Variables Cross-Tabulation
• Since two variables have been cross-classified, percentages can be
computed either
– column-wise, based on column totals, or
– row-wise, based on row totals.
• The general rule is to compute the percentages in the direction of
the independent variable, across the dependent variable.
Internet Usage by Gender: percentage calculation column-wise

                  Gender
Internet Usage    Male      Female
Light             33.3%     66.7%
Heavy             66.7%     33.3%
Column total      100%      100%
Gender by Internet Usage: percentage calculation row-wise

           Internet Usage
Gender     Light     Heavy     Total
Male       33.3%     66.7%     100.0%
Female     66.7%     33.3%     100.0%
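Both percentage tables can be derived from the same cell counts (5, 10, 10, 5) in the gender and internet usage example. The sketch below stores the counts as a nested dictionary and computes percentages of column totals and of row totals. The function names are hypothetical helpers written for this illustration.

```python
def transpose(table):
    """Swap the row and column variables of a nested-dict table."""
    out = {}
    for r, row in table.items():
        for c, n in row.items():
            out.setdefault(c, {})[r] = n
    return out

def row_percentages(table):
    """Express each cell as a percent of its row total."""
    return {r: {c: round(100 * n / sum(row.values()), 1) for c, n in row.items()}
            for r, row in table.items()}

def column_percentages(table):
    """Express each cell as a percent of its column total."""
    return transpose(row_percentages(transpose(table)))

# Counts from the example (rows: internet usage, columns: gender).
usage_by_gender = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}

by_column = column_percentages(usage_by_gender)       # within each gender column
by_row = row_percentages(transpose(usage_by_gender))  # rows: gender, columns: usage
print(by_column)
print(by_row)
```

In both layouts the percentages are taken within each gender, which matches the general rule when gender is treated as the independent variable; only the orientation of the table changes.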

Editing, Coding & Tabulation
