Data Editing and Coding
Dr Raju Indukoori
Overview of the Stages of Data Analysis
EDITING
• The process of checking and adjusting responses in completed questionnaires for omissions, legibility, and consistency, and readying them for coding and storage.
• Detects errors and omissions, corrects them when possible, and certifies that minimum data quality standards are achieved.
Types of Editing
1. Field Editing
Preliminary editing by a field supervisor on
the same day as the interview to catch
technical omissions, check legibility of
handwriting, and clarify responses that are
logically or conceptually inconsistent.
2. In-house or Central Editing
Editing performed by central office staff; often done more rigorously than field editing.
Purpose of Editing
To ensure:
1. Accuracy of the data collected
2. Consistency between responses
3. Uniformity of responses
4. Completeness of responses, to reduce the effects of item non-response
5. Simpler coding and tabulation
6. Better use of questions answered out of order
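A minimal Python sketch of the accuracy, consistency, and completeness checks listed above, applied to one completed questionnaire at a time. The question names (`age`, `employed`, `monthly_income`), the 18 to 99 age range, and the income rule are illustrative assumptions, not rules from the source.

```python
# Hypothetical questionnaire: each completed form is a dict of question -> answer.
# Question names, the age range, and the consistency rule are assumptions.
REQUIRED = ["age", "employed", "monthly_income"]


def edit_questionnaire(form):
    """Return a list of problems an editor would flag on one completed form."""
    problems = []
    # Completeness: every required question should have an answer.
    for q in REQUIRED:
        if form.get(q) in (None, ""):
            problems.append(f"omission: {q}")
    # Consistency: a respondent who is not employed should not report income.
    if form.get("employed") == "no" and (form.get("monthly_income") or 0) > 0:
        problems.append("inconsistent: income reported but not employed")
    # Accuracy: an implausible age suggests a recording error.
    age = form.get("age")
    if isinstance(age, int) and not 18 <= age <= 99:
        problems.append(f"out of range: age={age}")
    return problems


forms = [
    {"age": 34, "employed": "yes", "monthly_income": 42000},
    {"age": 27, "employed": "no", "monthly_income": 15000},
    {"age": 230, "employed": "yes", "monthly_income": None},
]
for number, form in enumerate(forms, start=1):
    print(number, edit_questionnaire(form))
```

Each flagged form would then go back to the editor, who applies the study's decision rules rather than guessing.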
Editing for Completeness
1. Item Non-response
2. Plug Value
3. Impute
1. Item Non-response
The technical term for an unanswered question on
an otherwise complete questionnaire resulting in
missing data.
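A small Python sketch of how item non-response can be measured before deciding whether to plug or impute: the share of respondents who left each question blank. The question names and answers are hypothetical, and `None` is assumed to mark an unanswered item.

```python
# Hypothetical coded answers; None marks a question left unanswered
# on an otherwise complete questionnaire.
responses = [
    {"q1": 4, "q2": None, "q3": 2},
    {"q1": 5, "q2": 3, "q3": None},
    {"q1": 3, "q2": None, "q3": 1},
    {"q1": 2, "q2": 4, "q3": 5},
]


def item_nonresponse_rates(rows):
    """Return, for each question, the share of respondents who left it blank."""
    questions = rows[0].keys()  # assumes every form has the same questions
    return {q: sum(row[q] is None for row in rows) / len(rows) for q in questions}


print(item_nonresponse_rates(responses))  # {'q1': 0.0, 'q2': 0.5, 'q3': 0.25}
```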
2. Plug Value
An answer that an editor “plugs in” to replace
blanks or missing values so as to permit data
analysis; the choice of value is based on a
predetermined decision rule.
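A minimal Python sketch of a plug value, assuming a hypothetical decision rule: a blank on a 1 to 5 agreement scale is replaced with the neutral midpoint 3. A flag records which answers were plugged so they can be identified later in the analysis.

```python
# Assumed decision rule (illustrative): a blank on a 1-5 agreement scale
# is plugged with the neutral midpoint 3.
PLUG_VALUE = 3


def plug(answers, plug_value=PLUG_VALUE):
    """Replace missing answers (None) with a fixed plug value; return values and flags."""
    plugged = [plug_value if a is None else a for a in answers]
    flags = [a is None for a in answers]
    return plugged, flags


values, was_plugged = plug([5, None, 2, 4, None])
print(values)       # [5, 3, 2, 4, 3]
print(was_plugged)  # [False, True, False, False, True]
```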
3. Impute
To fill in a missing data point through the use of a
statistical algorithm that provides a best guess for
the missing response based on available
information.
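Imputation uses a statistical algorithm rather than a fixed rule. Group-mean imputation is one simple algorithm of this kind (more elaborate methods, such as regression or hot-deck imputation, also exist). In this Python sketch, a missing value is filled with the mean of observed values from respondents in the same group; the region grouping and spend figures are hypothetical.

```python
from statistics import mean

# Hypothetical records: (region, monthly spend); None is a missing response.
records = [
    ("north", 1200), ("north", 1500), ("north", None),
    ("south", 800), ("south", None), ("south", 1000),
]


def impute_group_mean(rows):
    """Fill each missing value with the mean of observed values in the same group.

    Assumes every group has at least one observed value.
    """
    observed = {}
    for group, value in rows:
        if value is not None:
            observed.setdefault(group, []).append(value)
    group_means = {group: mean(values) for group, values in observed.items()}
    return [(group, group_means[group] if value is None else value)
            for group, value in rows]


for row in impute_group_mean(records):
    print(row)
```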
Facilitating the Coding Process
• Data clean-up: checking written responses for any stray marks
• Editing and tabulating “Don’t Know” answers
  • Legitimate don’t know (no opinion)
  • Reluctant don’t know (refusal to answer)
  • Confused don’t know (does not understand)
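Giving each kind of "don't know" its own code keeps the three kinds apart when the answers are tabulated. A Python sketch, assuming a hypothetical scheme in which substantive answers are coded 1 to 5 and codes 97, 98, and 99 are reserved for the three kinds; the editor is assumed to have already classified each "don't know".

```python
from collections import Counter

# Hypothetical code scheme: substantive answers 1-5, plus a distinct code
# for each kind of "don't know" so they can be tabulated separately.
DK_CODES = {
    "legitimate": 97,  # no opinion
    "reluctant": 98,   # refusal to answer
    "confused": 99,    # does not understand the question
}


def code_answer(answer, dk_type=None):
    """Return the numeric code; dk_type is the editor's classification of a don't-know."""
    if answer == "don't know":
        return DK_CODES[dk_type]
    return int(answer)


raw = [("4", None), ("don't know", "legitimate"), ("2", None),
       ("don't know", "reluctant"), ("don't know", "legitimate")]
coded = [code_answer(answer, dk_type) for answer, dk_type in raw]
print(coded)           # [4, 97, 2, 98, 97]
print(Counter(coded))  # frequency of each code, don't-knows included
```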
Pitfalls of Editing
• Allowing subjectivity to enter into the editing process.
  • Data editors should be intelligent, experienced, and objective.
• Failing to have a systematic procedure, developed by the research analyst, for assessing the questionnaires.
  • An editor should have clearly defined decision rules to follow.
Pretesting Edit
Editing during the pretest stage can prove very valuable for improving the questionnaire format and for identifying poor instructions or inappropriate question wording.
CODING
• A code is a numerical score or symbol.
• Coding is the process of identifying and classifying each answer with a numerical score or other character symbol.
• Identifying responses with codes is necessary if the data are to be processed by computer.
What does coding do?
• It serves as a rule for interpreting, classifying, and recording data.
• It guides the establishment of category sets, which should be:
  • Appropriate to the research problem and purpose
  • Exhaustive
  • Mutually exclusive
  • Derived from one classification principle
Steps in Coding
1. Set Rules for Code Construction
2. Pre-Coding Fixed-Alternative Questions (FAQs)
3. Coding Open-Ended Questions
4. Maintaining a Code Book
5. Production Coding
6. Combining Editing and Coding
1. Set Rules for Code Construction
a) Coding categories should be exhaustive: every possible response fits some category.
b) Coding categories should be mutually exclusive and independent: no response fits more than one category.
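Both rules can be checked mechanically for numeric categories. This Python sketch tests a set of hypothetical age brackets against every age the study expects (assumed here to be 18 to 120): values that fit no bracket break exhaustiveness, and values that fit several break mutual exclusivity.

```python
# Hypothetical age brackets as inclusive (low, high) ranges keyed by code.
AGE_CODES = {1: (18, 24), 2: (25, 34), 3: (35, 49), 4: (50, 64), 5: (65, 120)}


def check_categories(categories, values):
    """Report values that fit no category (gaps) or more than one (overlaps)."""
    gaps, overlaps = [], []
    for v in values:
        matches = [code for code, (lo, hi) in categories.items() if lo <= v <= hi]
        if not matches:
            gaps.append(v)
        elif len(matches) > 1:
            overlaps.append((v, matches))
    return gaps, overlaps


print(check_categories(AGE_CODES, range(18, 121)))  # ([], [])

# A flawed scheme: 30 falls in two brackets and 41-44 fall in none.
bad = {1: (18, 30), 2: (30, 40), 3: (45, 120)}
print(check_categories(bad, range(18, 121)))  # ([41, 42, 43, 44], [(30, [1, 2])])
```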
2. Pre-Coding Fixed-Alternative Questions (FAQs)
Writing codes for FAQs on the questionnaire before data collection.
3. Coding Open-Ended Questions
A three-stage process: (a) perform a test tabulation, (b) devise a coding scheme, (c) code all responses.
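The three stages for open-ended questions can be sketched in Python as follows. The question, the answers, and the keyword lists are hypothetical, and simple keyword matching stands in for the judgement a human coder would apply when devising and applying the scheme.

```python
from collections import Counter

# Hypothetical answers to "Why did you choose this brand?"
answers = [
    "low price", "quality is good", "my friend recommended it",
    "cheap", "saw an ad on tv", "recommended by family",
    "good quality", "price", "no particular reason",
]

# (a) Test tabulation: count the words used in a sample of answers.
sample = answers[:6]
word_counts = Counter(word for text in sample for word in text.lower().split())
print(word_counts.most_common(5))

# (b) Coding scheme devised from the tabulation (keywords are illustrative).
SCHEME = [
    (1, "Price", {"price", "cheap", "affordable"}),
    (2, "Quality", {"quality", "durable"}),
    (3, "Word of mouth", {"recommended", "friend", "family"}),
    (4, "Advertising", {"ad", "advertisement", "tv"}),
]
OTHER = 9


def code_open_ended(text):
    """Return the code of the first category whose keywords appear, else OTHER."""
    words = set(text.lower().split())
    for code, _label, keywords in SCHEME:
        if words & keywords:
            return code
    return OTHER


# (c) Code all responses.
coded = [code_open_ended(text) for text in answers]
print(coded)  # [1, 2, 3, 1, 4, 3, 2, 1, 9]
```

Because this sketch returns the first matching category, an answer mentioning two themes (say, price and quality) would need an explicit rule in a real scheme to keep the categories mutually exclusive.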
4. Maintaining a Code Book
A code book identifies each variable in a study, together with the variable’s description, code name, and position in the data matrix.
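A code book can be kept as a simple data structure and then used to read coded records. This Python sketch assumes a hypothetical layout in which each record is a fixed-width line of digits and the code book gives each variable's columns (1-based, inclusive); the variable names and labels are illustrative.

```python
# Hypothetical code book: for each variable, its code name, description,
# columns in a fixed-width record (1-based, inclusive), and code labels.
CODE_BOOK = [
    {"name": "RESPID", "description": "Respondent ID", "columns": (1, 3), "codes": {}},
    {"name": "GENDER", "description": "Gender of respondent", "columns": (4, 4),
     "codes": {1: "Male", 2: "Female", 9: "No answer"}},
    {"name": "AGEGRP", "description": "Age group", "columns": (5, 5),
     "codes": {1: "18-24", 2: "25-34", 3: "35-49", 4: "50 and over", 9: "No answer"}},
    {"name": "SATIS", "description": "Overall satisfaction, 1 (low) to 5 (high)",
     "columns": (6, 6), "codes": {9: "No answer"}},
]


def read_record(line, code_book=CODE_BOOK):
    """Split one fixed-width record into variables using the code book positions."""
    record = {}
    for var in code_book:
        start, end = var["columns"]
        record[var["name"]] = int(line[start - 1:end])
    return record


def label(name, code, code_book=CODE_BOOK):
    """Look up the label for a code; fall back to the code itself."""
    var = next(v for v in code_book if v["name"] == name)
    return var["codes"].get(code, str(code))


record = read_record("007235")
print(record)                             # {'RESPID': 7, 'GENDER': 2, 'AGEGRP': 3, 'SATIS': 5}
print(label("GENDER", record["GENDER"]))  # Female
```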
Data Matrix
• A worksheet that holds the coded data in rectangular form, arranged in rows and columns.
• Rows represent cases and columns represent variables.
• The data matrix is organized into fields, records, and files:
  • Field: a collection of characters that represents a single type of data
  • Record: a collection of related fields, i.e., fields related to the same case (or respondent)
  • File: a collection of related records, i.e., records related to the same sample
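A Python sketch of the same structure, using hypothetical data: each string is a field, each inner list is the record for one respondent (a row), and the outer list is the file for the sample. Selecting one position across all records gives a variable (a column). Writing the matrix as comma-separated text with the standard `csv` module is one assumed storage choice.

```python
import csv
import io

# Hypothetical variables (columns) and cases (rows) of a small data matrix.
variables = ["RESPID", "GENDER", "AGEGRP", "SATIS"]
file_records = [             # file: all records for the same sample
    ["001", "1", "2", "4"],  # record: related fields for one respondent
    ["002", "2", "3", "5"],  # each string is a field holding one type of data
    ["003", "2", "1", "3"],
]


def column(matrix, names, name):
    """Return one variable (column) across all cases (rows)."""
    j = names.index(name)
    return [row[j] for row in matrix]


print(column(file_records, variables, "SATIS"))  # ['4', '5', '3']

# Write the matrix as a delimited text file, one record per line.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(variables)
writer.writerows(file_records)
print(buffer.getvalue())
```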
5. Production Coding
The physical activity of transferring the data from the questionnaire or data collection form [to the computer] after the data have been collected. It is sometimes done through a coding sheet: ruled paper drawn to mimic the data matrix.
6. Combining Editing and Coding
• Finally, editing and coding are combined to prepare the data for further analysis.
AFTER CODING
1. Data Entry
2. Error Checking
3. Data Transformation
1. Data Entry
The transfer of codes from questionnaires (or coding sheets) to a computer. Often accomplished in one of the following ways:
a) Online direct data entry: data are entered directly into an online system (for example, a student database entered online by students).
b) Optical scanning: for highly structured questionnaires.
c) Keyboarding: data entry via a computer keyboard; often requires verification.
d) Voice recognition
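One common way to verify keyboarded data is double entry: the same coding sheets are keyed twice, independently, and the two versions are compared field by field. A Python sketch with hypothetical data; every mismatch is reported so it can be resolved against the original questionnaire.

```python
# Two independent keyings of the same coding sheets (hypothetical data).
first_entry = [
    ["001", "1", "2", "4"],
    ["002", "2", "3", "5"],
    ["003", "2", "1", "3"],
]
second_entry = [
    ["001", "1", "2", "4"],
    ["002", "2", "3", "3"],  # keying error in the last field
    ["003", "2", "1", "3"],
]


def verify_double_entry(a, b):
    """Return (record, field, first, second), 0-based, wherever the keyings differ."""
    if len(a) != len(b):
        raise ValueError("the two entries contain different numbers of records")
    mismatches = []
    for i, (rec_a, rec_b) in enumerate(zip(a, b)):
        for j, (x, y) in enumerate(zip(rec_a, rec_b)):
            if x != y:
                mismatches.append((i, j, x, y))
    return mismatches


print(verify_double_entry(first_entry, second_entry))  # [(1, 3, '5', '3')]
```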
1. Data Entry Formats
• Full-screen editors
• Spreadsheets
2. Error Checking
Verifying the accuracy of data entry
and checking for some kinds of obvious
errors made during the data entry. Often
accomplished through frequency
analysis.
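Frequency analysis for error checking amounts to tabulating how often each code occurs and looking for codes that should not exist. This Python sketch assumes a hypothetical 1 to 5 satisfaction item with 9 reserved for "no answer"; any other value is flagged for checking against the questionnaire.

```python
from collections import Counter

# Hypothetical keyed values for a 1-5 satisfaction item, with 9 = no answer.
VALID = {1, 2, 3, 4, 5, 9}
satis = [4, 5, 3, 4, 44, 2, 5, 9, 4, 0, 3]

freq = Counter(satis)
for code in sorted(freq):
    flag = "" if code in VALID else "  <-- invalid code, check questionnaire"
    print(f"{code:>3} {freq[code]:>3}{flag}")

invalid = sorted(code for code in freq if code not in VALID)
print("Codes to check:", invalid)  # Codes to check: [0, 44]
```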
3. Data Transformation
Converting some of the data from the format in which they were entered to a format better suited to a particular statistical analysis.
Often accomplished through recoding, to:
• reverse-score negative (or positive) statements into positive (or negative) statements;
• collapse the number of categories of a variable.
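Both recodes can be sketched in Python. Reverse-scoring on a scale from `lo` to `hi` maps each score x to (lo + hi) - x, so on a 1 to 5 scale a 1 becomes 5 and a 5 becomes 1. The five-to-three collapse below is an illustrative mapping, not one prescribed by the source.

```python
def reverse_score(x, lo=1, hi=5):
    """Reverse a rating on a lo-hi scale: lo becomes hi and hi becomes lo."""
    return (lo + hi) - x


# Collapsing: five agreement categories into three (illustrative mapping).
COLLAPSE = {1: "disagree", 2: "disagree", 3: "neutral", 4: "agree", 5: "agree"}

negative_item = [1, 2, 5, 4, 3]
print([reverse_score(x) for x in negative_item])  # [5, 4, 1, 2, 3]
print([COLLAPSE[x] for x in [1, 3, 4, 5, 2]])     # ['disagree', 'neutral', 'agree', 'agree', 'disagree']
```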