Data Collection, Assessment of
Qualitative Data, Data Processing:
Key Issues
Bikash Sapkota
B. Optometry
Institute of Medicine, TU, Nepal
• Introduction to data
• Classification of data
• Collection of data
• Methods of data collection
• Assessment of qualitative data
• Processing of data
- Editing
- Coding
- Tabulation
- Graphical representation
Presentation Layout
What is data?
 Data are observations or evidence about the social world
 Data, the plural of datum, can be quantitative or qualitative
in nature
 ‘data is produced, not given’; that is, researchers choose
what to call data, it is not just ‘there’ to be ‘found’. (Marsh
1988)
- The Sage Dictionary of Social Research Methods
 The terms 'data' and 'information' are often used interchangeably
 However, the terms have distinct meanings
Data
Facts, events and transactions that have been recorded
The raw input material from which information is produced
Information
Data that have been processed in such a way as to be useful to the recipient
Basic data are processed in some way to form information
Data & Information
 Research studies in behavioral science are mainly concerned with characteristics or traits
 Tools are therefore administered to quantify these characteristics
- but not all traits or characteristics can be quantified
The data can be classified into two broad categories:
Data
Qualitative Data or
Attributes
Quantitative Data or
Variables
Nature of Data
Nature of Data
1. Qualitative Data or Attributes
The characteristics or traits to which a numerical value cannot be assigned are called attributes
e.g. gender, motivation, etc.
2. Quantitative Data or Variables
The characteristics or traits to which a numerical value can be assigned are called variables
e.g. height, weight, etc.
Constants
A constant is any characteristic or condition that is the same for all the observed units or sample subjects of a study
Variables
A characteristic or trait in behavioral science that can be quantified is termed a variable
Variables
Continuous variables Discrete variables
Variables
1. Continuous variables
 A characteristic whose observations can take any value over a particular range
 It can assume either fractional or integral values
 E.g. wt. of children in kg, height of pt.
2. Discrete variables
 Variables that take only whole-unit values, not fractional values (usually units of one)
 E.g. No. of cataract pts. in a village, WBC count
Attribute vs. Variable
Attribute
 A category of a characteristic to which a subject either belongs or does not belong, or a property that a subject either possesses or does not possess
 Examples of attributes: being sick, blood group, etc.
Variable
 Describes a characteristic in terms of a numerical value, which is expressed in units of measurement
 Examples of variables: height, weight, blood pressure, age of pts., etc.
Qualitative Data
 In such data there is no notion of magnitude or size of the characteristic
 They are just categorized
 The data are classified by counting the individuals having the same characteristic or attribute, not by measurement
 For example: Gender: male/female
Disease: present/absent
Smoking: smoker/non-smoker
 These data can be measured on nominal and ordinal scales
Quantitative Data
 Anything that can be expressed as a number, quantity or magnitude
 Describes characteristics in terms of a numerical value, which is expressed in units of measurement
 E.g. level of hemoglobin in the blood, no. of glaucoma pts., intraocular pressure, weight, etc.
 The observations are quantitative: each individual is represented by a number
 These data can be measured on interval and ratio scales
Measurement Scale
 The choice of appropriate statistical technique depends
upon the type of data in question
Qualitative
Data
• Nominal Scale
• Ordinal Scale
Quantitative
Data
• Interval Scale
• Ratio Scale
Nominal Scale
 The least precise (crudest) of the 4 basic scales of measurement
 Implies the classification of an item into 2 or more categories without any extent or magnitude
 There is no particular order assigned to the categories
 Numbers, if used, serve only as names for the categories; category frequencies can be used to determine per cent and mode
E.g. boys and girls; pass and fail; rural and urban
Ordinal Scale
 The ordinal scale is a more precise scale than the nominal scale
 The categories are ranked in a meaningful natural order
 But there is no information about the interval between categories
E.g. Pain: none, mild, moderate, severe
Interval Scale
 The interval scale is a more precise and refined scale than the nominal and ordinal scales
 This scale has all the characteristics and relationships of the ordinal scale; in addition, the distances between any two numbers on the scale are known
 The size of the interval between two observations can be measured
E.g. The temperature of a body (in °C or °F)
Ratio Scale
 It has the same properties as an interval scale as well as a
true or absolute zero value
 The ratio scale numerals have the qualities of real
numbers, and can be added, subtracted, multiplied or
divided
Eg. Mean systolic BP
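The four scales can be summarized as a lookup of which summary statistics are usually treated as meaningful on each scale. This is a minimal sketch: the table PERMISSIBLE and the helper names are hypothetical, and the pain responses are invented for illustration.

```python
# Hypothetical lookup: statistics usually treated as meaningful on each scale.
PERMISSIBLE = {
    "nominal": {"mode", "percentage"},
    "ordinal": {"mode", "percentage", "median"},
    "interval": {"mode", "percentage", "median", "mean", "difference"},
    "ratio": {"mode", "percentage", "median", "mean", "difference", "ratio"},
}

# Natural order of the ordinal pain categories used on the Ordinal Scale slide.
PAIN_ORDER = ["none", "mild", "moderate", "severe"]


def is_meaningful(scale, statistic):
    """Return True if the statistic is usually reported for this scale."""
    return statistic in PERMISSIBLE[scale]


def ordinal_median(responses, order=PAIN_ORDER):
    """Median category of ordinal responses (lower middle value if the count is even)."""
    ranked = sorted(responses, key=order.index)
    return ranked[(len(ranked) - 1) // 2]


# Invented responses, for illustration only.
responses = ["mild", "severe", "none", "moderate", "mild"]
print(ordinal_median(responses))
print(is_meaningful("ordinal", "mean"))
```

The median works for pain grades because only the order of the categories is used; a mean would need known intervals between them.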
 The process of systematically gathering data for a particular purpose from various sources, so that they are systematically observed, recorded and organized
 It is the first step of a statistical study
 There are several ways of collecting data
 The choice of procedures usually depends on the objectives
and design of the study and the availability of time, money
and personnel
Collection of Data
 To obtain information
 To keep on record
 To make decisions about important issues
 To pass information onto others
 For research study
Purpose of Data Collection
Data collection is an extremely important part of any
research because the conclusions of a study are
based on what the data reveal
How Important Is It?
Nature, scope & objective of the enquiry
Sources of information
Availability of funds
Techniques of data collection
Availability of trained persons
Factors to be considered before data
collection
Example:
Documents
Creative works
Interviews
Man-made materials
Surveys
Example:
Unpublished thesis and
dissertations
Manuscript
Books
Journals
Sources of Data
Source of Data
- Internal
- External: Primary Data, Secondary Data
Internal sources of Data
o Many institutions and departments hold information about their regular functions for their own internal purposes
o When that information is used in a survey, it is called an internal source of data
o E.g. social welfare society
External sources of data
o When information is collected from outside agencies, it is called an external source of data
o Such data are either primary or secondary
o This type of information can be collected by the census or sampling method by conducting a survey
Internal & External Sources of Data
 Data collected by the investigator from personal experimental studies for a specific research goal are called primary data
 The data are collected specially for a research project
 Used when secondary data are unavailable or inappropriate
 Data are expected to be unique, original, reliable and accurate in nature
 Primary data have not been changed or altered by human beings; therefore their validity is greater than that of secondary data
Primary Data
Merits
Targeted issues are addressed
Data interpretation is better
High accuracy of data
Greater control
Addresses specific research issues
Demerits
High cost
Time consuming
More resources are required
Inaccurate feedback
Requires a lot of skill and labor
Primary Data
Interview (direct/indirect)
Schedule
Questionnaires survey
Focus group discussion (FGD)
Community forums and public hearings
Observation
Case studies
Key informants interview
Internet/E-mail/SMS
Primary Data Collection
Techniques
 The data are collected by the investigator personally; he/she must be a keen observer
 He/she asks or cross-examines the informant and collects
necessary information
 It is original in character
Direct personal observation
Direct personal observation is adopted in the following cases
Where greater accuracy is needed
Where the field of enquiry is not large
Where confidential data are to be collected
Where sufficient time is available
Suitability of direct personal observation
Merits
Original data
True and reliable data
Encouraging response
because of personal
approach
A high degree of accuracy
Direct personal observation
Demerits
Unsuitable in large area
Expensive & time-consuming
An untrained investigator brings poor results
Information is collected at the convenience of the informant
 The investigator approaches the witness or third parties,
who are in touch with the informant
 The enumerator interviews the people, who are directly or
indirectly connected with the problem under the study
 Generally this method is employed by different enquiry
committees and commissions
 The police department generally adopts this method to
get clues of thefts, riots , murders, etc.
Indirect oral interview
 It is more suitable when the area to be studied is large
 It is used when direct information cannot be obtained
 This system is generally adopted by governments
Suitability of indirect oral interview
Merits
 Simple and convenient
 Saves time, money and labor
 Useful in investigation of a large area
 Adequate information can be had
Demerits
 Information cannot be fully relied on because there is no direct contact
 Interviewing an unsuitable person will spoil the results
 To get real data, a sufficient no. of people must be interviewed
 A careless attitude on the part of the informant affects the degree of accuracy
Indirect oral interview
 Local agents or correspondents are appointed; they collect the information and transmit it to the office or person
 They work in their own ways and according to their own tastes
 Adopted by newspapers, agencies, etc.
 The informants are generally called correspondents
 Suitable in those cases where the information is to be
obtained at regular intervals from a wide area
Information through agencies
Merits
 Extensive information can be had
 It is the cheapest and most economical method
 Speedy information is possible
 It is useful where information is needed regularly
Demerits
 The information may be biased
 Degree of accuracy cannot be maintained
 Uniformity cannot be maintained
 Data may not be original
Information through agencies
 The questionnaire is sent to the respondents, with blank spaces for answers
 A covering letter is also sent along with the questionnaire, requesting the respondents to extend their full cooperation
 Adopted by research workers, private individuals, non-official agencies and government
 Appropriate in cases where informants are spread over a wide
area
Mailed questionnaires
Merits
 Of all the methods, the mailed questionnaire is the most
economical
 It can be widely used, when the area of investigation is large
 It saves money, labor and time
Demerits
 Cannot be sure about the accuracy and reliability of the data
 There is long delay in receiving questionnaires duly filled in
Mailed questionnaires
 Very similar to the questionnaire method
 The main difference is that a schedule is filled by the
enumerator who is specially appointed for the purpose
 The enumerator goes to the respondents, asks them the questions from the proforma in the order listed, and records the responses in the space provided
 Enumerators must be trained in administering the schedule
Data Collection Through Schedules
 A detailed study of geographical area to gather data,
attitudes, impressions, opinions, satisfaction level etc., by
polling a section of the population
Census Survey
• Conducted regularly at large intervals of time
Continuous Survey
• Conducted regularly and frequently
Ad-hoc Survey
• Conducted at specific times for a specific need
• ‘As and when’ required
Survey
Types
Merits
Cover large population
Less expensive
Information is accurate
Demerits
Avoided for small-scale studies
Time consuming
Information does not
penetrate deeply
Researcher must have
good knowledge
Survey
 It is the method of comprehensive study of social unit which
may be a person, a family, an institution, an organization or a
community
Merits
Direct behavioral study
Real & personal
experience record
Make possible the
study of social change
Increase analysis
ability & skills
Demerits
One case is almost always different from another
Personal bias
Useful only in a limited sphere
More time & money consuming
Case Study
 Useful to further explore a topic, providing a broader understanding of why the target group may behave or think in a particular way
 Assists in determining the reasons for attitudes and beliefs
 Conducted with a small sample of the target group
 Used to stimulate discussion and gain greater insights
Focus Group Discussion
Merits
 Useful when exploring cultural values and health beliefs
 Can be used to explore complex issues
 Can be used to develop hypotheses for further research
 Do not require participants to be literate
Demerits
 Lack of privacy/anonymity
 Potential for the risk of ‘group think’
 Potential for group to be dominated by one or two people
 Group leader needs to be skilled at conducting focus groups, dealing
with conflict, drawing out passive participants
 Time consuming to conduct and analyse
Focus Group Discussion
 The application and combination of several research methods in the study of the same phenomenon
 Researchers can hope to overcome the weaknesses, intrinsic biases and problems that come from single-method, single-observer and single-theory studies
 The purpose of triangulation in qualitative research is to increase
the credibility and validity of the results
Triangulation
Types (Denzin 1978)
• Data Triangulation
• Investigator Triangulation
• Theory Triangulation
• Methodological Triangulation
Beating the Bias
 Secondary data are those data which have been already
collected and analysed by some earlier agency for its own
use and later the same data are used by a different agency
Sources of Secondary Data
• Published Sources
• Unpublished Sources
Secondary Data
Various governmental, international and local agencies
publish statistical data, and chief among them are:
 International publications: e.g. UNO, WHO, Nature, etc.
 Official publications of Government: e.g. Department of Drug Administration, Central Bureau of Statistics
 Semi-official publications: semi-government institutions such as Municipal Corporation, District Board, etc. publish reports
Published Sources
 Publications of Research Institutions: Nepal Development Research Institute, Nepalese Journal of Ophthalmology, etc. publish the findings of their research programs
 Journals and Newspapers: current and important material on statistics and socio-economic problems can be obtained from journals and newspapers such as Swasthya Khabar Patrika, Health Today Magazine, The Sight, etc.
Published Sources
 Records maintained by various government and private
offices
 Researches carried out by individual research scholars in
the universities or research institutes
According to Prof. Bowley “It is never safe to take published statistics
at their face value without knowing their meaning and limitations and
it is always necessary to criticize arguments that can be based on
them.”
Unpublished Sources
Before using the secondary data, the investigators should
consider the following factors:
Precautions in the use of Secondary Data
Suitability of data
Adequacy of data
Reliability of data
Reliability of data may be tested by checking:
Who collected the data?
What were the sources of the data?
Were the data collected properly?
Suitability of data
Data that are suitable for one enquiry may not necessarily be suitable for another enquiry
The objective, scope and nature of the original enquiry must be studied
Adequacy of data: data are considered inadequate if they relate to an area that is either narrower or wider than the area of the present enquiry
Secondary Data must possess the following
characteristics
Primary data
o Real-time data
o Sure about the sources of data
o Help to give results/findings
o Costly and time-consuming process
o Avoids bias in response data
o More flexible
Secondary data
o Past data
o Not sure about the sources of data
o Help in refining the problem
o Cheap and not time-consuming
o Cannot know whether the data are biased or not
o Less flexible
 The characteristics or traits for which numerical value can
not be assigned, are called qualitative data (attributes)
e.g. gender, color, honesty etc.
 Methods of collecting qualitative data
Methods of Qualitative Data Collection
• Direct Observation
• In-depth Interview
• Case Study
• Triangulation
• Use of Secondary Data
Assessment of Qualitative Data
 Classification of Qualitative data
Qualitative Data
• Geographical Classification
• Chronological Classification
• Qualitative Classification
Assessment of Qualitative Data
Tabulation of Qualitative Data
 Qualitative data values can be organized by a frequency
distribution
 A frequency distribution lists
– Each of the categories
– The frequency/counts for each category
Assessment of Qualitative Data
Frequency Table
 A simple data set is: cataract, cataract, keratoconus, glaucoma, glaucoma, cataract, glaucoma, cataract
 A frequency table for this qualitative data is
Eye condition    Frequency
Cataract         4
Keratoconus      1
Glaucoma         3
 The most commonly occurring eye condition is cataract
Assessment of Qualitative Data
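The frequency table for the eye-condition data set can be reproduced in a few lines. This is a minimal sketch using Python's standard collections.Counter; the variable names are illustrative.

```python
from collections import Counter

# The simple data set from the Frequency Table slide.
conditions = [
    "cataract", "cataract", "keratoconus", "glaucoma",
    "glaucoma", "cataract", "glaucoma", "cataract",
]

# Count how many times each category occurs.
frequency = Counter(conditions)

# Print the categories from most to least frequent.
for condition, count in frequency.most_common():
    print(f"{condition:<12} {count}")

# The mode is the category with the highest frequency.
most_common_condition = frequency.most_common(1)[0][0]
```

Counting is the only operation needed here, which is why a frequency table suits nominal data.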
What Is A Relative Frequency?
 The relative frequencies are the proportions (or percents)
of the observations out of the total
 A relative frequency distribution lists
– Each of the categories
– The relative frequency for each category
 Relative frequency = Frequency/Total
Assessment of Qualitative Data
Relative Frequency Table
 A relative frequency table for this qualitative data is
Eye condition    Relative frequency
Cataract         .500 (=4/8)
Keratoconus      .125 (=1/8)
Glaucoma         .375 (=3/8)
 A relative frequency table can also be constructed with percents (50%, 12.5% and 37.5% for the above table)
Assessment of Qualitative Data
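The formula Relative frequency = Frequency/Total can be applied to the same eye-condition data set. This is a minimal sketch with illustrative variable names.

```python
from collections import Counter

# The eye-condition data set used for the frequency table.
conditions = [
    "cataract", "cataract", "keratoconus", "glaucoma",
    "glaucoma", "cataract", "glaucoma", "cataract",
]

counts = Counter(conditions)
total = sum(counts.values())

# Relative frequency = frequency / total number of observations.
relative = {condition: n / total for condition, n in counts.items()}

for condition, proportion in relative.items():
    print(f"{condition:<12} {proportion:.3f}  ({proportion:.1%})")
```

The relative frequencies always sum to 1 (or 100%), which is a quick check on the table.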
 Graphical representation of qualitative data
• Bar Diagram
• Pie or Sector Diagram
• Line Diagram
• Pictogram
• Map Diagram or Cartogram
Assessment of Qualitative Data
Data Processing
 The data, after collection, has to be prepared for analysis
 Collected data is raw and it must undergo some processing
before analysis
 The results of the analysis are strongly affected by the form of the data
 So, proper data processing is a must to get reliable results
Data Processing
 Checking the questionnaires and schedules
 Reduction of mass data to manageable proportion
 Sum up the materials so as to prepare tables, charts,
graphs and various groupings and breakdowns for
presenting the result
 Minimizing the errors which may creep in at various stages of the survey
Objectives of Data Processing
1. Manual Data Processing
 Involves human intervention
 Implies many chances for errors, such as delays in data capture and a high number of operator misprints
 Implies higher labor expenses, in addition to spending on equipment and supplies, rent, etc.
Types of Data Processing
2. Mechanical Data Processing
 Different calculations and processing are performed
using mechanical machines like calculators etc.
 The use of mechanical machines makes data processing
easier and less time- consuming
 The chances of errors also become far less than manual
data processing
Types of Data Processing
3. Electronic Data Processing
 Processing of data by use of computer and its programs
Types of Data Processing
4. Real Time Processing
 There is a continual input, process and output of data
 Data has to be processed in a small stipulated time period
(real time)
 Eg, when a bank customer withdraws a sum of money from
his or her account it is vital that the transaction be processed
and the account balance updated as soon as possible
Types of Data Processing
5. Batch Processing
 In batch processing, a group of transactions collected over a period of time is entered, processed and then the batch results are produced
 Batch processing requires separate programs for input, process
and output
 It is an efficient way of processing high volume of data
 Eg, Payroll system, examination system and billing system
Types of Data Processing
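The difference between real time and batch processing can be sketched with the bank withdrawal example. The function names and amounts below are hypothetical, and the sketch ignores everything a real banking system needs (concurrency, validation, storage).

```python
def process_real_time(balance, withdrawals):
    """Update the balance immediately after every transaction."""
    balances = []
    for amount in withdrawals:
        balance -= amount          # processed as soon as it arrives
        balances.append(balance)   # an up-to-date balance after each step
    return balances


def process_batch(balance, withdrawals):
    """Collect the transactions first, then process them all in one run."""
    return balance - sum(withdrawals)


# Hypothetical opening balance and withdrawals.
print(process_real_time(1000, [100, 250]))
print(process_batch(1000, [100, 250]))
```

Both approaches end with the same balance; they differ in when the result becomes available.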
The processing of data involves activities such as
• Questionnaire checking
• Editing
• Coding
• Classification
• Tabulation
• Graphical representation
• Data cleaning
• Data adjusting
Important Steps in Data Processing
 When the data are collected through questionnaires, the first step of data processing is to check whether the questionnaires are acceptable
A questionnaire is not accepted if:
 It gives the impression that the respondent could not understand the questions
 It is incomplete, partially or fully
 It was answered by a person who has inadequate knowledge
Questionnaire Checking
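The completeness criterion can be checked automatically once responses are entered. This is a minimal sketch: the field names in REQUIRED_FIELDS are hypothetical, and the other two criteria (respondent understanding and knowledge) still need human judgment.

```python
# Hypothetical list of questions that every respondent must answer.
REQUIRED_FIELDS = ("age", "gender", "diagnosis")


def is_acceptable(response):
    """Return False if any required answer is missing or blank."""
    for field in REQUIRED_FIELDS:
        value = response.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


# Invented responses, for illustration only.
complete = {"age": 54, "gender": "female", "diagnosis": "cataract"}
partial = {"age": 54, "gender": "", "diagnosis": "cataract"}
print(is_acceptable(complete), is_acceptable(partial))
```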
 Process of examining the data collected in
questionnaires/schedules
to detect errors and omissions
to correct these when possible
to make sure the schedules are ready for tabulation
Data Editing
 The editor is responsible for seeing that the data are:
As accurate as possible
Consistent with other facts secured
Uniformly entered
As complete as possible
Acceptable for tabulation and arranged to facilitate coding and tabulation
Data Editing
Editing for quality
• Data forms are complete
• Free of bias, errors, inconsistency and dishonesty
Editing for tabulation
• Modification to facilitate tabulation
• Ignoring extremely high/low values
Field editing
• Translating or rewriting
Central editing
• Correcting wrong entries and replacing them
Types of Editing
 To gather information
 To make data relevant and appropriate for analysis
 To find errors and modify them
 To ensure that the information provided is accurate
 To establish the consistency of data
 To determine whether or not the data are complete
 To obtain the best possible data available
Necessity of Editing
 Process of assigning numerals or other symbols to answers so
that responses can be put into limited number of categories
or classes
 Translating answers into numerical values or assigning
numbers to the various categories of a variable to be used in
data analysis
 Coding is done by using a code book, code sheet, and a
computer card
 Coding is done on the basis of the instructions given in the
codebook
 The codebook gives a numerical code for each variable
Coding of Data
• A codebook contains coding instructions and the necessary
information about variables in the data set
• A codebook generally contains the following information:
- column number
- record number
- variable number
- variable name
- question number
- instructions for coding
Codebook
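Coding with a codebook can be sketched as a lookup from answers to numerals. The variables, the codes and MISSING_CODE below are hypothetical choices for illustration, not a standard scheme.

```python
# Hypothetical codebook: a numerical code for each category of each variable.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "pain": {"none": 0, "mild": 1, "moderate": 2, "severe": 3},
}
MISSING_CODE = 9  # assumed code for a missing or unrecognized answer


def code_response(response):
    """Translate one respondent's answers into numerical codes."""
    coded = {}
    for variable, codes in CODEBOOK.items():
        answer = str(response.get(variable, "")).strip().lower()
        coded[variable] = codes.get(answer, MISSING_CODE)
    return coded


print(code_response({"gender": "Female", "pain": "mild"}))
```

Keeping the codes in one place means every coder follows the same instructions, which is the purpose of the codebook.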
 To organize data code
 To form structure for coding
 For interpretation of data
 For conclusions of data coded
 To translate answers into numerical values
 To assign numbers to the various categories for data analysis
 It is necessary for efficient analysis
Necessity of Coding
 The process of arranging the primary data in a definite
pattern and presenting it in a systematic way
 The crude data obtained from experiment or survey is
classified according to their properties
 Classification can be done qualitatively or quantitatively
Classification of Data
 The classified data is more easily understood
 It presents the facts into a simpler form
 It facilitates quick comparison
 It helps for further statistical treatment such as
average, dispersion etc.
 It detects the error easily
Objectives of classification
Qualitative classification
- Geographical classification
- Chronological classification
- Qualitative classification (according to attributes)
Quantitative classification
- Discrete classification
- Continuous classification
Types of classification
Geographical Classification
 Data are classified by location of occurrence (i.e. area, region)
eg cataract pts. district wise
Chronological classification
 Data are classified by time of occurrence of the observations,
events
 The categories are arranged in chronological order
eg, no. of trachoma pts. recorded from 2000 to 2010
Qualitative Classification
Qualitative classification (Classification according to attributes)
 Data are classified according to some quality such as religion,
literacy, sex, occupation etc.
Simple classification
 Classification is made into 2 classes, such as classification by
male or female
Manifold classification
 2 or more attributes are studied simultaneously
 E.g. classification according to sex, then marital status and then literacy
Qualitative Classification
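Simple and manifold classification can be sketched by counting subjects on one attribute or on several attributes at once. The records below are invented for illustration.

```python
from collections import Counter

# Invented subject records, for illustration only.
subjects = [
    {"sex": "male", "marital_status": "married", "literate": True},
    {"sex": "female", "marital_status": "married", "literate": True},
    {"sex": "female", "marital_status": "single", "literate": False},
    {"sex": "male", "marital_status": "married", "literate": True},
]

# Simple classification: one attribute with two classes.
simple = Counter(s["sex"] for s in subjects)

# Manifold classification: several attributes studied simultaneously.
manifold = Counter(
    (s["sex"], s["marital_status"], s["literate"]) for s in subjects
)

print(simple)
print(manifold)
```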
 The process of systematically organizing and recording a long series of data in rows and columns for further analysis and interpretation
 It is a concise, logical & orderly arrangement of data in columns & rows
Tabulation
 It presents an overall view of findings in a simpler way
 To identify trends
 It displays relationships in a comparable way between parts
of the findings
 It conserves space and reduces explanatory and descriptive
statement to a minimum
 It facilitates the process of comparison
 It provides a basis for various statistical computations
Usefulness of Tabulation
Graphical Representation
 Graphs help to understand the data easily
 'A picture is worth a thousand words', so goes a common saying
 People who are not statistically minded can also easily understand the data and compare them
 Most common graphs are bar charts and pie charts in
qualitative study and histogram in quantitative study
Graphical Representation
Advantages
 It is easier to read
 Can show relationship between 2 or more sets of
observations in one look
 Universally applicable
 Has high communication power
 Simplifies complex data
 Has more lasting effect on brain
Presentation of Qualitative data
1. Bar Diagram
• Consists of equally spaced vertical (or horizontal)
rectangular bars of equal width placed on a common
horizontal (or vertical) base line
• The categories are placed on X-axis and their frequencies
on Y-axis
Graphical Representation
Graphical Representation
[Figure: bar diagram "Health Program at IOM"; X-axis: health program (BPH, MBBS, B.Optom, B.Pharma); Y-axis: no. of students, 0 to 400]
Simple Bar diagram
Component Bar diagram
Multiple Bar diagram
Graphical Representation
2. Pie Chart
• Circular diagram divided into segments and each
segment represent frequency in a category
Graphical Representation
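Each segment of a pie chart takes a share of the 360 degrees equal to the category's relative frequency. This is a minimal sketch using the eye-condition frequencies from the frequency table; the function name is illustrative.

```python
def sector_angles(frequencies):
    """Angle in degrees of each pie segment: 360 * frequency / total."""
    total = sum(frequencies.values())
    return {category: 360 * n / total for category, n in frequencies.items()}


# Frequencies from the eye-condition frequency table.
angles = sector_angles({"cataract": 4, "keratoconus": 1, "glaucoma": 3})

for category, angle in angles.items():
    print(f"{category:<12} {angle:.1f} degrees")
```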
Production of health manpower
yearly
Pictogram
Line diagram
Cartogram
Graphical Representation
Presentation of Quantitative Data
1.Histogram
• Graphical representation of a set of contiguously drawn
bars
• Most popular graph for continuous variable
Graphical Representation
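Before a histogram is drawn, the continuous values are grouped into contiguous class intervals of equal width. This is a minimal sketch: the weights and the choice of three 5 kg classes are invented for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in contiguous classes [start, start + width), and so on."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:   # values outside the classes are ignored
            counts[index] += 1
    return counts


# Invented weights of children in kg, for illustration only.
weights_kg = [12.5, 14.0, 15.2, 17.8, 18.1, 19.9, 21.0, 22.4]
counts = histogram_counts(weights_kg, start=10, width=5, n_bins=3)

# A text version of the histogram: one bar per class interval.
for i, count in enumerate(counts):
    low = 10 + 5 * i
    print(f"{low}-{low + 5} kg  {'#' * count}")
```

The bars touch because the class intervals are contiguous, unlike the separated bars of a bar diagram.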
Frequency Polygon
Frequency Curve
Scatter Diagram Time Plot
Graphical Representation
Stem-leaf Display
Box-and-whisker Plot
 Includes consistency checks and treatment of missing
responses
 Although preliminary consistency checks have been made
during editing, the checks at this stage are more thorough
and extensive, because they are made by computer
 Computer packages like SPSS, SAS, EXCEL and MINITAB can
be programmed to identify out-of-range values for each
variable
Data Cleaning
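An out-of-range and missing-response check of the kind these packages perform can be sketched in plain Python. The variables and limits in VALID_RANGES are hypothetical and would normally come from the codebook.

```python
# Hypothetical valid range for each variable.
VALID_RANGES = {"age": (0, 120), "iop_mmhg": (0, 80)}


def find_problems(records):
    """List (row, variable, problem) for missing or out-of-range values."""
    problems = []
    for row, record in enumerate(records):
        for variable, (low, high) in VALID_RANGES.items():
            value = record.get(variable)
            if value is None:
                problems.append((row, variable, "missing"))
            elif not low <= value <= high:
                problems.append((row, variable, "out of range"))
    return problems


# Invented records, for illustration only.
records = [
    {"age": 45, "iop_mmhg": 16},
    {"age": 450, "iop_mmhg": None},
]
print(find_problems(records))
```

The check only flags suspect entries; deciding how to treat them remains a separate step.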
 If any correction needs to be done for the statistical
analysis, the data is adjusted accordingly
Data Adjusting
 Data adjusting is not always necessary but it may
improve the quality of analysis sometimes
Data Analysis
• Biostatistics by Prem P. Panta
• Fundamentals of Research Methodology and
Statistics by Yogesh k. Singh
• Research Design by J. W. Creswell
• Internet
References
Thank

Data Collection (Methods/ Tools/ Techniques), Primary & Secondary Data, Qualitative & Quantitative Data, Data Processing (healthkura.com)

  • 1.
    Data Collection, Assessmentof Qualitative Data, Data Processing: Key Issues Bikash Sapkota B. Optometry Institute of Medicine, TU, Nepal
  • 2.
    • Introduction todata • Classification of data • Collection of data • Methods of data collection • Assessment of qualitative data • Processing of data - Editing - Coding - Tabulation - Graphical representation Presentation Layout
  • 3.
    What is data? Data are observations or evidences about the social world  Data, the plural of datum, can be quantitative or qualitative in nature  ‘data is produced, not given’; that is, researchers choose what to call data, it is not just ‘there’ to be ‘found’. (Marsh 1988) - The Sage Dictionary of Social Research Methods
  • 4.
     The terms'data' and 'information' are used interchangeably  However the terms have distinct meanings Data Facts, events, transactions which have been recorded Input raw materials from which information is processed Information Data that have been produced in such a way as to be useful to the recipient Basic data are processed in some way to form information Data & Information
  • 5.
     The researchstudies in behavioral science are mainly concerned with the characteristics or traits  Thus, tools are administered to quantify these characteristics - but all traits or characteristics can not be quantified The data can be classified into two broad categories: Data Qualitative Data or Attributes Quantitative Data or Variables Nature of Data
  • 6.
    Nature of Data 1.QualitativeData or Attributes The characteristics or traits for which numerical value can not be assigned, are called attributes e.g. gender, motivation, etc. 2. Quantitative Data or Variables The characteristics or traits for which numerical value can be assigned, are called variables e.g. height, weight etc.
  • 7.
    Constants A constant isall characteristic or condition that is the same for all the observed units or sample subjects of a study Variables The characteristic or the trait in the behavioral science which can be quantified is termed as variable Variables Continuous variables Discrete variables
  • 8.
    Variables 1. Continuous variables A characteristic whose observation can take any values over a particular range  It can assure either fractional or integral values  E.g. wt. of children in kg, height of pt. 2. Discrete variables  Are those on the other hand, which exist only in units not the fractional value (usually units of one)  E.g. No. of cataract pts. in a village, WBC count
  • 9.
    Attribute vs. Variable AttributeVariable  A category of a characteristic, to which a subject either belongs or does not belong or property that a subject either possesses or does not possess  The attributes are becoming sick, describing blood group etc.  Variable describes a characteristic in terms of a numerical value, which is expressed in units of measurements  The variables are height, weight, blood pressure, age of pts. etc.
  • 10.
    Qualitative Data  Insuch data there is no notion of magnitude of size of the characteristic  They are just categorized  The data are classified by counting the individuals having the same characteristics or attribute and not by measurement  For examples: Gender: male/female Disease: present/absent Smoke: smoking/not smoking  These data can be measured in nominal and ordinal scales
  • 11.
    Quantitative Data  Anythingthat can be expressed as a number, or quantity or magnitude  Describes characteristics in term of a numerical value, which are expressed in units of measurements  E.g. level of hemoglobin in the blood, no. of glaucoma pts., intra ocular pressure, weight, etc.  Quantitative observations: as each individual is represented by a number  These data can be measured in interval and ratio scales
  • 12.
    Measurement Scale  Thechoice of appropriate statistical technique depends upon the type of data in question Qualitative Data • Nominal Scale • Ordinal Scale Quantitative Data • Interval Scale • Ratio Scale
  • 13.
    Nominal Scale  Theleast precise or crude of the 4 basic scales of measurement  Implies the classification of an item into 2 or more categories without any extent or magnitude  There is no particular order assigned to them  The frequency or numbers are used to give a name to something that may be used for determining per cent, mode Eg. boys and girls; pass and fail; rural and urban
  • 14.
    Ordinal Scale
    • The ordinal scale is a more precise scale than the nominal scale
    • The variable is categorized into levels with a meaningful natural order
    • But there is no information about the interval between levels
    • E.g. Pain: none, mild, moderate, severe
  • 15.
    Interval Scale
    • The interval scale is a more precise and refined scale than the nominal and ordinal scales
    • This scale has all the characteristics and relationships of the ordinal scale; in addition, the distances between any two numbers on the scale are known
    • The size of the interval between two observations can be measured
    • E.g. the temperature of a body
  • 16.
    Ratio Scale
    • It has the same properties as an interval scale, as well as a true or absolute zero value
    • The ratio scale numerals have the qualities of real numbers, and can be added, subtracted, multiplied or divided
    • E.g. mean systolic BP
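As a rough illustration of how the four scales limit which statistics make sense, the Python sketch below uses made-up values. All data and variable names are illustrative assumptions, not taken from the slides.

```python
from statistics import median, mode

# Nominal: categories have no order, so only counts and the mode are meaningful.
eye_conditions = ["cataract", "glaucoma", "cataract", "keratoconus"]  # hypothetical data
nominal_mode = mode(eye_conditions)

# Ordinal: categories are ordered, so a median level is meaningful,
# but the distance between levels is unknown.
pain_scale = ["none", "mild", "moderate", "severe"]
pain_reports = ["mild", "severe", "moderate", "mild", "none"]  # hypothetical data
ranks = [pain_scale.index(p) for p in pain_reports]
ordinal_median = pain_scale[int(median(ranks))]

# Interval: differences are meaningful, ratios are not (0 °C is not "no temperature").
temp_morning_c, temp_evening_c = 37.0, 38.5  # hypothetical body temperatures
temp_rise = temp_evening_c - temp_morning_c

# Ratio: a true zero exists, so ratios are meaningful as well.
systolic_a, systolic_b = 120, 150  # hypothetical systolic BP in mmHg
bp_ratio = systolic_b / systolic_a

print(nominal_mode, ordinal_median, temp_rise, bp_ratio)
```

Each step up the scales keeps the operations of the previous one and adds a new one: counting, then ordering, then differences, then ratios.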
  • 17.
    Collection of Data
    • The systematic gathering of data for a particular purpose from various sources; the data are systematically observed, recorded and organized
    • It is the first step of a statistical study
    • There are several ways of collecting data
    • The choice of procedure usually depends on the objectives and design of the study and the availability of time, money and personnel
  • 18.
    Purpose of Data Collection
    • To obtain information
    • To keep on record
    • To make decisions about important issues
    • To pass information on to others
    • For research study
  • 19.
    How Important Is It?
    • Data collection is an extremely important part of any research because the conclusions of a study are based on what the data reveal
  • 20.
    Factors to be considered before data collection
    • Nature, scope & objective of the enquiry
    • Sources of information
    • Availability of funds
    • Techniques of data collection
    • Availability of trained persons
  • 21.
    Sources of Data
    Source of Data
    • Internal
    • External
      - Primary Data. Example: documents, creative works, interviews, man-made materials, surveys
      - Secondary Data. Example: unpublished theses and dissertations, manuscripts, books, journals
  • 22.
    Internal & External Sources of Data
    Internal sources of data
    o Many institutions and departments hold information about their regular functions for their own internal purposes
    o When that information is used in a survey, it is called an internal source of data
    o E.g. social welfare society
    External sources of data
    o When information is collected from outside agencies, it is called an external source of data
    o Such data are either primary or secondary
    o This type of information can be collected by census or sampling method by conducting a survey
  • 23.
    Primary Data
    • Data collected by the investigator from personal experimental studies for a specific research goal are called primary data
    • The data are collected specially for a research project
    • Used when secondary data are unavailable or inappropriate
    • Data are to be unique, original, reliable and accurate in nature
    • Primary data have not been changed or altered by human beings, therefore their validity is greater than that of secondary data
  • 24.
    Primary Data
    Merits
    • Targeted issues are addressed
    • Data interpretation is better
    • High accuracy of data
    • Greater control
    • Addresses specific research issues
    Demerits
    • High cost
    • Time consuming
    • More resources are required
    • Inaccurate feedback
    • Requires a lot of skill and labor
  • 25.
    Primary Data Collection Techniques
    • Interview (direct/indirect)
    • Schedule
    • Questionnaire survey
    • Focus group discussion (FGD)
    • Community forums and public hearings
    • Observation
    • Case studies
    • Key informant interviews
    • Internet/E-mail/SMS
  • 26.
    Direct personal observation
    • The data are collected by the investigator personally, so he/she must be a keen observer
    • He/she asks or cross-examines the informant and collects the necessary information
    • It is original in character
  • 27.
    Suitability of direct personal observation
    Direct personal observation is adopted in the following cases:
    • Where greater accuracy is needed
    • Where the field of enquiry is not large
    • Where confidential data are to be collected
    • Where sufficient time is available
  • 28.
    Merits Original data True andreliable data Encouraging response because of personal approach A high degree of accuracy Direct personal observation Demerits Unsuitable in large area Expensive & time-consuming Untrained investigator brings worst results Collection of information according to the ease of the informant
  • 29.
    Indirect oral interview
    • The investigator approaches witnesses or third parties who are in touch with the informant
    • The enumerator interviews people who are directly or indirectly connected with the problem under study
    • Generally this method is employed by different enquiry committees and commissions
    • The police department generally adopts this method to get clues about thefts, riots, murders, etc.
  • 30.
    Suitability of indirect oral interview
    • It is more suitable when the area to be studied is large
    • It is used when direct information cannot be obtained
    • This system is generally adopted by governments
  • 31.
    Indirect oral interview
    Merits
    • Simple and convenient
    • Saves time, money and labor
    • Useful in the investigation of a large area
    • Adequate information can be obtained
    Demerits
    • The information cannot be fully relied upon because there is no direct contact
    • An interview with an unsuitable person will spoil the results
    • To get real data, a sufficient no. of people must be interviewed
    • A careless attitude on the part of the informant affects the degree of accuracy
  • 32.
    Information through agencies
    • Local agents or correspondents are appointed; they collect the information and transmit it to the office or person
    • They work according to their own ways and tastes
    • Adopted by newspapers, agencies, etc.
    • The informants are generally called correspondents
    • Suitable in those cases where the information is to be obtained at regular intervals from a wide area
  • 33.
    Information through agencies
    Merits
    • Extensive information can be obtained
    • It is the cheapest and most economical method
    • Speedy information is possible
    • It is useful where information is needed regularly
    Demerits
    • The information may be biased
    • Degree of accuracy cannot be maintained
    • Uniformity cannot be maintained
    • Data may not be original
  • 34.
    Mailed questionnaires
    • The questionnaire is sent to the respondents, with blank spaces for answers
    • A covering letter is also sent along with the questionnaire, requesting the respondents to extend their full cooperation
    • Adopted by research workers, private individuals, non-official agencies and government
    • Appropriate in cases where informants are spread over a wide area
  • 35.
    Mailed questionnaires
    Merits
    • Of all the methods, the mailed questionnaire is the most economical
    • It can be widely used when the area of investigation is large
    • It saves money, labor and time
    Demerits
    • The accuracy and reliability of the data cannot be assured
    • There is a long delay in receiving the duly filled-in questionnaires
  • 36.
    Data Collection Through Schedules
    • Very similar to the questionnaire method
    • The main difference is that a schedule is filled in by an enumerator who is specially appointed for the purpose
    • The enumerator goes to the respondents, asks them the questions from the proforma in the order listed, and records the responses in the space provided
    • Enumerators must be trained in administering the schedule
  • 37.
    Survey
    • A detailed study of a geographical area to gather data, attitudes, impressions, opinions, satisfaction level etc., by polling a section of the population
    Types
    • Census Survey: conducted regularly at large intervals of time
    • Continuous Survey: conducted regularly and frequently
    • Ad-hoc Survey: conducted at specific times for a specific need, 'as and when' required
  • 38.
    Survey
    Merits
    • Covers a large population
    • Less expensive
    • Information is accurate
    Demerits
    • Not suited to small-scale studies
    • Time consuming
    • Information does not penetrate deeply
    • Researcher must have good knowledge
  • 39.
    Case Study
    • It is the method of comprehensive study of a social unit, which may be a person, a family, an institution, an organization or a community
    Merits
    • Direct behavioral study
    • Record of real & personal experience
    • Makes possible the study of social change
    • Increases analytical ability & skills
    Demerits
    • One case is almost always different from another case
    • Personal bias
    • Usable only in a limited sphere
    • Consumes more time & money
  • 40.
    Focus Group Discussion
    • Useful to further explore a topic, providing a broader understanding of why the target group may behave or think in a particular way
    • Assists in determining the reasons for attitudes and beliefs
    • Conducted with a small sample of the target group
    • Used to stimulate discussion and gain greater insights
  • 41.
    Focus Group Discussion
    Merits
    • Useful when exploring cultural values and health beliefs
    • Can be used to explore complex issues
    • Can be used to develop hypotheses for further research
    • Does not require participants to be literate
    Demerits
    • Lack of privacy/anonymity
    • Potential for the risk of 'group think'
    • Potential for the group to be dominated by one or two people
    • Group leader needs to be skilled at conducting focus groups, dealing with conflict and drawing out passive participants
    • Time consuming to conduct and analyse
  • 42.
    Triangulation: Beating the Bias
    • Application and combination of several research methods in the study of the same phenomenon
    • Researchers can hope to overcome the weaknesses, intrinsic biases and problems that come from single-method, single-observer and single-theory studies
    • The purpose of triangulation in qualitative research is to increase the credibility and validity of the results
    Triangulation Types (Denzin 1978)
    • Data Triangulation
    • Investigator Triangulation
    • Theory Triangulation
    • Methodological Triangulation
  • 43.
    Secondary Data
    • Secondary data are data that have already been collected and analysed by some earlier agency for its own use and are later used by a different agency
    Sources of Secondary Data
    • Published Sources
    • Unpublished Sources
  • 44.
    Published Sources
    Various governmental, international and local agencies publish statistical data; chief among them are:
    • International publications: UNO, WHO, Nature, etc.
    • Official publications of Government: Department of Drug Administration, Central Bureau of Statistics
    • Semi-official publications: semi-govt. institutions like Municipal Corporation, District Board, etc. publish reports
  • 45.
    Published Sources
    • Publications of Research Institutions: Nepal Development Research Institute, Nepalese Journal of Ophthalmology etc. publish the findings of their research programs
    • Journals and Newspapers: current and important material on statistics and socio-economic problems can be obtained from journals and newspapers like Swasthya Khabar Patrika, Health Today Magazine, The Sight, etc.
  • 46.
    Unpublished Sources
    • Records maintained by various government and private offices
    • Research carried out by individual research scholars in universities or research institutes
    According to Prof. Bowley, “It is never safe to take published statistics at their face value without knowing their meaning and limitations and it is always necessary to criticize arguments that can be based on them.”
  • 47.
    Precautions in the use of Secondary Data
    Before using secondary data, the investigators should consider the following factors:
    • Suitability of data
    • Adequacy of data
    • Reliability of data
  • 48.
    Secondary Data must possess the following characteristics
    Reliability of data: may be tested by checking
    • Who collected the data?
    • What were the sources of the data?
    • Were the data collected properly?
    Suitability of data
    • Data that are suitable for one enquiry may not necessarily be suitable for another enquiry
    • The objective, scope and nature of the original enquiry must be studied
    Adequacy of data
    • Data are considered inadequate if they relate to an area that is either narrower or wider than the area of the present enquiry
  • 49.
    Primary data
    o Real-time data
    o Sure about the sources of data
    o Help to give results/findings
    o Costly and time-consuming process
    o Bias in the response data can be avoided
    o More flexible
    Secondary data
    o Past data
    o Not sure about the sources of data
    o Help in refining the problem
    o Cheap and not time-consuming
    o Cannot know whether the data are biased or not
    o Less flexible
  • 50.
    Assessment of Qualitative Data
    • The characteristics or traits for which a numerical value cannot be assigned are called qualitative data (attributes), e.g. gender, color, honesty etc.
    • Methods of collecting qualitative data:
      - Direct Observation
      - In-depth Interview
      - Case Study
      - Triangulation
      - Use of Secondary Data
  • 51.
    Assessment of Qualitative Data
    • Classification of qualitative data:
      - Geographical Classification
      - Chronological Classification
      - Qualitative Classification
  • 52.
    Assessment of Qualitative Data
    Tabulation of Qualitative Data
    • Qualitative data values can be organized by a frequency distribution
    • A frequency distribution lists:
      - Each of the categories
      - The frequency/count for each category
  • 53.
    Assessment of Qualitative Data
    Frequency Table
    • A simple data set is: cataract, cataract, keratoconus, glaucoma, glaucoma, cataract, glaucoma, cataract
    • A frequency table for this qualitative data is:

      Eye condition   Frequency
      Cataract        4
      Keratoconus     1
      Glaucoma        3

    • The most commonly occurring eye condition is cataract
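The tally in this frequency table can be reproduced with a few lines of Python. This is only a sketch using the standard-library `Counter`; the variable names are illustrative.

```python
from collections import Counter

# The data set from the slide.
observations = [
    "cataract", "cataract", "keratoconus", "glaucoma",
    "glaucoma", "cataract", "glaucoma", "cataract",
]

# Count how many times each category occurs.
frequency = Counter(observations)

for condition, count in frequency.items():
    print(f"{condition:<12} {count}")

# The most commonly occurring category (the mode).
most_common_condition, most_common_count = frequency.most_common(1)[0]
```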
  • 54.
    Assessment of Qualitative Data
    What Is a Relative Frequency?
    • The relative frequencies are the proportions (or percents) of the observations out of the total
    • A relative frequency distribution lists:
      - Each of the categories
      - The relative frequency for each category
    • Relative frequency = Frequency/Total
  • 55.
    Assessment of Qualitative Data
    Relative Frequency Table
    • A relative frequency table for this qualitative data is:

      Eye condition   Relative Frequency
      Cataract        .500 (=4/8)
      Keratoconus     .125 (=1/8)
      Glaucoma        .375 (=3/8)

    • A relative frequency table can also be constructed with percents (50%, 12.5% and 37.5% for the above table)
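The rule Relative frequency = Frequency/Total can be applied directly to the counts for the eye-condition data (4, 1 and 3 out of 8). A minimal Python sketch, with illustrative variable names:

```python
# Frequencies of the eye-condition data set (8 observations in total).
frequency = {"Cataract": 4, "Keratoconus": 1, "Glaucoma": 3}

total = sum(frequency.values())

# Relative frequency = frequency / total, for every category.
relative_frequency = {
    condition: count / total for condition, count in frequency.items()
}

for condition, proportion in relative_frequency.items():
    # Show each value both as a proportion and as a percent.
    print(f"{condition:<12} {proportion:.3f}  ({proportion:.1%})")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the table.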
  • 56.
    Assessment of Qualitative Data
    • Graphical representation of qualitative data:
      - Bar Diagram
      - Pie or Sector Diagram
      - Line Diagram
      - Pictogram
      - Map Diagram or Cartogram
  • 57.
  • 58.
    Data Processing
    • The data, after collection, have to be prepared for analysis
    • Collected data are raw and must undergo some processing before analysis
    • The results of the analysis are strongly affected by the form of the data
    • So, proper data processing is a must to get reliable results
  • 59.
    Objectives of Data Processing
    • Checking the questionnaires and schedules
    • Reducing the mass of data to manageable proportions
    • Summing up the material so as to prepare tables, charts, graphs and various groupings and breakdowns for presenting the results
    • Minimizing the errors which may creep in at various stages of the survey
  • 60.
    Types of Data Processing
    1. Manual Data Processing
    • Involves human intervention
    • Implies many chances for errors, such as delays in data capture and a high number of operator misprints
    • Implies high labor expenses as well as spending on equipment and supplies, rent, etc.
  • 61.
    Types of Data Processing
    2. Mechanical Data Processing
    • Different calculations and processing are performed using mechanical machines like calculators etc.
    • The use of mechanical machines makes data processing easier and less time-consuming
    • The chances of errors are also far lower than in manual data processing
  • 62.
    Types of Data Processing
    3. Electronic Data Processing
    • Processing of data by use of a computer and its programs
  • 63.
    Types of Data Processing
    4. Real Time Processing
    • There is a continual input, processing and output of data
    • Data have to be processed within a small stipulated time period (real time)
    • E.g. when a bank customer withdraws a sum of money from his or her account, it is vital that the transaction be processed and the account balance updated as soon as possible
  • 64.
    Types of Data Processing
    5. Batch Processing
    • In batch processing, a group of transactions is collected over a period of time, entered and processed, and then the batch results are produced
    • Batch processing requires separate programs for input, process and output
    • It is an efficient way of processing a high volume of data
    • E.g. payroll system, examination system and billing system
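The contrast between real-time and batch processing can be sketched in Python with the bank-withdrawal example. The function names, opening balance and amounts below are hypothetical and chosen only for illustration.

```python
def process_real_time(balance, withdrawals):
    """Update the balance immediately after every transaction."""
    history = []
    for amount in withdrawals:
        balance -= amount        # processed as soon as it arrives
        history.append(balance)  # the balance is always up to date
    return history


def process_batch(balance, withdrawals):
    """Collect all transactions first, then process them together."""
    batch = list(withdrawals)    # collected over a period of time
    return balance - sum(batch)  # one result for the whole batch


withdrawals = [100, 250, 50]  # hypothetical amounts
real_time_balances = process_real_time(1000, withdrawals)
batch_balance = process_batch(1000, withdrawals)

print(real_time_balances, batch_balance)
```

Both approaches end with the same balance; they differ in when the result becomes available.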
  • 65.
    Important Steps in Data Processing
    The processing of data involves activities such as:
    • Questionnaire checking
    • Editing
    • Coding
    • Classification
    • Tabulation
    • Graphical representation
    • Data cleaning
    • Data adjusting
  • 66.
    Questionnaire Checking
    • When the data are collected through questionnaires, the first step of data processing is to check whether the questionnaires are acceptable or not
    A questionnaire is not accepted if it:
    • Gives the impression that the respondent could not understand the questions
    • Is incomplete, partially or fully
    • Was answered by a person who has inadequate knowledge
  • 67.
    Data Editing
    • Process of examining the data collected in questionnaires/schedules to detect errors and omissions, to correct these when possible, and to make sure the schedules are ready for tabulation
  • 68.
    Data Editing
    The editor is responsible for seeing that the data are:
    • As accurate as possible
    • Consistent with other facts secured
    • Uniformly entered
    • As complete as possible
    • Acceptable for tabulation and arranged to facilitate coding and tabulation
  • 69.
    Types of Editing
    Editing for quality
    • Data forms are complete
    • Free of bias, errors, inconsistency and dishonesty
    Editing for tabulation
    • Modification to facilitate tabulation
    • Ignoring extremely high/low values
    Field editing
    • Translating or rewriting
    Central editing
    • Wrong entries and their replacement
  • 70.
    Necessity of Editing
    • To gather information
    • To make data relevant and appropriate for analysis
    • To find errors and correct them
    • To ensure that the information provided is accurate
    • To establish the consistency of data
    • To determine whether or not the data are complete
    • To obtain the best possible data available
  • 71.
    Coding of Data
    • Process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes
    • Translating answers into numerical values or assigning numbers to the various categories of a variable to be used in data analysis
    • Coding is done by using a code book, code sheet, and a computer card
    • Coding is done on the basis of the instructions given in the codebook
    • The codebook gives a numerical code for each variable
  • 72.
    Codebook
    • A codebook contains coding instructions and the necessary information about variables in the data set
    • A codebook generally contains the following information:
      - column number
      - record number
      - variable number
      - variable name
      - question number
      - instructions for coding
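As a rough sketch of how a codebook drives coding, the Python example below translates text answers into numerical codes. The variables, the code values and the missing-answer code 9 are all hypothetical assumptions, not taken from the slides.

```python
# Hypothetical codebook: one numerical code per category of each variable.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "smoking": {"not smoking": 0, "smoking": 1},
}
MISSING_CODE = 9  # assumed code for blank or unrecognised answers


def code_response(response):
    """Translate one respondent's text answers into numerical codes."""
    coded = {}
    for variable, codes in CODEBOOK.items():
        answer = str(response.get(variable, "")).strip().lower()
        coded[variable] = codes.get(answer, MISSING_CODE)
    return coded


responses = [
    {"gender": "Female", "smoking": "smoking"},
    {"gender": "male", "smoking": ""},  # smoking answer left blank
]
coded_responses = [code_response(r) for r in responses]
print(coded_responses)
```

Keeping every code in a single codebook structure means that all responses are coded by the same instructions.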
  • 73.
    Necessity of Coding
    • To organize data codes
    • To form a structure for coding
    • For interpretation of data
    • For drawing conclusions from the coded data
    • To translate answers into numerical values
    • To assign numbers to the various categories for data analysis
    • It is necessary for efficient analysis
  • 74.
    Classification of Data
    • The process of arranging the primary data in a definite pattern and presenting them in a systematic way
    • The crude data obtained from an experiment or survey are classified according to their properties
    • Classification can be done qualitatively or quantitatively
  • 75.
    Objectives of classification
    • The classified data are more easily understood
    • It presents the facts in a simpler form
    • It facilitates quick comparison
    • It helps in further statistical treatment such as average, dispersion etc.
    • It helps detect errors easily
  • 76.
    Types of classification
    Qualitative classification
    • Geographical classification
    • Chronological classification
    • Qualitative classification
    Quantitative classification
    • Discrete classification
    • Continuous classification
  • 77.
    Qualitative Classification
    Geographical classification
    • Data are classified by location of occurrence (i.e. area, region)
    • E.g. cataract pts. district-wise
    Chronological classification
    • Data are classified by time of occurrence of the observations or events
    • The categories are arranged in chronological order
    • E.g. no. of trachoma pts. recorded from 2000 to 2010
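Both kinds of classification amount to grouping the same records by a different key. A minimal Python sketch follows; the patient records, districts and years are invented purely for illustration.

```python
from collections import Counter

# Hypothetical patient records: (district, year recorded).
records = [
    ("Kathmandu", 2001), ("Lalitpur", 2000), ("Kathmandu", 2000),
    ("Kaski", 2001), ("Kathmandu", 2002),
]

# Geographical classification: group by location of occurrence.
by_district = Counter(district for district, _ in records)

# Chronological classification: group by time of occurrence,
# then arrange the categories in chronological order.
by_year = Counter(year for _, year in records)
chronological = sorted(by_year.items())

print(dict(by_district))
print(chronological)
```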
  • 78.
    Qualitative Classification
    Qualitative classification (classification according to attributes)
    • Data are classified according to some quality such as religion, literacy, sex, occupation etc.
    Simple classification
    • Classification is made into 2 classes, such as classification by male or female
    Manifold classification
    • 2 or more attributes are studied simultaneously
    • E.g. classification according to sex, then marital status, and then literacy
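The difference between simple and manifold classification can be sketched in Python by counting on one attribute versus on a combination of attributes. The subjects below are hypothetical.

```python
from collections import Counter

# Hypothetical subjects described by two attributes.
subjects = [
    {"sex": "male", "literacy": "literate"},
    {"sex": "female", "literacy": "literate"},
    {"sex": "female", "literacy": "illiterate"},
    {"sex": "male", "literacy": "literate"},
]

# Simple classification: two classes on a single attribute.
simple = Counter(s["sex"] for s in subjects)

# Manifold classification: attributes studied simultaneously,
# so each class is a combination of attribute values.
manifold = Counter((s["sex"], s["literacy"]) for s in subjects)

print(dict(simple))
print(dict(manifold))
```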
  • 79.
    Tabulation
    • Process of systematically organizing and recording a long series of data in rows and columns for further analysis and interpretation
    • It is a concise, logical & orderly arrangement of data in columns & rows
  • 80.
    Usefulness of Tabulation
    • It presents an overall view of the findings in a simpler way
    • It helps identify trends
    • It displays relationships between parts of the findings in a comparable way
    • It conserves space and reduces explanatory and descriptive statements to a minimum
    • It facilitates the process of comparison
    • It provides a basis for various statistical computations
  • 81.
    Graphical Representation
    • Graphs help to understand the data easily
    • A single picture is worth a thousand words, so goes a common saying
    • People who are not statistically minded can also easily understand and compare the data
    • The most common graphs are bar charts and pie charts in qualitative studies and the histogram in quantitative studies
  • 82.
    Graphical Representation
    Advantages
    • It is easier to read
    • Can show the relationship between 2 or more sets of observations at one look
    • Universally applicable
    • Has high communication power
    • Simplifies complex data
    • Has a more lasting effect on the brain
  • 83.
    Graphical Representation
    Presentation of Qualitative Data
    1. Bar Diagram
    • Consists of equally spaced vertical (or horizontal) rectangular bars of equal width placed on a common horizontal (or vertical) base line
    • The categories are placed on the X-axis and their frequencies on the Y-axis
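As a small illustration, a horizontal bar diagram can be drawn as plain text in Python. The sketch below reuses the eye-condition frequencies from the frequency table slide; the function name and the `#` symbol are arbitrary choices.

```python
# Eye-condition frequencies from the frequency table slide.
frequency = {"Cataract": 4, "Keratoconus": 1, "Glaucoma": 3}


def bar_diagram(freq, symbol="#"):
    """Return one text bar per category; bar length equals the frequency."""
    width = max(len(name) for name in freq)  # align the common base line
    return [
        f"{name:<{width}} | {symbol * count} {count}"
        for name, count in freq.items()
    ]


lines = bar_diagram(frequency)
print("\n".join(lines))
```

Each bar has the same thickness (one text row) and starts from a common base line, so only its length carries information.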
  • 84.
    Graphical Representation
    [Figure: simple bar diagram "Health Program at IOM"; X-axis: health program (BPH, MBBS, B.Optom, B.Pharma); Y-axis: no. of students (0 to 400). Also shown: component bar diagram and multiple bar diagram.]
  • 85.
    Graphical Representation
    2. Pie Chart
    • Circular diagram divided into segments, where each segment represents the frequency in a category
  • 86.
    Graphical Representation
    [Figures: "Production of health manpower yearly"; pictogram, line diagram and cartogram.]
  • 87.
    Graphical Representation
    Presentation of Quantitative Data
    1. Histogram
    • Graphical representation of a set of contiguously drawn bars
    • Most popular graph for a continuous variable
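The bars of a histogram come from counting the observations that fall in contiguous class intervals. A minimal Python sketch follows; the intraocular pressure readings, class width and function name are hypothetical.

```python
def histogram_counts(values, start, width, n_classes):
    """Count the values falling in contiguous classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:  # ignore values outside the classes
            counts[index] += 1
    return counts


iop_mmhg = [12, 14, 15, 16, 18, 19, 21, 22, 24, 27]  # hypothetical readings
counts = histogram_counts(iop_mmhg, start=10, width=5, n_classes=4)

for i, count in enumerate(counts):
    lower = 10 + 5 * i
    print(f"{lower}-{lower + 5} mmHg: {'#' * count}")
```

Because the classes share their boundaries, the bars touch each other, which is what distinguishes a histogram from a bar diagram.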
  • 88.
  • 89.
  • 90.
    Data Cleaning
    • Includes consistency checks and treatment of missing responses
    • Although preliminary consistency checks have been made during editing, the checks at this stage are more thorough and extensive, because they are made by computer
    • Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to identify out-of-range values for each variable
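The same kind of check can be sketched in plain Python. This is not how SPSS, SAS, Excel or Minitab do it; the variables and valid ranges below are assumptions chosen only for illustration.

```python
# Hypothetical valid range for each variable (assumed for illustration).
VALID_RANGES = {"age_years": (0, 120), "iop_mmhg": (5, 60)}


def find_problems(records):
    """List (row, variable, issue) for missing and out-of-range values."""
    problems = []
    for row, record in enumerate(records):
        for variable, (low, high) in VALID_RANGES.items():
            value = record.get(variable)
            if value is None:
                problems.append((row, variable, "missing"))
            elif not low <= value <= high:
                problems.append((row, variable, "out of range"))
    return problems


records = [
    {"age_years": 45, "iop_mmhg": 16},
    {"age_years": 430, "iop_mmhg": 18},   # probable data-entry error
    {"age_years": 62, "iop_mmhg": None},  # missing response
]
problems = find_problems(records)
print(problems)
```

The check only flags the suspect values; deciding how to treat them (correct, recode as missing, or drop) is a separate step.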
  • 91.
    Data Adjusting
    • If any correction needs to be made for the statistical analysis, the data are adjusted accordingly
    • Data adjusting is not always necessary, but it may sometimes improve the quality of the analysis
    Data Analysis
  • 92.
    • Biostatistics by Prem P. Panta
    • Fundamentals of Research Methodology and Statistics by Yogesh K. Singh
    • Research Design by J. W. Creswell
    • Internet References
    Thank