DATA IN RESEARCH METHODOLOGY
PRESENTED BY
DR RIPIKA SHARMA
DEPARTMENT OF PUBLIC HEALTH
DENTISTRY
MRADC
CONTENTS:
 Data & types of data
 Collection of data
 What is data collection?
 Why to collect data?
 Methods of data collection
 Advantages & disadvantages of information collection tools
 Problems in data collection
 Precautions in data collection
 Conclusion
 Different methods of presentation of data
 References
WHAT IS DATA
 Whenever an observation is made, it is recorded; a collection of
such recorded observations, numerical or otherwise, is called
data.
 Data are distinct pieces of information, usually formatted in a
special way.
 Data is the plural of datum, a single piece of information. In
practice, however, people use data as both the singular and
plural form of the word.
 An observation may be collected in a simple way, such as recording
the sex of a person in a group or noting the number of cases of a
disease in a community, or through an experiment, such as counting
the total WBC in a given volume of an individual's blood.
 In each of the above cases an observation is made of a certain
characteristic; a characteristic that varies from one observation
to another is called a variable.
TYPES OF DATA AND LEVELS OF
MEASUREMENT
MEASUREMENT AND MEASUREMENT SCALES
 Measurement scales were introduced by Stevens.
 Measurement: This may be defined as the
assignment of numbers to objects or events according
to a set of rules.
 The various measurement scales result from the fact
that measurement may be carried out under different
sets of rules.
• Why do we need to know what type of data we are
dealing with?
• The data type or level of measurement influences the
type of statistical analysis techniques that can be
used when analysing data.
Data types – important?
TYPES OF DATA
 In general, data can be classified according to :
1. Based on Characteristic:
2. Based on Source:
3. Based on Field:
4. Based on Content:
 Based on characteristic:
QUALITATIVE DATA
QUANTITATIVE DATA
 Qualitative (or categorical) data
• Represents a particular quality , also named as attributes.
 Consist of values that can be separated into different categories that
are distinguished by some nonnumeric characteristic.
 Qualitative variables are measured on either a nominal or an ordinal
scale.
 Persons with the same characteristic are counted to form one group,
e.g. classes such as vaccinated, sex, religion, nationality, color of
the eyes, on drug, on placebo, etc.
 These characteristics are called “attributes”, “attributive variates”
or “descriptive characteristics”.
 Quantitative data
 Data consist of values representing counts or measurements.
 Quantitative variables are measured on an interval or ratio scale.
 Quantitative data can be further classified into :-
 DISCRETE
 CONTINUOUS
 A DISCRETE VARIABLE:
 Is a random variable that can take only
fixed values in a given range, such as
whole numbers; the variable jumps from
one value to the next without taking any
in-between values. Such data are called
discrete data.
THE FOLLOWING VARIABLES ARE DISCRETE:
 The number of DMF teeth. It can be any one of the 33
values 0, 1, 2, 3, 4, ..., 32.
 The size of a family.
 The number of erupted permanent teeth.
 The number of patients with osseous disease.
CONTINUOUS VARIABLE
 A random variable that can take on a range of values or
a continuum;
 Examples of continuous variables:
 Treatment time.
 Pocket depth.
 Amount of new bone growth.
 Concentration level of anesthesia.
 Acidity in saliva.
 Hemoglobin percentage level.
 Attribute data And Variables data
 ATTRIBUTE DATA
 Attribute data give you counts representing the presence or absence
of a characteristic or defect.
 As an example, if you are concerned with timely delivery of parts by
your store keepers, you could develop a procedure that would give
you a count of the number of supply parts they deliver on time and
the number they deliver late (defects). This would give you attribute
data, but it would not tell you how late a delivery actually was.
 VARIABLES DATA are based on measurement of a key
quality characteristic produced by the process. Such
measurements might include length, width, time, weight, or
temperature, to name a few.
 For example: the total time from receipt of the request to
delivery of the part. This measurement, time, could be used to
determine how timely or late the deliveries were.
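The difference between the two kinds of data can be sketched in Python. The delivery times and the 48-hour deadline below are made-up illustrative values (assumptions, not taken from any real store): the measured times are variables data, and counting how many of them exceed the deadline gives attribute data.

```python
# Hypothetical delivery times in hours (variables data: measured values).
delivery_hours = [30.5, 52.0, 47.25, 61.0, 12.0]
DEADLINE_HOURS = 48  # assumed target; anything longer counts as late

# Attribute data: a count of late deliveries (defects) and on-time deliveries.
late = sum(1 for t in delivery_hours if t > DEADLINE_HOURS)
on_time = len(delivery_hours) - late

# Only the variables data can tell how late each late delivery actually was.
hours_late = [t - DEADLINE_HOURS for t in delivery_hours if t > DEADLINE_HOURS]

print(on_time, late)
print(hours_late)
```

The counts alone cannot be turned back into the measured times, which is why variables data carry more information about the process.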
Sources of Data
 Internal sources
 External sources: primary data, secondary data
Internal sources of Data
o Many institutions and
departments hold information
about their regular functions for
their own internal purposes.
o When such information is
used in a survey, these are called
internal sources of data.
o E.g. social welfare societies.
External sources of data
o When information is collected
from outside agencies, these are
called external sources of data.
o Such data are either
primary or secondary.
o This type of information can be
collected by conducting a survey
using the census or sampling method.
Internal & External Sources of Data
2.BASED ON SOURCE:
Classified by source, there are two types of data
collection techniques:
Primary data collection
Secondary data collection
Primary data collection uses surveys,
experiments or direct observations.
Secondary data collection gathers information
from diverse sources of documents or
electronically stored information; censuses and
market studies are common sources of secondary
data. This is sometimes loosely referred to as
"data mining."
PRIMARY DATA
Primary data are original data
collected specifically for the purpose
in mind, first hand from the original
source.
Primary data have not been published
before and are generally regarded as more
reliable, authentic and objective. Because
they have not been changed or altered by
others, their validity is usually considered
greater than that of secondary data.
Merits
• Targeted issues are addressed
• Data interpretation is better
• High accuracy of data
• Addresses specific research issues
• Greater control
Demerits
• Elevated cost
• Time consuming
• More resources are required
• Inaccurate feedback
• Requires a lot of skill and labour
Primary Data
SECONDARY DATA
Secondary data are data that have already been
collected by, and are readily available from, other
sources.
Secondary data are data that are being reused. Such
data are more quickly obtainable than primary
data.
These secondary data may be obtained from many
sources, including literature, industry surveys,
compilations from computerized databases and
information systems, and computerized or
mathematical models of environmental processes.
Merits
• Quick and cheap source of data
• Wider geographical area
• Longer orientation period
• Can lead to sources of primary data
Demerits
• May not fulfill our specific research needs
• Poor accuracy
• Data may not be up to date
• Poor accessibility in some cases
Secondary Data
Difference between primary and secondary data
Primary data | Secondary data
Real-time data | Past data
Sure about sources of data | Not sure about sources of data
Helps to give results/findings | Helps in refining the problem
Costly and time-consuming process | Cheap and not time-consuming
Bias in the response data can be avoided | Bias in the data cannot be known
More flexible | Less flexible
3)Based on field:
 In computer database management software data is arranged in
tabular form.
 The columns are called fields and the rows are records.
 Common types of fields are :
a) Character type: eg- Name, Address etc.,
b) Numeric type :eg - Height, weight, blood sugar level, serial number
etc.,
c) Date type: eg- date of birth, date of admission, date of discharge
etc.,
d) Logical type: eg- dichotomous data like sex-male/female,
residence-urban/rural.
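As a rough illustration of fields and records, the sketch below models one table row as a Python dataclass. The class name, field names and sample patients are hypothetical, chosen only to show one field of each type listed above.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PatientRecord:          # one row (record) of the table
    name: str                 # character type
    height_cm: float          # numeric type
    date_of_birth: date       # date type
    is_urban: bool            # logical type (residence: urban/rural)

# Two made-up records; every record has the same fields (columns).
records = [
    PatientRecord("A. Kumar", 172.0, date(1990, 5, 14), True),
    PatientRecord("S. Devi", 158.5, date(1985, 11, 2), False),
]

# Look up the Python type stored in each field of the first record.
field_types = {f: type(getattr(records[0], f)).__name__
               for f in ("name", "height_cm", "date_of_birth", "is_urban")}
print(field_types)
```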
4)Based on contents:
ORDINAL SCALE
NOMINAL SCALE
INTERVAL SCALE
RATIO SCALE
THE NOMINAL SCALE
 The lowest measurement scale
 It consists of named categories with no implied order among
categories.
 Represents the simplest type of data
 The categories in nominal measurement scale have no
quantitative relationship to each other .
 Observations are placed into broad categories which may be
denoted by symbols or labels or names.
 Statisticians use numbers to identify the categories, for
example, 0 for females and 1 for males.
 The numbers are simply alternative labels.
THE NOMINAL SCALE…… CONTI..
 The categories available cannot be placed in any order
 no judgement can be made about the relative size or distance
from one category to another
 Although attributes are labeled with numbers instead of
words, the order and the magnitude of the number do
not have any meaning at all.
 Numerical values make data analysis easier to carry out and
are used only for the sake of convenience.
 Nominal data reflect qualitative differences rather than
quantitative ones.
DICHOTOMOUS DATA
 From a Greek word meaning “cut into two”.
 Variables that have only two possible responses, e.g. Yes or No, are
known as dichotomies.
 Some dichotomies have no prior qualitative direction.
 Example: male/female, treatment/placebo
 Others have an implied direction, one category being more favorable.
 Example: well/sick, living/dead, normal/abnormal
 Further examples:
- Yes/no response for a survey questionnaire
- Marital status
- Gender
 Example survey item: What is your gender? (please tick) Male / Female
ORDINAL SCALE
 Categories or observations are ranked or ordered.
 Although the categories are ordered, the amount of difference
between them cannot be quantified.
 Post- surgery pain can be classified according to its severity
 0 – represents no pain
 1- mild pain
 2- moderate pain
 3-severe pain
 4- extremely severe pain
ORDINAL SCALE…CONT..
 Individuals may be classified according to socio-
economic status as low, medium, high.
 Examples:
 Disease state of cancer [stage 1, stage 2, ...]
 Tooth mobility
 Löe and Silness gingival index
 Miller's classification of root exposure.
INTERVAL MEASUREMENT
 A more sophisticated scale.
 With this scale it is not only possible to order
measurements; the distance between any two
measurements is also known, fixed and equal.
 There is no meaningful absolute zero.
 The temperature zero degree in Celsius or Fahrenheit
does not mean the total absence of temperature.
INTERVAL MEASUREMENT.. CONT..
 Interval scales are not as common as other scales.
 Examples:
 IQ score representing the level of intelligence.
An IQ score of 0 is not indicative of no intelligence.
 Statistics knowledge represented by a statistics
test score.
A test score of zero does not necessarily mean that
the individual has zero knowledge of statistics.
THE RATIO MEASUREMENT SCALE
 There exists a true zero.
 Possesses the same properties as the interval scale.
 Most of the measurement scales in health sciences are
ratio scales.
 Examples: weight in pounds, patient waiting time in a dental office.
 Zero waiting time means the patient did not have to wait.
 The ratio measurement scale allows us to perform all
arithmetic operations on the numbers.
 The resulting numerical value has a sensible meaning.
 Example:
 Treatment cost
 Saliva flow rate
 Length of root canal
 Diastema
 Sugar concentration in blood
 In general interval and ratio scales contain more
information than do nominal or ordinal scale.
• Nominal data is the least complex and gives a simple measure
of whether objects are the same or different.
• Ordinal data maintains the principles of nominal data but adds
a measure of order to what is being observed.
• Interval data builds on ordinal by adding more information on
the range between each observation by allowing us to measure
the distance between objects.
• Ratio data adds to interval by including an absolute zero.
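One commonly cited consequence of these levels is that the appropriate summary changes with the scale: the mode for nominal data, the median for ordinal data, the mean for interval data, and statements of proportion only for ratio data. A minimal sketch using Python's standard `statistics` module follows; all values are invented for illustration.

```python
import statistics

# Illustrative values only.
eye_colour = ["brown", "blue", "brown", "green", "brown"]   # nominal
pain_score = [0, 1, 1, 3, 4]                                # ordinal (0-4 pain scale)
temp_celsius = [36.5, 37.0, 38.5]                           # interval
waiting_min = [0, 10, 20, 30]                               # ratio

mode_colour = statistics.mode(eye_colour)      # nominal: same/different, so count
median_pain = statistics.median(pain_score)    # ordinal: order is meaningful
mean_temp = statistics.mean(temp_celsius)      # interval: distances are meaningful
ratio_wait = waiting_min[2] / waiting_min[1]   # ratio: "twice as long" is meaningful

print(mode_colour, median_pain, round(mean_temp, 2), ratio_wait)
```

Taking a mean of the eye-colour labels, or saying that 38 °C is "twice as hot" as 19 °C, would not be meaningful, which is the practical point of the hierarchy.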
Hierarchical data order
• Knowing the hierarchy of data is useful.
•Why? It is possible to recode or adjust certain types of data into
others.
• Can go from most complex (interval and ratio) to least complex
(nominal) but cannot go the other way around.
• Interval/ratio can be re-formatted to become ordinal or
nominal, ordinal can become nominal.
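The one-way recoding described above can be sketched as follows. The pocket depths and the cut-offs (3 mm and 6 mm) are assumptions made purely for illustration, not clinical guidance, and the function names are hypothetical.

```python
# Hypothetical periodontal pocket depths in millimetres (ratio scale).
pocket_depth_mm = [2.0, 3.5, 4.5, 6.5, 7.0]

def to_ordinal(depth_mm):
    """Recode a ratio measurement into ordered categories (assumed cut-offs)."""
    if depth_mm <= 3.0:
        return "shallow"
    if depth_mm <= 6.0:
        return "moderate"
    return "deep"

def to_nominal(category):
    """Collapse the ordered categories into a two-group label."""
    return "normal" if category == "shallow" else "abnormal"

ordinal = [to_ordinal(d) for d in pocket_depth_mm]
nominal = [to_nominal(c) for c in ordinal]

print(ordinal)
print(nominal)
```

Each step discards information: the label "deep" cannot be converted back into 6.5 mm or 7.0 mm, which is why the recoding only works from the more complex scale to the less complex one.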
COLLECTION OF DATA
WHAT IS DATA COLLECTION???
 Data Collection is nothing more than planning for and obtaining
useful information on key quality characteristics produced by
a process.
 The key issue in data collection is not: How do we collect data?
Rather, it is: How do we obtain useful data?
WHY TO COLLECT DATA?
 Data Collection enables a team to formulate and test working
assumptions about a process and develop information that will lead
to the improvement of the key quality characteristics of the product
or service.
 Data Collection improves decision-making by helping us focus
on objective information about what is happening in the
process, rather than subjective opinions.
Factors to be Considered Before
Collection of Data
 Object and scope of the enquiry.
 Sources of information.
 Quantitative expression.
 Techniques of data collection.
 Unit of collection.
Methods of collecting primary data:
 Direct personal investigation (i.e. interview method)
 Indirect oral investigation (i.e. through enumerators)
 Investigation through local reporters' questionnaire
 Investigation through mailed questionnaire
 Investigation through observation
COLLECTION OF PRIMARY DATA:
 The various methods of collecting primary data, particularly in
surveys and descriptive researches are:
(i) Observation method,
Simple or uncontrolled observation
Systematic or controlled observation
Mass observation
(ii) Interview method,
(iii) Through questionnaires,
(iv) Through schedules, and
 (v) other methods which include
(A) Warranty cards;
(B) Distributor audits;
(C) Pantry audits;
(D) Consumer panels;
(E) Using mechanical devices;
(F) Through projective techniques;
(G) Depth interviews, and
(H) Content analysis.
Method of collecting secondary data
 Published sources: international, government, municipal
corporation, institutional/commercial
 Unpublished sources
COLLECTION OF SECONDARY DATA:
 Secondary data may either be published data or unpublished data.
Usually published data are available in:
 (a)various publications of the central, state and local governments;
 (b) various publications of foreign governments or of international
bodies and their subsidiary organizations;
 (c) technical and trade journals;
 (d) books, magazines and newspapers;
 (e) reports and publications of various associations connected with
business and industry, banks, stock exchanges, etc.;
(f) reports prepared by research scholars, universities, economists, etc.
in different fields; and
(g) public records and statistics, historical documents, and other
sources of published information
 The researcher, before using secondary data, must see that they
possess the following characteristics:
1. Reliability of data: The reliability can be tested by finding out such
things about the said data:
(a) Who collected the data?
(b) What were the sources of data?
(c) Were they collected by using proper methods ?
(d) At what time were they collected?
(e) Was there any bias of the compiler?
(f) What level of accuracy was desired?
(g) Was it achieved ?
2. Suitability of data: The data that are suitable for one enquiry may
not necessarily be found suitable in another enquiry. Hence, if the
available data are found to be unsuitable, they should not be used by
the researcher.
 Similarly, the object, scope and nature of the original enquiry must
also be studied.
 If the researcher finds differences in these, the data will remain
unsuitable for the present enquiry and should not be used.
3. Adequacy of data:
If the level of accuracy achieved in data is found inadequate for the
purpose of the present enquiry, they will be considered as inadequate
and should not be used by the researcher.
 The data will also be considered inadequate, if they are related to an
area which may be either narrower or wider than the area of the
present enquiry.
Observation method:
 Observation becomes a scientific tool provided it is systematically
planned and recorded.
 The main advantage of this method is that subjective bias is eliminated.
 It is used in both experimental and non experimental research.
 In this method the investigator obtains data by direct observation.
ADVANTAGES
 Subjective bias is eliminated.
 Information obtained relates to what is currently happening.
 Observation is independent of respondents' willingness to
answer questions.
 Useful when respondents are not capable of answering verbal
questions.
DISADVANTAGES
 Expensive method.
 Information obtained is very limited.
 At times unforeseen factors may interfere with the
observation task.
 People who are rarely accessible to direct observation
create obstacles.
 When the observation is characterized by a
careful definition of the units to be observed, the style of
recording the observed information, standardized
conditions of observation, and the selection of
pertinent data of observation, it is called
structured observation.
 When the observation takes place without
these characteristics being thought of in advance, it
is termed unstructured observation.
PARTICIPANT OBSERVATION
NON PARTICIPANT OBSERVATION
 If the observer observes by making himself more or
less a member of the group he is observing, so that
he can experience what the members of the group
experience, the observation is called
participant observation.
 When the observer observes as a detached
emissary, without any attempt on his part to
experience through participation what others feel, the
observation is called non participant
observation.
 When the observer is observing in such a manner
that his presence may be unknown to the people he is
observing, such observation is described as
disguised observation.
PARTICIPANT OBSERVATION
Merits
 The researcher is able to record the natural behavior of the
group.
 The researcher can gather information which could not easily be
obtained by observing in a disinterested fashion.
 The researcher can even verify the truth of the statements.
Demerits
 The observer may lose objectivity to the extent that he
participates emotionally.
 The problem of observation control is not solved, and
participation may narrow the researcher's range of experience.
1) UNCONTROLLED OBSERVATION:
 No attempt is made to use precision instruments.
 The aim is to get a spontaneous picture of life and persons.
 It supplies naturalness and completeness of behavior, allowing
sufficient time to observe them.
 The main pitfall is that of subjective interpretation.
 It is resorted to in exploratory research.
 Types of uncontrolled observation:
 A) Participant observation
 B) Non participant observation
 C) Quasi-participant observation
2)SYSTEMATIC OR CONTROLLED OBSERVATION:
 Uses precision or mechanical instruments as aids to accuracy and
standardization.
 Such observations supply formalized data upon which generalization
can be built with some degree of assurance.
 Controlled observation takes place in various experiments carried
out in laboratory or under controlled conditions.
3)Mass observation:
 Here the collective behavior of the people in public places and
different situations is observed and recorded.
INTERVIEW METHOD:
Definition
‘An interview is a purposeful discussion between two
or more people’
Kahn and Cannell (1957)
 The interview method of collecting data involves presentation of
oral-verbal stimuli and reply in terms of oral-verbal responses. This
method can be used through personal interviews and, if possible,
through telephone interviews.
INTERVIEW METHOD
 Personal interview
 Telephone interview
 Personal interview method requires a person known as the
interviewer asking questions generally in a face-to-face contact to
the other person or persons. (At times the interviewee may also ask
certain questions and the interviewer responds to these, but usually
the interviewer initiates the interview and collects the information.)
 This sort of interview may be of 2 types:
1. Direct personal investigation
2. Indirect oral investigation
 Direct personal investigation
 The interviewer has to collect information personally
from the sources concerned.
 This method is particularly suited for intensive
investigation.
 Indirect oral investigation.
 Under this method the investigator does not collect the
information directly; instead he gets it
indirectly through persons who know the
information and who are ready to part with
the information they possess.
 This method is used in cases where direct
contact is not possible.
FACE TO FACE (IN-PERSON) INTERVIEWS
Advantages
 There is a high response rate.
 Interviewers can make relevant
observations on sensible
variables.
 The researcher can adapt the
questions as necessary, clarify
doubt and ensure that the
responses are properly
understood.
An interactive process in which trained interviewers visit people in
their homes or work to directly collect data from them.
Disadvantages
 Travel costs for interviewers
can be high.
 The interviewers do not
always visit at times
convenient to the interviewee
and hence may have to
revisit.
 High cost to train and recruit
interviewers.
 Interviewer bias
communicated by demeanor,
tone of voice and questioning
style may influence
respondents.
TYPES OR CLASSES OF INTERVIEW
STRUCTURED INTERVIEW
SEMI-STRUCTURED INTERVIEW
UNSTRUCTURED INTERVIEW
STRUCTURED INTERVIEW
 Description and/or Aim of interview:
- Normally, structured interviews are done in a face-to-face
format or via telephone using a standard set of questions to
obtain data that can be aggregated because identical questions
have been asked of each participant.
 Nature of questioning route: fixed, given order, very
standardized
 Role of probing: Little or none, perhaps only repeating or
clarifying instructions
SEMI – STRUCTURED INTERVIEW
 Description and/or aim of interview:
“More or less open-ended questions are brought to the interview
situation in the form of an interview guide” (Flick, 1998 p. 94).
The level of depth of understanding that the researcher pursues is
used to characterize this type of interview.
 Nature of questioning route: flexible, but usually a given set of
questions is covered, varying levels of standardization
 Role of probing: Get the participant to expand upon their
answer, give more details, and add additional perspectives
UNSTRUCTURED INTERVIEW
 Description and/or Aim of interview:
unstructured interviews are done in a face-to-face format and some would say
you are trying to get participants to share stories. The researcher starts from a
position of wanting to be sensitive to how participants construct their views and
perspectives of things. Therefore, a goal is to allow the participant’s structure to
dominate.
 Nature of questioning route: ask questions to get people to talk about
constructs/variables of interest to the researcher.
 Role of probing: Simply to get the participant to talk about a topic area;
normally probing questions are not directed, but rather asked to encourage the
participant to keep talking or to get back to the subject of interest.
OTHER TYPES OF INTERVIEW INCLUDE FOCUSED INTERVIEW ,
CLINICAL INTERVIEW AND THE NON DIRECTIVE INTERVIEW
Focused Interview:
Is meant to Focus attention on the given experience of the
respondent and its effects.
The interviewer has the freedom to decide the manner and
sequence in which the questions would be asked and also
the freedom to explore reasons and motives.
The main task is to confine the respondent to a discussion of issues
with which he seeks conversance.
It is used generally in the development of hypothesis
It constitutes major type of unstructured interview.
 Clinical Interview
 Concerned with broad underlying feelings or motivation
or with the course of individuals life experience.
 Non Directive Interview
Simply encourage the respondent to talk about the given
topic with a bare minimum questioning.
The interviewer often acts as a catalyst to a comprehensive
expression of the respondent.
MERITS
 More information and in greater depth.
 Interviewer can overcome resistance of respondents.
 Greater flexibility.
 Observation method can as well be applied to
recording verbal answers to various questions.
 Personal information can as well be obtained easily.
 Samples can be covered completely with repeated
visits.
 Interviewer may catch the informant off guard and thus
may secure the most spontaneous reactions.
 Language of the interview can be adapted to the ability or
educational level of the person interviewed.
 Interviewer can control which persons will answer the
questions.
 Interviewer can collect supplementary information.
 Desired information can be collected at one point .
 Provides accurate data for calculation of various rates
and ratio.
DEMERITS
 Very expensive
 Possibility of bias (interviewer bias refers to the extent to
which an answer is altered in meaning by some action or
attitude on the part of the interviewer.)
 Certain types of respondents such as important officials
may not be easily approachable
 More time consuming specially when the sample is large
and recalls upon the respondents are necessary
 Presence of interviewer on the spot may over stimulate the
respondent.
Pre-requisites and basic tenets of interviewing:
 For successful implementation of the interview method, interviewers
should be carefully selected, trained and briefed. They should be
honest, sincere, hardworking, impartial and must possess the
technical competence and necessary practical experience.
 Occasional field checks should be made to ensure that interviewers
are neither cheating, nor deviating from instructions given to them
for performing their job efficiently.
 The approach should be friendly, courteous, conversational
and unbiased.
 The interviewer should not show disapproval of or surprise at a
respondent's answer, but he must keep the direction of the interview in
his own hands, discouraging irrelevant conversation, and must
make every possible effort to keep the respondent on track.
 In addition, some provision should also be made in advance so
that appropriate action may be taken if some of the selected
respondents refuse to cooperate or are not available when an
interviewer calls upon them.
TELEPHONE INTERVIEWS
Advantages
 Possible coverage of wide
geographic area.
 It is quicker and less
expensive than the face-to-
face method.
 Random digit dialing can be
used to make sampling easy.
 High response rate possible.
 Interviewer can control
questioning sequence.
 No field staff is required.
Disadvantages
 Only people with telephones can be
interviewed.
 High costs involved for long distance
calls; may need several call backs.
 Respondents can terminate interview by
hanging up the phone.
 Anonymity is limited.
This involves trained interviewers calling persons to collect data.
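The random digit dialing mentioned among the advantages can be sketched roughly as follows. This is a simplified illustration, assuming a list of known number prefixes to which four random digits are appended; the prefixes and the function name `random_digit_dial` are hypothetical, and real surveys typically use more elaborate designs.

```python
import random

def random_digit_dial(prefixes, n, seed=None):
    """Draw n distinct phone numbers: a known prefix plus four random digits."""
    rng = random.Random(seed)          # seeded generator for a reproducible sample
    numbers = set()
    while len(numbers) < n:
        prefix = rng.choice(prefixes)
        suffix = f"{rng.randrange(10_000):04d}"   # 0000 to 9999, zero-padded
        numbers.add(prefix + suffix)
    return sorted(numbers)

# Made-up prefixes; no telephone directory is needed to draw the sample.
sample = random_digit_dial(["000-555", "000-556"], n=5, seed=1)
print(sample)
```

Because the final digits are generated rather than looked up, unlisted numbers have the same chance of being drawn as listed ones.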
Basic steps in interview:
A. ESTABLISHING CONTACT
B. STARTING AN INTERVIEW
C. SECURING RAPPORT
D. RECALL
E. PROBE QUESTIONS
F. ENCOURAGEMENT
G. GUIDING THE INTERVIEW
H. RECORDING
I. CLOSING THE INTERVIEW
J. REPORT
 QUESTIONNAIRE
 Definition:
 A questionnaire is simply a list of
mimeographed or printed questions that is
completed by or for a respondent. An
interview schedule is a list of more or less
structured questions that are read out or
verbalized by an interviewer (with or without
probing) in interrogating a respondent. The
interviewer then records the respondent's
replies either verbatim (for open-ended
questions) or according to prespecified (or
even precoded) answers or categories.
 COLLECTION OF DATA THROUGH QUESTIONNAIRES:
 This method of data collection is quite popular, particularly in case
of big enquiries. It is being adopted by private individuals, research
workers, private and public organizations and even by governments.
 In this method a questionnaire is sent (usually by post) to the persons
concerned with a request to answer the questions and return the
questionnaire.
 Care needs to be taken to ensure that the questions elicit a useful and
unbiased response
TYPES OF QUESTIONS
 Open-ended Questions – They are used in qualitative interviews where the
respondent is asked to explain why certain things are done.
 Free Response Questions – They are asked in such a way that the
respondent does not limit the scope of his answers or responses.
 Multiple Choices – It is the most commonly used type of questioning. It is a
list of a number of answers provided for every question.
 Scaled Response – The respondents are given a range of categories in
which to express their feelings or opinions.
 Checklist – This is a form of multiple choice question from which the
respondent chooses one or more response categories.
 Ranking Questions – This refers to an opinion question where the
respondent is asked to rank comparatively the items listed either in
ascending or descending order.
 Dichotomous Question – There are only two possible answers to the
questions like the Yes – No type.
 Main aspects of a questionnaire:
 A) General form:
 B) Question sequence:
 C) Question formulation and wording:
1. General form:
 The general form of a questionnaire can be either structured or
unstructured.
 Structured questionnaires are those questionnaires in which there are
definite, concrete and pre-determined questions. The questions are
presented with exactly the same wording and in the same order to all
respondents.
 Resort is taken to this sort of standardization to ensure that all
respondents reply to the same set of questions.
 Structured questionnaires may also have fixed alternative questions
in which responses of the informants are limited to the stated
alternatives.
 Thus a highly structured questionnaire is one in which all questions
and answers are specified and comments in the respondent’s own
words are held to the minimum.
2.QUESTION SEQUENCE:
 In order to make the questionnaire effective and to
ensure quality to the replies received, a researcher
should pay attention to the question-sequence in
preparing the questionnaire.
 Questions should proceed in logical sequence moving from
easy to more difficult questions.
 Question sequence should always go from the general to the
more specific.
 The answer given to a question is a function not only of
specific question but of all previous questions as well.
 The opening questions should be such as to arouse human
interest.
 A proper sequence of questions reduces considerably the chances of
individual questions being misunderstood.
 The question-sequence must be clear and smoothly-moving, meaning
that the relation of one question to another should be readily
apparent to the respondent, with the questions that are easiest to
answer being put at the beginning.
QUESTIONS TO BE AVOIDED:
 Questions that put too great a strain on the memory or
intellect of the respondent.
 Questions of a personal character.
 Technical and vague expressions capable of different
interpretations should be avoided.
 Questions related to personal wealth etc.
 3. Question formulation and wording:
 Should be simple.
 Should be easily understood.
 Should be concrete and should conform to the
respondent’s way of thinking.
 With regard to this aspect of the questionnaire, the researcher
should note that each question must be very clear, so as to
avoid any sort of misunderstanding.
 Question should also be impartial in order not to give a
biased picture of the true state of affairs. Questions should
be constructed with a view to their forming a logical part
of a well thought out tabulation plan.
Closed/Multiple choice vs. Open ended
Aspect | Open ended questionnaire | Closed ended questionnaire
Accuracy of response | Easier to express complex situations | Difficult to investigate complex situations
Coverage | May pick up unanticipated situations | Will miss areas not anticipated
Size of questionnaire | May need fewer lines of text | May need many pages of text
Subject recall | Reduced | Enhanced
Analysis | More complex | Simpler
 There should be some control questions in the
questionnaire which indicate reliability of the
respondent.
 There should be provision for indications of uncertainty.
 Essentials of a good questionnaire:
 To be successful, questionnaire should be comparatively short and
simple i.e., the size of the questionnaire should be kept to the
minimum. Questions should proceed in logical sequence moving
from easy to more difficult questions.
 Personal and intimate questions should be left to the end. Technical
terms and vague expressions capable of different interpretations
should be avoided in a questionnaire.
 Questions may be dichotomous (yes or no answers), multiple choice
(alternative answers listed) or open-ended.
 The latter type of questions are often difficult to analyze and hence
should be avoided in a questionnaire to the extent possible.
 There should be some control questions in the questionnaire which
indicate the reliability of the respondent.
MERITS
 Low cost.
 Free from bias of the interviewer.
 Respondents have adequate time to give well thought out
answers.
 Respondents who are not easily approachable, can also
be reached conveniently.
 Large samples can be made use of and thus the results
can be made more dependable.
DEMERITS
 Low rate of return of duly filled in questionnaires.
 Can be used only when the respondents are educated
and co-operating.
 The control over questionnaire may be lost once it is
sent.
 There is inbuilt inflexibility because of the difficulty
of amending the approach once questionnaires have
been dispatched.
 COLLECTION OF DATA THROUGH SCHEDULES:
 This method of data collection is very much like the collection of data
through questionnaire, with little difference which lies in the fact that
schedules (Proforma containing a set of questions) are being filled in by the
enumerators who are specially appointed for the purpose.
 These enumerators along with schedules, go to respondents, put to them the
questions from the Proforma in the order the questions are listed and record
the replies in the space meant for the same in the Proforma.
 In certain situations, schedules may be handed over to respondents
and enumerators may help them in recording their answers to
various questions in the said schedules.
 Enumerators explain the aims and objects of the investigation and
also remove the difficulties which any respondent may feel in
understanding the implications of a particular question or the
definition or concept of difficult terms.
SCHEDULE
1. It is filled in by the interviewer and is never mailed to the respondent.
2. Collecting data through a schedule is relatively more expensive.
3. Non-response is generally very low in the case of schedules because they are filled in by interview.
4. In the case of a schedule we can identify the respondent.
5. It is usually used where the survey is to be conducted over a relatively small geographical area.
6. It is useful for illiterate people.
7. Wording is not in the form of questions here.
8. Along with schedules, observation is also possible.
QUESTIONNAIRE
1. It is filled in by the respondent himself and is usually mailed to him.
2. Collecting data here is relatively cheap and economical.
3. The non-response rate is usually high, as many people do not respond or return partly filled questionnaires.
4. In this case it is not always clear who replies.
5. It is generally used where the field of enquiry is large and the questionnaire can be posted to different places.
6. It is not useful for illiterate people.
7. Wording is in the form of questions.
8. Observation is not possible in this method.
OTHER METHODS:
1. Warranty cards:
Warranty cards are usually postal sized cards which are used by
dealers of consumer durables to collect information regarding
their products. The information sought is printed in the form of
questions on the ‘warranty cards’ which is placed inside the
package along with the product with a request to the consumer to
fill in the card and post it back to the dealer.
2. Distributor or store audits:
Distributor or store audits are performed by distributors as well as
manufacturers through their salesmen at regular intervals.
 Distributors get the retail stores audited through salesmen and use
such information to estimate market size, market share, seasonal
purchasing pattern and so on.
 The data are obtained in such audits not by questioning but by
observation.
3. Pantry audits:
 Pantry audit technique is used to estimate consumption of the
basket of goods at the consumer level.
 In pantry audit data are recorded from the examination of
consumer’s pantry.
 The usual objective in a pantry audit is to find out what types of
consumers buy certain products and certain brands, the assumption
being that the contents of the pantry accurately portray consumer’s
preferences.
 4. Consumer panels:
 An extension of the pantry audit approach on a regular basis is
known as a ‘consumer panel’.
 A consumer panel is essentially a sample of consumers who are
interviewed repeatedly over a period of time.
 Two types: transitory and continuing
A. Transitory consumer panel:
 It is set up to measure the effect of a particular
phenomenon.
 Such a panel is conducted before and after basis.
 Initial interviews are conducted before the phenomenon
takes place to record the attitude of the consumer.
 A second set of interviews is carried out afterwards to
find out the consequent changes that might have
occurred in the consumer’s attitude.
 It is a favorite tool of advertising and social research.
B. Continuing consumer panel:
 Often set up for an indefinite period with a view to
collect data on a particular aspect of consumer behavior
over time, generally at periodic intervals or may be
meant to serve as a general purpose panel for
researchers on a variety of subjects.
 Used in the area of consumer expenditure, public
opinion etc.
5. Use of mechanical devices:
 The use of mechanical devices has been widely made to collect
information by way of indirect means.
• Eye camera, pupillometric camera, psychogalvanometer, motion
picture camera and audiometer are the principal devices so far
developed and commonly used by modern big business houses,
mostly in the developed world, for the purpose of collecting the
required information.
Eye camera
• To record the focus of eyes of a respondent on a
specific portion of a sketch or diagram or written
material.
• Useful in designing advertising material.
Psychogalvanometer
• Used for measuring the extent of body excitement as a
result of visual stimulus.
Motion picture cameras
• Used to record movements of body of a buyer while
deciding to buy a consumer good from a shop or big
store.
• Packaging and information label will stimulate the
buyer to perform certain physical movements.
Pupillometer
• Used to record dilation of the pupil as a result of
visual stimulus.
• The extent of dilation shows the degree of interest
aroused by the stimulus.
Audiometers
• Used by some TV concerns to find out the type of
programmes as well as stations preferred by
people.
• A device is fitted in the TV instrument itself to
record these changes.
• Such data is used to find out market share of
competing TV stations.
6. Projective techniques:
 Projective techniques for the collection of data have been developed
by psychologists to use projections of respondents for inferring about
underlying motives, urges, or intentions which are such that the
respondent either resists revealing them or is unable to figure them out
himself.
 In projective techniques the respondent in supplying
information tends unconsciously to project his own attitudes
or feelings on the subject under study. Projective techniques
play an important role in motivational researches or in
attitude surveys.
 The use of these techniques requires intensive specialized training.
 In such techniques, the individual’s responses to the stimulus-
situation are not taken at their face value.
 The stimuli may arouse many different kinds of reactions. The nature
of the stimuli and the way in which they are presented under these
techniques do not clearly indicate the way in which the response is to
be interpreted.
 The stimulus may be a photograph, a picture, an inkblot and so on.
Responses to these stimuli are interpreted as indicating the
individual’s own view, his personality structure, his needs, tensions,
etc. in the context of some pre-established psychological
conceptualization of what the individual’s responses to the stimulus
mean.
Word association tests
Sentence completion tests
Story completion tests
Verbal projection tests
Pictorial techniques
Play techniques
Quizzes, tests and examinations
Sociometry
PICTORIAL TECHNIQUES
Thematic apperception test (TAT)
• The respondent is asked to describe a set of pictures dealing with day-to-day situations.
• Inferences are drawn about personality and attitudes.
Rosenzweig test
• Uses a cartoon format in which a series of cartoons with words inserted in balloons is presented.
• The respondent is asked to put his own words in an empty balloon space provided in the picture.
Rorschach test
• Consists of 10 cards having inkblots. The design is symmetrical but meaningless.
• The respondent is asked to describe what he perceives in those symmetrical inkblots.
Holtzman inkblot test
• Consists of 45 inkblot cards based on color, movement, shading and other factors.
Tomkins-Horn picture arrangement test
• Designed for group administration. Consists of 25 plates, each containing 3 sketches that may be arranged in different ways to portray a sequence of events.
7. Depth interviews:
 Depth interviews are those interviews that are designed to discover
underlying motives and desires and are often used in motivational
research.
 Such interviews are held to explore needs, desires and feelings of
respondents. In other words, they aim to elicit unconscious as also
other types of material relating especially to personality dynamics
and motivations.
8. Content-analysis:
Content-analysis consists of analyzing the contents of documentary
materials such as books, magazines, newspapers and the contents of
all other verbal materials which can be either spoken or printed.
 VARIABLE:
 A characteristic ,which may take on different values,
that is which may vary in different persons, places
or things is called variable.
 It is any characteristic of an object that can be
measured or categorized.
 If a variable can assume a number of different
values such that any particular value is obtained
purely by chance , it is called a random variable.
VARIABLE
DEPENDENT VARIABLE
INDEPENDENT VARIABLE
 DEPENDENT VARIABLE :
 It is the outcome of interest, which should change in
response to some intervention.
 also known as a criterion variable, it is a variable or
construct the researcher hopes to understand, explain
and/or predict.
 INDEPENDENT VARIABLE:
 It is the intervention , or what is being manipulated.
 also called a predictor variable, it is a variable or
construct that influences or explains the dependent
variable either in a positive or negative way.
• Moderator variable = a variable that has an effect on the
independent – dependent variable relationship. The presence
of a moderator variable modifies the original relationship
between the independent and dependent variables by
interacting with the independent variable to influence the
strength of the relationship with the dependent variable.
• Mediating variable = also known as an intervening variable,
it is a variable that surfaces as a function of the independent
variable and explains the relationship between the dependent
and independent variables. Moderator variables specify when
certain effects will occur whereas mediators speak to how or
why such effects occur. Moreover, mediators explain how
external events take on internal psychological significance.
[Figure: Price (independent variable) → Purchase likelihood (dependent variable). Moderator variables: discount level, restrictions. Mediator variable: perceived value, shown for full mediation and for partial mediation.]
 More generally ,if we say one variable changes in
response to the other , we say that dependent variable is
the one that changes in response to the independent
variable.
 For example :
Tobacco causes oral cancer. Here tobacco use is the independent variable and oral cancer is the dependent variable.
PRECAUTIONS IN DATA COLLECTION:
 1)standardization:
Using standard and universally accepted methods and
techniques reduces the problem of comparing the data
collected with those of similar studies.
2)Training:
 Training of the personnel involved in data collection ensures
uniformity , accuracy and completeness of the data collection.
3)Pretesting:
 It is a mock trial of the exercise of data collection , but on a smaller scale. It
avoids ambiguities , inaccuracies and uncertainties.
4)Storage:
 If the time gap between data collection and analysis is long then ,some sort
of durable storage like registers , cards , folders , punch cards and
computers should be used.
 Care should be taken to protect the data.
 METHODS OF DEALING WITH NON RESPONDENTS:
 Unless non-response is confined to a small proportion of the whole
sample , the results cannot claim any general validity. Every effort
must be made to reduce non-response to negligible proportions.
 Non-response is sometimes more serious in case of postal
questionnaires.
 The first step is to send a follow up letter, but if this does not produce
any desired effect , the possibility of using more intensive methods
such as telephone calls and personal visits must be considered.
 In interviewer surveys , the amount of deliberate nonresponse is
usually very small.
 Revisits by the investigators may be tried.
 Call backs are also required
CONCLUSION
 Each method of data collection has its uses and none is superior in
all situations. For instance, telephone interview method may be
considered appropriate (assuming telephone population) if funds are
restricted, time is also restricted and the data is to be collected in
respect of few items with or without a certain degree of precision.
 In case funds permit and more information is desired, personal
interview method may be said to be relatively better.
 In case time is ample, funds are limited and much information is to
be gathered with no precision, then mail-questionnaire method can
be regarded more reasonable.
 Thus, the most desirable approach with regard to the selection of the
method depends on the nature of the particular problem and on the
time and resources (money and personnel) available along with the
desired degree of accuracy.
 But, over and above all this, much depends upon the ability and
experience of the researcher.
REFERENCES:
 C.R.Kothari, Methods of data collection, Research methodology
methods and techniques ,2nd edition,112-130
 B.K. Mahajan ,Sources and Presentation of Data, Methods in
Biostatistics, 6th edition,10-13
 K. Park, Medicine and Social Sciences, Park’s text book of
Preventive and Social Medicine, 21st edition, 643-644
 JP Baride, Types of data ,Data collection , Manual of
Biostatistics,1st edition,4-9
P.S.S.Sundar Rao, Introduction to Research Methods,
Introduction to Biostatistics & Research Methods,4th
edition,182-187
T.Bhaskara Rao, Methods of Data Collection, Methods in
Medical Research,1st edition,131-152
Biostatistics: a quick guide to the use and choice of graphs
and charts
 There are several methods of presenting
data-
tables, charts, diagrams, graphs, pictures
and special curves.
PRESENTATION OF DATA
Tabular: simple table, complex table
Graphical, for quantitative data:
1. Histogram
2. Frequency polygon
3. Frequency curve
4. Line chart
5. Normal distribution curve
6. Cumulative distribution curve
7. Scatter diagram
Graphical, for qualitative data:
1. Bar chart
2. Pictogram
3. Pie chart
4. Map diagram
 Def : “A table is a systematic arrangement of data
into vertical columns and horizontal rows”
Tabulation :
The process of arranging data into rows and columns
is called tabulation .
 Tabulation is the first step before the data is used for analysis or
interpretation.
 A table can be simple or complex ,there are certain general
principles which should be borne in mind in designing tables:
(a) the tables should be numbered e.g., table 1, table 2, etc.
(b) a title must be given to each table . The title must be brief
and self explanatory.
 (c) The headings of columns or rows should be clear and
concise,
 (d) the data must be presented according to size or
importance, or chronologically.
STATISTICAL TABLE
Statistical table has four parts
 The title
 The stub
 The box head
 The body
 in addition some tables have
 Prefatory note
 Foot note
 Source note
(i) Table Number:
Each table must be given a number.
(ii) Title of the Table:
It should be short & clear. It is either placed
just below the table number or at its right.
(iii) Caption:
Caption refers to the headings of columns.
(iv) Stub:
Stub refers to the headings of rows.
(v) Body
This is the most important part of a table. It
contains a number of cells. Cells are formed
due to the intersection of rows and column.
Data are entered in these cells.
(vi) Head Note:
The head-note (or prefatory note) contains the unit of measurement of data.
It is usually placed just below the title or at the right hand top corner of the table.
(vii) Foot Note:
A foot note is given at the bottom of a table. It helps in clarifying any point which is
not clear in the table. A foot note may be keyed to the title or to any column or
row heading. It is identified by symbols such as *, +, @, £ etc.
(viii) Source Note:
The source note shows the source of the data presented in the table. Reliability and
accuracy of data can be tested to some extent from the source note. It shows the
name of the author, title, volume, page, publisher’s name, year and place of
publication of the book or journal from which data are compiled.
TYPES OF TABLES:
ON THE BASIS OF THE NUMBER OF
CHARACTERISTICS, TABLES MAY BE CLASSIFIED AS
FOLLOWS:
• Simple or one-way table
• Two-way table
• Manifold table
Simple or one-way Table:
A simple or one-way table is
the simplest table which
contains data of one
characteristic only. A simple
table is easy to construct
and simple to follow.
 Two-way Table: A table,
which contains data on
two characteristics, is
called a two way table.
In such case, therefore,
either stub or caption is
divided into two co-
ordinate parts.
 Manifold table: A table,
which has more than two
characteristics of data is
considered as a manifold
table.
 Manifold tables, though
complex are good in
practice as they present the full
information.
 Not more than four
characteristics should be
represented in one table to
avoid confusion.
FREQUENCY DISTRIBUTION TABLE
 In the frequency distribution table, the data is first split up into
convenient groups (class interval) and the number of items
(frequency) which occur in each group is shown in adjacent columns.
 Hence it is a table showing the frequency with which the values are
distributed in different groups or classes with some defined
characteristics.
RULES FOR CONSTRUCTION OF FREQUENCY TABLE
1) The class interval should not be too large or too small
2) The number of classes should be more than 8 and
less than 15
3) The class interval should be equal and uniform through
out the classification.
4) After construction of table, proper and clear heading
should be given to it
5) The base or source of data should be mentioned with the
pattern of analysis in footnote at the end of table
To present raw data, discrete or continuous in the form of a frequency
distribution, we must divide the range of the measurements in the data
into a number of non-overlapping intervals (or classes).
The intervals need not have the same width, but typically they are
constructed to have equal width; this makes it easier to make
comparisons among different classes.
So how many intervals should we have?
Some authors suggest that there should be 10-20
intervals.
Let n denote the total number of measurements or data
points.
The number of intervals = √n
Since √90 ≈ 9.49, for the bacterial colony data (n = 90), we will need
about 9 or 10 intervals to construct a frequency distribution.
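The square-root rule above can be checked with a few lines of Python. This is only a sketch: the function name suggested_intervals is hypothetical, and n = 90 is taken from the bacterial colony example.

```python
import math

def suggested_intervals(n):
    """Return sqrt(n) with its floor and ceiling as candidate interval counts."""
    root = math.sqrt(n)
    return root, math.floor(root), math.ceil(root)

# n = 90 measurements, as in the bacterial colony example
root, low, high = suggested_intervals(90)
print(f"sqrt(90) = {root:.2f}, so use about {low} or {high} intervals")
```

The rule only gives a starting point; the final number of intervals is still a matter of judgement.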
Use of zeros:
Zero should not be used in a table. When no cases have been
found to exist or when the value of an item is zero, this is
indicated by means of dots (…..) or short dashes (-----).
Raw data:
Def. “collected data which have not been organized numerically
are called raw data”.
“An arrangement of raw numerical data in ascending
or descending order of magnitude is called an
Array”
 RELATIVE FREQUENCY AND CUMULATIVE
FREQUENCY
 To facilitate the interpretation of a frequency
distribution, it is often helpful to express the
frequency for each interval as a proportion or a
percentage of the total number of observations.
 A relative frequency distribution shows the
proportion of the total number of measurements
associated with each interval.
 Absolute frequency for a particular interval
divided by the total number of measurements gives the relative frequency.
Relative frequencies are useful for comparing
different sets of data containing an unequal number
of observations.
 The cumulative relative frequency for an interval is
the proportion of the total number of measurements
that have a value less than the upper limit of the
interval.
 The cumulative relative frequency is computed by
adding all the previous relative frequencies and the
relative frequency for the specified interval.
 The cumulative relative frequency is also useful for
comparing different sets of data with an unequal
number of observations.
 The relative frequency
for the class (162.5,
212.5) is 19/90 ≈ 0.21, or
19/90 × 100% ≈ 21%.
 For example, the
cumulative relative
frequency for the interval
(262.5, 312.5) is the sum,
0.02+ 0.21 + 0.06 + 0.30
= 0.59, or 59%. This
means that 59% of the
total number of
measurements is less
than 312.5
CLASS LIMITS AND CLASS BOUNDARIES:
Def: “each class is defined by two numbers, these numbers are called
class limits. The smaller number is called lower class limit and larger
number is called upper class limit”
Example: in the class 45-49, 45 is the lower class limit and 49 is the upper class limit.
• As measurements are seldom exact,
45kg is interpreted as a weight lying between 44.5kg and 45.5kg.
Similarly,
49kg is interpreted as a weight lying between 48.5kg and 49.5kg.
• The values 44.5 and 49.5 are called true class limits or class boundaries.
FREQUENCY DISTRIBUTION TABLE
Table 3
Age distribution of polio patients
Age      Number of patients
0-4      35
5-9      18
10-14    11
15-19     8
20-24     6
FREQUENCY DISTRIBUTION:
DISCRETE DATA
 Discrete data: possible values are countable
Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.
Number of days read   Frequency
0                     44
1                     24
2                     18
3                     16
4                     20
5                     22
6                     26
7                     30
Total                 200
RELATIVE FREQUENCY
Relative Frequency: What proportion is in each
category?
Number of days read   Frequency   Relative frequency
0                     44          .22
1                     24          .12
2                     18          .09
3                     16          .08
4                     20          .10
5                     22          .11
6                     26          .13
7                     30          .15
Total                 200         1.00
For example, 44/200 = .22: 22% of the people in the sample report that they read the newspaper 0 days per week.
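The relative frequency column can be reproduced by dividing each frequency by the total. A minimal sketch using the newspaper data from the example (the variable names are illustrative only):

```python
days = [0, 1, 2, 3, 4, 5, 6, 7]
frequency = [44, 24, 18, 16, 20, 22, 26, 30]

n = sum(frequency)                      # total number of customers
relative = [f / n for f in frequency]   # proportion in each category

for d, f, r in zip(days, frequency, relative):
    print(f"{d}  {f:3d}  {r:.2f}")
print(f"Total  {n}  {sum(relative):.2f}")
```

The relative frequencies should always add up to 1 (or 100%), which is a quick check on the arithmetic.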
Frequency Distributions
A frequency distribution is a table that shows classes or
intervals of data with a count of the number in each
class. The frequency f means the number of times a
certain value of a variable is repeated.
Class      Frequency, f
1 – 4      4
5 – 8      5
9 – 12     3
13 – 16    4
17 – 20    2
Class width
The class width is the distance between lower (or
upper) limits of consecutive classes.
The class width is 4.
5 – 1 = 4
9 – 5 = 4
13 – 9 = 4
17 – 13 = 4
Constructing a Frequency Distribution
Example:
The following data represents the ages of 30 students in a
statistics class. Construct a frequency distribution that
has five classes.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
Example continued:
Ages of Students
Class      Frequency, f (number of students)
18 – 25    13
26 – 33     8
34 – 41     4
42 – 49     3
50 – 57     2
Σf = 30
Check that the sum equals the number in the sample.
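The tallying in this example can be sketched in Python. The slides do not state how the class width was chosen; the sketch assumes the width is the range divided by the number of classes, rounded up, which gives the classes 18 – 25 to 50 – 57 used above.

```python
import math

ages = [18, 20, 21, 27, 29, 20,
        19, 30, 32, 19, 34, 19,
        24, 29, 18, 37, 38, 22,
        30, 39, 32, 44, 33, 46,
        54, 49, 18, 51, 21, 21]

num_classes = 5
# Assumed rule: width = range / number of classes, rounded up (36 / 5 = 7.2, so 8)
width = math.ceil((max(ages) - min(ages)) / num_classes)

# Lower and upper class limits, starting at the smallest age
classes = [(min(ages) + i * width, min(ages) + (i + 1) * width - 1)
           for i in range(num_classes)]

# Count how many ages fall inside each class
freq = [sum(1 for a in ages if lower <= a <= upper) for lower, upper in classes]

for (lower, upper), f in zip(classes, freq):
    print(f"{lower} – {upper}: {f}")
print("Sum of frequencies:", sum(freq))
```

As on the slide, the sum of the frequencies should equal the number in the sample (30).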
Midpoint
The midpoint of a class is the sum of the lower and
upper limits of the class divided by two. The midpoint is
sometimes called the class mark.
Midpoint = (Lower class limit + Upper class limit) / 2
Class    Frequency, f   Midpoint
1 – 4    4              2.5
Midpoint = (1 + 4) / 2 = 5 / 2 = 2.5
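The same calculation can be repeated for every class of the table. A short sketch, assuming the five classes 1 – 4 to 17 – 20 from the frequency distribution above:

```python
classes = [(1, 4), (5, 8), (9, 12), (13, 16), (17, 20)]

# Midpoint (class mark) = (lower class limit + upper class limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]
print(midpoints)
```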
Relative Frequency
The relative frequency of a class is the portion or
percentage of the data that falls in that class. To find the
relative frequency of a class, divide the frequency f by
the sample size n.
Relative frequency = Class frequency / Sample size = f / n
Class    Frequency, f   Relative frequency
1 – 4    4              0.222
Σf = 18
Relative frequency = f / n = 4 / 18 ≈ 0.222
Cumulative Frequency
The cumulative frequency of a class is the sum of the
frequency for that class and all the previous classes.
Ages of Students
Class      Frequency, f   Cumulative frequency
18 – 25    13             13
26 – 33     8             21
34 – 41     4             25
42 – 49     3             28
50 – 57     2             30 (total number of students)
Σf = 30
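The running totals can be produced with a cumulative sum. A minimal sketch using the class frequencies from the ages example and Python's itertools.accumulate:

```python
from itertools import accumulate

classes = ["18 – 25", "26 – 33", "34 – 41", "42 – 49", "50 – 57"]
frequency = [13, 8, 4, 3, 2]

# Each entry is the frequency of that class plus all previous classes
cumulative = list(accumulate(frequency))

for c, f, cf in zip(classes, frequency, cumulative):
    print(f"{c}  {f:2d}  {cf:2d}")
```

The last cumulative frequency equals the total number of observations.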
REVIEW OF BAR GRAPHS
 A bar graph is a visual display used to compare
the amounts or frequency of occurrence of
different characteristics of data.
 This type of display allows us to:
 compare groups of data, and
 to make generalizations about the data quickly.
BAR GRAPH
 The data presented is categorical
 Data is presented in the form of rectangular bar of
equal breadth.
 Each bar represent one variant /attribute.
 Suitable scale should be indicated and scale starts
from zero.
 The width of the bar and the gaps between the bars
should be equal throughout.
 The length of the bar is proportional to the
magnitude/ frequency of the variable.
 The bars may be vertical or horizontal.
RULES FOR MAKING SIMPLE BAR CHARTS
1. Vertical bars are used to represent data classified on a quantitative or chronological basis.
2. Horizontal bars are used to represent data classified on a qualitative or geographical basis.
3. The bars should be neither short and wide nor very long and narrow.
4. Bars should be separated by spaces that are not less than half the width of a bar and not greater than the width of a bar.
PARTS OF A BAR GRAPH
• Graph Title--The graph title gives an
overview of the information being
presented in the graph. The title is given
at the top of the graph.
 Axes and their labels--Each graph has
two axes. The axes labels tell us what
information is presented on each axis.
 Grouped Data Axis--The grouped data
axis is always at the base of the bars.
This axis displays the type of data being
graphed.
 Frequency Data Axis--The frequency
axis has a scale that is a measure of
the frequency or amounts of the
different data groups.
 Axes Scale-- Scale is the range of
values being presented along the
frequency axis.
 Bars--The bars are rectangular blocks
that can have their base at either
vertical axis or horizontal axis (as in this
example). Each bar represents the data
for one of the data groups.
 Key or legend explains any additional
information found on the graph.
[Figure: sample bar graph with the graph title, grouped data axis, frequency data axis, axes scale and key labelled]
While tables are more exact in their presentation of data,
they do not allow a quick visual overview of the data. Bar
graphs provide one way to present data so that we can
get an overview at a glance.
Simple bar diagram
• Represents qualitative data.
• Shows only one variable, so it is also called a one-variable bar chart.
• Each category of the variable is represented by a bar.
Limitation:
• Represents only one classification.
• Cannot be used for comparison.
[Figure: simple bar diagram showing distribution of subjects year wise; horizontal axis: B.D.S. year (first to fourth); vertical axis: no. of subjects, 0 to 60]
DISTRIBUTION OF STUDY BDS SUBJECTS, YEAR WISE
B.D.S. year    No. of subjects
First          57
Second         55
Third          53
Fourth         39
Total          204
EXAMPLE
[Figure: horizontal simple bar diagram of population (million) by country; scale 0 to 1200]
Country      Population (million)
China        1088
India        816
Indonesia    175
Japan        123
Pakistan     106
Multiple bar diagram
• It is also called a grouped bar chart or compound bar chart.
• Used to display information from tables containing two or three variables.
• An example of a grouped bar chart can be demonstrated by the variable “gender”, which has two categories, male and female.
• Bars within a group are usually joined.
• There must be a legend to indicate what categories the bars represent.
• Length corresponds to the frequency.
Component bar diagram
• Also called a stacked bar graph.
• Represents qualitative data.
• Shows both the number of cases in the major groups and in the subgroups simultaneously.
• In a stacked bar chart, the bar represents the total number of cases that occurred in a category; the segments of the bar represent the frequency of cases within the category.
• Each rectangle is divided according to the number in the subgroups.
• Stacked bar charts should be used with caution because they are very difficult to interpret.
• Except for the bottom category, the categories do not rest on a flat baseline; where one category of the variable ends, the next begins.
• Stacked bar charts can be deceptive, so they are sometimes used to exaggerate or hide information.
Proportional bar diagram
• Represents qualitative data.
• The 100% component bar chart is a variant of the stacked bar chart.
• All the bars are of the same height and show the variable categories as percentages of the total rather than the actual values.
• A set of 100% bar charts can be used instead of multiple pie charts, because it is easier to make comparisons between bars than between pies.
COMPONENT OR PROPORTIONAL BAR DIAGRAM
Proportion of energy intake obtained from various foodstuffs by poor and rich communities
% of energy obtained from   Poor community   Rich community
Fats                        10               15
Protein                     10               30
Carbohydrate                80               55
[Figure: 100% component bar diagram of the above data; vertical axis 0% to 100%]
DOT PLOTS
 Another variant of bar chart
that is particularly useful when
there are many categories is
the dot plot.
 Instead of a bar, just a heavy
dot is placed where the end of
the bar would be.
 When there are many labels,
smaller dots that extend back
to the labeled axis are often
used to make the chart easier
to read.
PARETO BAR CHART
 Bar chart arranged in
descending order of height from
left to right
 This means the categories
represented by the tall bars on
the left are relatively more
significant than those on the
right.
 Is a special form of vertical
bar graph which helps us to
determine which problems to
solve and in what order.
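The ordering step of a Pareto chart is just a sort in descending order of frequency. The sketch below uses hypothetical problem counts invented purely for illustration; they do not come from any study.

```python
# Hypothetical counts of four problems, invented only to illustrate the ordering
problems = {"Problem A": 12, "Problem B": 40, "Problem C": 7, "Problem D": 25}

# Pareto order: tallest bar first (left), shortest last (right)
pareto_order = sorted(problems.items(), key=lambda item: item[1], reverse=True)

for name, count in pareto_order:
    print(f"{name}: {count}")
```

The categories printed first are the ones that would be placed on the left of the chart and tackled first.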
REVIEW OF CIRCLE GRAPHS
PIE CHART
 Circle graphs, also called pie charts, are a type of
graph used to represent a part to whole
relationship.
 PROPERTIES OF CIRCLE GRAPHS:
 They are circular shaped graphs with the entire
circle representing the whole.
 The circle is then split into parts, or sectors.
 Each sector represents a part of the whole.
 Each sector is proportional in size to the amount
each sector represents, so it is easy to make
generalizations and comparisons.
• Expressed in percentages.
• The angle at the centre of the circle is equal to 360°.
• Angle of a sector = (Class frequency / Total observations) × 360°
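The sector angles can be worked out directly from this formula. A small sketch, using the year-wise BDS subject counts given earlier (57, 55, 53 and 39) as example data:

```python
subjects = {"First": 57, "Second": 55, "Third": 53, "Fourth": 39}

total = sum(subjects.values())  # total observations

# Angle of a sector = (class frequency / total observations) x 360 degrees
angles = {year: count / total * 360 for year, count in subjects.items()}

for year, angle in angles.items():
    print(f"{year}: {angle:.1f} degrees")
```

The angles of all sectors should add up to 360°, which serves as a check.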
 Graph Title--A graph title
gives an overview of the
information displayed in
the graph. The title is
given at the top of the
graph.
 Sectors--Each sector
represents one part of
the whole. The size of
each sector represents
its fraction of the whole.
 Sector Labels--The label
of each sector indicates
the category of
information it refers to,
and may also give
numeric data (often a
percentage) so we know
the size of each sector.
Graph Title-
Sectors-
Sector
Labels-
 Circle graphs/pie charts should be used sparingly
for two reasons.
 First, they are best used for displaying statistical
information when there are no more than six
components only—otherwise, the resulting picture
will be too complex to understand
 Second, circle graphs/pie charts are not useful
when the values of each component are similar
because it is difficult to see the differences between
slice sizes.
PICTOGRAM
 Used to impress on the common man the frequency of occurrence of
events such as attacks, deaths, numbers operated, admitted or
discharged, accidents, etc. in a population.
 It is a popular method of presenting data to the “man in the
street”.
 Most useful way of representing data to people who cannot
understand other kinds of chart.
 Small pictures or symbols are used to present the data.
 In essence, pictograms are a form of bar chart.
Figure: picture of a doctor used to represent the population
per physician.
MAP DIAGRAM OR SPOT MAP
 Used to show the geographical distribution of frequencies
of a characteristic.
 Also called a cartogram.
HISTOGRAM
 Used for Quantitative, Continuous, Variables.
 It is similar to bar graph except it is used with
interval or ratio variables.
 It is used to present variables which have no
gaps e.g age, weight, height, blood pressure, blood
sugar etc.
 It consists of a series of blocks. The class intervals
are given along the horizontal axis and the frequencies
along the vertical axis.
 To draw a histogram (for equal class intervals),
class boundaries are marked along the x-axis and
frequencies are marked along the y-axis.
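The grouping step behind a histogram with equal class intervals can be sketched as follows. The weights, the starting boundary and the class width are all hypothetical choices made for illustration, and `frequency_table` is an assumed helper name. Each class is taken as lower boundary inclusive, upper boundary exclusive.

```python
def frequency_table(values, start, width, n_classes):
    """Count values in n_classes equal intervals [lower, upper) beginning at start."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:      # values outside the classes are ignored
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical weights in kg (illustrative only).
weights = [48, 52, 55, 57, 58, 61, 62, 63, 64, 66, 67, 68, 71, 73, 79]
table = frequency_table(weights, start=45, width=10, n_classes=4)
for lower, upper, count in table:
    # Class boundaries on one axis, frequency shown as a row of stars.
    print(f"{lower}-{upper}: {'*' * count}")
```

Changing `width` or `start` changes the counts in each block, which is why the shape of a histogram depends on the choice of class intervals.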
THE FOLLOWING ARE A FEW GENERAL COMMENTS ABOUT
HISTOGRAMS:
 Histograms serve as a quick and easy check on the
shape of the distribution of the data.
 The construction of the graphs is subjective.
 The shape of the histograms depends on the width and
the number of class intervals.
 Histograms could be misleading.
 Histograms display grouped data. Individual
measurements are not shown in the graphs.
 Histograms can adequately handle data sets that are
widely dispersed.
FREQUENCY POLYGON:
 A frequency polygon is a shorthand way of
presenting a histogram by putting a dot at the centre
of the top of each bar and connecting these dots with
a line.
 The shape of the distribution is more easily seen in a
frequency polygon than in a histogram.
 A frequency polygon is a many-sided closed figure. It is
constructed by plotting the frequencies against the class marks (mid-points) and
then joining the resulting points by means of straight lines.
 A frequency polygon can also be obtained by joining the
mid-points of the tops of the rectangles in the histogram.
Figure: Frequency polygon for the frequency distribution of weight of
120 students (x-axis: weight in kg).
RELATIVE FREQUENCY HISTOGRAM AND
RELATIVE FREQUENCY POLYGON
 Graphic representation of relative frequency distribution
can be obtained from the histogram or frequency
polygon simply by changing the y-axis from frequency to
relative frequency on a graph
 The resulting graphs are called the relative frequency
histogram and the relative frequency polygon (or percentage
frequency polygon) respectively.
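A small sketch of the change from frequency to relative frequency: for each class we compute the class mark (mid-point) and divide the frequency by the total. The class boundaries and frequencies below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(45, 55, 2), (55, 65, 7), (65, 75, 5), (75, 85, 1)]

total = sum(f for _, _, f in classes)

# Each point of the polygon: (class mark, frequency, relative frequency).
points = [((lo + hi) / 2, f, f / total) for lo, hi, f in classes]

for mark, f, rel in points:
    print(f"class mark {mark:>5}: frequency {f}, relative frequency {rel:.3f} ({100 * rel:.1f}%)")
```

Plotting the third column instead of the second leaves the shape of the graph unchanged; only the scale on the y-axis differs.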
CUMULATIVE FREQUENCY POLYGON OR OGIVE
 A graph showing the cumulative frequencies plotted
against the upper class boundaries is called a cumulative
frequency polygon or an ogive
 Cumulative frequency is used to determine the number of
observations that lie above (or below) a particular value.
 If we use relative cumulative frequencies in place of
cumulative frequencies, the resulting graph is called a
relative cumulative frequency polygon or percentage
ogive
 The graphs corresponding to “less than” and “or
more” cumulative frequency distributions are called “less
than” and “or more” ogives respectively.
 Less than ogive: Here the cumulative frequencies are
plotted against the upper boundaries of the respective class
intervals.
 Or more (greater than) ogive: Here the cumulative frequencies are
plotted against the lower boundaries of the respective class
intervals.
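The two kinds of ogive differ only in the direction of accumulation and in the boundary used on the x-axis. The sketch below builds both sets of points from a hypothetical grouped table (the boundaries and frequencies are illustrative, not the deck's 120-student data).

```python
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(45, 55, 2), (55, 65, 7), (65, 75, 5), (75, 85, 1)]
freqs = [f for _, _, f in classes]

# "Less than" ogive: running total plotted against UPPER class boundaries.
less_than = list(zip([hi for _, hi, _ in classes], accumulate(freqs)))

# "Or more" ogive: total from each class upwards, plotted against LOWER boundaries.
from_top = list(accumulate(reversed(freqs)))
or_more = list(zip([lo for lo, _, _ in classes], reversed(from_top)))

print("less than:", less_than)
print("or more:  ", or_more)
```

The "less than" points rise to the total number of observations, while the "or more" points start at the total and fall.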
Figure: Ogive for the “less than” cumulative frequency distribution of weight of 120
students (y-axis: cumulative frequency, 0 to 140).
FREQUENCY CURVE
 A frequency curve is the smoothed curve obtained by joining the points
of the frequency polygon with a smooth freehand curve instead of straight lines.
Figure: Frequency curve for the frequency distribution of weight of 120 students
(x-axis: class mid-points 47 to 97; y-axis: number of students (frequency), 0 to 30).
COMMON SHAPES OF FREQUENCY CURVE
 The frequency curves arising in
practice take on certain
characteristic shapes and are
generally classified as:
1) Symmetrical or bell shaped curve
2) Moderately asymmetrical curve
3) J shaped and reverse J shaped
curves
4) U shaped curves
5) Bimodal & multimodal curves
 Symmetrical or bell shaped curve (observations are
equidistant from the central maximum)
 Moderately asymmetrical curve (in these curves ,
the tail of the curve to one side of the central
maximum is longer than that to the other)
 Positively skewed distributions have a relatively
large number of low scores and a small number of
very high scores.
 Negatively skewed distributions have relatively
large number of high scores and a small number of
low scores.
 J shaped and reverse J shaped curves (a J shaped curve
starts at a low point on the left and goes higher and
higher towards the extreme right; a reverse J shaped curve
starts at a high point on the left and falls lower and
lower towards the extreme right)
Figure: J-shaped curve and reverse J-shaped curve.
 U shaped curve (a frequency curve with a low spot
in the middle and high spots at both ends)
 Bimodal and multimodal frequency curves (a bimodal
curve has 2 maxima while a multimodal frequency
curve has more than 2 maxima)
STEM AND LEAF PLOTS
The steps for a histogram are:
1. Rank order the data.
2. Find the range.
3. Choose an appropriate width to yield about 10-20
intervals.
4. Make a new table consisting of the intervals, their
midpoints, the count and a cumulative total.
5. Turn this into a histogram.
6. Lose some information along the way, namely
the exact values.
Tukey (1977) devised the stem and leaf plot, which consists
of only 3 steps.
 A stem and leaf plot is a method of organizing data
that uses part of each value as the “stem” and part as the
“leaves”.
 The series of numbers in a column is called the “stem”, which is
the major part of the observed value.
 The remaining trailing digits in the row are called the
“leaves”.
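A minimal sketch of a stem and leaf plot for non-negative whole numbers, taking the tens as the stem and the units as the leaf. The data are the nine commuting distances that appear in this deck's dental clinic example; the function name `stem_and_leaf` and the tens/units split are assumptions made for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens) to its sorted leaves (units) for non-negative integers."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

# Commuting distances from the deck's dental clinic example.
distances = [5, 6, 12, 13, 14, 15, 18, 22, 50]
plot = stem_and_leaf(distances)

# Print every stem in the range, including empty ones, so gaps stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

The empty stems 3 and 4 make the gap before the value 50 obvious, and every original value can still be read back from the plot.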
ADVANTAGES
 An important advantage over a histogram is that
it provides all information contained in the
histogram while preserving the value of the
individual observation.
 It is quick
 It is easy to determine the median and range of the
data from the plot.
 It can be used to quickly organize a large list of
data values.
 Outliers, data clusters, or gaps are easily visible.
 DISADVANTAGES:
 A stem and leaf plot is not very informative for a
small set of data.
THE JOINT DISTRIBUTION GRAPH
 When we study the relationship between two variables we refer to the
data as bivariate.
 One graphical technique we use to show the relationship between
variables is called a scatter diagram.
 To draw a scatter diagram we need two variables. We scale one
variable along the horizontal axis (X-axis) of a graph and the other
variable along the vertical axis (Y-axis).
 The correlation between two variables, labeled x and y, can
range from nonexistent to strong.
 If the value of y increases as x increases, the correlation is
positive;
 If y decreases as x increases, the correlation is negative.
 The graph, however, does not reveal the probability that such
a relationship could have occurred by chance
 The graph does not provide quantitative information about
how strong the association is (although it looks strong to the
eye).
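Because the scatter diagram alone gives no number for the strength of the association, a correlation coefficient is usually computed alongside it. The sketch below uses Pearson's r, which is one common choice and is not named in the slides; the paired values are hypothetical and the variable names are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations (illustrative only).
sugar_exposures_per_day = [1, 2, 3, 4, 5]
decayed_teeth = [2, 4, 5, 4, 5]

r = pearson_r(sugar_exposures_per_day, decayed_teeth)
print(f"r = {r:.3f}")   # positive r: y tends to increase as x increases
```

A value near +1 or -1 indicates a strong linear association and a value near 0 a weak one. The coefficient by itself still says nothing about whether the relationship could have occurred by chance.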
ADDITIONAL NOTES ABOUT
SCATTERPLOTS
 If the relationship is thought to be a causal one, then the
independent variable is represented along the x-axis
and the dependent variable on the y-axis
 A scatterplot can show that there is a positive, negative,
constant, or no relationship (correlation) between the
variables.
 Positive: As the value of one variable increases, so does the
other.
 Negative: As the value of one variable increases, the other
decreases.
 Constant: As the value of one variable increases (or
decreases), the other remains constant.
 No relationship: There is no pattern to the points.
A scatter plot is often employed to identify potential associations between two variables.
ADVANTAGES OF SCATTERPLOTS
 A scatter plot is one of the best ways to determine if two
characteristics are related.
 A scatterplot may be used when there are multiple trials
for the same input variable in an experiment.
DISADVANTAGES OF SCATTERPLOTS
 When a scatterplot shows an association between two
variables, there is not necessarily a cause and effect
relationship.
 Both variables could be related to some third variable
that explains their variation or there could be some other
cause. Alternatively, an apparent association could
simply be a result of chance.
LINE GRAPHS
 Line graph is used to illustrate the
relationship between two variables.
 Line chart or line graph is a type of chart
which displays information as a series of data
points called 'markers' connected by straight
line segments.
 It is similar to a scatter plot except that the
measurement points are ordered (typically by
their x-axis value) and joined with straight line
segments.
Figure: Parts of a line graph.
 Line charts show how a particular variable changes at
equal intervals of time.
 A line chart is often used to visualize a trend in data over
intervals of time – a time series – thus the line is often
drawn chronologically
 Use percentages on Y axis when more than one distribution is
to be shown.
 Works well when
 the data is paired (bi-variate);
 the data is continuous.
ADVANTAGES OF LINE GRAPHS
 The line graph is especially useful when there are a large
number of values to be plotted, that is, when you have a
continuous variable with an unlimited number of possible
points.
 It also allows the presentation of several sets of data on one
graph.
 A line graph is a way to summarize how two pieces of
information are related and how they vary depending on one
another.
DISADVANTAGES OF LINE GRAPHS
 Changing the scale of either axis can dramatically change the visual
impression of the graph.
LINE GRAPH EXAMPLE
Figure: John's weight in kilograms (y-axis, 65 to 75) by year (x-axis, 1991 to 1995).
John weighed 68 kg in 1991, 70 kg in 1992, 74 kg in 1993, 74 kg in
1994, and 73 kg in 1995.
REVIEW OF BOX AND WHISKER PLOT
 What if you want a graph that shows the
spread of the data and the median?
 The owner of a dental clinic wants to find out
where his patients are coming from. One day he
decided to gather data about the distance that
people commuted to get to his clinic.
 Patients reported the following distances:
 5 6 12 13 14 15 18 22 50
DECILES AND PERCENTILES
Percentiles:
Percentiles divide ordered data into 100 parts.
The 25th percentile is Q1,
the 50th percentile is the median (Q2) and
the 75th percentile is Q3.
Deciles: If data are ordered and divided into 10 parts, the cut points are called
deciles.
Quartiles: Quartiles divide ordered data into four approximately equal parts.
Q1 is the median of the first half of the ordered observations and Q3 is
the median of the second half of the ordered observations.
In notation, the qth quartile is the observation at position q(n+1)/4 in the ordered data,
where q is the desired quartile (1, 2 or 3) and n is the number of observations.
When this position is not a whole number, the quartile lies between the two
neighbouring observations.
FIVE NUMBER SUMMARY
The five number summary of a distribution consists of the
smallest (minimum) observation,
the first quartile (Q1),
the median (Q2),
the third quartile (Q3),
and the largest (maximum) observation,
written in order from smallest to largest.
BOX PLOT: A box plot is a graph of the five number
summary.
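The five number summary for the clinic distances can be worked out with the q(n+1)/4 position rule. The sketch below is one way to do it, assuming linear interpolation between neighbouring observations when the position is not a whole number; the function names are illustrative.

```python
from math import floor

def quartile(sorted_values, q):
    """q-th quartile (q = 1, 2, 3) using the 1-based position q(n+1)/4."""
    n = len(sorted_values)
    position = q * (n + 1) / 4
    lower = floor(position)
    fraction = position - lower
    if lower < 1:                       # position falls before the first value
        return sorted_values[0]
    if lower >= n:                      # position falls at or after the last value
        return sorted_values[-1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)

def five_number_summary(values):
    data = sorted(values)
    return (data[0], quartile(data, 1), quartile(data, 2), quartile(data, 3), data[-1])

# Commuting distances from the dental clinic example.
distances = [5, 6, 12, 13, 14, 15, 18, 22, 50]
summary = five_number_summary(distances)
print(summary)   # (minimum, Q1, median, Q3, maximum)
```

With n = 9, Q1 sits at position 2.5 (halfway between 6 and 12), the median at position 5 and Q3 at position 7.5 (halfway between 18 and 22). This agrees with taking the medians of the lower and upper halves of the data.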
 The range as a measure of variability has a major
deficiency because it is sensitive to two extreme
values (which two values?).
 It is desirable to have a measure of dispersion that
is not easily influenced by a few extreme values.
 Interquartile range is such a measure.
 The sample interquartile range (IQR) is the
distance between Q1 and Q3.
 IQR = Q3 − Q1.
 An outlier is a data value that is much smaller or
much larger than the other values in the data set.
 The test for outliers is:
 1) Find the IQR = Q3 − Q1.
 2) Multiply the IQR by 1.5.
 3) Subtract this from Q1: Q1 − (1.5) IQR.
 4) Add it to Q3: Q3 + (1.5) IQR.
 Any value less than the value in step 3 or more than
the value in step 4 is an outlier.
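The four steps can be sketched with Python's standard library. `statistics.quantiles` with its default "exclusive" method places the quartiles at the q(n+1)/4 positions, so it matches the quartile rule given in this deck. The helper name `outlier_fences` is illustrative, and the data are the clinic distances from the box plot example.

```python
from statistics import quantiles

def outlier_fences(values):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    q1, _, q3 = quantiles(values, n=4)   # default method is "exclusive"
    iqr = q3 - q1                        # step 1
    margin = 1.5 * iqr                   # step 2
    return q1 - margin, q3 + margin      # steps 3 and 4

# Commuting distances from the dental clinic example.
distances = [5, 6, 12, 13, 14, 15, 18, 22, 50]
low, high = outlier_fences(distances)
outliers = [d for d in distances if d < low or d > high]
print(f"fences: {low} to {high}; outliers: {outliers}")
```

Here Q1 = 9 and Q3 = 20, so the IQR is 11 and the fences are -7.5 and 36.5. The distance of 50 is the only value flagged as an outlier.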
OUTLIERS.
 Data often contain outliers. Outliers can occur due to a
variety of reasons: an inaccurate instrument,
mishandling of experimental units, measurement errors,
or recording errors such as incorrectly typed values or
misplacement of a decimal point.
 The observations may have been made about a subject
who does not meet research criteria; for example, a
research study on hypertension may have included a
patient with low blood pressure, due to experimenters’
oversight. An outlier can also be a legitimate observation
that occurred purely by chance.
 If we can explain how and why the outliers occurred,
they should be deleted from the data.
 Box plots are especially useful for comparing
two or more sets of data.
 They help us to see the spread of the data.
Figure: The main features of a box plot, including
outliers (extreme values excluded
from the range).
 The table below shows the length of time required to splint an avulsed tooth
with alveolar fracture for patients 18 years old or younger and for those older
than 18 years.
ADVANTAGES OF BOX AND WHISKER PLOTS
 Immediate visuals of a box-and-whisker plot are
the center, the spread, and the overall range of
distribution.
 Box plots are useful for comparing data sets,
especially when the data sets are large or when
they have different numbers of data elements.
DISADVANTAGES OF BOX AND WHISKER PLOTS
It shows only certain statistics rather than
all the data.
Since the data elements are not
displayed, it is impossible to determine if
there are gaps or clusters in the data.
ERROR BARS
Error bars are used when the data are quantitative and normally distributed.
For such data, the mean is the most appropriate measure of the population
average, and variability is typically represented by the 95% confidence
interval (CI). These items of information are shown graphically as an error plot.
In some instances, you may see error plots where the line indicating the value
of the 95% CI is terminated with horizontal bars.
Figure 5. Examples of two forms of
error bars, both indicating the 95%
confidence interval.
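The values drawn in an error plot (the mean and the two ends of the 95% CI) can be sketched as below. This is an illustration under stated assumptions: it uses the normal approximation (a multiplier of about 1.96 standard errors), whereas for small samples a t-based multiplier would give a somewhat wider interval. The sample values and the name `mean_with_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_with_ci(values, confidence=0.95):
    """Return (mean, lower limit, upper limit) using the normal approximation."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))            # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    return m, m - z * se, m + z * se

# Hypothetical treatment times in minutes (illustrative only).
times = [12, 15, 11, 14, 13, 16, 12, 15]
m, lower, upper = mean_with_ci(times)
print(f"mean = {m:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

The error bar is drawn from `lower` to `upper` with the mean marked at its centre.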
GROUP DIFFERENCES: 95% CI RULE OF
THUMB
 Error bars can be informative
about group differences, but you
have to know what to look for.
Rule of thumb for 95% CIs:
 If the overlap is about half of one
one-sided error bar, the
difference is significant at about p <
.05
 If the error bars just abut, the
difference is significant at about p <
.01
Cumming & Finch, 2005
GROUP DIFFERENCES: SE RULE OF THUMB
Rule of thumb for SEs:
 When the gap is about the size
of one one-sided error bar, the
difference is significant at about p
< .05
 When the gap is about the
size of two one-sided error
bars, the difference is
significant at about p < .01
Cumming & Finch, 2005
Different levels of overlap in error bars: (a) large amount of
overlap, that includes the mean values, (b) small amount of
overlap, that does not include the mean values, (c) no overlap
FOREST PLOT OR BLOBBOGRAM
 A graph that displays information about the studies
contributing to a meta-analysis, along with
information about the synthesis of those studies.
 The graph is so called because of the forest of lines
in the graph.
 The graph is also called a blobbogram because
blobs are used to represent the weight given to
each study.
CONFIDENCE INTERVALS
DIAMOND IN META-ANALYSIS
Diamond on the left of the line of no effect:
fewer episodes of the outcome of interest in the treatment
group.
Diamond on the right of the line of no effect:
more episodes of the outcome in the treatment group.
Diamond touches the line of no effect:
no statistically significant difference between the groups.
Diamond does not touch the line of no effect:
the difference between the two groups is statistically significant.
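The reading rules for the diamond can be written as a small decision function. This is a sketch under assumptions: the diamond's horizontal extent is taken to be the confidence interval of the pooled estimate, and the line of no effect is assumed to sit at 1 (as for a risk ratio or odds ratio); for a difference measure it would sit at 0. The function name and the example intervals are hypothetical.

```python
def interpret_diamond(ci_lower, ci_upper, no_effect=1.0):
    """Interpret a forest-plot diamond from the limits of its confidence interval."""
    if ci_lower <= no_effect <= ci_upper:
        # The diamond touches or crosses the line of no effect.
        return "no statistically significant difference between groups"
    if ci_upper < no_effect:
        # The whole diamond lies to the left of the line.
        return "significant: fewer episodes of the outcome in the treatment group"
    # The whole diamond lies to the right of the line.
    return "significant: more episodes of the outcome in the treatment group"

# Hypothetical pooled confidence intervals (illustrative only).
print(interpret_diamond(0.60, 0.85))
print(interpret_diamond(0.90, 1.20))
print(interpret_diamond(1.10, 1.60))
```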
References:
1. C.R. Kothari; Research Methodology: Methods & Techniques; 2nd ed., pp. 95-103.
2. Kim & Dailey; Biostatistics for Oral Health Care; Ch. 1, 2 & 3.
3. J.V. Dixit; Principles & Practice of Biostatistics; Ch. 1, 2 & 3.
4. Rao & Murthy; Applied Statistics in Health Sciences; 2nd ed., Ch. 1, 4, 5, 6 & 7.
5. B.K. Mahajan; Methods in Biostatistics; Jaypee Publications, 6th ed., pp. 88-103.
6. Park; Textbook of Social and Preventive Medicine; 22nd ed.
7. Cumming G, Fidler F, Vaux DL; Error bars in experimental biology; JCB; volume 177; 2007.
8. C.R. Kothari; Methods of data collection; Research Methodology: Methods & Techniques; 2nd ed., pp. 112-130.
9. B.K. Mahajan; Sources and Presentation of Data; Methods in Biostatistics; 6th ed., pp. 10-13.
10. K. Park; Medicine and Social Sciences; Park's Textbook of Preventive and Social Medicine; 21st ed., pp. 643-644.
11. J.P. Baride; Types of data, Data collection; Manual of Biostatistics; 1st ed., pp. 4-9.
12. Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology, 57(3), 203-220.
Thank You
Data in Research

Data in Research

  • 2.
    DATA IN RESEARCHMETHODOLOGY PRESENTED BY DR RIPIKA SHARMA DEPARTMENT OF PUBLIC HEALTH DENTISTRY MRADC
  • 3.
    CONTENTS:  Data &types of data  Collection of data  What is data collection?  Why to collect data?  Methods of data collection  Advantages & disadvantages of information collection tools
  • 4.
     Problems indata collection  Precautions in data collection  Conclusion  Different methods of presentation of data  References
  • 5.
    WHAT IS DATA When ever an observation is made , it will be recorded and a collective recording of these observations either numerical or otherwise is called as data .  Data are distinct pieces of information, usually formatted in a special way.  data is the plural of datum, a single piece of information. In practice, however, people use data as both the singular and plural form of the word
  • 6.
     observation maybe collected in a simple way like recording the sex of a person in a group or noting down the number of cases of a disease in a community or may be done through an experiment such as counting the total wbc in a given volume of blood, etc of an individual.  In each of the above cases certain observation is made of a characteristic and these characteristic which varies from one observation to the other is called as variable.
  • 7.
    TYPES OF DATAAND LEVELS OF MEASUREMENT
  • 8.
    MEASUREMENT AND MEASUREMENTSCALES  Measurement scales were introduced by Stevens.  Measurement : This may be defined as the assignment of numbers to objects or events according to set of rules.  The various measurement scales results from the fact that measurement may be carried out under set of rules.
  • 9.
    • Why dowe need to know what type of data we are dealing with? • The data type or level of measurement influences the type of statistical analysis techniques that can be used when analysing data. . Data types – important?
  • 10.
    TYPES OF DATA In general, data can be classified according to : 1. Based on Characteristic: 2. Based on Source: 3. Based on Field: 4. Based on Content:
  • 11.
     Based oncharacteristic: QUALITATIVE DATA QUANTITATIVE DATA
  • 12.
     Qualitative (orcategorical) data • Represents a particular quality , also named as attributes.  Consist of values that can be separated into different categories that are distinguished by some nonnumeric characteristic.  Qualitative variable are measured either on a nominal or ordinal scale.  Persons with the same characteristics are counted to form 1grp Ex classes vaccinated, sex, religion, nationality ,color of the eyes ,on drug ,on placebo etc.  these characteristic are called “attributes” or “attributive variates” or “descriptive characteristics”
  • 13.
     Quantitative data Data consist of values representing counts or measurements.  Quantitative variables are measured on an interval or ratio scale.  Quantitative data can be further classified into :-  DISCRETE  CONTINUOUS
  • 14.
     A DISCRETEVARIABLE:  Is a random variable , where the variable under observation can take only fixed values in a given range like whole numbers or the variable jumps from one number to another without taking in between values the data is called discrete data.
  • 15.
    THE FOLLOWING VARIABLEARE DISCRETE:  The number of DMF teeth. it can be any one of the 33 numbers, 0,1,2,3,4………….32.  The size of a family.  The number of erupted permanent teeth.  The number of patients with osseous disease.
  • 16.
    CONTINUOUS VARIABLE  Arandom variable that can take on a range of values or a continuum;  Continuous variable are:  Treatment time.  Pocket depth.  Amount of new bone growth.  Concentration level of anesthesia.  Acidity in saliva.  Hemoglobin percentage level.
  • 17.
     Attribute dataAnd Variables data  ATTRIBUTE DATA  Attribute data give you counts representing the presence or absence of a characteristic or defect.  As an example, if you are concerned with timely delivery of parts by your store keepers, you could develop a procedure that would give you a count of the number of supply parts they deliver on time and the number they deliver late (defects). This would give you attribute data, but it would not tell you how late a delivery actually was.
  • 18.
     VARIABLES DATAare based on measurement of a key quality characteristic produced by the process. Such measurements might include length, width, time, weight, or temperature, to name a few.  For example: the total time from receipt of the request to delivery of the part. This measurement, time, could be used to determine how timely or late the deliveries were.
  • 19.
  • 20.
    Internal sources ofData o Many institutions and departments have information about their regular functions,for their own internal purposes. o When those information are used in any survey is called internal sources of data. o Eg…social welfare socities. External sources of data o When information is collected from outside agencies is called external sources of data. o Such types of data are either primary or secondary. o This type of information can be collected by census or sampling method by conducting survey. Internal & External Sources of Data
  • 21.
    2.BASED ON SOURCE: Thereare two sources of data collection techniques. Primary data Secondary data collection techniques. Primary data collection uses surveys, experiments or direct observations. Secondary data collection may be conducted by collecting information from a diverse source of documents or electronically stored information, census and market studies are examples of a common sources of secondary data. This is also referred to as "data mining."
  • 22.
    PRIMARY DATA Primary datameans original data that has been collected specially for the purpose in mind. It means someone collected the data from the original source first hand. Data collected this way is called primary data. Primary data has not been published yet and is more reliable, authentic and objective. Primary data has not been changed or altered by human beings; therefore its validity is greater than secondary data.
  • 23.
    Merits Targeted issued are addressed Datainterpretation is better High accuracy of data Address as specific research issues Greater control Demerits Evaluated cost Time consuming More number of resources are required Inaccurate feedback Required lot of skill with labour. Primary Data
  • 24.
    SECONDARY DATA Secondary datais the data that has been already collected by and readily available from other sources. secondary data is data that is being reused. Such data are more quickly obtainable than the primary data. These secondary data may be obtained from many sources, including literature, industry surveys, compilations from computerized databases and information systems, and computerized or mathematical models of environmental processes.
  • 25.
    Merits Quick and cheapsource of data Wider geographical area Longer orientation period Leading to find primary data Demerits No fulfill our specific research needs Poor accuracy Data are not up to date Poor accessibility in some cases Secondary Data
  • 26.
    Primary data  Realtime data  Sure about sources of data  Help to give results/ finding  Costly and time consuming process  Avoid biasness of response data  More flexible Secondary data  Past data  Not sure about of sources of data  Refining the problem  Cheap and no time consuming process  Can not know the data biasness  Less flexible Difference b/w primary and secondary data
  • 27.
    3)Based on field: In computer database management software data is arranged in tabular form.  The columns are called fields and the rows are records.  Common types of fields are : a) Character type: eg- Name, Address etc., b) Numeric type :eg - Height, weight, blood sugar level, serial number etc., c) Data type: eg- date of birth, date of admission, data of discharge etc., d) Logical type: eg- dichotomous data like sex-male/female, residence-urban/rural.
  • 28.
    4)Based on contents: ORDINALSCALE NOMINAL SCALE INTERVAL SCALE RATIO SCALE
  • 29.
    THE NOMINAL SCALE The lowest measurement scale  It consist of named categories with no implied order among categories.  Represents the simplest type of data  The categories in nominal measurement scale have no quantitative relationship to each other .  Observations are placed into broad categories which may be denoted by symbols or labels or names.  Statisticians uses numbers to identify the categories, for example,0 for females and 1 for males.  The numbers are simply alternative labels
  • 30.
    THE NOMINAL SCALE……CONTI..  The categories available cannot be placed in any order  no judgement can be made about the relative size or distance from one category to another  Although attributes are labeled with numbers instead of words, the order and the magnitude of the number do not have any meaning at all.  Numerical values allows to perform the data analysis and are used only for the sake of convenience.  Nominal Data Reflect Qualitative Differences Rather Than Quantitative Ones
  • 31.
    DICHOTOMOUS DATA  Greekword meaning “cut into two”  Variables that have only two responses i.e. Yes or No, are known as dichotomies.  Have no prior qualitative direction.  Example: male/female , treatment or placebo  Have an implied direction that is favorable - Well/sick, living/dead, normal/abnormal - . - Examples: - Yes/no response for a survey questionnaire - Marital status - gender What is your gender? (please tick) Male Female
  • 32.
    ORDINAL SCALE  Categoriesor observation are ranked or ordered.  The amount of difference between the categories though ordered, it cannot be quantified.  Post- surgery pain can be classified according to its severity  0 – represents no pain  1- mild pain  2- moderate pain  3-severe pain  4- extremely severe pain
  • 33.
    ORDINAL SCALE…CONT..  Individualsmay be classified according to socio economic status as Low,medium,high.  Examples:  Disease state of cancer [stage1 , stage 2,…]  Tooth mobility  Silness- loe gingival index  Millers classification of root exposure.
  • 34.
    INTERVAL MEASUREMENT  moresophisticated scale.  With this scale it is not only possible to order measurements, but also the distance between any two measurements is known, its fixed and equal.  There is no meaningful absolute zero.  The temperature zero degree in Celsius or Fahrenheit does not mean the total absence of temperature.
  • 35.
    INTERVAL MEASUREMENT.. CONT.. Examples:  Interval are not common as other scales  -IQ score representing the level of intelligence IQ score of 0 is not indicative of no intelligence  Statistician knowledge represented by a statistics test score. the test score zero does not necessarily mean that the individual has zero knowledge in statistics,
  • 36.
    THE RATIO MEASUREMENTSCALE  There exist a true zero  Possesses same properties of interval scale  Most of the measurement scales in health sciences are ratio scale  Weights in pound , patient waiting time in dental office.  Zero waiting time means patient did not have to wait  The ratio measurement scale allow us to perform all arithmetic operations on the number .  The resulting numerical value has sensible meaning.
  • 37.
     Example:  Treatmentcost  Saliva flow rate  Length of root canal  Diastema  Sugar concentration in blood  In general interval and ratio scales contain more information than do nominal or ordinal scale.
  • 38.
    • Nominal datais the least complex and give a simple measure of whether objects are the same or different. • Ordinal data maintains the principles of nominal data but adds a measure of order to what is being observed. • Interval data builds on ordinal by adding more information on the range between each observation by allowing us to measure the distance between objects. • Ratio data adds to interval with including an absolute zero. Hierarchical data order
  • 39.
    • Knowing thehierarchy of data is useful. •Why? It is possible to recode or adjust certain types of data into others. • Can go from most complex (interval and ratio) to least complex (nominal) but cannot go the other way around. • Interval/ratio can be re-formatted to become ordinal or nominal, ordinal can become nominal. Hierarchical data order
  • 40.
  • 41.
    WHAT IS DATACOLLECTION???  Data Collection is nothing more than planning for and obtaining useful information on key quality characteristics produced by process .  The key issue in data collection is not: How do we collect data? Rather, it is: How do we obtain useful data?
  • 42.
    WHY TO COLLECTDATA?  Data Collection enables a team to formulate and test working assumptions about a process and develop information that will lead to the improvement of the key quality characteristics of the product or service.
  • 43.
     Data Collectionimproves decision-making by helping us focus on objective information about what is happening in the process, rather than subjective opinions.
  • 44.
     Object andscope of the enquiry.  Sources of information.  Quantitative expression.  Techniques of data collection.  Unit of collection. Factors to be Considered Before Collection of Data
  • 45.
    Methods of collecting primary data Direct Personal Investigation (i.e.interview method) Indirect oral investigation (i.e. through enumerators) Investigation through local reporters questionnaire Investigation through mailed questionnaire Investigation through observation
  • 46.
    COLLECTION OF PRIMARYDATA:  The various methods of collecting primary data, particularly in surveys and descriptive researches are: (i) Observation method, Simple or uncontrolled observation Systematic or controlled observation Mass observation (ii) Interview method, (iii) Through questionnaires, (iv) Through schedules, and
  • 47.
     (v) othermethods which include (A) Warranty cards; (B) Distributor audits; (C) Pantry audits; (D) Consumer panels; (e) Using mechanical devices; (F) Through projective techniques; (g) Depth interviews, and (H) Content analysis.
  • 48.
  • 49.
    COLLECTION OF SECONDARYDATA:  Secondary data may either be published data or unpublished data. Usually published data are available in:  (a)various publications of the central, state and local governments;  (b) various publications of foreign governments or of international bodies and their subsidiary organizations;  (c) technical and trade journals;  (d) books, magazines and newspapers;
  • 50.
    e) reports andpublications of various associations connected with business and industry, banks, stock exchanges, etc.; (f) reports prepared by research scholars, universities, economists, etc. in different fields; and (g) public records and statistics, historical documents, and other sources of published information
  • 51.
     The researcher,before using secondary data, must see that they possess following characteristics: 1. Reliability of data: The reliability can be tested by finding out such things about the said data: (a) Who collected the data? (b) What were the sources of data? (c) Were they collected by using proper methods ? (d) At what time were they collected? (e) Was there any bias of the compiler? (f) What level of accuracy was desired? (g) Was it achieved ?
  • 52.
    2. Suitability of data: Data that are suitable for one enquiry may not be suitable for another. Hence, if the available data are found to be unsuitable, the researcher should not use them.  The object, scope and nature of the original enquiry must also be studied.  If the researcher finds that these differ from those of the present enquiry, the data are unsuitable for the present enquiry and should not be used.
  • 53.
    3. Adequacy of data: If the level of accuracy achieved in the data is inadequate for the purpose of the present enquiry, the data are considered inadequate and should not be used by the researcher.  The data are also considered inadequate if they relate to an area that is either narrower or wider than the area of the present enquiry.
  • 54.
    Observation method:  Observation becomes a scientific tool provided it is systematically planned and recorded.  The main advantage of this method is that subjective bias is eliminated when observation is done accurately.  It is used in both experimental and non experimental research.  In this method the investigator obtains data by direct observation.
  • 55.
    ADVANTAGES:  Subjective bias is eliminated.  Information obtained relates to what is currently happening.  Observation is independent of respondents' willingness to answer questions.  Useful when respondents are not capable of answering verbal questions.
    DISADVANTAGES:  Expensive method.  Information is very limited.  At times unforeseen factors may interfere with the observation task.  People who are rarely accessible to observation create obstacles.
  • 56.
      When observation is characterized by a careful definition of the units to be observed, the style of recording the observed information, standardized conditions of observation, and the selection of pertinent data of observation, it is called structured observation.  When observation takes place without these characteristics being thought of in advance, it is termed unstructured observation.
  • 57.
    PARTICIPANT OBSERVATION AND NON PARTICIPANT OBSERVATION:  If the observer observes by making himself more or less a member of the group he is observing, so that he can experience what the members of the group experience, the observation is called participant observation.  When the observer observes as a detached emissary, without any attempt on his part to experience through participation what others feel, the observation is called non participant observation.  When the observer observes in such a manner that his presence may be unknown to the people he is observing, the observation is described as disguised observation.
  • 58.
    PARTICIPANT OBSERVATION
    MERITS:  The researcher is able to record the natural behavior of the group.  The researcher can gather information that could not easily be obtained by observing in a disinterested fashion.  The researcher can even verify the truth of statements.
    DEMERITS:  The observer may lose objectivity to the extent that he participates emotionally.  The problem of observation-control is not solved, and participation may narrow down the researcher's range of experience.
  • 59.
    1) UNCONTROLLED OBSERVATION:  No attempt is made to use precision instruments.  The aim is to get a spontaneous picture of life and persons.  It supplies naturalness and completeness of behavior, allowing sufficient time to observe it.  The main pitfall is that of subjective interpretation.  It is resorted to in exploratory research.
  • 60.
      Types of uncontrolled observation:  A) participant observation  B) non participant observation  C) quasi-participant observation
  • 61.
    2) SYSTEMATIC OR CONTROLLED OBSERVATION:  Uses precision or mechanical instruments, as they aid accuracy and standardization.  Such observations supply formalized data upon which generalizations can be built with some degree of assurance.  Controlled observation takes place in various experiments carried out in the laboratory or under controlled conditions.
  • 62.
    3) MASS OBSERVATION:  Here the collective behavior of people in public places and in different situations is observed and recorded.
  • 63.
    INTERVIEW METHOD: Definition: 'An interview is a purposeful discussion between two or more people' (Kahn and Cannell, 1957).  The interview method of collecting data involves the presentation of oral-verbal stimuli and replies in terms of oral-verbal responses. This method can be used through personal interviews and, if possible, through telephone interviews.
  • 64.
  • 66.
      The personal interview method requires a person, known as the interviewer, to ask questions, generally in face-to-face contact with the other person or persons. (At times the interviewee may also ask certain questions and the interviewer responds to these, but usually the interviewer initiates the interview and collects the information.)  This sort of interview may be of 2 types: 1. Direct personal investigation 2. Indirect oral investigation
  • 67.
      Direct personal investigation:  The interviewer has to collect information personally from the sources concerned.  This method is particularly suited to intensive investigation.  Indirect oral investigation:  The investigator does not collect the information directly; instead he obtains it indirectly from persons who know the information and are ready to part with it.  This method is used in cases where direct contact is not possible.
  • 68.
    FACE TO FACE (IN-PERSON) INTERVIEWS: An interactive process in which trained interviewers visit people in their homes or workplaces to collect data from them directly.
    Advantages:  There is a high response rate.  Interviewers can make relevant observations on sensible variables.  The researcher can adapt the questions as necessary, clarify doubts and ensure that the responses are properly understood.
    Disadvantages:  Travel costs for interviewers can be high.  Interviewers do not always visit at times convenient to the interviewee and hence may have to revisit.  The cost of recruiting and training interviewers is high.  Interviewer bias communicated by demeanor, tone of voice and questioning style may influence respondents.
  • 69.
    TYPES OR CLASSES OF INTERVIEW:  STRUCTURED INTERVIEW  SEMI-STRUCTURED INTERVIEW  UNSTRUCTURED INTERVIEW
  • 70.
    STRUCTURED INTERVIEW  Description and/or aim of interview: Structured interviews are normally done face to face or by telephone, using a standard set of questions to obtain data that can be aggregated because identical questions have been asked of each participant.  Nature of questioning route: fixed, given order, very standardized.  Role of probing: little or none, perhaps only repeating or clarifying instructions.
  • 71.
    SEMI-STRUCTURED INTERVIEW  Description and/or aim of interview: "More or less open-ended questions are brought to the interview situation in the form of an interview guide" (Flick, 1998, p. 94). The level of depth of understanding that the researcher pursues is used to characterize this type of interview.  Nature of questioning route: flexible, but usually a given set of questions is covered; varying levels of standardization.  Role of probing: to get the participant to expand upon their answers, give more details, and add further perspectives.
  • 72.
    UNSTRUCTURED INTERVIEW  Description and/or aim of interview: Unstructured interviews are done face to face, and some would say the aim is to get participants to share stories. The researcher starts from a position of wanting to be sensitive to how participants construct their views and perspectives of things. A goal, therefore, is to allow the participant's structure to dominate.  Nature of questioning route: questions are asked to get people to talk about constructs/variables of interest to the researcher.  Role of probing: simply to get the participant to talk about a topic area. Probing questions are normally not directed; they are asked to encourage the participant to keep talking or to get back to the subject of interest.
  • 73.
    OTHER TYPES OF INTERVIEW INCLUDE THE FOCUSED INTERVIEW, THE CLINICAL INTERVIEW AND THE NON DIRECTIVE INTERVIEW  Focused Interview: Meant to focus attention on the given experience of the respondent and its effects.  The interviewer has the freedom to decide the manner and sequence in which the questions are asked, and also the freedom to explore reasons and motives.  The main task is to confine the respondent to a discussion of issues with which he seeks conversance.  It is generally used in the development of hypotheses.  It constitutes a major type of unstructured interview.
  • 74.
      Clinical Interview: Concerned with broad underlying feelings or motivations, or with the course of an individual's life experience.  Non Directive Interview: The interviewer simply encourages the respondent to talk about the given topic with a bare minimum of questioning. The interviewer often acts as a catalyst to a comprehensive expression of the respondent's views.
  • 75.
    MERITS  More information, and in greater depth, can be obtained.  The interviewer can overcome resistance of respondents.  Greater flexibility.  The observation method can also be applied while recording verbal answers to various questions.  Personal information can also be obtained easily.  Samples can be covered completely with repeated visits.
  • 76.
      The interviewer may catch the informant off guard and thus secure the most spontaneous reactions.  The language of the interview can be adapted to the ability or educational level of the person interviewed.  The interviewer can control which persons answer the questions.  The interviewer can collect supplementary information.  The desired information can be collected at one point.  Provides accurate data for the calculation of various rates and ratios.
  • 77.
    DEMERITS  Very expensive.  Possibility of bias (interviewer bias refers to the extent to which an answer is altered in meaning by some action or attitude on the part of the interviewer).  Certain types of respondents, such as important officials, may not be easily approachable.  More time consuming, especially when the sample is large and recalls upon the respondents are necessary.  The presence of the interviewer on the spot may over-stimulate the respondent.
  • 78.
    Pre-requisites and basic tenets of interviewing:  For successful implementation of the interview method, interviewers should be carefully selected, trained and briefed. They should be honest, sincere, hardworking and impartial, and must possess the technical competence and necessary practical experience.  Occasional field checks should be made to ensure that interviewers are neither cheating nor deviating from the instructions given to them for performing their job efficiently.
  • 79.
      The approach should be friendly, courteous, conversational and unbiased.  The interviewer should not show disapproval of or surprise at a respondent's answer, but must keep the direction of the interview in his own hands, discouraging irrelevant conversation and making every effort to keep the respondent on track.  In addition, some provision should be made in advance so that appropriate action can be taken if some of the selected respondents refuse to cooperate or are not available when an interviewer calls upon them.
  • 80.
    TELEPHONE INTERVIEWS: Trained interviewers call persons by telephone to collect data.
    Advantages:  Coverage of a wide geographic area is possible.  It is quicker and less expensive than the face-to-face method.  Random digit dialing can be used to make sampling easy.  A high response rate is possible.  The interviewer can control the questioning sequence.  No field staff is required.
    Disadvantages:  Only people with telephones can be interviewed.  Long distance calls involve high costs, and several call backs may be needed.  Respondents can terminate the interview by hanging up the phone.  Anonymity is limited.
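Random digit dialing can be illustrated with a small sketch. It assumes, purely for illustration, that every number in the study area shares a known prefix and that only the trailing digits vary; the prefix "555012" and the function name are hypothetical, not part of the presentation.

```python
import random


def random_digit_dial(prefix, n_numbers, suffix_digits=4, seed=None):
    """Draw distinct phone numbers by appending random digits to a known prefix."""
    rng = random.Random(seed)
    max_numbers = 10 ** suffix_digits
    if n_numbers > max_numbers:
        raise ValueError("more numbers requested than the prefix can supply")
    # Sample suffixes without replacement so no number is dialed twice.
    suffixes = rng.sample(range(max_numbers), n_numbers)
    # Zero-pad each suffix so every number has the same length.
    return [f"{prefix}{suffix:0{suffix_digits}d}" for suffix in suffixes]


# "555012" is a made-up prefix used only for this example.
sample = random_digit_dial("555012", 5, seed=42)
print(sample)
```

Because the digits are drawn at random, unlisted numbers have the same chance of selection as listed ones, which is one reason the technique is said to make sampling easy.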
  • 81.
    Basic steps in interview: A. ESTABLISHING CONTACT B. STARTING AN INTERVIEW C. SECURING RAPPORT D. RECALL E. PROBE QUESTIONS
  • 82.
    F. ENCOURAGEMENT G. GUIDING THE INTERVIEW H. RECORDING I. CLOSING THE INTERVIEW J. REPORT
  • 83.
      QUESTIONNAIRE  Definition: A questionnaire is simply a list of mimeographed or printed questions that is completed by or for a respondent. An interview schedule is a list of more or less structured questions that are read out or verbalized by an interviewer (with or without probing) in interrogating a respondent. The interviewer then records the respondent's replies either verbatim (for open-ended questions) or according to prespecified (or even precoded) answers or categories.
  • 84.
      COLLECTION OF DATA THROUGH QUESTIONNAIRES:  This method of data collection is quite popular, particularly for big enquiries. It is adopted by private individuals, research workers, private and public organizations and even governments.  In this method a questionnaire is sent (usually by post) to the persons concerned, with a request to answer the questions and return the questionnaire.  Care needs to be taken to ensure that the questions elicit a useful and unbiased response.
  • 85.
    TYPES OF QUESTIONS  Open-ended questions: Used in qualitative interviews, where the respondent is asked to explain why certain things are done.  Free response questions: Asked in such a way that the scope of the respondent's answers is not limited.  Multiple choice: The most commonly used type of question. A list of possible answers is provided for every question.  Scaled response: The respondents are given a range of categories in which to express their feelings or opinions.  Checklist: A form of multiple choice question from which the respondent chooses one or more response categories.  Ranking questions: An opinion question in which the respondent is asked to rank the items listed, in either ascending or descending order.  Dichotomous questions: Only two answers are possible, as in the Yes/No type.
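The closed question types above differ mainly in what counts as an acceptable answer. A minimal sketch of that idea follows; the dictionary layout, the type names and the example questions are assumptions made for illustration only.

```python
def is_valid_response(question, response):
    """Check one response against the question's type and stated alternatives."""
    kind = question["type"]
    options = question.get("options", [])
    if kind == "dichotomous":
        return response in ("Yes", "No")
    if kind == "multiple_choice":
        return response in options
    if kind == "checklist":
        # One or more of the listed categories, with no repeats.
        return (len(response) >= 1
                and len(set(response)) == len(response)
                and all(choice in options for choice in response))
    if kind == "ranking":
        # Every listed item must be ranked exactly once.
        return sorted(response) == sorted(options)
    if kind == "scaled":
        low, high = question["scale"]
        return isinstance(response, int) and low <= response <= high
    if kind == "open_ended":
        return isinstance(response, str) and response.strip() != ""
    raise ValueError(f"unknown question type: {kind}")


# Hypothetical example questions.
brushing = {"type": "multiple_choice",
            "options": ["Once a day", "Twice a day", "Less than once a day"]}
aids = {"type": "checklist", "options": ["Toothbrush", "Floss", "Mouthwash"]}

print(is_valid_response(brushing, "Twice a day"))
print(is_valid_response(aids, ["Floss", "Toothbrush"]))
```

A check of this kind is only possible for the closed types; open-ended and free response answers can be tested for presence but not for correctness.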
  • 86.
      Main aspects of a questionnaire:  A) General form  B) Question sequence  C) Question formulation and wording
  • 87.
    1. General form: A questionnaire can be either structured or unstructured.  Structured questionnaires are those in which there are definite, concrete and pre-determined questions. The questions are presented with exactly the same wording and in the same order to all respondents.  This sort of standardization is resorted to in order to ensure that all respondents reply to the same set of questions.
  • 88.
      Structured questionnaires may also have fixed alternative questions, in which the responses of the informants are limited to the stated alternatives.  Thus a highly structured questionnaire is one in which all questions and answers are specified and comments in the respondent's own words are held to the minimum.
  • 89.
    2. QUESTION SEQUENCE:  In order to make the questionnaire effective and to ensure the quality of the replies received, a researcher should pay attention to the question sequence in preparing the questionnaire.  Questions should proceed in logical sequence, moving from easy to more difficult questions.  The question sequence should always go from the general to the more specific.  The answer given to a question is a function not only of that specific question but of all previous questions as well.  The opening questions should be such as to arouse human interest.
  • 90.
      A proper sequence of questions considerably reduces the chances of individual questions being misunderstood.  The question sequence must be clear and smoothly moving, meaning that the relation of one question to another should be readily apparent to the respondent, with the questions that are easiest to answer put at the beginning.
  • 91.
    QUESTIONS TO BE AVOIDED:  Questions that put too great a strain on the memory or intellect of the respondent.  Questions of a personal character.  Technical and vague expressions capable of different interpretations.  Questions related to personal wealth, etc.
  • 93.
      3. Question formulation and wording:  Questions should be simple.  They should be easily understood.  They should be concrete and should conform to the respondent's way of thinking.  With regard to this aspect of the questionnaire, the researcher should note that each question must be very clear, so as to avoid any sort of misunderstanding.  Questions should also be impartial, so as not to give a biased picture of the true state of affairs.  Questions should be constructed with a view to their forming a logical part of a well thought out tabulation plan.
  • 94.
  • 95.
    Aspect | Open ended questionnaire | Closed ended questionnaire
    Accuracy of response | Easier to express complex situations | Difficult to investigate complex situations
    Coverage | May pick up unanticipated situations | Will miss areas not anticipated
    Size of questionnaire | May need fewer lines of text | May need many pages of text
    Subject recall | Reduced | Enhanced
    Analysis | More complex | Simpler
  • 96.
      There should be some control questions in the questionnaire which indicate the reliability of the respondent.  There should be provision for indications of uncertainty.
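One way to use control questions is to ask the same thing twice in different words and flag respondents whose two answers disagree. The sketch below assumes that approach; the question codes "q3" and "q17", the "Don't know" option and the respondent IDs are hypothetical.

```python
UNCERTAIN = "Don't know"  # assumed wording of the uncertainty option


def inconsistent_respondents(responses, control_pairs):
    """Return IDs of respondents whose answers to any control pair disagree."""
    flagged = []
    for respondent_id, answers in responses.items():
        for first, second in control_pairs:
            a, b = answers.get(first), answers.get(second)
            # Skip the pair if either answer is missing or marked uncertain.
            if a in (None, UNCERTAIN) or b in (None, UNCERTAIN):
                continue
            if a != b:
                flagged.append(respondent_id)
                break
    return flagged


# "q3" and "q17" stand for two differently worded versions of one question.
responses = {
    "R1": {"q3": "Yes", "q17": "Yes"},
    "R2": {"q3": "No", "q17": "Yes"},
    "R3": {"q3": "Don't know", "q17": "No"},
}
print(inconsistent_respondents(responses, [("q3", "q17")]))  # ['R2']
```

Respondent R3 is not flagged because an explicit "Don't know" is treated as an indication of uncertainty rather than as a contradiction.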
  • 97.
      Essentials of a good questionnaire:  To be successful, a questionnaire should be comparatively short and simple, i.e. its size should be kept to the minimum.  Questions should proceed in logical sequence, moving from easy to more difficult questions.  Personal and intimate questions should be left to the end.  Technical terms and vague expressions capable of different interpretations should be avoided.
  • 98.
      Questions may be dichotomous (yes or no answers), multiple choice (alternative answers listed) or open-ended.  Open-ended questions are often difficult to analyze and hence should be avoided in a questionnaire to the extent possible.  There should be some control questions in the questionnaire which indicate the reliability of the respondent.
  • 99.
    MERITS  Low cost.  Free from the bias of the interviewer.  Respondents have adequate time to give well thought out answers.  Respondents who are not easily approachable can also be reached conveniently.  Large samples can be used, and thus the results can be made more dependable.
  • 100.
    DEMERITS  Low rate of return of duly filled in questionnaires.  Can be used only when the respondents are educated and co-operating.  Control over the questionnaire may be lost once it is sent.  There is inbuilt inflexibility because of the difficulty of amending the approach once questionnaires have been dispatched.
  • 101.
      COLLECTION OF DATA THROUGH SCHEDULES:  This method of data collection is very much like the collection of data through questionnaires. The difference is that schedules (proformas containing a set of questions) are filled in by enumerators who are specially appointed for the purpose.  These enumerators go to the respondents with the schedules, put the questions to them in the order in which they are listed in the proforma, and record the replies in the space provided for them.
  • 102.
      In certain situations, schedules may be handed over to respondents, and enumerators may help them in recording their answers to the various questions in the schedules.  Enumerators explain the aims and objects of the investigation and also remove any difficulties a respondent may have in understanding the implications of a particular question or the definition or concept of difficult terms.
  • 103.
    SCHEDULE: 1. It is filled in by the interviewer and is never mailed to the respondent. 2. Collecting data through schedules is relatively more expensive. 3. Non response is generally very low, because schedules are filled in by interview. 4. The respondent can be identified. 5. It is usually used where the survey is conducted over a relatively small geographical area. 6. It is useful for illiterate people. 7. Wording is not in the form of questions. 8. Observation is also possible along with schedules.
    QUESTIONNAIRE: 1. It is filled in by the respondent himself and is usually mailed to him. 2. Collecting data is relatively cheap and economical. 3. The non response rate is usually high, as many people do not respond or return a partly filled questionnaire. 4. It is not always clear who replies. 5. It is generally used where the field of enquiry is large and the questionnaire can be posted to different places. 6. It is not useful for illiterate people. 7. Wording is in the form of questions. 8. Observation is not possible in this method.
  • 104.
    OTHER METHODS: 1. Warranty cards: Warranty cards are usually postal sized cards which are used by dealers of consumer durables to collect information regarding their products. The information sought is printed in the form of questions on the 'warranty cards', which are placed inside the package along with the product, with a request to the consumer to fill in the card and post it back to the dealer.
  • 105.
    2. Distributor or store audits: Distributor or store audits are performed by distributors as well as manufacturers through their salesmen at regular intervals.  Distributors get the retail stores audited through salesmen and use the information to estimate market size, market share, seasonal purchasing patterns and so on.  The data in such audits are obtained not by questioning but by observation.
  • 106.
    3. Pantry audits: The pantry audit technique is used to estimate consumption of the basket of goods at the consumer level.  In a pantry audit, data are recorded from an examination of the consumer's pantry.  The usual objective is to find out what types of consumers buy certain products and certain brands, the assumption being that the contents of the pantry accurately portray the consumer's preferences.
  • 107.
      4. Consumer panels:  An extension of the pantry audit approach on a regular basis is known as a 'consumer panel'.  A consumer panel is essentially a sample of consumers who are interviewed repeatedly over a period of time.  Two types: transitory and continuing.
  • 108.
    A. Transitory consumer panel:  It is set up to measure the effect of a particular phenomenon.  Such a panel is conducted on a before-and-after basis.  Initial interviews are conducted before the phenomenon takes place, to record the attitudes of the consumers.  A second set of interviews is carried out afterwards to find out the consequent changes that might have occurred in the consumers' attitudes.  It is a favorite tool of advertising and social research.
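The before-and-after comparison of a transitory panel can be sketched as a per-member difference in attitude score. The 1 to 5 attitude scale, the member labels and the scores below are invented for illustration; members who miss the second interview are left out of the comparison.

```python
from statistics import mean


def attitude_change(before, after):
    """Change in attitude score for members interviewed on both occasions."""
    both = before.keys() & after.keys()
    return {member: after[member] - before[member] for member in sorted(both)}


# Hypothetical attitude scores on a 1-5 scale; member D dropped out.
before = {"A": 2, "B": 3, "C": 4, "D": 1}
after = {"A": 4, "B": 3, "C": 5}

changes = attitude_change(before, after)
print(changes)                 # {'A': 2, 'B': 0, 'C': 1}
print(mean(changes.values()))  # 1
```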
  • 109.
    B. Continuing consumer panel:  Often set up for an indefinite period with a view to collecting data on a particular aspect of consumer behavior over time, generally at periodic intervals; it may also be meant to serve as a general purpose panel for researchers on a variety of subjects.  Used in the areas of consumer expenditure, public opinion, etc.
  • 110.
    5. Use of mechanical devices:  Mechanical devices have been widely used to collect information by indirect means.  The eye camera, pupilometric camera, psychogalvanometer, motion picture camera and audiometer are the principal devices developed so far. They are commonly used by modern big business houses, mostly in the developed world, to collect the required information.
  • 111.
    Eye camera: • Records the focus of the eyes of a respondent on a specific portion of a sketch, diagram or written material. • Useful in designing advertising material.  Psychogalvanometer: • Used for measuring the extent of body excitement as a result of a visual stimulus.  Motion picture camera: • Used to record the body movements of a buyer while deciding to buy a consumer good from a shop or big store. • Packaging and information labels stimulate the buyer to perform certain physical movements.
  • 112.
    Pupillometer: • Used to record dilation of the pupil as a result of a visual stimulus. • The extent of dilation shows the degree of interest aroused by the stimulus.  Audiometer: • Used by some TV concerns to find out the types of programmes and the stations preferred by people. • A device is fitted in the TV set itself to record these changes. • Such data are used to find out the market share of competing TV stations.
  • 113.
    6. Projective techniques: Projective techniques for the collection of data have been developed by psychologists. They use the projections of respondents to infer underlying motives, urges or intentions which the respondent either resists revealing or is unable to figure out himself.
  • 114.
      In projective techniques the respondent, in supplying information, tends unconsciously to project his own attitudes or feelings onto the subject under study.  Projective techniques play an important role in motivational research and in attitude surveys.
  • 115.
      The use of these techniques requires intensive specialized training.  In such techniques, the individual's responses to the stimulus situation are not taken at face value.  The stimuli may arouse many different kinds of reactions. The nature of the stimuli and the way in which they are presented do not clearly indicate the way in which the response is to be interpreted.
  • 116.
      The stimulus may be a photograph, a picture, an inkblot and so on. Responses to these stimuli are interpreted as indicating the individual's own view, his personality structure, his needs, tensions, etc., in the context of some pre-established psychological conceptualization of what the individual's responses to the stimulus mean.
  • 117.
     Word association tests  Sentence completion tests  Story completion tests  Verbal projection tests  Pictorial techniques  Play techniques  Quizzes, tests and examinations  Sociometry
  • 118.
    PICTORIAL TECHNIQUES  Thematic apperception test: The respondent is asked to describe a set of pictures dealing with day to day situations. Inferences are drawn about personality and attitudes.
  • 119.
    Rosenzweig test: Uses a cartoon format, in which a series of cartoons is presented with words inserted in balloons. The respondent is asked to put his own words in the empty balloon space provided in the picture.
  • 120.
    Rorschach test:  Consists of 10 cards bearing inkblots. The designs are symmetrical but meaningless.  Respondents are asked to describe what they perceive in these symmetrical designs.
  • 121.
    Holtzman inkblot test: Consists of 45 inkblot cards based on color, movement, shading and other factors.
  • 122.
    Tomkins-Horn picture arrangement test:  Designed for group administration.  Consists of 25 plates, each containing 3 sketches that may be arranged in different ways to portray a sequence of events.
  • 123.
    7. Depth interviews: Depth interviews are those interviews that are designed to discover underlying motives and desires and are often used in motivational research.  Such interviews are held to explore needs, desires and feelings of respondents. In other words, they aim to elicit unconscious as also other types of material relating especially to personality dynamics and motivations.
  • 124.
    8. Content analysis: Content analysis consists of analyzing the contents of documentary materials such as books, magazines and newspapers, and the contents of all other verbal materials, whether spoken or printed.
  • 125.
      VARIABLE:  A characteristic which may take on different values, that is, which may vary between persons, places or things, is called a variable.  It is any characteristic of an object that can be measured or categorized.  If a variable can assume a number of different values such that any particular value is obtained purely by chance, it is called a random variable.
  • 126.
  • 127.
      DEPENDENT VARIABLE:  It is the outcome of interest, which should change in response to some intervention.  Also known as a criterion variable, it is a variable or construct that the researcher hopes to understand, explain and/or predict.  INDEPENDENT VARIABLE:  It is the intervention, or what is being manipulated.  Also called a predictor variable, it is a variable or construct that influences or explains the dependent variable, either positively or negatively.
  • 128.
    • Moderator variable: a variable that has an effect on the relationship between the independent and dependent variables. The presence of a moderator variable modifies the original relationship by interacting with the independent variable to influence the strength of its relationship with the dependent variable. • Mediating variable: also known as an intervening variable, it is a variable that surfaces as a function of the independent variable and explains the relationship between the dependent and independent variables. Moderator variables specify when certain effects will occur, whereas mediators speak to how or why such effects occur. Moreover, mediators explain how external events take on internal psychological significance.
  • 129.
    [Diagram: Price (independent variable) leads to Purchase Likelihood (dependent variable). In a second panel, a moderator variable (discount level, restrictions) acts on the relationship between Price and Purchase Likelihood.]
  • 130.
    [Diagram: Full mediation: Price (independent variable) leads to Perceived Value (mediator variable), which leads to Purchase Likelihood (dependent variable). Partial mediation: Price affects Purchase Likelihood both directly and through Perceived Value.]
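A moderator is often represented as an interaction term in a linear model, so that the slope of the independent variable depends on the moderator's value. The sketch below assumes that representation; the coefficients are invented for illustration and are not estimates from any data, and the output is an arbitrary score rather than a probability.

```python
def purchase_score(price, discount, b0=0.9, b_price=-0.05,
                   b_discount=0.10, b_interaction=0.03):
    """Linear score with a price x discount interaction (made-up coefficients)."""
    return (b0 + b_price * price + b_discount * discount
            + b_interaction * price * discount)


def price_slope(discount, b_price=-0.05, b_interaction=0.03):
    """Effect of a one-unit price rise at a given discount level."""
    return b_price + b_interaction * discount


# With no discount a price rise lowers the score more steeply
# than it does when the discount level is 1.
print(price_slope(0))
print(price_slope(1))
```

Mediation would be sketched differently: one equation predicting perceived value from price, and a second predicting purchase likelihood from perceived value.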
  • 131.
      More generally, if one variable changes in response to another, the dependent variable is the one that changes in response to the independent variable.  For example, in the statement 'tobacco causes oral cancer', tobacco use is the independent variable and oral cancer is the dependent variable.
  • 132.
    PRECAUTIONS IN DATA COLLECTION:  1) Standardization: Using standard and universally accepted methods and techniques makes the data easier to compare with those of similar studies.  2) Training: Training the personnel involved in data collection ensures uniformity, accuracy and completeness of the data collected.
  • 133.
    3) Pretesting:  It is a mock trial of the data collection exercise, on a smaller scale. It helps to avoid ambiguities, inaccuracies and uncertainties.  4) Storage:  If the time gap between data collection and analysis is long, some form of durable storage such as registers, cards, folders, punch cards or computers should be used.  Care should be taken to protect the data.
  • 134.
      METHODS OF DEALING WITH NON RESPONDENTS:  Unless non-response is confined to a small proportion of the whole sample, the results cannot claim any general validity. Every effort must be made to reduce non-response to negligible proportions.  Non-response is sometimes more serious in the case of postal questionnaires.
  • 135.
      The first step is to send a follow up letter. If this does not produce the desired effect, more intensive methods such as telephone calls and personal visits must be considered.  In interviewer surveys, the amount of deliberate non-response is usually very small.  Revisits by the investigators may be tried.  Call backs are also required.
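Keeping track of non-response starts with a simple count: who was sampled, who has replied, and who still needs a follow up letter, call or visit. The sketch below shows that bookkeeping; the respondent IDs are hypothetical.

```python
def response_summary(sampled_ids, returned_ids):
    """Response rate, non-response rate and the list of people to follow up."""
    sampled = set(sampled_ids)
    if not sampled:
        raise ValueError("the sample is empty")
    # Ignore returns from anyone who was not in the original sample.
    returned = set(returned_ids) & sampled
    rate = len(returned) / len(sampled)
    return {
        "response_rate": rate,
        "non_response_rate": 1 - rate,
        "follow_up": sorted(sampled - returned),
    }


# Hypothetical survey: 10 questionnaires posted, 6 returned.
summary = response_summary(range(1, 11), [1, 2, 3, 5, 8, 9])
print(summary["follow_up"])      # [4, 6, 7, 10]
print(summary["response_rate"])  # 0.6
```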
  • 136.
    CONCLUSION  Each method of data collection has its uses and none is superior in all situations. For instance, the telephone interview method may be considered appropriate (assuming a telephone population) if funds are restricted, time is also restricted and the data are to be collected in respect of a few items, with or without a certain degree of precision.
  • 137.
      If funds permit and more information is desired, the personal interview method may be said to be relatively better.  If time is ample, funds are limited and much information is to be gathered with no precision, the mail questionnaire method can be regarded as more reasonable.
  • 138.
    Thus, the most desirable approach to selecting a method depends on the nature of the particular problem, on the time and resources (money and personnel) available, and on the desired degree of accuracy. Over and above all this, much depends on the ability and experience of the researcher.
  • 139.
    REFERENCES: C.R. Kothari, Methods of data collection, Research Methodology: Methods and Techniques, 2nd edition, 112-130. B.K. Mahajan, Sources and Presentation of Data, Methods in Biostatistics, 6th edition, 10-13. K. Park, Medicine and Social Sciences, Park's Textbook of Preventive and Social Medicine, 21st edition, 643-644. J.P. Baride, Types of data, Data collection, Manual of Biostatistics, 1st edition, 4-9.
  • 140.
    P.S.S. Sundar Rao, Introduction to Research Methods, Introduction to Biostatistics & Research Methods, 4th edition, 182-187. T. Bhaskara Rao, Methods of Data Collection, Methods in Medical Research, 1st edition, 131-152. Biostatistics: a quick guide to the use and choice of graphs and charts.
  • 144.
    There are several methods of presenting data: tables, charts, diagrams, graphs, pictures and special curves.
  • 145.
    PRESENTATION OF DATA
    Tabular: simple table, complex table
    Graphical, for quantitative data: 1. Histogram 2. Frequency polygon 3. Frequency curve 4. Line chart 5. Normal distribution curve 6. Cumulative distribution curve 7. Scatter diagram
    Graphical, for qualitative data: 1. Bar chart 2. Pictogram 3. Pie chart 4. Map diagram
  • 146.
    Def: “A table is a systematic arrangement of data into vertical columns and horizontal rows.” Tabulation: The process of arranging data into rows and columns is called tabulation.
  • 147.
    Tabulation is the first step before the data are used for analysis or interpretation. A table can be simple or complex. Certain general principles should be borne in mind in designing tables: (a) the tables should be numbered, e.g., Table 1, Table 2, etc.; (b) a title must be given to each table, and the title must be brief and self-explanatory; (c) the headings of columns or rows should be clear and concise; (d) the data must be presented according to size or importance, or chronologically.
  • 148.
    STATISTICAL TABLE  A statistical table has four parts: the title, the stub, the box head and the body. In addition, some tables have a prefatory note, a footnote and a source note.
  • 149.
    (i) Table Number: Each table must be given a number. (ii) Title of the Table: It should be short and clear. It is placed either just below the table number or to its right. (iii) Caption: Caption refers to the headings of columns. (iv) Stub: Stub refers to the headings of rows. (v) Body: This is the most important part of a table. It contains a number of cells, formed by the intersection of rows and columns. Data are entered in these cells.
  • 150.
    (vi) Head Note: The head note (or prefatory note) contains the unit of measurement of the data. It is usually placed just below the title or at the top right-hand corner of the table. (vii) Foot Note: A footnote is given at the bottom of a table. It helps clarify any point that is not clear in the table. A footnote may be keyed to the title or to any column or row heading. It is identified by symbols such as *, +, @, £, etc. (viii) Source Note: The source note shows the source of the data presented in the table. The reliability and accuracy of the data can be tested to some extent from the source note. It shows the name of the author, title, volume, page, publisher's name, year and place of publication of the book or journal from which the data are compiled.
  • 151.
    TYPES OF TABLES: On the basis of the number of characteristics, tables may be classified as follows: simple or one-way table, two-way table, manifold table.
  • 152.
    Simple or one-way table: A simple or one-way table is the simplest table; it contains data on one characteristic only. A simple table is easy to construct and simple to follow.
  • 153.
    Two-way table: A table that contains data on two characteristics is called a two-way table. In such a case, either the stub or the caption is divided into two co-ordinate parts.
  • 154.
    Manifold table: A table that has more than two characteristics of data is considered a manifold table. Manifold tables, though complex, are useful in practice because they allow full information to be presented. Not more than four characteristics should be represented in one table, to avoid confusion.
  • 155.
    FREQUENCY DISTRIBUTION TABLE In the frequency distribution table, the data is first split up into convenient groups (class interval) and the number of items (frequency) which occur in each group is shown in adjacent columns.  Hence it is a table showing the frequency with which the values are distributed in different groups or classes with some defined characteristics.
  • 156.
    RULES FOR CONSTRUCTION OF A FREQUENCY TABLE 1) The class interval should be neither too large nor too small. 2) The number of classes should be more than 8 and less than 15. 3) The class intervals should be equal and uniform throughout the classification. 4) After construction of the table, a proper and clear heading should be given to it. 5) The base or source of the data should be mentioned, with the pattern of analysis, in a footnote at the end of the table.
  • 157.
    To present raw data, discrete or continuous, in the form of a frequency distribution, we must divide the range of the measurements into a number of non-overlapping intervals (or classes). The intervals need not have the same width, but typically they are constructed with equal width, which makes it easier to compare different classes. So how many intervals should we have?
  • 158.
    Some authors suggest that there should be 10-20 intervals. Alternatively, let n denote the total number of measurements or data points; the number of intervals ≈ √n. Since √90 ≈ 9.49, for the bacterial colony data (n = 90) we will need about 9 or 10 intervals to construct a frequency distribution.
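As a rough illustration of the square-root rule on this slide, the sketch below returns the two whole numbers on either side of √n as candidate interval counts. The function name `suggested_intervals` is a hypothetical helper introduced here, not something from the slides.

```python
import math

def suggested_intervals(n):
    """Return the whole numbers just below and just above sqrt(n)."""
    root = math.sqrt(n)
    return math.floor(root), math.ceil(root)

# Bacterial colony data from the slide: n = 90 measurements.
low, high = suggested_intervals(90)
print(low, high)  # 9 10
```

Either value is a reasonable starting point; the final choice still depends on how the resulting classes read.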
  • 160.
    Use of zeros: Zero should not be used in a table. When no cases have been found to exist, or when the value of an item is zero, this is indicated by dots (.....) or short dashes (-----). Raw data: Def. “Collected data which have not been organized numerically are called raw data.”
  • 161.
    “An arrangement of raw numerical data in ascending or descending order of magnitude is called an array.”
  • 163.
    RELATIVE FREQUENCY AND CUMULATIVE FREQUENCY  To facilitate the interpretation of a frequency distribution, it is often helpful to express the frequency for each interval as a proportion or a percentage of the total number of observations. A relative frequency distribution shows the proportion of the total number of measurements associated with each interval. Relative frequency = (absolute frequency for a particular interval) / (total number of measurements). Relative frequencies are useful for comparing different sets of data containing an unequal number of observations.
  • 164.
    The cumulative relative frequency for an interval is the proportion of the total number of measurements that have a value less than the upper limit of the interval. It is computed by adding all the previous relative frequencies to the relative frequency for the specified interval. The cumulative relative frequency is also useful for comparing different sets of data with an unequal number of observations.
  • 165.
    The relative frequency for the class (162.5, 212.5) is 19/90 ≈ 0.21, or 19/90 × 100% ≈ 21%. For example, the cumulative relative frequency for the interval (262.5, 312.5) is the sum 0.02 + 0.21 + 0.06 + 0.30 = 0.59, or 59%. This means that 59% of the total number of measurements are less than 312.5.
  • 166.
    CLASS LIMITS AND CLASS BOUNDARIES: Def: “Each class is defined by two numbers; these numbers are called class limits. The smaller number is called the lower class limit and the larger number is called the upper class limit.” Example: in the class 45-49, 45 is the lower class limit and 49 is the upper class limit. As measurements are seldom exact, 45 kg is interpreted as a weight lying between 44.5 kg and 45.5 kg. Similarly, 49 kg is interpreted as a weight lying between 48.5 kg and 49.5 kg. The values 44.5 and 49.5 are called the true class limits or class boundaries.
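The step from class limits to class boundaries can be sketched as follows. This is a minimal illustration that assumes measurements are recorded to the nearest whole unit (the `unit` parameter and the function name are assumptions made for this example).

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Widen the class limits by half a measurement unit on each side."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

# Class 45-49 kg, weights recorded to the nearest kilogram.
print(class_boundaries(45, 49))  # (44.5, 49.5)
```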
  • 167.
    FREQUENCY DISTRIBUTION TABLE
    Table 3: Age distribution of polio patients
    Age      Number of patients
    0-4      35
    5-9      18
    10-14    11
    15-19    8
    20-24    6
  • 168.
    FREQUENCY DISTRIBUTION: DISCRETE DATA  Discrete data: possible values are countable. Example: An advertiser asks 200 customers how many days per week they read the daily newspaper.
    Number of days read    Frequency
    0                      44
    1                      24
    2                      18
    3                      16
    4                      20
    5                      22
    6                      26
    7                      30
    Total                  200
  • 169.
    RELATIVE FREQUENCY  Relative frequency: What proportion is in each category?
    Number of days read    Frequency    Relative frequency
    0                      44           .22
    1                      24           .12
    2                      18           .09
    3                      16           .08
    4                      20           .10
    5                      22           .11
    6                      26           .13
    7                      30           .15
    Total                  200          1.00
    44/200 = .22: 22% of the people in the sample report that they read the newspaper 0 days per week.
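The relative frequency column of the newspaper example can be reproduced with a few lines of Python. This is only a sketch; the variable names are chosen for this illustration.

```python
# Frequencies from the newspaper example: days read per week -> number of customers.
frequencies = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

total = sum(frequencies.values())  # sample size, 200 customers

# Relative frequency = class frequency / total number of observations.
relative = {days: count / total for days, count in frequencies.items()}

for days, proportion in relative.items():
    print(days, frequencies[days], f"{proportion:.2f}")
```

The relative frequencies should always add up to 1, which is a quick check on the arithmetic.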
  • 170.
    Frequency Distributions  A frequency distribution is a table that shows classes or intervals of data with a count of the number in each class. The frequency f means the number of times a certain value of the variable is repeated.
    Class      Frequency, f
    1 – 4      4
    5 – 8      5
    9 – 12     3
    13 – 16    4
    17 – 20    2
  • 171.
    Class width  The class width is the distance between lower (or upper) limits of consecutive classes. For the classes 1 – 4, 5 – 8, 9 – 12, 13 – 16 and 17 – 20, the class width is 4: 5 – 1 = 4, 9 – 5 = 4, 13 – 9 = 4, 17 – 13 = 4.
  • 172.
    Constructing a Frequency Distribution  Example: The following data represent the ages of 30 students in a statistics class. Construct a frequency distribution that has five classes.
    Ages of Students
    18 20 21 27 29 20
    19 30 32 19 34 19
    24 29 18 37 38 22
    30 39 32 44 33 46
    54 49 18 51 21 21
  • 173.
    Constructing a Frequency Distribution  Example continued:
    Ages of Students
    Class (ages)    Frequency, f (number of students)
    18 – 25         13
    26 – 33         8
    34 – 41         4
    42 – 49         3
    50 – 57         2
    Σf = 30
    Check that the sum equals the number in the sample.
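The tallying in this example can be sketched in Python. The sketch assumes the five classes start at the smallest age (18) and have width 8, which matches the classes shown on the slide; the function name `frequency_distribution` is hypothetical.

```python
ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19,
        34, 19, 24, 29, 18, 37, 38, 22, 30, 39,
        32, 44, 33, 46, 54, 49, 18, 51, 21, 21]

def frequency_distribution(data, lowest, width, n_classes):
    """Count how many integer values fall in each class of equal width."""
    counts = [0] * n_classes
    for value in data:
        index = (value - lowest) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    # Class limits: lower limit and upper limit (one less than the next lower limit).
    classes = [(lowest + i * width, lowest + (i + 1) * width - 1)
               for i in range(n_classes)]
    return list(zip(classes, counts))

table = frequency_distribution(ages, lowest=18, width=8, n_classes=5)
for (low, high), f in table:
    print(f"{low} - {high}: {f}")

# The frequencies must add up to the sample size.
assert sum(f for _, f in table) == len(ages)
```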
  • 174.
    Midpoint  The midpoint of a class is the sum of the lower and upper limits of the class divided by two. The midpoint is sometimes called the class mark. Midpoint = (lower class limit + upper class limit) / 2. Example: for the class 1 – 4 (frequency f = 4), midpoint = (1 + 4) / 2 = 5 / 2 = 2.5.
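A short sketch applying the midpoint formula to every class of the example table (1 – 4 through 17 – 20); the names used are illustrative only.

```python
classes = [(1, 4), (5, 8), (9, 12), (13, 16), (17, 20)]

def midpoint(lower_limit, upper_limit):
    """Class mark: the average of the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

midpoints = [midpoint(low, high) for low, high in classes]
print(midpoints)  # [2.5, 6.5, 10.5, 14.5, 18.5]
```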
  • 175.
    Relative Frequency  The relative frequency of a class is the portion or percentage of the data that falls in that class. To find the relative frequency of a class, divide the frequency f by the sample size n. Relative frequency = class frequency / sample size = f / n. Example: for the class 1 – 4, f = 4 and n = Σf = 18, so relative frequency = 4/18 ≈ 0.222.
  • 176.
    Cumulative Frequency  The cumulative frequency of a class is the sum of the frequency for that class and all the previous classes.
    Ages of Students
    Class      Frequency, f    Cumulative frequency
    18 – 25    13              13
    26 – 33    8               21
    34 – 41    4               25
    42 – 49    3               28
    50 – 57    2               30
    Σf = 30; the last cumulative frequency equals the total number of students.
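The running total described on this slide can be sketched with `itertools.accumulate` from the Python standard library, using the student-age frequencies from the example.

```python
from itertools import accumulate

# Frequencies for the classes 18-25, 26-33, 34-41, 42-49, 50-57.
frequencies = [13, 8, 4, 3, 2]

# Each cumulative frequency is the class frequency plus all previous ones.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [13, 21, 25, 28, 30]
```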
  • 177.
  • 178.
    A bar graph is a visual display used to compare the amounts or frequency of occurrence of different characteristics of data. This type of display allows us to compare groups of data and to make generalizations about the data quickly.
  • 179.
    BAR GRAPH  The data presented are categorical. Data are presented in the form of rectangular bars of equal breadth. Each bar represents one variant/attribute. A suitable scale should be indicated, and the scale starts from zero. The width of the bars and the gaps between the bars should be equal throughout. The length of a bar is proportional to the magnitude/frequency of the variable. The bars may be vertical or horizontal.
  • 180.
    RULES FOR MAKING SIMPLE BAR CHARTS 1. Vertical bars are used to represent data classified on a quantitative or chronological basis. 2. Horizontal bars are used to represent data classified on a qualitative or geographical basis. 3. The bars should be neither short and wide nor very long and narrow. 4. Bars should be separated by spaces that are not less than half the width of a bar and not greater than the width of a bar.
  • 181.
    PARTS OF A BAR GRAPH  Graph title: The graph title gives an overview of the information being presented in the graph. The title is given at the top of the graph. Axes and their labels: Each graph has two axes. The axis labels tell us what information is presented on each axis. Grouped data axis: The grouped data axis is always at the base of the bars. This axis displays the type of data being graphed. Frequency data axis: The frequency axis has a scale that is a measure of the frequency or amounts of the different data groups. Axes scale: The scale is the range of values presented along the frequency axis. Bars: The bars are rectangular blocks that can have their base at either the vertical axis or the horizontal axis. Each bar represents the data for one of the data groups. Key or legend: The key explains any additional information found on the graph.
  • 182.
    While tables are more exact in their presentation of data, they do not allow a quick visual overview of the data. Bar graphs provide one way to present data so that we can get an overview at a glance.
  • 183.
    SIMPLE BAR DIAGRAM  Represents qualitative data. Shows only one variable, so it is also called a one-variable bar chart. Each category of the variable is represented by a bar. Limitations: It represents only one classification. It cannot be used for comparison.
  • 184.
    SIMPLE BAR DIAGRAM  Distribution of BDS subjects, year wise (x-axis: B.D.S. year; y-axis: number of subjects)
    B.D.S. year    No. of subjects
    First          57
    Second         55
    Third          53
    Fourth         39
    Total          204
  • 185.
    EXAMPLE  Bar chart of population (million) by country: China 1088, India 816, Indonesia 175, Japan 123, Pakistan 106.
  • 187.
    MULTIPLE BAR DIAGRAM  It is also called a grouped bar chart or compound bar chart. Used to display information from tables containing two or three variables. An example of a grouped bar chart can be demonstrated with the variable “gender”, which has two categories, male and female. Bars within a group are usually joined. There must be a legend to indicate which categories the bars represent. The length of each bar corresponds to the frequency.
  • 188.
    MULTIPLE BAR CHARTS Also called compound bar charts
  • 189.
    COMPONENT BAR DIAGRAM  Also called a stacked bar graph. Represents qualitative data. Shows both the number of cases in the major groups and the number in the subgroups simultaneously. In a stacked bar chart, each bar represents the total number of cases that occurred in a category, and the segments of the bar represent the frequency of cases within the category. Each rectangle is divided according to the numbers in the subgroups.
  • 190.
    Stacked bar charts should be used with caution because they are very difficult to interpret. Except for the bottom category, the categories do not rest on a flat baseline: where one category of the variable ends, the next begins. Stacked bar charts can be deceptive, so they are often used to exaggerate or hide information.
  • 193.
    PROPORTIONAL BAR DIAGRAM  Represents qualitative data. The 100% component bar chart is a variant of the stacked bar chart. All the bars are of the same height and show the variable categories as percentages of the total rather than as actual values. A set of 100% bar charts can be used instead of multiple pie charts, because it is easier to make comparisons between bars than between pies.
  • 195.
    COMPONENT OR PROPORTIONAL BAR DIAGRAM  Proportion of energy intake obtained from various foodstuffs by poor and rich communities (y-axis: 0% to 100%)
    % of energy obtained from    Poor community    Rich community
    Fats                         10                15
    Protein                      10                30
    Carbohydrate                 80                55
  • 196.
    DOT PLOTS  Another variant of the bar chart that is particularly useful when there are many categories is the dot plot. Instead of a bar, just a heavy dot is placed where the end of the bar would be. When there are many labels, smaller dots that extend back to the labeled axis are often used to make the chart easier to read.
  • 197.
    PARETO BAR CHART  A bar chart arranged in descending order of height from left to right. This means the categories represented by the tall bars on the left are relatively more significant than those on the right. It is a special form of vertical bar graph that helps us determine which problems to solve and in what order.
  • 198.
    REVIEW OF CIRCLE GRAPHS (PIE CHARTS)  Circle graphs, also called pie charts, are a type of graph used to represent a part-to-whole relationship. PROPERTIES OF CIRCLE GRAPHS: They are circular graphs with the entire circle representing the whole. The circle is split into parts, or sectors. Each sector represents a part of the whole. Each sector is proportional in size to the amount it represents, so it is easy to make generalizations and comparisons.
  • 199.
    Expressed in percentages. The total angle at the centre of the circle is 360°. Angle of a sector = (class frequency / total observations) × 360°.
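The sector-angle formula can be tried out on the year-wise BDS subject counts from the simple bar diagram slide (57, 55, 53 and 39 subjects, 204 in total). This is a sketch for illustration; the variable names are not from the slides.

```python
subjects = {"First": 57, "Second": 55, "Third": 53, "Fourth": 39}

total = sum(subjects.values())  # 204 observations in all

# Angle of a sector = (class frequency / total observations) x 360 degrees.
angles = {year: count / total * 360 for year, count in subjects.items()}

for year, angle in angles.items():
    print(f"{year}: {angle:.1f} degrees ({subjects[year] / total:.1%})")
```

The angles should add up to 360°, which serves as a check before drawing the chart.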
  • 200.
    Graph title: A graph title gives an overview of the information displayed in the graph. The title is given at the top of the graph. Sectors: Each sector represents one part of the whole. The size of each sector represents its fraction of the whole. Sector labels: The label of each sector indicates the category of information it refers to, and may also give numeric data (often a percentage) so that we know the size of each sector.
  • 201.
    Circle graphs/pie charts should be used sparingly, for two reasons. First, they are best used for displaying statistical information when there are no more than six components; otherwise, the resulting picture will be too complex to understand. Second, circle graphs/pie charts are not useful when the values of the components are similar, because it is difficult to see the differences between slice sizes.
  • 204.
    PICTOGRAM  Used to impress on the common man the frequency of occurrence of events in a population, such as attacks, deaths, numbers operated on, admitted or discharged, accidents, etc. It is a popular method of presenting data to the “man in the street”. It is the most useful way of representing data to people who cannot understand conventional charts. Small pictures or symbols are used to present the data. In essence, pictograms are a form of bar chart.
  • 206.
    PICTURE OF DOCTOR TO REPRESENT THE POPULATION PER PHYSICIAN
  • 207.
    MAP DIAGRAM OR SPOT MAP  Used to show the geographical distribution of the frequencies of a characteristic. Also called a cartogram.
  • 209.
    HISTOGRAM  Used for quantitative, continuous variables. It is similar to a bar graph except that it is used with interval or ratio variables. It is used to present variables that have no gaps, e.g. age, weight, height, blood pressure, blood sugar, etc. It consists of a series of blocks. The class intervals are given along the horizontal axis and the frequencies along the vertical axis. To draw a histogram (for equal class intervals), class boundaries are marked along the x-axis and frequencies along the y-axis.
  • 210.
    THE FOLLOWING ARE A FEW GENERAL COMMENTS ABOUT HISTOGRAMS:  Histograms serve as a quick and easy check on the shape of the distribution of the data. The construction of the graphs is subjective. The shape of a histogram depends on the width and the number of class intervals. Histograms can be misleading. Histograms display grouped data; individual measurements are not shown in the graphs. Histograms can adequately handle data sets that are widely dispersed.
  • 212.
    A frequency polygon is a shorthand way of presenting a histogram: a dot is placed at the centre of the top of each bar and the dots are connected with a line. In this way a graph called a frequency polygon is created. The shape of the distribution is seen more easily in a frequency polygon than in a histogram.
  • 213.
    FREQUENCY POLYGON:  A frequency polygon is a many-sided closed figure. It is constructed by plotting the class marks (midpoints) and then joining the resulting points by straight lines. A frequency polygon can also be obtained by joining the midpoints of the tops of the rectangles in the histogram.
  • 214.
    Frequency polygon for frequency distribution of weight of 120 students
  • 215.
  • 216.
    RELATIVE FREQUENCY HISTOGRAM AND RELATIVE FREQUENCY POLYGON  A graphic representation of a relative frequency distribution can be obtained from the histogram or frequency polygon simply by changing the y-axis from frequency to relative frequency. The resulting graphs are called the relative frequency histogram and the relative frequency polygon (or percentage frequency polygon), respectively.
  • 217.
    CUMULATIVE FREQUENCY POLYGON OR OGIVE  A graph showing the cumulative frequencies plotted against the upper class boundaries is called a cumulative frequency polygon or an ogive. Cumulative frequency is used to determine the number of observations that lie above (or below) a particular value. If we use relative cumulative frequencies in place of cumulative frequencies, the resulting graph is called a relative cumulative frequency polygon or percentage ogive.
  • 218.
    The graphs corresponding to “less than” and “or more” cumulative frequency distributions are called “less than” and “or more” ogives, respectively. Less than ogive: the cumulative frequencies are plotted against the upper boundaries of the respective class intervals. Or more (greater than) ogive: the cumulative frequencies are plotted against the lower boundaries of the respective class intervals.
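The points for both kinds of ogive can be sketched as below. The example reuses the student-age classes (18 – 25 through 50 – 57) from the frequency distribution slides and assumes ages are recorded in whole years, so each class boundary lies half a year beyond the class limits; that half-unit assumption and the variable names are introduced for this illustration.

```python
from itertools import accumulate

class_limits = [(18, 25), (26, 33), (34, 41), (42, 49), (50, 57)]
frequencies = [13, 8, 4, 3, 2]

# Class boundaries, assuming whole-year measurements (limits widened by 0.5).
lower_bounds = [low - 0.5 for low, _ in class_limits]
upper_bounds = [high + 0.5 for _, high in class_limits]

# "Less than" ogive: running total plotted against upper boundaries.
less_than = list(zip(upper_bounds, accumulate(frequencies)))

# "Or more" ogive: remaining total plotted against lower boundaries.
or_more = [(bound, sum(frequencies[i:])) for i, bound in enumerate(lower_bounds)]

print(less_than)
print(or_more)
```

Plotting either list of (boundary, cumulative frequency) pairs and joining the points gives the corresponding ogive.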
  • 219.
    Ogive for the less-than cumulative frequency distribution of weight of 120 students (y-axis: cumulative frequency)
  • 220.
    FREQUENCY CURVE  A frequency curve is the smoothed curve obtained by joining the points of a frequency polygon. Figure: frequency curve for the frequency distribution of weight of 120 students (y-axis: number of students, i.e. frequency).
  • 221.
    COMMON SHAPES OF FREQUENCY CURVE
  • 222.
    COMMON SHAPES OF FREQUENCY CURVE  The frequency curves arising in practice take on certain characteristic shapes and are generally classified as: 1) Symmetrical or bell-shaped curve 2) Moderately asymmetrical curve 3) J-shaped and reverse J-shaped curves 4) U-shaped curve 5) Bimodal and multimodal curves
  • 223.
    Symmetrical or bell-shaped curve (observations are equidistant from the central maximum)
  • 224.
    Moderately asymmetrical curve (in these curves, the tail of the curve on one side of the central maximum is longer than that on the other). Positively skewed distributions have a relatively large number of low scores and a small number of very high scores. Negatively skewed distributions have a relatively large number of high scores and a small number of low scores.
  • 225.
    J-shaped and reverse J-shaped curves (a J-shaped curve starts at a low point on the left and rises higher and higher towards the extreme right; a reverse J-shaped curve starts at a high point on the left and falls towards the extreme right)
  • 226.
    U-shaped curve (a frequency curve with a low point in the middle and high points at both ends)
  • 227.
    Bimodal and multimodal frequency curves (a bimodal curve has 2 maxima, while a multimodal frequency curve has more than 2 maxima)
  • 228.
    STEM AND LEAF PLOTS  The steps for a histogram are: 1. Rank order the data. 2. Find the range. 3. Choose an appropriate width to yield about 10-20 intervals. 4. Make a new table consisting of the intervals, their midpoints, the counts and a cumulative total. 5. Turn this into a histogram. 6. Lose some information along the way, namely the exact values. Tukey (1977) devised the stem and leaf plot, which consists of only 3 steps.
  • 229.
    A stem and leaf plot is a method of organizing data that uses part of the data as the “stem” and part as the “leaves”. The series of numbers in a column is called the “stem”, which is the major part of the observed value. The remaining trailing digits in each row are called the “leaves”.
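A minimal sketch of a stem and leaf plot, using the 30 student ages from the frequency distribution example. It assumes two-digit whole numbers, with the tens digit as the stem and the units digit as the leaf; the function name is hypothetical.

```python
from collections import defaultdict

ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19,
        34, 19, 24, 29, 18, 37, 38, 22, 30, 39,
        32, 44, 33, 46, 54, 49, 18, 51, 21, 21]

def stem_and_leaf(values):
    """Group sorted values by stem (tens digit); leaves are the units digits."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Unlike the grouped frequency table, every individual age can still be read back from the plot.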
  • 231.
    ADVANTAGES  An important advantage over a histogram is that it provides all the information contained in the histogram while preserving the value of each individual observation. It is quick. It is easy to determine the median and range of the data from the plot. It can be used to quickly organize a large list of data values. Outliers, data clusters and gaps are easily visible. DISADVANTAGES:  A stem and leaf plot is not very informative for a small set of data.
  • 232.
    THE JOINT DISTRIBUTION GRAPH
  • 233.
    When we study the relationship between two variables, we refer to the data as bivariate. One graphical technique we use to show the relationship between variables is called a scatter diagram. To draw a scatter diagram we need two variables. We scale one variable along the horizontal axis (x-axis) of a graph and the other variable along the vertical axis (y-axis).
  • 234.
    The correlation between two variables, labeled x and y, can range from nonexistent to strong. If the value of y increases as x increases, the correlation is positive. If y decreases as x increases, the correlation is negative. The graph, however, does not reveal the probability that such a relationship could have occurred by chance. Nor does the graph provide quantitative information about how strong the association is (although it may look strong to the eye).
  • 235.
    ADDITIONAL NOTES ABOUT SCATTERPLOTS If the relationship is thought to be a causal one, then the independent variable is represented along the x-axis and the dependent variable on the y-axis  A scatterplot can show that there is a positive, negative, constant, or no relationship (correlation) between the variables.  Positive: As the value of one variable increases, so does the other.  Negative: As the value of one variable increases, the other decreases.  Constant: As the value of one variable increases (or decreases), the other remains constant.  No relationship: There is no pattern to the points.
  • 236.
    A scatter plot is often employed to identify potential associations between two variables.
  • 237.
    ADVANTAGES OF SCATTERPLOTS A scatter plot is one of the best ways to determine if two characteristics are related.  A scatterplot may be used when there are multiple trials for the same input variable in an experiment.
  • 238.
    DISADVANTAGES OF SCATTERPLOTS When a scatterplot shows an association between two variables, there is not necessarily a cause and effect relationship.  Both variables could be related to some third variable that explains their variation or there could be some other cause. Alternatively, an apparent association could simply be a result of chance.
  • 239.
    LINE GRAPHS  A line graph is used to illustrate the relationship between two variables. A line chart or line graph is a type of chart that displays information as a series of data points, called 'markers', connected by straight line segments. It is similar to a scatter plot except that the measurement points are ordered (typically by their x-axis value) and joined with straight line segments.
  • 240.
    Line charts show how particular data change at equal intervals of time. A line chart is often used to visualize a trend in data over intervals of time (a time series), so the line is often drawn chronologically. Use percentages on the y-axis when more than one distribution is to be shown. Works well when the data are paired (bivariate) and when the data are continuous.
  • 241.
    ADVANTAGES OF LINE GRAPHS  The line graph is especially useful when there are a large number of values to be plotted, that is, when you have a continuous variable with an unlimited number of possible points. It also allows several sets of data to be presented on one graph. A line graph is a way to summarize how two pieces of information are related and how they vary depending on one another. DISADVANTAGES OF LINE GRAPHS  Changing the scale of either axis can dramatically change the visual impression of the graph.
  • 242.
    LINE GRAPH EXAMPLE  John's weight in kilograms by year (x-axis: year, 1991 to 1995; y-axis: weight, 65 to 75 kg). John weighed 68 kg in 1991, 70 kg in 1992, 74 kg in 1993, 74 kg in 1994, and 73 kg in 1995.
  • 244.
    What if you want to create a graph that shows the spread of the data and the median?
  • 245.
    The owner of a dental clinic wants to find out where his patients are coming from. One day he decided to gather data about the distance that people commuted to get to his clinic. Patients reported the following distances: 5 6 12 13 14 15 18 22 50
  • 246.
    REVIEW OF BOX AND WHISKER PLOT
  • 247.
    DECILES AND PERCENTILES  Percentiles: Percentiles divide the data into 100 parts. The 25th percentile is Q1, the 50th percentile is the median (Q2) and the 75th percentile is Q3. Deciles: If the data are ordered and divided into 10 parts, the cut points are called deciles.
  • 248.
    QUARTILES: Quartiles divide the data into approximately four equal parts. Q1 is the median of the first half of the ordered observations and Q3 is the median of the second half of the ordered observations. In notation, the q-th quartile is the (q(n+1)/4)-th observation of the ordered data, where q is the desired quartile (1, 2 or 3) and n is the number of observations.
  • 249.
    FIVE NUMBER SUMMARY  The five number summary of a distribution consists of the smallest (minimum) observation, the first quartile (Q1), the median (Q2), the third quartile (Q3) and the largest (maximum) observation, written in order from smallest to largest. BOX PLOT: A box plot is a graph of the five number summary.
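The five number summary can be sketched for the commuting distances reported by the dental clinic's patients (5 6 12 13 14 15 18 22 50). The sketch follows the median-of-halves definition of Q1 and Q3 given on the quartiles slide and assumes the middle observation is left out of both halves when n is odd; other quartile conventions can give slightly different values. The function name is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # For odd n, skip the middle observation; for even n, split evenly.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower_half), median(data), median(upper_half), data[-1]

distances = [5, 6, 12, 13, 14, 15, 18, 22, 50]
print(five_number_summary(distances))
```

For these distances the summary is minimum 5, Q1 9, median 14, Q3 20 and maximum 50, which are the five values a box plot would draw.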
  • 250.
    The range as a measure of variability has a major deficiency because it is sensitive to two extreme values (which two values?). It is desirable to have a measure of dispersion that is not easily influenced by a few extreme values. The interquartile range is such a measure. The sample interquartile range (IQR) is the distance between Q1 and Q3. IQR = Q3 − Q1.
  • 251.
    An outlier is a data value that is much smaller or much larger than the other values in the data set. IQR = Q3 − Q1. The test for outliers is: 1) Find the IQR. 2) Multiply the IQR by 1.5. 3) Subtract: Q1 − (1.5)IQR. 4) Add: Q3 + (1.5)IQR. Any value less than the value in step 3 or more than the value in step 4 is an outlier.
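The four steps of the outlier test can be sketched as follows, again using the clinic commuting distances (5 6 12 13 14 15 18 22 50). The quartiles Q1 = 9 and Q3 = 20 are taken as given here, on the assumption that they are computed as medians of the lower and upper halves of the ordered data; the function name is hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Steps 1-4: IQR, k * IQR, then the lower and upper cut-off values."""
    iqr = q3 - q1               # step 1
    margin = k * iqr            # step 2
    return q1 - margin, q3 + margin  # steps 3 and 4

distances = [5, 6, 12, 13, 14, 15, 18, 22, 50]
lower, upper = outlier_fences(q1=9, q3=20)

# Any value below the lower cut-off or above the upper cut-off is an outlier.
outliers = [d for d in distances if d < lower or d > upper]
print(lower, upper, outliers)  # -7.5 36.5 [50]
```

Here only the 50 lies beyond the cut-offs, so it would be drawn as a separate point on a box plot.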
  • 252.
    OUTLIERS  Data often contain outliers. Outliers can occur for a variety of reasons: an inaccurate instrument, mishandling of experimental units, measurement errors, or recording errors such as incorrectly typed values or a misplaced decimal point. The observations may have been made on a subject who does not meet the research criteria; for example, a research study on hypertension may have included a patient with low blood pressure, due to the experimenters' oversight. An outlier can also be a legitimate observation that occurred purely by chance. If we can explain how and why the outliers occurred, they should be deleted from the data.
  • 253.
    Box plots are especially useful for comparing two or more sets of data. They help us see the spread of the data.
  • 254.
    The main features of a box plot, including outliers or extreme values excluded from the range
  • 255.
    The table below shows the length of time required to splint an avulsed tooth with alveolar fracture for patients 18 years old or younger and for those older than 18 years.
  • 256.
    ADVANTAGES OF BOX AND WHISKER PLOTS  The center, the spread, and the overall range of the distribution are immediately visible in a box-and-whisker plot. Box plots are useful for comparing data sets, especially when the data sets are large or when they have different numbers of data elements.
  • 257.
    DISADVANTAGES OF BOX AND WHISKER PLOTS  A box plot shows only certain statistics rather than all the data. Since the data elements are not displayed, it is impossible to determine if there are gaps or clusters in the data.
  • 258.
  • 259.
    When the data are quantitative and normally distributed, the mean is the most appropriate measure of the population average, and variability is typically represented by the 95% confidence interval (CI). These items of information are shown graphically as an error plot. In some instances, you may see error plots where the line indicating the value of the 95% CI is terminated with horizontal bars. Figure 5. Examples of two forms of error bars, both indicating the 95% confidence
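    To make the error bar concrete, the sketch below computes a mean and an approximate 95% CI for it. It is an illustration under stated assumptions: the helper name `mean_ci` and the sample values are hypothetical, and the critical value comes from the normal distribution (about 1.96), which is a large-sample approximation. For small samples a t critical value is normally used and gives a wider interval.

```python
import math
import statistics

def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) for an approximate CI of the mean."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample SD divided by sqrt(n).
    se = statistics.stdev(values) / math.sqrt(n)
    # Normal critical value (assumption: large-sample approximation).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * se           # length of a one-sided error bar
    return mean, mean - half_width, mean + half_width

# Hypothetical measurements.
mean, lower, upper = mean_ci([10, 12, 14, 16, 18])
print(f"mean = {mean}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

    The distance from the mean to either limit is the one-sided error bar drawn above and below the point in an error plot.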
  • 260.
  • 261.
    GROUP DIFFERENCES: 95% CI RULE OF THUMB  Error bars can be informative about group differences, but you have to know what to look for. Rule of thumb for 95% CIs:  If the overlap is about half of one one-sided error bar, the difference is significant at about p < .05.  If the error bars just abut, the difference is significant at about p < .01. (Cumming & Finch, 2005)
  • 262.
    GROUP DIFFERENCES: SE RULE OF THUMB  Rule of thumb for SEs:  When the gap is about the size of a one-sided error bar, the difference is significant at about p < .05.  When the gap is about the size of two one-sided error bars, the difference is significant at about p < .01. (Cumming & Finch, 2005)
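    The overlap or gap used in these rules of thumb can be expressed in units of a one-sided error bar. The helper below is a hypothetical sketch, not a published routine: its name and inputs are made up, and dividing by the average of the two bar lengths is an assumption for the case where the bars are not equal. It only measures the overlap; it does not compute a p value.

```python
def overlap_in_bar_units(mean_a, bar_a, mean_b, bar_b):
    """Overlap of two error-bar intervals in units of a one-sided bar.

    Positive: the bars overlap. Zero: they just abut. Negative: there is a gap.
    """
    lower_a, upper_a = mean_a - bar_a, mean_a + bar_a
    lower_b, upper_b = mean_b - bar_b, mean_b + bar_b
    # Shared length of the two intervals (negative when they are apart).
    overlap = min(upper_a, upper_b) - max(lower_a, lower_b)
    average_bar = (bar_a + bar_b) / 2   # assumption: average the two bars
    return overlap / average_bar

# Hypothetical group means with one-sided error bars.
print(overlap_in_bar_units(10, 2, 13, 2))   # half a bar of overlap
print(overlap_in_bar_units(10, 2, 14, 2))   # bars just abut
print(overlap_in_bar_units(10, 1, 13, 1))   # gap of one bar
```

    With 95% CI bars, a value near 0.5 or 0 corresponds to the two CI cases above; with SE bars, a value near −1 or −2 corresponds to the two SE cases.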
  • 263.
    Different levels of overlap in error bars: (a) large amount of overlap, that includes the mean values, (b) small amount of overlap, that does not include the mean values, (c) no overlap
  • 264.
    FOREST PLOT OR BLOBBOGRAM  A graph that displays information about the studies contributing to a meta-analysis, along with information about the synthesis of those studies. The graph is so called because of the forest of lines in it. It is also called a blobbogram because blobs are used to represent the weight given to each study.
  • 269.
  • 274.
    DIAMOND IN META-ANALYSIS
    Diamond on left of the line of no effect: fewer episodes of the outcome of interest in the treatment group.
    Diamond on right of the line of no effect: more episodes of the outcome in the treatment group.
    Diamond touches the line of no effect: no statistically significant difference between groups.
    Diamond does not touch the line of no effect: difference between the two groups is statistically significant.
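    The reading of the diamond can be written as a simple decision rule on the pooled confidence interval. This is an illustrative sketch: the function name `interpret_diamond` and the example numbers are hypothetical, and it assumes the pooled effect is a ratio (such as a risk ratio or odds ratio), so that the line of no effect is at 1. For a difference measure the line of no effect would be at 0.

```python
def interpret_diamond(ci_low, ci_high, no_effect=1.0):
    """Interpret the pooled-effect diamond from its confidence limits."""
    if ci_low <= no_effect <= ci_high:
        # The diamond touches or crosses the line of no effect.
        return "no statistically significant difference between groups"
    if ci_high < no_effect:
        # The whole diamond lies to the left of the line.
        return "significant: fewer episodes of the outcome in the treatment group"
    # The whole diamond lies to the right of the line.
    return "significant: more episodes of the outcome in the treatment group"

# Hypothetical pooled risk ratios with 95% confidence limits.
print(interpret_diamond(0.60, 0.90))
print(interpret_diamond(0.80, 1.20))
print(interpret_diamond(1.10, 1.50))
```

    The two ends of the diamond are the confidence limits of the pooled estimate, so the rule only asks whether the line of no effect falls between them.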
  • 276.
    References:
    1. C.R. Kothari; Research Methodology, Methods & Techniques; 2nd ed., pg 95-103.
    2. Kim & Dailey; Biostatistics for Oral Health Care; Ch. 1, 2 & 3.
    3. J.V. Dixit; Principles & Practice of Biostatistics; Ch. 1, 2 & 3.
    4. Rao & Murthy; Applied Statistics in Health Sciences; 2nd ed., Ch. 1, 4, 5, 6 & 7.
    5. B.K. Mahajan; Methods in Biostatistics; Jaypee Publications, 6th ed., pg 88-103.
    6. Park; Textbook of Social and Preventive Medicine; 22nd ed.
    7. Cumming G, Fidler F, and Vaux DL; Error bars in experimental biology; JCB; volume 177; 2007.
    8. C.R. Kothari; Methods of data collection; Research Methodology, Methods & Techniques; 2nd ed., pg 112-130.
  • 277.
    9. B.K. Mahajan; Sources and Presentation of Data; Methods in Biostatistics; 6th ed., pg 10-13.
    10. K. Park; Medicine and Social Sciences; Park's Textbook of Preventive and Social Medicine; 21st ed., pg 643-644.
    11. J.P. Baride; Types of data, Data collection; Manual of Biostatistics; 1st ed., pg 4-9.
    12. Masson, M. E. J., & Loftus, G. R. (2003). Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology, 57(3), 203-220.
  • 278.