2. Data
• Research is any process by which information is
systematically and carefully gathered for the purpose of
answering questions, examining ideas, or testing
theories.
• Numerical information collected as part of any research
is called data. Depending on the nature of the problem,
the data may relate to individuals, families, houses,
villages, etc.
• The data collected are known as observations. The
individual subjects on whom the data are collected
are known as statistical units.
3. Variables
• The characteristics or events that are measured on a
subject in a research study are called variables
because they vary, i.e., they take different values in
different subjects.
• Variables are measured according to two broad
types of measurement scales: Numerical &
Categorical (otherwise known as Quantitative &
Qualitative).
4. Types of datasets and their measures
• Population - dataset consisting of all outcomes,
measurements, or responses of interest.
• Sample - dataset which is a subset of the
population.
• Parameter - a numerical measurement made
using the population.
• Statistic - a numerical measurement made using
a sample.
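The four terms above can be illustrated with a minimal sketch. The population here is hypothetical (randomly generated ages for an imaginary village), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

# Hypothetical population: ages of all 1,000 residents of an imaginary village.
random.seed(0)  # fixed seed so the sketch is repeatable
population = [random.randint(1, 90) for _ in range(1000)]

# Parameter: a numerical measurement computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population (here, 50 residents drawn at random).
sample = random.sample(population, 50)

# Statistic: the same measurement computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The statistic will usually be close to the parameter without being equal to it.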
5. Properties of Measurement
• Difference - Different numerals denote different
values of the variable
• Magnitude - Numerals indicate that one value is
more or less than another
• Equal Appearing Interval - Successive numerals
are separated by equal distances
• True Zero - Zero means the complete absence of
the quantity measured
6. Level of Measurement (Measurement
Scale)
• Nominal
• Ordinal
• Interval
• Ratio
The measurement levels are considered in the
following hierarchy:
(LOWEST) Nominal - Ordinal - Interval - Ratio (HIGHEST)
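One way to read the hierarchy is that each level keeps the properties of the level below it and adds one more. The sketch below encodes that conventional reading; the names LEVELS, PROPERTIES and properties_of are illustrative only.

```python
# Levels of measurement, lowest to highest, and the four properties
# of measurement in the order in which the levels acquire them.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
PROPERTIES = ["difference", "magnitude", "equal intervals", "true zero"]

def properties_of(level: str) -> list[str]:
    """Return the properties a level has: each level adds one to the level below."""
    return PROPERTIES[: LEVELS.index(level) + 1]

for level in LEVELS:
    print(f"{level:8s} -> {', '.join(properties_of(level))}")
```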
7. Nominal Scale
• Numbers serve as labels.
• Numbers are used only for identification, in one-to-one
correspondence with the objects.
• The only permissible operation is counting.
• Statistical analysis is based on frequency counts,
such as percentages and the mode.
Example: gender, religion, locality, party
affiliation, etc.
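A minimal sketch of the statistics permitted on a nominal scale (frequency counts, percentages, mode). The blood-group values are hypothetical and serve only as an example.

```python
from collections import Counter

# Hypothetical blood groups recorded for ten subjects (nominal data).
blood_groups = ["A", "O", "B", "O", "AB", "O", "A", "B", "O", "A"]

counts = Counter(blood_groups)          # frequency count per category
total = len(blood_groups)
percentages = {group: 100 * n / total for group, n in counts.items()}
mode = counts.most_common(1)[0][0]      # the most frequent category

print(counts)
print(percentages)
print("mode:", mode)
```

Computing a mean of the labels would have no meaning here, because the labels only identify categories.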
8. Ordinal Scale
• A ranking scale: numbers are assigned to indicate the
relative extent to which objects possess some
characteristic.
• It can show whether an object has more or less of a
characteristic than another object, but not how
much more or less.
• Any series of numbers that preserves the ordered
relationship among objects can be used.
• In addition to the counting operations of the nominal
scale, it permits statistics based on percentiles,
quartiles and the median.
Example: social class, severity of a behavior disorder
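The point that any order-preserving series of numbers may be used can be checked with a small sketch. The ratings and both codings below are hypothetical. The median category comes out the same under either coding.

```python
import statistics

# Hypothetical severity ratings for seven subjects (ordinal data).
ratings = ["mild", "severe", "moderate", "mild", "moderate", "severe", "moderate"]

# Two different codings that both preserve the order mild < moderate < severe.
codes_a = {"mild": 1, "moderate": 2, "severe": 3}
codes_b = {"mild": 10, "moderate": 50, "severe": 51}

def median_category(ratings, codes):
    """Return the median category under a given order-preserving coding."""
    label_of = {code: label for label, code in codes.items()}
    # median_low always returns an observed value, so it maps back to a label.
    middle_code = statistics.median_low([codes[r] for r in ratings])
    return label_of[middle_code]

print(median_category(ratings, codes_a))
print(median_category(ratings, codes_b))
```

A mean would not be stable in this way: it changes with the coding, which is why it is not used on ordinal data.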
9. Interval Scale
• The distance between adjacent scale values is fixed and
equal
• It allows comparison of the differences between
objects
• Meaningful addition and subtraction of scale values
is possible
• The zero point and the unit of measurement are
arbitrary
• In addition to the statistical techniques applied to
nominal and ordinal data, the arithmetic mean and
standard deviation can be used
Example: Temperature (Fahrenheit or Celsius)
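Because the zero point and the unit are arbitrary, differences on an interval scale are meaningful but ratios are not. A short sketch using the temperature example, with three arbitrarily chosen readings:

```python
def c_to_f(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Three arbitrarily chosen readings in Celsius, and the same readings in Fahrenheit.
a_c, b_c, c_c = 10.0, 20.0, 30.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# Equal differences in one unit remain equal differences in the other.
print(b_c - a_c, c_c - b_c)
print(b_f - a_f, c_f - b_f)

# Ratios depend on the arbitrary zero point, so they change with the unit.
print(b_c / a_c, b_f / a_f)
```

So 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius: the ratio is 2 in Celsius but 1.36 in Fahrenheit.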
10. Ratio Scale
• Possesses all the properties of the nominal, ordinal and
interval scales.
• It has an absolute zero point.
• It is meaningful to calculate ratios of scale values.
• All statistical techniques can be applied.
Examples: income, age, weight, height, and so on
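With an absolute zero, a ratio of two values does not depend on the unit chosen. A minimal sketch using weight, with two hypothetical measurements:

```python
import math

KG_PER_LB = 0.45359237  # definition of the international pound in kilograms

# Two hypothetical weights, in kilograms and converted to pounds.
weights_kg = [50.0, 100.0]
weights_lb = [w / KG_PER_LB for w in weights_kg]

ratio_kg = weights_kg[1] / weights_kg[0]
ratio_lb = weights_lb[1] / weights_lb[0]

# The second subject is twice as heavy as the first in either unit.
print(ratio_kg, ratio_lb)
print(math.isclose(ratio_kg, ratio_lb))
```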
11. Categorical variables
• They can be placed into one of two (dichotomous) or
more (polychotomous) categories.
• Examples of dichotomous categorical variables:
Male / Female
Pregnant / Not pregnant
Smoker / Non-smoker
Married / Single
• However, many classifications require more than two
categories, e.g., Marital status: Married / Single /
Divorced / Separated / Widowed; Blood group: A / B /
AB / O; Religion: Hindu / Christian / Muslim, etc.
There is no ordering of the categories.
• These are examples of the nominal scale, in which the
values fall into unordered categories or classes.
12. Categorical variables
• Often, however, the categories have a natural order,
as with the stages of cancer or social class.
• Example: degree of smoking can be divided into
non-smokers / ex-smokers / light smokers / heavy
smokers. This is an example of the ordinal scale.
• In ordinal scales, the categories bear an ordered
relationship to one another.
13. Numerical variables
• Also called quantitative or interval variables. They are
expressed as integers, fractions or decimals, in which
equal distances exist between successive intervals. Age,
systolic & diastolic blood pressure, and height are
examples of continuous variables.
• Numerical variables can be further divided into discrete &
continuous. A discrete numerical variable can take only
separate values over a range; the values differ by fixed
amounts, and no intermediate values are possible.
• Examples of discrete numerical variables are the number
of children, the number of ectopic heartbeats, etc.
14. Numerical variables
• Data that represent measurable quantities but are
not restricted to taking on specified values such as
integers are known as continuous data.
• If the values of the measurement take any number in
a range, the data are said to be continuous.
• The difference between any two possible data values
can be arbitrarily small. Common examples include
height, weight, temperature, etc.
• Continuous data can be reduced to several
categories (for example, ages grouped into age bands).
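Reducing continuous data to categories can be sketched as follows. The cut points and labels are hypothetical and chosen only for illustration.

```python
import bisect

# Hypothetical cut points (in years) and labels; not taken from any standard.
CUT_POINTS = [18, 40, 65]
LABELS = ["child", "young adult", "middle-aged", "older adult"]

def age_group(age: float) -> str:
    """Map a continuous age to an ordered category.

    A value equal to a cut point falls into the higher category.
    """
    return LABELS[bisect.bisect_right(CUT_POINTS, age)]

for age in [3.5, 17.9, 18.0, 39.5, 64.2, 80.0]:
    print(age, "->", age_group(age))
```

The resulting categories are ordinal: the order is kept, but the exact distances between values are lost.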
15. Discrete vs. Continuous Data
• Discrete data: gaps between possible values
• Continuous data: no gaps between possible values
16. Derived Variables
• Used to measure disease in epidemiological studies.
• The main types are rate, ratio and proportion.
Ratio: quantifies the magnitude of one occurrence or
condition relative to another.
Expresses the relationship between two numbers.
Example: The ratio of males to females in Ethiopia
Proportion: quantifies occurrences in relation to the
population in which these occurrences take place.
Often expressed as a percentage.
Example: The proportion of all births that were male
17. Derived Variables…
• Rate: expresses the probability or risk of disease in
a defined population over a specified period
of time.
Considered to be a basic measure of disease
occurrence.
Example: The number of newly diagnosed breast
cancer cases per 100,000 women per year.
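The three derived variables can be computed as in the sketch below. All counts are hypothetical and chosen only to make the arithmetic easy to follow. They are not real figures for any country or disease.

```python
# Hypothetical counts, for illustration only.
male_births = 510
female_births = 490
total_births = male_births + female_births

# Ratio: one quantity relative to another (males to females).
sex_ratio = male_births / female_births

# Proportion: a part relative to the whole it belongs to, here as a percentage.
proportion_male_pct = 100 * male_births / total_births

# Rate: new cases in a defined population over a specified period,
# scaled to a convenient base (per 100,000 women per year).
new_cases_in_year = 120
women_in_population = 200_000
rate_per_100k = new_cases_in_year * 100_000 / women_in_population

print(f"ratio of males to females: {sex_ratio:.3f}")
print(f"proportion of births that were male: {proportion_male_pct:.1f}%")
print(f"rate: {rate_per_100k:.1f} per 100,000 women per year")
```

In a proportion the numerator is always part of the denominator, whereas in a ratio the two quantities can be separate groups.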
18. Data collection
• There are two sources of data:
• Primary Data
Data measured or collected first hand by the
investigator or the user, directly from the source.
• Secondary Data
Data gathered or compiled from published and
unpublished sources or files.
19. Planning & Measuring
Planning:
• Identify source and elements of the data.
• Decide whether to use a sample or a census.
• If sampling is preferred, decide on sample size,
selection method, etc.
• Decide measurement procedure.
• Set up the necessary organizational structure.
Measuring:
• Different methods are available.
20. Methods of collecting primary data
• Survey method
- The investigator contacts the informants, either
directly or indirectly, and collects the data
(e.g., telephone interviews, mail questionnaires)
- Collected information is more reliable/accurate
• Experimental method
- Determines whether, and in what manner, variables are
related to each other
- Used by large-scale organizations with R&D departments
to determine cause-and-effect relationships
- Example: studying the effect of fertilizer on crops
21. Methods of collecting primary data…
• Observation method
- The investigator observes the overall nature of the
event and collects the required data.
- Devices used include automatic recorders, motion
pictures, etc.
- Example: an individual researching the growth of plants
or the behavior of bats observes keenly and records the
required information.
- Gives more accurate results and supplementary
information, but is costly and time-consuming.
22. Secondary data sources
• Official publications of Government
• Publications of research institutions
• Professional bodies
• Economic, trade and scientific journals
23. When the source is secondary data, check that:
• The type and objective of the original data collection
suit the present situation.
• The purpose for which the data were collected is
compatible with the present problem.
• The nature and classification of the data are appropriate
to our problem.
• There are no biases or misreporting in the published
data.
Note: Data which are primary for one user may be secondary
for another.
24. Descriptive Vs Inferential Statistics
Depending on how data are used, statistics is
sometimes divided into two main areas or
branches.
• Descriptive Statistics:
is concerned with summary calculations, graphs, charts
and tables.
Generally characterizes or describes a set of data
elements by graphically displaying the information or
describing its central tendencies and how it is
distributed.
25. Inferential Statistics
• Inferential statistics consists of generalizing from
samples to populations, performing estimations and
hypothesis tests, determining relationships among
variables, and making predictions.
• Statistical techniques based on probability theory are
required.
• Example: the following are the numbers of malaria patients treated
in a hospital each year from 2001 to 2005: 3645; 4568; 5432; 6751;
7369
If we calculate the average number of malaria patients per year from
2001 to 2005, our work belongs to the domain of descriptive statistics.
If we predict the number of malaria patients in the year 2015 to be 9917,
our work belongs to the domain of inferential statistics.
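The contrast can be sketched with the malaria figures above. The descriptive step is the plain average. For the predictive step, the slide does not say how its 2015 figure was obtained, so the least-squares straight line used here is only one assumed method, and its extrapolation need not match that figure.

```python
import statistics

years = [2001, 2002, 2003, 2004, 2005]
patients = [3645, 4568, 5432, 6751, 7369]

# Descriptive statistics: summarize the data actually observed.
mean_patients = statistics.mean(patients)  # 5553

# Inferential step (assumed method): fit a least-squares straight line
# through the five points and extrapolate it beyond the observed years.
mean_year = statistics.mean(years)
slope = (
    sum((x - mean_year) * (y - mean_patients) for x, y in zip(years, patients))
    / sum((x - mean_year) ** 2 for x in years)
)

def predict(year: int) -> float:
    """Predicted number of patients for a year under the straight-line model."""
    return mean_patients + slope * (year - mean_year)

print("average patients per year, 2001-2005:", mean_patients)
print("fitted yearly increase:", round(slope, 1))
print("straight-line prediction for 2015:", round(predict(2015)))
```

A prediction this far outside the observed years carries considerable uncertainty, which is why inferential work relies on probability theory to quantify it.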