Biostatistics
Definition of Statistics
• Different authors have defined statistics differently. The best
definition of statistics is given by Croxton and Cowden according to
whom statistics may be defined as the science, which deals
with collection, presentation, analysis and interpretation of
numerical data.
• The science and art of dealing with variation in data through collection,
classification, and analysis in such a way as to obtain reliable results. —(John M.
Last, A Dictionary of Epidemiology )
• Branch of mathematics that deals with the collection, organization, and
analysis of numerical data and with such problems as experiment design and
decision making. —(Microsoft Encarta Premium 2009)
Biostatistics
• Biostatistics may be defined as application of statistical methods
to medical, biological and public health related
problems.
• It is the scientific treatment given to medical data derived from a group
of individuals or patients, and covers:
Collection of data.
Presentation of the collected data.
Analysis and interpretation of the results.
Making decisions on the basis of such analysis.
• The main theory of statistics lies in the term variability.
• No two individuals are the same. For example, the blood pressure of a person may vary
from time to time as well as from person to person.
We can also have instrumental variability as well as observer variability.
• Methods of statistical inference provide largely objective means for drawing
conclusions from the data about the issue under study. Medical science is full of
uncertainties and statistics deals with uncertainties. Statistical methods try to quantify
the uncertainties present in medical science.
• It helps the researcher to arrive at a scientific judgment about
a hypothesis. It has been argued that decision making is an
integral part of a physician’s work.
• Frequently, decision making is probability based.
Role of Statistics in Public Health and Community Medicine
• Statistics finds an extensive use in Public Health and Community Medicine. Statistical
methods are foundations for public health administrators to understand what is
happening to the population under their care at community level as well as individual
level. If reliable information regarding the disease is available, the public health
administrator is in a position to:
• Assess community needs
• Understand socio-economic determinants of health
• Plan experiments in health research
• Analyze their results
• Study diagnosis and prognosis of the disease for taking effective action
• Scientifically test the efficacy of new medicines and methods of treatment.
Basic Concepts
• Homogeneity: All individuals have similar values or belong to same category.
Example: all individuals are Chinese, women, middle age (30~40 years old), work
in a computer factory ---- homogeneity in nationality, gender, age and
occupation.
• Variation: the differences in feature, voice…
• Throw a coin: The mark face may be up or down ---- variation!
• Treat the patients suffering from pneumonia with same antibiotics: A part of
them recovered and others didn’t ---- variation!
• If there is no variation, there is no need for statistics.
• Many examples of variation in medical field: height, weight, pulse, blood
pressure, … …
2. Population and Sample
• Population: The whole collection of individuals that one intends to
study.
• Sample: A representative part of the population.
• Randomization: An important way to make the sample
representative.
Limited (finite) population and limitless (infinite) population
• All the cases with hepatitis B collected in a hospital. (limited)
• All the deaths found from the permanent residents in a city. (limited)
• All the rats for testing the toxicity of a medicine. (limitless)
• All the patients (hypertensive, diabetic, …) for testing the effect of a medicine. (limitless)
Random
By chance!
• Random event: the event may occur or may not occur in one
experiment.
Before one experiment, nobody is sure whether the event occurs or
not.
Example: weather, traffic accident, …
However, some regularity appears over a large number of experiments.
3. Probability
• A measure of the possibility that a random event occurs.
• A : random event
• P(A) : Probability of the random event A
P(A)=1, if an event always occurs.
P(A)=0, if an event never occurs.
Estimation of Probability----Frequency
• Number of observations: n (large enough)
Number of occurrences of random event A: m
f(A) = m/n
(Frequency or relative frequency)
Example: Throwing a coin:
n=100, m (number of times the mark face occurred)=46
m/n=46%, this is the frequency; P(A)=1/2=50%,
this is the probability.
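The coin example can be tried out with a short simulation. This is only a sketch: the function name relative_frequency, the fixed seed and the chosen values of n are assumptions made for illustration, not part of the slides. It shows that the frequency m/n is an estimate that tends to move closer to the probability 1/2 as n grows.

```python
import random


def relative_frequency(n, p=0.5, seed=0):
    """Simulate n coin tosses and return f(A) = m/n for A = "mark face up"."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    m = sum(1 for _ in range(n) if rng.random() < p)  # number of occurrences of A
    return m / n


# The frequency is only an estimate of P(A) = 0.5; it varies with n and with the seed.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With a small n the printed frequency can differ noticeably from 0.5, as in the 46% example above.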
4. Parameter and Statistic
• Parameter: A measure of the population or
a measure of the distribution of the population.
A parameter is usually represented by a Greek letter,
such as μ, π, σ.
-- Parameters are usually unknown
-- The most commonly used parameters are the measures of central tendency, like
mean, median and mode
To estimate a parameter of a population, we need a sample
• Statistic: A measure of the sample or a measure of the distribution of the sample.
A statistic is usually represented by a Latin letter,
such as s, p, t.
5. Sampling Error
Error: The difference between the observed value and the true value.
Three kinds of error:
(1) Systematic error (fixed)
(2) Measurement error (random) (Observational error)
(3) Sampling error (random)
• The statistics of different samples from the same population differ from each other!
• The statistics differ from the parameter!
Sampling error exists in any sampling research.
It cannot be avoided but may be estimated.
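Sampling error can be made visible with a small simulation. The sketch below uses a hypothetical population of 10,000 systolic blood pressure values (mean 120, SD 15); these numbers, the sample size of 50 and the helper name sample_means are assumptions for illustration only. It draws repeated random samples, shows that each sample mean differs from the population mean, and compares the observed spread of the sample means with the usual standard error estimate σ/√n.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples simple random samples and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population (assumption): 10,000 systolic blood pressure values.
_rng = random.Random(1)
population = [_rng.gauss(120, 15) for _ in range(10_000)]

mu = statistics.mean(population)             # parameter (population mean)
means = sample_means(population, 50, 200)    # statistics (200 sample means)
errors = [m - mu for m in means]             # sampling error of each sample

se_empirical = statistics.stdev(means)                  # observed spread of the sample means
se_theory = statistics.pstdev(population) / 50 ** 0.5   # sigma / sqrt(n)

print("population mean:", round(mu, 2))
print("largest sampling error:", round(max(abs(e) for e in errors), 2))
print("standard error (observed, theoretical):", round(se_empirical, 2), round(se_theory, 2))
```

The two standard error values should be of similar size, which is the sense in which sampling error cannot be avoided but can be estimated.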
Scope of Biostatistics
• Biostatistics is used right from designing scientific
experiments through the data analysis. The scope includes
principles of scientific methodology, defining various types
of data and studies, levels of measurements, descriptive
statistics, inferential statistics and hypothesis testing, and
correlation.
Types of Data
1. Numerical Data ( Quantitative Data )
• The variable describes the characteristic of individuals quantitatively
-- Numerical Data
• The data of numerical variable
-- Quantitative Data
2. Categorical Data ( Enumeration Data )
• The variable describes the category of individuals according to a
characteristic of individuals
-- Categorical Data
• The number of individuals in each category
-- Enumeration Data
Special case of categorical data :
Ordinal Data ( rank data )
• There exists order among all possible categories. ( level of
measurement)
-- Ordinal Data
• The data of ordinal variable, which represent the order of
individuals only
-- Rank data
Types of Variables
• Categorical variable: a variable whose values can be put into categories. For example, the variable
“Toothpaste Brands” might take the values Colgate and Aquafresh.
• Confounding variable: extra variables that have a hidden effect on your experimental
results.
• Continuous variable: a variable with infinite number of values, like “time” or “weight”.
• Control variable: a factor in an experiment which must be held constant. For example, in an
experiment to determine whether light makes plants grow faster, you would have to control
for soil quality and water.
• Dependent variable: the outcome of an experiment. As you change the independent
variable, you watch what happens to the dependent variable.
• Discrete variable: a variable that can only take on certain separate values, such as whole-number counts. For example,
“number of cars in a parking lot” is discrete because cars are counted in whole numbers.
• Independent variable: a variable that the researcher manipulates or selects, and that is not affected by the
other variables being measured. Usually plotted on the x-axis.
• Lurking variable: a “hidden” variable that affects the relationship between the independent and dependent
variables.
• Measurement variable: a variable that has a number associated with it. It’s an “amount” of something, or a “number” of
something.
• Nominal variable: another name for categorical variable.
• Ordinal variable: similar to a categorical variable, but there is a clear order. For example, income levels of low,
middle, and high could be considered ordinal.
• Qualitative variable: a broad category for any variable that can’t be counted (i.e. has no numerical value).
Nominal and ordinal variables fall under this umbrella term.
• Quantitative variable: A broad category that includes any variable that can be counted, or has a numerical
value associated with it. Examples of variables that fall into this category include discrete variables and ratio
variables.
• Random variables are associated with random processes and give numbers to outcomes of random events.
• A ranked variable is an ordinal variable; a variable where every data point can be put in order (1st, 2nd, 3rd,
etc.).
• Ratio variables: similar to interval variables, but they have a meaningful zero.
Level of measurement
• Any measurement we use, belongs to one of four main categories of
measurement.
• It is important to be able to distinguish which category your data belong to
because it will affect the way in which you analyse your results.
• The four categories are called ‘levels of measurement’ and each category gives
us a different amount of information.
1. Nominal level
2. Ordinal level
3. Interval level
4. Ratio level
Types of variables
• Quantitative variables: quantitative continuous, quantitative discrete
• Qualitative variables: qualitative nominal, qualitative ordinal
Level of measurement
• (a) Nominal level: A nominal scale is simply a system of assigning number symbols to events in order to
label them. Nominal data are numerical in name only, because they do not share any of the properties of
the numbers we deal with in ordinary arithmetic. The usual example of this is the assignment of numbers to
basketball players in order to identify them. Such numbers cannot be considered to be associated with an
ordered scale, for their order is of no consequence; the numbers are just convenient labels for the particular
class of events and as such have no quantitative value. Nominal scales provide convenient ways of keeping
track of people, objects and events. One cannot do much with the numbers involved. Accordingly, we are
restricted to using the mode as the measure of central tendency. There is no generally used measure of
dispersion for nominal scales. The nominal scale is the least powerful level of measurement. It indicates no
order or distance relationship and has no arithmetic origin. A nominal scale simply describes differences
between things by assigning them to categories. In spite of all this, nominal scales are still very useful and
are widely used in surveys and other ex-post-facto research when data are being classified by major
sub-groups of the population.
• Example: The rule of correspondence is: if the object in the domain appears to be male, assign “0”, and if female, assign “1”. Similarly, we can
record a person’s marital status as 1, 2, 3 or 4, depending on whether the person is single, married, widowed or divorced. We can as well record “Yes or
No” answers to a question as “0” and “1”. In this artificial or nominal way, categorical data (qualitative or descriptive) can be made into numerical data,
and if we thus code the various categories, we refer to the numbers we record as nominal data.
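The coding rule in the example can be written down as a lookup table. The sketch below is illustrative only: the names MARITAL_CODES, encode and mode_label and the six responses are made up for the example. It also shows why the mode is the usable summary here: counting the most frequent category is meaningful, while averaging the codes would not be.

```python
from collections import Counter

# Nominal coding from the example: the numbers are labels, not quantities.
MARITAL_CODES = {"single": 1, "married": 2, "widowed": 3, "divorced": 4}


def encode(values, codes):
    """Replace each category label by its numeric code."""
    return [codes[v] for v in values]


def mode_label(values):
    """Return the most frequent category (the mode)."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical survey responses (assumption, for illustration only).
responses = ["married", "single", "married", "divorced", "married", "widowed"]
coded = encode(responses, MARITAL_CODES)

print(coded)                  # numeric labels for the six responses
print(mode_label(responses))  # most frequent category
```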
• (b) Ordinal scale: The lowest level of the ordered scale that is commonly used
is the ordinal scale. The ordinal scale places events in order, but there is no
attempt to make the intervals of the scale equal in terms of some rule. Rank
orders represent ordinal scales and are frequently used in research relating to
qualitative phenomena. Ordinal scales only permit the ranking of items from
highest to lowest. Ordinal measures have no absolute values, and the real
differences between adjacent ranks may not be equal. Thus, the use of an
ordinal scale implies a statement of ‘greater than’ or ‘less than’ (an equality
statement is also acceptable) without our being able to state how much
greater or less. Since the numbers of this scale have only a rank meaning, the
appropriate measure of central tendency is the median.
• Example: NPRS (Numeric Pain Rating Scale) and VAS (Visual Analogue Scale).
• (c) Interval scale: The interval scale of measurement is like the ordinal scale, except that it does
assume equal intervals in its measurement. Examples of interval-scaled measures are temperature in degrees Celsius or Fahrenheit, percentage in an
exam, etc. In the case of the interval scale, the intervals are adjusted in terms of some
rule that has been established as a basis for making the units equal. The units are equal only in so
far as one accepts the assumptions on which the rule is based. Interval scales can have an
arbitrary zero, but it is not possible to determine for them what may be called an absolute zero or
the unique origin. The primary limitation of the interval scale is the lack of a true zero; it does not
have the capacity to measure the complete absence of a trait or characteristic. Interval scales
provide more powerful measurement than ordinal scales for interval scale also incorporates the
concept of equality of interval. As such more powerful statistical measures can be used with
interval scales. Mean is the appropriate measure of central tendency, while standard deviation is
the most widely used measure of dispersion
• (d) Ratio scale: Ratio scales have an absolute or true zero of measurement. Ratio
scale represents the actual amounts of variables. Measures of physical
dimensions such as weight, height, distance, etc. are examples. Generally, all
statistical techniques are usable with ratio scales and all manipulations that one
can carry out with real numbers can also be carried out with ratio scale values.
Multiplication and division can be used with this scale but not with other scales
mentioned above. Geometric and harmonic means can be used as measures of
central tendency and coefficients of variation may also be calculated.
• Interval/ratio data are sometimes referred to as continuous or scale data.
• Example:
• Thus, proceeding from the nominal scale (the least precise type of scale) to the ratio scale (the most
precise), increasingly more information is obtained. If the nature of the variables permits, the
researcher should use the scale that provides the most precise description. Researchers in the
physical sciences have the advantage of being able to describe variables in ratio scale form, but the behavioural
sciences are generally limited to describing variables in interval scale form, a less precise type of
measurement.
• Nominal scales give us the least information and simply allow our data to be labelled and categorized,
e.g. pass/fail, male/female, over 60/under 60, improvement/no improvement.
• Ordinal scales give us a bit more information in that they allow us to put our data into a rank
order according to the dimension we are interested in, e.g. NPRS/VAS scale, most competent to
least competent, heaviest smoker to lightest smoker, full movement to immobility.
• Interval/ratio scales give us more information, in that they deal with actual numerical scores, e.g.
weight, height, time, percentage, pressure, capacity etc., which allow direct mathematical
comparisons to be made. The intervals are assumed to be equal. The interval/ratio levels are
combined to form a single category for the purpose of data analysis.
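The link between level of measurement and the measure of central tendency described above (mode for nominal, median for ordinal, mean for interval/ratio) can be sketched with Python's standard statistics module. The function name central_tendency and all the data values are assumptions made for illustration.

```python
import statistics


def central_tendency(values, level):
    """Pick the summary the slides associate with each level of measurement."""
    if level == "nominal":
        return statistics.mode(values)        # most frequent category
    if level == "ordinal":
        return statistics.median_low(values)  # middle rank, always an observed value
    if level in ("interval", "ratio"):
        return statistics.mean(values)        # arithmetic mean
    raise ValueError(f"unknown level of measurement: {level!r}")


# Hypothetical data (assumptions, for illustration only).
blood_types = ["A", "O", "O", "B", "AB", "O"]   # nominal
pain_scores = [2, 3, 3, 5, 7]                   # ordinal (e.g. NPRS ratings)
weights_kg = [60.0, 72.5, 68.0, 80.5]           # ratio

print(central_tendency(blood_types, "nominal"))
print(central_tendency(pain_scores, "ordinal"))
print(central_tendency(weights_kg, "ratio"))
```

median_low is used for ordinal data so that the result is one of the observed ranks rather than an average of two ranks.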
Parametric Vs Non parametric
• When statistical analyses are applied, the statistics must take into account the
nature of the underlying measurement scale, because there are fundamental
differences in the types of information imparted by the different scales.
• Nominal and ordinal scales must be analyzed using what are called
nonparametric or distribution free statistical methods.
• On the other hand, interval and ratio scales are, if at all possible, to be
analyzed using the typically more powerful parametric statistical methods.
• However, parametric statistics typically require that the interval or ratio variables
have distributions shaped like the bell (normal) curve, and that some
other assumptions are also met.
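The rule of thumb in these bullets can be summarised as a small decision helper. This is a simplified sketch, not a complete selection procedure: the function name method_family and the approximately_normal flag are assumptions, and the "other assumptions" mentioned above are not modelled.

```python
def method_family(level, approximately_normal=False):
    """Suggest a family of methods from the level of measurement.

    Simplification (assumption): interval/ratio data are sent to parametric
    methods only when the caller states that they are roughly normal.
    """
    level = level.lower()
    if level in ("nominal", "ordinal"):
        return "nonparametric"
    if level in ("interval", "ratio"):
        return "parametric" if approximately_normal else "nonparametric"
    raise ValueError(f"unknown level of measurement: {level!r}")


print(method_family("ordinal"))
print(method_family("ratio", approximately_normal=True))
print(method_family("interval", approximately_normal=False))
```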
Independent Vs Dependent variables
• The ones that are the causal factors, or that you may manipulate are called the independent variables. The
outcomes of the treatments or the responses to changes in the independent variables are called the
dependent variables, because their values presumably depend on what happens to the independent variables.
For example treatments you administer in an experiment constitute levels of the independent variable(s).
• Examples:
• i) In smoking research you might look at number of cigarettes smoked as an independent variable and
incidence of lung cancer as a dependent variable.
• ii) In research on atherosclerosis, you might look at dietary saturated fat or amount of vitamin E
supplementation as independent variables and degree of atherosclerosis as a dependent variable.
• iii)In research on comparative cancer treatments, the cancer treatments form the independent variable(s)
while various measures of progression of the disease would make up the dependent variables.
• iv)If you wanted to look at how aspirin dosages affect the frequency of second heart attacks, the aspirin
dosage would be the independent variable, while the heart attack frequency would be the dependent variable.
Question??
• Suppose that you wanted to look at the incidence of low back pain among
welders at the local car factory. You could measure:
(i) How many had experienced low back pain and how many had not
experienced low back pain over the last 2 years.
(ii) Frequency of back pain, using a 5-point scale, by asking the
question: How often have you experienced back pain over the last
two years?
Never-----Rarely------Sometimes-------Quite often------Very often
(iii) Frequency of back pain using the absolute number of incidents, by asking:
How many times have you experienced back pain over the last two years?
Which type of data do they belong to?
• RBC (4.58 × 10^6/mcL)
• Diastolic/systolic blood pressure
(8/12 kPa, about 60/90 mmHg)
• Percentage of individuals with blood type A (20%) (A, B, AB, O)
• Protein in urine (++) ( - , ±, +, ++, +++)
• Incidence rate of breast cancer ( 35/100,000)
• How many different levels of measure for variables exist?
No Response
1
2
3
4
• Can the distances between the categories of a nominal variable be measured?
No Response
Yes
No
• Nominal variables name only. Ordinal variables:
No Response
Name only
Order only
Both name and order
• Interval variables:
No Response
Name, order & have equal intervals
Name and order only
Order only
Name only
• Ratio variables have:
No Response
A real 0
Equal intervals
Order
Name
All except "No Response" above
• Interval or ratio variables should not be regrouped into nominal or ordinal measures.
No Response
True
False
• Nominal and ordinal variables require:
No Response
Parametric methods
Nonparametric methods
• Variables you manipulate are:
No Response
Independent variables
Dependent variables
Sources of Error in Measurement
• The researcher must be aware of the sources of error in measurement. The following are the possible sources of error
in measurement.
(a) Respondent: At times the respondent may be reluctant to express strong negative feelings or it is just possible that he may
have very little knowledge but may not admit his ignorance. All this reluctance is likely to result in an interview of ‘guesses.’
Transient factors like fatigue, boredom, anxiety, etc. may limit the ability of the respondent to respond accurately and fully.
(b) Situation: Situational factors may also come in the way of correct measurement. Any condition which places a strain on
interview can have serious effects on the interviewer-respondent rapport. For instance, if someone else is present, he can
distort responses by joining in or merely by being present. If the respondent feels that anonymity is not assured, he may be
reluctant to express certain feelings.
(c) Measurer: The interviewer can distort responses by rewording or reordering questions. His behaviour, style and looks may
encourage or discourage certain replies from respondents. Careless mechanical processing may distort the findings. Errors
may also creep in because of incorrect coding, faulty tabulation and/or statistical calculations, particularly in the data-analysis
stage.
(d) Instrument: Error may arise because of the defective measuring instrument. The use of complex words, beyond the
comprehension of the respondent, ambiguous meanings, poor printing, inadequate space for replies, response choice
omissions, etc. are a few things that make the measuring instrument defective and may result in measurement errors.
Another type of instrument deficiency is the poor sampling of the universe of items of concern.
Basic of Biostatistics in the field of healthcare research.pptx

Basic of Biostatisticsin the field of healthcare research.pptx

  • 1.
  • 2.
    Definition of Statistics •Different authors have defined statistics differently. The best definition of statistics is given by Croxton and Cowden according to whom statistics may be defined as the science, which deals with collection, presentation, analysis and interpretation of numerical data. • The science and art of dealing with variation in data through collection, classification, and analysis in such a way as to obtain reliable results. —(John M. Last, A Dictionary of Epidemiology ) • Branch of mathematics that deals with the collection, organization, and analysis of numerical data and with such problems as experiment design and decision making. —(Microsoft Encarta Premium 2009)
  • 3.
    Biostatistics • Biostatistics maybe defined as application of statistical methods to medical, biological and public health related problems. • It is the scientific treatment given to the medical data derived from group of individuals or patients Collection of data. Presentation of the collected data. Analysis and interpretation of the results. Making decisions on the basis of such analysis
  • 4.
    • The maintheory of statistics lies in the term variability. • There is No two individuals are same. For example, blood pressure of person may vary from time to time as well as from person to person. We can also have instrumental variability as well as observers variability. • Methods of statistical inference provide largely objective means for drawing conclusions from the data about the issue under study. Medical science is full of uncertainties and statistics deals with uncertainties. Statistical methods try to quantify the uncertainties present in medical science. • It helps the researcher to arrive at a scientific judgment about a hypothesis. It has been argued that decision making is an integral part of a physician’s work. • Frequently, decision making is probability based.
  • 5.
    Role of Statisticsin Public Health and Community Medicine • Statistics finds an extensive use in Public Health and Community Medicine. Statistical methods are foundations for public health administrators to understand what is happening to the population under their care at community level as well as individual level. If reliable information regarding the disease is available, the public health administrator is in a position to: • Assess community needs • Understand socio-economic determinants of health • Plan experiment in health research • Analyze their results • Study diagnosis and prognosis of the disease for taking effective action • Scientifically test the efficacy of new medicines and methods of treatment.
  • 6.
    Basic Concepts • Homogeneity:All individuals have similar values or belong to same category. Example: all individuals are Chinese, women, middle age (30~40 years old), work in a computer factory ---- homogeneity in nationality, gender, age and occupation. • Variation: the differences in feature, voice… • Throw a coin: The mark face may be up or down ---- variation! • Treat the patients suffering from pneumonia with same antibiotics: A part of them recovered and others didn’t ---- variation! • If there is no variation, there is no need for statistics. • Many examples of variation in medical field: height, weight, pulse, blood pressure, … …
  • 7.
    2. Population andSample • Population: The whole collection of individuals that one intends to study. • Sample: A representative part of the population. • Randomization: An important way to make the sample representative.
  • 8.
    limited population andlimitless population • All the cases with hepatitis B collected in a hospital. (limited) • All the deaths found from the permanent residents in a city. (limited) • All the rats for testing the toxicity of a medicine. (limitless) • All the patients for testing the effect of a medicine. (limitless) hypertensive, diabetic, …
  • 9.
    Random By chance! • Randomevent: the event may occur or may not occur in one experiment. Before one experiment, nobody is sure whether the event occurs or not. Example: weather, traffic accident, … There must be some regulation in a large number of experiments.
  • 10.
    3. Probability • Measurethe possibility of occurrence of a random event. • A : random event • P(A) : Probability of the random event A P(A)=1, if an event always occurs. P(A)=0, if an event never occurs.
  • 11.
    Estimation of Probability----Frequency •Number of observations: n (large enough) Number of occurrences of random event A: m f(A)  m/n (Frequency or Relative frequency) Example: Throw a coin event: n=100, m (Times of the mark face occurred)=46 m/n=46%, this is the frequency; P(A)=1/2=50%, this is the Probability.
  • 12.
    4. Parameter andStatistic • Parameter : A measure of population or A measure of the distribution of population. Parameter is usually presented by Greek letter. such as μ,π,σ. -- Parameters are unknown usually --Most commonly used parameter are the measures of central tendency like mean , median and mode To know the parameter of a population, we need a sample • Statistic: A measure of sample or A measure of the distribution of sample. Statistic is usually presented by Latin letter such as s , p, t.
  • 13.
    5. Sampling Error error:The difference between observed value and true value. Three kinds of error: (1) Systematic error (fixed) (2) Measurement error (random) (Observational error) (3) Sampling error (random) • The statistics of different samples from same population: different each other! • The statistics: different from the parameter! The sampling error exists in any sampling research. It can not be avoided but may be estimated.
  • 14.
    Scope of Biostatistics •Biostatistics is used right from designing scientific experiments through the data analysis. The scope includes principles of scientific methodology, defining various types of data and studies, levels of measurements, descriptive statistics, inferential statistics and hypothesis testing, and correlation.
  • 15.
    Types of Data 1.Numerical Data ( Quantitative Data ) • The variable describe the characteristic of individuals quantitatively -- Numerical Data • The data of numerical variable -- Quantitative Data
  • 16.
    2. Categorical Data( Enumeration Data ) • The variable describe the category of individuals according to a characteristic of individuals -- Categorical Data • The number of individuals in each category -- Enumeration Data
  • 17.
    Special case ofcategorical data : Ordinal Data ( rank data ) • There exists order among all possible categories. ( level of measurement) -- Ordinal Data • The data of ordinal variable, which represent the order of individuals only -- Rank data
  • 18.
    Types of Variables •Categorical variable: variables than can be put into categories. For example, the category “Toothpaste Brands” might contain the variables Colgate and Aquafresh. • Confounding variable: extra variables that have a hidden effect on your experimental results. • Continuous variable: a variable with infinite number of values, like “time” or “weight”. • Control variable: a factor in an experiment which must be held constant. For example, in an experiment to determine whether light makes plants grow faster, you would have to control for soil quality and water. • Dependent variable: the outcome of an experiment. As you change the independent variable, you watch what happens to the dependent variable. • Discrete variable: a variable that can only take on a certain number of values. For example, “number of cars in a parking lot” is discrete because a car park can only hold so many cars. • Independent variable: a variable that is not affected by anything that you, the researcher, does. Usually plotted on the x-axis.
  • 19.
    • Lurking variable:a “hidden” variable the affects the relationship between the independent and dependent variables. • A measurement variable has a number associated with it. It’s an “amount” of something, or a”number” of something. • Nominal variable: another name for categorical variable. • Ordinal variable: similar to a categorical variable, but there is a clear order. For example, income levels of low, middle, and high could be considered ordinal. • Qualitative variable: a broad category for any variable that can’t be counted (i.e. has no numerical value). Nominal and ordinal variables fall under this umbrella term. • Quantitative variable: A broad category that includes any variable that can be counted, or has a numerical value associated with it. Examples of variables that fall into this category include discrete variables and ratio variables. • Random variables are associated with random processes and give numbers to outcomes of random events. • A ranked variable is an ordinal variable; a variable where every data point can be put in order (1st, 2nd, 3rd, etc.). • Ratio variables: similar to interval variables, but has a meaningful zero.
  • 20.
    Level of measurement •Any measurement we use, belongs to one of four main categories of measurement. • It is important to be able to distinguish which category your data belong to because it will affect the way in which you analyse your results. • The four categories are of ‘levels of measurement’ and each category gives us a different amount of information. 1. Nominal level 2. Ordinal level 3. Interval level 4. Ratio level
  • 21.
    Quantitative continuous Types of variables Quantitativevariables Qualitative variables Quantitative descrete Qualitative nominal Qualitative ordinal
  • 24.
    Level of measurement •(a)Nominal Level: Nominal scale is simply a system of assigning number symbols to events in order to label them. Nominal data are numerical in name only, because they do not share any of the properties of the numbers we deal in ordinary arithmetic. The usual example of this is the assignment of numbers of basketball players in order to identify them. Such numbers cannot be considered to be associated with an ordered scale for their order is of no consequence; the numbers are just convenient labels for the particular class of events and as such have no quantitative value. Nominal scales provide convenient ways of keeping track of people, objects and events. One cannot do much with the numbers involved. Accordingly, we are restricted to use mode as the measure of central tendency. There is no generally used measure of dispersion for nominal scales. Nominal scale is the least powerful level of measurement. It indicates no order or distance relationship and has no arithmetic origin. A nominal scale simply describes differences between things by assigning them to categories. In spite of all this, nominal scales are still very useful and are widely used in surveys and other ex-post-facto research when data are being classified by major sub- groups of the population. • Example: The rule of correspondence is: If the object in the domain appears to be male, assign to “0” and if female assign to “1”. Similarly, we can record a person’s marital status as 1, 2, 3 or 4, depending on whether the person is single, married, widowed or divorced. We can as well record “Yes or No” answers to a question as “0” and “1”. In this artificial or nominal way, categorical data (qualitative or descriptive) can be made into numerical data and if we thus code the various categories, we refer to the numbers we record as nominal data
  • 25.
    • (b) Ordinal scale: The lowest level of the ordered scale that is commonly used is the ordinal scale. The ordinal scale places events in order, but there is no attempt to make the intervals of the scale equal in terms of some rule. Rank orders represent ordinal scales and are frequently used in research relating to qualitative phenomena. Ordinal scales only permit the ranking of items from highest to lowest. Ordinal measures have no absolute values, and the real differences between adjacent ranks may not be equal. Thus, the use of an ordinal scale implies a statement of ‘greater than’ or ‘less than’ (an equality statement is also acceptable) without our being able to state how much greater or less. Since the numbers of this scale have only a rank meaning, the appropriate measure of central tendency is the median.
    • Example: the NPRS and VAS scales.
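As a small illustration of using the median with ordinal data, the sketch below summarises made-up pain ratings on a 0 to 10 numeric rating scale. The scores are hypothetical, and `median_low` is chosen here (an assumption of this sketch, not something the slide prescribes) so that the result is always one of the observed scale values.

```python
from statistics import median_low

# Made-up pain ratings on a 0-10 numeric rating scale, for illustration only.
pain_scores = [3, 7, 5, 2, 8, 5, 6]

# Only the ordering of the scores is used: sort them and take the middle one.
ranked = sorted(pain_scores)
median_score = median_low(ranked)

print(ranked, median_score)
```

The median says that half of the respondents rated their pain at or below this value; it does not claim that a score of 6 is "twice" a score of 3.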
  • 26.
    • (c) Interval scale: The interval scale of measurement is like the ordinal scale, except that it does assume equal intervals in its measurement. Examples of interval-scaled measures are percentage marks in an exam, temperature, etc. In the case of the interval scale, the intervals are adjusted in terms of some rule that has been established as a basis for making the units equal. The units are equal only in so far as one accepts the assumptions on which the rule is based. Interval scales can have an arbitrary zero, but it is not possible to determine for them what may be called an absolute zero or unique origin. The primary limitation of the interval scale is the lack of a true zero; it does not have the capacity to measure the complete absence of a trait or characteristic. Interval scales provide more powerful measurement than ordinal scales because the interval scale also incorporates the concept of equality of interval. As such, more powerful statistical measures can be used with interval scales. The mean is the appropriate measure of central tendency, while the standard deviation is the most widely used measure of dispersion.
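The consequence of an arbitrary zero can be checked numerically. The sketch below (temperatures chosen only for illustration) converts Celsius readings to Fahrenheit: ratios of readings change with the unit, so "twice as hot" has no meaning, while ratios of differences are preserved, which is what equal intervals guarantee.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit (both are interval scales)."""
    return c * 9 / 5 + 32

# Illustrative readings in degrees Celsius.
a_c, b_c, c_c, d_c = 20.0, 10.0, 40.0, 30.0
a_f, b_f, c_f, d_f = (celsius_to_fahrenheit(t) for t in (a_c, b_c, c_c, d_c))

# Ratio of two readings depends on where the zero was placed.
ratio_c = a_c / b_c
ratio_f = a_f / b_f

# Ratio of two differences does not depend on the unit or the zero.
diff_ratio_c = (c_c - d_c) / (a_c - b_c)
diff_ratio_f = (c_f - d_f) / (a_f - b_f)

print(ratio_c, ratio_f, diff_ratio_c, diff_ratio_f)
```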
  • 27.
    • (d) Ratio scale: Ratio scales have an absolute or true zero of measurement. A ratio scale represents the actual amounts of variables. Measures of physical dimensions such as weight, height, distance, etc. are examples. Generally, all statistical techniques are usable with ratio scales, and all manipulations that one can carry out with real numbers can also be carried out with ratio scale values. Multiplication and division can be used with this scale but not with the other scales mentioned above. Geometric and harmonic means can be used as measures of central tendency, and coefficients of variation may also be calculated.
    • Interval/ratio data are sometimes referred to as continuous or scale data.
    • Example:
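A minimal sketch of the ratio-scale summaries named above, using made-up body weights (the data and the helper `coefficient_of_variation` are hypothetical). It computes the arithmetic, geometric and harmonic means with Python's standard `statistics` module and shows that the coefficient of variation does not change when the unit changes, because a ratio scale keeps its true zero under rescaling.

```python
from statistics import geometric_mean, harmonic_mean, mean, stdev

# Made-up body weights in kilograms, for illustration only.
weights_kg = [52.0, 61.5, 70.0, 84.5, 95.0]

arith = mean(weights_kg)
geo = geometric_mean(weights_kg)
harm = harmonic_mean(weights_kg)

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the mean."""
    return stdev(values) / mean(values)

cv_kg = coefficient_of_variation(weights_kg)

# Converting kg to lb multiplies every value by a constant and keeps zero at zero,
# so the coefficient of variation is the same in both units.
KG_TO_LB = 2.20462
weights_lb = [w * KG_TO_LB for w in weights_kg]
cv_lb = coefficient_of_variation(weights_lb)

print(arith, geo, harm, cv_kg, cv_lb)
```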
  • 28.
    • Thus, proceeding from the nominal scale (the least precise type of scale) to the ratio scale (the most precise), increasingly more relevant information is obtained. If the nature of the variables permits, the researcher should use the scale that provides the most precise description. Researchers in the physical sciences have the advantage of describing variables in ratio scale form, but the behavioural sciences are generally limited to describing variables in interval scale form, a less precise type of measurement.
    • Nominal scales give us the least information and simply allow our data to be labelled and categorized, e.g. pass/fail, male/female, over 60/under 60, improvement/no improvement.
    • Ordinal scales give us a bit more information in that they allow us to put our data into a rank order according to the dimension we are interested in, e.g. NPRS/VAS scale, most competent to least competent, heaviest smoker to lightest smoker, full movement to immobility.
    • Interval/ratio scales give us more information, in that they deal with actual numerical scores, e.g. weight, height, time, percentage, pressure, capacity, etc., which allow direct mathematical comparisons to be made. The intervals are assumed to be equal. The interval and ratio levels are combined to form a single category for the purpose of data analysis.
  • 29.
    Parametric vs Nonparametric
    • When statistical analyses are applied, the statistics must take into account the nature of the underlying measurement scale, because there are fundamental differences in the types of information imparted by the different scales.
    • Nominal and ordinal scales must be analyzed using what are called nonparametric or distribution-free statistical methods.
    • On the other hand, interval and ratio scales are, if at all possible, to be analyzed using the typically more powerful parametric statistical methods.
    • However, parametric statistics typically require that the interval or ratio variables have distributions shaped like the bell (normal) curve, and that some other assumptions are also satisfied.
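The rule of thumb above can be written as a small lookup. This is a sketch only: the function name `method_family` and the `approx_normal` flag are hypothetical, and falling back to nonparametric methods when the normality assumption is not met is an assumption of this sketch rather than something the slide states explicitly.

```python
def method_family(scale, approx_normal=False):
    """Suggest a family of statistical methods for a measurement scale.

    scale: one of "nominal", "ordinal", "interval", "ratio".
    approx_normal: whether the variable is roughly bell-shaped
                   (only relevant for interval and ratio scales).
    """
    scale = scale.lower()
    if scale in ("nominal", "ordinal"):
        return "nonparametric"
    if scale in ("interval", "ratio"):
        # Assumption of this sketch: without approximate normality,
        # fall back to distribution-free methods.
        return "parametric" if approx_normal else "nonparametric"
    raise ValueError(f"unknown measurement scale: {scale!r}")

print(method_family("ordinal"))
print(method_family("ratio", approx_normal=True))
```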
  • 30.
    Independent vs Dependent variables
    • The variables that are the causal factors, or that you may manipulate, are called the independent variables. The outcomes of the treatments, or the responses to changes in the independent variables, are called the dependent variables, because their values presumably depend on what happens to the independent variables. For example, the treatments you administer in an experiment constitute levels of the independent variable(s).
    • Examples:
    • i) In smoking research you might look at the number of cigarettes smoked as an independent variable and the incidence of lung cancer as a dependent variable.
    • ii) In research on atherosclerosis, you might look at dietary saturated fat or the amount of vitamin E supplementation as independent variables and the degree of atherosclerosis as a dependent variable.
    • iii) In research on comparative cancer treatments, the cancer treatments form the independent variable(s), while various measures of progression of the disease make up the dependent variables.
    • iv) If you wanted to look at how aspirin dosages affect the frequency of second heart attacks, the aspirin dosage would be the independent variable, while the heart attack frequency would be the dependent variable.
  • 31.
    Question??
    • Suppose that you wanted to look at the incidence of low back pain among welders at the local car factory. You could measure:
      (i) How many had experienced low back pain and how many had not experienced low back pain over the last 2 years.
      (ii) Frequency of back pain, using a 5-point scale, by asking the question: How often have you experienced back pain over the last two years? Never / Rarely / Sometimes / Quite often / Very often
      (iii) Frequency of back pain using the absolute number of incidents, by asking: How many times have you experienced back pain over the last two years?
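The three ways of measuring back pain above correspond to nominal (yes/no), ordinal (5-point scale) and ratio (number of episodes) data. The sketch below uses made-up episode counts and hypothetical cut-points (a respondent would choose the category directly, so the thresholds in `to_ordinal` are purely illustrative) to show how the ratio-level answer can be collapsed into the other two, losing information at each step.

```python
# Made-up number of back pain episodes over two years, one value per welder.
episodes = [0, 3, 12, 1, 0, 25, 6]

def to_ordinal(n):
    """Collapse a count into the 5-point scale (cut-points are hypothetical)."""
    if n == 0:
        return "never"
    if n <= 2:
        return "rarely"
    if n <= 6:
        return "sometimes"
    if n <= 12:
        return "quite often"
    return "very often"

def to_nominal(n):
    """Collapse a count into experienced / not experienced."""
    return "yes" if n > 0 else "no"

ordinal = [to_ordinal(n) for n in episodes]
nominal = [to_nominal(n) for n in episodes]

print(ordinal)
print(nominal)
```

The conversion only works downwards: the counts cannot be recovered from the categories, which is why the more precise scale is preferred when the variable permits it.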
  • 32.
    Which type of data do they belong to?
    • RBC (4.58 × 10^6/mcL)
    • Diastolic/systolic blood pressure (8/12 kPa) or (80/100 mmHg)
    • Percentage of individuals with blood type A (20%) (A, B, AB, O)
    • Protein in urine (++) (-, ±, +, ++, +++)
    • Incidence rate of breast cancer (35/100,000)
  • 33.
    • How many different levels of measurement for variables exist?
      Options: No Response; 1; 2; 3; 4
    • Can the distances between the categories of a nominal variable be measured?
      Options: No Response; Yes; No
  • 34.
    • Nominal variables name only. Ordinal variables:
      Options: No Response; Name only; Order only; Both name and order
    • Interval variables:
      Options: No Response; Name, order & have equal intervals; Name and order only; Order only; Name only
  • 35.
    • Ratio variables have:
      Options: No Response; A real 0; Equal intervals; Order; Name; All except "No Response" above
    • Interval or ratio variables should not be regrouped into nominal or ordinal measures.
      Options: No Response; True; False
    • Nominal and ordinal variables require:
      Options: No Response; Parametric methods; Nonparametric methods
  • 36.
    • Variables you manipulate are:
      Options: No Response; Independent variables; Dependent variables
  • 37.
    Sources of Error in Measurement
    • The researcher must be aware of the sources of error in measurement. The following are the possible sources of error in measurement.
      (a) Respondent: At times the respondent may be reluctant to express strong negative feelings, or he may have very little knowledge but not admit his ignorance. All this reluctance is likely to result in an interview of ‘guesses’. Transient factors like fatigue, boredom, anxiety, etc. may limit the ability of the respondent to respond accurately and fully.
      (b) Situation: Situational factors may also come in the way of correct measurement. Any condition which places a strain on the interview can have serious effects on the interviewer-respondent rapport. For instance, if someone else is present, he can distort responses by joining in or merely by being present. If the respondent feels that anonymity is not assured, he may be reluctant to express certain feelings.
      (c) Measurer: The interviewer can distort responses by rewording or reordering questions. His behaviour, style and looks may encourage or discourage certain replies from respondents. Careless mechanical processing may distort the findings. Errors may also creep in because of incorrect coding, faulty tabulation and/or faulty statistical calculations, particularly in the data-analysis stage.
      (d) Instrument: Error may arise because of a defective measuring instrument. The use of complex words beyond the comprehension of the respondent, ambiguous meanings, poor printing, inadequate space for replies, response choice omissions, etc. are a few things that make the measuring instrument defective and may result in measurement errors. Another type of instrument deficiency is poor sampling of the universe of items of concern.