This document provides an overview of biostatistics concepts. It defines biostatistics as the application of statistics to biological and medical topics. Biostatisticians play roles in designing studies, analyzing data, and interpreting results. They apply statistical methods to address questions in public health, medicine, and environmental biology. The document outlines different types of variables, such as categorical, ordinal, interval and ratio variables. It also distinguishes between populations and samples, and between random and non-random sampling. Finally, it discusses different levels of measurement and categories of data in biostatistics.
1. BIOSTATISTICS
SESSION 1, LEARNING NOTES
BY STUDENT MT-EINSTEIN, YEAR 2016-2017
COLLEGE OF MEDICINE AND HEALTH SCIENCES
SCHOOL OF MEDICINE AND PHARMACY
DEPARTMENT OF PHARMACY, YEAR 2, 2016-2017
2. WHAT IS BIOSTATISTICS
Etymologically,
biostatistics refers to the application of statistics to a wide
range of topics in biology, including medical sciences.
Specifically, biostatistics is the science which deals with the
development and application of the most appropriate methods for
the:
Collection of data;
Presentation/organization of the collected data in quantitative
form;
Analysis of the data;
Interpretation of the results and decision-making on the basis of such analysis.
3. WHY BIOSTATISTICS
Role of biostatisticians
GENERAL
To guide the design of an experiment or survey prior to data collection
To analyze data using proper statistical procedures and techniques
To present and interpret the results to researchers and other decision
makers
SPECIFIC
Identify and develop treatments for disease and estimate their
effects.
Identify risk factors for diseases.
Develop statistical methodologies to address questions arising from
medical/public health data
Design, monitor, analyze, interpret, and report results of clinical
studies.
4. AREAS OF BIOSTATISTICS APPLICATION
Areas of application of biostatistics
Biostatistics concepts are applied to biological problems, including in:
Public health
Medicine
Ecology and environmental science
Biostatisticians must have knowledge of the above areas.
5. WHAT IS DATA
Data – definition and source
- “Data” are the starting point of all
biostatistical operations,
from the design of an experiment up
to the interpretation of the results of clinical studies
through accurate data analysis.
We therefore first need to know what data
are, and how and where they are obtained.
6. DATA VS VARIABLE
Data: measurements or observations of a
variable made from a sample of a given
population.
Variable: the characteristic or
phenomenon that can be measured or
classified is called a variable.
7. TYPE EXAMPLE
Example: class survey
Students in an introductory statistics course were asked the following
questions as part of a class survey:
1 What is your gender, male or female?
2 Are you introverted or extraverted?
3 On average, how many hours of sleep do you get per night?
4 What is your bedtime: 8pm-10pm, 10pm-12am, 12am-2am, later
than 2am?
5 How many countries have you visited?
6 On a scale of 1 (very little) - 5 (a lot), how much do you dread
this semester?
9. TYPE OF VARIABLE
gender, personality: categorical, nominal (dichotomous here, since each has two categories)
there is no inherent order between male and female, therefore gender is
not ordinal
sleep: numerical, continuous
even though data is reported as whole numbers, sleep is measured
on a continuous scale, people just tend to round their responses in
surveys
bedtime: categorical, ordinal
there is an inherent ordering in these time intervals
10. TYPE OF VARIABLE
countries: numerical, discrete
data are counted, and can only take on whole numbers
dread: categorical, ordinal, could also be used as numerical
categories have an inherent ordering
Demographic data such as sex or race are recorded as nominal variables.
Categorical variables can be nominal or ordinal.
A nominal variable is assigned (not measured) and could be a demographic
characteristic such as sex or race.
An ordinal variable is a ranking, such as mild, moderate, or severe.
11. POPULATION VS SAMPLE
Population
We would like to learn something about
the characteristics of the
population (parameters).
A summary value of a population is a parameter.
Sample
We collect data from a
sample of the population because
studying the whole population is too time
consuming.
A summary value of a sample is a statistic.
12. POPULATION VS SAMPLE
Statistic: Summary data from a sample.
Examples:
The observed proportion of the sample that responds to treatment;
The observed association between a risk factor and a disease in this
sample.
Parameter: Summary data from a population.
Examples:
The proportion of the population that would respond to a certain
drug
The association between a risk factor and a disease in a population
13. POPULATION VS SAMPLE
Population
A group of individuals
that we would like to
know something about
Sample
A subset of a population
(hopefully
representative)
14. RANDOM VS NON RANDOM SAMPLING
Random samples
Subjects are selected from a population so that
each individual has an equal chance of being selected.
Random samples tend to be representative of the source population.
Non-random samples may not be representative.
They may be biased regarding age, severity of the condition,
socioeconomic status, etc.
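A minimal sketch of simple random sampling as described above. The population of 100 patient IDs, the sample size of 10, and the seed are all hypothetical values chosen only for illustration.

```python
import random

# Hypothetical source population: patient IDs 1..100 (illustrative only).
population = list(range(1, 101))

# A seeded generator makes the sketch reproducible; the seed is arbitrary.
rng = random.Random(2017)

# Simple random sample of 10 subjects, drawn without replacement:
# every individual has the same chance of being selected.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Sampling without replacement guarantees that no subject is selected twice.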
15. RANDOM VS NON RANDOM
Random samples are rarely utilized in health care
research.
Instead, patients are randomly assigned to treatment and
control groups.
Each person has an equal chance of being assigned to either
of the groups.
Random assignment is also known as randomization.
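Random assignment can be sketched as follows. The function name `randomize`, the patient labels, and the seed are assumptions made for this example; real trials typically use more elaborate schemes (for instance block or stratified randomization).

```python
import random


def randomize(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)      # seeded only so the example is reproducible
    shuffled = list(subjects)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical patient labels P01..P20.
patients = [f"P{i:02d}" for i in range(1, 21)]
treatment, control = randomize(patients, seed=42)

print("Treatment:", treatment)
print("Control:  ", control)
```

Because the list is shuffled before it is cut in half, each person has the same chance of ending up in either group.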
16. LEVELS OF MEASUREMENT IN BIOSTATISTICS
Variables in a study are measured on a certain scale of measurement.
Scales or levels of measurement refer to how the properties of numbers
can change with different uses.
There are 4 levels or scales of measurement, which define
different kinds of variables, hence different kinds of data:
Nominal
Ordinal
Interval
Ratio
17. NOMINAL DATA/VARIABLE
Nominal = categorical, unordered
Data that are classified into
categories and
cannot be arranged in any particular order.
A nominal variable with only two categories is dichotomous.
E.g. gender (male and female); country of birth (Rwanda, USA,
India...); personality type; yes or no; demographic characteristics.
18. ORDINAL VARIABLE
Ordinal = ranked
Data that are ranked or ordered: 1st, 2nd, 3rd, etc.
Used to rank and order the levels of the data or variable being
studied. The distance between the numbers in the
rating scale has no particular value.
E.g. Adverse events: grading of an ocular problem in a patient
Mild, moderate, severe, life-threatening, death
Income: the levels of income are different and ordered
Low, medium, high
19. INTERVAL DATA/VARIABLE
Interval
The difference between the numbers on the scale is meaningful and intervals
are equal in size. NO absolute zero.
Temperatures on a thermometer: the difference between 60 and 70
is the same as the difference between 90 and 100.
(The length of a person or an object has a true zero, so it is a ratio variable rather than an interval one.)
Intervals allow for comparisons between things being measured
20. RATIO VARIABLE/ DATA
Ratio: has an absolute zero point
Scales that do have an absolute zero point
that indicates the absence of the variable
being studied.
E.g. body weight, height, family size, age, ...
21. MEASURE OF CENTRAL TENDENCY
Scale of measurement: best measure of central tendency
Nominal: Mode
Ordinal: Median
Interval: Mean if symmetrical, Median if skewed
Ratio: Mean if symmetrical, Median if skewed
23. COMMENT
Quantitative variables are measured values.
A discrete quantitative variable has a finite number of possible
measurements.
A continuous quantitative variable has an infinite number of possible
measurements within a range, as would be typical for a serum
chemistry test such as glucose.
24. CLINICAL CASE
A 33-year-old woman comes to you complaining of lower
abdominal pain which she has had for the past day. She
left her job as a nurse's aide (her second day on the job)
because the pain was so bad. She says the pain began
after she had fallen off a stepstool while getting a
bedpan off a top shelf. No one saw her fall, but she
convinced her supervisor that she had an industrial
accident and needed medical attention because of blood
in her urine. To prove it, she brings in a urine specimen
25. QUESTIONS 1
How do you correlate the macroscopic and microscopic findings?
The macroscopic appearance is red, but the test for blood is
negative and there are no RBC's microscopically. It is unlikely
to be rhabdomyolysis. This specimen could be factitious.
It would be a simple matter to have the patient produce
another sample (though she might still be carrying the same
bottle of red food coloring with her). Remember that various
drugs can also produce colored urine. Eating fresh beets can
color the urine red temporarily
26. QUESTIONS 2
What do you think is happening?
Although care and concern should be the
immediate response of health care workers to
a patient, and historical findings should be duly
noted, remember that patients may not always
be telling you everything, or telling you
correctly, particularly when compensation is
being sought.
27. QUESTIONS 3
What kind of variables are pH and protein?
These measurements represent a quantitative (measured) variable
that is discrete, with a finite number of possible measurements in
the range of 5 to 8 for pH and from 0 to 4+ for protein.
The other form of quantitative variable is continuous with an infinite
number of possible measurements within a range, as would be
typical for a serum chemistry test such as urea nitrogen or
creatinine.
Categorical variables could be nominal or ordinal. A nominal variable
is assigned (not measured) and could be a demographic
characteristic such as sex or race. An ordinal variable is a ranking,
such as mild, moderate, or severe.
28. CONTINUOUS VS DISCRETE DATA
Continuous
Definition: A set of data is said
to be continuous if the values
belonging to the set can take on
ANY value within a finite or
infinite interval.
Examples: • A person's height
could be any value (within the
range of human heights), not just
certain fixed heights
Discrete
Definition: A set of data is said
to be discrete if the values
belonging to the set are distinct
and separate (unconnected
values).
Examples: • The number of
students in a class. • The
number of questions on a
pharmacology test.
29. CONTINUOUS DATA VS DISCRETE
Continuous
More examples: • A person's body weight, age, ...
• The outdoor temperature (T°)
at noon (any value within the
possible temperature range)
NOTE: Continuous data (CD) are
measured
Function: In the graph of CD,
the points are connected with a
continuous line, since every point
has meaning to the original
problem
Discrete
NOTE: Discrete data (DD) are
counted
Function: In the graph of DD,
only separate, distinct points are
plotted, and only these points
have meaning to the original
problem
30. MORE TO HEAD IN
Continuous numeric data are of interest in investigations such as:
Average age of patients compared to average age of non-patients
Respiratory rate of those exposed to a chemical vs. respiratory rate
of those who were not exposed
If there are many different discrete values, then discrete data is
often treated as continuous.
Examples: CD4 count, HIV viral load
If there are very few discrete values, then discrete data is often
treated as ordinal.
31. TYPE OF VARIABLE
Two types
Variables can be classified as
independent or
dependent
1. Independent variable (IV)
is the variable that is manipulated or selected by the investigator in an experiment and that
is not changed by the other variables being observed in the
experiment (= “independent”).
The IV is believed to influence the outcome measure (dependent variable) and is
the “presumed cause.”
e.g. time, age, ...
32. TYPE OF VARIABLE
2. Dependent variable (DV)
is the variable that is dependent on the independent variable(s),
i.e. a DV is the variable that is believed to change in the presence of the
independent variable.
It is the “presumed effect.”
The measured variable in an experiment (e.g. plasma concentration) is
referred to as DV.
DV vs IV: plasma concentration and time: Let’s take example of a patient who
has taken a drug in the morning. The plasma concentration of this drug is a DV
since it changes over time during the day after drug intake.
33. TYPE OF VARIABLE
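An intervening variable
is the variable that links the independent and dependent variables.
A confounding variable is a variable that is associated with both the independent
and the dependent variable and can therefore distort the apparent relationship between them.
Some variables also have many other variables, or dimensions, built into them, so we cannot be sure what they contain or measure.
For example: Socio Economic Status (SES). How can we measure
SES? Income, employment status, etc.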
34. EXAMPLE OF CONFOUNDING VARIABLE
We need to be careful about confounding variables.
Example
A researcher wants to study the effect of Vitamin C on cancer.
Vitamin C would be the independent variable because it is hypothesized
that it will have an effect on cancer, and cancer would be the dependent
variable because it is the variable that may be influenced by Vitamin C.
37. GRAPHICAL PRESENTATION
1. Graphs drawn using Cartesian coordinates
In graphs, the data can be concisely summarized into:
• Bar graph (or bar chart), Histogram, Box plot, Line graph, Frequency polygon,
Frequency curve, Scatter plot
Use bar graphs when presenting nominal data (no order to the horizontal axis)
Use histograms when presenting continuous or ordinal data (these should be on the
horizontal axis)
Use box plots when presenting continuous data
2. Pie chart
3. Statistical maps
39. WHY IT IS ALWAYS BETTER TO SUMMARIZE YOUR DATA
It is ALWAYS a good idea to summarize your data (at least for important
variables)
You become familiar with the data and the characteristics of the sample
that you are studying
You can also identify problems with data collection or errors in the data
(data management issues)
Dataset structure: presenting data first requires building a dataset
Think of data as a rectangular matrix of rows and columns.
This is the simplest structure.
Rows represent the “experimental unit”. NB: Each row is an independent
observation.
Columns represent “variables” measured on the experimental unit
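The rows-and-columns structure can be sketched with plain Python objects. The records below are invented responses to the class survey, and the helper `column` is a hypothetical name used only for this illustration.

```python
# Each row (dict) is one experimental unit, here one surveyed student.
# Each key is a variable measured on that unit. All values are made up.
rows = [
    {"gender": "female", "sleep_hours": 7.5, "countries": 3, "dread": 2},
    {"gender": "male",   "sleep_hours": 6.0, "countries": 0, "dread": 4},
    {"gender": "female", "sleep_hours": 8.0, "countries": 5, "dread": 1},
]


def column(records, name):
    """Return all observations of one variable (one column of the matrix)."""
    return [record[name] for record in records]


print(column(rows, "countries"))
print(column(rows, "sleep_hours"))
```

Keeping one observation per row makes it straightforward to summarize any single variable by pulling out its column.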
41. MATHEMATICAL PRESENTATION
Data presentation is usually performed through Descriptive statistics.
Some measures that are commonly used to describe a data set are the
following
Measures of Central Tendency
-mean
-median
-mode
Measures of Variability (Dispersion)
-range
-variance
-standard deviation
42. MEASURE OF CENTRAL TENDENCY
Mode: the most frequently occurring score.
Median: the value that divides the ordered scores into 2 halves. With an odd number of scores it is the middle score; with an even number it is the average of the two middle scores.
Mean: the sum of all the scores divided by the total number of scores (= average).
When the distribution of the data is normal, the mean lies in the middle of the distribution of the scores and equals the median.
The mean is a good measure of central tendency.
It is preferred whenever possible and is the measure of central tendency most often
used in advanced statistical calculations:
o More reliable and accurate
o Better suited to arithmetic calculations
43. C.T
The mean can be misleading because it can be greatly influenced by extreme
scores called outliers.
For example, the average length of stay at a hospital could be greatly
influenced by one patient who stays for 5 years.
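The effect of a single outlier on the mean can be sketched with Python's standard `statistics` module. The lengths of stay below are hypothetical, with the 5-year stay approximated as 5 × 365 days.

```python
from statistics import mean, median

# Hypothetical lengths of stay, in days, for seven patients.
stays_days = [2, 3, 3, 4, 5, 6, 7]

# The same ward with one additional patient who stays about 5 years.
with_outlier = stays_days + [5 * 365]

print("Without outlier: mean =", round(mean(stays_days), 2),
      "median =", median(stays_days))
print("With outlier:    mean =", round(mean(with_outlier), 2),
      "median =", median(with_outlier))
```

The mean jumps by two orders of magnitude while the median barely moves, which is why the median is usually reported for skewed data.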
44. C.T
Sometimes the median may yield more information when your
distribution contains outliers, or is skewed (not normally distributed).
What is a median?
45. MEASURE OF THE VARIABILITY
Range = MAX - MIN
Used only for Ordinal, Interval, and Ratio scales as the data must be ordered
Example: 2 3 4 6 8 11 24 (Range is 22)
Variance (S²)
- The variance is the extent to which individual scores in a distribution of scores
differ from one another. The larger the variance, the further spread out the data. It is
a measure of how spread out a distribution is.
- The variance is the average squared deviation of the observations from their
mean (how the observations ‘vary’ from the mean).
Standard Deviation (SD)
SD = the square root of the variance
SD is a measure of the variability of a set of data in a distribution (most widely
used measure of dispersion)
SD reflects how the data/observations/scores vary from the mean
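The three measures can be computed for the example data with the standard `statistics` module. One assumption to note: the slide describes the variance as an average squared deviation (dividing by n, `pvariance`), whereas sample data are usually summarized with the n − 1 version (`variance`); both are shown.

```python
from statistics import pvariance, variance, stdev

data = [2, 3, 4, 6, 8, 11, 24]

# Range = MAX - MIN
data_range = max(data) - min(data)

# Variance dividing by n (the "average squared deviation").
population_variance = pvariance(data)

# Variance dividing by n - 1, the usual choice for a sample.
sample_variance = variance(data)

# Standard deviation = square root of the (sample) variance.
sample_sd = stdev(data)

print("Range:", data_range)
print("Variance (n):", round(population_variance, 2))
print("Variance (n - 1):", round(sample_variance, 2))
print("SD:", round(sample_sd, 2))
```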
49. QUARTILES
Quartiles are the three
values that split the sorted
data into four equal parts.
-Lower quartile (Q1) =
median of lower half of
the data
-Second quartile (Q2) =
median.
-Upper quartile (Q3) =
median of upper half of
the data
-Need to order the
individuals first (from 1 to
“N” individuals)
-One quarter of the
individuals are in each part
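A minimal sketch of the "median of each half" method, with Q1 as the lower quartile and Q3 as the upper quartile. The function name `quartiles` is hypothetical, and excluding the middle value from both halves when the number of observations is odd is an assumption: other conventions exist and can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)              # the data must be ordered first
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # lower half of the data
    # With an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles([2, 3, 4, 6, 8, 11, 24])
print("Q1 =", q1, " Q2 =", q2, " Q3 =", q3)
```

For these seven values the lower half is 2, 3, 4 and the upper half is 8, 11, 24.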
50. STANDARD ERROR OF MEAN
A measure of the variability among the means of samples selected from
a certain population.
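The slide gives no formula; the usual estimate from a single sample is the sample standard deviation divided by the square root of the sample size. The sketch below assumes that standard formula, and the function name and data are illustrative only.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample):
    """Estimate the SEM as s / sqrt(n), with s the sample standard deviation."""
    return stdev(sample) / sqrt(len(sample))


sample = [2, 3, 4, 6, 8, 11, 24]
sem = standard_error_of_mean(sample)

print("SD :", round(stdev(sample), 2))
print("SEM:", round(sem, 2))
```

The SEM shrinks as the sample size grows, because means of larger samples vary less from one sample to the next.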
51. PROBABILITY DISTRIBUTION
A probability distribution
is a device for indicating the values that a random variable may have.
There are two categories of random variables:
a. discrete random variables, and
b. continuous random variables.
1. The probability distribution of a discrete random variable
specifies all possible values of a discrete random variable along with their respective
probabilities. It can be presented as:
Frequency distribution
Probability distribution (relative frequency distribution)
Cumulative frequency
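The three presentations listed above can be sketched for a small discrete variable. The data are hypothetical answers to "How many countries have you visited?" from ten students.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical discrete data: number of countries visited by 10 students.
countries = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]

freq = Counter(countries)                 # frequency distribution
values = sorted(freq)                     # distinct values, in order
n = len(countries)

# Relative frequencies: an empirical probability distribution.
relative = [freq[v] / n for v in values]

# Cumulative frequencies: running total of the counts.
cumulative = list(accumulate(freq[v] for v in values))

for v, r, c in zip(values, relative, cumulative):
    print(f"value={v}  frequency={freq[v]}  relative={r:.1f}  cumulative={c}")
```

The relative frequencies sum to 1 and the last cumulative frequency equals the number of observations.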
52. PROBABILITY DISTRIBUTION
A continuous random variable can assume any value
within a specified interval of values assumed by the
variable.
In a general case, with a large number of class
intervals, the frequency polygon begins to resemble a
smooth curve.
53. NORMAL DISTRIBUTION=GAUSSIAN DISTRIBUTION
The shape of data
Histograms of frequency distributions
demonstrate the shape of the
data well.
Distributions are often symmetrical
with most scores falling in the middle
and fewer toward the extremes.
Many biological data are symmetrically
distributed and form a normal curve
(also called “bell-shaped curve”). Such
data are said to be normally
distributed.
54. PROPERTIES OF A NORMAL DISTRIBUTION
The area under a normal
curve represents the normal
distribution.
Properties of a normal
distribution are:
It is symmetric about its
mean
The highest point is at its
mean
The mean, median and
mode are all equal.
The total area under the
curve above the x-axis is 1
square unit. Therefore 50%
is to the right of the median
and 50% is to the left of the median.
55. PROPERTIES OF A NORMAL DISTRIBUTION
Perpendiculars drawn at a distance s (one standard deviation) on either side of the mean:
±1 s contain
about 68%;
±2 s contain
about 95%;
±3 s contain
about 99.7% of the
area under the
curve
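These percentages can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `area_within` is an assumption of this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} s: {area_within(k):.2%}")
```

Because every normal distribution can be rescaled to the standard one, the same proportions hold for any mean and standard deviation.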
56. WHY SPREAD IS IMPORTANT
Spread is important
when comparing 2 or
more group means.
For instance, it is
more difficult to see
a clear distinction
between groups in
the upper example
because the spread is
wider, even though
the group means are the
same in both examples.
57. STANDARD NORMAL DISTRIBUTION
A normal distribution is
determined by its mean (m) and standard deviation (s). This creates a
family of distributions
depending on whatever the
values of m and s are.
- The standard normal
distribution has
mean = 0 and standard deviation = 1.
Standard Z-scores: the
standard z score is obtained
by creating a variable z whose
value is:
z = (x - m) / s
58. STANDARD NORMAL DISTRIBUTION
Given the values of m and s we can convert a value of x to a value of z.
A Z-score
is the number of standard deviations above or below the mean.
A Z-score of 1.5 means
that the score is 1.5 standard deviations above the mean;
a Z score of -1.5 means
that the score is 1.5 standard deviations below the mean.
It always has the same meaning in all distributions.
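The conversion from x to z can be sketched as a small function. It uses the standard definition z = (x − m) / s, and the example values (m = 100, s = 10) are hypothetical.

```python
def z_score(x, m, s):
    """Number of standard deviations (s) that x lies above or below the mean (m)."""
    if s <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - m) / s


# Hypothetical distribution with mean 100 and standard deviation 10.
print(z_score(115, m=100, s=10))   # 1.5 standard deviations above the mean
print(z_score(85, m=100, s=10))    # 1.5 standard deviations below the mean
```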
59. DISTORTION OF NORMAL CURVE
Data may not be normally
distributed:
- There may be data that
are outliers that distort
the mean. This is called a
skewed distribution
(SKEWNESS).
- Data may be bunched
about the mean in a non-normal
fashion. This is
called a kurtotic
distribution (KURTOSIS).
[Figure: normal distribution graph and box plot]
60. +,-SKEWNESS
Skewness: the data are not distributed symmetrically
around the mean. Consequently:
The mean, median, and mode are not
equal and are in different positions;
Scores (data) are clustered at one end of
the distribution (right or left)
A small number of outliers are located in
the limits of the opposite end
A variable that is positively skewed has
large outliers to the right of the mean, that
is, greater than the mean. In that case, a
positively skewed distribution ‘points’
towards the right.
A negatively skewed variable has large
outliers to the left of the mean; a
negatively skewed distribution ‘points’
towards the left.
61. +(LEPTO) ,- (PLATY) KURTOSIS
It examines the vertical
movement (peakedness) of a distribution relative to
a perfect normal ‘bell shape’.
A variable that is positively
kurtic (has a positive kurtosis) is
lepto-kurtic and is too ‘pointed’,
with a low standard deviation
value. In this case, the data are
bunched together and give a
tall, thin distribution which is
not normal.
A variable that is negatively
kurtic is platy-kurtic and is too
‘flat’. In this case, the data are
spread out and give a low, flat
distribution which is not normal.
62. HOW TO EXAMINE THE NORMAL DISTRIBUTION OF THE DATA
There are both
graphical and
statistical methods for evaluating normality.
Graphical methods mainly include the histogram, box-whisker plot,
dot plot, the normality plots (= Q-Q and P-P plots), etc. Normality
plots are widely used.
Statistical methods include:
o diagnostic tests for skewness and kurtosis (values within the
interval of -0.5 to +0.5 are taken as normal)
o statistical tests of normality
63. WHAT SHOULD BE DONE FOR THE ABNORMAL DISTRIBUTION OF THE DATA
When testing shows that the data are not normal, a transformation is
required in order to analyze them with parametric methods.
If this is not done, we conduct a non-parametric analysis of the data.
Three common transformations are:
the logarithmic transformation (the commonest),
the square root transformation, and
the inverse transformation. They correct for skewness and
unequal variances.
Notice
Transformation should be justified: it is recommended when
including a non-normally distributed variable in the analysis
would reduce the effectiveness at identifying statistical
relationships, i.e. when this leads to a loss of power due to the lack
of normal distribution of the variable to be analyzed.
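A minimal sketch of how a logarithmic transformation reduces positive skew. The data are invented, and the moment-based skewness coefficient used here (third central moment divided by the 1.5 power of the second) is one common definition, chosen as an assumption for this example.

```python
from math import log
from statistics import mean


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric distribution)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed measurements (all values must be > 0 for log).
raw = [1, 2, 2, 3, 4, 5, 8, 13, 40, 120]
logged = [log(v) for v in raw]

print("Skewness before transformation:", round(skewness(raw), 2))
print("Skewness after log transformation:", round(skewness(logged), 2))
```

The logarithm compresses large values more than small ones, pulling the long right tail in towards the rest of the data.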
64. TYPE OF THE STATISTICS
There are two types of statistics:
Descriptive Statistics
Inferential Statistics
1.Descriptive statistics
are used to summarize, organize, and make sense of a set of data (scores or
observations).
are typically presented graphically, in tabular form (in tables), or as summary
statistics (single values).
-e.g.: mean, median, mode, frequencies, range, variance, standard deviation,
quartiles, standard error of the mean
also help when it comes to describing the relationship between variables.
NB: descriptive statistics have been largely discussed in the previous slides.
65. INFERENTIAL STATISTICS
Inferential Statistics are used to draw inferences about a population
from a sample.
Specifically it allows researchers to infer (make inferences) or
generalize observations made with samples to the larger population from
which they were selected.
Population and samples (reminder!):
Population: Group that the researcher wishes to study.
Sample: A group of individuals selected from the population.
Census: Gathering data from all units of a population, no sampling.
Inferential statistics generally require that data come from a random
sample (i.e. Probability sampling/equal chance of being chosen).
66. STATISTICAL SIGNIFICANCE
Significance level
Statistical analyses:
Allow to quantify the degree of relationship between variables
Allow generalization about populations using data from samples (inferential)
Specifically, the goal of statistical analysis is to answer the questions whether there is a
significant effect/association/difference between the variables of interest, and how big it is
(if there is any).
The significance level (alpha) is a pre-determined value used to decide whether to reject or retain the
null hypothesis.
A value of 0.05 is commonly used; the “p-value” computed from the data is compared with it.
Statistically significant findings mean that the probability of obtaining such findings by
chance only is less than 5% (i.e. findings would occur no more than 5 out of 100 times by
chance alone).
Therefore, findings would be deemed
statistically significant if the p-value is 0.05 or less (p ≤ 0.05)
not statistically significant (insignificant) if the p-value is greater than 0.05 (p > 0.05)
67. MEASURE OF ASSOCIATION
What if there is an effect?
You need to measure how big the effect is by using a measure of
association like odds ratio, relative risk, absolute risk, attributable risk
etc..
Absolute Risk is the chance that a person will develop a certain disease
over a period of time; it is comparable to the hazard in toxicology.
E.g.: out of 20,000 people, 1600 developed lung cancer over 10 years,
therefore the absolute risk of developing lung cancer is 8%.
Relative Risk (RR) is a measure of association between the presence or
absence of an exposure and the occurrence of an event.
o RR is when we compare one group of people to another to see if there
is an increased risk from being exposed.
68. MEASURE OF ASSOCIATION
o Used in randomized control trials and cohort studies.
o Can't use RR unless looking forward in time.
o RR is the measure of risk for those exposed compared to those
unexposed.
E.g.:
The 20 year risk of lung cancer for smokers is 15%
The 20 year risk of lung cancer among non-smokers is 1%
Therefore RR = 15% / 1% = 15, i.e. smokers have 15 times the risk of non-smokers.
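The absolute risk and relative risk examples from these slides can be reproduced with a few lines of arithmetic. This is only a sketch; the function names `absolute_risk` and `relative_risk` are chosen here for illustration.

```python
def absolute_risk(cases, population):
    """Proportion of the population that developed the disease over the period."""
    return cases / population

def relative_risk(risk_exposed, risk_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk_exposed / risk_unexposed

# Slide example: 1600 of 20,000 people developed lung cancer over 10 years.
ar = absolute_risk(1600, 20000)
print(f"Absolute risk = {ar:.0%}")

# Slide example: 20 year risk is 15% for smokers and 1% for non-smokers.
rr = relative_risk(0.15, 0.01)
print(f"Relative risk = {rr:.0f}")
```

An RR of 1 would mean the same risk in both groups; values above 1 indicate higher risk among the exposed.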
69. MEASURE OF ASSOCIATION
Odds Ratio (OR) is a way of comparing whether the odds of a certain event
are the same for two groups.
Used for cross-sectional studies, case-control studies, and retrospective studies
(studies that refer to past events).
o In case-control studies you can't estimate the rate of disease among study
subjects because subjects are selected according to disease/no disease. So, you can't
take the rate of disease in both populations (in order to calculate RR).
o OR is the comparison between the odds of exposure among cases to the odds of
exposure among controls.
o Odds are same as betting odds. Example: if you have a 1 in 3 chance of winning a
draw, your odds are 1:2.
o To calculate OR, take the odds of exposure (cases)/odds of exposure (controls).
E.g. Smokers are 2.3 times more likely to develop lung cancer than non-smokers.
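A minimal sketch of the odds and odds ratio calculations described above. The 2×2 counts are hypothetical numbers invented for illustration (they are not from any study), and the function names are assumptions.

```python
def odds(probability):
    """Convert a probability into odds: p / (1 - p)."""
    return probability / (1 - probability)

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

# Betting example from the slide: a 1 in 3 chance corresponds to odds of 1:2 (0.5).
print(f"Odds for p = 1/3: {odds(1 / 3):.2f}")

# Hypothetical case-control counts:
#   cases:    60 exposed, 40 unexposed
#   controls: 30 exposed, 70 unexposed
print(f"Odds ratio = {odds_ratio(60, 40, 30, 70):.2f}")
```

An OR of 1 means the odds of exposure are the same among cases and controls.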
70. CONFIDENCE INTERVALS
When we measure the size of the effect we use confidence intervals
(CI). A CI is the range* in risk we would expect to see in the population.
CI provide an expected upper and lower limit (=range*) for a statistic
at a specified probability level (usually 95%, and sometimes 99%)
The odds ratio we found from our sample (E.g. Smokers are 2.3 times
more likely to develop cancer than non-smokers) is only true for the
sample we have examined; it
might be slightly different if we used another sample.
For this reason we calculate a confidence interval-which is the range in
risk we would expect to see in this population.
71. C.I
E.g: “a study of the effect of smoking on developing cancer”:
o A 95% confidence interval of 2.1 to 3.4 tells us that while smokers in
our study were 2.3 times more likely to develop cancer, in the general
population, smokers are between 2.1 and 3.4 times more likely to develop
cancer. We are 95% confident that this is true.
Calculating a CI:
For example, a sample mean is an estimate of the population mean.
A CI provides a band within which the population mean is likely to fall:
CI = mean ± (Sm × critical value for the chosen confidence level), where Sm is the standard error of the mean
72. CI
Example: n = 30, M = 40, s = 8
CI = 40 ± (1.46 × 2.045)
CI = 40 ± 2.99 = 37.01 to 42.99
The value “1.46” is the standard error of the mean: Sm = s / √n = 8 / √30 ≈ 1.46.
The value “2.045” (the critical t value for 95% confidence with n − 1 = 29 degrees of freedom) came from appropriate tables.
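The worked example above can be checked with a short calculation. This sketch takes the critical value 2.045 as given from the tables rather than computing it, and the function name `confidence_interval` is an assumption made here.

```python
import math

def confidence_interval(sample_mean, s, n, critical_value):
    """Return (lower, upper) limits: mean ± standard error × critical value."""
    standard_error = s / math.sqrt(n)  # Sm = s / sqrt(n)
    margin = standard_error * critical_value
    return sample_mean - margin, sample_mean + margin

# Slide example: n = 30, M = 40, s = 8, critical value 2.045 (from tables).
lower, upper = confidence_interval(40, 8, 30, 2.045)
print(f"Standard error = {8 / math.sqrt(30):.2f}")  # 1.46
print(f"95% CI = {lower:.2f} to {upper:.2f}")       # 37.01 to 42.99
```

A larger sample shrinks the standard error and therefore narrows the interval.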
73. POWER
If findings are statistically significant, then conclusions can be
easily drawn, but what if findings are insignificant?
Power is the probability that a test or study will detect a
statistically significant result when an effect really exists.
Did the independent variables or treatment have zero effect? If
an effect really occurs, what is the chance that the experiment
will find a "statistically significant" result?
Determining power depends on several factors:
1) Sample size: how big was your sample?
2) Effect size: what size of an effect are you looking for? E.g.
How large of a difference (association, correlation) are you looking
for? What would be the most scientifically interesting?
3) Standard deviation: how scattered was your data?
74. POWER
For example:
a large sample, with a large effect, and a small standard
deviation
would be very likely to have found a statistically significant
finding, if it existed.
A power of 80%-95% is desirable.
One of the best ways to increase the power of your study is to increase
your sample size.
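To see how sample size, effect size and standard deviation drive power, the sketch below uses a normal approximation for comparing the means of two equal-sized groups. This is a simplification chosen for illustration (it ignores the negligible opposite tail and uses z instead of t); the function name and the numbers are assumptions.

```python
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()                        # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    se = sd * (2 / n_per_group) ** 0.5      # standard error of the difference in means
    return z.cdf(abs(effect) / se - z_crit)

# Hypothetical scenario: detect a difference of 5 units when the SD is 10.
for n in (16, 32, 64, 128):
    print(f"n per group = {n:3d}  power = {approx_power(5, 10, n):.2f}")
```

With these made-up numbers, roughly 64 subjects per group are needed to reach about 80% power, and doubling the standard deviation at the same sample size lowers power considerably.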
75. STATISTICAL ANALYSES
Statistical analyses are either
parametric or
non-parametric.
Therefore, statistical analyses are performed using
parametric tests =variable in question is from a normal
distribution:
non-parametric tests =do not require any assumption of normal
distribution, are not sensitive
Most non-parametric tests do not require an interval or ratio
level of measurement; can be used with nominal/ordinal level data.
77. INTRODUCTION TO SPSS FOR DATA HANDLING
Data entry in SPSS
Drawing graphs in SPSS
Computing descriptive statistics
Testing for normality assumptions
SPSS (Statistical Package for the Social Sciences) was designed to offer a more
user-friendly data analysis presentation than other statistical software.
It has been released under different names over the years (SPSS, IBM SPSS, PASW -
Predictive Analytics Software).
78. TYPE OF THE DATA
Types of data (reminder):
Nominal, Ranked, Scales (measures: Interval, Ratio), Mixed
types
Text answers (open ended questions)
Nominal (categorical)
− Order is arbitrary when entering data in SPSS
− e.g. Gender, country of birth, personality type, yes or no.
− Use numeric in SPSS and give value labels.
(e.g. 1=Female, 2=Male, 99=Missing)
(e.g. 1=Yes, 2=No, 99=Missing)
(e.g. 1=UK, 2=Ireland, 3=Pakistan, 4=India, 5=other, 99=Missing)
79. Ranks or Ordinal
Data must be ordered, 1st, 2nd, 3rd etc. e.g. status, social class
Use numeric in SPSS with value labels
E.g. 1=Working class, 2=Middle class, 3=Upper class
• E.g. Class of degree, 1=First, 2=Upper second, 3=Lower second, 4=Third,
5=Ordinary, 99=Missing
Measures, scales
− Interval - equal units
− Ratio - equal units, zero on scale
• e.g. Family size, Salary
• Makes sense to say one value is twice another
− Use numeric (or comma, dot or scientific) in SPSS
• NB: numeric if you can manage to use numbers
• E.g. Family size, 1, 2, 3, 4 etc.
• E.g. Salary per year, 25000, 14500, 18650 etc.
80. Mixed type
− Categorised data
− Actually ranked, but used to identify categories or groups
e.g. age groups
= ratio data put into groups
− Use numeric in SPSS and use value labels.
E.g. Age group, 1=Under 15
2=15-34
3=35-54
4=55 or greater
Text answers
− E.g. answers to open-ended questions
− Either enter text as given (Use String in SPSS) or
− Code or classify answers into one of a small number of types (Use numeric/nominal in
SPSS)
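The age-group coding shown on this slide (ratio data put into numbered groups with value labels) could be prepared outside SPSS before data entry. The sketch below is one possible way to do it; the function name, the label dictionary, and the use of 99 for a missing age (borrowed from the nominal examples) are assumptions.

```python
# Value labels for the age-group codes, as listed on the slide.
AGE_GROUP_LABELS = {1: "Under 15", 2: "15-34", 3: "35-54", 4: "55 or greater", 99: "Missing"}

def code_age_group(age):
    """Map an age in years to the numeric group code; None is coded as 99 (assumed missing code)."""
    if age is None:
        return 99
    if age < 15:
        return 1
    if age < 35:
        return 2
    if age < 55:
        return 3
    return 4

ages = [12, 15, 34, 35, 60, None]
codes = [code_age_group(a) for a in ages]
for age, code in zip(ages, codes):
    print(age, "->", code, AGE_GROUP_LABELS[code])
```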
81. COMPUTING DESCRIPTIVE
STATISTICS
Steps for statistical data analysis
Statistical data analysis is conducted in two steps:
1st step = Descriptive Statistics (to describe the sample) including Testing for
NORMALITY ASSUMPTIONS
2nd step = making inference (Inferential Statistics) (making inferences about the
population using what is observed in the sample).
Association statistics
Comparative statistics
Notice: As an introduction to SPSS for data analysis, we will focus on the first step
(Descriptive statistics); the second step is better covered after or combined with
“Research Methodology” courses/lectures
83. TESTING FOR NORMALITY
ASSUMPTIONS
Evaluating normality
There are both graphical and statistical methods for evaluating normality.
Graphical methods mainly include Histogram, Box-Whisker plot, Dot plot, the
normality plots (=Q-Q and P-P plots), etc… Normality plots are much used (Q-Q
plot is more common).
The assumption of univariate normality can be investigated using Statistical
methods including:
o diagnostic tests for skewness and kurtosis
o Normality Statistical tests
(=Kolmogorov-Smirnov Test and Shapiro-Wilk Test)
84. GRAPHICAL METHOD VS STATISTICAL
METHOD
Statistical tests
have the advantage of making an objective judgment of normality
but the disadvantage of sometimes not being sensitive enough at low sample sizes or overly sensitive at
large sample sizes.
As such, some statisticians prefer to use their experience to make a subjective
judgment about the data from plots/graphs.
Graphical interpretation
has the advantage of allowing good judgment to assess normality in situations when statistical tests
might be over or under sensitive
but graphical methods do lack objectivity.
Conclusions :In some cases, both methods complement each other (sometimes
you need to rely on statistical methods when graphical methods do not help you
to decide whether your data is normally distributed or not)
85. ASSESSING NORMALITY GRAPHICALLY
Q-Q plot and P-P plot are called probability plots.
Probability plot helps to compare two data sets in terms of
distribution;
one data set being from the data to be analyzed (data you collected
yourself) and another one from reference normally distributed data
(usually shown as a straight solid line) (theoretical normally
distributed data).
If the data is normally distributed, the result would be a straight line
with positive slope like in the figure on right below indicating a good
match for both data distributions.
86. WHY DO WE EVEN NEED Q-Q PLOT OR P-P PLOT?
If we plot the quantiles of the two data sets
against each other, it is called a Q-Q plot.
If we plot the cumulative probabilities of the two data sets against
each other, it is called a P-P plot. The Q-Q plot is more common.
A histogram can be difficult to interpret, which is why Q-Q or P-P plots are preferred.
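The points behind a normal Q-Q plot can be computed by hand, which may make the idea less abstract. In this sketch the plotting positions (i − 0.5)/n are one common convention among several, the data values are made up, and the correlation is used only as a rough numeric stand-in for "how straight the line is".

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical normal quantile, observed value) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))  # normal curve with the sample's mean and SD
    return [(reference.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def pearson_r(pairs):
    """Pearson correlation between the two coordinates of the pairs."""
    tx = [p[0] for p in pairs]
    ox = [p[1] for p in pairs]
    mt, mo = mean(tx), mean(ox)
    num = sum((a - mt) * (b - mo) for a, b in pairs)
    den = (sum((a - mt) ** 2 for a in tx) * sum((b - mo) ** 2 for b in ox)) ** 0.5
    return num / den

# Hypothetical measurements.
sample = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 4.6, 5.1, 5.4]
points = qq_points(sample)
print(f"Q-Q correlation = {pearson_r(points):.3f}")
```

Plotting the pairs (theoretical on the x-axis, observed on the y-axis) gives the Q-Q plot; points close to a straight line suggest approximate normality.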
87. BOX-WHISKER PLOT
Usually used as measure of Variability (Dispersion).
Box-Whisker plot shows four equal parts along with
three quartiles:
• Second quartile (Q2) = median.
• Lower quartile (Q1) = median of the lower half of the
data
• Upper quartile (Q3) = median of the upper half of the
data
• Need to order the individuals first (from 1 to “N”
individuals)
• One quarter of the individuals are in each inter-
88. ASSESSING NORMALITY STATISTICALLY
Statistical methods include
a) diagnostic tests for skewness and kurtosis
b) Normality Statistical tests :
Kolmogorov-Smirnov Test
Shapiro-Wilk Test
The diagnostic tests follow a rule of thumb:
a distribution is taken as normal if its skewness and kurtosis have values between –1.0 and
+1.0.
A perfectly normal distribution will have a skewness statistic of zero.
89. ASSESSING NORMALITY STATISTICALLY
Positive values of the skewness score describe a positively skewed
distribution (tail pointing to large positive scores) and
negative skewness scores describe a negatively skewed one.
A perfectly normal distribution will also have a kurtosis statistic of
zero.
Values above zero (positive kurtosis score) describe “pointed”
distributions (leptokurtosis) and
values below zero describe flat distributions (platykurtosis, negative kurtosis).
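The skewness and kurtosis scores discussed above can be approximated from the central moments of the data. The sketch below uses the simple moment-based definitions, with kurtosis reported as excess kurtosis so that a normal distribution scores zero; statistical packages usually apply small-sample corrections, so their output may differ slightly. The data sets and function names are made up for illustration.

```python
from statistics import mean

def central_moment(values, k):
    """k-th central moment of the data."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]          # evenly spread, symmetric
right_skewed = [1, 1, 2, 2, 3, 10]   # long tail toward large values

print(f"symmetric:    skewness = {skewness(symmetric):.2f}, kurtosis = {excess_kurtosis(symmetric):.2f}")
print(f"right-skewed: skewness = {skewness(right_skewed):.2f}, kurtosis = {excess_kurtosis(right_skewed):.2f}")
```

The symmetric, evenly spread data set has zero skewness and a negative (flat) kurtosis score, while the data set with one large value has positive skewness.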
90. NORMALITY STATISTICAL TESTS
Normality Statistical Tests include
Shapiro-Wilk Test (SW)
Kolmogorov-Smirnov Test (KS).
The KS is for a completely specified distribution (so if you are testing
normality, you must specify the mean and variance; they can't be
estimated from the data).
the SW is for normality, with unspecified mean and variance.
So the SW test is better for testing normality.
The KS test is a good method for comparing the shapes of two
sample distributions.
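To illustrate why the KS test needs a completely specified distribution, the sketch below computes only the KS statistic D (the largest gap between the sample's cumulative distribution and a reference normal curve whose mean and standard deviation are supplied by the user). It does not compute a p-value, and the data and function name are assumptions made for illustration.

```python
from statistics import NormalDist

def ks_statistic(data, mu, sigma):
    """Largest distance between the empirical CDF and the N(mu, sigma) CDF."""
    reference = NormalDist(mu, sigma)
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = reference.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical observations.
sample = [-1.2, -0.5, 0.0, 0.4, 1.1]
d_match = ks_statistic(sample, mu=0, sigma=1)     # reference close to the data
d_mismatch = ks_statistic(sample, mu=5, sigma=1)  # reference far from the data
print(f"D against N(0, 1) = {d_match:.3f}")
print(f"D against N(5, 1) = {d_mismatch:.3f}")
```

The same data give a small D against a well-chosen reference and a D near 1 against a badly chosen one, which is why the mean and variance must be fixed in advance for the KS test.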
91. TST
The SW is more appropriate for small sample
sizes (< 50 samples), but can also handle sample sizes
as large as 2000, which makes it the preferred test for
normality.
How do you ascertain statistically normal distribution of
the data?
If the p-value (see as “Sig.” in the output table) of the
Shapiro-Wilk Test is greater than 0.05 (> 0.05), the
data do not deviate significantly from a normal distribution and can be treated as normal.
If it is below 0.05 (< 0.05), the data significantly
deviate from a normal distribution.
93. TRANSFORMATION REMINDER
When a variable is not normally distributed, we can create a
transformed variable to achieve normality. After transformation,
normality should be tested.
Then the transformed variable (normally distributed) is analyzed by
parametric methods.
Three common transformations are: the logarithmic transformation
(the commonest), the square root transformation, and the inverse
transformation. They actually correct for skews & unequal variances
Transformation should be justified: it is recommended when
including a non-normally distributed variable in the analysis will
reduce the effectiveness at identifying statistical relationships, i.e.
when this leads to losing power, due to lack of normal distribution of
the variable to be analyzed.
When transformations do not work, we do have the option of
95. I. A Fahrenheit thermometer is an
example of what:
A. Nominal
B. Ordinal
C. Interval
D. Ratio
II. Within 3 standard deviations
of the mean, what percentage of
the scores is covered?
A. 68
B. 78
C. 99
D. 99.7
E. 99.9
III. Classification of dental
disease is an example of what:
A. Nominal
B. Ordinal
C. Interval
D. Ratio
IV. Has categorical variables and
bars are separate, but equal
distances apart:
A. Bar Graph
B. Histogram
C. Frequency Polygon
96. V. Has continuous variables, bars
touch and you can always find a third
value:
A. Bar Graph
B. Histogram
C. Frequency Polygon
VI. Within 1 standard deviation of the
mean, over what percentage of the
values is covered?
A. 60
B. 62
C. 65
D. 66
E. 68
VII. The degree to which the
independent variable alone brings
about the change in the dependent
variable is what:
A. Internal Validity
B. External Validity
VIII. The Student's t-test measures what:
A. Test the difference between 2 means
B. Test for differences between 3 or more
means
C. Differences between two frequency
distributions
D. Whether two distributions are
independent or dependent
IX. The Scientific Method is:
A. Qualitative Research
B. Quantitative Research
X. As income level declines, tooth decay
increases. This is an example of what:
A. Positive correlation
B. Negative correlation
C. Internal Validity
D. External Validity
97. XI. Randomly selecting a
proportionate amount from
subgroups is an example of
what:
A. Random Sampling
B. Stratified Sampling
C. Systematic Sampling
D. Convenience Sampling
XII. Retrospective and Prospective
are what types of Epidemiological
Studies?
A. Analytical
B. Descriptive
XIII. Descriptive statistics make no
attempt to generalize the
research findings beyond the
immediate sample.
A. True
XIV. Randomly selecting a proportionate
amount from subgroups is an example
of what:
A. Random Sampling
B. Stratified Sampling
C. Systematic Sampling
D. Convenience Sampling
XV. In systematic sampling, every person
has an equal or random chance of being
selected.
A. True
B. False
XVI. A zero correlation coefficient shows:
A. A strong relationship
B. No relationship
What is the difference between positive
correlation and negative correlation?