Week 2: Measures of disease occurrence
1. Week 2: Measures of disease
occurrence and related statistics
Dr. Hamdi Alhakimi
MD, MPH, M- epidemiology
2. Goals
• Describe the steps of descriptive data
analysis
• Be able to define variables
• Understand basic coding principles
• Learn simple descriptive data analysis
• Learn simple inferential statistics
6. Descriptive Statistics
• Calculations:
– Proportions, rates & ratios
– Measures of central tendency (mean, mode & median)
– Measures of dispersion (SD, range)
– Quintiles
• Tabulations:
– Frequency distribution tables
– Cross tabs
• Graphs:
– Bar graphs
– Pie chart
– Histogram
– Scatter plot
4/7/2021
7. Types of Variables
• (Quantitative) Numerical variables:
– Always numbers
– Examples: age in years, weight, blood pressure readings,
temperature, concentrations of pollutants, counts of
cases per week, or any other measurements
• (Qualitative) Categorical variables:
– Information that can be sorted into categories
– Types of categorical variables: ordinal, nominal and
dichotomous (binary)
8. Categorical Variables:
Ordinal Variables
• Ordinal variable—a categorical variable with some
intrinsic order
• Examples of ordinal variables:
– Education (illiterate, HS degree, some college, college
degree)
– Agreement (strongly disagree, disagree, neutral, agree,
strongly agree)
– Rating (excellent, good, fair, poor)
– Frequency (always, often, sometimes, never)
– Any other scale (“On a scale of 1 to 5...”)
9. Categorical Variables:
Nominal Variables
• Nominal variable – a categorical variable without an
intrinsic order
• Examples of nominal variables:
– Where a person lives in the U.S. (Northeast, South,
Midwest, etc.)
– Nationality (American, Mexican, French)
– Race/ethnicity (African American, Hispanic, White, Asian
American)
– Favorite pet (dog, cat, fish, snake)
10. Categorical Variables:
Dichotomous Variables
• Dichotomous (or binary) variables – a categorical
variable with only 2 levels of categories
– Often represents the answer to a yes or no question
• For example:
– “Did you attend the church on May 24?” Yes /No
– “Did you eat potato salad ?” Yes/No
– Anything with only 2 categories
– Gender (male, female)
11. Coding
• Coding – process of translating information gathered
from questionnaires or other sources into something
that can be analyzed
• Involves assigning a value to the information given;
the value is often given a label
• Coding can make data more consistent:
– Example: Question = Gender
Answers = Male, Female, M, or F -> (0 ,1)
12. Coding Systems
• Common coding systems (code and label) for dichotomous
variables:
– 0=No 1=Yes
(1 = value assigned, Yes= label of value)
– OR: 1=No 2=Yes
• When you assign a value you must also make it clear what
that value means
– As long as it is clear how the data are coded, either is fine
• You can make it clear by creating a data dictionary to
accompany the dataset
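The coding and data dictionary ideas above can be sketched in a few lines of Python. This is only an illustration: the names `DATA_DICTIONARY`, `RAW_TO_CODE` and `code_sex` are hypothetical, and the choice of 0 = Male, 1 = Female is an assumption (any assignment works as long as it is documented).

```python
# Hypothetical data dictionary: documents what each code of SEX means.
# The 0/1 assignment is an assumption made for this sketch.
DATA_DICTIONARY = {"SEX": {0: "Male", 1: "Female"}}

# Map the inconsistent raw answers to one consistent numeric code.
RAW_TO_CODE = {"male": 0, "m": 0, "female": 1, "f": 1}


def code_sex(answer):
    """Return the numeric code for a raw answer, or None if it cannot be coded."""
    return RAW_TO_CODE.get(answer.strip().lower())


raw_answers = ["Male", "F", "m", "Female", "unknown"]
codes = [code_sex(a) for a in raw_answers]

# Attach labels back to the codes; uncodable answers are shown as "Missing".
labels = [DATA_DICTIONARY["SEX"].get(c, "Missing") for c in codes]

print(codes)
print(labels)
```

Keeping the dictionary next to the dataset means anyone reading the output can translate a code back to its label.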
13. Coding:
Attaching Labels to Values
• Many analysis software packages allow you to attach a label
to the variable values
Example: Label 0’s as male and 1’s as female
• Makes reading data output easier:
Without label:
Variable SEX   Frequency   Percent
0              21          60%
1              14          40%
With label:
Variable SEX   Frequency   Percent
Male           21          60%
Female         14          40%
14. Coding: Ordinal Variables
• Coding process is similar with other categorical variables
• Example: variable EDUCATION, possible coding:
0 = Did not graduate from high school
1 = High school graduate
2 = Some college or post-high school education
3 = College graduate
• Could be coded in reverse order (0=college graduate, 3=did
not graduate high school).
• For this ordinal categorical variable we want to be consistent
with numbering because the value of the code assigned has
significance.
15. Coding: Nominal Variables
• For coding nominal variables, order makes no
difference
• Example: variable RESIDE
1 = Northeast
2 = South
3 = Northwest
4 = Midwest
5 = Southwest
• Order does not matter, no ordered value associated
with each response
16. Coding: Continuous Variables
• Creating categories from a continuous variable (age) is
common
• Example: variable = AGE_CAT
Children = 0–9 years old
Teenagers = 10–19 years old
Young adults = 20–39 years old
Middle aged = 40–59 years old
Elderly = 60 years or older
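As a rough sketch, the age grouping above could be coded like this. The function name `age_category` is hypothetical, and the sketch assumes age is recorded in completed years.

```python
def age_category(age):
    """Assign an AGE_CAT label using the cut-points listed on the slide.

    Assumes age is given in completed years (an assumption of this sketch).
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 9:
        return "Children"
    if age <= 19:
        return "Teenagers"
    if age <= 39:
        return "Young adults"
    if age <= 59:
        return "Middle aged"
    return "Elderly"


for example_age in (4, 15, 27, 45, 70):
    print(example_age, age_category(example_age))
```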
17. Data Cleaning
• One of the first steps in analyzing data is to “clean” it
of any obvious data entry errors:
– Outliers? (really high or low numbers)
Example: Age = 110 (really 10 or 11?)
– Value entered that doesn’t exist for variable?
Example: 2 entered where 1=male, 0=female
– Missing values?
Did the person not give an answer? Was answer
accidentally not entered into the database?
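The three checks above (outliers, codes that do not exist, missing values) can be sketched as a small screening routine. The records, the plausible age range of 0 to 100 and the name `find_problems` are all assumptions made for illustration; flagged records should still be checked against the original forms.

```python
# Hypothetical records; None stands for a value that was not entered.
records = [
    {"id": 1, "age": 34, "sex": 1},
    {"id": 2, "age": 110, "sex": 0},   # implausibly high age
    {"id": 3, "age": 27, "sex": 2},    # code that does not exist for SEX
    {"id": 4, "age": None, "sex": 1},  # missing value
]

VALID_SEX_CODES = {0, 1}
PLAUSIBLE_AGE = (0, 100)  # assumed range; choose one that fits the study


def find_problems(record):
    """Return a list of data-entry problems found in one record."""
    problems = []
    for field in ("age", "sex"):
        if record[field] is None:
            problems.append(f"{field} missing")
    age = record["age"]
    if age is not None and not (PLAUSIBLE_AGE[0] <= age <= PLAUSIBLE_AGE[1]):
        problems.append("age out of range")
    sex = record["sex"]
    if sex is not None and sex not in VALID_SEX_CODES:
        problems.append("sex code invalid")
    return problems


flagged = {}
for record in records:
    problems = find_problems(record)
    if problems:
        flagged[record["id"]] = problems

print(flagged)
```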
18. Data Cleaning (cont.)
• “Double entry”: entering the data twice and then
comparing both entries for discrepancies
• Univariate data analysis is a useful way to check the quality of
the data
19. Univariate Data Analysis
• Univariate data analysis explores each variable in a
data set separately:
– Serves as a good method to check the quality of the data
– Inconsistencies or unexpected results should be
investigated using the original data as the reference point
• Frequencies (percentages) can tell you if many study
participants share a characteristic of interest (age,
gender, etc.)
– Graphs and tables can be helpful
20. Univariate Data Analysis (cont.)
• Examining variables can give you important
information:
– Do all subjects have data, or are values missing?
– Are most values clumped together, or is there a lot of
variation?
– Are there outliers?
– Do the minimum and maximum values make sense, or
could there be mistakes in the coding?
21. Recap:
• All these descriptive statistics are univariate
(describe only one variable).
• Next week, we will discuss bivariate
descriptive analysis (2 variables involved).
24. Use of descriptive Statistics in quantitative data
• Calculations:
– Measures of central tendency (mean, mode & median)
– Measures of dispersion (SD, range)
– Correlation coefficient
– Regression coefficient
– Quintiles
• Graphs:
– Histogram
– Scatter plot
25. Use of descriptive Statistics in qualitative data
• Calculations:
– Proportions, rates & ratios
• Tabulations:
– Frequency distribution tables
– Cross tabs
• Graphs:
– Bar graphs
– Pie chart
26. Proportion (percentage, frequency):
Proportion = a / (a + b)
• The numerator (a) is included in the denominator (a + b)
• No measurement unit
• Ranges from 0 to 1
• Often expressed as %
• Example: Of 7,999 females, 2,496 use modern contraceptive
methods.
• The proportion of those who use modern contraceptive methods
= 2,496 / 7,999 × 100 = 31.2%
28. Prevalence rate:
Rate: a specific type of proportion
Prevalence rate: the proportion of a defined group or population
that has a clinical condition or outcome at a given point in time
– Prevalence rate = (Number of cases observed at time t) /
(Total number of individuals at time t)
• Ranges from 0 to 1 (it’s a proportion), but is usually referred
to as a rate and is often shown as a %
29. Prevalence rate:
Example:
• Of 100 patients hospitalized with stroke, 18 had a
myocardial infarction (MI)
• Prevalence of MI among hospitalized stroke
patients = 18 / 100 = 18%
• The prevalence rate answers the question:
– “what fraction of the group is affected at this moment
in time?”
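Since prevalence is a proportion, one small helper covers both the MI example here and the earlier contraceptive-use example. The function name `proportion` is hypothetical; the numbers come from the slides.

```python
def proportion(cases, total):
    """Return cases / total, where the cases are part of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= cases <= total:
        raise ValueError("cases must lie between 0 and total")
    return cases / total


# Prevalence of MI among 100 hospitalized stroke patients.
prevalence_mi = proportion(18, 100)

# Proportion of females using modern contraceptive methods.
contraceptive_use = proportion(2496, 7999)

print(f"{prevalence_mi:.0%}")      # 18%
print(f"{contraceptive_use:.1%}")  # 31.2%
```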
30. Incidence rate in population based data:
32. Descriptive statistics of Categorical Data
• Distribution of categorical variables should be
examined before more in-depth analyses.
– Bar graph
[Figure: bar graph “Distribution of Area of Residence, Example Questionnaire Data”: number of people answering the example questionnaire who reside in 5 regions of the United States. x-axis: variable RESIDE (Midwest, Northeast, Northwest, South, Southwest); y-axis: Number of People (0 to 30).]
33. Descriptive statistics of Categorical Data
• Another way to look at the data is to list the data
categories in tables.
• Frequency distribution table.
Region      Frequency   Percent
Midwest     16          20%
Northeast   13          16%
Northwest   19          24%
South       24          30%
Southwest   8           10%
Total       80          100%
Table: Number of people answering sample
questionnaire who reside in 5 regions of the United
States
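A frequency distribution table like the one above can be built from individual responses with a counter. This is a minimal sketch: the list `responses` is rebuilt from the slide's counts, because the raw questionnaire data are not part of the slides.

```python
from collections import Counter

# Rebuild one response per person from the counts shown in the table.
slide_counts = {"Midwest": 16, "Northeast": 13, "Northwest": 19,
                "South": 24, "Southwest": 8}
responses = [region for region, n in slide_counts.items() for _ in range(n)]

# Tabulate: frequency and rounded percent for each category of RESIDE.
freq = Counter(responses)
total = sum(freq.values())
table = {region: (n, round(100 * n / total))
         for region, n in sorted(freq.items())}

for region, (n, pct) in table.items():
    print(f"{region:<10} {n:>3} {pct:>3}%")
print(f"{'Total':<10} {total:>3} 100%")
```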
37. Descriptive statistics
• Commonly used statistics with univariate analysis of
continuous variables:
– Mean – average of all values of this variable in the dataset
– Median – the middle of the distribution, the number
where half of the values are above and half are below
– Mode – the value that occurs the most times
– Range of values – from minimum value to maximum value
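These four statistics are available in Python's standard library. The `ages` list below is hypothetical and is not the dataset behind the slides.

```python
import statistics

# Hypothetical ages in years, used only to illustrate the calculations.
ages = [2, 19, 28, 28, 36, 41, 45, 52, 84]

mean_age = statistics.mean(ages)      # average of all values
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequent value
age_range = (min(ages), max(ages))    # minimum to maximum

print(f"mean={mean_age:.1f} median={median_age} "
      f"mode={mode_age} range={age_range}")
```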
38. Statistics describing a continuous variable distribution
[Figure: example scatter chart of Age (in years), y-axis from 0 to 90, annotated with the following values:]
– 84 = Maximum (an outlier)
– 2 = Minimum
– 28 = Mode (occurs twice)
– 33 = Mean
– 36 = Median (50th percentile)
42. Measures of Central Tendency
Mean … the most frequently used but is
sensitive to extreme scores
e.g. 1 2 3 4 5 6 7 8 9 10
Mean = 5.5 (median = 5.5)
e.g. 1 2 3 4 5 6 7 8 9 20
Mean = 6.5 (median = 5.5)
e.g. 1 2 3 4 5 6 7 8 9 100
Mean = 14.5 (median = 5.5)
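The three examples above can be reproduced in a few lines, which makes it easy to see the mean move while the median stays put. The variable names are chosen only for this sketch.

```python
import statistics

base = list(range(1, 10))  # the values 1 to 9, shared by all three examples

results = {}
for last_value in (10, 20, 100):
    values = base + [last_value]
    results[last_value] = (statistics.mean(values), statistics.median(values))

for last_value, (mean, median) in results.items():
    print(f"last value {last_value:>3}: mean = {mean}, median = {median}")
```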
49. Histogram (only for a numerical variable)
• Divide measurement up into equal-sized
categories.
• Determine number of measurements falling
into each category.
• Draw a bar for each category so bars’
heights represent number (or percent)
falling into the categories.
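The three steps above can be sketched without any plotting library by counting values per equal-width bin and printing text bars. The `ages` list, the bin width of 20 years and the name `histogram_counts` are assumptions made for this illustration.

```python
def histogram_counts(values, low, width, n_bins):
    """Count how many values fall into each equal-width bin.

    Bin i covers [low + i*width, low + (i+1)*width).
    Values outside all bins are ignored in this sketch.
    """
    counts = [0] * n_bins
    for value in values:
        index = int((value - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


ages = [2, 19, 28, 28, 36, 41, 45, 52, 84]  # hypothetical ages in years
counts = histogram_counts(ages, low=0, width=20, n_bins=5)

# Draw one text bar per bin; bar length represents the count.
for i, count in enumerate(counts):
    start = i * 20
    print(f"{start:>2}-{start + 19:<2} {'#' * count}")
```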
58. Properties of the Normal Distribution (Empirical Rule)
• Bell-shaped density function
• Symmetric around the mean
• Mean = Median = Mode
• 68% of the area under the curve lies within μ ± σ
• 95% of the area under the curve lies within μ ± 2σ
• 99.7% of the area under the curve lies within μ ± 3σ
[Figure: standard normal curve with the central 68% area (μ − σ to μ + σ) and the central 95% area (μ − 2σ to μ + 2σ) marked.]
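The empirical rule percentages can be checked numerically. For a normal distribution, the area within k standard deviations of the mean equals erf(k / √2); the helper name `area_within` is hypothetical.

```python
import math


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.1%}")
```

The exact values are slightly above 68% and 95%, which is why the rule is usually quoted in rounded form.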
59. Estimation
• Estimation is one of the main purposes of
statistics.
• The basic idea is that we take a sample of data
and use it to make inferences about the
population of interest.
Important distributions
60. Estimation
• Estimation involves the calculation of confidence
intervals for some statistic (for example, a mean or a
proportion)
61. Example I
• What is the complication rate of heart surgery at KFH
hospital?
• Using 3 years of data from KFH, a sample of 52
patients who had heart surgery was selected; of these,
4 patients had a complication.
• 7.7% complication rate (95% Confidence Interval = 2.5%
to 12.5%)
62. Confidence interval
• Interpretation of 95% confidence interval:
Based on our sample data,
“we are 95% confident that the "true"
complication rate at KFH is between 2.5% and
12.5%.”
63. Advantages of using confidence intervals:
• (1) Confidence intervals remind us that study estimates
have variability (i.e. the width of the CI).
• (2) Confidence intervals show clearly the role that sample
size plays in the estimation.
– Large sample size = Narrow confidence limits
– Small sample size = Wide confidence limits
64. Calculation of confidence interval of the mean
• 1. Compute the standard error of the mean (SE = SD / √n).
• 2. Add and subtract 2 SE (more precisely, 1.96 SE for a 95%
interval) to and from the mean to form the interval (from the
lower limit to the upper limit).
66. Example
• A random sample of 16 students reported
having an average age of 31 with a
standard deviation of 6 years.
• In what range of values can we be 95%
confident that the true mean age lies?
67. Example
• 95% confidence interval = mean ± 1.96 × SE, where SE = SD / √n
• C.I. = 31 ± 1.96 (6/4) = 31 ± 2.94 ≈ 31 ± 3
• C.I. = (31 − 3 to 31 + 3) = (28 to 34) years old
• Interpretation???
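The calculation in this example can be sketched as a short function. The name `mean_confidence_interval` is hypothetical, and the sketch uses the normal multiplier 1.96 as on the slide; for a sample as small as 16, a t-based multiplier is often preferred and would give a somewhat wider interval.

```python
import math


def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) limits of mean ± z × SE, with SE = sd / √n."""
    standard_error = sd / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


# 16 students, average age 31 years, standard deviation 6 years.
lower, upper = mean_confidence_interval(mean=31, sd=6, n=16)
print(f"95% CI: {lower:.2f} to {upper:.2f} years")
```

Rounded to whole years, this matches the 28 to 34 interval worked out above.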
68. Length of Confidence Interval
• We want confidence interval to be as
narrow as possible.
• Length = Upper Limit - Lower Limit
69. How length of CI is affected?
• As the standard deviation decreases…
• As we decrease the confidence level…
• As we increase sample size …
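One way to explore these three questions is to compute the length directly. With the normal-based interval used in the previous example, Length = 2 × z × SD / √n. The function name `ci_length` and the comparison values below are assumptions for illustration.

```python
import math


def ci_length(sd, n, z=1.96):
    """Length (upper limit minus lower limit) of the interval mean ± z × sd / √n."""
    return 2 * z * sd / math.sqrt(n)


baseline = ci_length(sd=6, n=16)                  # the students example
smaller_sd = ci_length(sd=3, n=16)                # standard deviation halved
lower_confidence = ci_length(sd=6, n=16, z=1.645)  # 90% instead of 95%
larger_sample = ci_length(sd=6, n=64)             # sample size quadrupled

print(f"baseline          {baseline:.2f}")
print(f"smaller SD        {smaller_sd:.2f}")
print(f"lower confidence  {lower_confidence:.2f}")
print(f"larger sample     {larger_sample:.2f}")
```

In each of the three scenarios the interval comes out narrower than the baseline.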