Week 2: Measures of disease
occurrence and related statistics
Dr. Hamdi Alhakimi
MD, MPH, M- epidemiology
Goals
• Describe the steps of descriptive data
analysis
• Be able to define variables
• Understand basic coding principles
• Learn simple descriptive data analysis
• Learn simple inferential statistics
DATA
RESULTS
+
conclusions
Biostatistics
4/7/2021 3
DATA
Binary
Types of
Biostatistics
Descriptive
Statistics
Inferential
statistics
Descriptive Statistics
• Calculations:
– Proportions, rates & ratios.
– Measures of central tendency (Mean, Mode & Median).
– Measures of dispersion (S.D., range).
– Quintiles.
• Tabulations:
– Frequency distribution tables.
– Cross tabs.
• Graphs:
– Bar graphs.
– Pie chart.
– Histogram.
– Scatter plot.
Types of Variables
• (Quantitative) Numerical variables:
– Always numbers
– Examples: age in years, weight, blood pressure readings,
temperature, concentrations of pollutants and, counts of
cases per week, or any other measurements
• (qualitative) Categorical variables:
– Information that can be sorted into categories
– Types of categorical variables – ordinal, nominal and
dichotomous (binary)
Categorical Variables:
Ordinal Variables
• Ordinal variable—a categorical variable with some
intrinsic order
• Examples of ordinal variables:
– Education (illiterate, HS degree, some college, college
degree)
– Agreement (strongly disagree, disagree, neutral, agree,
strongly agree)
– Rating (excellent, good, fair, poor)
– Frequency (always, often, sometimes, never)
– Any other scale (“On a scale of 1 to 5...”)
Categorical Variables:
Nominal Variables
• Nominal variable – a categorical variable without an
intrinsic order
• Examples of nominal variables:
– Where a person lives in the U.S. (Northeast, South,
Midwest, etc.)
– Nationality (American, Mexican, French)
– Race/ethnicity (African American, Hispanic, White, Asian
American)
– Favorite pet (dog, cat, fish, snake)
Categorical Variables:
Dichotomous Variables
• Dichotomous (or binary) variables – a categorical
variable with only 2 levels of categories
– Often represents the answer to a yes or no question
• For example:
– “Did you attend church on May 24?” Yes/No
– “Did you eat potato salad ?” Yes/No
– Anything with only 2 categories
– Gender (male, female)
Coding
• Coding – process of translating information gathered
from questionnaires or other sources into something
that can be analyzed
• Involves assigning a value to the information given—
often value is given a label
• Coding can make data more consistent:
– Example: Question = Gender
Answers = Male, Female, M, or F -> (0 ,1)
Coding Systems
• Common coding systems (code and label) for dichotomous
variables:
– 0=No 1=Yes
(1 = value assigned, Yes= label of value)
– OR: 1=No 2=Yes
• When you assign a value you must also make it clear what
that value means
– As long as it is clear how the data are coded, either is fine
• You can make it clear by creating a data dictionary to
accompany the dataset
Coding:
Attaching Labels to Values
• Many analysis software packages allow you to attach a label
to the variable values
Example: Label 0’s as male and 1’s as female
• Makes reading data output easier:
Without label:
Variable SEX    Frequency    Percent
0               21           60%
1               14           40%
With label:
Variable SEX    Frequency    Percent
Male            21           60%
Female          14           40%
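A minimal sketch of attaching labels to coded values, assuming the coding 0 = Male and 1 = Female from the example above. The names `SEX_LABELS`, `sex_codes` and `frequency_table` are illustrative only and do not come from any particular analysis package.

```python
from collections import Counter

# Hypothetical data dictionary for the variable SEX (0 = Male, 1 = Female).
SEX_LABELS = {0: "Male", 1: "Female"}

# Coded responses: 21 zeros and 14 ones, matching the frequencies on the slide.
sex_codes = [0] * 21 + [1] * 14


def frequency_table(codes, labels):
    """Return (label, frequency, percent) rows for a coded variable."""
    counts = Counter(codes)
    total = len(codes)
    return [
        (labels[code], counts[code], round(100 * counts[code] / total))
        for code in sorted(counts)
    ]


for label, freq, pct in frequency_table(sex_codes, SEX_LABELS):
    print(f"{label:<8}{freq:>4}{pct:>5}%")
```

Keeping the labels in one dictionary doubles as a small data dictionary: the codes stay numeric for analysis, and the output stays readable.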
Coding- Ordinal Variables
• Coding process is similar with other categorical variables
• Example: variable EDUCATION, possible coding:
0 = Did not graduate from high school
1 = High school graduate
2 = Some college or post-high school education
3 = College graduate
• Could be coded in reverse order (0=college graduate, 3=did
not graduate high school).
• For this ordinal categorical variable we want to be consistent
with numbering because the value of the code assigned has
significance.
Coding: Nominal Variables
• For coding nominal variables, order makes no
difference
• Example: variable RESIDE
1 = Northeast
2 = South
3 = Northwest
4 = Midwest
5 = Southwest
• Order does not matter, no ordered value associated
with each response
Coding: Continuous Variables
• Creating categories from a continuous variable (age) is
common
• Example: variable = AGE_CAT
Children= 0–9 years old
Teenagers= 10–19 years old
Young adults = 20–39 years old
Middle aged = 40–59 years old
Elderly = 60 years or older
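The age bands above can be sketched as a small function. This is only an illustration: the function name `age_category` is hypothetical, and ages are assumed to be recorded in whole years.

```python
def age_category(age_years):
    """Map an age in whole years to the AGE_CAT band listed above."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years <= 9:
        return "Children"
    if age_years <= 19:
        return "Teenagers"
    if age_years <= 39:
        return "Young adults"
    if age_years <= 59:
        return "Middle aged"
    return "Elderly"


ages = [4, 15, 28, 45, 72]  # made-up ages for illustration
print([age_category(a) for a in ages])
```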
Data Cleaning
• One of the first steps in analyzing data is to “clean” it
of any obvious data entry errors:
– Outliers? (really high or low numbers)
Example: Age = 110 (really 10 or 11?)
– Value entered that doesn’t exist for variable?
Example: 2 entered where 1=male, 0=female
– Missing values?
Did the person not give an answer? Was answer
accidentally not entered into the database?
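The three checks above (outliers, values that do not exist for the variable, missing values) might be automated along these lines. This is a sketch under assumptions: the record keys `age` and `sex`, the valid codes (0, 1) and the age limit of 105 are all chosen for illustration, not taken from the lecture.

```python
def find_problems(records, valid_sex_codes=(0, 1), max_age=105):
    """Flag missing values, invalid codes, and implausible ages.

    `records` is a list of dicts with hypothetical keys "age" and "sex";
    None marks a missing value. The age limit is an arbitrary assumption.
    """
    problems = []
    for row_number, record in enumerate(records, start=1):
        age, sex = record.get("age"), record.get("sex")
        if age is None or sex is None:
            problems.append((row_number, "missing value"))
        if age is not None and not 0 <= age <= max_age:
            problems.append((row_number, "age out of range"))
        if sex is not None and sex not in valid_sex_codes:
            problems.append((row_number, "invalid sex code"))
    return problems


records = [
    {"age": 34, "sex": 1},
    {"age": 110, "sex": 0},   # outlier: really 10 or 11?
    {"age": 27, "sex": 2},    # code 2 does not exist for this variable
    {"age": None, "sex": 1},  # missing age
]
print(find_problems(records))
```

Flagged rows should be checked against the original questionnaires rather than corrected by guesswork.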
Data Cleaning (cont.)
• “Double entry” – i.e., entering the data twice and then
comparing both entries for discrepancies
• Univariate data analysis is a useful way to check the quality of
the data
Univariate Data Analysis
• Univariate data analysis explores each variable in a
data set separately:
– Serves as a good method to check the quality of the data
– Inconsistencies or unexpected results should be
investigated using the original data as the reference point
• Frequencies (percentages) can tell you if many study
participants share a characteristic of interest (age,
gender, etc.)
– Graphs and tables can be helpful
Univariate Data Analysis (cont.)
• Examining variables can give you important
information:
– Do all subjects have data, or are values missing?
– Are most values clumped together, or is there a lot of
variation?
– Are there outliers?
– Do the minimum and maximum values make sense, or
could there be mistakes in the coding?
Recap:
• All these descriptive statistics are univariate
(describe only one variable).
• Next week, we will discuss bivariate
descriptive analysis (2 variables involved).
Describing disease occurrence
Descriptive statistics in
qualitative data
Lecture 4
Use of descriptive statistics in quantitative data
• Calculations:
– Measures of central tendency (Mean, Mode & Median).
– Measures of dispersion (S.D., range).
– Correlation coefficient.
– Regression coefficient.
– Quintiles.
• Graphs:
– Histogram.
– Scatter plot.
Use of descriptive statistics in qualitative data
• Calculations:
– Proportions, rates & ratios.
• Tabulations:
– Frequency distribution tables.
– Cross tabs.
• Graphs:
– Bar graphs.
– Pie chart.
Proportion (percentage, frequency):
Proportion: a / (a + b)
The numerator a is included in the denominator (a + b)
No measurement unit
Ranges from 0 to 1
Often expressed as %
• Example: Of 7,999 females, 2,496 use modern contraceptive
methods.
• The proportion of those who use modern contraceptive methods
= 2,496 / 7,999 x 100 = 31.2%
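The arithmetic of the contraceptive-use example can be checked with a few lines. The function name `proportion` and the variable names are illustrative.

```python
def proportion(a, b):
    """Return a / (a + b): the numerator a is part of the denominator."""
    return a / (a + b)


users = 2496              # females using modern contraceptive methods
non_users = 7999 - users  # the remaining females in the group
p = proportion(users, non_users)
print(f"{p:.3f} = {100 * p:.1f}%")
```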
Prevalence rate:
Rate: a specific type of proportion
Prevalence rate: the proportion of a defined group or population
that has a clinical condition or outcome at a given point in time
– Prevalence rate = (Number of cases observed at time t) / (Total number of individuals at time t)
• ranges from 0 to 1 (it’s a proportion), but usually referred
to as a rate and is often shown as a %
Prevalence rate:
Example:
• Of 100 patients hospitalized with stroke, 18 had
Myocardial infarction (MI)
• Prevalence of MI among hospitalized stroke
patients = 18%
• The prevalence rate answers the question:
– “what fraction of the group is affected at this moment
in time?”
Incidence rate in population based data:
Incidence rate: (usually in clinical data)
Descriptive statistics of Categorical Data
• Distribution of categorical
variables should be
examined before more in-
depth analyses.
– Bar graph
[Figure: bar graph “Distribution of Area of Residence, Example Questionnaire Data”. x-axis: variable RESIDE (Midwest, Northeast, Northwest, South, Southwest); y-axis: number of people (0 to 30).]
Number of people answering example questionnaire who reside in 5 regions of the United States
Descriptive statistics of Categorical Data
• Another way to look at
the data is to list the data
categories in tables.
• Frequency distribution
table.
Table: Number of people answering sample questionnaire who reside in 5 regions of the United States
Region       Frequency    Percent
Midwest      16           20%
Northeast    13           16%
Northwest    19           24%
South        24           30%
Southwest    8            10%
Total        80           100%
Descriptive statistics of continuous
variable
Measures of Central Tendency
• Measures of central tendency yield information about the
center of the data.
• Common Measures of Location
–Mode
–Median
–Mean
© 2002 Thomson / South-Western Slide 3-35
Mean
• Is the average of a group of numbers.
• Applicable for continuous and discrete data, not applicable for
nominal or ordinal data.
• Affected by each value in the data set, including extreme
values.
• Computed by summing all values in the data set and dividing
the sum by the number of values in the data set.
Descriptive statistics
• Commonly used statistics with univariate analysis of
continuous variables:
– Mean – average of all values of this variable in the dataset
– Median – the middle of the distribution, the number
where half of the values are above and half are below
– Mode – the value that occurs the most times
– Range of values – from minimum value to maximum value
Statistics describing a continuous variable distribution
[Figure: Example Scatter Chart: Age; y-axis: age (in years), 0 to 90]
Values marked on the chart:
– 2 = Minimum
– 28 = Mode (occurs twice)
– 33 = Mean
– 36 = Median (50th percentile)
– 84 = Maximum (an outlier)
Median
• Middle value in an ordered array of numbers.
• Applicable for ordinal, interval, and ratio data.
• Unaffected by extremely large and extremely
small values.
Median: Computational Procedure
• First Procedure
– Arrange observations in an ordered array.
– If number of terms is odd, the median is the
middle term of the ordered array.
– If number of terms is even, the median is the
average of the middle two terms.
Median: Example
Ordered Array includes:
4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
• Median is 15.
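The computational procedure for the median (sort, then take the middle term or the average of the two middle terms) can be sketched directly. The function name `median_of` is illustrative; Python's standard library offers `statistics.median` for the same purpose.

```python
def median_of(values):
    """Median by the procedure above: sort, then pick the middle."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of terms: the middle term of the ordered array.
        return ordered[middle]
    # Even number of terms: the average of the two middle terms.
    return (ordered[middle - 1] + ordered[middle]) / 2


array = [4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21]
print(median_of(array))
```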
Measures of Central Tendency
Mean … the most frequently used but is
sensitive to extreme scores
e.g. 1 2 3 4 5 6 7 8 9 10
Mean = 5.5 (median = 5.5)
e.g. 1 2 3 4 5 6 7 8 9 20
Mean = 6.5 (median = 5.5)
e.g. 1 2 3 4 5 6 7 8 9 100
Mean = 14.5 (median = 5.5)
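The three examples can be reproduced with Python's standard `statistics` module to see the mean move while the median stays put. The variable name `datasets` is illustrative.

```python
import statistics

# The values 1..9 followed by a last value of 10, 20 or 100, as in the examples.
datasets = [list(range(1, 10)) + [last] for last in (10, 20, 100)]

for data in datasets:
    print(
        f"last value = {data[-1]:>3}  "
        f"mean = {statistics.mean(data):>4}  "
        f"median = {statistics.median(data)}"
    )
```

Only the last value changes between the three lists, so any change in the mean is due to that single extreme score.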
Quartiles: after sorting of data
[Figure: sorted data divided into four 25% segments by the cut points Q1, Q2 and Q3]
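A hedged sketch of computing quartiles with `statistics.quantiles` from the Python standard library (Python 3.8 or later). The data are the ordered array from the median example; note that software packages use slightly different quartile conventions, so Q1 and Q3 may differ a little between tools.

```python
import statistics

values = [4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21]

# n=4 asks for the three cut points that split the sorted data into quarters.
# The default method is "exclusive"; other tools may use another convention.
q1, q2, q3 = statistics.quantiles(values, n=4)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```

Q2 is the median, so it should agree with the value found by the median procedure.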
Measures of Variability
• Measures of variability describe the spread or
the dispersion of a set of data.
• Common Measures of Variability
–Standard Deviation
Variation: Standard Deviation
[Figures: “Example Scatter Chart 1: Age” and “Example Scatter Chart 2: Age”; y-axis: age (in years), 0 to 90]
• Figure left: narrowly distributed age values (SD = 7.6) , mean=33
• Figure right: widely distributed age values (SD = 20.4), mean=33
Variability
[Figure: two panels, “No Variability in Cash Flow” and “Variability in Cash Flow”, each with the mean marked]
Standard Deviation: measures of variation
• Square root of the
sample variance
Worked example (sample mean $\bar{X} = 7{,}092 / 4 = 1{,}773$):

X          X − X̄      (X − X̄)²
2,398       625       390,625
1,844        71         5,041
1,539      −234        54,756
1,311      −462       213,444
7,092         0       663,866   (totals)

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$

$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$
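The sample standard deviation for the four values in the worked example (2,398; 1,844; 1,539; 1,311) can be checked step by step. The function name `sample_std` is illustrative; `statistics.stdev` from the standard library does the same calculation.

```python
import math
import statistics

values = [2398, 1844, 1539, 1311]


def sample_std(data):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)


print(f"S = {sample_std(values):.2f}")
print(f"statistics.stdev gives {statistics.stdev(values):.2f}")
```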
Graphs to describe a numerical variable
Histogram (only for a numerical variable)
• Divide measurement up into equal-sized
categories.
• Determine number of measurements falling
into each category.
• Draw a bar for each category so bars’
heights represent number (or percent)
falling into the categories.
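The three histogram steps (equal-sized categories, counting, drawing bars) can be sketched in plain Python with a text-based plot. The ages below are made up for illustration, and the function name `histogram_counts` is hypothetical.

```python
def histogram_counts(values, low, width, n_bins):
    """Count values in n_bins equal-width bins starting at `low`.

    Bin i covers the half-open range [low + i*width, low + (i+1)*width).
    Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


ages = [2, 19, 23, 28, 28, 31, 36, 38, 41, 45, 52, 84]  # made-up ages
counts = histogram_counts(ages, low=0, width=10, n_bins=9)

# One bar per category; bar length represents the number of measurements.
for i, c in enumerate(counts):
    print(f"{i * 10:>2}-{i * 10 + 9:<2} {'#' * c}")
```

Changing `width` shows the effect of having too few or too many categories.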
Histograms
Graph that uses
bars to show
frequencies or
percentage of a
possible outcome.
Too few categories
[Figure: histogram “Age of Spring 1998 Stat 250 Students”, n=92 students; x-axis: age (in years)]
Too many categories
[Figure: histogram “GPAs of Spring 1998 Stat 250 Students”, n=92 students; x-axis: GPA; y-axis: frequency (count)]
Good histogram
Histogram with normal curve
Normal distribution of a continuous variable
Why is the normal distribution
important?
• The answer comes next week.
Inferential statistics
Confidence Interval of Mean
Properties of the Normal Distribution
Empirical Rule
• Bell-shaped density function.
• Symmetric around the mean.
• Mean = Median = Mode
• 68% of the area under the curve lies between μ ± σ.
• 95% of the area under the curve lies between μ ± 2σ.
• 99.7% of the area under the curve lies between μ ± 3σ.
[Figure: normal curve in standard normal form, with the .68 area between μ − σ and μ + σ and the .95 area between μ − 2σ and μ + 2σ]
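The empirical rule can be verified numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The function name `area_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} s.d.: {100 * area_within(k):.1f}%")
```

The 95% figure for two standard deviations is a rounding of about 95.4%; exactly 95% corresponds to 1.96 standard deviations, which is the multiplier used in confidence intervals.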
Estimation
• Estimation is one of the main purposes of
statistics.
• The basic idea is that we take a sample of data
and use it to make inferences about the
population of interest.
Important distrbutions 59
Estimation
• Estimation involves the calculation of confidence
intervals for some statistic (For ex. a mean or
proportion)
Example I
• What is the complication rate of heart surgery in KFH
hospital?
• Using 3 years of data from KFH , a sample of 52
patients who had a heart surgery was selected; of these,
4 patients had a complication.
• 7.7% complication rate (95% Confidence Interval = 2.5%
to 12.5%)
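One common way to attach a 95% confidence interval to a proportion is the normal-approximation (Wald) formula, p ± 1.96 × √(p(1 − p)/n). The sketch below applies it to 4 complications among 52 patients. The slide does not state which method produced its interval, so this is an assumption, and the result (roughly 0.4% to 14.9%) need not match the interval quoted above. The function name `wald_interval` is illustrative.

```python
import math


def wald_interval(events, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = events / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, p - z * standard_error, p + z * standard_error


p, lower, upper = wald_interval(events=4, n=52)
print(f"rate = {100 * p:.1f}%, 95% CI = {100 * lower:.1f}% to {100 * upper:.1f}%")
```

With only 4 events the normal approximation is rough; other interval methods give somewhat different limits.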
Confidence interval
• Interpretation of 95% confidence interval:
Based on our sample data,
“we are 95% confident that the "true"
complication rate at KFH is between 2.5% and
12.5%.”
Advantages of using confidence intervals:
• (1) Confidence intervals remind us that study estimates
have variability (i.e. the width of the CI).
• (2) Confidence intervals show clearly the role that sample
size plays in the estimation.
– Large sample size = Narrow confidence limits
– Small sample size = Wide confidence limits
Calculation of confidence interval of the mean
1. Compute the standard error of the mean (SE = SD / √n).
2. Add 2 SE to the mean and subtract 2 SE from it to form the
interval (from F to Q)
95% Confidence Interval
Formula in English:
Estimate ± (1.96 × standard error)
Example
• A random sample of 16 students reported
having an average age of 31 with a
standard deviation of 6 years.
• In what range of values can we be 95%
confident that the population mean age lies?
Example
• 95% confidence interval = mean ± 1.96 × (SD / √n)
• C.I. = 31 ± 1.96 × (6/4) = 31 ± 2.94 ≈ 31 ± 3
• C.I. = (31 − 3 to 31 + 3) = (28 to 34) years old
• Interpretation???
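The student-age example can be reproduced with a short function that follows the formula Estimate ± (1.96 × standard error). The function name `mean_confidence_interval` is illustrative. The multiplier 1.96 is the one used in the lecture; for a sample as small as 16, many texts would use a slightly larger t-based multiplier instead.

```python
import math


def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) limits of mean ± z × (sd / √n)."""
    standard_error = sd / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin


lower, upper = mean_confidence_interval(mean=31, sd=6, n=16)
print(f"95% CI: {lower:.2f} to {upper:.2f} years")
```

Unrounded, the limits are 28.06 and 33.94, which the slide rounds to 28 and 34.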
Length of Confidence Interval
• We want confidence interval to be as
narrow as possible.
• Length = Upper Limit - Lower Limit
How is the length of the CI affected?
• As the standard deviation decreases…
• As we decrease the confidence level…
• As we increase sample size …
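A small sketch for exploring these three questions, assuming the interval mean ± z × (SD / √n), so that Length = 2 × z × SD / √n. The function name `ci_length` and the baseline values (SD = 6, n = 16, 95% confidence) are illustrative.

```python
import math
from statistics import NormalDist


def ci_length(sd, n, confidence=0.95):
    """Upper limit minus lower limit of mean ± z × (sd / √n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return 2 * z * sd / math.sqrt(n)


baseline = ci_length(sd=6, n=16)
print(f"baseline (SD=6, n=16, 95%): {baseline:.2f}")
print(f"smaller SD (SD=3):          {ci_length(sd=3, n=16):.2f}")
print(f"lower confidence (90%):     {ci_length(sd=6, n=16, confidence=0.90):.2f}")
print(f"larger sample (n=64):       {ci_length(sd=6, n=64):.2f}")
```

Each of the three changes shortens the interval; because of the square root, quadrupling the sample size only halves the length.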
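Population: mean = μ, s.d. = σ
Sample: size n, mean = x̄
[Figure: sampling distribution of x̄ centred on μ, with limits at μ − 2σ/√n and μ + 2σ/√n]
There is a 95% chance that x̄ will fall inside the interval
$$\mu \pm 2\,\frac{\sigma}{\sqrt{n}}$$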
ND
• Questions

Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in munirka  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in munirka DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Call Girl Nagpur Sia 7001305949 Independent Escort Service Nagpur
Call Girl Nagpur Sia 7001305949 Independent Escort Service NagpurCall Girl Nagpur Sia 7001305949 Independent Escort Service Nagpur
Call Girl Nagpur Sia 7001305949 Independent Escort Service Nagpur
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
Pharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingPharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, Pricing
 
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Kanakapura Road Just Call 7001305949 Top Class Call Girl Service A...
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
 
Noida Sector 135 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few C...
Noida Sector 135 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few C...Noida Sector 135 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few C...
Noida Sector 135 Call Girls ( 9873940964 ) Book Hot And Sexy Girls In A Few C...
 

Week 2: Measures of disease occurrence

  • 1. Week 2: Measures of disease occurrence and related statistics Dr. Hamdi Alhakimi MD, MPH, M- epidemiology
  • 2. Goals • Describe the steps of descriptive data analysis • Be able to define variables • Understand basic coding principles • Learn simple descriptive data analysis • Learn simple inferential statistics
  • 6. Descriptive Statistics • Calculations: – Proportions, rates & ratios – Measures of central tendency (mean, mode & median) – Measures of dispersion (SD, range) – Quintiles • Tabulations: – Frequency distribution tables – Cross tabs • Graphs: – Bar graphs – Pie chart – Histogram – Scatter plot
  • 7. Types of Variables • (Quantitative) Numerical variables: – Always numbers – Examples: age in years, weight, blood pressure readings, temperature, concentrations of pollutants, counts of cases per week, or any other measurements • (Qualitative) Categorical variables: – Information that can be sorted into categories – Types of categorical variables: ordinal, nominal and dichotomous (binary)
  • 8. Categorical Variables: Ordinal Variables • Ordinal variable – a categorical variable with some intrinsic order • Examples of ordinal variables: – Education (illiterate, HS degree, some college, college degree) – Agreement (strongly disagree, disagree, neutral, agree, strongly agree) – Rating (excellent, good, fair, poor) – Frequency (always, often, sometimes, never) – Any other scale (“On a scale of 1 to 5...”)
  • 9. Categorical Variables: Nominal Variables • Nominal variable – a categorical variable without an intrinsic order • Examples of nominal variables: – Where a person lives in the U.S. (Northeast, South, Midwest, etc.) – Nationality (American, Mexican, French) – Race/ethnicity (African American, Hispanic, White, Asian American) – Favorite pet (dog, cat, fish, snake)
  • 10. Categorical Variables: Dichotomous Variables • Dichotomous (or binary) variables – a categorical variable with only 2 levels of categories – Often represents the answer to a yes or no question • For example: – “Did you attend the church on May 24?” Yes /No – “Did you eat potato salad ?” Yes/No – Anything with only 2 categories – Gender (male, female)
  • 11. Coding • Coding – the process of translating information gathered from questionnaires or other sources into something that can be analyzed • Involves assigning a value to the information given; the value is often given a label • Coding can make data more consistent: – Example: Question = Gender; Answers = Male, Female, M, or F -> coded as (0, 1)
  • 12. Coding Systems • Common coding systems (code and label) for dichotomous variables: – 0=No 1=Yes (1 = value assigned, Yes= label of value) – OR: 1=No 2=Yes • When you assign a value you must also make it clear what that value means – As long as it is clear how the data are coded, either is fine • You can make it clear by creating a data dictionary to accompany the dataset
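The coding step and its data dictionary can be sketched in a few lines of Python. This is only an illustration: the raw answers, the variable name `SEX_LABELS`, and the helper `code_sex` are assumptions made for the example, not part of the slides.

```python
# Hypothetical raw answers to the Gender question, as they might be typed in.
raw_answers = ["Male", "F", "female", "M", "Female"]

# Data dictionary for the variable: value -> label (illustrative choice of codes).
SEX_LABELS = {0: "Male", 1: "Female"}


def code_sex(answer):
    """Translate a free-text gender answer into its numeric code."""
    first_letter = answer.strip().upper()[0]
    return {"M": 0, "F": 1}[first_letter]


coded = [code_sex(a) for a in raw_answers]   # consistent numeric values
labels = [SEX_LABELS[c] for c in coded]      # labels recovered from the dictionary

print(coded)
print(labels)
```

Keeping the dictionary next to the data means either coding scheme (0/1 or 1/2) stays unambiguous.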
  • 13. Coding: Attaching Labels to Values • Many analysis software packages allow you to attach a label to the variable values. Example: label 0’s as male and 1’s as female • Makes reading data output easier:
      Without label:
      Variable SEX   Frequency   Percent
      0              21          60%
      1              14          40%
      With label:
      Variable SEX   Frequency   Percent
      Male           21          60%
      Female         14          40%
  • 14. Coding- Ordinal Variables • Coding process is similar with other categorical variables • Example: variable EDUCATION, possible coding: 0 = Did not graduate from high school 1 = High school graduate 2 = Some college or post-high school education 3 = College graduate • Could be coded in reverse order (0=college graduate, 3=did not graduate high school). • For this ordinal categorical variable we want to be consistent with numbering because the value of the code assigned has significance.
  • 15. Coding: Nominal Variables • For coding nominal variables, order makes no difference • Example: variable RESIDE 1 = Northeast 2 = South 3 = Northwest 4 = Midwest 5 = Southwest • Order does not matter, no ordered value associated with each response
  • 16. Coding: Continuous Variables • Creating categories from a continuous variable (age) is common • Example: variable = AGE_CAT – Children = 0–9 years old – Teenagers = 10–19 years old – Young adults = 20–39 years old – Middle aged = 40–59 years old – Elderly = 60 years or older
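A minimal sketch of how the AGE_CAT categories could be derived from a continuous age value. The function name `age_category` is a hypothetical helper; the cut-offs are the ones listed on the slide.

```python
def age_category(age):
    """Map an age in whole years to the AGE_CAT category from the slide."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 9:
        return "Children"
    if age <= 19:
        return "Teenagers"
    if age <= 39:
        return "Young adults"
    if age <= 59:
        return "Middle aged"
    return "Elderly"


# Hypothetical ages, one from each category.
for age in (4, 15, 28, 47, 72):
    print(age, age_category(age))
```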
  • 17. Data Cleaning • One of the first steps in analyzing data is to “clean” it of any obvious data entry errors: – Outliers? (really high or low numbers) Example: Age = 110 (really 10 or 11?) – Value entered that doesn’t exist for variable? Example: 2 entered where 1=male, 0=female – Missing values? Did the person not give an answer? Was answer accidentally not entered into the database?
  • 18. Data Cleaning (cont.) • “Double-entry” – i.e., entering the data twice and then comparing both entries for discrepancies • Univariate data analysis is a useful way to check the quality of the data
  • 19. Univariate Data Analysis • Univariate data analysis explores each variable in a data set separately: – Serves as a good method to check the quality of the data – Inconsistencies or unexpected results should be investigated using the original data as the reference point • Frequencies (percentages) can tell you if many study participants share a characteristic of interest (age, gender, etc.) – Graphs and tables can be helpful
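The three cleaning checks (outliers, values that do not exist for the variable, missing values) might be automated along these lines. The records, the plausible age range, and the helper `find_problems` are all assumptions chosen for illustration.

```python
# Hypothetical records; None stands for a missing value.
records = [
    {"id": 1, "age": 34, "sex": 0},
    {"id": 2, "age": 110, "sex": 1},   # suspicious age
    {"id": 3, "age": 27, "sex": 2},    # code 2 is not defined for sex
    {"id": 4, "age": None, "sex": 1},  # age missing
]

VALID_SEX = {0, 1}     # codes defined in the data dictionary
AGE_RANGE = (0, 100)   # assumed plausible range, not a rule from the slides


def find_problems(record):
    """Return a list of data-entry problems found in one record."""
    problems = []
    for field in ("age", "sex"):
        if record[field] is None:
            problems.append(f"{field} missing")
    age = record["age"]
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        problems.append("age out of range")
    sex = record["sex"]
    if sex is not None and sex not in VALID_SEX:
        problems.append("sex code not in dictionary")
    return problems


flagged = {r["id"]: find_problems(r) for r in records if find_problems(r)}
print(flagged)
```

Flagged records would then be checked against the original questionnaires rather than corrected by guesswork.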
  • 20. Univariate Data Analysis (cont.) • Examining variables can give you important information: – Do all subjects have data, or are values missing? – Are most values clumped together, or is there a lot of variation? – Are there outliers? – Do the minimum and maximum values make sense, or could there be mistakes in the coding?
  • 21. Recap: • All these descriptive statistics are univariate (describe only one variable). • Next week, we will discuss bivariate descriptive analysis (2 variables involved).
  • 24. Use of descriptive statistics in quantitative data • Calculations: – Measures of central tendency (mean, mode & median) – Measures of dispersion (SD, range) – Correlation coefficient – Regression coefficient – Quintiles • Graphs: – Histogram – Scatter plot
  • 25. Use of descriptive statistics in qualitative data • Calculations: – Proportions, rates & ratios • Tabulations: – Frequency distribution tables – Cross tabs • Graphs: – Bar graphs – Pie chart
  • 26. Proportion (percentage, frequency): • Proportion = a / (a + b): the numerator a is included in the denominator (a + b) • No measurement unit • Ranges from 0 to 1 • Often expressed as % • Example: Of 7,999 females, 2,496 use modern contraceptive methods. • The proportion of those who use modern contraceptive methods = 2,496 / 7,999 x 100 = 31.2%
  • 28. Prevalence rate: • Rate: a specific type of proportion • Prevalence rate: the proportion of a defined group or population that has a clinical condition or outcome at a given point in time – Prevalence rate = (Number of cases observed at time t) / (Total number of individuals at time t) • Ranges from 0 to 1 (it’s a proportion), but is usually referred to as a rate and is often shown as a %
  • 29. Prevalence rate: Example: • Of 100 patients hospitalized with stroke, 18 had myocardial infarction (MI) • Prevalence of MI among hospitalized stroke patients = 18% • The prevalence rate answers the question: – “What fraction of the group is affected at this moment in time?”
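The contraceptive-use example can be reproduced with a short calculation. The function name `proportion` and the variable names are illustrative; the counts are the ones on the slide.

```python
def proportion(a, b):
    """Proportion a / (a + b): the numerator a is part of the denominator."""
    return a / (a + b)


users = 2496               # females using modern contraceptive methods
non_users = 7999 - users   # the remaining females in the group

p = proportion(users, non_users)
print(f"{p:.1%}")
```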
  • 30. Incidence rate in population-based data:
  • 31. Incidence rate: (usually in clinical data)
  • 32. Descriptive statistics of Categorical Data • Distribution of categorical variables should be examined before more in-depth analyses. – Bar graph [Figure: bar graph “Distribution of Area of Residence, Example Questionnaire Data”; x-axis: variable RESIDE (Midwest, Northeast, Northwest, South, Southwest); y-axis: Number of People. Caption: Number of people answering example questionnaire who reside in 5 regions of the United States]
  • 33. Descriptive statistics of Categorical Data • Another way to look at the data is to list the data categories in tables. • Frequency distribution table.
      Table: Number of people answering sample questionnaire who reside in 5 regions of the United States
      Region      Frequency   Percent
      Midwest     16          20%
      Northeast   13          16%
      Northwest   19          24%
      South       24          30%
      Southwest   8           10%
      Total       80          100%
  • 34. Descriptive statistics of continuous variable
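A frequency distribution table like the one above could be produced from raw responses as follows. The list `responses` is reconstructed from the counts in the table purely for illustration; a real dataset would hold one answer per participant.

```python
from collections import Counter

# Illustrative raw data rebuilt from the counts in the table.
responses = (
    ["Midwest"] * 16 + ["Northeast"] * 13 + ["Northwest"] * 19
    + ["South"] * 24 + ["Southwest"] * 8
)

counts = Counter(responses)
total = sum(counts.values())

# Region -> (frequency, percent rounded to a whole number)
table = {region: (n, round(100 * n / total)) for region, n in sorted(counts.items())}

for region, (n, pct) in table.items():
    print(f"{region:<10} {n:>3} {pct:>3}%")
print(f"{'Total':<10} {total:>3} 100%")
```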
  • 35. Measures of Central Tendency • Measures of central tendency yield information about the center of the data. • Common Measures of Location –Mode –Median –Mean © 2002 Thomson / South-Western Slide 3-35
  • 36. Mean • Is the average of a group of numbers. • Applicable for continuous and discrete data, not applicable for nominal or ordinal data. • Affected by each value in the data set, including extreme values. • Computed by summing all values in the data set and dividing the sum by the number of values in the data set.
  • 37. Descriptive statistics • Commonly used statistics with univariate analysis of continuous variables: – Mean – average of all values of this variable in the dataset – Median – the middle of the distribution, the number where half of the values are above and half are below – Mode – the value that occurs the most times – Range of values – from minimum value to maximum value
  • 38. Statistics describing a continuous variable distribution [Figure: Example Scatter Chart: Age; axis: Age (in years)] • 2 = Minimum • 28 = Mode (occurs twice) • 33 = Mean • 36 = Median (50th percentile) • 84 = Maximum (an outlier)
  • 39. Median • Middle value in an ordered array of numbers. • Applicable for ordinal, interval, and ratio data. • Unaffected by extremely large and extremely small values.
  • 40. Median: Computational Procedure • First Procedure – Arrange observations in an ordered array. – If number of terms is odd, the median is the middle term of the ordered array. – If number of terms is even, the median is the average of the middle two terms.
  • 41. Median: Example Ordered Array includes: 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 • Median is 15.
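The computational procedure for the median translates directly into code. This is a sketch of that procedure; the function name `median_of` is an assumption, and the data are the ordered array from the example.

```python
def median_of(values):
    """Median by the procedure above: sort, then take the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: the middle term
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of middle two


data = [4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21]
print(median_of(data))
```

With 15 observations the middle term is the 8th value of the ordered array.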
  • 42. Measures of Central Tendency • Mean: the most frequently used, but sensitive to extreme scores – e.g. 1 2 3 4 5 6 7 8 9 10: mean = 5.5 (median = 5.5) – e.g. 1 2 3 4 5 6 7 8 9 20: mean = 6.5 (median = 5.5) – e.g. 1 2 3 4 5 6 7 8 9 100: mean = 14.5 (median = 5.5)
  • 43. Quartiles (after sorting the data): Q1, Q2 and Q3 divide the sorted data into four parts, each containing 25% of the observations.
  • 44. Measures of Variability • Measures of variability describe the spread or the dispersion of a set of data. • Common Measures of Variability – Standard Deviation
  • 45. Variation: Standard Deviation [Figure: Example Scatter Chart 1 and Example Scatter Chart 2; axis: Age (in years)] • Figure left: narrowly distributed age values (SD = 7.6), mean = 33 • Figure right: widely distributed age values (SD = 20.4), mean = 33
  • 46. Variability [Figure: two panels, “No Variability in Cash Flow” and “Variability in Cash Flow”, each with the mean marked]
  • 47. Standard Deviation: measures of variation • Square root of the sample variance
      X        X − X̄    (X − X̄)²
      2,398    625      390,625
      1,844    71       5,041
      1,539    −234     54,756
      1,311    −462     213,444
      7,092    0        663,866   (column totals)
      \[ S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67 \]
      \[ S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41 \]
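The sensitivity of the mean to one extreme score can be checked with Python's standard `statistics` module. The three datasets are the ones on the slide; the variable names are illustrative.

```python
from statistics import mean, median

# The same nine values with a different last value each time.
datasets = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 20],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
]

# One (mean, median) pair per dataset.
summary = [(mean(d), median(d)) for d in datasets]

for d, (m, med) in zip(datasets, summary):
    print(f"last value {d[-1]:>3}: mean = {m}, median = {med}")
```

Only the mean moves as the last value grows, which is why the median is often preferred for skewed data.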
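The standard deviation example can be reproduced step by step. This is a sketch using the four X values from the slide; the variable names are assumptions.

```python
from math import sqrt

x = [2398, 1844, 1539, 1311]   # the four observations from the slide
n = len(x)

mean_x = sum(x) / n                       # sample mean
deviations = [xi - mean_x for xi in x]    # X minus the mean
sum_sq = sum(d ** 2 for d in deviations)  # sum of squared deviations

variance = sum_sq / (n - 1)   # sample variance uses n - 1
sd = sqrt(variance)           # standard deviation

print(mean_x, sum_sq, round(variance, 2), round(sd, 2))
```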
  • 48. Graphs to describe a numerical variable
  • 49. Histogram (only for a numerical variable) • Divide measurement up into equal-sized categories. • Determine number of measurements falling into each category. • Draw a bar for each category so bars’ heights represent number (or percent) falling into the categories.
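The three histogram steps (equal-sized categories, counting, drawing bars) can be sketched without any plotting library. The ages below are hypothetical values, and `histogram_counts` is an illustrative helper, not a library function.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width categories."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:   # values outside the range are ignored
            counts[index] += 1
    return counts


# Hypothetical ages in years.
ages = [2, 15, 21, 28, 28, 33, 36, 41, 45, 52, 84]
counts = histogram_counts(ages, start=0, width=10, n_bins=9)

# Text bars: bar length represents the number of values in the category.
for i, count in enumerate(counts):
    low = i * 10
    print(f"{low:>2}-{low + 9:<2} {'#' * count}")
```

Changing `width` shows the trade-off between too few and too many categories.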
  • 50. Histograms • A graph that uses bars to show the frequency or percentage of each possible outcome.
  • 51. Too few categories [Figure: histogram “Age of Spring 1998 Stat 250 Students”, n = 92 students; x-axis: Age (in years)]
  • 52. Too many categories [Figure: histogram “GPAs of Spring 1998 Stat 250 Students”, n = 92 students; x-axis: GPA; y-axis: Frequency (Count)]
  • 55. Normal distribution of a continuous variable
  • 56. Why is the normal distribution important? • The answer comes next week.
  • 58. Properties of the Normal Distribution (Empirical Rule) • Bell-shaped density function • Symmetric around the mean • Mean = Median = Mode • 68% of the area under the curve lies within μ ± σ • 95% of the area under the curve lies within μ ± 2σ • 99.7% of the area under the curve lies within μ ± 3σ [Figure: standard normal curve with the areas .68 and .95 marked between μ − σ and μ + σ, and between μ − 2σ and μ + 2σ]
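The empirical rule percentages can be verified with `statistics.NormalDist` from the Python standard library. The helper `area_within` is an illustrative name; the check uses the standard normal distribution (mean 0, standard deviation 1), which gives the same areas as any normal distribution measured in standard deviations.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.1%}")
```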
  • 59. Estimation • Estimation is one of the main purposes of statistics. • The basic idea is that we take a sample of data and use it to make inferences about the population of interest. Important distributions 59
  • 60. Estimation • Estimation involves the calculation of confidence intervals for some statistic (e.g., a mean or proportion)
  • 61. Example I • What is the complication rate of heart surgery at KFH hospital? • Using 3 years of data from KFH, a sample of 52 patients who had heart surgery was selected; of these, 4 patients had a complication. • 7.7% complication rate (95% Confidence Interval = 2.5% to 12.5%)
  • 62. Confidence interval • Interpretation of 95% confidence interval: Based on our sample data, “we are 95% confident that the "true" complication rate at KFH is between 2.5% and 12.5%.”
  • 63. Advantages of using confidence intervals: • (1) Confidence intervals remind us that study estimates have variability (i.e. the width of the CI). • (2) Confidence intervals show clearly the role that sample size plays in the estimation. – Large sample size = Narrow confidence limits – Small sample size = Wide confidence limits
  • 64. Calculation of confidence interval of the mean • 1. Compute the standard error of the mean. • 2. Add and subtract 2 SE to the mean to formulate the interval (from F to Q)
  • 65. 95% Confidence Interval Formula in English: Estimate ± (1.96 × standard error)
  • 66. Example • A random sample of 16 students reported having an average age of 31 with a standard deviation of 6 years. • In what range of values can we be 95%
  • 67. Example • 95% confidence interval: • C.I. = 31 ± 1.96 × (6/4) = 31 ± 2.94 ≈ 31 ± 3 • C.I. = (31 − 3 to 31 + 3) = (28 to 34) years old • Interpretation???
  • 68. Length of Confidence Interval • We want the confidence interval to be as narrow as possible. • Length = Upper Limit − Lower Limit
  • 69. How is the length of the CI affected? • As the standard deviation decreases… • As we decrease the confidence level… • As we increase sample size…
  • 70. Population (normal distribution): mean = μ, s.d. = σ • Sample: mean = x̄ • There is a 95% chance that x̄ will fall inside the interval μ ± 2σ/√n
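The student-age example follows the formula Estimate ± (1.96 × standard error), with the standard error of the mean taken as SD / √n. A minimal sketch, where the function name `mean_ci_95` is an assumption:

```python
from math import sqrt


def mean_ci_95(sample_mean, sd, n, z=1.96):
    """95% confidence interval for a mean: estimate +/- z * (sd / sqrt(n))."""
    standard_error = sd / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# 16 students, average age 31, standard deviation 6 years.
low, high = mean_ci_95(31, 6, 16)
print(f"{low:.2f} to {high:.2f}")
```

Rounded to whole years this gives the interval of 28 to 34 shown in the example.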
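One way to explore these three questions is to compute the interval length directly. The sketch below assumes the normal-approximation interval for a mean (length = 2 × z × SD / √n); the function name `ci_length` and the comparison values are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def ci_length(sd, n, confidence=0.95):
    """Length (upper limit minus lower limit) of a normal-based CI for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return 2 * z * sd / sqrt(n)


baseline = ci_length(sd=6, n=16)
print("baseline:           ", round(baseline, 2))
print("smaller SD (3):     ", round(ci_length(sd=3, n=16), 2))
print("lower confidence:   ", round(ci_length(sd=6, n=16, confidence=0.90), 2))
print("larger sample (64): ", round(ci_length(sd=6, n=64), 2))
```

Under this assumption each of the three changes makes the interval narrower.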