Week 1
Part 1 
Introduction to the Course 
The Nature of Data
Why Statistics? 
• Evidence-based practice! 
• Research provides evidence for changes 
in nursing/medical practice 
– Away from “that’s the way it has always been 
done”
Integral to Research 
• Question (hypothesis) 
• Design 
• Data collection 
• Analysis 
• Answer to question 
– And often more questions asked!
Data 
• Data: 
– Factual information, especially information 
organized for analysis or used to reason or 
make decisions; a fact or proposition used to 
draw a conclusion or make a decision 
• The American Heritage® Dictionary of the English Language, Fourth Edition 
Copyright © 2000 by Houghton Mifflin Company. 
– Datum: Singular of data 
• an item of factual information derived from 
measurement or research
Two Types of Data 
• Qualitative 
– Non-numeric or narrative information 
• Example: transcripts of interviews 
• May be “scored” to be made quantitative
• Quantitative 
– Numeric or quantifiable information 
• Example: weights of kindergartners
Variable 
• A quantity capable of assuming a set of 
values 
• A characteristic or attribute of a person, 
object, etc that varies within a population 
under study 
• Examples: 
– Body temperature, BP, DOB, ABG, weight
Independent and Dependent 
• Independent 
– The variable assumed to influence the 
outcome 
• It is independent of the outcome 
– In research, the manipulated variable 
• Dependent 
– The outcome variable of interest 
– In research, value assumed to be dependent 
on the independent variable (by hypothesis)
Independent and Dependent 
• Examples: 
– What is the effect of smoking on the incidence 
of lung cancer? 
– Does high fiber diet reduce the risk of colon 
cancer? 
– Does AZT help prevent maternal transmission 
of HIV?
Discrete vs Continuous 
• Discrete variable: has a finite number of 
values between two points 
• Continuous variable: has, in theory, an 
infinite number of values between two 
points
Discrete vs Continuous 
• Examples: 
– Number of children 
– Body temperature 
– Hospital readmissions 
– Chemotherapy sessions 
– Body weight 
– DOB
Measurement 
• The assignment of numbers to objects 
according to specified rules to 
characterize quantities of some attribute
Measurement Rules 
• Common/familiar/accepted 
– Temperature, weight, height 
• Researcher designed 
– Particularly for new materials/ideas 
• Coding 
– The process of transforming raw data into 
standardized form for processing and analysis
Advantages of Measurement 
• Objectivity 
– Objective measure can be independently 
verified by other researchers 
• Precision 
– Quantitative measures allow for reasonable 
precision 
• Communication 
– Facilitates communication of data and 
research
Levels of Measurement/ 
Types of Variables 
• Nominal 
• Ordinal 
• Interval 
• Ratio
Nominal Measurement/Variable 
• Nominal = Named 
• Lowest level 
• Assignment of characteristics into 
categories 
– Simply putting into boxes with no meaning of 
where the boxes fall in a line 
• Examples 
– Gender, marital status
Ordinal Measurement/Variable 
• Ordinal=Order 
• Next in the hierarchy of measurement 
• Involves rank order of variable along some 
dimension 
• Examples 
– School grades 
– Clinical nursing levels
Interval Measurement/Variable 
• Interval=equal distances 
• Attribute is rank-ordered on a scale that 
has equal distances between points on 
that scale 
• Examples 
– Temperature
Ratio Measurement/Variables 
• Equal distances between score units and 
which has a true, meaningful zero point 
– A true ratio can be calculated 
• The highest level of measurement 
• Examples 
– Weight 
– Pulse
Why care about type of 
measurement/variable? 
• Statistical tests are/have been developed 
to work and provide meaningful analysis 
for specific types of measurement and 
variable 
• The tests you choose to run should be 
based, in part, on the type of variables 
with which you work
Which measurement? 
• A single variable may be measurable 
using different types of measurement 
• Rule of Thumb: use the highest level of 
measurement possible 
– Higher levels provide more information 
– Higher levels can be analyzed with more 
powerful statistical tools
Data Analysis 
• Data starts out “raw” 
– unanalyzed 
• Processing 
– Coding, if appropriate 
– Data entry 
• Into database or matrix 
– Cleaning 
• Finding and correcting (if possible) errors in entry 
and coding 
– Analysis
Sample vs Population 
• Sample 
– A subset of a population 
– Ideally selected to be representative of the 
population 
• Population 
– The entire set of individuals (objects, units, 
etc) having common characteristics
Two Types of Statistics 
• Descriptive 
– Used to describe and summarize data set 
– Allows us to describe, compare, determine a 
relationship 
– Usually straightforward - %, averages, etc 
• Inferential 
– Permit us to infer whether a relationship 
observed in a sample is likely to occur in the 
population of concern 
– Are relationships “real”?
Uses of Inferential Statistics 
• Draw conclusions about a single variable 
in a population 
• Evaluate relationships between variables 
in populations 
• Are the relationships “real”?
Inferential Stats: Relationships 
• Existence 
– Is there a relationship between X and Y? 
• Magnitude 
– How strong is the relationship between X and 
Y? 
• Nature 
– What type of relationship is there between X 
and Y?
Number of variables… 
• “Univariate” 
– One variable being described 
• “Bivariate” 
– Two variables being compared 
• NOTE: in epidemiology, this is also known as 
“univariate” 
• Multivariate
– More than two variables being compared 
• Different statistical tests for each
Purposes of Data Analysis 
• In research all usually get done to some 
extent 
– Clean data 
– Sample description 
– Assessment of bias 
– Evaluation of tools used to collect data 
– Evaluation of need for data transformations 
– Address the research question
Describing the Data Set 
• Organize the data 
• Examine the patterns of distribution 
• Describe patterns of distribution 
• Assess the variability of the data
Simplest Distribution: 
The Frequency Distribution 
• Lists categories of scores or values as 
well as counts of the number of each 
score or value 
– List and tally 
– By computer 
• Enter data 
• Run “frequency”
Two Kinds of Frequency 
• Absolute 
– Number of times a score occurs 
– Symbol: f 
• Relative 
– Proportion of times a score occurs 
– Most commonly percent 
• % = (f/N) X 100 
– f=frequency, N=sum of all frequencies
Frequency Example: 
Blood Pressure (mm Hg) Readings in 
an Anti-Hypertensive Trial – 
Raw Data 
166 160 166 162 168 148 164 174 164 188 
176 170 166 172 168 172 150 190 164 150 
164 146 178 154 166 148 156 164 180 166 
172 170 180 156 162 176 184 166 174 158 
186 158 166 170 168 178 178 154 166 152 
168 160 168 166 152 160 170 146 186 176 
n=60
Frequency Distribution 
x     f     x     f     x     f
146   2     162   2     178   3
148   2     164   5     180   2
150   2     166   9     182   0
152   2     168   5     184   1
154   2     170   4     186   2
156   2     172   3     188   1
158   2     174   2     190   1
160   3     176   3     192   0
n=60
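The "list and tally" step can be sketched in a few lines of Python. This is only an illustration: the variable names are assumptions, and the readings are the 60 raw values from the raw-data slide.

```python
from collections import Counter

# The 60 blood pressure readings (mm Hg) from the raw-data slide.
readings = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

# Counter performs the tally: value -> absolute frequency f.
freq = Counter(readings)

for value in sorted(freq):
    print(value, freq[value])

# The frequencies must add back up to the number of cases.
print("n =", sum(freq.values()))
```

A value that never occurs (such as 182 here) simply has a count of zero.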
Relative Frequency (rf) Distribution 
x     rf      x     rf      x     rf
146   0.03    162   0.03    178   0.05
148   0.03    164   0.08    180   0.03
150   0.03    166   0.15    182   0.00
152   0.03    168   0.08    184   0.02
154   0.03    170   0.07    186   0.03
156   0.03    172   0.05    188   0.02
158   0.03    174   0.03    190   0.02
160   0.05    176   0.05    192   0.00
n=60
Cumulative Relative Frequency (Cf) 
Distribution 
x     Cf      x     Cf      x     Cf
146   0.03    162   0.32    178   0.88
148   0.07    164   0.40    180   0.92
150   0.10    166   0.55    182   0.92
152   0.13    168   0.63    184   0.93
154   0.17    170   0.70    186   0.97
156   0.20    172   0.75    188   0.98
158   0.23    174   0.78    190   1.00
160   0.28    176   0.83    192   1.00
n=60
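Relative and cumulative relative frequencies follow directly from the absolute counts via rf = f/N. The sketch below assumes the counts tallied from the 60 raw readings; the names `freq`, `rel` and `cum` are illustrative only.

```python
from itertools import accumulate

# Absolute frequencies (reading in mm Hg -> f), tallied from the 60 raw readings.
freq = {
    146: 2, 148: 2, 150: 2, 152: 2, 154: 2, 156: 2, 158: 2, 160: 3,
    162: 2, 164: 5, 166: 9, 168: 5, 170: 4, 172: 3, 174: 2, 176: 3,
    178: 3, 180: 2, 184: 1, 186: 2, 188: 1, 190: 1,
}

n = sum(freq.values())                 # N = sum of all frequencies
values = sorted(freq)
rel = [freq[v] / n for v in values]    # relative frequency rf = f / N
cum = list(accumulate(rel))            # running total of rf = cumulative rf

for v, rf, cf in zip(values, rel, cum):
    print(f"{v}  rf={rf:.2f}  ({rf * 100:.0f}%)  Cf={cf:.2f}")
```

The last cumulative value should come out at 1 (100%), which is a quick check that no reading was dropped.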
Grouped Frequency Distribution 
• Values are grouped into intervals 
– Class intervals are all the same size 
– Class intervals are mutually exclusive 
– Useful when data is dispersed 
• Or there are restrictions on “small cell size” 
– For example: HIV/AIDS reporting 
– Loss of information with grouping 
• Anytime one moves from the individual level to 
group level
Grouped Frequency Distribution 
Interval        f
<150mm Hg       4
150-158mm Hg   10
160-168mm Hg   24
170-178mm Hg   15
180-188mm Hg    6
≥190mm Hg       1
n=60
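Grouping can be sketched as assigning each reading to a class interval and summing the counts. The helper `interval_label` is hypothetical, and it labels intervals as 150-159, 160-169 and so on; because every reading in this data set is an even number, those intervals hold the same readings as 150-158, 160-168, etc.

```python
from collections import Counter

# Absolute frequencies (reading in mm Hg -> f), tallied from the 60 raw readings.
freq = {
    146: 2, 148: 2, 150: 2, 152: 2, 154: 2, 156: 2, 158: 2, 160: 3,
    162: 2, 164: 5, 166: 9, 168: 5, 170: 4, 172: 3, 174: 2, 176: 3,
    178: 3, 180: 2, 184: 1, 186: 2, 188: 1, 190: 1,
}

def interval_label(value):
    """Hypothetical helper: name the 10 mm Hg class interval a reading falls in."""
    if value < 150:
        return "<150"
    if value >= 190:
        return ">=190"
    low = (value // 10) * 10           # e.g. 168 -> 160
    return f"{low}-{low + 9}"

# Each reading lands in exactly one interval (mutually exclusive classes).
grouped = Counter()
for value, f in freq.items():
    grouped[interval_label(value)] += f

order = ["<150", "150-159", "160-169", "170-179", "180-189", ">=190"]
for label in order:
    print(f"{label:>8}  {grouped[label]}")
print("n =", sum(grouped.values()))
```

Note the loss of information: from the grouped counts alone it is no longer possible to tell how many subjects had a reading of exactly 166.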
Displaying Data 
• Tables 
• Bar graphs 
• Pie charts 
• Histograms 
• Frequency Polygons 
– aka Line charts/graphs
Bar Graph 
• Used primarily for nominal and ordinal 
data 
• Values across the X axis 
• Frequencies along the Y axis
Bar Graph of Hypertension Data 
[Figure: bar graph generated in Excel. X axis: mm Hg (146 to 190); Y axis: number of subjects (0 to 10).]
Histogram 
• Like a bar graph 
• Used for continuous (interval or ratio) data 
– Rarely seen even for interval or ratio data 
– Not offered as an option in Excel 
• Bars touch 
• May use grouped data
Bar Chart and Histogram 
[Figure: two charts of the variable BP1, generated in SPSS. Histogram: BP1 (150 to 190) on the X axis, Frequency (0 to 14) on the Y axis; Mean = 166.43, Std. Dev. = 10.692, N = 60. Bar chart: BP1 values (146 to 190) on the X axis, Count (0 to 10) on the Y axis.]
Frequency Polygon 
(aka Line Graph) 
• Used for interval and ratio data 
• X and Y axes the same as for bar charts 
• Marker placed at intersection of the value 
and frequency for a series of values 
• Markers then connected with a line
Frequency Polygon of 
Hypertension Data 
[Figure: frequency polygon. X axis: mm Hg (146 to 190); Y axis: number of subjects (0 to 10).]
Effective Graphical Display 
• Should accurately represent data 
• Should be easily understood 
– Not too busy or complicated 
• Should stand alone 
– Ideal and rare
Distribution Shapes – 5 Basics 
• Modality 
• Symmetry and Skewness 
• Kurtosis 
• Central Tendency 
• Variability
Modality – Basic Shape 
• Peaks or high points 
in the data 
• May have one or 
multiple peaks 
– Unimodal = 1 peak 
– Bimodal = 2 peaks 
– Multimodal = multiple 
peaks
Symmetry 
• Symmetrical 
– If you draw a line through the center it 
produces mirror images 
– In real life: approximately the same 
distribution on either side of the center line 
• Asymmetrical 
– Distribution is lopsided or skewed
Asymmetric Distribution: Skewness 
• Affected by outliers 
• Positive 
– The “tail” points to the 
right (positive 
direction) 
• Negative 
– The “tail” points to the 
left (negative direction)
Kurtosis 
• Assumes symmetric 
distribution 
• Refers to how pointy the 
peak of the distribution is 
– How concentrated in the 
middle of the distribution 
• Platykurtic 
– Low, flattened peak 
• Leptokurtic
– High narrow peak
The Normal Distribution 
• Unimodal 
• Symmetrical
• Peak is neither high nor flat 
• “Bell-shaped curve” 
• The ideal distribution 
– And therefore “normal”
Looking at Frequency Distributions 
• Learn about the data set 
• Clean the data 
• Identify missing values 
• Test assumptions 
– About the distribution 
• Answer research questions 
– About the distribution
Quartiles 
• Calculated by dividing data into quarters 
– The median is the 2nd quartile 
• Quartile 1 is the point at which 25% of 
values are below and 75% of values are 
above 
• Quartile 3 is the point at which 75% of 
values are below and 25% of values are 
above
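As a rough illustration, the quartiles of the 60 blood pressure readings can be obtained with Python's `statistics.quantiles`. The frequency counts are those tallied from the raw readings. Software packages use slightly different interpolation conventions, so Q1 and Q3 may differ a little between tools; the default method of `statistics.quantiles` is assumed here.

```python
import statistics

# Absolute frequencies (reading in mm Hg -> f), tallied from the 60 raw readings.
freq = {
    146: 2, 148: 2, 150: 2, 152: 2, 154: 2, 156: 2, 158: 2, 160: 3,
    162: 2, 164: 5, 166: 9, 168: 5, 170: 4, 172: 3, 174: 2, 176: 3,
    178: 3, 180: 2, 184: 1, 186: 2, 188: 1, 190: 1,
}

# Expand the table back into a sorted list of individual readings.
data = [value for value in sorted(freq) for _ in range(freq[value])]

# n=4 asks for the three cut points that divide the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4)

print("Q1 =", q1)
print("Q2 (median) =", q2)
print("Q3 =", q3)
```

The second quartile agrees with the median of the same data, as the slide states.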
Part 2 
Describing and Displaying Data 
Measures of Central Tendency 
Univariate Statistics
Measures of Central Tendency 
• Tells you about the area of the distribution 
where the bulk of values fall 
• Measures include: 
– Mean 
– Median 
– Mode
Mode 
• The value that occurs most often 
• Limitations 
– Data may be multimodal 
– Mode can vary from one sample to another in 
the same population 
• Considered unstable
Median (Mdn) 
• Point that divides the distribution in half 
• Corresponds to the 50th percentile 
• 50% will be below the median and 50% 
above it 
• If the number of scores is odd 
– Median is the number exactly in the middle 
• If the number of scores is even 
– Median is the average of the 2 middle numbers
Median (Mdn) 
• Measures the location of the middle of 
the distribution 
• Not sensitive to actual numerical values 
– Not affected by outliers
Mean 
• Most common measure of central tendency 
• Most stable, provides the most accurate 
estimate 
– Assuming a normal distribution 
• Calculated by adding all values and dividing 
by the number of cases 
– aka average 
– Best understood by the general public
Mean 
• Affected by each value in the distribution 
• Intended for interval or ratio data 
– In some designs can be used for ordinal 
• The sum of the deviation scores from the 
mean always equals 0 
• Abbreviated x̄ (x-bar) for samples
– μ for populations
Mean, Median, and Mode 
• Mean is preferred in a normal distribution 
– Extreme scores or outliers can result in a 
mean that doesn’t reflect central tendency 
• Skewed data 
• With skewed data, or extreme outliers use 
median 
– Example: Median home price
x̄, Mdn, Mode – Hypertension
Study
x     f     x     f     x     f
146   2     162   2     178   3
148   2     164   5     180   2
150   2     166   9     182   0
152   2     168   5     184   1
154   2     170   4     186   2
156   2     172   3     188   1
158   2     174   2     190   1
160   3     176   3     192   0
Sum of values (x) = Σ(x) = 9,986
Number of cases = n = 60
Mode = 166
Mdn = 166
Mean = 9,986/60
= 166.43
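The three measures can be checked with a short sketch, assuming the frequency counts tallied from the 60 raw readings listed earlier. Variable names are illustrative.

```python
import statistics

# Absolute frequencies (reading in mm Hg -> f), tallied from the 60 raw readings.
freq = {
    146: 2, 148: 2, 150: 2, 152: 2, 154: 2, 156: 2, 158: 2, 160: 3,
    162: 2, 164: 5, 166: 9, 168: 5, 170: 4, 172: 3, 174: 2, 176: 3,
    178: 3, 180: 2, 184: 1, 186: 2, 188: 1, 190: 1,
}

n = sum(freq.values())                                   # number of cases
total = sum(value * f for value, f in freq.items())      # sum of all values
mean = total / n                                         # add up, divide by n

# Median and mode need the individual readings, so expand the table.
data = [value for value in sorted(freq) for _ in range(freq[value])]
median = statistics.median(data)   # average of the 30th and 31st readings
mode = statistics.mode(data)       # most frequent reading

print(f"n = {n}, sum = {total}")
print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

With an even number of cases the median is the average of the two middle readings; here both are 166.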
Quickly Assessing Distribution 
• If the mean, median, and mode are similar 
– Approximately normally distributed 
• If the median>mean 
– Negatively skewed 
• If the median<mean 
– Positively skewed 
• The mean is pulled in the direction of the 
skew
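The rule of thumb above can be sketched as a small function. The helper `skew_direction`, its `tolerance` cut-off and the three data sets are all hypothetical; this is a quick screening aid, not a formal test of skewness.

```python
import statistics

def skew_direction(values, tolerance=0.5):
    """Hypothetical helper: compare mean and median to guess the skew.

    `tolerance` is an arbitrary cut-off (in the data's own units) below
    which the mean and median are treated as "similar".
    """
    gap = statistics.mean(values) - statistics.median(values)
    if gap > tolerance:
        return "positive skew"      # mean pulled toward the right tail
    if gap < -tolerance:
        return "negative skew"      # mean pulled toward the left tail
    return "roughly symmetric"

# Hypothetical data sets, invented for illustration only.
long_stays = [2, 3, 3, 4, 4, 5, 30]          # one very long stay on the right
early_exits = [1, 26, 27, 27, 28, 28, 29]    # one very low value on the left
balanced = [1, 2, 3, 4, 5]

for name, values in [("long_stays", long_stays),
                     ("early_exits", early_exits),
                     ("balanced", balanced)]:
    print(name, "->", skew_direction(values))
```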
Effect of Skew 
[Figure: two skewed distributions, each marked with the positions of the Mode, Median and Mean.]
The mean is pulled in the direction of the skew
Variability 
• Refers to how spread out the scores are in 
a distribution 
• Two distributions with the same mean can 
differ greatly in variability 
– Homogeneous: values are similar 
– Heterogeneous: values with more variability 
• Measures: 
– Range/Semiquartile Range 
– Variance 
– Standard Deviation
Variability – Spread or Dispersion 
[Figure: bar graph of the hypertension data. X axis: mm Hg (146 to 190); Y axis: number of subjects (0 to 10).]
Range 
• Simplest of the measures of variability 
• Difference between the lowest and highest 
values in a distribution 
– 190-146 = 44 
• Sometimes reported as a minimum and 
maximum value 
– Range 146-190
Range 
• Limitations 
– Based on only 2 values, highest and lowest 
• Can be unstable when multiple samples are taken 
from the same population 
• Doesn’t tell you anything about what is happening 
in the middle 
– As the sample size increases, range is likely 
to increase 
• Greater chance of outlier
Standard Deviation - 
s, SD, Std Dev 
• A measure of how far values vary from the 
mean of a given sample 
– Tells you the average deviation 
– How much the scores deviate from the mean 
• Most widely used measure of variability 
• Takes into consideration every score in 
the distribution
Standard Deviation 
Standard deviation: $s = \sqrt{\dfrac{\sum (X - \text{mean})^2}{N - 1}}$
X    X-mean       (X-mean)²
7    7-4 = 3      9
5    5-4 = 1      1
4    4-4 = 0      0
3    3-4 = -1     1
1    1-4 = -3     9
N = 5, Mean = 4, Σ (X-mean)² = 20
$s = \sqrt{\dfrac{20}{5 - 1}} = \sqrt{5} = 2.24$
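The worked example can be retraced step by step in Python. This is a sketch of the same arithmetic, with illustrative variable names, cross-checked against the standard library's sample standard deviation.

```python
import math
import statistics

scores = [7, 5, 4, 3, 1]

n = len(scores)
mean = sum(scores) / n                           # 20 / 5

deviations = [x - mean for x in scores]          # X - mean
squared = [d ** 2 for d in deviations]           # (X - mean)^2
sum_of_squares = sum(squared)                    # sum of squared deviations

variance = sum_of_squares / (n - 1)              # divide by N - 1 for a sample
sd = math.sqrt(variance)                         # back to the original scale

print("deviations sum to", sum(deviations))      # deviations always sum to 0
print(f"sum of squares = {sum_of_squares}, variance = {variance}, s = {sd:.2f}")

# statistics.stdev uses the same N - 1 denominator.
print(f"statistics.stdev gives {statistics.stdev(scores):.2f}")
```

Dividing by N - 1 rather than N is what makes this the sample standard deviation; `statistics.pstdev` would divide by N instead.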
Standard Deviation 
• Taking the square root returns the value of 
the standard deviation to the original scale 
• The lower the standard deviation, the 
better measure the mean is as a summary 
of the data 
– The less variability there is among the scores
Variance 
• Simply $s^2$
• The standard deviation calculation before
the square root is taken and is equal to:
$s^2 = \dfrac{\sum (X - \text{mean})^2}{N - 1}$
Standard Deviation - uses 
• Useful when looking at a single score in 
relation to a distribution 
• Normal Distribution 
– There are about 3 SD above and below the 
mean 
– A fixed percent of scores lie within each SD: 
• 68% within 1 SD above and below the mean 
• 95% within 2 SD above and below the mean 
• 99.7% within 3 SD above and below the mean
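The fixed percentages can be reproduced with `statistics.NormalDist` from the Python standard library. In this sketch the mean of 100 and SD of 15 are illustrative values, and `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

# Illustrative normal distribution: mean 100, SD 15.
scores = NormalDist(mu=100, sigma=15)

def share_within(k_sd):
    """Proportion of a normal distribution lying within k_sd SDs of the mean."""
    low = scores.mean - k_sd * scores.stdev
    high = scores.mean + k_sd * scores.stdev
    return scores.cdf(high) - scores.cdf(low)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The familiar 68%, 95% and 99.7% are rounded figures; the proportions do not depend on which mean and SD are chosen.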
Normal Distribution with SD = 15 
and mean = 100 
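[Figure: normal curve with scores 70, 85, 100, 115 and 130 marked on the X axis. Areas from left to right: 2.5%, 13.5%, 34%, 34%, 13.5%, 2.5%. 68% of scores fall between 85 and 115; 95% fall between 70 and 130.]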
Standard Scores 
• Scores that represent relative distance 
from mean. Measures of position. 
$Z = \dfrac{X - \bar{X}}{SD}$
• Raw score minus mean divided by SD: 
gives score in SD units 
• Z score is # of SDs a given value of ‘X’ is 
away from mean. Z score of 1 is 1 SD 
above mean. 
• Z Distribution has mean = 0 and SD = 1
Standard Scores – Z scores 
• Allow for the standardization (in SD units) 
of values in a distribution relative to the 
mean 
• Standard Score $Z = (x - \bar{x})/SD$
• Number of SD a given value of x is from 
the mean 
– Z score of 1 is 1 SD above the mean 
• Z distribution has mean=0 and SD=1
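A z score is a one-line calculation. The sketch below uses a hypothetical helper `z_score`, illustrative scores on a scale with mean 100 and SD 15, and a small made-up data set to show that a standardized set of values has mean 0 and SD 1.

```python
import statistics

def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Illustrative scale with mean 100 and SD 15.
print(z_score(115, 100, 15))   # 1.0  (1 SD above the mean)
print(z_score(70, 100, 15))    # -2.0 (2 SD below the mean)

# Standardizing every value in a sample gives mean 0 and SD 1.
scores = [7, 5, 4, 3, 1]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)
z_scores = [z_score(x, mean, sd) for x in scores]

print([round(z, 2) for z in z_scores])
print(round(statistics.mean(z_scores), 6), round(statistics.stdev(z_scores), 6))
```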
[Figure: two distributions of raw scores (55, 60, 65 and 85, 100, 115), each converted to the Z distribution (-1, 0, +1) using Z Score = (Score - Mean) / SD.]
Normal Z Distribution (Mean = 0, SD = 1) 
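[Figure: standard normal curve with Z = -2, -1, 0, +1 and +2 marked on the X axis. Areas from left to right: 2.5%, 13.5%, 34%, 34%, 13.5%, 2.5%. 68% of values fall between -1 and +1; 95% fall between -2 and +2.]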
Normal Distribution/Z Scores 
• The entire percent under the curve is 
100% 
– Probability of being somewhere under the 
curve is 100% 
• Most values will lie in the middle 
• Out at the ends we become less sure 
– Is a value out at 1% really representative?
Using Normal Distributions/ 
Z Scores 
• Transformation 
– The z score can be transformed to reset the 
mean and SD 
• Transformed Z = 10(Z) + 50 
– Now mean = 50 and SD=10 
• P-value 
– Likelihood of a given value falling at a 
particular point on the curve 
– We will come back to this
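The transformation is a simple rescaling: multiply the z score by the new SD and add the new mean. In this sketch `rescale` is a hypothetical helper name; its defaults match the 10(Z) + 50 example.

```python
def rescale(z, new_mean=50, new_sd=10):
    """Map a z score onto a scale with the chosen mean and SD."""
    return new_sd * z + new_mean

for z in (-2, -1, 0, 1, 2):
    print(f"z = {z:+d}  ->  transformed score = {rescale(z)}")

# The same idea with a different target scale (mean 100, SD 15).
print(rescale(1, new_mean=100, new_sd=15))
```

A z score of 0 always maps to the new mean, and each whole z unit moves the transformed score by one new SD.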
Using Normal Distributions/ 
Z Scores 
• Z score can tell you the probability of a 
value falling into a given area of the curve 
– Get z score 
– Match to % 
• Z = ±2 bounds about 95% of values
– Gives the probability of a value falling
within 2 SD of the mean
– Z-score tables
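A printed z-score table can be mimicked with the cumulative distribution function of the standard normal distribution. The helper names `area_below` and `area_between` are assumptions made for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist()     # mean 0, SD 1

def area_below(z):
    """Proportion of the curve lying below a given z score."""
    return standard_normal.cdf(z)

def area_between(z_low, z_high):
    """Proportion of the curve lying between two z scores."""
    return area_below(z_high) - area_below(z_low)

print(f"below z = 0:        {area_below(0):.3f}")
print(f"between -2 and +2:  {area_between(-2, 2):.3f}")
print(f"below z = 1.96:     {area_below(1.96):.3f}")

# The lookup also works in reverse: which z score cuts off the lowest 97.5%?
print(f"z for 97.5%:        {standard_normal.inv_cdf(0.975):.2f}")
```

The area between -2 and +2 is slightly more than 95%; exactly 95% of the curve lies between about -1.96 and +1.96.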
Parameters vs Statistics 
• Parameters 
– Computed for populations 
– Greek symbols used 
• μ =mean, σ = std dev 
• Statistics 
– Computed for samples 
– English symbols used 
• x̄ = mean, s = std dev
Computers and 
Measures of Central Tendency 
• Statistical software is in widespread use 
but… 
• The operator (you) must be aware of 
levels of measurement etc 
– The computer doesn’t know 
– Have to choose the right method for type of 
data
“Bivariate” Statistics
“Bivariate” Statistics 
• Used to describe the relationship between 
2 variables (bi-variate) 
– 2 nominal variables 
– 1 nominal, 1 ratio/interval 
– 2 ratio/interval
Crosstabulation 
• Results in a contingency table 
– 2 dimensional frequency distribution 
• The simplest: 2X2 
– 2 nominal or ordinal variables 
• One heading columns 
• One heading rows
Crosstabulation - Example 
             High School   College   total
<$30,000          64          19       83
≥$30,000          36          81      117
total            100         100      200
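A crosstabulation is a tally of pairs of categories. The sketch below rebuilds one record per person from the cell counts in the example table and then tabulates them; variable names are illustrative, and real data would normally arrive as individual records rather than counts.

```python
from collections import Counter

# Cell counts from the example table: (education, income band) -> count.
cell_counts = {
    ("High School", "<$30,000"): 64,
    ("High School", ">=$30,000"): 36,
    ("College", "<$30,000"): 19,
    ("College", ">=$30,000"): 81,
}

# One (education, income band) record per person.
records = [pair for pair, count in cell_counts.items() for _ in range(count)]

table = Counter(records)                              # the cells
col_totals = Counter(edu for edu, _ in records)       # education totals
row_totals = Counter(band for _, band in records)     # income band totals

for band in ("<$30,000", ">=$30,000"):
    cells = [table[(edu, band)] for edu in ("High School", "College")]
    print(band, cells, "total", row_totals[band])
print("total", [col_totals["High School"], col_totals["College"]], len(records))

# Column percentages make the two education groups directly comparable.
for edu in ("High School", "College"):
    share = table[(edu, "<$30,000")] / col_totals[edu]
    print(f"{edu}: {share:.0%} earn <$30,000")
```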
Comparison of Group Means 
• Nominal & Interval or Ratio Variable 
• IV: nominal or ordinal 
– Sex, ethnicity, age group etc 
• DV: interval or ratio 
– Heart rate, BP, weight etc 
• Means and SD calculated for each category 
of the IV 
• NOTE: NO INFERENCE is made about 
significance of difference between categories
Comparison of Group Means 
education n mean SD Min Max 
High School 100 24,657 2,598 10,103 75,362 
College 100 36,431 7,912 15,256 126,754 
total 200 31,989 6,110 10,103 126,754
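A comparison of group means amounts to splitting the dependent variable by the categories of the independent variable and summarizing each group. The observations below are hypothetical (invented heart rates by sex, not taken from the slides), and no inference about the difference is made.

```python
import statistics
from collections import defaultdict

# Hypothetical (sex, resting heart rate) observations, invented for illustration.
observations = [
    ("F", 72), ("F", 68), ("F", 76),
    ("M", 64), ("M", 70), ("M", 67),
]

# Group the dependent variable (heart rate) by the independent variable (sex).
groups = defaultdict(list)
for category, value in observations:
    groups[category].append(value)

summary = {
    category: {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }
    for category, values in groups.items()
}

for category, stats in summary.items():
    print(category, stats)
```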
Correlation 
• A linear relationship between 2 variables 
– Interval or ratio variables 
• Can be plotted and displayed graphically 
– Scatter plot 
• Can be calculated statistically 
– Correlation coefficient 
– r
Scatterplot 
• Values for one variable on X axis 
• Values for the other variable on Y axis 
• Data plotted for each subject/case 
• Examine the plot for pattern 
– Data arrayed closely together indicates strong 
correlation
Positive Correlation 
• As one variable 
increases in value, so 
does the other 
• On the plot: 
– Diagonal line upwards 
and to the right 
• Example: 
– Age and BP
Negative Correlation 
• As one value 
increases the other 
decreases 
• On the plot: 
– Diagonal line down 
and to the right 
• Example: 
– Age and bone density
Scattered Scatterplots 
• Indicate little or no 
relationship between 
variables 
• Can be dispersed or 
concentrated
Non-linear Scatterplots 
• There is a relationship 
but… 
• Some relationships 
are not linear…. 
• May be curved 
– S 
– U 
– Up then flat 
– others
Outliers on Scatterplots 
• Scatterplots can also 
help identify where 
outliers are
Correlation Coefficient - r 
• Statistical measure used to 
– Determine if a relationship exists between two 
variables 
– Test a hypothesis about that relationship 
• Allows us to make a mathematical 
statement about the relationship 
– Do the variables vary together? 
• AKA Pearson Correlation Coefficient
Correlation Coefficient - 
Assumptions 
• Sample must be accurately representative
• The distributions must be approximately
normal
• Each value of X must have a
corresponding value of Y
– If many have X value but not Y value, analysis
will be strongly biased
Correlation 
$r = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2]\,[n(\sum y^2) - (\sum y)^2]}} = \dfrac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X) \times \mathrm{var}(Y)}}$
Correlation Example 
x      y      xy     x²     y²
8     -2     -16     64      4
4      2       8     16      4
5      1       5     25      1
-1     6      -6      1     36
1      4       4      1     16
2      3       6      4      9
6     -1      -6     36      1
Σx=25  Σy=13  Σxy=-5  Σx²=147  Σy²=71
$r = \dfrac{7(-5) - (25)(13)}{\sqrt{[7(147) - (25)^2]\,[7(71) - (13)^2]}} = -0.989$
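The example can be retraced with the computational formula for r. This sketch uses the seven (x, y) pairs from the table; the variable names are illustrative.

```python
import math

# The seven (x, y) pairs from the correlation example.
x = [8, 4, 5, -1, 1, 2, 6]
y = [-2, 2, 1, 6, 4, 3, -1]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sums: x={sum_x}, y={sum_y}, xy={sum_xy}, x^2={sum_x2}, y^2={sum_y2}")
print(f"r = {r:.3f}")
```

The value is close to -1: as x increases, y decreases almost perfectly along a straight line.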
Correlation Coefficient 
• Range
– -1 to 1
• Positive correlation
– 0 to 1
• Negative correlation
– -1 to 0
• The closer to -1 or 1, the stronger the
correlation
– -0.9: strong negative
– -0.2: weak negative
– 0: none
– 0.2: weak positive
– 0.9: strong positive
Correlation Coefficient – 
Significance 
• Depends on number of pairs 
• Varies for each r 
– r of 0.3 may be significant for n=1500 but not 
for n=40 
• Also depends on variance (SD) 
– Greater the variance, less significance 
• Generally: 
– 0.60 or –0.60 is strong for medical variables 
• Manufacturing requires 0.90 or greater
The Scatterplot and r 
• ALWAYS look at the 
scatterplot along with 
r 
• Each of these plots 
has r=0.70
Correlation 
• The square of the correlation coefficient,
R², indicates the proportion of variability in one
variable that can be explained by the other
– Example: age and BP
• R² = 0.49 (r=0.70)
• 49% of the variation in BP is explained by age 
– aka Coefficient of Determination 
• Correlation does NOT imply causation

Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 

Recently uploaded (20)

做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 

Bst322week1

  • 2. Part 1 Introduction to the Course The Nature of Data
  • 3. Why Statistics? • Evidence-based practice! • Research provides evidence for changes in nursing/medical practice – Away from “that’s the way it has always been done”
  • 4. Integral to Research • Question (hypothesis) • Design • Data collection • Analysis • Answer to question – And often more questions asked!
  • 5. Data • Data: – Factual information, especially information organized for analysis or used to reason or make decisions; a fact or proposition used to draw a conclusion or make a decision • The American Heritage® Dictionary of the English Language, Fourth Edition Copyright © 2000 by Houghton Mifflin Company. – Datum: Singular of data • an item of factual information derived from measurement or research
  • 6. Two Types of Data • Qualitative – Non-numeric or narrative information • Example: transcripts of interviews • May be “scored” to be made quantitative • Quantitative – Numeric or quantifiable information • Example: weights of kindergartners
  • 7. Variable • A quantity capable of assuming a set of values • A characteristic or attribute of a person, object, etc that varies within a population under study • Examples: – Body temperature, BP, DOB, ABG, weight
  • 8. Independent and Dependent • Independent – The variable assumed to influence the outcome • It is independent of the outcome – In research, the manipulated variable • Dependent – The outcome variable of interest – In research, value assumed to be dependent on the independent variable (by hypothesis)
  • 9. Independent and Dependent • Examples: – What is the effect of smoking on the incidence of lung cancer? – Does high fiber diet reduce the risk of colon cancer? – Does AZT help prevent maternal transmission of HIV?
  • 10. Discrete vs Continuous • Discrete variable: has a finite number of values between two points • Continuous variable: has, in theory, an infinite number of values between two points
  • 11. Discrete vs Continuous • Examples: – Number of children – Body temperature – Hospital readmissions – Chemotherapy sessions – Body weight – DOB
  • 12. Measurement • The assignment of numbers to objects according to specified rules to characterize quantities of some attribute
  • 13. Measurement Rules • Common/familiar/accepted – Temperature, weight, height • Researcher designed – Particularly for new materials/ideas • Coding – The process of transforming raw data into standardized form for processing and analysis
  • 14. Advantages of Measurement • Objectivity – Objective measure can be independently verified by other researchers • Precision – Quantitative measures allow for reasonable precision • Communication – Facilitates communication of data and research
  • 15. Levels of Measurement/ Types of Variables • Nominal • Ordinal • Interval • Ratio
  • 16. Nominal Measurement/Variable • Nominal = Named • Lowest level • Assignment of characteristics into categories – Simply putting into boxes with no meaning of where the boxes fall in a line • Examples – Gender, marital status
  • 17. Ordinal Measurement/Variable • Ordinal=Order • Next in the hierarchy of measurement • Involves rank order of variable along some dimension • Examples – School grades – Clinical nursing levels
  • 18. Interval Measurement/Variable • Interval=equal distances • Attribute is rank-ordered on a scale that has equal distances between points on that scale • Examples – Temperature
  • 19. Ratio Measurement/Variables • Equal distances between score units and which has a true, meaningful zero point – A true ratio can be calculated • The highest level of measurement • Examples – Weight – Pulse
  • 20. Why care about type of measurement/variable? • Statistical tests are/have been developed to work and provide meaningful analysis for specific types of measurement and variable • The tests you choose to run should be based, in part, on the type of variables with which you work
  • 21. Which measurement? • A single variable may be measurable using different types of measurement • Rule of Thumb: use the highest level of measurement possible – Higher levels provide more information – Higher levels can be analyzed with more powerful statistical tools
  • 22. Data Analysis • Data starts out “raw” – unanalyzed • Processing – Coding, if appropriate – Data entry • Into database or matrix – Cleaning • Finding and correcting (if possible) errors in entry and coding – Analysis
  • 23. Sample vs Population • Sample – A subset of a population – Ideally selected to be representative of the population • Population – The entire set of individuals (objects, units, etc) having common characteristics
  • 24. Two Types of Statistics • Descriptive – Used to describe and summarize data set – Allows us to describe, compare, determine a relationship – Usually straightforward - %, averages, etc • Inferential – Permit us to infer whether a relationship observed in a sample is likely to occur in the population of concern – Are relationships “real”?
  • 25. Uses of Inferential Statistics • Draw conclusions about a single variable in a population • Evaluate relationships between variables in populations • Are the relationships “real”?
  • 26. Inferential Stats: Relationships • Existence – Is there a relationship between X and Y? • Magnitude – How strong is the relationship between X and Y? • Nature – What type of relationship is there between X and Y?
  • 27. Number of variables… • “Univariate” – One variable being described • “Bivariate” – Two variables being compared • NOTE: in epidemiology, this is also known as “univariate” • Multivariate – More than two variables being compared • Different statistical tests for each
  • 28. Purposes of Data Analysis • In research all usually get done to some extent – Clean data – Sample description – Assessment of bias – Evaluation of tools used to collect data – Evaluation of need for data transformations – Address the research question
  • 29. Describing the Data Set • Organize the data • Examine the patterns of distribution • Describe patterns of distribution • Assess the variability of the data
  • 30. Simplest Distribution: The Frequency Distribution • Lists categories of scores or values as well as counts of the number of each score or value – List and tally – By computer • Enter data • Run “frequency”
  • 31. Two Kinds of Frequency • Absolute – Number of times a score occurs – Symbol: f • Relative – Proportion of times a score occurs – Most commonly percent • % = (f/N) × 100 – f = frequency, N = sum of all frequencies
  • 32. Frequency Example: Blood Pressure (mm Hg) Readings in an Anti-Hypertensive Trial – Raw Data 166 160 166 162 168 148 164 174 164 188 176 170 166 172 168 172 150 190 164 150 164 146 178 154 166 148 156 164 180 166 172 170 180 156 162 176 184 166 174 158 186 158 166 170 168 178 178 154 166 152 168 160 168 166 152 160 170 146 186 176 n=60
  • 33. Frequency Distribution (n=60)
        x    f      x    f      x    f
      146    2    162    2    178    3
      148    2    164    5    180    2
      150    2    166    9    182    0
      152    2    168    5    184    1
      154    2    170    4    186    2
      156    2    172    3    188    1
      158    2    174    2    190    1
      160    3    176    3    192    0
  • 34. Relative Frequency (rf) Distribution (n=60)
        x    rf       x    rf       x    rf
      146   0.03    162   0.03    178   0.05
      148   0.03    164   0.08    180   0.03
      150   0.03    166   0.15    182   0.00
      152   0.03    168   0.08    184   0.02
      154   0.03    170   0.07    186   0.03
      156   0.03    172   0.05    188   0.02
      158   0.03    174   0.03    190   0.02
      160   0.05    176   0.05    192   0.00
  • 35. Cumulative Relative Frequency (Cf) Distribution (n=60)
        x    Cf       x    Cf       x    Cf
      146   0.03    162   0.32    178   0.88
      148   0.07    164   0.40    180   0.92
      150   0.10    166   0.55    182   0.92
      152   0.13    168   0.63    184   0.93
      154   0.17    170   0.70    186   0.97
      156   0.20    172   0.75    188   0.98
      158   0.23    174   0.78    190   1.00
      160   0.28    176   0.83    192   1.00
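The three frequency tables can be rebuilt directly from the raw readings. The following is a minimal Python sketch, assuming the 60 values on the raw-data slide are the complete data set; the variable names (`bp`, `freq`, `rows`) are illustrative and not part of the original deck.

```python
from collections import Counter

# Raw blood pressure readings (mm Hg) copied from the raw-data slide, n = 60.
bp = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

n = len(bp)
freq = Counter(bp)  # absolute frequency f for each distinct value x

cumulative = 0
rows = []  # (x, f, rf, Cf) tuples in ascending order of x
for x in sorted(freq):
    f = freq[x]
    cumulative += f
    rows.append((x, f, f / n, cumulative / n))

# Print x, absolute, relative, and cumulative relative frequency.
for x, f, rf, cf in rows:
    print(f"{x:>4} {f:>3} {rf:6.2f} {cf:6.2f}")
```

Values that never occur (such as 182) simply do not appear in `freq`; add them with a count of 0 if the table should show every step of the scale.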
  • 36. Grouped Frequency Distribution • Values are grouped into intervals – Class intervals are all the same size – Class intervals are mutually exclusive – Useful when data is dispersed • Or there are restrictions on “small cell size” – For example: HIV/AIDS reporting – Loss of information with grouping • Anytime one moves from the individual level to group level
  • 37. Grouped Frequency Distribution (n=60)
      Interval          f
      <150mm Hg         4
      150-158mm Hg     10
      160-168mm Hg     24
      170-178mm Hg     15
      180-188mm Hg      6
      ≥190mm Hg         1
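Grouping can be sketched as counting the readings that fall inside each class interval. This is an illustrative Python example, assuming the raw readings from the raw-data slide and assuming the top class starts at 190 so that the classes stay mutually exclusive; the `intervals` list is a hypothetical helper.

```python
import math

# Raw blood pressure readings (mm Hg) copied from the raw-data slide, n = 60.
bp = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

# (label, inclusive lower bound, inclusive upper bound); bounds never overlap.
intervals = [
    ("<150", -math.inf, 149),
    ("150-158", 150, 159),
    ("160-168", 160, 169),
    ("170-178", 170, 179),
    ("180-188", 180, 189),
    (">=190", 190, math.inf),
]

# Count how many readings land in each class interval.
grouped = {
    label: sum(low <= x <= high for x in bp)
    for label, low, high in intervals
}

for label, count in grouped.items():
    print(f"{label:>8} mm Hg  {count:>3}")
```

Because every reading falls into exactly one interval, the grouped counts add up to n, but the individual values can no longer be recovered from the grouped table.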
  • 38. Displaying Data • Tables • Bar graphs • Pie charts • Histograms • Frequency Polygons – aka Line charts/graphs
  • 39. Bar Graph • Used primarily for nominal and ordinal data • Values across the X axis • Frequencies along the Y axis
  • 40. Bar Graph of Hypertension Data [Figure: bar graph, number of subjects (Y axis, 0 to 10) by blood pressure in mm Hg (X axis, 146 to 190); generated in Excel]
  • 41. Histogram • Like a bar graph • Used for continuous (interval or ratio) data – Rarely seen even for interval or ratio data – Not offered as an option in Excel • Bars touch • May use grouped data
  • 42. Bar Chart and Histogram [Figures generated in SPSS: bar chart of count by BP1 (146 to 190) and histogram of frequency by BP1 (150 to 190); Mean = 166.43, Std. Dev. = 10.692, N = 60]
  • 43. Frequency Polygon (aka Line Graph) • Used for interval and ratio data • X and Y axes the same as for bar charts • Marker placed at intersection of the value and frequency for a series of values • Markers then connected with a line
  • 44. Frequency Polygon of Hypertension Data [Figure: line graph, number of subjects (Y axis, 0 to 10) by blood pressure in mm Hg (X axis, 146 to 190)]
  • 45. Effective Graphical Display • Should accurately represent data • Should be easily understood – Not too busy or complicated • Should stand alone – Ideal and rare
  • 46. Distribution Shapes – 5 Basics • Modality • Symmetry and Skewness • Kurtosis • Central Tendency • Variability
  • 47. Modality – Basic Shape • Peaks or high points in the data • May have one or multiple peaks – Unimodal = 1 peak – Bimodal = 2 peaks – Multimodal = multiple peaks
  • 48. Symmetry • Symmetrical – If you draw a line through the center it produces mirror images – In real life: approximately the same distribution on either side of the center line • Asymmetrical – Distribution is lopsided or skewed
  • 49. Asymmetric Distribution: Skewness • Affected by outliers • Positive – The “tail” points to the right (positive direction) • Negative – The “tail” points to the left (negative direction)
  • 50. Kurtosis • Assumes symmetric distribution • Refers to how pointy the peak of the distribution is – How concentrated in the middle of the distribution • Platykurtic – Low, flattened peak • Leptokurtic – High narrow peak
  • 51. The Normal Distribution • Unimodal • Symmetrical • Peak is neither high nor flat • “Bell-shaped curve” • The ideal distribution – And therefore “normal”
  • 52. Looking at Frequency Distributions • Learn about the data set • Clean the data • Identify missing values • Test assumptions – About the distribution • Answer research questions – About the distribution
  • 53. Quartiles • Calculated by dividing data into quarters – The median is the 2nd quartile • Quartile 1 is the point at which 25% of values are below and 75% of values are above • Quartile 3 is the point at which 75% of values are below and 25% of values are above
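As an illustration, the quartiles of the hypertension readings can be computed with Python's standard library. This is a sketch under stated assumptions: the data are the 60 readings from the raw-data slide, and `statistics.quantiles` is used with its default "exclusive" method, which is only one of several common quartile conventions (other software may report slightly different Q1 and Q3 values).

```python
import statistics

# Raw blood pressure readings (mm Hg) copied from the raw-data slide, n = 60.
bp = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(bp, n=4)

print("Q1:", q1)
print("Q2 (median):", q2)
print("Q3:", q3)
print("Interquartile range:", q3 - q1)
```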
  • 54. Part 2 Describing and Displaying Data Measures of Central Tendency Univariate Statistics
  • 55. Measures of Central Tendency • Tells you about the area of the distribution where the bulk of values fall • Measures include: – Mean – Median – Mode
  • 56. Mode • The value that occurs most often • Limitations – Data may be multimodal – Mode can vary from one sample to another in the same population • Considered unstable
  • 57. Median (Mdn) • Point that divides the distribution in half • Corresponds to the 50th percentile • 50% will be below the median and 50% above it • If the number of scores is odd – Median is the number exactly in the middle • If the number of scores is even – Median is the average of the 2 middle numbers
  • 58. Median (Mdn) • Measures the location of the middle of the distribution • Not sensitive to actual numerical values – Not affected by outliers
  • 59. Mean • Most common measure of central tendency • Most stable, provides the most accurate estimate – Assuming a normal distribution • Calculated by adding all values and dividing by the number of cases – aka average – Best understood by the general public
  • 60. Mean • Affected by each value in the distribution • Intended for interval or ratio data – In some designs can be used for ordinal • The sum of the deviation scores from the mean always equals 0 • Abbreviated x̄ for samples – μ for population
  • 61. Mean, Median, and Mode • Mean is preferred in a normal distribution – Extreme scores or outliers can result in a mean that doesn’t reflect central tendency • Skewed data • With skewed data, or extreme outliers use median – Example: Median home price
  • 62. x̄, Mdn, Mode – Hypertension Study
        x    f      x    f      x    f
      146    2    162    2    178    3
      148    2    164    5    180    2
      150    2    166    9    182    0
      152    2    168    5    184    1
      154    2    170    4    186    2
      156    2    172    3    188    1
      158    2    174    2    190    1
      160    3    176    3    192    0
      Sum of values = Σ(x) = 9,986 • Number of cases = n = 60 • Mode = 166 • Mdn = 166 • Mean = 9,986/60 = 166.4
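The three measures of central tendency can be checked against the raw readings. The following Python sketch assumes the 60 values on the raw-data slide and uses only the standard `statistics` module; the figures it prints are computed from those readings rather than taken from the slide.

```python
import statistics

# Raw blood pressure readings (mm Hg) copied from the raw-data slide, n = 60.
bp = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

total = sum(bp)                  # sum of values, Σ(x)
n = len(bp)                      # number of cases
mean = statistics.mean(bp)       # Σ(x) / n
median = statistics.median(bp)   # average of the 30th and 31st sorted values
mode = statistics.mode(bp)       # most frequent value
sd = statistics.stdev(bp)        # sample standard deviation (N - 1 denominator)

print(f"sum={total} n={n} mean={mean:.2f} median={median} mode={mode} sd={sd:.3f}")
```

The mean and standard deviation obtained this way agree with the values shown on the SPSS histogram slide (Mean = 166.43, Std. Dev. = 10.692).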
  • 63. Quickly Assessing Distribution • If the mean, median, and mode are similar – Approximately normally distributed • If the median>mean – Negatively skewed • If the median<mean – Positively skewed • The mean is pulled in the direction of the skew
  • 64. Effect of Skew [Figure: two skewed distributions, each with the mode, median, and mean marked] The mean is pulled in the direction of the skew
  • 65. Variability • Refers to how spread out the scores are in a distribution • Two distributions with the same mean can differ greatly in variability – Homogeneous: values are similar – Heterogeneous: values with more variability • Measures: – Range/Semi-interquartile Range – Variance – Standard Deviation
  • 66. Variability – Spread or Dispersion [Figure: bar graph, number of subjects (Y axis, 0 to 10) by blood pressure in mm Hg (X axis, 146 to 190)]
  • 67. Range • Simplest of the measures of variability • Difference between the lowest and highest values in a distribution – 190-146 = 44 • Sometimes reported as a minimum and maximum value – Range 146-190
  • 68. Range • Limitations – Based on only 2 values, highest and lowest • Can be unstable when multiple samples are taken from the same population • Doesn’t tell you anything about what is happening in the middle – As the sample size increases, range is likely to increase • Greater chance of outlier
  • 69. Standard Deviation - s, SD, Std Dev • A measure of how far values vary from the mean of a given sample – Tells you the average deviation – How much the scores deviate from the mean • Most widely used measure of variability • Takes into consideration every score in the distribution
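  • 70. Standard Deviation • Standard deviation: $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$
        X    X - mean      (X - mean)²
        7    7 - 4 = 3         9
        5    5 - 4 = 1         1
        4    4 - 4 = 0         0
        3    3 - 4 = -1        1
        1    1 - 4 = -3        9
      N = 5, Mean = 4, Σ(X - mean)² = 20
      $s = \sqrt{\dfrac{20}{5 - 1}} = \sqrt{5} \approx 2.24$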
  • 71. Standard Deviation • Taking the square root returns the value of the standard deviation to the original scale • The lower the standard deviation, the better measure the mean is as a summary of the data – The less variability there is among the scores
  • 72. Variance • Simply $s^2$ • The standard deviation calculation before the square root is taken, equal to: $s^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1}$
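The worked example from the standard deviation slide (values 7, 5, 4, 3, 1) can be reproduced step by step. This is a minimal Python sketch; the variable names are illustrative, and the final line cross-checks the hand calculation against `statistics.stdev`, which also uses the N - 1 denominator.

```python
import math
import statistics

values = [7, 5, 4, 3, 1]
n = len(values)

mean = sum(values) / n                            # 20 / 5 = 4.0
squared_devs = [(x - mean) ** 2 for x in values]  # 9, 1, 0, 1, 9
sum_sq = sum(squared_devs)                        # 20.0

variance = sum_sq / (n - 1)                       # 20 / 4 = 5.0
sd = math.sqrt(variance)                          # square root returns to the original scale

print(f"mean={mean} sum of squares={sum_sq} variance={variance} sd={sd:.2f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sd, statistics.stdev(values))
```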
  • 73. Standard Deviation - uses • Useful when looking at a single score in relation to a distribution • Normal Distribution – There are about 3 SD above and below the mean – A fixed percent of scores lie within each SD: • 68% within 1 SD above and below the mean • 95% within 2 SD above and below the mean • 99.7% within 3 SD above and below the mean
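  • 74. Normal Distribution with SD = 15 and mean = 100 [Figure: normal curve with the X axis marked at 70, 85, 100, 115, 130; areas from left tail to right tail are 2.5%, 13.5%, 34%, 34%, 13.5%, 2.5%; 68% of values lie between 85 and 115, 95% between 70 and 130]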
  • 75. Standard Scores • Scores that represent relative distance from mean. Measures of position. $Z = (X - \bar{X})/SD$ • Raw score minus mean divided by SD: gives score in SD units • Z score is # of SDs a given value of ‘X’ is away from mean. Z score of 1 is 1 SD above mean. • Z Distribution has mean = 0 and SD = 1
  • 76. Standard Scores – Z scores • Allow for the standardization (in SD units) of values in a distribution relative to the mean • Standard Score $Z = (x - \bar{x})/SD$ • Number of SD a given value of x is from the mean – Z score of 1 is 1 SD above the mean • Z distribution has mean=0 and SD=1
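A short Python sketch of the standard-score calculation, assuming the mean of 100 and SD of 15 used on the normal-distribution slide; the function name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Return the number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


mean, sd = 100, 15  # values taken from the normal-distribution slide

# Each raw score is re-expressed in SD units relative to the mean.
for x in (70, 85, 100, 115, 130):
    print(f"raw score {x:>3} -> Z = {z_score(x, mean, sd):+.1f}")
```

A raw score of 115 is one SD above the mean (Z = +1), and a raw score of 70 is two SDs below it (Z = -2).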
  • 77. Z Distribution [Figure: raw-score scales (55, 60, 65 and 85, 100, 115) aligned with Z scores of -1, 0, +1] Z Score = (Score - Mean) / SD
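  • 78. Normal Z Distribution (Mean = 0, SD = 1) [Figure: normal curve with the Z axis marked at -2, -1, 0, +1, +2; areas from left tail to right tail are 2.5%, 13.5%, 34%, 34%, 13.5%, 2.5%; 68% of values lie within ±1 SD, 95% within ±2 SD]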
  • 79. Normal Distribution/Z Scores • The entire percent under the curve is 100% – Probability of being somewhere under the curve is 100% • Most values will lie in the middle • Out at the ends we become less sure – Is a value out at 1% really representative?
  • 80. Using Normal Distributions/ Z Scores • Transformation – The z score can be transformed to reset the mean and SD • Transformed Z = 10(Z) + 50 – Now mean = 50 and SD=10 • P-value – Likelihood of a given value falling at a particular point on the curve – We will come back to this
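The transformation can be illustrated with a few lines of Python. This sketch assumes a small sample (the values 7, 5, 4, 3, 1 from the standard deviation example) and the sample SD; after converting to Z scores and applying 10(Z) + 50, the transformed scores have a mean of 50 and an SD of 10.

```python
import statistics

scores = [7, 5, 4, 3, 1]        # sample reused from the standard deviation example
mean = statistics.mean(scores)  # 4
sd = statistics.stdev(scores)   # sample SD, about 2.24

# Step 1: standardize each score (mean 0, SD 1).
z = [(x - mean) / sd for x in scores]

# Step 2: rescale so the mean becomes 50 and the SD becomes 10.
transformed = [10 * zi + 50 for zi in z]

print("Z scores:   ", [round(zi, 2) for zi in z])
print("Transformed:", [round(t, 1) for t in transformed])
print("New mean:", round(statistics.mean(transformed), 6))
print("New SD:  ", round(statistics.stdev(transformed), 6))
```

The transformation changes only the scale; each score keeps the same relative position in the distribution.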
  • 81. Using Normal Distributions/ Z Scores • Z score can tell you the probability of a value falling into a given area of the curve – Get z score – Match to % • Z = ±2 encloses about 95% of values – Gives the probability of the value being the true mean – Z-score tables
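In place of a printed Z-score table, the area under the standard normal curve can be looked up in software. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the Z distribution: mean 0, SD 1


def area_within(z):
    """Proportion of values lying between -z and +z on the Z distribution."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


for z in (1, 2, 3):
    print(f"within ±{z} SD: {area_within(z):.1%}")

# Reverse lookup: the Z value that leaves 2.5% in each tail (95% in the middle).
z_95 = std_normal.inv_cdf(0.975)
print(f"Z enclosing exactly 95%: ±{z_95:.2f}")
```

The familiar "95% within 2 SD" is a rounding; the exact cut-off for the middle 95% is about ±1.96.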
  • 82. Parameters vs Statistics • Parameters – Computed for populations – Greek symbols used • μ = mean, σ = std dev • Statistics – Computed for samples – English symbols used • x̄ = mean, s = std dev
  • 83. Computers and Measures of Central Tendency • Statistical software is in widespread use but… • The operator (you) must be aware of levels of measurement etc – The computer doesn’t know – Have to choose the right method for type of data
  • 85. “Bivariate” Statistics • Used to describe the relationship between 2 variables (bi-variate) – 2 nominal variables – 1 nominal, 1 ratio/interval – 2 ratio/interval
  • 86. Crosstabulation • Results in a contingency table – 2 dimensional frequency distribution • The simplest: 2X2 – 2 nominal or ordinal variables • One heading columns • One heading rows
  • 87. Crosstabulation - Example
                    High School    College    total
      <$30,000           64           19        83
      ≥$30,000           36           81       117
      total             100          100       200
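The marginal totals and column percentages of the 2X2 table can be derived from the four cell counts. This is an illustrative Python sketch; it assumes the rows are income bands and the columns are education levels, which the slide implies but does not label, and the dictionary layout is a hypothetical choice.

```python
# Cell counts from the crosstabulation example (rows assumed to be income bands).
table = {
    "<$30,000": {"High School": 64, "College": 19},
    ">=$30,000": {"High School": 36, "College": 81},
}
columns = ["High School", "College"]

# Marginal totals for rows and columns, plus the grand total.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in columns}
grand_total = sum(row_totals.values())

# Column percentages: share of each education group that falls in each row.
col_pct = {
    row: {col: 100 * table[row][col] / col_totals[col] for col in columns}
    for row in table
}

for row in table:
    cells = "  ".join(f"{col}: {col_pct[row][col]:.0f}%" for col in columns)
    print(f"{row:>10}  {cells}  (n={row_totals[row]})")
print("Grand total:", grand_total)
```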
  • 88. Comparison of Group Means • Nominal & Interval or Ratio Variable • IV: nominal or ordinal – Sex, ethnicity, age group etc • DV: interval or ratio – Heart rate, BP, weight etc • Means and SD calculated for each category of the IV • NOTE: NO INFERENCE is made about significance of difference between categories
  • 89. Comparison of Group Means
      education       n      mean      SD       Min       Max
      High School    100    24,657    2,598    10,103    75,362
      College        100    36,431    7,912    15,256   126,754
      total          200    31,989    6,110    10,103   126,754
  • 90. Correlation • A linear relationship between 2 variables – Interval or ratio variables • Can be plotted and displayed graphically – Scatter plot • Can be calculated statistically – Correlation coefficient – r
  • 91. Scatterplot • Values for one variable on X axis • Values for the other variable on Y axis • Data plotted for each subject/case • Examine the plot for pattern – Data arrayed closely together indicates strong correlation
  • 92. Positive Correlation • As one variable increases in value, so does the other • On the plot: – Diagonal line upwards and to the right • Example: – Age and BP
  • 93. Negative Correlation • As one value increases the other decreases • On the plot: – Diagonal line down and to the right • Example: – Age and bone density
  • 94. Scattered Scatterplots • Indicate little or no relationship between variables • Can be dispersed or concentrated
  • 95. Non-linear Scatterplots • There is a relationship but… • Some relationships are not linear…. • May be curved – S – U – Up then flat – others
  • 96. Outliers on Scatterplots • Scatterplots can also help identify where outliers are
  • 97. Correlation Coefficient - r • Statistical measure used to – Determine if a relationship exists between two variables – Test a hypothesis about that relationship • Allows us to make a mathematical statement about the relationship – Do the variables vary together? • AKA Pearson Correlation Coefficient
  • 98. Correlation Coefficient - Assumptions • Sample must be accurately representative • The distributions must be approximately normal • Each value of X must have a corresponding value of Y – If many cases have an X value but no Y value, the analysis will be strongly biased
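  • 99. Correlation $r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}} = \dfrac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}$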
  • 100. Correlation Example
        x     y     xy    x²    y²
        8    -2    -16    64     4
        4     2      8    16     4
        5     1      5    25     1
       -1     6     -6     1    36
        1     4      4     1    16
        2     3      6     4     9
        6    -1     -6    36     1
      Σx=25  Σy=13  Σxy=-5  Σx²=147  Σy²=71
      $r = \dfrac{7(-5) - (25)(13)}{\sqrt{[7(147) - (25)^2]\,[7(71) - (13)^2]}} = -0.989$
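The correlation example can be verified with the computational formula for r. The following Python sketch uses the seven (x, y) pairs from the example table; the variable names are illustrative.

```python
import math

# The seven paired observations from the correlation example.
x = [8, 4, 5, -1, 1, 2, 6]
y = [-2, 2, 1, 6, 4, 3, -1]
n = len(x)

# Column sums needed by the computational formula.
sum_x = sum(x)                               # 25
sum_y = sum(y)                               # 13
sum_xy = sum(a * b for a, b in zip(x, y))    # -5
sum_x2 = sum(a * a for a in x)               # 147
sum_y2 = sum(b * b for b in y)               # 71

numerator = n * sum_xy - sum_x * sum_y       # 7(-5) - (25)(13) = -360
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(f"r = {r:.3f}")   # strong negative correlation
print(f"r squared = {r * r:.3f}")
```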
  • 101. Correlation Coefficient • Range – -1 to 1 • Positive correlation – 0 to 1 • Negative correlation – -1 to 0 • The closer r is to -1 or 1, the stronger the correlation – -0.9: strong negative – -0.2: weak negative – 0: none – 0.2: weak positive – 0.9: strong positive
  • 102. Correlation Coefficient – Significance • Depends on number of pairs • Varies for each r – r of 0.3 may be significant for n=1500 but not for n=40 • Also depends on variance (SD) – Greater the variance, less significance • Generally: – 0.60 or –0.60 is strong for medical variables • Manufacturing requires 0.90 or greater
  • 103. The Scatterplot and r • ALWAYS look at the scatterplot along with r • Each of these plots has r=0.70
  • 104. Correlation • The square of the correlation coefficient, r², indicates the proportion of variability in one variable that can be explained by the other – Example: age and BP • r² = 0.49 (r=0.70) • 49% of the variation in BP is explained by age – aka Coefficient of Determination • Correlation does NOT imply causation