# Chapter 1: Introduction to Medical Statistics

## on Dec 07, 2009

## Chapter 1: Introduction to Medical Statistics (Presentation Transcript)

• Introduction to Medical Statistics
• Chapter-1
• Statistics:
• The discipline concerned with the treatment of numerical data derived from groups of individuals (P. Armitage).
• The science and art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results (J. M. Last).
• Medical statistics:
• The application of mathematical statistics in the field of medicine.
• Why do we need to study statistics?
• Three reasons:
• (1)Basic requirement of medical research.
• (3)Data management and treatment.
• Basic concepts
• 1. Homogeneity and variation
• Homogeneity: all individuals have similar characteristics and belong to the same category.
• Variation: the differences between individuals in height, weight…
• 2. Population and sample
• Population: the whole collection of individuals that one intends to study (homogeneous, but with variation).
• Sample: a representative part of the population.
• Randomization: an important way to make the sample representative.
• Random
• By chance!
• Random event: an event that may or may not occur in a single experiment.
• Before an experiment, nobody can be sure whether the event will occur.
• However, some regularity must emerge over a large number of experiments.
• 3. Probability
• A measure of the possibility that a random event occurs.
• A: random event
• P(A): probability of the random event A
• P(A) = 1 if the event always occurs.
• P(A) = 0 if the event never occurs.
• Estimation of probability: frequency
• n: number of observations (large enough)
• m: number of occurrences of random event A
• m/n: relative frequency (or frequency) of random event A
• P(A) ≈ m/n
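The idea of estimating P(A) by the relative frequency m/n can be tried out with a small simulation. The sketch below is only an illustration: the random event (a fair coin landing heads, so P(A) = 0.5), the seed and the function names are all assumptions, not part of the course material.

```python
import random


def relative_frequency(event, n, rng):
    """Return m/n, where m counts how often event(rng) is true in n trials."""
    m = sum(1 for _ in range(n) if event(rng))
    return m / n


def heads(rng):
    """Hypothetical random event A: a fair coin lands heads, so P(A) = 0.5."""
    return rng.random() < 0.5


rng = random.Random(2009)  # fixed seed so the run is repeatable
frequencies = {n: relative_frequency(heads, n, rng) for n in (10, 1_000, 100_000)}

for n, f in frequencies.items():
    print(f"n = {n:>7}: m/n = {f:.4f}")
```

With few observations the frequency can be far from 0.5; as n grows it should settle close to P(A), which is why n has to be "large enough".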
• 4. Parameter and statistic
• Parameter: a measure of a certain property of the population. It is usually denoted by a Greek letter, such as μ or π, and is usually unknown.
• Statistic: a measure of a certain property of a sample. It is usually denoted by a Latin letter, such as s or p.
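To make the distinction between a parameter and a statistic concrete, here is a minimal sketch. The population is simulated (10,000 hypothetical heights in cm), which is an assumption made purely for illustration; in real studies the population values, and hence μ, are usually not available.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 10,000 individuals.
population = [rng.gauss(170, 6) for _ in range(10_000)]
mu = statistics.fmean(population)      # parameter: population mean

# Simple random sample of 100 individuals, drawn without replacement.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)       # statistic: sample mean
s = statistics.stdev(sample)           # statistic: sample standard deviation

print(f"mu = {mu:.2f} (parameter)")
print(f"x_bar = {x_bar:.2f}, s = {s:.2f} (statistics)")
```

The statistic changes from sample to sample, while the parameter is a fixed property of the population.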
• The Basic Steps of Statistical Work
• 1. Design of study
• 2. Collection of data
• 3. Data Sorting
• 4. Data Analysis
• Aim:
• Training in statistical thinking
• Learning some skills for dealing with medical data
• Focus:
• Essential concepts and statistical thinking: lectures and practice sessions
• Skills with computers and statistical software: practice sessions
• Practice sessions: experiments and discussion
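• Textbook
• Fang Ji-Qian. Medical Statistics and Computer Experiments. Singapore: World Scientific, 2005.
• Reference books
• 1. Fang Ji-Qian (方积乾). Medical Statistics and Computer Experiments (医学统计学与电脑实验), 3rd ed. Shanghai Scientific and Technical Publishers (上海科学技术出版社), 2006.
• 2. Fang Ji-Qian (方积乾). Statistical Methods for Biomedical Research (生物医学研究的统计方法). Beijing: Higher Education Press (高等教育出版社), 2007.
• SPSS software download address:
• ftp://202.116.102.7/pub/ 统计工具 /spss11
• Courseware download address:
• ftp://202.116.102.6/ 统计 /2005 全英班
• The computer start-up password is the last four digits of your student ID.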
• Chapter   1 Descriptive Statistics
• Statistics comprises statistical description and statistical inference.
• Statistical description:
• Describes the features of the sample.
• Main forms: tables, plots and numerical indexes
• 1.1 Variables and Data
• 1.1.1 Structure and feature of data
• (1) Basic observed unit
• A patient is defined as an observed unit.
• (2) Recording item
• Group : treatment
• Response variables ( 反应变量 ): systolic pressure, diastolic pressure, ECG and effectiveness
• Covariates ( 伴随变量 ): age and gender
• Variables: describe the properties of individuals.
• Different types of variables → different statistical methods
• 1.1.2 Types of variables
• 1. Quantitative Variable ( 定量变量 )
• Continuous variable ( 连续变量 )
• Values obtained through measurement: height, weight, blood pressure, pulse and …
• Takes values in a continuous interval.
• Discrete variable ( 离散变量 )
• Takes values in a set of integers.
• A binary variable is the simplest special case of a discrete variable.
• Example 1.1 Gender can be represented by a binary variable X.
• 2. Qualitative Variable ( 定性变量 )
• Categorical variable ( 分类变量 ) :
• Taking “values” within several possible categories, such as Gender (male, female), Occupation.
• Ordinal variable ( 有序变量 ) :
• There exists order among all possible categories, such as education (primary school, high school, university, postgraduate)
• 1.2 Frequency Table and Histogram
• Useful for description of sample data
• An intuitive basis for probability distributions.
• 1.2.1 Frequency table
• 1. Discrete-type frequency table
• Table 1.3 The frequency table for occupation of 108 patients
• Table 1.4 The frequency table for the results of certain semi-quantitative test among 150 patients
• 2. Continuous type frequency table
• Example 1.3 120 normal male adults were randomly selected from the residents of a county. Their red cell counts ($10^{12}$/L) were observed and are listed as follows:
• 5.12 5.13 4.58 4.31 4.09 4.41 4.33 4.58 4.24 5.45 4.32 4.84
• 4.91 5.14 5.25 4.89 4.79 4.90 5.09 4.04 5.14 5.46 4.66 4.20
• 4.21 3.73 5.17 5.79 5.46 4.49 4.85 5.28 4.78 4.32 4.94 5.21
• 4.68 5.09 4.68 4.91 5.13 5.26 3.84 4.17 4.56 3.52 6.00 4.05
• 4.92 4.87 4.28 4.46 5.03 5.69 5.25 4.56 5.53 4.58 4.86 4.97
• 4.70 4.28 4.37 5.33 4.78 4.75 5.39 5.27 4.89 6.18 4.13 5.22
• 4.44 4.13 4.43 4.02 5.86 5.12 5.36 3.86 4.68 5.48 5.31 4.53
• 4.83 4.11 3.29 4.18 4.13 4.06 3.42 4.68 4.52 5.19 3.70 5.51
• 4.64 4.92 4.93 4.90 3.92 5.04 4.70 4.54 3.95 4.40 4.31 3.77
• 4.16 4.58 5.35 3.71 5.27 4.52 5.21 4.37 4.80 4.75 3.86 5.69
• Please try to establish a frequency table for this set of data.
• (1) Range R
• maximum = 6.18, minimum = 3.29,
• range R = 6.18 − 3.29 = 2.89.
• (2) Length of sub-intervals i
• Divide the whole range into 8 to 15 sub-intervals.
• R/10 = 2.89/10 = 0.289 ≈ 0.30, so let i = 0.30.
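The two steps above, followed by the counting step, can be sketched in Python using the 120 red cell counts of Example 1.3. The lower limit of the first sub-interval (3.20) is an assumption chosen here for illustration; the transcript does not state which starting point the slides used.

```python
# Red cell counts (10^12/L) from Example 1.3, copied row by row.
DATA = """
5.12 5.13 4.58 4.31 4.09 4.41 4.33 4.58 4.24 5.45 4.32 4.84
4.91 5.14 5.25 4.89 4.79 4.90 5.09 4.04 5.14 5.46 4.66 4.20
4.21 3.73 5.17 5.79 5.46 4.49 4.85 5.28 4.78 4.32 4.94 5.21
4.68 5.09 4.68 4.91 5.13 5.26 3.84 4.17 4.56 3.52 6.00 4.05
4.92 4.87 4.28 4.46 5.03 5.69 5.25 4.56 5.53 4.58 4.86 4.97
4.70 4.28 4.37 5.33 4.78 4.75 5.39 5.27 4.89 6.18 4.13 5.22
4.44 4.13 4.43 4.02 5.86 5.12 5.36 3.86 4.68 5.48 5.31 4.53
4.83 4.11 3.29 4.18 4.13 4.06 3.42 4.68 4.52 5.19 3.70 5.51
4.64 4.92 4.93 4.90 3.92 5.04 4.70 4.54 3.95 4.40 4.31 3.77
4.16 4.58 5.35 3.71 5.27 4.52 5.21 4.37 4.80 4.75 3.86 5.69
"""
values = [float(v) for v in DATA.split()]

# (1) Range R
lo, hi = min(values), max(values)
data_range = round(hi - lo, 2)

# (2) Sub-intervals, handled in hundredths to avoid floating-point edge cases.
WIDTH = 30       # length i = 0.30
START = 320      # assumed lower limit of the first sub-interval: 3.20
N_CLASSES = 10   # 3.20 + 10 * 0.30 = 6.20 covers the maximum

# (3) Count how many observations fall in each sub-interval [lower, upper).
freq = [0] * N_CLASSES
for v in values:
    freq[(round(v * 100) - START) // WIDTH] += 1

print(f"R = {hi} - {lo} = {data_range}")
for k, f in enumerate(freq):
    lower = (START + k * WIDTH) / 100
    upper = lower + WIDTH / 100
    print(f"{lower:.2f} to <{upper:.2f}: {f:3d}  ({f / len(values):.1%})")
```

A different starting point would shift the class limits and change the individual counts, but not the total of 120.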
• 1.2.2 Frequency plot and histogram
• 1. Frequency plot for discrete variable
• – bar chart
• 2. Frequency plot for continuous variable – histogram
• 1.3 Measurement for average level
• Numerical characteristics ( 数字特征 ): Average level ( 平均水平 ) Variation ( 变异 )
• 1.3.1 Arithmetic mean ( 算术均数 )
• Useful when the histogram looks symmetric.
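• Denote the observed values of the individuals by $X_1, X_2, \dots, X_n$; the arithmetic mean is
• $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$ (1.1)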
• 1.3.2 Geometric mean ( 几何均数 )
• It is useful when the histogram of the logarithms is close to symmetric.
• Example The concentrations of a certain antibody are measured for a set of samples, and the corresponding titers are 4, 8, 16, 16, 64, 128.
• Arithmetic mean = 39.3
• Geometric mean = 20.16
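The two figures in this example can be checked with a few lines of Python. The sketch computes the geometric mean by averaging the logarithms and transforming back, which is one standard way to do it; the variable names are illustrative.

```python
import math

titers = [4, 8, 16, 16, 64, 128]

arithmetic_mean = sum(titers) / len(titers)

# Geometric mean: average the logarithms, then transform back.
mean_log = sum(math.log(x) for x in titers) / len(titers)
geometric_mean = math.exp(mean_log)

print(f"Arithmetic mean = {arithmetic_mean:.1f}")
print(f"Geometric mean  = {geometric_mean:.2f}")
```

The arithmetic mean is pulled upward by the two large titers, while the geometric mean stays close to the middle of the data on the logarithmic scale.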
• 1.3.3 Median ( 中位数 )
• When the histogram is skewed, the median can be used to measure the average level.
• Median = the value in the middle
• Example 1 Data set {1, 1, 2, 2, 3, 4, 6, 9, 10}
• Median = 3
• Example 2 Data set {1, 1, 2, 2, 3, 4, 6, 9, 10, 13}
• Median = (3 + 4)/2 = 3.5
• When n is odd,
• Median = the observed value with rank (n + 1)/2
• When n is even,
• Median = [(value with rank n/2) + (value with rank n/2 + 1)] / 2
• How to calculate the P-th percentile ( 百分位数 )?
• $P_{25}$?
• $P_{75}$?
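The rank rule for the median can be written as a short function and checked against the two examples. For percentiles several conventions exist; the sketch uses Python's `statistics.quantiles` with its default method as one possible choice, which is an assumption and may differ from the formula used in the textbook.

```python
import statistics


def median_by_rank(data):
    """Median via the rank rule: the middle value, or the mean of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                 # rank (n + 1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2   # ranks n/2 and n/2 + 1


example1 = [1, 1, 2, 2, 3, 4, 6, 9, 10]
example2 = [1, 1, 2, 2, 3, 4, 6, 9, 10, 13]

median1 = median_by_rank(example1)
median2 = median_by_rank(example2)
print(median1, median2)

# One convention for P25, P50 and P75 (other software may give slightly different values).
quartiles = statistics.quantiles(example1, n=4)
print(quartiles)
```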
• 1.4 Measurement for Variation
• 1.4.1 Range ( 极差 )
• R = maximum value − minimum value
• R is not robust.
• Disadvantages: it is based on only two observations and ignores the observations between the two extremes.
• The more observations there are, the larger the range tends to be.
• 1.4.2 Interquartile range ( 四分位数差距 )
• Lower quartile ( 下四分位数 ): the 25th percentile, $P_{25}$
• Upper quartile ( 上四分位数 ): the 75th percentile, $P_{75}$
• Difference between the two quartiles = $P_{75} - P_{25}$
• = 13.120 – 8.083 = 5.037
• 1.4.3 Variance and standard deviation
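• Deviation ( 偏差 ) from the mean: $X_i - \mu$
• Squared deviation: $(X_i - \mu)^2$
• Population variance ( 总体方差 ): the average squared deviation throughout the population of N individuals, $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$
• Population standard deviation ( 总体标准差 ): $\sigma = \sqrt{\sigma^2}$
• When the population mean ( 总体均数 ) $\mu$ is unknown, it is replaced by the sample mean $\bar{X}$.
• Squared deviation: $(X_i - \bar{X})^2$
• Sample variance ( 样本方差 ): the average squared deviation throughout the sample, $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
• Sample standard deviation ( 样本标准差 ): $s = \sqrt{s^2}$
• Degrees of freedom ( 自由度 ): (n − 1)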
• Example The weights of male infants:
• 2.85, 2.90, 2.96, 3.00, 3.05, 3.18
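As a worked version of this example, the sketch below computes the sample mean, sample variance and sample standard deviation of the six weights, dividing by the degrees of freedom n − 1. The transcript does not give the unit of the weights, so none is assumed here.

```python
import math

weights = [2.85, 2.90, 2.96, 3.00, 3.05, 3.18]

n = len(weights)
mean = sum(weights) / n

# Sum of squared deviations from the sample mean.
sum_sq_dev = sum((x - mean) ** 2 for x in weights)

sample_variance = sum_sq_dev / (n - 1)   # divide by the degrees of freedom, n - 1
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean:.2f}")
print(f"s^2  = {sample_variance:.5f}")
print(f"s    = {sample_sd:.3f}")
```

Python's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and should agree with these values.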
• Conventionally, the mean and standard deviation are often expressed together as $\bar{X} \pm s$. For instance, for height, the mean and standard deviation are 170 ± 6 (cm).
• 1.4.4 Coefficient of variation
• Example 9-10 For normal young males, comparing their height and weight, which one has more variation?
• Coefficient of variation ( 变异系数 ) is defined as $CV = \dfrac{s}{\bar{X}} \times 100\%$
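Because the coefficient of variation is the standard deviation relative to the mean, expressed as a percentage, it has no unit and allows height (cm) and weight (kg) to be compared. In the minimal sketch below, the height figures 170 ± 6 cm are the ones quoted in these slides, while the weight figures are hypothetical values chosen only for illustration.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage: sd / mean * 100."""
    return sd / mean * 100


# Height: mean 170 cm, standard deviation 6 cm (figures quoted in the slides).
height_cv = coefficient_of_variation(170, 6)

# Weight: mean 60 kg, standard deviation 5 kg (hypothetical, not from the source).
weight_cv = coefficient_of_variation(60, 5)

print(f"CV of height = {height_cv:.2f}%")
print(f"CV of weight = {weight_cv:.2f}%")
```

Under these assumed figures weight varies more than height in relative terms, even though its standard deviation is the smaller number.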
• 1.5 Relative Measures and Standardization Approaches
• 1.5.1 Ratio, frequency and intensity
• Relative measures are widely used in vital statistics ( 生命统计 ) and epidemiology ( 流行病学 ).
• Caution: there are three types of relative measures, although they are often all named "… rate".
• 1. Ratio ( 比 )
• It is simply the ratio of one quantity to another.
• For example, body mass index ( 身体指数 ): BMI = weight (kg) / height (m)²
• 2. Relative frequency ( 频率 )
• A special type of ratio:
• Both the numerator ( 分子 ) and the denominator ( 分母 ) are counts;
• The numerator is a part of the denominator;
• It lies within the interval [0, 1].
• For example,
• 3. Intensity ( 强度 )
• Another special type of ratio:
• The denominator: total observed person-years ( 人 - 年 ) during a certain period;
• The numerator: the number of occurrences of a certain event during the period.
• Not necessarily within the interval [0, 1]
• For example, the mortality rate: the number of deaths during the period divided by the total person-years observed.
• Unit: "person/person-year"
• The mortality rate can be regarded as an adjusted relative frequency per year.
• In general, intensity can be understood as "relative frequency per unit of time", reflecting the chance of a certain event happening in a unit of time.
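A small numeric sketch may help separate intensity from relative frequency. All counts below are hypothetical and chosen only to illustrate the arithmetic; the second case shows that an intensity can exceed 1 when the event can happen more than once per person per year.

```python
def intensity(events, person_years):
    """Number of events divided by the total observed person-years."""
    return events / person_years


# Hypothetical: 12 deaths observed over 3,000 person-years of follow-up.
mortality = intensity(12, 3_000)

# Hypothetical: 150 episodes of the common cold over 100 person-years.
cold_rate = intensity(150, 100)

print(f"mortality = {mortality * 1000:.1f} per 1000 person-years")
print(f"colds     = {cold_rate:.1f} per person-year")
```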
• 1.5.2 Crude death rate and standardization
• Table 1.9 Age-specific mortality rates ( 年龄别死亡率 ) for two cities
• Which city has a higher mortality?
• 1. Direct standardization ( 直接标准化 )
• Select a "standard population" ( 标准人口 )
• Take the sum of the populations of the two cities as the "standard population".
• If each city's mortality rates were applied to the "standard population" correspondingly, expected numbers of deaths = ?
• 2. Indirect standardization ( 间接标准化 )
• Standardized mortality ratio (SMR) ( 标准化死亡比 )
• City A: SMR = 63/58.12 = 1.084
• Indirect standardized mortality rate = 17.2 × 1.084 = 18.64 (‰)
• City B: SMR = 131/142.30 = 0.921
• Indirect standardized mortality rate = 17.2 × 131/142.30 = 15.83 (‰)
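The indirect standardization arithmetic can be reproduced in a few lines. The sketch reads 63 and 131 as the observed numbers of deaths, 58.12 and 142.30 as the expected numbers of deaths, and 17.2‰ as the mortality rate of the standard population; this reading is an assumption based on the usual definition SMR = observed / expected, since Table 1.9 itself is not reproduced in the transcript.

```python
STANDARD_RATE = 17.2  # mortality rate of the standard population, per 1000 (assumed reading)


def smr(observed, expected):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return observed / expected


def indirect_rate(observed, expected, standard_rate=STANDARD_RATE):
    """Indirect standardized mortality rate: standard rate multiplied by the SMR."""
    return standard_rate * smr(observed, expected)


# (observed deaths, expected deaths) for each city, as given in the slides.
cities = {"A": (63, 58.12), "B": (131, 142.30)}

results = {
    name: (smr(obs, exp), indirect_rate(obs, exp))
    for name, (obs, exp) in cities.items()
}

for name, (ratio, rate) in results.items():
    print(f"City {name}: SMR = {ratio:.3f}, standardized rate = {rate:.2f} per 1000")
```

Carrying full precision through the calculation, City B's SMR is about 0.9206, which gives the standardized rate of 15.83‰.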
• Summary
• 1. Statistics: Sample → Population
• 2. Frequency table and histogram
• 3. Average level
• Arithmetic mean, median, geometric mean
• 4. Variation
• Range, interquartile range, standard deviation
• Coefficient of variation
• 5. Relative measures
• Ratio, frequency and intensity
• 6. Crude death rates are not comparable
• Two approaches to standardization