Statistics
• Is concerned with
• Collecting
• Organizing
• Summarizing
• Presenting and Analyzing data
• To draw valid conclusions &
make reasonable decisions
on the basis of such analysis
Collecting data
• Can collect data concerning
–Characteristics of a group of
individuals or objects
–E.g. 100 blood donors donate 100
bottles of blood to a blood bank
Organizing data
• Can organize data by classifying
it into different groups
– Sex and blood type of blood donors
– E.g. Male, Female and A,B,AB & O
Summarizing data
• Can summarize the number of
individuals in each class
–E.g. 60 males and 40 females
–15 A, 30 B, 5 AB and 50 O
Presenting data
• Can present data by
rate, ratio,
percentage, diagram,
etc.
• Male:Female ratio of
blood donors = 3:2
• Percentage of Blood
groups
• A = 15 %
• B = 30 %
• AB = 5 %
• O = 50 %
[Figure: chart of blood group percentages for A, B, AB and O; axis scale 0 to 100]
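The ratio and percentages on this slide can be reproduced from the raw counts. The following is a minimal Python sketch; the variable and function names are illustrative assumptions, and only the counts come from the blood donor example.

```python
from fractions import Fraction

# Counts taken from the blood donor example in the slides.
sex_counts = {"Male": 60, "Female": 40}
blood_group_counts = {"A": 15, "B": 30, "AB": 5, "O": 50}


def percentages(counts):
    """Return each class count as a percentage of the total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Fraction reduces 60/40 to lowest terms, which gives the 3:2 ratio.
ratio = Fraction(sex_counts["Male"], sex_counts["Female"])
male_to_female = f"{ratio.numerator}:{ratio.denominator}"

blood_group_pct = percentages(blood_group_counts)

print("Male:Female ratio =", male_to_female)
for group, pct in blood_group_pct.items():
    print(f"{group} = {pct:.0f} %")
```

Because there are exactly 100 donors, each percentage equals its count; with any other total the same function still applies.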
Analyzing data
• From the presentation, the findings can be
analyzed, e.g. there are more male blood
donors than female
There are two types of statistics
• Descriptive statistics
• Describes and
summarizes data
• Inferential statistics
• Uses a sample of data
to help us draw
conclusions about
larger populations
[Figure: Bar graph showing cholera outbreak by age group; horizontal axis: age groups (0 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60, >60); vertical axis: 0 to 70]
Clinical trial for Antihypertensive drug
• Population with SBP = 180 mm Hg
• Random sample = 10 patients
• Give antihypertensive drug
• After drug, sample mean SBP = 170 mm Hg
• Can we conclude that the drug was effective
without a statistical analysis?
• No (we need to compute the probability that the difference is due to chance)
Descriptive statistics
• Help organize data in a more meaningful
way
• Summarize data
• Investigate relationships between variables
• Serve as preliminary analysis before using
inferential techniques
• But analysis techniques depend on the type
of data
Types of data
• Nominal data
• Ordinal data
• Interval data
• Ratio data
Nominal data
• Refers to data that represent categories or
names
• There is no implied order to the categories
of nominal data
• E.g. Eye colour
• Race
• Gender
• Marital status
Ordinal data
• Refers to data that are ordered but the
space or intervals between data values
are not necessarily equal.
• E.g. Strongly agree
• Agree
• No opinion
• Disagree
• Strongly disagree
Interval data
• Refers to data where the intervals between values
are the same
• E.g. Fahrenheit temperature scale
• The difference between 70 degrees and
71 degrees is the same as the difference
between 32 and 33 degrees
• But the scale is not a ratio scale because
40 degrees F is not twice as much as 20
degrees F ( the zero point is arbitrary; there is no absolute zero )
Ratio data
• Ratio data do have meaningful ratios e.g.
Age is ratio data.
• Someone who is 40 yrs of age is twice as
old as someone who is 20 yrs
• Temperature on the Kelvin scale is ratio data
• Most data analysis techniques that apply
to ratio data also apply to interval data
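To see why Fahrenheit is an interval scale but not a ratio scale, the two temperatures can be converted to kelvin, which has a true zero. This is a small illustrative sketch; the function name is an assumption, and the conversion formula is the standard one.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit temperature to kelvin."""
    return (deg_f - 32) * 5 / 9 + 273.15


# Interval property: a one-degree difference is the same size anywhere on the scale.
diff_high = 71 - 70
diff_low = 33 - 32

# On the Fahrenheit scale 40 / 20 looks like "twice as much"...
naive_ratio = 40 / 20
# ...but on the Kelvin scale the same two temperatures are nearly equal in ratio.
kelvin_ratio = fahrenheit_to_kelvin(40) / fahrenheit_to_kelvin(20)

print(f"Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio:     {kelvin_ratio:.2f}")
```

The Kelvin ratio comes out close to 1, not 2, so "twice as hot" is not a meaningful statement about 40 and 20 degrees F.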
Identify the type of data represented by
each of the following:
• Weight ( kg ) = Ratio (R)
• Temperature ( Celsius ) = Interval (I)
• Hair colour = Nominal (N)
• Job satisfaction index ( 1-5 ) = Ordinal (O)
• No. of heart attacks = Ratio (R)
• Calendar year = Interval (I)
Frequency distribution
• Useful method for summarizing data in graphic
form
• Suppose we want to investigate the relationship
between coffee drinking and heart rate ( pulse )
• First we need to know something about heart
rates in a “ normal “ population
• Next we define a population to investigate
• E.g. Males between 30 and 40 yrs in Myanmar
• Take a sample from the population
• We find the following 10 heart rates
• 72,52,63,68,66,72,74,81,76,56
• A frequency distribution will help us to
summarize these numbers and see
patterns in the values
• How many men had a heart rate between 70
and 75? 3
[Figure: frequency distribution of the 10 heart rates, interval size 5; horizontal axis: 50 to 85; vertical axis: frequency 1 to 3]
• The choice of interval size depends
somewhat on the level of detail you want
the graph to show
• For instance, if we increase interval size to
10 we have the graph below.
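• How many people have a heart rate
between 70 and 75 ? Can’t tell: the bar from 70 to 80 no longer
separates 70 to 75 from 75 to 80.
[Figure: frequency distribution of the 10 heart rates, interval size 10; horizontal axis: 50 to 90; vertical axis: frequency 1 to 4]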
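The two frequency distributions can be rebuilt from the ten heart rates. The sketch below is only illustrative: the function name is made up, and it assumes each interval includes its lower bound and excludes its upper bound, which the slides do not state.

```python
from collections import Counter

heart_rates = [72, 52, 63, 68, 66, 72, 74, 81, 76, 56]


def frequency_distribution(values, width, start):
    """Count values per interval [lower, lower + width), keyed by lower bound."""
    counts = Counter(start + ((v - start) // width) * width for v in values)
    return dict(sorted(counts.items()))


by_5 = frequency_distribution(heart_rates, width=5, start=50)
by_10 = frequency_distribution(heart_rates, width=10, start=50)

# Print a simple text histogram for each interval size.
for lower, count in by_5.items():
    print(f"{lower}-{lower + 5}: {'#' * count}")
print()
for lower, count in by_10.items():
    print(f"{lower}-{lower + 10}: {'#' * count}")
```

With interval size 5 the 70 to 75 interval holds three values (72, 72, 74). With interval size 10 those three are merged with 76 into a single 70 to 80 count, which is why the wider graph cannot answer the question.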
Mean, Median and Mode
• Mean = The arithmetic mean is synonymous with the average and is the
same calculation
• E.g. the mean heart rate of the sample is
$\bar{X}_{HR} = \frac{72 + 52 + 63 + 68 + 66 + 72 + 74 + 81 + 76 + 56}{10} = 68.0$
• The mean is a common measure of central tendency
Median
• Median is the centre of the group of numbers.
That is half the numbers will be above the
median and half will be below
• To calculate the median, we first sort the data
array. For the heart rate data:
72,52,63,68,66,72,74,81,76,56
• Sorting results in the following:
52, 56, 63, 66, 68, 72, 72, 74, 76, 81
• There is no single middle number. In this case we take
the mean of the two middle numbers
• Thus what is the median ? ( 68 + 72 ) / 2 = 70
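The sort-then-pick-the-middle procedure can be sketched in a few lines of Python. The function below is a hand-written illustration of the steps on this slide, not a library routine.

```python
def median(values):
    """Median by sorting: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    # Even count: average the two values either side of the centre.
    return (ordered[mid - 1] + ordered[mid]) / 2


heart_rates = [72, 52, 63, 68, 66, 72, 74, 81, 76, 56]
heart_rate_median = median(heart_rates)
print(heart_rate_median)  # 70.0, the mean of 68 and 72
```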
Mode
• The mode of a data set is the most
frequently occurring number
• When evaluating data the mode is rarely
used
• In heart rate data:
• 52,56,63,66,68,72,72,74,76,81
• What is the mode ? 72
Mean = 68 Median = 70 Mode = 72
• As you can see the three measures of
central tendency ( Mean, Median, Mode )
have different values
• They are used in different statistical
situations, depending on the nature of data
and statistical tests to be performed.
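All three measures are available in Python's standard `statistics` module, so the values quoted above can be checked directly. This is a short sketch using the heart rate sample; `multimode` (Python 3.8 or later) is included because a data set can have more than one mode.

```python
import statistics

heart_rates = [72, 52, 63, 68, 66, 72, 74, 81, 76, 56]

mean_hr = statistics.mean(heart_rates)
median_hr = statistics.median(heart_rates)
mode_hr = statistics.mode(heart_rates)
# multimode returns every value tied for most frequent.
all_modes = statistics.multimode(heart_rates)

print(f"Mean = {mean_hr}  Median = {median_hr}  Mode = {mode_hr}")
```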
Population and samples
• A population is a group of subjects, usually
large, that the investigator is interested
in studying
• E.g. Males in Myanmar between 30 & 40 yrs of age
• People in Shan state with bladder cancer
• People with systolic blood pressure over 180 who do
not smoke
• It is usually impractical to study an entire
population. Hence the researcher should take
a sample from the population
• If a sample is properly drawn and is of
sufficient size, then we can make
inferences about the population by
studying the sample
[Figure: a large group of X marks labelled Population, with a smaller subset labelled Sample]
As a rule of thumb we call properties of a population
parameters and properties of a sample statistics
• Population parameters
usually represented with
Greek letters
• μ population mean
• σ population S.D
• Sample statistics usually
represented with Roman
letters
• X̄ sample mean
• s sample S.D
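The distinction between parameters and statistics can be illustrated with a simulation. Everything about the population below is an assumption made for illustration (10,000 simulated heart rates drawn from a normal distribution with mean 70 and S.D 8); it is not data from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated population (an assumption for illustration, not real data).
population = [random.gauss(70, 8) for _ in range(10_000)]

# Population parameters (Greek letters).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population S.D, divides by N

# Simple random sample and its statistics (Roman letters).
sample = random.sample(population, 500)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample S.D, divides by n - 1

print(f"mu    = {mu:.2f}, sigma = {sigma:.2f}")
print(f"x_bar = {x_bar:.2f}, s     = {s:.2f}")
```

With a properly drawn sample of this size, the sample statistics should land close to the population parameters, which is what makes inference from the sample possible.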
Measures of dispersion
• While mean & median give useful information
about the centre of data, we also need to know
how spread out the numbers are about the
centre
• Consider the following data sets:
• Set 1: 60 40 30 50 60 40 70
• Set 2: 50 49 49 51 48 53 50
• Both have a mean of 50, but obviously set 1 is
more spread out than set 2
Range
• One simple measure of “ Spread “ or
“ Dispersion “ is RANGE
• This is simply the difference between the
highest and lowest values
• So in our two data sets
• Set 1: 60 40 30 50 60 40 70
• Set 2: 50 49 49 51 48 53 50
• What is the range of data in set 1 ? 70 – 30 = 40
• What is the range of data in set 2 ? 53 – 48 = 5
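A quick check of both data sets shows the same mean but very different ranges. This is a minimal sketch; the helper name `data_range` is an illustrative choice.

```python
set_1 = [60, 40, 30, 50, 60, 40, 70]
set_2 = [50, 49, 49, 51, 48, 53, 50]


def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    mean = sum(data) / len(data)
    print(f"{name}: mean = {mean:.0f}, range = {data_range(data)}")
```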
• However, you will find that the range is not
often used, and for good reason: it is too
sensitive to a single high or low data value
• Instead we suggest two alternatives:
• Inter quartile range
• Standard deviation
Inter quartile range
• The inter quartile range is similar to the range
except that it measures the difference between
the first and third quartiles
• To compute it, we first sort the data.
• Then find the data values corresponding to the
first quarter of the numbers ( first quartile ) and
the top quarter ( third quartile )
• The inter quartile range is the distance between
these quartiles
• Given the following data set:
18 21 23 24 24 32 42 59
• We sort the data from lowest to highest
• Find the bottom quarter and top quarter of
the data
• Then determine the range between these
values
• What do you get for the inter quartile range ?
First quartile = 22 ( midway between 21 and 23 ) Third quartile = 37 ( midway between 32 and 42 )
Inter quartile range = 37 – 22 = 15
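One way to reproduce the quartiles quoted here is to take the median of the lower half and the median of the upper half of the sorted data. The sketch below assumes that convention; statistical packages use several different quartile definitions and may return slightly different values.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """First and third quartiles as medians of the lower and upper halves."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]  # for an odd count the middle value is in neither half
    return median(lower), median(upper)


data = [18, 21, 23, 24, 24, 32, 42, 59]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

For this data set the lower half is 18, 21, 23, 24 and the upper half is 24, 32, 42, 59, giving quartiles of 22 and 37 and an inter quartile range of 15.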
Why is the inter quartile range a preferable
measure to the range ?
1. It is a smaller number
2. It is less prone to distortion by a single
large or small value
3. It is easier to calculate
– Enter 1, 2 or 3
Answer: 2. Yes, a single outlier in the data has little or no effect on the
inter quartile range
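The effect of an outlier can be demonstrated by changing one value in the example data set. In this sketch the value 590 is a hypothetical outlier (59 mistyped with an extra zero), and the quartiles are taken as medians of the lower and upper halves, which is an assumed convention.

```python
import statistics


def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


def iqr(values):
    """Inter quartile range via medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return statistics.median(ordered[-half:]) - statistics.median(ordered[:half])


original = [18, 21, 23, 24, 24, 32, 42, 59]
with_outlier = [18, 21, 23, 24, 24, 32, 42, 590]  # hypothetical outlier

for name, data in (("original", original), ("with outlier", with_outlier)):
    print(f"{name}: range = {data_range(data)}, IQR = {iqr(data)}")
```

The range jumps from 41 to 572, while the inter quartile range stays at 15 because the extreme value never reaches the middle of either half.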
Standard deviation
• The most commonly used measure of
dispersion is the standard deviation
• The S.D can be thought of as the “
average “ deviation ( difference ) between
the mean of a sample and each data value
in the sample
• The actual formula squares all the deviations to
make them all positive and takes the square
root at the end
$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
• Where $\bar{x}$ = sample mean, $\sum$ = summation operation
• $x_i$ = individual sample value
• n = number of data points in a sample
• As an example, let’s compute the standard deviation of
the four values
• 1 3 5 7
• Step 1 – Calculate the mean = Σ x / n = 16 / 4 = 4
• Step 2 – Compute the deviation of each score from the
mean
• Step 3 – Square all deviations and add

Value   Mean   Deviation   Squared deviation
1       4      -3          9
3       4      -1          1
5       4      +1          1
7       4      +3          9
               Sum         20
• Step 4 – Divide by n – 1 = 20 / 3
• Step 5 – Take the square root: $\sqrt{20 / 3} = 2.58$
Review
• Step 1 – Calculate mean
• Step 2 – Compute deviation
• Step 3 – Square and sum
• Step 4 – Divide by n – 1
• Step 5 – Take square root
• By the way, the quantity before we take the square root is called the
variance
• Variance = $( \text{Standard deviation} )^2$
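The five steps can be traced in Python for the four values 1, 3, 5, 7. This is a minimal sketch with illustrative variable names; the final line cross-checks the result against the standard library's sample standard deviation, which also divides by n – 1.

```python
import math
import statistics

values = [1, 3, 5, 7]

n = len(values)
mean = sum(values) / n                            # Step 1: mean = 4
deviations = [x - mean for x in values]           # Step 2: -3, -1, +1, +3
sum_of_squares = sum(d ** 2 for d in deviations)  # Step 3: 9 + 1 + 1 + 9 = 20
variance = sum_of_squares / (n - 1)               # Step 4: 20 / 3
sd = math.sqrt(variance)                          # Step 5: square root

print(f"variance = {variance:.2f}, SD = {sd:.2f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sd, statistics.stdev(values))
```

Squaring the resulting standard deviation returns the variance from Step 4, which is the relationship stated on the slide.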
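• In symbols, the five steps compute, in order:
• Step 1: $\bar{x}$
• Step 2: $x_i - \bar{x}$
• Step 3: $\sum (x_i - \bar{x})^2$
• Step 4: $\sum (x_i - \bar{x})^2 / (n - 1)$
• Step 5: $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$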
IV STATISTICS I.pdf
