+
Basics in Statistics
Dr Maher Alaraj
+
Data Mining/Analytics Process
+
Statistics
 Definition
 The practice or science of collecting and analyzing numerical data in large
quantities, especially for the purpose of inferring proportions in a whole from
those in a representative sample.
 Descriptive statistics
 Describe the basic features of data in a study
 Provide summaries about the sample and measures
 Inferential statistics
 Investigate questions, models, and hypotheses
 Infer population characteristics based on sample
 Make judgments about what we observe
+
Some Concepts
Variable - any characteristic of an individual or entity. A variable can take
different values for different individuals. Variables can be categorical or
quantitative.
• Nominal - Categorical variables with no inherent order or ranking sequence, such as
names or classes (e.g., gender). Values may be written as numerals but carry no numerical meaning (e.g.,
I, II, III). The only operation that can be applied to nominal variables is enumeration (counting).
• Ordinal - Variables with an inherent rank or order, e.g., mild, moderate, severe. Values can be
compared for equality, or for greater or less, but not for how much greater or less.
• Interval - Values of the variable are ordered as in Ordinal and, additionally, differences
between values are meaningful; however, the scale has no absolute zero point. Calendar
dates and temperatures on the Fahrenheit scale are examples. Addition and subtraction are
meaningful operations, but multiplication and division are not.
• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero
point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication, and division
are all meaningful operations.
+
Sampling
 What is your population of interest?
 To whom do you want to generalize your results?
 All students (18 and over)
 Undergraduates only
 Arts students
 Athletes
 Other
 Can you sample the entire population?
 A sample is “a smaller (but hopefully representative) collection of
units from a population used to determine truths about that
population” (Field, 2005)
 Why sample?
 Resources (time, money) and workload
 Gives results with known accuracy that can be calculated mathematically
+
Descriptive Statistics
 Descriptive statistics are used to summarize or condense a group of
scores
 They include measures of central tendency and measures of
variability
 Mode
 Median
 Mean
 Range
 Variance
 Standard Deviation
+
Central Tendency
 Measures of central tendency describe the average or common score
of a group of scores
 Common measures of central tendency include the mean, median,
and mode
+
Mean
 The mean is the arithmetic average of the scores
 The calculation of the mean considers both the number of scores and
their value
 The formula for the mean of the variable X is:
$$\bar{X} = \frac{\sum X}{n}$$
where ΣX is the sum of the scores and n is the number of scores
+
Mean
 Six men with high serum cholesterol participated in a study to examine
the effects of diet on cholesterol
 At the beginning of the study, their serum cholesterol levels (mg/dL)
were:
366, 327, 274, 292, 274, 230
 What is the mean?
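The question above can be checked with a minimal Python sketch. The variable names are illustrative and not part of the slides; the data are the six cholesterol values listed above.

```python
from statistics import mean

# Serum cholesterol levels (mg/dL) from the slide
cholesterol = [366, 327, 274, 292, 274, 230]

# Mean = sum of the scores divided by the number of scores
mean_by_formula = sum(cholesterol) / len(cholesterol)

# The standard library computes the same quantity
mean_by_library = mean(cholesterol)

print(round(mean_by_formula, 1))  # 293.8
```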
+
Median
 The median is the middle point in an ordered distribution at which an
equal number of scores lie on each side of it
 It is also known as the 50th percentile (P50), or 2nd quartile (Q2)
 The position of the median (Mdn) can be calculated as follows:
$$\text{position of Mdn} = \frac{n + 1}{2}$$
where n is the number of scores
+
Median
 Example: Calculate the median for the following measurements for height:
71”, 73”, 74”, 75”, 72”
 Step One: Arrange the scores in order: 71”, 72”, 73”, 74”, 75”
 Step Two: Calculate the position of the median using the following formula:
$$\frac{n + 1}{2} = \frac{5 + 1}{2} = 3$$
 Step Three: Determine the value of the median by counting from either the
highest or the lowest score until the desired score is reached (in this case the
3rd score, 73”)
+
Median
 Suppose that in our previous distribution we had a sixth score as
follows:
71”, 72”, 73”, 74”, 74”, 75”
 What are the position and value of the median?
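One way to check both median examples is the short Python sketch below. The helper `median_position` is a hypothetical name introduced here for illustration; it implements the (n + 1) / 2 position rule, and a fractional position such as 3.5 means the median is the average of the two neighbouring scores.

```python
from statistics import median

def median_position(n):
    """Position of the median in an ordered list of n scores: (n + 1) / 2."""
    return (n + 1) / 2

heights_odd = [71, 73, 74, 75, 72]        # five scores, unordered as on the slide
heights_even = [71, 72, 73, 74, 74, 75]   # six scores

pos_odd = median_position(len(heights_odd))    # 3.0, the 3rd ordered score
pos_even = median_position(len(heights_even))  # 3.5, halfway between 3rd and 4th

# statistics.median sorts the data and averages the two middle scores when n is even
mdn_odd = median(heights_odd)
mdn_even = median(heights_even)

print(pos_odd, mdn_odd)
print(pos_even, mdn_even)
```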
+
Median
 Consider the following example: Nine people each perform 40 sit-ups,
and one does 1,000
 The median score for the group is 40, and the mean (arithmetic average)
is 136
 The median would still be 40 even if the highest score were 2,000
instead of 1,000
 What can you learn from this?
The Median is Unaffected by Extreme Scores
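The sit-up example can be reproduced with a few lines of Python. This is only an illustrative sketch; the list names are assumptions made here.

```python
from statistics import mean, median

# Nine people perform 40 sit-ups each, and one performs 1,000
situps = [40] * 9 + [1000]

# Same group, but the extreme score is 2,000 instead of 1,000
situps_more_extreme = [40] * 9 + [2000]

median_original = median(situps)
mean_original = mean(situps)

median_extreme = median(situps_more_extreme)
mean_extreme = mean(situps_more_extreme)

# The median stays at 40 in both cases, while the mean moves from 136 to 236
print(median_original, mean_original)
print(median_extreme, mean_extreme)
```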
+
Mode
 The mode is the most frequently occurring score
 Which of the following scores is the mode?
3, 7, 3, 9, 9, 3, 5, 1, 8, 5
 Similarly, for another data set (2, 4, 9, 6, 4, 6, 6, 2, 8, 2), there are two
modes; What are they?
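A quick way to check both questions is `statistics.multimode` (available in Python 3.8 and later), which returns every value that ties for the highest frequency. The variable names below are illustrative.

```python
from statistics import multimode

scores_a = [3, 7, 3, 9, 9, 3, 5, 1, 8, 5]
scores_b = [2, 4, 9, 6, 4, 6, 6, 2, 8, 2]

# multimode returns a list of all most frequent values,
# in the order they first appear in the data
modes_a = multimode(scores_a)  # one mode: unimodal
modes_b = multimode(scores_b)  # two modes: bimodal

print(modes_a)  # [3]
print(modes_b)  # [2, 6]
```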
+
Mode
 A distribution with a single mode is said to be unimodal
 A distribution with more than one mode is said to be bimodal,
trimodal, etc., or in general, multimodal
+
Variability
 Measures of variability describe the extent of similarity or difference in
a set of scores
 These measures include the range, standard deviation, and
variance
+
Standard Deviation (SD)
 Standard Deviation – a measure of the variability, or spread, of a set
of scores around the mean
 Intuitively, the sum of the differences between each score and the mean
(known as deviation scores) appears to be a good approach for
measuring variability around the mean
+
SD
 Symbolically, we can write this as
$$\sum (X - M)$$
 Let’s use the scores 1, 2, 6, 6, and 15, where
M = 6
SD
Now let’s calculate the sum of the deviation scores:
Σ(X − M) = (1-6) + (2-6) + (6-6) + (6-6) + (15-6)
= (-5) + (-4) + (0) + (0) + (9)
= -9 + 9 = 0
+
SD
 We can avoid this problem (deviation scores sum to 0) by
squaring each deviation score before summing them
 This would be written symbolically as
$$\sum (X - M)^2$$
 Substituting our X scores again,
Σ(X − M)² = (1-6)² + (2-6)² + (6-6)² + (6-6)² + (15-6)²
= (-5)² + (-4)² + (0)² + (0)² + (9)²
= 25 + 16 + 0 + 0 + 81
= 122
+
SD
 We then divide this value by n-1 to arrive at the mean squared
deviation
122/4 = 30.5
 We then take the square root of this value to bring the units
back to the raw score units
√30.5 ≈ 5.52
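The whole calculation, from deviation scores to standard deviation, can be traced step by step in Python. This is a sketch of the procedure described above (dividing by n − 1), with illustrative variable names.

```python
import math
from statistics import stdev

scores = [1, 2, 6, 6, 15]
n = len(scores)
m = sum(scores) / n                      # M = 6

# Step 1: deviation scores (each score minus the mean) always sum to 0
deviations = [x - m for x in scores]
sum_of_deviations = sum(deviations)

# Step 2: square each deviation before summing to avoid the cancellation
sum_of_squares = sum(d ** 2 for d in deviations)   # 122

# Step 3: divide by n - 1 to get the mean squared deviation
mean_squared_deviation = sum_of_squares / (n - 1)  # 122 / 4 = 30.5

# Step 4: take the square root to return to raw score units
sd = math.sqrt(mean_squared_deviation)

print(round(sd, 2))             # 5.52
print(round(stdev(scores), 2))  # the library's sample SD gives the same value
```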
+
Variance
 The variance is the square of the standard deviation
 It is used most commonly with more advanced statistical procedures
such as regression analysis, analysis of variance (ANOVA), and the
determination of the reliability of a test
 The variance shows how far, on average, the values in the data set lie from
the mean, in squared units. Here is how it is defined:
$$s^2 = \frac{\sum (x - \bar{X})^2}{n - 1}$$
+
Example calculation of variance and standard deviation on strength scores.
Subj   Score (x)   Deviation (x − X̄)   Squared deviation (x − X̄)²
1      216          22.7                 515.29
2      144         -49.3                2430.49
3      183         -10.3                 106.09
4      138         -55.3                3058.09
5      212          18.7                 349.69
6      180         -13.3                 176.89
7      200           6.7                  44.89
8      264          70.7                4998.49
9      203           9.7                  94.09
Σ      1740          0                 11774.01

$$\bar{X} = \frac{\sum X}{n} = \frac{1740}{9} = 193.3$$
$$s^2 = \frac{\sum (x - \bar{X})^2}{n - 1} = \frac{11774.01}{8} = 1471.75$$
$$s = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}} = 38.4$$
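The strength-score results can be cross-checked with Python's `statistics` module, whose `variance` and `stdev` functions use the same n − 1 denominator. This is a verification sketch with illustrative names. Because the table rounds each deviation to one decimal place, its sum of squares (11774.01) differs very slightly from the unrounded value.

```python
from statistics import mean, variance, stdev

# Strength scores for the nine subjects in the table
strength = [216, 144, 183, 138, 212, 180, 200, 264, 203]

strength_mean = mean(strength)     # 1740 / 9
strength_var = variance(strength)  # sample variance, denominator n - 1
strength_sd = stdev(strength)      # square root of the sample variance

print(round(strength_mean, 1))  # 193.3
print(round(strength_var, 2))   # 1471.75
print(round(strength_sd, 1))    # 38.4
```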
+
Range
 The range of a set of data is the difference between the highest
and lowest values in the set. To find the range, first order the
data from least to greatest. Then subtract the smallest value
from the largest value in the set.
Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9.
So the range is 9-3 = 6.
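The same range calculation as a small Python sketch (names are illustrative). Sorting is shown only to mirror the steps above; `max` and `min` would work on unordered data as well.

```python
data = [4, 6, 9, 3, 7]

# Order the data from least to greatest
ordered = sorted(data)          # [3, 4, 6, 7, 9]

# Range = largest value minus smallest value
data_range = ordered[-1] - ordered[0]

print(data_range)  # 6
```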
+
Quantiles
 one of the class of values of a variable that divides
the total frequency of a sample or population into a
given number of equal proportions
 Examples:
 Percentile
 Decile
 Quartile
 Quintile
+
Quantiles
 The 100-quantiles are called percentiles.
 The 10-quantiles are called deciles.
 The 5-quantiles are called quintiles.
 The 4-quantiles are called quartiles.
+
Percentiles
 The kth percentile is a scale value for a data series equal to the
k/100 quantile
 The 1st percentile cuts off the lowest 1% of data
 The 98th percentile cuts off the lowest 98% of data
 The 25th percentile is the first quartile
 The 50th percentile is the median
+
Deciles
 each of ten equal groups into which a population can be
divided according to the distribution of values of a particular
variable.
 Represents 1/10 of the total population
 The 1st decile cuts off the lowest 10% of data
 The 9th decile cuts off the lowest 90% of data
+
Quartiles
 The quartiles divide the distribution into four
equal parts
 called fourths
 The total of 100% is broken into four equal parts: 25%,
50%, 75%, 100%.
 Lower Quartile is the 25th percentile. (0.25)
 Median Quartile is the 50th percentile. (0.50)
 Upper Quartile is the 75th percentile. (0.75)
+
Quintiles
 any of five equal groups into which a population can be divided
according to the distribution of values of a particular variable.
 Represents 20% or 1/5 of the given amount
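The quantile families described above can be sketched with `statistics.quantiles`, where the parameter `n` is the number of equal groups and the result is the list of n − 1 cut points. The data set (the nine strength scores from the variance example) is chosen only for illustration. The exact cut points depend on the interpolation method; this sketch uses the function's default ("exclusive") method, so other software may report slightly different values.

```python
from statistics import median, quantiles

# Strength scores reused from the variance example (illustrative choice)
strength = [216, 144, 183, 138, 212, 180, 200, 264, 203]

quartile_cuts = quantiles(strength, n=4)      # 3 cut points -> 4 groups
quintile_cuts = quantiles(strength, n=5)      # 4 cut points -> 5 groups
decile_cuts = quantiles(strength, n=10)       # 9 cut points -> 10 groups
percentile_cuts = quantiles(strength, n=100)  # 99 cut points -> 100 groups

q1, q2, q3 = quartile_cuts

# The second quartile is the median (the 50th percentile)
print(q2 == median(strength))  # True
print(len(quartile_cuts), len(quintile_cuts), len(decile_cuts), len(percentile_cuts))
```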
+
Box Plot
 A visual tool that illustrates the distribution of a
univariate dataset.
 It shows the median, the upper and lower
quartiles, whiskers for the spread of the remaining data (in some
variants, the upper and lower deciles), and any outliers.
 Using R
 boxplot(dataset)
 quantile(dataset)
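For readers without R, the numbers behind a box plot can be sketched in plain Python. This is not what R's `boxplot` does internally; it is an assumption-laden illustration that uses `statistics.quantiles` with the "inclusive" method for the quartiles and the common 1.5 × IQR rule for flagging outliers. The data are the strength scores from the variance example.

```python
from statistics import quantiles

# Strength scores reused from the variance example (illustrative choice)
strength = [216, 144, 183, 138, 212, 180, 200, 264, 203]

# Quartiles (assumption: "inclusive" interpolation method)
q1, med, q3 = quantiles(strength, n=4, method="inclusive")
iqr = q3 - q1

# Common convention: points beyond 1.5 * IQR from the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in strength if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme values that are not outliers
inside = [x for x in strength if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, med, q3)                # box: lower quartile, median, upper quartile
print(whisker_low, whisker_high)  # whisker ends
print(outliers)                   # points drawn individually
```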
Box Plot
Practice
 One hundred randomly selected students were asked the number of movies they watched
the previous week. The results are as follows:
 Find the sample mean, median, and range of the sample.
 Find the standard deviation and the variance.
 Construct a barplot of the data.
 Find the first quartile.
 Find the second quartile. To which measure does it correspond?
 Find the third quartile.
 Construct a box plot of the data.
 What percent of the students saw fewer than three movies?
 Find the 40th percentile.
 Find the 90th percentile.
# of Movies Frequency
0 20
1 36
2 24
3 16
4 4
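One possible way to check the first few answers is to expand the frequency table into the 100 individual observations and reuse the standard-library functions. This is a sketch with illustrative names; it covers only the mean, median, range, variance, standard deviation, and the percentage question, and leaves the quartiles, percentiles, and plots as exercises.

```python
from statistics import mean, median, variance, stdev

# Frequency table from the slide: number of movies -> number of students
freq = {0: 20, 1: 36, 2: 24, 3: 16, 4: 4}

# Expand the table into one observation per student
movies = [value for value, count in freq.items() for _ in range(count)]
n = len(movies)

sample_mean = mean(movies)
sample_median = median(movies)
sample_range = max(movies) - min(movies)
sample_var = variance(movies)   # denominator n - 1
sample_sd = stdev(movies)

# Percentage of students who saw fewer than three movies
pct_fewer_than_three = 100 * sum(c for v, c in freq.items() if v < 3) / n

print(n, sample_mean, sample_median, sample_range)
print(round(sample_var, 3), round(sample_sd, 3))
print(pct_fewer_than_three)
```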
[Figure: Histogram for films frequencies. x-axis: # of movies (0 to 4); y-axis: Frequency (0 to 40); bar heights 20, 36, 24, 16, 4.]