Descriptive Statistics
Descriptive Statistics
• Tabular, graphical, or numerical summaries of data.
Age (numerical summary)
Mean                 42.57
Median               40
Mode                 40
Standard Deviation   10.63
Sample Variance      113.01
Range                44
Minimum              21
Maximum              65
Gender        Frequency
Female        12
Male          18
Grand Total   30
[Figure: "Bar Chart for Opinions". x-axis: Opinion (1 to 5); y-axis: Frequency (0 to 9).]
Summarizing Data for Categorical Variables
• Let us focus on tabular and graphical summaries first. We will deal with numerical summaries later.
• Tabular:
  • Frequency distribution
  • Relative frequency distribution
  • Percent frequency distribution
• Graphical:
  • Bar chart
  • Pie chart
Frequency Distribution
• A frequency distribution is a tabular summary of data showing the number (frequency) of observations in each of several non-overlapping categories or classes.
Opinion             Frequency
Strongly disagree   8
Disagree            4
Neutral             6
Agree               7
Strongly agree      5
Grand Total         30
Relative Frequency Distribution
$$\text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{\text{Total number of observations}}$$
$$\text{Percent frequency of a class} = \frac{\text{Frequency of the class}}{\text{Total number of observations}} \times 100\%$$
Opinion             Frequency   Relative frequency   Percent frequency
Strongly disagree   8           0.27                 27%
Disagree            4           0.13                 13%
Neutral             6           0.20                 20%
Agree               7           0.23                 23%
Strongly agree      5           0.17                 17%
Grand Total         30          1.00                 100%
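The relative and percent frequency columns follow directly from the frequency column. A minimal Python sketch, using only the opinion counts from the table (the variable names are illustrative):

```python
# Opinion counts copied from the frequency table.
frequency = {
    "Strongly disagree": 8,
    "Disagree": 4,
    "Neutral": 6,
    "Agree": 7,
    "Strongly agree": 5,
}

total = sum(frequency.values())  # total number of observations

# Relative frequency = class frequency / total number of observations.
relative = {opinion: count / total for opinion, count in frequency.items()}

# Percent frequency = relative frequency x 100.
percent = {opinion: 100 * share for opinion, share in relative.items()}

for opinion in frequency:
    print(f"{opinion:<18} {frequency[opinion]:>3} "
          f"{relative[opinion]:.2f} {percent[opinion]:.0f}%")
```

The relative frequencies always sum to 1 (and the percent frequencies to 100%), which is a quick check on a hand-built table.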
Bar Chart
[Figure: bar chart titled "Number of people in each age category". x-axis: Age category (Elderly, Middle-aged, Young); y-axis: Frequency (0 to 25).]
Pie Chart
[Figure: pie chart titled "Age distribution of people", with slices for Elderly, Middle-aged, and Young.]
Summarizing Data for Quantitative Variables
• Let us focus on tabular and graphical summaries first. We will deal with numerical summaries later.
• Tabular:
  • Frequency distribution
  • Relative frequency distribution
  • Percent frequency distribution
• Graphical:
  • Histogram
Frequency Distribution
• We need to bin (bucket) the quantitative variable of interest.
• Three steps:
  1. Determine the number of nonoverlapping classes.
  2. Determine the width of each class.
  3. Determine the class limits.
• Choosing the number of classes is tricky! It is done by trial and error.
• Five to twenty classes are preferred. (Not too few, not too many, just enough to informatively show the variation in the frequencies.)
Frequency Distribution
$$\text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$$
Frequency Distribution
Class Frequency
[31000, 35200] 1
(35200, 39400] 3
(39400, 43600] 2
(43600, 47800] 7
(47800, 52000] 3
(52000, 56200] 4
(56200, 60400] 3
(60400, 64600] 4
(64600, 68800] 1
(68800, 73000] 0
(73000, 77200] 0
(77200, 81400] 2
Relative/Percent Frequency
Class Frequency Rel. Freq. Perc. Freq. (%)
[31000, 35200] 1 0.033 3.33
(35200, 39400] 3 0.100 10.00
(39400, 43600] 2 0.067 6.67
(43600, 47800] 7 0.233 23.33
(47800, 52000] 3 0.100 10.00
(52000, 56200] 4 0.133 13.33
(56200, 60400] 3 0.100 10.00
(60400, 64600] 4 0.133 13.33
(64600, 68800] 1 0.033 3.33
(68800, 73000] 0 0.000 0.00
(73000, 77200] 0 0.000 0.00
(77200, 81400] 2 0.067 6.67
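The three binning steps and the class-width formula can be traced against this table. The sketch below assumes that the smallest and largest data values coincide with the outer class limits (31000 and 81400) and that 12 classes were chosen. Both are read off the table, not stated in the slides.

```python
# Assumed inputs, read off the class table: outer limits and class count.
smallest, largest, num_classes = 31000, 81400, 12

# Approximate class width = (largest - smallest) / number of classes.
width = (largest - smallest) / num_classes

# Class limits: num_classes + 1 equally spaced edges.
edges = [smallest + i * width for i in range(num_classes + 1)]

# Frequencies copied from the table, in class order.
frequencies = [1, 3, 2, 7, 3, 4, 3, 4, 1, 0, 0, 2]
total = sum(frequencies)

for i, count in enumerate(frequencies):
    left = "[" if i == 0 else "("  # only the first class includes its lower limit
    label = f"{left}{edges[i]:.0f}, {edges[i + 1]:.0f}]"
    print(f"{label:<16} {count:>2} {count / total:.3f} {100 * count / total:5.2f}")
```

With these inputs the width comes out to 4200, which matches the spacing of the class limits in the table.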
Histogram
Skewness
• To which side is the tail of the distribution longer or more drawn out?
  • Positive/right skew: the longer tail is on the right.
  • Negative/left skew: the longer tail is on the left.
• A symmetric distribution has zero skewness.
Skewness
Summarizing Data for Two Categorical Variables
• Tabular:
  • Crosstabulation
• Graphical:
  • Side-by-side bar chart
  • Stacked bar chart
Crosstabulation
Age category   Strongly agree   Agree   Neutral   Disagree   Strongly disagree   Grand Total
Elderly        0                0       0         0          3                   3
Middle-aged    4                6       5         2          4                   21
Young          1                1       1         2          1                   6
Grand Total    5                7       6         4          8                   30
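The margins of a crosstabulation are just row and column sums. The sketch below rebuilds the "Grand Total" row and column from the cell counts. It also computes each age category's share within an opinion, on the assumption that this is the kind of quantity a 100% stacked bar chart displays.

```python
opinions = ["Strongly agree", "Agree", "Neutral", "Disagree", "Strongly disagree"]

# Cell counts copied from the crosstabulation (one list per age category).
crosstab = {
    "Elderly":     [0, 0, 0, 0, 3],
    "Middle-aged": [4, 6, 5, 2, 4],
    "Young":       [1, 1, 1, 2, 1],
}

# Row totals: one per age category.
row_totals = {age: sum(counts) for age, counts in crosstab.items()}

# Column totals: one per opinion.
col_totals = [sum(column) for column in zip(*crosstab.values())]

grand_total = sum(row_totals.values())

# Share of each age category within each opinion, in percent.
col_percent = {
    age: [100 * count / col_total for count, col_total in zip(counts, col_totals)]
    for age, counts in crosstab.items()
}

print(row_totals, col_totals, grand_total)
print(dict(zip(opinions, col_percent["Middle-aged"])))
```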
Side-by-side Bar Chart
[Figure: side-by-side bar chart titled "Opinions vs. age categories". x-axis: Opinions (Strongly agree, Agree, Neutral, Disagree, Strongly disagree); y-axis: Frequency (0 to 7); one bar per age category (Elderly, Middle-aged, Young).]
Stacked Bar Chart
[Figure: stacked bar chart titled "Opinions vs. age categories". x-axis: Opinions (Strongly agree, Agree, Neutral, Disagree, Strongly disagree); y-axis: Percentage (0% to 100%); segments for Elderly, Middle-aged, and Young.]
Scatterplot: Visualizing the Relationship Between Two Quantitative Variables
[Figure: scatterplot titled "Salary vs. Age". x-axis: Age (years), 0 to 70; y-axis: Salary, $0 to $90,000.]
Creating Effective Graphical Displays
• Give the display a clear and concise title.
• Keep the display simple.
• Clearly label each axis and provide the units of measure.
• If colors are used, make sure they are distinct.
• If multiple colors or line types are used, provide a legend.
Statistical Inference (Recap)
[Diagram: a sample is drawn from the population; the sample statistic is used to infer the population parameter.]
• Population parameter, e.g., population average income $\mu$
• Sample statistic, e.g., sample average income $\bar{x}$
• A sample statistic is a point estimator of the corresponding population parameter.
Descriptive Statistics: Numerical Measures
• Measures of location:
  • Measures of central location: (A single number which indicates a typical value of the data.)
    • Sample mean
    • Sample median
    • Sample mode
    • Sample percentiles
    • Sample quartiles
• Measures of variability: (A single number which indicates the variability in the data.)
  • Sample range
  • Sample IQR
  • Sample variance
  • Sample standard deviation
• Measures of distribution shape: (A single number which lets us know the shape of the distribution of the data.)
  • Skewness
  • Kurtosis
Some Common Notation
• Let $x$ represent a variable of interest.
• Let $n$ be the number of observations in the sample. This is the sample size.
• Let $x_i$ be the $i$th observation.
• Let $N$ be the number of observations in the population. This is the size of the population.
Measures of Location
• Measures of central location: (A single number which indicates a typical value of the data.)
  • Sample mean
  • Sample median
  • Sample mode
  • Sample percentiles
  • Sample quartiles
Sample Mean
$$\text{Sample mean: } \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
$$\text{Population mean: } \mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Sample Median
• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• The median divides the dataset into two parts, each with approximately 50% of observations.
• Arrange the data in ascending order (smallest value to largest value).
  • For an odd number of observations, the median is the middle value.
  • For an even number of observations, the median is the average of the two middle values.
Sample Mode
• The mode of a data set is the value that occurs with greatest frequency.
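As an illustration of the three measures of central location, here is a short sketch using Python's standard `statistics` module. The `ages` list is hypothetical data made up for the example; it is not the dataset behind the slides.

```python
import statistics

# Hypothetical sample (not the data set used in the slides).
ages = [21, 34, 40, 40, 45, 52, 65]

mean = statistics.mean(ages)      # sum of the observations divided by n
median = statistics.median(ages)  # middle value of the sorted data (n is odd here)
mode = statistics.mode(ages)      # value that occurs most often

# With an even number of observations, the median is the
# average of the two middle values.
even_median = statistics.median([21, 34, 40, 45])

print(mean, median, mode, even_median)
```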
Sample Percentile
• The $p$th percentile is a value such that at least $p$ percent of the observations are less than or equal to this value and at least $(100 - p)$ percent of the observations are greater than or equal to this value.
Sample Percentile
• Arrange the data in ascending order.
• Location of the $p$th percentile:
$$L_p = \frac{p}{100}(n + 1)$$
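The location formula gives a 1-based position in the sorted data, which is often not a whole number. The sketch below assumes linear interpolation between the two neighboring observations in that case, and clamps positions that fall outside the data. Both choices are assumptions; the slides only give the location formula. The function name `percentile` and the sample data are illustrative.

```python
def percentile(data, p):
    """p-th percentile using the location L_p = (p / 100) * (n + 1)."""
    xs = sorted(data)                 # arrange the data in ascending order
    n = len(xs)
    location = p / 100 * (n + 1)      # 1-based position in the sorted data
    lower = int(location)             # whole part of the position
    fraction = location - lower       # fractional part of the position
    if lower < 1:                     # position before the first observation
        return xs[0]
    if lower >= n:                    # position at or beyond the last observation
        return xs[-1]
    # Assumed rule: interpolate linearly between the two neighbors.
    return xs[lower - 1] + fraction * (xs[lower] - xs[lower - 1])


values = [15, 20, 35, 40, 50]         # hypothetical data
q1 = percentile(values, 25)           # location 1.5 -> halfway between 15 and 20
q2 = percentile(values, 50)           # location 3   -> third value
q3 = percentile(values, 75)           # location 4.5 -> halfway between 40 and 50
print(q1, q2, q3)
```

Other percentile conventions exist, so statistical software may report slightly different values for small samples.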
Sample Quartiles
• The quartiles divide the dataset into four parts, each with approximately 25% of observations.
  • First quartile $Q_1$ = 25th percentile
  • Second quartile $Q_2$ = 50th percentile
  • Third quartile $Q_3$ = 75th percentile
Measures of Variability
• Measures of variability: (A single number which indicates the variability in the data.)
  • Sample range
  • Sample IQR
  • Sample variance
  • Sample standard deviation
Sample Range
$$\text{Sample range} = \text{Largest value} - \text{Smallest value}$$
Sample Interquartile Range (IQR)
$$IQR = Q_3 - Q_1$$
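Box Plot
[Figure: box plot. The box spans Q1 to Q3 with a line at the median; the whiskers extend to the minimum value greater than the lower inner fence and the maximum value less than the upper inner fence.]
• Inner fences: Q1 − 1.5 × IQR and Q3 + 1.5 × IQR
• Outer fences: Q1 − 3 × IQR and Q3 + 3 × IQR
• Minor outlier: a value between an inner fence and an outer fence
• Major outlier: a value beyond an outer fence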
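The fences can be computed directly from the quartiles. The sketch below uses hypothetical data and `statistics.quantiles`, whose default "exclusive" method places quartiles at positions based on n + 1. The minor/major rule follows the usual box-plot convention (between the fences versus beyond the outer fence), which is assumed to be what the figure depicts.

```python
import statistics

# Hypothetical data with two unusually large values.
data = [21, 25, 28, 30, 32, 35, 38, 40, 42, 70, 95]

q1, _, q3 = statistics.quantiles(data, n=4)  # default method: "exclusive"
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)     # inner fences
outer = (q1 - 3 * iqr, q3 + 3 * iqr)         # outer fences


def classify(x):
    """Label a value as a major outlier, a minor outlier, or neither."""
    if x < outer[0] or x > outer[1]:
        return "major"
    if x < inner[0] or x > inner[1]:
        return "minor"
    return "none"


minor = [x for x in data if classify(x) == "minor"]
major = [x for x in data if classify(x) == "major"]
print(q1, q3, iqr, inner, outer, minor, major)
```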
Sample Variance
$$\text{Sample variance: } s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
$$\text{Population variance: } \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Sample Standard Deviation
$$\text{Sample standard deviation: } s = \sqrt{s^2}$$
$$\text{Population standard deviation: } \sigma = \sqrt{\sigma^2}$$
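To make the difference between the n − 1 and N divisors concrete, the following sketch computes both variances and both standard deviations for a small hypothetical data set and compares them with the `statistics` module.

```python
import math
import statistics

# Hypothetical data; the mean is exactly 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_variance = squared_deviations / (n - 1)   # divisor n - 1
population_variance = squared_deviations / n     # divisor N (data treated as the whole population)

sample_sd = math.sqrt(sample_variance)
population_sd = math.sqrt(population_variance)

# The standard library gives the same results.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))

print(sample_variance, sample_sd, population_variance, population_sd)
```

The sample variance is always the larger of the two for the same data, because it divides the same sum by a smaller number.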
Chebyshev's Theorem
• At least $\left(1 - \frac{1}{z^2}\right)$ of the data values must be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.
Suppose that you are interested in analyzing the amount of time spent by users browsing through Swiggy before they come to a decision about what to order. You know that the average time spent browsing is 6.9 minutes. Suppose that the standard deviation is 1.2 minutes.
• What can you say about the percentage of users who spend between 4.5 minutes and 9.3 minutes browsing Swiggy?
• What can you say about the percentage of users who spend between 5.4 minutes and 9.3 minutes browsing Swiggy?
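One way to work these two questions is sketched below. The first interval is symmetric about the mean (z = 2). The second is not, so the sketch takes the largest symmetric interval that fits inside it (z = 1.25). That is a conservative reading and an assumption of this example, not something the slides prescribe. The function names are illustrative.

```python
def chebyshev_lower_bound(z):
    """Minimum fraction of values within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def bound_for_interval(low, high, mean, sd):
    """Lower bound for the fraction of values in [low, high].

    Uses the widest interval symmetric about the mean that fits
    inside [low, high] (a conservative choice when the interval
    is not symmetric).
    """
    z = min(mean - low, high - mean) / sd
    return chebyshev_lower_bound(z)


mean, sd = 6.9, 1.2  # minutes, from the problem statement

first = bound_for_interval(4.5, 9.3, mean, sd)   # z = 2
second = bound_for_interval(5.4, 9.3, mean, sd)  # z = 1.25 on the lower side

print(f"At least {first:.0%} of users spend between 4.5 and 9.3 minutes")
print(f"At least {second:.0%} of users spend between 5.4 and 9.3 minutes")
```

Under these assumptions the bounds are at least 75% for the first interval and at least 36% for the second.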
Measures of Association Between Two Variables
• Covariance
• Correlation
Covariance
• Covariance is a descriptive measure of the strength of linear association between two variables.
$$\text{Sample covariance: } s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
$$\text{Population covariance: } \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$
• Positive value → positive relationship
• Negative value → negative relationship
• Sensitive to the units of measurement of the variables!
Correlation
• The correlation coefficient is a dimensionless measure of the strength of linear association between two variables.
$$\text{Sample correlation coefficient: } r_{xy} = \frac{s_{xy}}{s_x s_y}$$
$$\text{Population correlation coefficient: } \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
• Bounded between −1 and 1 (inclusive).
• Values close to 0 indicate a weak linear relationship.
• Values close to 1 indicate a strong positive linear relationship.
• Values close to −1 indicate a strong negative linear relationship.
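The contrast between covariance (unit-sensitive) and correlation (dimensionless) can be seen by rescaling one variable. The sketch below implements the sample formulas by hand on hypothetical data; the function names are illustrative.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by both standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # cov(x, x) is the sample variance of x
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)

# Change the units of x (for example, thousands -> units).
x_rescaled = [1000 * value for value in x]
cov_rescaled = sample_covariance(x_rescaled, y)
r_rescaled = sample_correlation(x_rescaled, y)

print(cov_xy, cov_rescaled)  # covariance scales with the units
print(r_xy, r_rescaled)      # correlation is unchanged
```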
