APPLIED STATISTICS (FOR
HUMANITIES)
Muhammad Ghazi
Spring 2023
Lecture 3: Descriptive Statistics
WHAT ARE
DESCRIPTIVE
STATISTICS?
Descriptive statistics are methods for
summarizing data
They let us say something about the
data without showing the full dataset
In practice, they are the first thing we
compute when we get a dataset
They help us better understand what we’re
dealing with
Analysis when we
understand the data well
Analysis when we don’t
understand the data well
DESCRIPTIVE
STATISTICS WE
WILL STUDY
Counts and percentages
Central tendency
• Mean
• Median
• Mode
Measures of spread
• Variance
• Standard deviation
• Percentiles
DESCRIPTIVE STATISTICS (A ROUGH
FRAMEWORK)
Qualitative
variables
Counts
Percentages
Quantitative
variables
Central
tendency
Mean
Median
Mode
Spread
Variance
Std Dev
Percentiles
COUNTS AND PERCENTAGES
• Most basic way to describe qualitative / categorical variables
• Counts are the number of observations in each category
• Percentages express these as a fraction of total observations
\[ \text{Percentage of category } x = \frac{\text{No. of observations that are } x}{\text{Total no. of observations}} \times 100\% \]
COUNTS AND PERCENTAGES: EXAMPLE
• Consider the following dataset
• How can we describe gender? Political affiliation?
ID  Name    Age  Gender  Income     Political affiliation  Voted in the last election?
1   Alfred  67   Male    $ 28,500   Liberal                Yes
2   Peter   27   Male    $ 275,000  Conservative           Yes
3   George  18   Male    $ 31,000   Liberal                No
4   Jannet  38   Female  $ 39,000   Liberal                Yes
5   Meagan  19   Female  $ 52,000   Moderate               Yes
6   Ivan    35   Male    $ 27,000   Conservative           No
7   Jenny   78   Female  $ 38,000   Conservative           Yes
8   Sam     43   Male    $ 33,000   Conservative           Yes
9   Emily   39   Female  $ 37,000   Moderate               No
10  Hellen  57   Female  $ 43,000   Liberal                Yes
COUNTS AND PERCENTAGES: EXAMPLE
• There are 5 males and 5 females in the previous dataset
• The most common way to formally present counts and percentages is to
use tables:
Gender No. of people Percentage
Male 5 50%
Female 5 50%
TOTAL: 10 100%
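The counting and percentage steps can also be sketched in a few lines of Python. This is only an illustration, not part of the course's Excel workflow; the list `gender` is typed in by hand from the Gender column of the example dataset, and the variable names are made up for this sketch.

```python
from collections import Counter

# Gender column of the example dataset (IDs 1 to 10), typed in by hand.
gender = ["Male", "Male", "Male", "Female", "Female",
          "Male", "Female", "Male", "Female", "Female"]

counts = Counter(gender)          # number of observations in each category
total = sum(counts.values())      # total number of observations

# Percentage of category x = (observations that are x / total observations) * 100
percentages = {g: 100 * c / total for g, c in counts.items()}

for g in counts:
    print(f"{g:<8}{counts[g]:>3}{percentages[g]:>7.0f}%")
print(f"{'TOTAL:':<8}{total:>3}{100:>7.0f}%")
```

The same pattern works for any categorical column, for example political affiliation.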
CROSS TABULATIONS
• Often we will need to summarize information from two categorical
variables
• For example, how many males are politically liberal?
• This type of table is called a cross tabulation (cross tab)
• One variable will be in rows, while the other in columns
• Consider the cross tab of political affiliation and gender
              Male  Female
Liberal       2     2
Moderate      0     2
Conservative  3     1
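As a rough check, a cross tab of counts can be tallied directly from the ten rows of the example dataset. The sketch below is an assumption-laden illustration: the `people` list is copied by hand from the Gender and Political affiliation columns, and `crosstab` is just a name chosen here.

```python
from collections import Counter

# (gender, political affiliation) for IDs 1 to 10 of the example dataset.
people = [
    ("Male", "Liberal"), ("Male", "Conservative"), ("Male", "Liberal"),
    ("Female", "Liberal"), ("Female", "Moderate"), ("Male", "Conservative"),
    ("Female", "Conservative"), ("Male", "Conservative"),
    ("Female", "Moderate"), ("Female", "Liberal"),
]

crosstab = Counter(people)   # counts each (gender, affiliation) combination

# Affiliation in rows, gender in columns.
print(f"{'':<14}{'Male':>6}{'Female':>8}")
for affiliation in ("Liberal", "Moderate", "Conservative"):
    male = crosstab[("Male", affiliation)]
    female = crosstab[("Female", affiliation)]
    print(f"{affiliation:<14}{male:>6}{female:>8}")
```

A `Counter` returns 0 for combinations that never occur, so empty cells need no special handling.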
Cross tabulating
counts
Cross tabulating
percentages
CROSS TABULATIONS: PERCENTAGES
• Counts in cross tabulations are simple
• Percentages are sometimes not obvious
• What percentage we use depends on what our frame of reference is
• For example, are we asking what percentage of males are liberal leaning?
• In this case we will divide the no. of males who are liberal by total no. of
males
• Or what percentage of liberals are males?
• In this case we will divide the no. of males who are liberal by total no. of
liberal people
• In practice, both are correct and which one we use depends on the context
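The two frames of reference can be compared with a short calculation. This is a minimal sketch using counts read off the example dataset (2 liberal males, 5 males, 4 liberals); the variable names are illustrative only.

```python
# Counts taken from the example dataset (IDs 1 to 10).
male_liberals = 2     # Alfred, George
total_males = 5       # Alfred, Peter, George, Ivan, Sam
total_liberals = 4    # Alfred, George, Jannet, Hellen

# Frame of reference: males. "What percentage of males are liberal?"
pct_of_males_liberal = 100 * male_liberals / total_males

# Frame of reference: liberals. "What percentage of liberals are male?"
pct_of_liberals_male = 100 * male_liberals / total_liberals

print(f"{pct_of_males_liberal:.0f}% of males are liberal")
print(f"{pct_of_liberals_male:.0f}% of liberals are male")
```

The numerator is the same in both cases; only the denominator changes with the question being asked.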
COUNTS AND PERCENTAGES: IN EXCEL
Covered in the Excel tutorial
CENTRAL TENDENCY
• Often the most informative way to describe a numerical variable is to say
where its ‘center’ is
• The most common way of computing the center is to either use:
• Mean
• Median
• Mode (less common)
MEANS
• A mean is a simple average of all numbers
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
1. Add up all the numbers in the variable
2. Divide by the number of observations in the variable
MEAN: EXAMPLE
• Calculate the mean of 10,27,12,9,18,21,92
\[ \bar{x} = \frac{10 + 12 + 9 + 18 + 21 + 27 + 92}{7} = \frac{189}{7} = 27 \]
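The two steps for the mean translate directly into code. The sketch below is an illustration in Python rather than the Excel used in the course; `by_hand` is a name chosen here, and `statistics.mean` is the standard-library function that does the same job.

```python
from statistics import mean

values = [10, 27, 12, 9, 18, 21, 92]

# Step 1: add up all the numbers. Step 2: divide by the number of observations.
by_hand = sum(values) / len(values)

print(by_hand)
print(mean(values))   # the standard library gives the same answer
```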
MEDIAN
• The number in the middle
1. Check if there are an odd or even number of observations
2. Order the numbers from smallest to largest.
3. If the data set contains an odd number of numbers, the one exactly in
the middle is the median.
4. If the data set contains an even number of numbers, take the two
numbers that appear exactly in the middle and average them to find the
median.
MEDIAN: EXAMPLES
• Calculate the median of 10,27,12,9,18,21,92
1. There are 7 numbers (odd no.)
2. Order: 9,10,12,18,21,27,92
3. Middle number (median) is 18
• Calculate the median of 21,15,20,14
1. There are 4 numbers (even no.)
2. Order: 14,15,20,21
3. Middle numbers are 15 and 20
4. Take their average to get median: \( \frac{15+20}{2} = 17.5 \)
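The median steps can be sketched as a small function. `median_by_steps` is a hypothetical helper written for this illustration; the standard-library `statistics.median` is shown alongside it for comparison.

```python
from statistics import median

def median_by_steps(values):
    """Median following the slide's steps: order, then pick the middle."""
    ordered = sorted(values)          # order from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[middle]
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = median_by_steps([10, 27, 12, 9, 18, 21, 92])
even_example = median_by_steps([21, 15, 20, 14])

print(odd_example, median([10, 27, 12, 9, 18, 21, 92]))
print(even_example, median([21, 15, 20, 14]))
```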
MODE
• The value that occurs most often
• Calculate the modal value of: 3,3,3,3,3,4,5,6,3,2,1
• Normally used for categorical variables
Category Frequency
A 10
B 21
C 5
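Both uses of the mode can be illustrated briefly. In this sketch, `numbers` is the list from the slide and `frequency` is the category table above typed in as a dictionary; the variable names are assumptions made for the example.

```python
from statistics import mode

# Numerical example: the value that occurs most often.
numbers = [3, 3, 3, 3, 3, 4, 5, 6, 3, 2, 1]
modal_value = mode(numbers)

# Categorical example: the mode is the category with the highest frequency.
frequency = {"A": 10, "B": 21, "C": 5}
modal_category = max(frequency, key=frequency.get)

print(modal_value)
print(modal_category)
```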
PICKING BEST
MEASURE OF CENTER
• Calculating mean, median or mode is
easy
• Picking the right measure is the tricky
bit
Mean
Median
Mode
WHEN TO USE MEAN OR MEDIAN (OR
MODE)
Mean
The default method
+ Universal and intuitive
+ Mathematically sound
(we'll see later)
- Susceptible to outliers
Median
Report when data is very
skewed or has noticeable
outliers. How do we
know?
Incomes are usually
reported as median
Mode
Less common
Report for categorical variables
OR when one value dominates
Usually with other
measures
KEEP CONTEXT IN MIND
BE FLEXIBLE!
MEAN, MEDIAN OR MODE
• Let’s go back to our dataset
• What’s the best central tendency measure to report for:
• Income: Median
• Age: Mean or median
• Political affiliation: Mode
ID  Name    Age  Gender  Income     Political affiliation  Voted in the last election?
1   Alfred  67   Male    $ 28,500   Liberal                Yes
2   Peter   27   Male    $ 275,000  Conservative           Yes
3   George  18   Male    $ 31,000   Liberal                No
4   Jannet  38   Female  $ 39,000   Liberal                Yes
5   Meagan  19   Female  $ 52,000   Moderate               Yes
6   Ivan    35   Male    $ 27,000   Conservative           No
7   Jenny   78   Female  $ 38,000   Conservative           Yes
8   Sam     43   Male    $ 33,000   Conservative           Yes
9   Emily   39   Female  $ 37,000   Moderate               No
10  Hellen  57   Female  $ 43,000   Liberal                Yes
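To see why the median suits income here, it may help to compute both measures on the Income column. This is a sketch with the ten incomes typed in by hand; it is meant as an illustration of the outlier effect, not a general rule.

```python
from statistics import mean, median

# Income column of the example dataset (IDs 1 to 10), in dollars.
income = [28_500, 275_000, 31_000, 39_000, 52_000,
          27_000, 38_000, 33_000, 37_000, 43_000]

mean_income = mean(income)       # pulled upward by the single 275,000 value
median_income = median(income)   # average of the two middle incomes

print(f"Mean income:   ${mean_income:,.0f}")
print(f"Median income: ${median_income:,.0f}")
```

The mean comes out at $60,350, higher than nine of the ten incomes, while the median of $37,500 sits in the middle of the typical values.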
EXERCISE: BEST MEASURE OF CENTER
What is the best measure of central tendency for the following:
1. Length of Christopher Nolan movies in minutes
2. U.S. household income
3. Platform with most engagement
MEASURES OF SPREAD
• Consider two sets of numbers
Set A : { 1 4 6 7 12 }
Set B: { 5 6 6 6 7 }
• What is the mean of each set?
MEASURES OF SPREAD
Set A : { 1 4 6 7 12 }
Set B: { 5 6 6 6 7 }
• If the mean of both the sets is the same, what’s the difference between
the two?
• Set A is obviously more ‘spread out’ than Set B
• Is there some way we can quantify this?
MEASURES OF SPREAD
• We can try taking the difference of each number in the set from the mean of
the set
• And sum up these differences
• But positive and negative differences will cancel out
Set A   Mean   Deviation (difference from mean)
1       6      -5
4       6      -2
6       6      0
7       6      1
12      6      6
SUM:           0
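The cancelling-out can be checked quickly in code. A minimal sketch for Set A, with variable names chosen for this example:

```python
set_a = [1, 4, 6, 7, 12]

mean_a = sum(set_a) / len(set_a)             # 30 / 5
deviations = [x - mean_a for x in set_a]     # difference of each value from the mean
total_deviation = sum(deviations)            # negatives and positives cancel

print(deviations)
print(total_deviation)
```

Deviations from the mean sum to zero for any set of numbers, which is why the raw sum cannot be used to measure spread.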
MEASURES OF SPREAD
• Instead, we can take the ‘squared differences from mean’
• And sum them up
• Divide by number of observations (minus 1)*: 66/4 = 16.5
• This is called variance
Set A   Mean   Difference from mean   Squared difference from mean
1       6      -5                     25
4       6      -2                     4
6       6      0                      0
7       6      1                      1
12      6      6                      36
SUM:           0                      66
*in practice we always divide by no. of observations - 1 to account for the fact that this is a sample
(more on this later in the course)
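The Set A calculation can be reproduced row by row. The sketch below rebuilds the columns of the table and then divides by the number of observations minus 1; the names `rows`, `sum_of_squares` and `variance_a` are assumptions made for this illustration.

```python
set_a = [1, 4, 6, 7, 12]
n = len(set_a)
mean_a = sum(set_a) / n                      # 30 / 5

# One row per value: (value, difference from mean, squared difference).
rows = [(x, x - mean_a, (x - mean_a) ** 2) for x in set_a]
sum_of_squares = sum(squared for _, _, squared in rows)
variance_a = sum_of_squares / (n - 1)        # divide by observations minus 1

print(f"{'Value':>6}{'Diff':>7}{'Squared':>9}")
for value, diff, squared in rows:
    print(f"{value:>6}{diff:>7.0f}{squared:>9.0f}")
print(f"Sum of squared differences: {sum_of_squares:.0f}")
print(f"Variance: {variance_a}")
```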
MEASURES OF SPREAD
• Do the same for Set B
• Divide by number of observations (minus 1): 2/4 = 0.5
Set B   Mean   Difference from mean   Squared difference from mean
5       6      -1                     1
6       6      0                      0
6       6      0                      0
6       6      0                      0
7       6      1                      1
SUM:           0                      2
MEASURES OF SPREAD
Set A : { 1 4 6 7 12 }
Set B: { 5 6 6 6 7 }
• The variance for set A is 16.5 but for Set B is only 0.5
• This means Set A is more spread out than Set B
MEASURES OF
SPREAD
Variance is measured in squared units, so it is hard to interpret on its own
Instead, if we take its square root √, we get the
standard deviation
Standard deviation is the standard way of
quantifying spread
A high standard deviation means that
observations are spread away from the mean
A low standard deviation indicates
observations are close to the mean
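For a quick comparison of a high and a low standard deviation, Python's standard-library `statistics.stdev` can be applied to Set A and Set B. It uses the n - 1 divisor, matching the sample formula used in this lecture; the variable names are illustrative.

```python
from statistics import stdev

set_a = [1, 4, 6, 7, 12]
set_b = [5, 6, 6, 6, 7]

sd_a = stdev(set_a)   # square root of the sample variance (n - 1 divisor)
sd_b = stdev(set_b)

print(f"Set A: {sd_a:.2f}")
print(f"Set B: {sd_b:.2f}")
```

Set A comes out at roughly 4.06 and Set B at roughly 0.71, both in the same units as the original numbers.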
RECAP OF VARIANCE AND STANDARD
DEVIATION
1. Find the mean of the variable: \( \bar{x} \)
2. Subtract the mean from each value: \( x_i - \bar{x} \)
3. Square this difference: \( (x_i - \bar{x})^2 \)
4. Sum this squared difference for all values: \( \sum_{i=1}^{n} (x_i - \bar{x})^2 \)
5. Divide by the number of observations minus 1* to get the variance:
\[ \text{Variance} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
6. Take the square root of variance to get the standard deviation
\[ \text{Standard Deviation} = \sqrt{\text{Variance}} \]
*We divide by n-1 instead of n as a standard practice to correct for the fact that the data was
collected as a sample (more on this later)
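The six recap steps can be sketched as one function. `sample_variance_and_sd` is a hypothetical helper written for this illustration (it assumes at least two observations, since it divides by n - 1); it is not a library function.

```python
from math import sqrt

def sample_variance_and_sd(values):
    """Follow the six recap steps; needs at least two observations."""
    n = len(values)
    mean = sum(values) / n                       # step 1: find the mean
    differences = [x - mean for x in values]     # step 2: subtract the mean
    squared = [d ** 2 for d in differences]      # step 3: square each difference
    total = sum(squared)                         # step 4: sum the squares
    variance = total / (n - 1)                   # step 5: divide by n - 1, not n
    return variance, sqrt(variance)              # step 6: square root

variance_a, sd_a = sample_variance_and_sd([1, 4, 6, 7, 12])
variance_b, sd_b = sample_variance_and_sd([5, 6, 6, 6, 7])

print(variance_a, round(sd_a, 2))
print(variance_b, round(sd_b, 2))
```

Returning both values keeps the link between them visible: the standard deviation is always the square root of the variance.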
CAUTION:
ALWAYS
DIVIDE BY
N-1 FOR STD
DEV AND
VARIANCE!
EXERCISE:
• Calculate the standard deviation of the following
variable:
{10, 12, 23, 45, 120}
• On Paper
• On Excel
