Data Summary Metrics
Mean, Median, Mode and More
• Populations & Samples, Parameters & Statistics
• Summarizing the Data with a Metric
• Discrepancy and Error
• Estimating the Summary Metrics: Minimizing the Error
• Arithmetic Mean, Median, and Mode
• Geometric Mean, Harmonic Mean and Mid-Range
• Breakdown Points of the Arithmetic Mean and the Median
MandarGadre,July2013.
-- Mandar Gadre
(July 2013)
Population and Sample
[Figure: the whole population, with a “sample” dataset drawn from it]
Parameter: a certain property of the population as a whole,
e.g. the median age of all Indian citizens.
Statistic: an estimate of that property, computed from the sample dataset,
e.g. the estimated median age, calculated from the ages of, say, 1 million citizens.
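A minimal sketch of the distinction, assuming a synthetic population of ages generated with Python's `random` module. The population size, age range and sample size are illustrative assumptions, not real census figures:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 100,000 people, drawn uniformly from 0..90.
population = [random.randint(0, 90) for _ in range(100_000)]

# Parameter: a property of the whole population.
parameter = statistics.median(population)

# Statistic: the same property estimated from a random sample of 1,000 people.
sample = random.sample(population, 1_000)
statistic = statistics.median(sample)

print(f"population median (parameter): {parameter}")
print(f"sample median (statistic):     {statistic}")
```

The statistic will usually be close to, but not exactly equal to, the parameter; a different sample gives a slightly different estimate.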
Summarizing the Data
• If a certain property has the same single value for every individual in
the population, there is no need to summarize it!
• When the values of the property differ (as they do almost everywhere in
the real world), we look for a “Summary Statistic” such as the
mean.
• But how do we choose the summary statistic, S?
[Figure: the data-points $x_i$, $i = 1$ to $N$, summarized by a single value $S$]
Discrepancy
• Let $s$ be a candidate summary statistic. The discrepancy $e_i$ is the
“deviation” of an individual data-point $x_i$ from this candidate.
You could define your own way of calculating the discrepancy!
• Three common ways:
1. Comparison
Is the individual reading the same as the candidate?
$e_i = 1$ if $x_i \neq s$; $e_i = 0$ if $x_i = s$
2. Absolute difference from the candidate
$e_i = |x_i - s|$
3. Square of the difference from the candidate
$e_i = (x_i - s)^2$
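The three discrepancy definitions translate directly into code. A minimal sketch in Python; the function names are hypothetical, chosen only for illustration:

```python
def comparison_discrepancy(x: float, s: float) -> int:
    """1 if the reading x differs from the candidate s, else 0."""
    return 0 if x == s else 1


def absolute_discrepancy(x: float, s: float) -> float:
    """|x - s|: distance of the reading from the candidate."""
    return abs(x - s)


def squared_discrepancy(x: float, s: float) -> float:
    """(x - s)^2: squared distance of the reading from the candidate."""
    return (x - s) ** 2


# Discrepancies of the reading x = 7 from the candidate s = 4
print(comparison_discrepancy(7, 4))  # 1
print(absolute_discrepancy(7, 4))    # 3
print(squared_discrepancy(7, 4))     # 9
```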
Error
• The error $E$ is the aggregate of the individual discrepancies $e_i$ of all
the data-points $x_i$ from the candidate summary statistic $s$.
• For the three types of discrepancy, $E$ is:
1. Comparison with the candidate
$E = \sum_i e_i$, where
$e_i = 1$ if $x_i \neq s$; $e_i = 0$ if $x_i = s$
2. Absolute difference from the candidate
$E = \sum_i |x_i - s|$
3. Square of the difference from the candidate
$E = \sum_i (x_i - s)^2$
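A minimal sketch computing all three errors $E$ for a given candidate $s$ over a small, made-up dataset. The data and the function name `errors` are illustrative assumptions:

```python
def errors(data: list[float], s: float) -> tuple[int, float, float]:
    """Return the (comparison, absolute, squared) error E of candidate s over data."""
    e_cmp = sum(1 for x in data if x != s)   # number of readings that differ from s
    e_abs = sum(abs(x - s) for x in data)    # sum of |x_i - s|
    e_sq = sum((x - s) ** 2 for x in data)   # sum of (x_i - s)^2
    return e_cmp, e_abs, e_sq


data = [1, 2, 2, 3, 7]
for s in (2, 3):
    print(s, errors(data, s))
# 2 (3, 7, 27)
# 3 (4, 8, 22)
```

Note that the candidate $s = 2$ wins on the first two errors while $s = 3$ wins on the squared error: different definitions of discrepancy lead to different "best" summaries.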
Calculating S
• We define the Summary Statistic $S$ as the value of the candidate $s$ for
which $E$ is minimized.
• The three types of discrepancy give rise to three Summary Statistics, each
with its own special name:
Arising from Comparison:
$S$, such that the error
$E = \sum_i e_i$ (where $e_i = 1$ if $x_i \neq s$; $e_i = 0$ if $x_i = s$) is minimized.
Here $E$ equals $N$ minus the number of data-points equal to $s$, so it is
minimized by the value of $s$ that occurs most frequently. There may be one
or more such values.
We call this the Mode.
e.g. if we want to sell men’s t-shirts in a single size, we can get them made
in the size equal to the mode.
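A minimal sketch that finds the mode by brute force, i.e. by evaluating the comparison error $E$ for every distinct value and keeping the minimizer(s), then cross-checks against `statistics.multimode` from Python's standard library (3.8+). The helper names and the t-shirt sizes are illustrative assumptions:

```python
import statistics


def comparison_error(data, s):
    """E = number of data-points that differ from the candidate s."""
    return sum(1 for x in data if x != s)


def mode_by_min_error(data):
    """All values that minimize the comparison error (there may be several)."""
    # A value absent from the data gives E = N; values present give E <= N - 1,
    # so only values that occur in the data need to be checked.
    candidates = set(data)
    best = min(comparison_error(data, s) for s in candidates)
    return sorted(s for s in candidates if comparison_error(data, s) == best)


sizes = ["M", "L", "M", "XL", "S", "M", "L"]  # hypothetical t-shirt sizes sold
print(mode_by_min_error(sizes))     # ['M']
print(statistics.multimode(sizes))  # ['M']
```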
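Arising from Absolute Difference:
$S$, such that the error $E = \sum_i |x_i - s|$ is minimized.
$E$ is a sum of absolute-value functions, and the derivative of each term
$|x_i - s|$ with respect to $s$ is the signum function $\operatorname{sgn}(s - x_i)$!
So $\frac{dE}{ds} = (\text{number of } x_i < s) - (\text{number of } x_i > s)$.
To find the minimum, we look for where this derivative becomes zero (or
changes sign), which happens at the middle reading when all the data-points
are arranged in increasing order. (For even $N$, any value between the two
middle readings minimizes $E$; by convention we take their average.)
We call this the Median.
If we want to summarize income-per-household in a huge country like
India (a dataset with severe outliers which should not be given higher
weightage), we will use the median.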
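A minimal sketch that checks the claim numerically: it scans candidate values $s$ on an integer grid, evaluates $E = \sum_i |x_i - s|$, and compares the minimizer with `statistics.median`. The income figures are made up, and the grid search is only for illustration; in practice you would simply sort the data:

```python
import statistics


def absolute_error(data, s):
    """E = sum of |x_i - s| over all data-points."""
    return sum(abs(x - s) for x in data)


incomes = [12, 15, 18, 22, 950]  # hypothetical household incomes, one severe outlier

# Evaluate E on the candidates 0, 1, ..., 1000 and keep the one with the smallest E.
grid = range(0, 1001)
best_s = min(grid, key=lambda s: absolute_error(incomes, s))

print(best_s)                      # 18
print(statistics.median(incomes))  # 18
```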
Arising from Square of the Difference:
$S$, such that the error $E = \sum_i (x_i - s)^2$ is minimized.
Making the derivative zero gives us:
$\frac{dE}{ds} = -2\sum_i (x_i - s)$; setting it to zero at $s = S$ gives
$\sum_i (x_i - S) = 0$, or $N \cdot S = \sum_i x_i$, or $S = \frac{1}{N}\sum_i x_i$
(the second derivative, $2N$, is positive, so this is indeed a minimum).
We call this the Arithmetic Mean.
If we want to summarize the height of children in a kindergarten class,
we may use the mean: the data is likely to be roughly normally distributed
and there probably aren’t any extreme outliers (though in such cases the
median and the mode are no worse as summary statistics).
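A minimal sketch that checks the derivation numerically, assuming a small made-up set of heights in cm: it evaluates $E = \sum_i (x_i - s)^2$ at the arithmetic mean and at candidates one unit either side of it:

```python
import statistics


def squared_error(data, s):
    """E = sum of (x_i - s)^2 over all data-points."""
    return sum((x - s) ** 2 for x in data)


heights = [98.0, 102.0, 104.0, 100.0, 96.0]  # hypothetical heights (cm)
mean = sum(heights) / len(heights)           # S = (sum of x_i) / N

print(mean, statistics.mean(heights))        # 100.0 100.0
print(squared_error(heights, mean))          # 40.0

# Moving the candidate away from the mean in either direction increases E.
for s in (mean - 1, mean + 1):
    print(s, squared_error(heights, s))      # 45.0 in both cases
```

The increase is exactly $N \delta^2$ for a shift of $\delta$ (here $5 \times 1^2 = 5$), which follows from expanding $\sum_i (x_i - S - \delta)^2$ and using $\sum_i (x_i - S) = 0$.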
+++
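The mode is used for categorical/nominal data.
The mean is used when the effect of extreme outliers should be reflected
in the summary.
The median is used for datasets with extreme outliers which should not be
given any higher weightage.
Outliers sway the arithmetic mean much more than they sway the median:
with the squared distance, a far-off point pulls on $S$ in proportion to its
distance, whereas with the absolute difference every point pulls with the
same strength, however far away it is.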
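A minimal sketch of the effect, assuming made-up data: one reading is replaced by an extreme outlier and both statistics are recomputed with Python's `statistics` module:

```python
import statistics

clean = [10, 11, 12, 13, 14]
contaminated = [10, 11, 12, 13, 1000]  # last reading replaced by an outlier

print(statistics.mean(clean), statistics.median(clean))                # 12 12
print(statistics.mean(contaminated), statistics.median(contaminated))  # 209.2 12
```

A single bad reading moves the mean from 12 to 209.2, while the median does not move at all.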
Other Summary Statistics
• Geometric Mean:
Defined only for datasets whose values are all positive, it is the $N$th
root of the product of all $N$ data-points.
$G = \left(\prod_{i=1}^{N} x_i\right)^{1/N}$
It is used when summarizing/aggregating data that involve different
categories and scales, e.g. rating companies on various metrics taken
together.
It is also used where the data-points show compounding behavior, e.g.
summarizing the performance of a stock over the past $N$ years (applied to
the yearly growth factors, 1 + return).
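A minimal sketch for the compounding case, assuming three made-up yearly growth factors (1 + return). It computes $G$ directly from the definition and cross-checks it against `statistics.geometric_mean` (Python 3.8+):

```python
import math
import statistics

growth = [1.10, 0.80, 1.25]  # hypothetical yearly factors: +10%, -20%, +25%

# G = (product of x_i) ** (1/N)
g = math.prod(growth) ** (1 / len(growth))

print(g)                                 # about 1.0323, i.e. roughly +3.2% per year
print(statistics.geometric_mean(growth))  # same value, up to rounding

# Compounding G for N years reproduces the actual total growth.
print(g ** len(growth), math.prod(growth))  # both about 1.1
```

This is why the geometric mean suits compounding data: the arithmetic mean of the same factors, 1.05, would overstate the growth ($1.05^3 \approx 1.158$, versus the actual 1.1).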
Other Summary Statistics
• Harmonic Mean:
Defined only for datasets whose values are all non-zero, it is the
reciprocal of the arithmetic mean of the reciprocals of the $x_i$.
$H = \dfrac{1}{\frac{1}{N}\sum_i \frac{1}{x_i}} = \dfrac{N}{\sum_i 1/x_i}$
It is used when summarizing rates, e.g. the average speed of an
aircraft over numerous Mumbai-London trips (each trip covering the same
distance); or the average rate (in ml/min) at which a blood donor fills a
bag of the same volume over multiple visits.
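A minimal sketch of why the harmonic mean is the right average for the aircraft example, assuming two trips over the same made-up distance at two made-up speeds: total distance divided by total time equals the harmonic mean of the speeds, not their arithmetic mean. `statistics.harmonic_mean` is in Python's standard library:

```python
import math
import statistics

distance_km = 7200.0         # hypothetical one-way distance, same for every trip
speeds_kmh = [600.0, 900.0]  # hypothetical average speed on each trip

# True average speed = total distance / total time
total_time_h = sum(distance_km / v for v in speeds_kmh)  # 12 h + 8 h
avg_speed = distance_km * len(speeds_kmh) / total_time_h

print(avg_speed)                                # 720.0
print(statistics.harmonic_mean(speeds_kmh))     # about 720, the same answer
print(math.isclose(avg_speed, statistics.harmonic_mean(speeds_kmh)))  # True
print(statistics.mean(speeds_kmh))              # 750.0, overstates the speed
```

The arithmetic mean overstates the average because the aircraft spends more time flying at the slower speed.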
Other Summary Statistics
• Mid-Range
Defined as the arithmetic mean of the maximum and minimum
data-points:
$\text{Mid-Range} = \frac{1}{2}(x_{\max} + x_{\min})$
This is one of the least efficient statistics (it ignores all the
data-points except the minimum and the maximum) and one of the least robust
(it depends only on the extreme data-points, so it is swayed if they are
extreme outliers).
It is used in process control, e.g. where the process is tightly
controlled and the outliers have already been handled/trimmed out.
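A minimal sketch, assuming made-up measurements from a tightly controlled process; the function name `mid_range` is illustrative. It also shows how a single untrimmed outlier drags the mid-range while the median barely moves:

```python
import statistics


def mid_range(data):
    """Arithmetic mean of the largest and smallest data-points."""
    return (max(data) + min(data)) / 2


readings = [9.8, 10.0, 10.1, 10.2]  # hypothetical, outliers already trimmed
with_outlier = readings + [50.0]    # one untrimmed extreme reading

print(mid_range(readings))              # about 10.0
print(mid_range(with_outlier))          # about 29.9, dragged far off
print(statistics.median(with_outlier))  # 10.1, barely affected
```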
Robust Statistics
• Robust statistics are those summary statistics which are insensitive to
which sample we choose from the population, or to the presence of
contaminated/bad/incorrect data in that sample. The ‘Breakdown Point’
represents the degree of robustness of a statistic.
• The Breakdown Point is the largest proportion of contaminated data-points
(e.g. arbitrarily large data-points) a statistic can handle before yielding
an absurd result (e.g. an arbitrarily large statistic).
• Since the arithmetic mean depends on all the values and can be made
arbitrarily large by changing even one value among $N$, its Breakdown Point is 0.
• The median is the most robust of these statistics: its Breakdown Point is 50%,
the highest possible.
(If more than 50% of the data is contaminated, no statistic can recover the
true value anyway, since there is no way to distinguish between the actual
underlying distribution and the contaminated one.)
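A minimal sketch of the two breakdown points, assuming 11 made-up readings: it replaces an increasing number $k$ of readings with an arbitrarily large value (here $10^{12}$, standing in for "arbitrarily large") and reports the mean and median. The helper name `contaminate` is illustrative:

```python
import statistics

clean = [float(x) for x in range(1, 12)]  # 1.0 .. 11.0, so N = 11
HUGE = 1e12                               # stands in for an arbitrarily large value


def contaminate(data, k):
    """Replace the k smallest readings with HUGE (k contaminated data-points)."""
    ordered = sorted(data)
    return [HUGE] * k + ordered[k:]


for k in (0, 1, 5, 6):
    d = contaminate(clean, k)
    print(f"k={k:2d}  mean={statistics.mean(d):.3g}  median={statistics.median(d):.3g}")
```

The mean breaks down as soon as $k = 1$. With $N = 11$, the median withstands 5 contaminated readings (about 45%) but not 6 (about 55%), consistent with a breakdown point that approaches 50% as $N$ grows.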
