Summarising research data using
descriptive statistics
Open Educational Resource
Dr Leonard Ho
ACRC Systematic Reviewer, Usher Institute
Objectives
• To understand different types of variables
• To calculate the central tendency and dispersion of continuous data
• To present data with appropriate diagrams
Useful books
From John Wiley & Sons: https://media.wiley.com/product_data/coverImage300/19/08654287/0865428719.jpg From Amazon UK: https://m.media-amazon.com/images/I/41B+txZryWL._SX283_BO1,204,203,200_.jpg From Amazon UK: https://m.media-amazon.com/images/I/61L9H502OYL.jpg
Statistics
• “The science of collecting, summarising, presenting and interpreting data,
and of using them to estimate the magnitude of associations and test
hypotheses”
From John Wiley & Sons: https://media.wiley.com/product_data/coverImage300/19/08654287/0865428719.jpg
Two types of statistics
• Descriptive statistics
• Summarising and describing the behaviour of data in a dataset
• Mean, standard deviation…
• Inferential statistics
• Making predictions and testing hypotheses with the data in a dataset
• Regressions, chi-square tests…
Types of variables
• A variable is a quantity or characteristic that can be measured or observed
• A quantitative (numeric) variable contains data that describe a measurable quantity
• A qualitative (categorical) variable contains data that describe a characteristic
(Diagram: Variables → Quantitative, Qualitative)
Quantitative variable
• Continuous variable
• Contains data that lie on a continuum and can take any value within a range, including fractional values
• Height, weight...
• Very common in scientific research
• Discrete variable
• Contains data that do not lie on a continuum and can
only take whole numbers (integers)
• Number of strokes per day…
• “Not splittable”
(Diagram: Variables → Quantitative (Continuous, Discrete), Qualitative)
Qualitative variable
• Ordinal variable
• Contains data that fall into categories with an intrinsic ordering
• Educational level (Primary < Secondary < Tertiary)…
• Nominal variable
• Contains data that fall into categories with no intrinsic ordering
• Location (Aberdeen, Edinburgh, Glasgow)…
(Diagram: Variables → Quantitative (Continuous, Discrete), Qualitative (Ordinal, Nominal))
“Mooing pill”
• We are going to sell a drug that makes people moo like Highland cattle
• We conducted a randomised controlled trial on 1,500 people in Scotland
• Our dataset contains the following variables:
• Sex (Male / Female)
• Body mass index (kg/m²)
• Educational level (Primary / Secondary / Tertiary)
• Number of pills necessary to trigger mooing
From Visit Scotland: https://www.visitscotland.com/blog/wp-content/uploads/2019/10/HC-on-coastal-road.jpg
Our dataset
Label        Variable             Type of variable
Sex          Sex                  Nominal
BMI          Body mass index      Continuous
Edu_level    Educational level    Ordinal
No_pill      Number of pills      Discrete
How do we describe BMI?
Description of continuous data
• Describe the central tendency
• The typical (central) value of the data
• Describe the dispersion
• How spread out the data are
Central tendency (Median)
• The median is the middle value of a list of data sorted in order (ascending or descending)
• “Middle” means the middle number when the count is odd, or the average of the two middle numbers when it is even
• [1, 1, 1, 3, 4, 4, 7, 7, 7]: Median is 4
• [1, 1, 1, 3, 3, 4, 4, 7, 7, 7]: Median is 3.5
• Divides the list of data into upper and lower halves
• Not affected by extreme values
• [–11111, 1, 1, 3, 4, 4, 7, 7, 99999]: Median is still 4
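A minimal Python sketch of the median rule, run on the three example lists from this slide. `median_of` is a hypothetical helper name; the standard library's `statistics.median` is printed alongside it only as a cross-check.

```python
from statistics import median


def median_of(values):
    """Middle value of the sorted data, or the average of the two
    middle values when the number of values is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


examples = [
    [1, 1, 1, 3, 4, 4, 7, 7, 7],
    [1, 1, 1, 3, 3, 4, 4, 7, 7, 7],
    [-11111, 1, 1, 3, 4, 4, 7, 7, 99999],  # extreme values at both ends
]
for values in examples:
    # Hand-written result next to the standard-library result
    print(median_of(values), median(values))
# 4 4
# 3.5 3.5
# 4 4
```

Sorting comes first, which is why the extreme values at either end cannot move the middle value.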
Central tendency (Mean)
• The mean is the sum of all values in a list of data divided by the number of values
• [1, 1, 1, 3, 4, 4, 7, 7, 7]: Mean is 3.89
• Affected by extreme values
• [1, 1, 1, 3, 4, 4, 7, 7, 77777]: Mean is 8645
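The same arithmetic as a short Python sketch, reproducing both means on this slide. `mean_of` is an illustrative name; `statistics.mean` from the standard library gives the same values.

```python
def mean_of(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(round(mean_of([1, 1, 1, 3, 4, 4, 7, 7, 7]), 2))      # 3.89
print(round(mean_of([1, 1, 1, 3, 4, 4, 7, 7, 77777]), 2))  # 8645.0
```

A single extreme value (77777) enters the sum directly. That is why the mean jumps from about 4 to 8645 while the median of the same list stays at 4.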
Central tendency (Mode)
• The mode is the value that occurs most often in a list of data
• [1, 1, 3, 4, 4, 7, 7, 7]: Mode is 7
• There may be more than one modal value
• Most relevant for integers (e.g. discrete variables) and categories
• Continuous values rarely repeat exactly, so finding a mode usually means rounding the data, and rounding loses much information!
(Figure: modes calculated from the original data vs. from rounded data)
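A small Python sketch of the points on this slide, using `statistics.multimode` (Python 3.8+). It returns every value that shares the highest count. The BMI values are hypothetical and do not come from the trial.

```python
from statistics import multimode

print(multimode([1, 1, 3, 4, 4, 7, 7, 7]))  # [7]
print(multimode([1, 1, 3, 4, 4, 7]))        # [1, 4]: two modal values

# Hypothetical continuous values: none repeats, so every value ties as a "mode"
bmi = [22.31, 22.95, 23.08, 23.42, 24.6]
print(multimode(bmi))                       # [22.31, 22.95, 23.08, 23.42, 24.6]

# Rounding creates a clear mode, but 22.95, 23.08 and 23.42 become indistinguishable
print(multimode([round(x) for x in bmi]))   # [23]
```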
Presenting data with histogram
• Histogram
• Shows the distribution of data by plotting them as adjacent rectangles, “bins” (not bars), each corresponding to an interval of values along the x-axis
• With equal-width bins, the height of each bin is proportional to the frequency of observations in that interval
• No gaps between bins because the intervals lie on a continuum!
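How a histogram is built can be sketched in Python: count how many observations fall into each equal-width interval, then draw one bar per interval with no gaps between them. This is a text-only sketch. The helper `histogram_counts`, the bin settings (start 16, width 2 kg/m²) and the BMI values are all hypothetical.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in adjacent equal-width bins
    [start, start + width), [start + width, start + 2*width), ...
    Values outside all bins are ignored."""
    counts = [0] * n_bins
    for x in values:
        k = int((x - start) // width)  # index of the bin containing x
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Hypothetical BMI values (kg/m²), not the trial data
bmi = [18.2, 19.9, 21.0, 21.4, 22.6, 22.9, 23.1, 23.5, 24.8, 26.3]
start, width = 16, 2
counts = histogram_counts(bmi, start, width, n_bins=6)

# Text histogram: one row per bin, bar length = frequency
for k, c in enumerate(counts):
    lower = start + k * width
    print(f"{lower}-{lower + width}: {'#' * c}")
```

The bins are half-open, so a value that falls exactly on a boundary (e.g. 22.0) is counted only once, in the higher bin.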
Central tendencies of BMI
• Central tendencies of BMI (kg/m²) among our 1,500 participants
Median ≈ mean: the distribution is roughly symmetric (approximately normal)
Distribution of data
Mean: 22.95
Median: 23.08
From Biology For Life: https://www.biologyforlife.com/uploads/2/2/3/9/22392738/c101b0da6ea1a0dab31f80d9963b0368_orig.png
Central tendencies of BMI
Mean < Median < Mode
Slightly negatively (left) skewed
Description of continuous data
• Describe the central tendency
• The typical (central) value of the data
• Describe the dispersion
• How spread out the data are
Dispersion (Range)
• Range is the difference between the maximum and the minimum values in a list
of data
• [1, 1, 1, 3, 4, 4, 7, 7, 7]: Range is 6
• Affected by extreme values
• [1, 1, 1, 3, 4, 4, 7, 7, 77777]: Range is 77776
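The range is a one-line calculation. This Python sketch (`value_range` is an illustrative name) reproduces both values on this slide.

```python
def value_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)


print(value_range([1, 1, 1, 3, 4, 4, 7, 7, 7]))      # 6
print(value_range([1, 1, 1, 3, 4, 4, 7, 7, 77777]))  # 77776
```

Only the two most extreme observations enter the calculation. That is why one unusual value can change the range so much.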
Dispersion (Interquartile range)
• Interquartile range (IQR) summarises the spread of
the middle 50% of data in an ordered list
• Not affected by extreme values
• Difference between the upper (Q3) and the lower (Q1) quartiles in a list of ordered data
• Q3: the middle value between the median (Q2) and the maximum
• Q1: the middle value between the minimum and Q2
[1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8]: Min = 1, Q1 = 2, Q2 = 4, Q3 = 7, Max = 8; IQR = 7 – 2 = 5
[1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9]: Min = 1, Q1 = 2, Q2 = 4.5, Q3 = 7, Max = 9; IQR = 7 – 2 = 5
There are many ways to calculate quartiles!
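The two worked examples above agree with a "median of halves" rule. Under that rule, Q1 and Q3 are the medians of the lower and upper halves, and for an odd count the median itself is left out of both halves. Below is a Python sketch of that rule (`quartiles_median_of_halves` is a hypothetical name, not necessarily the method the slides used). It is followed by one of the other conventions in the standard library, `statistics.quantiles(..., method="inclusive")`, which interpolates between data points.

```python
from statistics import median, quantiles


def quartiles_median_of_halves(values):
    """Q1, Q2 (median) and Q3 as the medians of the lower half, the whole
    list and the upper half; with an odd count, the median is excluded
    from both halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


a = [1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 8]
b = [1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9]
for values in (a, b):
    q1, q2, q3 = quartiles_median_of_halves(values)
    print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
# Q1 = 2, Q2 = 4, Q3 = 7, IQR = 5
# Q1 = 2, Q2 = 4.5, Q3 = 7, IQR = 5

# Another convention (interpolating between data points) gives
# different quartiles for the same list:
print(quantiles(a, n=4, method="inclusive"))  # [2.5, 4.0, 6.5]
```

For list `a`, the interpolating method gives an IQR of 4.0 rather than 5. When reporting an IQR, it is worth stating which quartile method the software used.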
Interquartile range of BMI
• IQR of BMI (kg/m²) among our 1,500 participants
Presenting data with box plot (1)
• Box plot (Box and whisker plot)
• Shows at least five pieces of summary information about a list of data:
• Median = horizontal line inside the box
• Upper quartile (Q3) = top edge of the box
• Lower quartile (Q1) = bottom edge of the box
• Maximum = top of the upper whisker
• Minimum = bottom of the lower whisker
From the University of Newcastle: https://www.ncl.ac.uk/webtemplate/ask-assets/external/maths-resources/images/Box_and_whiskers_explanation_inkscape(2).png
Presenting data with box plot (2)
• Box plot (Box and whisker plot)
• Always check whether there are outliers
• Observations that are far away from the others
• A common definition of outliers (Tukey's fences):
• Lower outlier(s) < Q1 − (1.5 × IQR)
• Upper outlier(s) > Q3 + (1.5 × IQR)
• Remember to amend the minimum and maximum values shown on the plot
• The minimum becomes the smallest value at or above the lower cut-off
• The maximum becomes the largest value at or below the upper cut-off
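A Python sketch of the outlier rule and the amended whiskers. It assumes the median-of-halves quartiles from the worked IQR examples. Other quartile methods can move the cut-offs slightly. The helper names (`quartiles`, `box_plot_summary`) are hypothetical, and the data are the example list from the range and mean slides.

```python
from statistics import median


def quartiles(values):
    """Median-of-halves quartiles (median excluded from both halves for an odd count)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)


def box_plot_summary(values, k=1.5):
    """Box plot summary with Tukey's fences: values below Q1 - k*IQR or
    above Q3 + k*IQR are outliers, and the whiskers stop at the most
    extreme values that are not outliers."""
    data = sorted(values)
    q1, q2, q3 = quartiles(data)
    iqr = q3 - q1
    lower_cut, upper_cut = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower_cut <= x <= upper_cut]
    outliers = [x for x in data if x < lower_cut or x > upper_cut]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


summary = box_plot_summary([1, 1, 1, 3, 4, 4, 7, 7, 77777])
print(summary["whisker_low"], summary["whisker_high"], summary["outliers"])  # 1 7 [77777]
```

Here Q1 = 1.0, Q3 = 7.0 and IQR = 6.0, so the cut-offs are −8.0 and 16.0. 77777 is therefore plotted as an outlier, and the upper whisker stops at 7.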
Dispersion (Standard deviation) (1)
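• Standard deviation (SD) describes the spread of data around the mean: roughly, the typical distance between each observation and the mean
• Formally, it is the square root of the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by n − 1 rather than n)
• The larger the SD, the more spread out the data
• We use all data to calculate SD
• (Think about how we calculate range and IQR)
• Affected by extreme values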
From Cuemath: https://d138zd1ktt9iqe.cloudfront.net/media/seo_landing_files/standard-deviation-formula-1626765976.png
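A Python sketch of the SD calculation, assuming the sample formula (dividing by n − 1). `sample_sd` is an illustrative name. The standard library's `statistics.stdev` uses the same n − 1 formula, and `statistics.pstdev` divides by n instead.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [1, 1, 1, 3, 4, 4, 7, 7, 7]
print(round(sample_sd(data), 2))           # 2.62
print(round(statistics.stdev(data), 2))    # 2.62 (same n - 1 formula)
print(round(statistics.pstdev(data), 2))   # 2.47 (divides by n instead)

# One extreme value inflates the SD enormously
print(round(sample_sd([1, 1, 1, 3, 4, 4, 7, 7, 77777]), 2))
```

Every observation contributes a squared deviation, so a single extreme value dominates the sum.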
Dispersion (Standard deviation) (2)
Larger SD:
• Wider, flatter distribution
Smaller SD:
• Narrower, taller distribution
Dispersion (Standard deviation) (3)
• The 68–95–99.7 rule
• About 68% of data fall within the mean ± 1 SD
• About 95% of data fall within the mean ± 2 SD
• About 99.7% of data fall within the mean ± 3 SD
• (Applicable only to normally distributed data)
BMI: Mean = 22.95, SD = 3.47 (kg/m²)
Mean – 2 SD = 16.01; Mean – 1 SD = 19.48; Mean + 1 SD = 26.42; Mean + 2 SD = 29.89
Our data may not completely fulfil this rule because their distribution is slightly skewed
From Biology For Life: https://www.biologyforlife.com/uploads/2/2/3/9/22392738/sd2_orig.png
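The intervals above are plain arithmetic on the reported mean (22.95) and SD (3.47). The rule's percentages come from the normal distribution. This Python sketch computes both. `statistics.NormalDist` is in the standard library (Python 3.8+), and the helper names are illustrative. The ± 3 SD interval is computed here and is not reported on the slide.

```python
from statistics import NormalDist

BMI_MEAN, BMI_SD = 22.95, 3.47  # summary values reported for the trial (kg/m²)


def sd_interval(mean, sd, k):
    """Lower and upper bounds of mean ± k*SD."""
    return mean - k * sd, mean + k * sd


def normal_share_within(k):
    """Proportion of a normal distribution lying within ± k SDs of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    low, high = sd_interval(BMI_MEAN, BMI_SD, k)
    print(f"mean ± {k} SD: {low:.2f} to {high:.2f} kg/m² "
          f"(normal distribution: {normal_share_within(k):.1%})")
# mean ± 1 SD: 19.48 to 26.42 kg/m² (normal distribution: 68.3%)
# mean ± 2 SD: 16.01 to 29.89 kg/m² (normal distribution: 95.4%)
# mean ± 3 SD: 12.54 to 33.36 kg/m² (normal distribution: 99.7%)
```

The "95%" in the rule is a rounded figure. Exactly 95% of a normal distribution lies within about ± 1.96 SD of the mean.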
Dispersion of BMI
• Dispersion of BMI (kg/m²) among our 1,500 participants
Measure   Dispersion (kg/m²)   Interpretation
Range     23.55                The difference between the highest and the lowest BMI
IQR       4.67                 The spread of the middle 50% of BMI values around the median
SD        3.47                 If BMI were normally distributed, about 95% of values would fall between 16.01 and 29.89 (mean ± 2 SD)
Our dataset
Label        Variable             Type of variable
Sex          Sex                  Nominal
BMI          Body mass index      Continuous
Edu_level    Educational level    Ordinal
No_pill      Number of pills      Discrete
How do we describe sex, educational level,
and number of pills?
Description of non-continuous data
• Frequency (count) and percentage are useful for describing non-continuous variables
• The mode can also be used to show the most common category
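A Python sketch of a frequency table with counts, percentages and the mode, using `collections.Counter`. The educational-level responses are hypothetical, and `frequency_table` is an illustrative helper. The categories are listed in their natural order because educational level is ordinal.

```python
from collections import Counter


def frequency_table(values, categories):
    """(category, count, percentage) for each category, in the given order."""
    counts = Counter(values)
    total = len(values)
    return [(c, counts[c], 100 * counts[c] / total) for c in categories]


# Hypothetical responses, not the trial data
edu_level = ["Secondary", "Tertiary", "Secondary", "Primary", "Tertiary", "Secondary"]
levels = ["Primary", "Secondary", "Tertiary"]  # ordinal: keep the natural order

for category, n, pct in frequency_table(edu_level, levels):
    print(f"{category:<10}{n:>3}{pct:7.1f}%")

most_common_category, _ = Counter(edu_level).most_common(1)[0]
print("Mode:", most_common_category)  # Mode: Secondary
```

For a nominal variable such as sex, the category order is arbitrary. Sorting by frequency is then a common alternative.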
Presenting data with bar chart
• Bar chart
• Shows the distribution of observations in different
categories of a variable where every observation belongs
to one category
• Each category is given its own bar, and the length of the
bar is proportional to the frequency of observations
within that category
• (How is it different from a histogram?)
• For stacked bar charts:
• Always show the % contribution of each sub-bar
• Avoid showing more than 3 sub-bars for each population
From National Records of Scotland:
https://www.nrscotland.gov.uk/statistics-and-data/statistics/statistics-by-theme/population/population-estimates/mid-year-population-estimates/mid-2021
Presenting data with pie chart
• Pie chart
• Shows the distribution of observations in different
categories of a variable where every observation belongs
to one category
• The area of each slice is proportional to the frequency of observations within that category
• Only useful when there are ≥ 3 categories
• Becomes hard to read if there are > 10 categories
• Please, never use 3D pie charts
• (They are not beautiful at all and can be misleading)
Categorising continuous data (1)
• We may categorise our continuous variable
according to pre-specified rules
• For better communication
• For decision-making
BMI (kg/m²), four categories:
• Underweight: < 18.5
• Normal: 18.5 to 24.9
• Overweight: 25.0 to 29.9
• Obese: ≥ 30.0
BMI (kg/m²), two categories:
• Not obese: < 30
• Obese: ≥ 30
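Both categorisation schemes can be written as simple cut-off rules. This Python sketch uses half-open intervals (for example "Normal" is 18.5 to below 25.0). That is an assumption, chosen so that values such as 24.95, which fall between the printed limits 24.9 and 25.0, still get a category. The function names are hypothetical.

```python
def bmi_category(bmi):
    """Four-category BMI scheme (kg/m²), written with half-open intervals."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"


def bmi_obese(bmi):
    """Two-category scheme: obese if BMI >= 30 kg/m²."""
    return "Obese" if bmi >= 30 else "Not obese"


for value in (17.9, 24.95, 25.0, 29.9, 30.0):
    print(value, bmi_category(value), bmi_obese(value))
# 17.9 Underweight Not obese
# 24.95 Normal Not obese
# 25.0 Overweight Not obese
# 29.9 Overweight Not obese
# 30.0 Obese Obese
```

The output shows the loss of information. 25.0 and 29.9 end up in the same category, while 29.9 and 30.0, which are almost identical, are split.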
Categorising continuous data (2)
• Loss of information
• Cut-off values may be arbitrary
• If we must categorise, make sure that we:
• also provide the central tendency and dispersion of the
continuous variable
• clearly state the cut-off values and their justifications
Summary (1)
• Continuous variable
• Contains data that lie on a continuum
• Can take any value within a range
• Discrete variable
• Contains data that do not lie on a continuum
• Can only take integers
• Ordinal variable
• Contains data that fall into categories
• There is an intrinsic ordering of the categories
• Nominal variable
• Contains data that fall into categories
• There is no intrinsic ordering of the categories
(Diagram: Variables → Quantitative (Continuous, Discrete), Qualitative (Ordinal, Nominal))
Summary (2)
• Continuous variable
• Central tendency summarised by median and mean
• Dispersion summarised by IQR (and range) and SD
• Visually presented by histogram and box plot
• Non-continuous variable
• Observations summarised by frequency, percentage and mode
• Visually presented by bar chart and pie chart
Useful books
From John Wiley & Sons: https://media.wiley.com/product_data/coverImage300/19/08654287/0865428719.jpg From Amazon UK: https://m.media-amazon.com/images/I/41B+txZryWL._SX283_BO1,204,203,200_.jpg From Amazon UK: https://m.media-amazon.com/images/I/61L9H502OYL.jpg
Summarising research data using
descriptive statistics
Open Educational Resource
Dr Leonard Ho
ACRC Systematic Reviewer, Usher Institute

OER Descriptive Statistics (University of Edinburgh)
