## What's hot

- Introduction To Statistics (albertlaporte)
- Descriptive Statistics (guest290abe)
- Introduction to SAS (izahn)
- Introduction to basics of bio statistics. (AB Rajar)
- determination of sample size (Jijo Varghese)
- Introduction to Statistics (Robert Tinaro)
- Data analysis (LizzyL1)
- Survival Analysis Using SPSS (Nermin Osman)
- Stat3 central tendency & dispersion (Forensic Pathology)
- Deciding on a medical research topic: your first challenge (Azmi Mohd Tamil)
- Review & Hypothesis Testing (Sr Edith Bogue)
- Simple understanding of biostatistics (Hamdi Alhakimi)

- What is statistics
- INTRODUCTION TO SAS
- Sampling Technique.pptx
- On p-values
- Analysis Of Medical Data
- Data
- Presentation of data
- 1.2 types of data

## Similar to Statistics for Non-Statisticians

- Quantitative Methods for Lawyers - Class #6 - Basic Statistics + Probability ... (Daniel Katz)
- Making Statistics Work For Us: Item Bias, Decision Making, and Data-Driven Si... (Quinn Lathrop)
- Morestatistics22 091208004743-phpapp01 (mandrewmartin)
- Statistical thinking (mij1120)
- Chapter8 Introduction to Estimation Hypothesis Testing.pdf (mekkimekki5)
- 6 estimation hypothesis testing t test (Penny Jiang)
- Statistice Chapter 02[1] (plisasm)
- Normal and standard normal distribution (Avjinder (Avi) Kaler)
- 03-Data-Analysis-Final.pdf (SugumarSarDurai)
- Review Z Test Ci 1 (shoffma5)

- Sampling
- 05inference_2011.ppt
- 02a one sample_t-test
- Chapter 5
- Sampling Distribution
- Applied statistics part 1
- Binomial Probability Distributions
- More Statistics
- Ds vs Is discuss 3.1

### Editor's Notes

1. Before we can talk about statistics, we need to talk about probability. Specifically, we need to talk about the two kinds of probability distributions:
2. Rolling dice always gives an integer value. A coin toss is either heads or tails. A chart of all the possible outcomes illustrates a discrete distribution.
3. Every time you toss a coin, it comes up either heads or tails – unless a seagull snatches it out of the air, or it sticks sideways in the ground. If it’s a fair coin, we expect to get heads 50% of the time and tails 50% of the time. Whenever a single trial has exactly two possible outcomes, we have a Bernoulli distribution.
4. If we toss a coin 10 times and count the number of heads, and then repeat that experiment over and over again, this is how the results should look: a binomial distribution. We have a very small (but not zero!) chance of 10 heads or 10 tails; five of each is the most likely outcome, and we should see it about 25% of the time.
5. We see the same thing with dice. If we roll one die, there are six possible outcomes, and each outcome has a probability of 1/6. But if we roll two dice and add them up, there are 11 possible totals (2 through 12); 7 is the most likely total, while 2 and 12 are the least likely.
6. If we are measuring blood sugar, or temperature, or many other things, the result is a continuous distribution. The number of possible outcomes is infinite (even though the range of values may be bounded).
7. This graph shows the number of days in 2017 with various amounts of rainfall at RDU airport. Days with no rain are not included here. We see that most rainy days have less than a quarter-inch of rainfall, and as the amount of rain increases, the number of days decreases.
8. This curve has some interesting properties. For one thing, its length is infinite, but the area under the curve is exactly 1, as it is for every probability density curve.
9. This graph represents the heights, in inches, of 205 men measured in England in the 1880s. The data were collected by Francis Galton, a cousin of Charles Darwin, who studied the heights of 205 men, their wives, and their adult children. Galton, F. (1886). Regression Towards Mediocrity in Hereditary Stature. Journal of the Anthropological Institute, 15, 246-263.
10. Here, I’ve colored sections of the curve to show the inflection points. The red part of the curve is concave-down; the blue parts are concave-up. Each boundary between red and blue is an inflection point, where the curve changes concavity (from bending downward to bending upward).
11. I’ve added a line at the center of the graph, labeled “mu” because statisticians love Greek letters. Mu refers to the population mean, or average, and it coincides with the peak of the normal curve. I’ve also added vertical lines at the inflection points. The distance from mu to these lines is called the standard deviation, or sigma (because statisticians love Greek letters). In any normal distribution, about 68% of the population will fall within 1 standard deviation of the mean.
12. I’ve added lines here for 2 and 3 standard deviations from the mean. 95% of the population falls within 2 (actually, 1.96) standard deviations, and 99.7% within 3 standard deviations.
13. How can the sample mean have a distribution? Isn’t it just one number? If we take REPEATED samples, we will get a different mean from each different sample. So the mean from our random sample is one observation chosen at random from this distribution.
14. Enough about probability. Let’s talk about statistics. First, we’ll talk about descriptive statistics. By “descriptive statistics,” we mean numbers that tell us something about a population or a sample.
15. I really like data visualizations; they can provide a lot of information in a compact, easily digested form.
16. For example, here’s some information about North Carolina from the Census Bureau. (point out features of this infographic)
17. Does anyone know who this is? Are there any nurses here?
18. Yes, that’s Florence Nightingale. Most people know her as the founder of modern nursing, but she was also a pioneer in the use of statistics.
19. This is Nightingale’s famous “Diagram of the causes of mortality in the army in the East,” which she made during the Crimean War. The pink areas represent the number of soldiers who died from battle wounds, the blue areas represent the number of soldiers who died from poor sanitation or infections, and the black areas represent deaths from all other causes. After she insisted that nurses and doctors wash their hands, and after the sanitation commission flushed out the sewers and improved ventilation in the hospitals, the death rate dropped dramatically. This chart was instrumental in bringing about those reforms.
20. This is a map drawn by Charles Minard in 1869. Noted information designer Edward Tufte says it “may well be the best statistical graphic ever drawn.” (point out features of the map)
21. Not everything can, or should, be made into a chart. Sometimes you just need to see the numbers. There are certain numbers that are essential to understanding the distribution of a data set.
22. Measures of location, or central tendency: mean (arithmetic average), median (half the data are below, half above), and mode (most frequently occurring value). Skewed data and outliers pull the mean away from the median. Consider an extreme example, a village with 100 workers and one factory owner. The workers are each paid \$10/year, and the factory owner makes \$1,000,000 per year. The mean wage is \$9,911 per year, but the median and the mode are both \$10.
23. Measures of spread: variance and standard deviation. In the variance formula, we measure the distance from each data point to the mean. Some distances will be negative and some positive, so we square them to keep them from cancelling out. Then we add together all of those squared differences and divide by the number of data points. This average squared distance is called the variance. (For the sample variance, we divide by n - 1 instead of n.) If we take the square root of the variance, we get the standard deviation.
24. Fences: 1-1/2 times the interquartile range (IQR) below the first quartile and above the third quartile, that is, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. Points beyond the fences are marked as outliers.
25. Population examples: everyone in North Carolina; adults over 50 with high blood pressure; all of the Medicaid claims filed by a specific provider between 1 July 2017 and 31 December 2017. Parameters are numbers that describe the population: the median age of people in North Carolina; the average systolic BP of adults over 50 with high blood pressure; the amount Medicaid overpaid the provider. Sample: we can’t measure the entire population, so we draw a random sample. Statistics are numbers that describe the sample (rather than the population). We use statistics measured on a random sample to infer the parameters of the population.
26. There are a lot of different sampling methods, but it is important that they be random in order to avoid biasing the results.
27. Do we believe the results of this survey? Why? Website surveys like this are not representative of the population, because the respondents are not chosen at random.
28. In a simple random sample, every member of the population has the same probability of being selected. In a stratified random sample, the population is divided into subgroups (strata), and every member of a subgroup (stratum) has the same probability of being selected as every other member of that subgroup.
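29. These are formulas for the standard error (the standard deviation of a sample statistic across repeated samples) for two different types of data. What they have in common is “n”, the number of observations in the sample. The bigger this number gets, the smaller the standard error, so the more precise our estimate becomes; the spread of the individual data points does not shrink.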
30. George Edward Pelham Box was a British statistician who has been called “one of the great statistical minds of the 20th century.” He is known for the aphorism “All models are wrong, but some are useful.” All models are wrong: there are no perfect spheres in nature. Some are useful: our GPS systems treat the earth’s surface as a smooth, slightly flattened sphere, and the results are good enough to locate objects accurately.
31. As we said before, we measure a variable across our sample, calculate a statistic, and use that statistic to estimate the parameter for the population. The result is never exact, but the good news is that we can describe just how inexact it is!
32. We can describe the uncertainty in our estimates using confidence intervals, and we can use our results to test a hypothesis and report the outcome as a p-value. (Definitions are on the slide; explain the graphs briefly.)
33. Type I error: a false positive. Alpha (because we love Greek letters!) is the probability of making a Type I error. Type II error: a false negative. Beta (because we love Greek letters!) is the probability of making a Type II error. Notice that the null hypothesis is never proven; we either reject it or we fail to reject it, just as in a courtroom trial, where the defendant is never found innocent, only “not guilty.” In the courtroom, the null hypothesis is that the defendant is innocent, and the prosecutor has to prove guilt in order to reject it.
34. Ha Ha!
35. Imagine a clinical experiment where we show, with a statistically significant result, that a new drug lowers blood pressure. But it only lowers it by an average of 1 point, say from 140 to 139. The result is statistically significant, but nobody cares, because it is not clinically important.
36. Clinical trials are set up to look for a “clinically important difference.” Sample size is chosen so that if that difference exists, there will be a specific probability of detecting it. This is called the “power” of the trial. Power is 1 minus beta. Confidence is 1 minus alpha.
37. P-value: the probability, if the null hypothesis is true, of observing results at least as extreme as the ones we have observed. It is common to use 0.05 as the cutoff for statistical significance, but this is arbitrary. Also, if there is any real difference at all, however small, a large enough sample will almost always give p < 0.05 (or any other arbitrary cutoff).
38. Here again is the Galton data on the heights of adult males in England in the 1880s. It follows a normal distribution (more or less), and we note that there are a couple of men in his sample who are unusually short, and one who is unusually tall. The tall guy here is 6 feet 6 inches (78 inches), by the way. The mean height is about 69 inches and the standard deviation is about 2.5 inches, so our tall guy is (78 - 69) / 2.5 = 3.6 standard deviations above the average. Based on the normal distribution, we can calculate that he would be taller than 99.98% of the population.
39. Just for fun, I’ve added one data point to the graph: Shaq! At 85” tall, Shaq is 6.4 standard deviations above the average. He’s taller than 99.999999% of the population!
40. Here we have data points plotted across two correlated variables. The circled point is not an extreme outlier in either dimension, but it’s far away from the mass of points in the ellipse. We can generalize this to any number of dimensions; it’s hard to visualize, but we can express it mathematically. The distance between the chosen point and the center of the data, measured in a way that accounts for the spread of each variable and the correlation between them, is called the Mahalanobis distance.
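Notes 3 and 4 describe a Bernoulli trial (one toss) and what happens when we count heads over ten tosses. That count follows a binomial distribution. Here is a minimal Python sketch, our own illustration rather than part of the slides, that computes those probabilities exactly. It assumes a fair coin and independent tosses.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k heads in n independent tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_tosses = 10
dist = {k: binomial_pmf(k, n_tosses) for k in range(n_tosses + 1)}

for k, prob in dist.items():
    print(f"{k:2d} heads: {prob:.4f}")

# Five heads is the most likely count: 252/1024, about 24.6%.
# Ten heads (or ten tails) is rare but possible: 1/1024, about 0.1%.
```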
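We can check the two-dice example in note 5 by listing all 36 equally likely (die 1, die 2) pairs and grouping them by their total. This sketch assumes fair, independent dice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count every equally likely (die1, die2) pair by the total it produces.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
probs = {total: Fraction(c, 36) for total, c in sorted(counts.items())}

for total, prob in probs.items():
    print(f"{total:2d}: {prob}")
```

The output lists 11 totals. The total 7 is the most likely at 1/6, and 2 and 12 are the least likely at 1/36 each.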
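Notes 11 and 12 quote the 68%, 95%, and 99.7% figures. For any normal distribution, the share of the population within k standard deviations of the mean is erf(k/√2). That identity lets a short standard-library Python sketch reproduce the figures:

```python
from math import erf, sqrt

def within_k_sd(k: float) -> float:
    """Share of a normal population within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.2%}")
```

This prints roughly 68.27%, 95.00%, 95.45%, and 99.73%. That is why 1.96, rather than exactly 2, is the multiplier for 95%.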
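Notes 13 and 29 say that the sample mean has its own distribution, and that its spread (the standard error) shrinks as n grows. A small simulation can make both points concrete. The sketch below draws many repeated samples from a hypothetical normal population (mean 100, SD 15) and compares the spread of the sample means with σ/√n. The same 1/√n factor appears in the standard error of a proportion, √(p(1 - p)/n).

```python
import random
import statistics

def sample_means(mu, sigma, n, repeats, rng):
    """Draw `repeats` independent samples of size n from Normal(mu, sigma); return their means."""
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repeats)
    ]

rng = random.Random(2017)  # fixed seed so the run is repeatable
MU, SIGMA = 100.0, 15.0    # hypothetical population mean and standard deviation

results = {}
for n in (10, 100):
    means = sample_means(MU, SIGMA, n, repeats=2000, rng=rng)
    observed_se = statistics.stdev(means)  # spread of the sample means
    predicted_se = SIGMA / n ** 0.5        # theoretical standard error of the mean
    results[n] = (statistics.fmean(means), observed_se, predicted_se)
    print(f"n={n:3d}: mean of means {results[n][0]:.2f}, "
          f"SD of means {observed_se:.2f}, sigma/sqrt(n) {predicted_se:.2f}")
```

Each sample mean is one draw from this distribution. Going from n = 10 to n = 100 cuts its spread by a factor of about √10, while the spread of the individual values stays at 15.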
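Here is a quick check of the village example in note 22, using Python's standard `statistics` module:

```python
import statistics

# 100 workers paid $10/year plus one factory owner paid $1,000,000/year.
wages = [10] * 100 + [1_000_000]

mean_wage = statistics.mean(wages)      # pulled far upward by the single outlier
median_wage = statistics.median(wages)  # the middle value: a worker's wage
mode_wage = statistics.mode(wages)      # the most common value

print(f"mean   ${mean_wage:,.0f}")
print(f"median ${median_wage:,}")
print(f"mode   ${mode_wage:,}")
```

The mean is 1,001,000 / 101 ≈ \$9,911, while the median and the mode stay at \$10.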
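Note 23's recipe can be written out step by step in Python. The data set below is hypothetical. `statistics.pvariance` and `statistics.variance` are the standard-library equivalents; the first divides by n and the second by n - 1.

```python
import math
import statistics

def variance(data, sample=False):
    """Average squared distance from the mean.

    With sample=True, divide by n - 1 (the usual sample variance) instead of n.
    """
    n = len(data)
    mean = sum(data) / n
    squared_diffs = [(x - mean) ** 2 for x in data]  # squaring stops +/- distances cancelling
    return sum(squared_diffs) / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = variance(data)
print(pop_var, math.sqrt(pop_var))                      # variance 4.0, standard deviation 2.0
print(variance(data, sample=True))                      # 32/7, about 4.57
print(statistics.pvariance(data), statistics.variance(data))  # library versions agree
```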
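The boxplot fences in note 24 sit 1.5 × IQR beyond the quartiles. Here is a minimal sketch on a hypothetical data set. Different packages use slightly different quartile conventions, so the fences can differ a little from one tool to another. This sketch relies on the default "exclusive" method of `statistics.quantiles`.

```python
import statistics

def tukey_fences(data, k=1.5):
    """Return (lower, upper) fences at k * IQR below Q1 and above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical values
lower, upper = tukey_fences(data)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)  # -6.0 18.0 [30]
```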
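Note 28 contrasts simple random sampling with stratified sampling. The sketch below uses a hypothetical population of 800 urban and 200 rural residents. For the stratified version it uses proportional allocation, which is one common choice among several: it draws the same fraction at random from each stratum.

```python
import random

def simple_random_sample(population, size, rng):
    """Every member of the population has the same chance of being selected."""
    return rng.sample(population, size)

def stratified_sample(population, stratum_of, fraction, rng):
    """Proportional allocation: draw the same fraction, at random, from each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

def count_rural(sample):
    return sum(1 for area, _ in sample if area == "rural")

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: 800 urban and 200 rural residents, tagged by area.
population = [("urban", i) for i in range(800)] + [("rural", i) for i in range(200)]

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, lambda person: person[0], 0.10, rng)

print("rural in simple random sample:", count_rural(srs))    # varies around 20
print("rural in stratified sample:   ", count_rural(strat))  # exactly 20
```

The stratified design guarantees that each subgroup is represented in proportion to its size. A simple random sample gets this right only on average.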
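Note 32 mentions confidence intervals. Here is a minimal sketch of a confidence interval for a mean, using the normal approximation (mean ± z × s/√n) so that it needs only the standard library. The readings are hypothetical. With only 20 observations, a t multiplier would be the more usual choice and would give a somewhat wider interval.

```python
import statistics
from statistics import NormalDist

def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation CI for a population mean: mean +/- z * s / sqrt(n)."""
    n = len(data)
    mean = statistics.fmean(data)
    standard_error = statistics.stdev(data) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical systolic blood pressure readings (mmHg) from a random sample.
readings = [128, 135, 142, 119, 151, 138, 126, 144, 133, 140,
            122, 147, 131, 136, 129, 145, 138, 124, 150, 134]

low95, high95 = mean_confidence_interval(readings)
low99, high99 = mean_confidence_interval(readings, confidence=0.99)
print(f"95% CI: {low95:.1f} to {high95:.1f} mmHg")
print(f"99% CI: {low99:.1f} to {high99:.1f} mmHg")  # more confidence, wider interval
```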
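Note 36 ties sample size to alpha, power (1 - beta), and the clinically important difference. One standard normal-approximation formula covers a two-sided comparison of two means with equal group sizes: n per group = 2σ²(z₁₋α/₂ + z₁₋β)² / Δ². The sketch below implements it. The 5 mmHg difference and 15 mmHg standard deviation are hypothetical inputs, not values from the talk.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-sample comparison of means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # controls the Type I error rate (alpha)
    z_beta = z(power)           # power = 1 - beta
    return math.ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)

# Hypothetical trial: detect a 5 mmHg difference, assuming an SD of 15 mmHg.
print(n_per_group(delta=5, sigma=15))              # 142 per group
print(n_per_group(delta=5, sigma=15, power=0.90))  # more power needs more patients
print(n_per_group(delta=10, sigma=15))             # a bigger difference needs fewer
```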
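Notes 35 and 37 make the point that a tiny effect becomes "significant" once the sample is big enough. The sketch below keeps the same hypothetical 1 mmHg drop, assumes a known SD of 15 mmHg, and computes a two-sided z-test p-value at increasing sample sizes. It simplifies by treating the observed difference as exactly 1 mmHg at every n. In a real trial, the observed difference would vary from sample to sample.

```python
from math import erfc, sqrt

def two_sided_p_value(mean_difference, sigma, n):
    """Two-sided z-test p-value for an observed mean difference with known SD sigma."""
    z = abs(mean_difference) / (sigma / sqrt(n))
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z)) but stays accurate for large z

# Hypothetical: the drug lowers systolic BP by 1 mmHg on average; assume an SD of 15 mmHg.
p_values = {n: two_sided_p_value(1.0, 15.0, n) for n in (100, 1_000, 10_000, 100_000)}
for n, p in p_values.items():
    print(f"n = {n:>7,}: p = {p:.2g}")
```

The effect never changes, yet p drops from about 0.5 at n = 100 to far below 0.05 at n = 10,000. Clinical importance has to be judged separately.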
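The z-scores and percentiles in notes 38 and 39 come from a normal model. The sketch below uses the approximate mean (69 in) and standard deviation (2.5 in) quoted in note 38, with Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Approximate values quoted in note 38: mean 69 in, standard deviation 2.5 in.
heights = NormalDist(mu=69, sigma=2.5)

results = {}
for label, inches in [("6 ft 6 in man", 78), ("Shaq", 85)]:
    z = (inches - heights.mean) / heights.stdev  # standard deviations above the mean
    share_shorter = heights.cdf(inches)          # fraction of the population below this height
    results[label] = (z, share_shorter)
    print(f"{label}: z = {z:.1f}, taller than {share_shorter:.8%} of the population")
```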
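Note 40's circled point is close to the center on each axis but breaks the correlation. The Mahalanobis distance, √((x - μ)ᵀ Σ⁻¹ (x - μ)), captures exactly that. Here is a two-variable sketch that uses the closed-form inverse of a 2×2 covariance matrix. The covariance (two standardized variables with correlation 0.9) and both points are hypothetical.

```python
import math

def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance sqrt((x - mu)^T Sigma^-1 (x - mu)) for two variables."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy**2  # must be positive for a valid covariance matrix
    dx, dy = point[0] - center[0], point[1] - center[1]
    # Quadratic form written out with the closed-form inverse of a 2x2 matrix.
    d_squared = (syy * dx**2 - 2 * sxy * dx * dy + sxx * dy**2) / det
    return math.sqrt(d_squared)

center = (0.0, 0.0)
cov = ((1.0, 0.9), (0.9, 1.0))  # hypothetical: strongly correlated, standardized variables

along = (2.0, 2.0)     # lies along the long axis of the ellipse
against = (1.0, -1.0)  # small on each axis, but breaks the correlation

for name, p in [("along", along), ("against", against)]:
    print(name, round(math.dist(p, center), 2), round(mahalanobis_2d(p, center, cov), 2))
```

The "against" point is closer to the center in ordinary (Euclidean) distance: 1.41 versus 2.83. Its Mahalanobis distance, however, is more than twice as large: about 4.47 versus 2.05.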