This document discusses the importance of properly analyzing and visualizing data when conducting statistical tests and reporting results. It recommends displaying raw data through dot plots instead of bar graphs to avoid concealing variance. The document discusses how the mean may not always be the best descriptor of data and how providing confidence intervals around measures provides important context about uncertainty. It also emphasizes choosing statistical tests wisely based on the characteristics of the data and justifying choices. Overall, the document stresses the importance of exploring data visually and using appropriate analyses and reporting to avoid drawing incorrect conclusions.
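A small skewed sample shows why the mean may not be the best descriptor. A single extreme value pulls the mean far from where most observations lie, while the median stays with the bulk of the data. The sketch below uses Python's standard-library `statistics` module, and the values are made up for illustration.

```python
import statistics

# Hypothetical reaction times (seconds) with one extreme value;
# these numbers are illustrative, not taken from the document.
times = [1.0, 2.0, 2.0, 3.0, 100.0]

mean = statistics.mean(times)      # pulled upward by the single extreme value
median = statistics.median(times)  # middle value, unaffected by how extreme 100.0 is

print(f"mean = {mean}, median = {median}")
```

Four of the five values are 3 or less, yet the mean lands above 20. A dot plot of the raw values would reveal this immediately, but a bar showing only the mean would hide it.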
Boston DataSwap 2013 -- Network Visualization in NodeXL, by codydunne
This document summarizes Cody Dunne's presentation on network visualization tools in NodeXL. It discusses how NodeXL can be used to visualize complex relationships in limited screen space through motif simplification. It also describes how NodeXL allows users to explore groups in a network by showing their size, membership and relationships using new Group-in-a-Box layouts. Finally, it lists the various features now available in the NodeXL software, including simplification, layouts, statistics, filtering and more.
Improving predictions: Lasso, Ridge and Stein's paradox, by Maarten van Smeden
Slides of the masterclass "Improving predictions: Lasso, Ridge and Stein's paradox" given at the Dutch National Institute for Public Health and the Environment (RIVM).
Data confusion (how to confuse yourself and others with data analysis), by Vijay Kukrety
The document discusses various ways that data can be misused or misinterpreted, including through the use of misleading or non-informative graphs, misapplying averages, and using inappropriate statistical methods. It provides examples of bad graphs and analyses to avoid, and emphasizes the importance of properly collecting and presenting data to draw accurate conclusions. Key topics covered include distinguishing descriptive, enumerative, and analytical studies; understanding outliers and regression analysis; and avoiding forcing linear models on nonlinear data relationships.
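Forcing a linear model onto a nonlinear relationship can hide a strong pattern entirely. In the minimal sketch below, y is an exact quadratic function of x, yet the ordinary least-squares line through those points is flat. The data are hypothetical, and the helper `least_squares_line` is written here for illustration.

```python
# Hypothetical data following an exact nonlinear (quadratic) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # [4, 1, 0, 1, 4]

def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = least_squares_line(xs, ys)
print(f"slope = {slope}, intercept = {intercept}")  # flat line despite a perfect relationship
```

A slope of zero would suggest "no relationship", even though y is completely determined by x. Plotting the data before fitting a model is what exposes this.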
Statistics is the science of collecting, analyzing, and drawing conclusions from data. It involves exploring numerical facts like measurements and counts to make sense of patterns in the real world. Studying statistics helps people understand trends, avoid being misled, and make better decisions with information.
The document discusses objectives and concepts related to statistical analysis in biology, including:
- Types of data, graphs, and statistical analyses such as the mean, standard deviation, and chi-square analysis.
- Calculating and interpreting the mean and standard deviation of a data set to describe variability.
- Using standard deviation to compare the spread of data between samples and to help judge whether differences are significant.
- Performing hypothesis testing using calculated t values, t tables, and p values to determine if differences between data sets are statistically significant.
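The t value mentioned above can be computed directly. The following is a minimal sketch of Student's two-sample t statistic, which pools the two sample variances under the assumption of equal population variances. The measurements are invented, and `pooled_t` is a hypothetical helper rather than a function from the document.

```python
import math
import statistics

# Hypothetical measurements for two groups (e.g., bill lengths in mm);
# the values are invented for illustration.
group_a = [15.0, 16.0, 17.0, 15.5, 16.5]
group_b = [18.0, 19.0, 18.5, 19.5, 18.0]

def pooled_t(a, b):
    """Student's two-sample t statistic assuming equal variances."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled_var = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

t, df = pooled_t(group_a, group_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The absolute value of t is then compared with the critical value from a t table. For 8 degrees of freedom at the two-tailed 0.05 level, that value is about 2.306. If the variances clearly differ, Welch's version of the test, which does not pool them, is usually the safer choice.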
The document provides guidance on analyzing and interpreting data in teaching elementary science. It discusses the objectives of interpreting data, which include analyzing given data, making interpretations based on evidence, organizing data in different formats, making inferences, and understanding dependent and independent variables. Examples are given of different types of graphs like pie charts, line graphs, and bar graphs that can be used to visualize and analyze data. Steps for interpreting data involve organizing it, creating a graph, looking for trends, making inferences, and checking inferences against existing knowledge. The document emphasizes that interpreting data relies on human judgment and cognition.
This document provides an introduction to biostatistics in nursing. It defines biostatistics as statistics arising from biological sciences like medicine and public health. It discusses the importance of understanding biostatistics for nurses due to the increasing use of quantitative methods in medical research and literature. The document outlines different types of data like qualitative, discrete, continuous and scales of measurement. It also demonstrates how to create a frequency distribution table to organize and summarize patient data.
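A frequency distribution table lists each value with its count, relative frequency, and cumulative frequency. The following is a minimal Python sketch built on `collections.Counter`. The patient data are hypothetical.

```python
from collections import Counter

# Hypothetical patient data: number of hospital visits in the past year.
visits = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 4, 1]

counts = Counter(visits)
n = len(visits)
cumulative = 0

print(f"{'Value':>5} {'Freq':>5} {'Rel. freq':>10} {'Cum. freq':>10}")
for value in sorted(counts):
    freq = counts[value]
    cumulative += freq  # running total of observations up to this value
    print(f"{value:>5} {freq:>5} {freq / n:>10.3f} {cumulative:>10}")
```

The last cumulative frequency always equals the number of patients, which is a quick check that no case was dropped.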
Standard deviation is a measure of how dispersed data values are around the mean. Karl Pearson introduced the term "standard deviation" in 1893 to quantify dispersion. To calculate it, find the mean, calculate each value's deviation from the mean, and square the deviations, which makes them all non-negative and gives large deviations more weight. Then average the squared deviations, dividing by n - 1 rather than n for a sample, and take the square root. Standard error is the standard deviation of a statistic's sampling distribution (for the sample mean, s/√n) and is used to calculate confidence intervals. Standard deviation, by contrast, describes the variability of individual observations and, when computed from a sample, estimates the variability in the population.
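The steps above translate directly into code. This sketch computes both the population version (divide by n) and the sample version (divide by n - 1), then the standard error of the mean, and checks the result against Python's `statistics` module. The data values are illustrative.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

mean = sum(data) / len(data)                      # step 1: the mean
deviations = [x - mean for x in data]              # step 2: deviations from the mean
squared = [d ** 2 for d in deviations]             # step 3: squares (all non-negative)
pop_var = sum(squared) / len(data)                 # step 4a: divide by n (population)
sample_var = sum(squared) / (len(data) - 1)        # step 4b: divide by n - 1 (sample)
pop_sd = math.sqrt(pop_var)                        # step 5: square root
sample_sd = math.sqrt(sample_var)
standard_error = sample_sd / math.sqrt(len(data))  # SE of the mean = s / sqrt(n)

# Cross-check the hand-rolled result against the standard library.
assert math.isclose(sample_sd, statistics.stdev(data))

print(pop_sd, sample_sd, standard_error)
```

The standard error is smaller than the standard deviation by a factor of √n. It shrinks as the sample grows, whereas the spread of individual observations does not.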
Statistics is the science of collecting, analyzing, and drawing conclusions from data. It involves exploring numerical facts like measurements and percentages to make sense of patterns in the world. Studying statistics helps us understand trends, avoid being misled, and make informed decisions using quantitative information.
The W's are not fully specified in the given information:
Who: Myrmecologist Walter Tschinkel of the University of Florida
What: Basic information about many ant species, including scientific name, location, nest depth, number of chambers, number of ants
When: Not specified
Where: Not fully specified, just that location information was included
How: Not specified
Why: To document how new ant colonies begin, ant nest designs, and how nests differ depending on species
This document provides an introduction to statistics. It discusses descriptive statistics, which summarize and describe data, versus inferential statistics, which make generalizations about a population based on a sample. Descriptive statistics include measures like percentages, averages, and tables to characterize data. Inferential statistics are used to compare treatment groups and determine whether observed differences could occur by chance or are likely due to the treatments. The document provides examples of statistics encountered in various fields and emphasizes the importance of understanding statistics to evaluate claims critically.
- Univariate analysis refers to analyzing one variable at a time using statistical measures like proportions, percentages, means, medians, and modes to describe data.
- These measures provide a "snapshot" of a variable through tools like frequency tables and charts to understand patterns and the distribution of cases.
- Measures of central tendency like the mean, median and mode indicate typical or average values, while measures of dispersion like the standard deviation and range indicate how spread out or varied the data are around central values.
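A univariate summary of a single variable can be produced in a few lines with Python's `statistics` module. The sketch below reports the measures of central tendency and dispersion named above. The scores are hypothetical.

```python
import statistics

scores = [3, 7, 7, 2, 9, 7, 5]  # hypothetical scores for one variable

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "range": max(scores) - min(scores),
    "sd": statistics.stdev(scores),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
```

Here the mean (about 5.71) sits below the median and mode (both 7), which hints that the low values 2 and 3 pull the distribution to the left.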
Here are the responses to the questions:
1. A statistical population is the entire set of individuals or objects of interest. A sample is a subset of the population selected to represent it; information from the sample is used to infer the characteristics, attributes, and properties of the entire population.
2. The sample variance is the sum of the squared deviations from the mean divided by the number of values in the data set minus 1. The population variance divides by N instead, which makes it the plain average of the squared deviations. Standard deviation is the square root of the variance and measures how far data values spread out from the mean.
3. No data was provided to create graphs. Additional data on the number of fish in each age group would be needed.
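The distinction in answer 2 between dividing by n - 1 and dividing by N can be written out explicitly. The worked numbers use the illustrative data set 2, 4, 4, 4, 5, 5, 7, 9 (mean 5, sum of squared deviations 32).

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad s = \sqrt{s^2}

\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2, \qquad \sigma = \sqrt{\sigma^2}

% Worked example: sum of squared deviations = 32, with n = N = 8
s^2 = \frac{32}{7} \approx 4.571, \quad s \approx 2.138; \qquad \sigma^2 = \frac{32}{8} = 4, \quad \sigma = 2
```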
This document discusses using inferential statistics to analyze the issue of obesity among people. Data from a 2000 study on the connection between obesity and health-related quality of life in individuals aged 18 and older is analyzed. After controlling for socioeconomic factors, the study found obesity to be significantly associated with poorer health. Obesity in both children and adults is linked to increased risk of diseases such as heart disease, diabetes, and high blood pressure. Childhood obesity can also negatively impact physical and psychological well-being.
This document contains the notes from a lecture on contingency tables and related statistical methods. It introduces contingency tables and how they can be used to analyze relationships between variables. It discusses Fisher's exact test and the chi-squared test for assessing independence in contingency tables. Examples are provided to demonstrate contingency table analysis and visualization of results. Additional resampling methods like bootstrapping and Monte Carlo simulation are also mentioned.
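The chi-squared test of independence compares observed counts in a contingency table with the counts expected if the row and column variables were unrelated. Each expected count is (row total × column total) / grand total. The following is a minimal sketch for a hypothetical 2×2 table, without the Yates continuity correction.

```python
# Hypothetical 2x2 contingency table of observed counts
# (rows: exposure yes/no, columns: outcome yes/no); numbers are invented.
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total  # count expected under independence
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

With 1 degree of freedom, the 0.05 critical value is about 3.841. When expected counts are small, Fisher's exact test is the usual alternative because the chi-squared approximation becomes unreliable.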
Statistics is the study of collecting, organizing, summarizing, and analyzing data. There are two main branches of statistics: descriptive statistics and inferential statistics. Descriptive statistics involves organizing and summarizing data, while inferential statistics involves using a sample of data to draw conclusions about a population. Statistics is used across many fields, especially in research involving biological or medical data, known as biostatistics. Statistics helps effectively present research findings and evaluate research proposals.
2. This exercise uses the dataset “WholeFoods.” (a) Use Excel to.docx, by eugeniadean34240
2. This exercise uses the dataset “WholeFoods.”
(a) Use Excel to construct a relative histogram for store size. Does the distribution of store size appear to be skewed? If so, does it appear to be skewed to the right or to the left? Explain.
(b) Use Excel to calculate the following four measures of central tendency for store size: mean, median, midrange, and 5% trimmed mean (using the trimmed mean definition from the textbook). Do any of these measures of central tendency appear to not be appropriate for this particular dataset? Explain.
(c) Use Excel to calculate the following four measures of dispersion for store size: variance, standard deviation, mean absolute deviation, and coefficient of variation. Please provide brief and “to-the-point” comments on your results.
(d) According to Chebyshev’s Theorem, at least what percentage of the observations within a sample is supposed to lie within 1.5 sample standard deviations of the sample mean? Next, using Excel, please take the observations for store size in the Whole Foods dataset and confirm that this prediction holds within the Whole Foods sample dataset.
(e) Use Excel to calculate the first quartile, the third quartile, the midhinge, and interquartile range for store size. Next, use Excel to create a box plot graph for store size. (Note: Excel does not have a built-in function for creating a box plot. Your group will need to “figure out” how to do it. For example, the internet has many examples of how to create a box plot in Excel using column/bar charts. You may do either a “horizontal” box plot (i.e., a box plot with the “whiskers” pointing to the right and to the left) or a “vertical” box plot (i.e., a box plot with the “whiskers” pointing to the top and to the bottom).)
(f) Use Excel to calculate both inner fences (left and right) for store size, and then both outer fences (left and right) for store size. Based on these calculated values, are there any “outlier” stores in the data? Any “extreme outlier” stores in the data? If so, which stores are they? (Note: In answering this question, please use the definition of “outlier” and “extreme outlier” provided on page 144 of the textbook; please do not use the definition of “outlier” provided on pages 135-137 of the textbook.)
(g) Use Excel to calculate skewness for the variable store size. Is store size skewed right or left? Does your answer corroborate the answer you provided in part 2(a) above?
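The exercise asks for Excel, but the same quantities can be cross-checked in a short script. The sketch below rests on several assumptions. The store sizes are invented, because the “WholeFoods” data are not reproduced here. The 5% trim drops floor(0.05·n) values from each end, which may differ from the textbook's definition. Quartiles use Python's "inclusive" method, assumed to match Excel's QUARTILE.INC. The fences follow the common Tukey convention (1.5 and 3 IQRs), which may not match the page-144 definition. The skewness formula is the adjusted sample skewness that Excel's SKEW computes.

```python
import math
import statistics

# Hypothetical store sizes (thousands of sq ft); the real "WholeFoods"
# dataset is not reproduced in the document, so these values are invented.
sizes = sorted([18, 22, 25, 26, 28, 29, 30, 31, 32, 33,
                34, 35, 36, 38, 40, 42, 45, 50, 58, 95])
n = len(sizes)

# (a) Relative frequency per bin of width 10 (bin edges are an assumption).
bins = {}
for x in sizes:
    low = (x // 10) * 10
    bins[low] = bins.get(low, 0) + 1
relative = {low: count / n for low, count in sorted(bins.items())}

# (b) Central tendency.
mean = statistics.mean(sizes)
median = statistics.median(sizes)
midrange = (sizes[0] + sizes[-1]) / 2
k = math.floor(0.05 * n)  # assumed trimming rule: drop floor(5% of n) values from each end
trimmed_mean = statistics.mean(sizes[k:n - k])

# (c) Dispersion.
variance = statistics.variance(sizes)       # sample variance (n - 1)
sd = statistics.stdev(sizes)
mad = sum(abs(x - mean) for x in sizes) / n  # mean absolute deviation about the mean
cv = sd / mean                               # coefficient of variation

# (d) Chebyshev: at least 1 - 1/k^2 of observations lie within k SDs.
k_sd = 1.5
chebyshev_bound = 1 - 1 / k_sd ** 2
within = sum(1 for x in sizes if abs(x - mean) <= k_sd * sd) / n

# (e) Quartiles via the "inclusive" method (assumed to match Excel's QUARTILE.INC).
q1, q2, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
iqr = q3 - q1
midhinge = (q1 + q3) / 2

# (f) Tukey-style fences (the textbook's page-144 definition may differ).
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer = (q1 - 3 * iqr, q3 + 3 * iqr)
extreme = [x for x in sizes if x < outer[0] or x > outer[1]]
mild = [x for x in sizes if (x < inner[0] or x > inner[1]) and x not in extreme]

# (g) Adjusted sample skewness, the formula Excel's SKEW uses.
skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in sizes)

print(f"mean={mean:.2f} median={median} midrange={midrange} trimmed={trimmed_mean:.2f}")
print(f"var={variance:.2f} sd={sd:.2f} mad={mad:.2f} cv={cv:.3f}")
print(f"Chebyshev bound={chebyshev_bound:.3f}, observed share within 1.5 SD={within:.2f}")
print(f"Q1={q1} Q3={q3} IQR={iqr} midhinge={midhinge}")
print(f"mild outliers={mild}, extreme outliers={extreme}, skewness={skew:.2f}")
```

With these invented values, the single large store (95) lies beyond the outer fence and drags the mean and midrange well above the median. This is the pattern parts (a), (b), and (g) ask students to recognize as right skew. Chebyshev's bound for k = 1.5 is 1 - 1/2.25 ≈ 55.6%, and the observed share comfortably exceeds it.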
Chapter 1 Exploring Life and Science
• reproduce, experience growth and, in many cases, development;
• maintain homeostasis, keeping the conditions of their internal environment stable;
• respond to stimuli; and
• have an evolutionary history and are adapted to a way of life.
1.2 Humans Are Related to Other Animals
The classification of living organisms mirrors their evolutionary relationships. Humans are mammals, a type of vertebrate in the animal kingdom of the domain Eukarya. Humans differ from other mammals, including apes, …
Databeers Dub #1 - Cathal Walsh - Statistics in the Big Data World, by Databeers Dublin
Cathal Walsh holds a chair in statistics and researches health decision science, focusing on model calibration, evidence synthesis, and identifying decision makers' utilities. His work considers issues with big data, such as whether conclusions can be drawn from all available data rather than just a sample, and the human decisions and biases that can affect modeling and data analysis. He questions assumptions around data collection and modeling in order to develop a more rigorous statistical framework.
Statistics is a powerful tool for both researchers and decision makers, yet misuses, misinterpretations, and misrepresentations of statistics remain common. This seminar aims to raise awareness of common misconceptions about statistics in social science and beyond (e.g., in the media and among readers). I do not own the copyrights to the materials in this presentation; wherever I borrowed figures from other sources, the source is credited at the bottom of that slide.
The document summarizes statistical concepts and methods used to analyze biological data, including calculating means and standard deviations and using t-tests to determine whether differences between data sets are significant. It provides an example comparing bill length measurements in two hummingbird species. The mean bill length is slightly higher in C. latirostris, but A. colubris shows greater variability. Because the error bars representing the standard deviation overlap, a t-test is needed to determine whether the difference in means is statistically significant.
Researchers collected data on bill length from two hummingbird species, the ruby-throated hummingbird and the broad-billed hummingbird, and calculated the mean and standard deviation for each species to test whether bill length differs significantly between them. The mean bill length was 15.9 mm for the ruby-throated hummingbird and 18.8 mm for the broad-billed hummingbird. The standard deviation was higher for the ruby-throated hummingbird (1.91) than for the broad-billed hummingbird (1.03), indicating greater variability in bill length in the former species. Error bars were added to a graph to show the standard deviation, and thus the variability, of each dataset.
This document discusses various statistical concepts including outliers, transforming data, normalizing data, weighting data, robustness, and homoscedasticity and heteroscedasticity. Outliers are values far from other data points and should be carefully examined before removing. Data can be transformed using logarithms, square roots, or other functions to better fit a normal distribution or equalize variances between groups. Normalizing data puts variables on comparable scales. Weighting data adjusts for under- or over-representation in samples. Robust tests are resistant to violations of assumptions. Homoscedasticity refers to equal variances between groups while heteroscedasticity refers to unequal variances.
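Two of the operations described above, transforming and normalizing, are short enough to show directly. In this sketch, a log transform compresses right-skewed values, and z-score normalization rescales a variable to mean 0 and standard deviation 1. The data are hypothetical. The choice of base-10 logarithms and of population SD for the z-scores are assumptions made for illustration.

```python
import math
import statistics

# Hypothetical right-skewed values (e.g., incomes in thousands).
values = [1, 10, 100, 1000]

# Log transform: compresses large values, often reducing right skew.
logged = [math.log10(v) for v in values]

# z-score normalization: rescale to mean 0 and (population) SD 1 so
# variables measured on different scales become comparable.
mu = statistics.mean(values)
sigma = statistics.pstdev(values)
z_scores = [(v - mu) / sigma for v in values]

print(logged)
print([round(z, 3) for z in z_scores])
```

Note that the z-score step only rescales. It does not remove the skew, whereas the log transform changes the shape of the distribution.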
Homework #1 SOCY 3115 Spring 20 Read the Syllabus and FAQ on ho.docx, by pooleavelina
Homework #1
SOCY 3115
Spring 20
Read the Syllabus and FAQ on how to do your homework before beginning the assignment!
To get consideration for full credit, you must:
· Follow directions;
· Show all work required to arrive at answer (statistical calculations often require multiple steps, so you need to write these down, not just skip to the final answer)
· Use appropriate statistical notation at all times (e.g. if you are calculating a population mean, begin with the equation for population mean)
· Use units in your answer, where appropriate (e.g. a mean time would be “6.5 hours” rather than just “6.5”)
Understanding the Structure of Data
1. For the following rectangular dataset:
Id  Highest degree   Works full-time  Annual income cat
1   Did not grad HS  Yes              Low
2   HS dip           Yes              Low
3   HS dip           No               Med
4   BA               No               Low
5   BA               Yes              Med
6   MA               Yes              High
7   HS dip           Yes              Med
a. What is the unit-of-analysis of the dataset?
b. How many variables are in the dataset?
c. How many observations/cases are in the dataset?
d. For each variable that is not named “id”:
i. What is the variable name?
ii. What is the level-of-measurement?
iii. What are the values for the variable?
iv. If you had to make a guess, what do you think the “question” was that was asked of the unit-of-analysis to get these data? (for example, if we had a continuous variable called “num_pets” the question might be “How many pets live in your household?”)
2. For the following rectangular dataset:
Id  num_bdrms  num_bthrms  sqft  Ranch
1   4          3           3200  Yes
2   2          1.5         2800  Yes
3   2          1           1200  Yes
4   3          2           1500  No
5   2          2           1100  No
a. What is the unit-of-analysis of the dataset?
b. How many variables are in the dataset?
c. How many observations/cases are in the dataset?
d. For each variable that is not named “id”:
i. What is the variable name?
ii. What is the level-of-measurement? Before answering, be sure to consult the slide called “Level of measurement – language to use”. Use the formal language!
iii. What are the values for the variable?
iv. If you had to make a guess, what do you think the “question” was that was asked of the unit-of-analysis to get these data? (for example, if we had a continuous variable called “num_pets” the question might be “How many pets live in your household?”)
3. For each of the following questions (1) construct a dataset with one variable and three observations (2) add data that could have theoretically been collected (just make up the actual responses to the question); and (3) indicate the level-of-measurement of the variable. I’ve done two examples for you.
Example#1:
What is your current age? (individual is the unit-of-analysis)
id  age
1   25
2   32
3   61
The age variable is continuous/interval ratio.
Example#2:
What is the size of this hospital based on number of beds? (hospital is the unit-of-analysis) Answers can be small (1-100 beds), medium (101-500 beds), large (501-1000 beds), or extra large (1001+ beds).
id  hosp_size
1   med
2   med
3   ext ...
Epidemiological Analysis Workshop By Dr Suzanne Campbell COUNTDOWN on NTDs
The document provides an overview of a training workshop on epidemiological analysis. It discusses key concepts in epidemiology including study design, data collection, and analytical approaches. It emphasizes the importance of designing an analysis plan from the beginning to determine the appropriate statistical techniques for answering research questions based on the study objectives, design, and variables. The document uses an example of a cross-sectional study of schistosomiasis in Cameroon to demonstrate how to develop an analysis plan, including exploring variables, examining relationships, and considering regression modeling to adjust for confounding factors.
Chapter One (Salkind) Statistics or Sadistics .docx, by tiffanyd4
Chapter One (Salkind)
Statistics or Sadistics?
An Overview of This Chapter
In this chapter we cover the following items:
Part One: Why Statistics?
Part Two: A 5-Minute History of Statistics
Part Three: Statistics: What It Is (And Isn’t)
Part Four: What Am I Doing In A Statistics Class?
Part Five: An Eye Toward The Future
Part One
Why Statistics?
Why Statistics?
There may be dozens of reasons why statistics makes you groan:
I don’t like math
I don’t understand math.
Statistics are tough.
Why do I even need this statistics stuff?
Why can’t I just let a computer do statistics for me?
I don’t plan on doing research when I graduate, so learning statistics is a waste of time.
The minute I leave class, I won’t remember any of this stuff.
Those odd Greek symbols ( µ Σ σ ) make no sense!
And a few other biggies …
What am I supposed to do with a formula as complex as this?
Or this?
Although you might have some trepidation about statistics and their applications, this course is designed to help you see past your fears, skepticism, and anxiety about statistics.
I know statistics may scare a lot of you, but I think that is mostly because you don’t have the foundation yet to understand when, how, and why you should use statistics.
But don’t let fear of the unknown prevent you from getting that knowledge. In a month, you might actually think methods is fun!
Part Two
A 5-Minute History Of Statistics
Imagine a caveman looking for food. He comes across a huge herd of bison, and he knows he can kill them all now OR come back time and time again. What does he do? What information does he use to decide how many to kill and eat?
If you said counting, you’re absolutely correct! Why deplete the food source if you only need a few bison at a time to live? Why not take only what you need and come back for more later?
As our earliest ancestors came to know, counting is not only a good idea, it is a useful skill.
We have come a long way from counting bison, as have our skills in trying to predict today what we hope to find tomorrow.
A great deal of statistics compares our expectations with reality. If I expect XYZ to occur, will it? If it does occur, how confident can I be that I understand WHY it occurred?
You’ll learn about a wide variety of statistics in this course, with some stats based on correlations (a relationship among variables) and some based on causation (A leads to B).
Correlational Research
In correlational research, there is a relationship between two variables. That is, variables A and B are linked somehow.
When variable A increases, B also increases
The more I like the color green, the mo.
1) Statistics are used to describe patterns in nature that are difficult to see with the naked eye, including in sports, economics, medicine, and education.
2) While statistics from verifiable data sources can be difficult to lie with, supplemental personal information is still needed; as the remark often attributed to Joseph Stalin puts it, "one death is a tragedy, a million deaths are a statistic."
3) Common types of statistics include measures of central tendency, measures of spread, and probability, but summary statistics can never tell the whole story on their own.
The document provides an overview of quantitative data analysis and statistics. It discusses different types of data, ways to visualize data through various plots and charts, key statistical concepts like the mean, median, mode, variance and standard deviation. It also covers important contributors to the field like John Tukey who introduced the box plot, and Karl Pearson who coined the term "standard deviation". Sample questions are included about calculating statistics from data sets.
How to Manage Your Lost Opportunities in Odoo 17 CRM, by Celine George
Odoo 17 CRM allows us to track why we lose sales opportunities with "Lost Reasons." This helps analyze our sales process and identify areas for improvement. Here's how to configure lost reasons in Odoo 17 CRM
The simplified electron and muon model, Oscillating Spacetime: The Foundation..., by RitikBhardwaj56
Discover the Simplified Electron and Muon Model: A New Wave-Based Approach to Understanding Particles delves into a groundbreaking theory that presents electrons and muons as rotating soliton waves within oscillating spacetime. Geared towards students, researchers, and science buffs, this book breaks down complex ideas into simple explanations. It covers topics such as electron waves, temporal dynamics, and the implications of this model on particle physics. With clear illustrations and easy-to-follow explanations, readers will gain a new outlook on the universe's fundamental nature.
Statistics is the science of collecting, analyzing, and drawing conclusions from data. It involves exploring numerical facts like measurements and percentages to make sense of patterns in the world. Studying statistics helps us understand trends, avoid being misled, and make informed decisions using quantitative information.
The W's are not fully specified in the given information:
Who: Myrmecologist Walter Tschinkel of the University of Florida
What: Basic information about many ant species, including scientific name, location, nest depth, number of chambers, number of ants
When: Not specified
Where: Not fully specified, just that location information was included
How: Not specified
Why: To document how new ant colonies begin, ant nest designs, and how nests differ depending on species
This document provides an introduction to statistics. It discusses descriptive statistics, which summarize and describe data, versus inferential statistics, which make generalizations about a population based on a sample. Descriptive statistics include measures like percentages, averages, and tables to characterize data. Inferential statistics are used to compare treatment groups and determine whether observed differences could occur by chance or are likely due to the treatments. The document provides examples of statistics encountered in various fields and emphasizes the importance of understanding statistics to evaluate claims critically.
- Univariate analysis refers to analyzing one variable at a time using statistical measures like proportions, percentages, means, medians, and modes to describe data.
- These measures provide a "snapshot" of a variable through tools like frequency tables and charts to understand patterns and the distribution of cases.
- Measures of central tendency like the mean, median and mode indicate typical or average values, while measures of dispersion like the standard deviation and range indicate how spread out or varied the data are around central values.
Here are the responses to the questions:
1. A statistical population is the entire set of individuals or objects of interest. A sample is a subset of the population selected to represent the population. The sample infers information about the characteristics, attributes, and properties of the entire population.
2. Variance is the average of the squared deviations from the mean. It is calculated as the sum of the squared deviations from the mean divided by the number of values in the data set minus 1. Standard deviation is the square root of the variance. It measures how far data values spread out from the mean.
3. No data was provided to create graphs. Additional data on the number of fish in each age group would be needed.
This document discusses using inferential statistics to analyze the issue of obesity among people. Data from a 2000 study on the connection between obesity and health-related quality of life in individuals aged 18 and older is analyzed. After controlling for socioeconomic factors, the study found obesity to be significantly associated with poorer health. Obesity in both children and adults is linked to increased risk of diseases such as heart disease, diabetes, and high blood pressure. Childhood obesity can also negatively impact physical and psychological well-being.
This document contains the notes from a lecture on contingency tables and related statistical methods. It introduces contingency tables and how they can be used to analyze relationships between variables. It discusses Fisher's exact test and the chi-squared test for assessing independence in contingency tables. Examples are provided to demonstrate contingency table analysis and visualization of results. Additional resampling methods like bootstrapping and Monte Carlo simulation are also mentioned.
Statistics is the study of collecting, organizing, summarizing, and analyzing data. There are two main branches of statistics: descriptive statistics and inferential statistics. Descriptive statistics involves organizing and summarizing data, while inferential statistics involves using a sample of data to draw conclusions about a population. Statistics is used across many fields, especially in research involving biological or medical data, known as biostatistics. Statistics helps effectively present research findings and evaluate research proposals.
2. This exercise uses the dataset “WholeFoods.”
(a) Use Excel to construct a relative histogram for store size. Does the distribution of store size appear to be skewed? If so, does it appear to be skewed to the right or to the left? Explain.
(b) Use Excel to calculate the following four measures of central tendency for store size: mean, median, midrange, and 5% trimmed mean (using the trimmed mean definition from the textbook). Do any of these measures of central tendency appear to not be appropriate for this particular dataset? Explain.
(c) Use Excel to calculate the following four measures of dispersion for store size: variance, standard deviation, mean absolute deviation, and coefficient of variation. Please provide brief and “to-the-point” comments on your results.
(d) According to Chebyshev’s Theorem, at least what percentage of the observations within a sample is supposed to lie within 1.5 sample standard deviations of the sample mean? Next, using Excel, please take the observations for store size in the Whole Foods dataset and confirm that this prediction holds within the Whole Foods sample dataset.
(e) Use Excel to calculate the first quartile, the third quartile, the midhinge, and interquartile range for store size. Next, use Excel to create a box plot graph for store size. (Note: Excel does not have a built-in function for creating a box plot. Your group will need to “figure out” how to do it. For example, the internet has many examples of how to create a box plot in Excel using column/bar charts. You may do either a “horizontal” box plot (i.e., a box plot with the “whiskers” pointing to the right and to the left) or a “vertical” box plot (i.e., a box plot with the “whiskers” pointing to the top and to the bottom).)
(f) Use Excel to calculate both inner fences (left and right) for store size, and then both outer fences (left and right) for store size. Based on these calculated values, are there any “outlier” stores in the data? Any “extreme outlier” stores in the data? If so, which stores are they? (Note: In answering this question, please use the definition of “outlier” and “extreme outlier” provided on page 144 of the textbook; please do not use the definition of “outlier” provided on pages 135-137 of the textbook.)
(g) Use Excel to calculate skewness for the variable store size. Is store size skewed right or left? Does your answer corroborate the answer you provided in part 2(a) above?
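Part (d) relies on Chebyshev's theorem. For k > 1, it guarantees that at least 1 - 1/k² of any dataset lies within k standard deviations of the mean. With k = 1.5, that is at least 5/9, about 55.6%. The bound still holds when the sample standard deviation (n - 1 divisor) is used. The exercise asks for Excel, so the following is only a minimal Python cross-check. The function names and the store sizes below are hypothetical, not the WholeFoods data.

```python
import statistics

def chebyshev_min_fraction(k):
    """Lower bound from Chebyshev's theorem: at least 1 - 1/k**2 lie within k SDs."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed share of values within k sample SDs of the sample mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 divisor), like Excel's STDEV.S
    inside = sum(1 for v in values if abs(v - m) <= k * s)
    return inside / len(values)

# Hypothetical store sizes (thousands of sq ft); replace with the real column
sizes = [25, 32, 38, 41, 45, 50, 52, 57, 60, 75]
bound = chebyshev_min_fraction(1.5)
observed = fraction_within(sizes, 1.5)
print(f"Chebyshev bound: {bound:.1%}, observed: {observed:.1%}")
print(observed >= bound)
```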
Chapter 1 Exploring Life and Science
• reproduce, and experience growth and, in many cases, development;
• maintain homeostasis, keeping the conditions of their internal environment relatively constant;
• respond to stimuli; and
• have an evolutionary history and are adapted to a way of life.
1.2 Humans Are Related to Other Animals
The classification of living organisms mirrors their evolutionary relationships. Humans are mammals, a type of vertebrate in the animal kingdom of the domain Eukarya. Humans differ from other mammals, including apes, ...
Databeers Dub #1 - Cathal Walsh - Statistics in the Big Data World (Databeers Dublin)
Cathal Walsh holds a chair in statistics and researches health decision science, focusing on model calibration, evidence synthesis, and identifying decision makers' utilities. His work considers issues with big data, such as whether conclusions can be drawn from all available data rather than just a sample, and the human decisions and biases that can affect modeling and data analysis. He questions assumptions around data collection and modeling in order to develop a more rigorous statistical framework.
Statistics is a powerful tool for both researchers and decision makers, yet misuses, misinterpretations, and misrepresentations of statistics remain common. This seminar aims to raise awareness of common misconceptions about statistics in social science and beyond (e.g., in the media and among readers). I do not own the copyrights to the materials in this presentation; sources for borrowed figures are given at the bottom of each slide in which they appear.
The document summarizes statistical analysis concepts and methods used to analyze biological data, including calculating means, standard deviations, and using t-tests to determine the significance of differences between data sets. It provides an example comparing bill length measurements in two hummingbird species. The mean bill length is slightly higher in C. latirostris, but A. colubris shows greater variability. A t-test is needed to determine if the difference in means is statistically significant given the overlap between the error bars representing standard deviation.
Researchers collected data on bill length from two hummingbird species: the ruby-throated hummingbird and the broad-billed hummingbird. They calculated the mean and standard deviation for each species to test whether there was a significant difference in bill length between the two. The mean bill length was 15.9 mm for the ruby-throated hummingbird and 18.8 mm for the broad-billed hummingbird. The standard deviation was higher for the ruby-throated hummingbird (1.91 mm) than for the broad-billed hummingbird (1.03 mm), indicating greater variability in bill length in the former species. Error bars were added to a graph to represent the standard deviation, and thus the variability, of the two datasets visually.
This document discusses various statistical concepts including outliers, transforming data, normalizing data, weighting data, robustness, and homoscedasticity and heteroscedasticity. Outliers are values far from other data points and should be carefully examined before removing. Data can be transformed using logarithms, square roots, or other functions to better fit a normal distribution or equalize variances between groups. Normalizing data puts variables on comparable scales. Weighting data adjusts for under- or over-representation in samples. Robust tests are resistant to violations of assumptions. Homoscedasticity refers to equal variances between groups while heteroscedasticity refers to unequal variances.
Homework #1
SOCY 3115
Spring 20
Read the Syllabus and FAQ on how to do your homework before beginning the assignment!
To get consideration for full credit, you must:
· Follow directions;
· Show all work required to arrive at answer (statistical calculations often require multiple steps, so you need to write these down, not just skip to the final answer)
· Use appropriate statistical notation at all times (e.g. if you are calculating a population mean, begin with the equation for population mean)
· Use units in your answer, where appropriate (e.g. a mean time would be “6.5 hours” rather than just “6.5”)
Understanding the Structure of Data
1. For the following rectangular dataset:
Id   Highest degree    Works full-time   Annual income cat
1    Did not grad HS   Yes               Low
2    HS dip            Yes               Low
3    HS dip            No                Med
4    BA                No                Low
5    BA                Yes               Med
6    MA                Yes               High
7    HS dip            Yes               Med
a. What is the unit-of-analysis of the dataset?
b. How many variables are in the dataset?
c. How many observations/cases are in the dataset?
d. For each variable that is not named “id”:
i. What is the variable name?
ii. What is the level-of-measurement?
iii. What are the values for the variable?
iv. If you had to make a guess, what do you think the “question” was that was asked of the unit-of-analysis to get these data? (for example, if we had a continuous variable called “num_pets” the question might be “How many pets live in your household?”)
2. For the following rectangular dataset:
Id   num_bdrms   num_bthrms   sqft   Ranch
1    4           3            3200   Yes
2    2           1.5          2800   Yes
3    2           1            1200   Yes
4    3           2            1500   No
5    2           2            1100   No
a. What is the unit-of-analysis of the dataset?
b. How many variables are in the dataset?
c. How many observations/cases are in the dataset?
d. For each variable that is not named “id”:
i. What is the variable name?
ii. What is the level-of-measurement? Before answering, be sure to consult the slide called “Level of measurement – language to use”. Use the formal language!
iii. What are the values for the variable?
iv. If you had to make a guess, what do you think the “question” was that was asked of the unit-of-analysis to get these data? (for example, if we had a continuous variable called “num_pets” the question might be “How many pets live in your household?”)
3. For each of the following questions (1) construct a dataset with one variable and three observations (2) add data that could have theoretically been collected (just make up the actual responses to the question); and (3) indicate the level-of-measurement of the variable. I’ve done two examples for you.
Example #1:
What is your current age? (individual is the unit-of-analysis)
id  age
1 25
2 32
3 61
The age variable is continuous/interval ratio.
Example #2:
What is the size of this hospital based on number of beds? (hospital is the unit-of-analysis) Answers can be small (1-100 beds), medium (101-500 beds), large (501-1000 beds), extra large (1001+ beds)
id  hosp_size
1 med
2 med
3 ext ...
Epidemiological Analysis Workshop By Dr Suzanne Campbell COUNTDOWN on NTDs
The document provides an overview of a training workshop on epidemiological analysis. It discusses key concepts in epidemiology including study design, data collection, and analytical approaches. It emphasizes the importance of designing an analysis plan from the beginning to determine the appropriate statistical techniques for answering research questions based on the study objectives, design, and variables. The document uses an example of a cross-sectional study of schistosomiasis in Cameroon to demonstrate how to develop an analysis plan, including exploring variables, examining relationships, and considering regression modeling to adjust for confounding factors.
Chapter One (Salkind)
Statistics or Sadistics?
An Overview of This Chapter
In this chapter we cover the following items …
Part One: Why Statistics?
Part Two: A 5-Minute History of Statistics
Part Three: Statistics: What It Is (And Isn’t)
Part Four: What Am I Doing In A Statistics Class?
Part Five: An Eye Toward The Future
Part One
Why Statistics?
There may be dozens of reasons why statistics makes you groan:
I don’t like math.
I don’t understand math.
Statistics are tough.
Why do I even need this statistics stuff?
Why can’t I just let a computer do statistics for me?
I don’t plan on doing research when I graduate, so learning statistics is a waste of time.
The minute I leave class, I won’t remember any of this stuff.
Those odd Greek symbols ( µ Σ σ ) make no sense!
And a few other biggies …
What am I supposed to do with a formula as complex as this?
Or this?
Why Statistics?
Although you might have some trepidation about statistics and their applications, this course is designed to help you see past your fears, skepticism, and anxiousness about statistics.
I know statistics may scare a lot of you, but I think that is mostly because you don’t have the foundation yet to understand when, how, and why you should use statistics.
But don’t let fear of the unknown prevent you from getting that knowledge. In a month, you might actually think methods is fun!
Part Two
A 5-Minute History of Statistics
Imagine a caveman looking for food. He comes across a huge herd of bison, and he knows he can kill them all now OR come back time and time again. What does he do? What information does he use to make his decision about how many to kill and eat?
If you said counting, you’re absolutely correct! Why deplete the food source if you only need a few bison at a time to live? Why not take only what you need and come back for more later?
As our earliest ancestors came to know, counting is not only a good idea, it is a useful skill.
A 5-Minute History of Statistics
We have come a long way from counting bison, as have our skills in trying to predict today what we hope to find tomorrow.
A great deal of statistics compares our expectations with reality. If I expect XYZ to occur, will it? If it does occur, how confident can I be that I understand WHY it occurred?
You’ll learn about a wide variety of statistics in this course, with some stats based on correlations (a relationship among variables) and some based on causation (A leads to B).
A 5-Minute History of Statistics
Correlational Research
In correlational research, there is a relationship between two variables. That is, variables A and B are linked somehow.
When variable A increases, B also increases.
The more I like the color green, the mo.
1) Statistics are used to describe patterns in nature that are difficult to see with the naked eye, including in sports, economics, medicine, and education.
2) While statistics from verifiable data sources are difficult to lie with, they need to be supplemented with personal information; as the remark often attributed to Joseph Stalin puts it, "one death is a tragedy, a million deaths are a statistic."
3) Common types of statistics include measures of central tendency, measures of spread, and probability, but summary statistics can never tell the whole story on their own.
The document provides an overview of quantitative data analysis and statistics. It discusses different types of data, ways to visualize data through various plots and charts, key statistical concepts like the mean, median, mode, variance and standard deviation. It also covers important contributors to the field like John Tukey who introduced the box plot, and Karl Pearson who coined the term "standard deviation". Sample questions are included about calculating statistics from data sets.
Similar to Explore, Analyze and Present your data (20)
7. Statistics are scary
cool
We have to deal with them anyway, so we had better enjoy them!
Statistics
(You at the end of the talk)
8. Press the t-test button and you’ll be done!
Did you check the normality of your data first?
9. Why should you care about statistics?
http://www.nature.com/nature/authors/gta/2e_Statistical_checklist.pdf
10. Why should you care about statistics?
Advances in Physiology Education
“Explorations in Statistics” series (2008-present)
(Douglas Curran-Everett)
11. Why should you care about statistics?
“Statistical Perspectives” series (2011-present)
(Gordon Drummond)
The Journal of Physiology
Experimental Physiology
The British Journal of Pharmacology
Microcirculation
The British Journal of Nutrition
http://jp.physoc.org/cgi/collection/stats_reporting
12. Why should you care about statistics?
Importance of being uncertain – September 2013
How samples are used to estimate population statistics and what this means in terms of uncertainty.
Error Bars – October 2013
The use of error bars to represent uncertainty and advice on how to interpret them.
Significance, P values and t-tests – November 2013
Introduction to the concept of statistical significance and the one-sample t-test.
http://blogs.nature.com/methagora/2013/08/giving_statistics_the_attention_it_deserves.html
13. Why should you care about statistics?
“Journals […] fail to exert sufficient scrutiny over the results that they publish”
“Nature research journals will introduce editorial measures to address the problem by improving the consistency and quality of reporting in life-sciences articles”
“We will examine statistics more closely and encourage authors to be transparent, for example by including their raw data”
15. A picture is worth a thousand words
John Snow
(1813-1858)
Location of deaths in the 1854 London Cholera Epidemic
16. Why visualize your data?
Anscombe’s quartet example
Dataset #1        Dataset #2        Dataset #3        Dataset #4
x     y           x     y           x     y           x     y
10    8.04        10    9.14        10    7.46        8     6.58
8     6.95        8     8.14        8     6.77        8     5.76
13    7.58        13    8.74        13    12.74       8     7.71
9     8.81        9     8.77        9     7.11        8     8.84
11    8.33        11    9.26        11    7.81        8     8.47
14    9.96        14    8.1         14    8.84        8     7.04
6     7.24        6     6.13        6     6.08        8     5.25
4     4.26        4     3.1         4     5.39        19    12.5
12    10.84       12    9.13        12    8.15        8     5.56
7     4.82        7     7.26        7     6.42        8     7.91
5     5.68        5     4.74        5     5.73        8     6.89
Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
17. Why visualize your data?
Anscombe’s quartet example
Property (same in each dataset)   Value
Mean of x                         9 (exact)
Variance of x                     11 (exact)
Mean of y                         7.5
Variance of y                     4.122 or 4.127
Correlation of x and y            0.816
Linear regression line            y = 3.00 + 0.500x
Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
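The summary statistics in this table can be recomputed directly from the four datasets. The following is a minimal Python sketch using only the standard library, and the helper name `summarize` is illustrative. All four datasets give essentially the same means, variances, correlation and least-squares line, which is why they have to be plotted to be told apart.

```python
import statistics

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "#1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "#2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "#3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "#4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Means, sample variances, Pearson r and least-squares line y = a + b*x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": statistics.variance(x),
        "mean_y": my,
        "var_y": statistics.variance(y),
        "r": sxy / (sxx * syy) ** 0.5,
        "intercept": my - slope * mx,
        "slope": slope,
    }

for name, (x, y) in quartet.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```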
18. Why visualize your data?
Anscombe’s quartet example
[Figure: scatter plots of Datasets #1, #2, #3 and #4]
Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
19. Why visualize your data?
Anscombe’s quartet example
[Figure: scatter plots of Datasets #1, #2, #3 and #4]
Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
20. Visualize your data in their raw form!
Aim for revelation rather than mere summary
A great graphic with raw data reveals unexpected patterns and invites us to make comparisons we might not have thought of beforehand.
21. If you are still not convinced …
Mean: 16 / Stdv: 5
22. If you are still not convinced …
Mean: 16 / Stdv: 5
23. If you are still not convinced …
Mean: 16 / Stdv: 5
[Figure, panel e, from Daniel’s Journal Club paper: WBM secondary transplantation (16 weeks); y-axis: donor engraftment (%), 0 to 80; groups: flDMR/+, DMR/+ and mH19; P < 0.05]
24. Avoid making bar graphs
“To maintain the highest level of trustworthiness of data, we are encouraging authors to display data in their raw form and not in a fashion that conceals their variance. Presenting data as columns with error bars (dynamite plunger plots) conceals data. We recommend that individual data be presented as dot plots shown next to the average for the group with appropriate error bars (Figure 1).”
Rockman H.A. (2012). "Great expectations". J Clin Invest 122 (4): 1133
25. Avoid making bar graphs
Error bars
Different types, different meanings
[Figure: bar graph with error bars; y-axis 0 to 100]
• descriptive statistics (Range, SD)
• inferential statistics (SE, CI)
Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11
26. Avoid making bar graphs
Error bars
Different types, different meanings
• descriptive statistics (Range, SD)
• inferential statistics (SE, CI)
Often, they also imply a
symmetrical distribution of the
data.
Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11
27. Avoid making bar graphs
Mean and standard deviation are only fully informative in the context of a “normal distribution”
[Figure: normal distribution centred on the mean µ, with the central 95% marked]
About 95% of a normal distribution lies within two standard deviations (σ) of the mean (µ)
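The 95% figure on this slide can be checked with Python's standard library in a minimal sketch. Strictly, about 95.4% of a normal distribution lies within 2 SD of the mean, and 95.0% lies within about 1.96 SD.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
within_2sd = z.cdf(2) - z.cdf(-2)
within_196sd = z.cdf(1.96) - z.cdf(-1.96)
print(round(within_2sd, 4))    # 0.9545
print(round(within_196sd, 4))  # 0.95
```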
28. Avoid making bar graphs
[Figure: symmetrical distribution vs. skewed distribution]
Data presentation to reveal the distribution of the data
• Display data in their raw form.
• A dot plot is a good start.
• “Dynamite plunger plots” conceal data.
• Check the pattern of distribution of the values.
29. Avoid making bar graphs
[Figure: symmetrical distribution vs. skewed distribution]
• First set: Gaussian (or normal) distribution (symmetrically distributed)
• Second set: right skewed, lognormal (few large values)
“This type of distribution of values is quite common in biology (ex: plasma concentrations of immune or inflammatory mediators)”
“Plunger plots only: who would know that the values were skewed – ...
... and that the common statistical tests would be inappropriate?”
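This minimal Python sketch shows why mean ± SD misleads for right-skewed data, using simulated (hypothetical) lognormal values. The mean sits above the median, and the lower end of the usual "mean ± 2 SD" range falls below zero even though every value is positive. A bar with SD error bars would hide this.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical right-skewed "concentrations": lognormal, so always positive
values = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]

mean = statistics.fmean(values)
sd = statistics.stdev(values)
median = statistics.median(values)
print(f"mean = {mean:.2f}, median = {median:.2f}, SD = {sd:.2f}")
# The lower end of the usual "mean ± 2 SD" range is negative here,
# although no single value can be below zero.
print(f"mean - 2 SD = {mean - 2 * sd:.2f}, smallest value = {min(values):.3f}")
```

Log-transforming such values before summarizing them is one common alternative.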
30. Avoid making bar graphs
Don't tell me no one warned you before!
Bar graph
Dynamite plunger
31. Summary
Why visualize your data?
For others ...
Providing a narrative for the reader
But primarily for you ...
Looking for patterns and relationships
Summarize complex data structures
Help avoid erroneous conclusions based upon questionable or
unexpected data
37. Is the mean always a good descriptor?
# of children per household in China (2012)
• mean: 1.35
http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
38. Is the mean always a good descriptor?
# of children per household in China (2012)
• mean: 1.35
• median: 1
more representative of the
“typical” family (One child policy)
http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
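The same effect can be shown with a small, hypothetical example. The counts below were invented so that their mean and median match the slide (1.35 and 1); they are not the 2012 survey data. A minority of larger households pulls the mean above the value most families actually report.

```python
import statistics

# Hypothetical survey of 100 households (NOT the China 2012 data):
# 5 with no child, 60 with one, 30 with two, 5 with three
children = [0] * 5 + [1] * 60 + [2] * 30 + [3] * 5
print(statistics.mean(children))    # 1.35
print(statistics.median(children))  # 1.0
```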
39. Any measure is wrong!
“Whenever you make a measurement, you must
know the uncertainty otherwise it is meaningless”
Walter Lewin (MIT)
183.3 cm
185.7 cm
http://www.youtube.com/watch?v=JUxHebuXviM
40. Any measure is wrong!
“Whenever you make a measurement, you must
know the uncertainty otherwise it is meaningless”
Walter Lewin (MIT)
The same concept applies when you
report your data!
Provide the uncertainty of your descriptor
hint: this is NOT the standard deviation
41. Any measure is wrong!
“Whenever you make a measurement, you must
know the uncertainty otherwise it is meaningless”
Walter Lewin (MIT)
The same concept applies when you
report your data!
Provide the uncertainty of your descriptor
hint: this is NOT the standard deviation
Report the Confidence Interval of your descriptor
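The standard deviation describes the spread of the individual measurements. The uncertainty of the mean is its standard error (SD/√n), from which a confidence interval can be built. This minimal Python sketch uses hypothetical repeated measurements, a hypothetical helper `describe`, and a normal-approximation interval (mean ± 1.96 × SE). For small samples, a t quantile or the bootstrap is the safer choice.

```python
import math
import statistics

def describe(values, z=1.96):
    """Return mean, SD, SE and a normal-approximation 95% CI for the mean."""
    n = len(values)
    m = statistics.mean(values)
    sd = statistics.stdev(values)   # spread of the individual values
    se = sd / math.sqrt(n)          # uncertainty of the mean itself
    return m, sd, se, (m - z * se, m + z * se)

# Hypothetical repeated height measurements of one person (cm)
heights = [183.3, 185.7, 184.1, 184.9, 183.8, 185.2]
m, sd, se, ci = describe(heights)
print(f"mean {m:.2f} cm, SD {sd:.2f} cm, SE {se:.2f} cm, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}] cm")
```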
42. The Bootstrap: origin
Modern electronic computation has encouraged a host of new statistical methods
that require fewer distributional assumptions than their predecessors and
can be applied to more complicated statistical estimators. These methods allow
[...] to explore and describe data and draw valid statistical inferences without the
usual concerns for mathematical tractability.
Efron B. and Tibshirani R. (1991), Science, Jul 26;253(5018):390-5
43. Computing the bootstrap 95% CI
[Figure: original sample A0 = {a1, a2, a3, a4, a5, …, an}, with mean m0]
Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406
44-46. Computing the bootstrap 95% CI
[Figure: bootstrap samples A1, A2, A3, A4, … are drawn from A0 with replacement (a value can appear more than once), and the mean of each is computed: mA1, mA2, mA3, mA4, …]
47. Computing the bootstrap 95% CI
[Figure: as above, ending with the estimate and its bootstrap 95% CI: 5.18 [4.91, 4.47]]
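The resampling scheme in these slides can be sketched in Python under a few stated assumptions. The helper name `bootstrap_ci`, the sample values and the seed are hypothetical. The percentile method used here is only one of several ways to turn the resampled means into an interval (bias-corrected variants also exist), and it is not necessarily the exact procedure of Calmettes et al.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.fmean, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for `stat` (the mean by default)."""
    rng = random.Random(seed)              # seeded for reproducibility
    n = len(sample)
    # Each bootstrap sample has the original size and is drawn WITH replacement
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    alpha = (1 - level) / 2
    lower = boot_stats[round(alpha * (n_boot - 1))]
    upper = boot_stats[round((1 - alpha) * (n_boot - 1))]
    return stat(sample), (lower, upper)

# Hypothetical measurements (not the data from the slide)
data = [4.6, 5.9, 4.8, 5.3, 6.1, 4.9, 5.5, 4.7, 5.6, 5.4]
estimate, (lo, hi) = bootstrap_ci(data)
print(f"{estimate:.2f} [{lo:.2f}, {hi:.2f}]")
```

The output uses the same "estimate [lower, upper]" layout as the slide. The lower bound must always be smaller than the upper bound.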
49. Choose your statistical test wisely
Authors Guidelines
Every paper that contains statistical testing should state
[...] a justification for the use of that test (including, for
example, a discussion of the normality of the data when the
test is appropriate only for normal data), [...], whether the
tests were one-tailed or two-tailed, and the actual P value
for each test (not merely "significant" or "P < 0.05").
http://www.nature.com/nature/authors/gta/#a5.6
50. The simple case (How to)
mean/std
135.9 ± 19.0
Female
mean/std
187.0 ± 19.8
Male
51. The simple case (How to)
Distribution of the data?
mean/std
135.9 ± 19.0
Female
mean/std
187.0 ± 19.8
Male
52. The simple case (How to)
Distribution of the data?
difference/ci
51.2 [50.4, 51.9]
mean/std
135.9 ± 19.0
Female
mean/std
187.0 ± 19.8
Male
53. The simple case (How to)
Distribution of the data?
difference/ci
51.2 [50.4, 51.9]
mean/std
135.9 ± 19.0
Female
mean/std
187.0 ± 19.8
Male
• fit of the histogram
55. The simple case (How to)
Distribution of the data?
difference/ci
51.2 [50.4, 51.9]
mean/std
135.9 ± 19.0
Female
mean/std
187.0 ± 19.8
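Male
• fit of the histogram
• QQ plot: the ith ordered data point $A_{(i)}$ is plotted against the theoretical quantile of the distribution, $\Phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$, where $\Phi$ is the standard normal cumulative distribution function and $n$ the sample size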
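The QQ-plot plotting positions on this slide, (i - 3/8)/(n + 1/4) (Blom's formula), are easy to compute. In this minimal Python sketch, `qq_points` is a hypothetical helper. It pairs each ordered observation with its theoretical standard-normal quantile using `statistics.NormalDist().inv_cdf`, and any charting tool can plot the pairs. Points close to a straight line suggest the data are compatible with a normal distribution.

```python
from statistics import NormalDist

def qq_points(values):
    """Pair the i-th ordered value A(i) with Phi^-1((i - 3/8) / (n + 1/4))."""
    ordered = sorted(values)
    n = len(ordered)
    phi_inv = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return [(phi_inv((i - 3 / 8) / (n + 1 / 4)), a)
            for i, a in enumerate(ordered, start=1)]

# Hypothetical sample
for theoretical, observed in qq_points([12.1, 9.8, 11.0, 10.4, 8.9]):
    print(f"{theoretical:+.3f}  {observed}")
```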
56. The simple case (How to)
Distribution of the data?
difference/ci
51.2 [50.4, 51.9]
mean/std
135.9 ± 19.0
Female
mean/std
187.0 ± 19.8
Male
• fit of the histogram
• QQ plot
not “normal”
57. The simple case (How to)
Distribution of the data?
difference/ci
51.2 [50.4, 51.9]
mean/std
135.9 ± 19.0
Female
mean/std
187.0 ± 19.8
Male
• fit of the histogram
• QQ plot
[Figure panels: Female, Male]
58. The simple case (How to)
Distribution of the data?
difference/ci
51.2 [50.4, 51.9]
mean/std
135.9 ± 19.0
Female
mean/std
187.0 ± 19.8
Male
Visual inspection:
• fit of the histogram
• QQ plot
[Figure panels: Female, Male]
59. The simple case (How to)
Distribution of the data?
Visual inspection:
• fit of the histogram
• QQ plot
Test:
• Shapiro-Wilk test
60. The simple case (How to)
Distribution of the data?
Visual inspection:
• fit of the histogram
• QQ plot
Test:
• Shapiro-Wilk (SW) test
Null hypothesis for the SW test: the data are normally distributed
Female: p-value = 0.9195
Male: p-value = 0.3866
61. The simple case (How to)
Distribution of the data? Normally distributed
(neither SW p-value is below 0.05, so normality is not rejected for either group)
64. The simple case (How to)
Distribution of the data? Normally distributed
Statistical test? t-test
65. The simple case (How to)
Distribution of the data? Normally distributed
Statistical test? t-test
Null hypothesis for the t-test: the two samples come from populations with the same mean
t-test: p-value < 2.2e-16
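A minimal Python sketch of the t statistic, assuming the unequal-variance (Welch) form, which the slides do not specify. The p-value here uses a normal approximation, which is only close to the exact t-test p-value when the degrees of freedom are large; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Null hypothesis: the two populations have the same mean
    (the variances are allowed to differ).
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def normal_approx_p(t):
    """Two-sided p-value from the standard normal distribution.

    Only an approximation of the exact t-test p-value; it is close when df is large.
    """
    return 2 * NormalDist().cdf(-abs(t))


t, df = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])  # hypothetical data
print(t, df, normal_approx_p(t))
```

In practice a library routine (for example SciPy's `scipy.stats.ttest_ind` with `equal_var=False`) gives the exact p-value from the t distribution.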
75. Computing the bootstrap p-value
Are the two samples different?
Observed difference = 0.44
If the two samples came from the same population,
what would be the probability of observing a difference
at least this large by chance alone?
82. Computing the bootstrap p-value
Original samples: A0 = {a1, a2, ..., an} and B0 = {b1, b2, ..., bn}
Observed difference: D0 = mA - mB (0.44)
Pool the values of A0 and B0, then draw new groups A1 and B1 from the pool (with replacement)
Pseudo-difference: D1 = mA1 - mB1
Repeat 10000 times: (D1 ... D10000)
How many pseudo-differences are greater than or equal to the observed difference D0 (0.44)?
85. Computing the bootstrap p-value
Result of the 10000 repetitions (observed D0 = 0.44):
• 9829 pseudo-differences < D0
• 171 pseudo-differences ≥ D0
p = 171 / 10000 = 0.0171 (one-tailed)
For comparison, Mann-Whitney (MW) test: p = 0.0169
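The resampling procedure on these slides can be sketched in Python as follows, assuming the data are two lists of numbers. The function name `bootstrap_p_value` and the samples are hypothetical; the seed only makes the run reproducible.

```python
import random


def bootstrap_p_value(a, b, n_boot=10_000, seed=0):
    """One-tailed bootstrap p-value for the observed difference D0 = mA - mB.

    Under the null hypothesis both samples come from the same population, so
    they are pooled; new groups A1 and B1 of the original sizes are drawn from
    the pool with replacement, and we count the pseudo-differences
    mA1 - mB1 that are greater than or equal to D0.
    """
    rng = random.Random(seed)  # seeded for reproducibility

    def avg(x):
        return sum(x) / len(x)

    d0 = avg(a) - avg(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_boot):
        a1 = rng.choices(pooled, k=len(a))
        b1 = rng.choices(pooled, k=len(b))
        if avg(a1) - avg(b1) >= d0:
            count += 1
    return count / n_boot


# Hypothetical samples, not the data behind the slides.
A = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0]
B = [4.2, 4.5, 4.0, 4.4, 4.1, 4.6, 4.3, 3.9, 4.4, 4.2]
print(bootstrap_p_value(A, B))
```

Drawing without replacement (shuffling the pooled values) would turn this into a permutation test instead. Some authors report (count + 1) / (n_boot + 1) so that p is never exactly zero; for a two-tailed p-value, count |mA1 - mB1| ≥ |D0|.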
86. Summary
What do my data look like?
Distribution?
• visual inspection (histogram / QQ plot)
• normality test
What do I want to compare?
Right statistical test?
• parametric test
• non-parametric test
• resampling statistics
90. Statistical significance (example)
“The percentage of neurons showing cue-related activity
increased with training in the mutant mice (P<0.05) but
not in the control mice (P>0.05).”
91. Statistical significance (example)
Tempting conclusion: training has a larger effect in the mutant
mice than in the control mice!
93. Statistical significance (example)
Extreme scenario:
- training-induced activity barely reaches significance in mutant mice (e.g., P = 0.049) and barely fails to reach significance in control mice (e.g., P = 0.051)
(figure: activity without (-) and with (+) training for control and mutant mice; * marks the significant change)
The statement does not test whether the training effect in mutant mice differs statistically from that in control mice.
94. Statistical significance (example)
When making a comparison between two
effects, always report the statistical
significance of their difference rather than
the difference between significance levels.
Nieuwenhuis S. et al. (2011), “Erroneous analyses of interactions in neuroscience: a problem of significance”,
Nat Neurosci 14(9):1105-1107
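The rule above can be illustrated with a minimal Python sketch, assuming two independent effect estimates that are approximately normal with known standard errors (a z-test on their difference). In a real experiment the equivalent is testing the group × training interaction, for example in a two-way ANOVA. The function names and numbers are hypothetical and mimic the "extreme scenario".

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value of a z statistic."""
    return 2 * NormalDist().cdf(-abs(z))


def compare_effects(effect1, se1, effect2, se2):
    """p-values for each effect and for the difference between them.

    Assumes the two estimates are independent and approximately normal
    with known standard errors (a z-test on the difference).
    """
    p1 = two_sided_p(effect1 / se1)
    p2 = two_sided_p(effect2 / se2)
    p_diff = two_sided_p((effect1 - effect2) / sqrt(se1 ** 2 + se2 ** 2))
    return p1, p2, p_diff


# Hypothetical training effects, in units of their standard errors.
p_mutant, p_control, p_difference = compare_effects(1.969, 1.0, 1.951, 1.0)
print(f"mutant p = {p_mutant:.3f}, control p = {p_control:.3f}, difference p = {p_difference:.2f}")
```

The two individual p-values fall on either side of 0.05, while the test of their difference is nowhere near significance.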
95. P-values do not convey information
Group 1: mean = 16, SD = 5
Group 2: mean = 20, SD = 5
Difference = 4, p-value = 0.1090
97. P-values do not convey information
Group 1: mean = 16, SD = 5
Group 2: mean = 20, SD = 5
Same difference (4), three different p-values: 0.1090, 0.0367, 0.0009
98. P-values do not convey information
Fact: most applied scientists use p-values as a measure both of evidence
and of the size of the effect
- The probability of hypotheses depends on much more than just the p-value.
- This topic has renewed importance with the advent of the massive multiple
testing often seen in genomics studies
(figure: “Manhattan plot”, -log10(P) from 0 to 8 plotted by chromosome, 1 to 20)
Ioannidis JP (2005) PLoS Med 2(8):e124
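Why massive multiple testing needs a correction can be sketched with a short Python simulation. It assumes every null hypothesis is true, in which case p-values are uniform on [0, 1]. Bonferroni and Benjamini-Hochberg are two standard corrections, named here as examples; the slides do not prescribe a particular one.

```python
import random


def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Finds the largest rank k with p_(k) <= k * alpha / m and rejects the
    k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# 10000 simulated tests in which every null hypothesis is true:
# the p-values are then uniformly distributed on [0, 1].
rng = random.Random(1)
null_p = [rng.random() for _ in range(10_000)]
print("uncorrected p < 0.05:", sum(p < 0.05 for p in null_p))  # about 500, all false positives
print("Bonferroni:", sum(bonferroni(null_p)))
print("Benjamini-Hochberg:", sum(benjamini_hochberg(null_p)))
```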
100. P-value is a function of the sample size
Measured effect size: difference = 0.018 mV
(figure: example Control and Atropine traces, scale bars 0.5 mV and 100 ms; amplitude (mV, 0 to 0.4) for control (n = 6777) and atropine (n = 5272))
Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
101. P-value is a function of the sample size
Measured effect size: difference = 0.018 mV
p = 10^-5 (control n = 6777, atropine n = 5272)
Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
102. P-value is a function of the sample size
(figure: P (t-test), on a log scale from 10^0 to 10^-4 with regions marked “not significant” and “significant”, and Hedges' g (-0.4 to 0.4) for the 0.018 mV difference, both plotted against sample size from 10^1 to 10^3)
Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
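A minimal Python sketch of the two quantities plotted above. Hedges' g is computed with the usual approximate small-sample correction J = 1 - 3/(4(n1 + n2) - 9). The slides give a raw difference (0.018 mV) but no standard deviation, so the standardized effect of 0.1 used below is hypothetical, and the p-value uses a normal approximation to the t-test.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def hedges_g(a, b):
    """Standardized mean difference with Hedges' small-sample correction."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2))
    correction = 1 - 3 / (4 * (na + nb) - 9)  # approximate correction factor J
    return correction * (mean(a) - mean(b)) / pooled_sd


def approx_p(effect, n_per_group):
    """Two-sided p for a fixed standardized effect with n observations per group.

    Uses t = effect * sqrt(n / 2) and a normal approximation (fine for large n).
    """
    t = effect * sqrt(n_per_group / 2)
    return 2 * NormalDist().cdf(-abs(t))


print(hedges_g([1, 2, 3, 4, 5], [0, 1, 2, 3, 4]))  # hypothetical data
# The same small effect (0.1 standard deviations) tested with growing samples:
for n in (10, 100, 1000, 10000):
    print(n, approx_p(0.1, n))
```

The effect size stays the same in every row; only the p-value changes with the sample size.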
107. Bootstrap effect size and 95% CIs
Does the 95% confidence interval of the observed effect size
include zero (no difference)?
Effect size (A vs B) = 0.44, 95% CI [0.042, 0.853]
The CI limits are the 250th and 9750th values of the 10000 sorted bootstrap effect sizes.
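A minimal Python sketch of a percentile bootstrap CI, assuming the effect size is the difference of means (as D0 = mA - mB on the bootstrap p-value slides) and that each group is resampled with replacement within itself. The function name and samples are hypothetical.

```python
import random


def bootstrap_ci(a, b, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for the effect size mA - mB.

    Each group is resampled with replacement from itself; the CI limits are
    the (1 - level)/2 and (1 + level)/2 percentiles of the sorted bootstrap
    differences, i.e. the 250th and 9750th values for n_boot = 10000.
    """
    rng = random.Random(seed)  # seeded for reproducibility

    def avg(x):
        return sum(x) / len(x)

    diffs = sorted(
        avg(rng.choices(a, k=len(a))) - avg(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = diffs[round(tail * n_boot) - 1]  # 250th value when n_boot = 10000
    upper = diffs[round((1 - tail) * n_boot) - 1]  # 9750th value when n_boot = 10000
    return avg(a) - avg(b), (lower, upper)


# Hypothetical samples, not the data behind the slides.
A = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0]
B = [4.2, 4.5, 4.0, 4.4, 4.1, 4.6, 4.3, 3.9, 4.4, 4.2]
effect, (low, high) = bootstrap_ci(A, B)
print(f"effect size = {effect:.2f} [{low:.3f}, {high:.3f}]; includes zero: {low <= 0 <= high}")
```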
109. Statistical vs Biological significance
“The P value reported by tests is a probabilistic significance, not a
biological one.”
“Statistical significance suggests but does not imply biological
significance.”
Krzywinski M and Altman N (2013) “Points of significance: Significance, P values and t-tests”.
Nature Methods 10, 1041–1042
110. Statistical vs Biological significance
Statistical significance has a meaning in a specific context
No change
Small change
Large change
Biological consequences?
111. Statistical vs Biological significance
Stomatogastric ganglion: AB, PD, LP and PY neurons
“Good enough” solutions
(figure: LP 1 and LP 2 compared by conductance at +15 mV (µS/nF) of the Kd, KCa and A-type currents, and by mRNA copy number of shab, BK-KCa and shal)
Schulz D.J. et al. (2006) "Variable channel expression in identified single and electrically coupled neurons
in different animals". Nat Neurosci 9: 356–362
112. Statistical vs Biological significance
Madhvani R.V. et al. (2011) "Shaping a new Ca2+ conductance to suppress early afterdepolarizations in
cardiac myocytes". J Physiol 589(Pt 24):6081-92
114. Statistical vs Biological significance
Breast cancer study
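Difference in cancer recurrence between the control and
low-fat diet groups.
Authors' conclusions:
People with low-fat diets had a 25% lower chance of cancer returning
Actual return rates:
- control: 12.4%
- low-fat diet: 9.8%
Absolute difference: 12.4% - 9.8% = 2.6 percentage points
Relative difference: 2.6 / 9.8 = 26.5% (with the control rate as reference instead: 2.6 / 12.4 ≈ 21%)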
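The arithmetic above can be checked with a small Python sketch using only the two return rates on the slide. The function name is hypothetical, and the slide does not say which relative measure the authors used for their 25% figure.

```python
def risk_differences(control_rate, treated_rate):
    """Absolute and relative differences between two event rates given in %."""
    absolute = control_rate - treated_rate  # in percentage points
    relative_to_control = absolute / control_rate  # relative risk reduction
    relative_to_treated = absolute / treated_rate  # excess risk of the control group
    return absolute, relative_to_control, relative_to_treated


absolute, rrr, excess = risk_differences(12.4, 9.8)
print(f"absolute difference: {absolute:.1f} percentage points")
print(f"relative to control: {rrr:.1%}")
print(f"relative to low-fat diet: {excess:.1%}")
```

A relative difference of 21 to 26% sounds large, but it corresponds to an absolute difference of only 2.6 percentage points.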
115. Beware of false positives
(from the authors)
Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic
Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
117. Beware of false positives
2012
Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic
Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
125. Know your audience
Who? who is my audience? what is their level of understanding? what do they already know?
Why? why am I presenting? what does my audience want to achieve?
What? what do I want my audience to know? which story will captivate the audience?
How? which medium will best support the message? which format/layout will appeal to the audience?
126. Color blindness is a common disease
Males: one in 12 (8%) / Females: one in 200 (0.5%)
127. Color blindness is a common disease
“Anyone who needs to be convinced that making scientific
images more accessible is a worthwhile task [...]: if your next
grant or manuscript submission contains color figures, what if
some of your reviewers are color blind? Will they be able to
appreciate your figures? Considering the competition for funding
and for publication, can you afford the possibility of frustrating
your audience? The solution is at hand.”
Clarke, M. (2007). "Making figures comprehensible for color-blind readers" Nature blog
(http://blogs.nature.com/nautilus/2007/02/post_4.html)
128. Making figures for color blind people
Wong, B. (2011). "Points of view: Color blindness". Nature Methods 8, 441
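The color-blind-friendly palette shown by Wong (2011) is widely reproduced as the Okabe-Ito palette. The Python sketch below lists its eight colors with the hex codes commonly quoted for it (treat them as an assumption to check against the article) plus a small helper to convert them to RGB for any plotting tool; the names `OKABE_ITO` and `hex_to_rgb` are hypothetical.

```python
# Color-blind-friendly palette (Okabe-Ito), as commonly reproduced from Wong (2011).
OKABE_ITO = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
}


def hex_to_rgb(code):
    """Convert '#RRGGBB' to an (r, g, b) tuple of integers in 0-255."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


for name, code in OKABE_ITO.items():
    print(f"{name:15s} {code}  {hex_to_rgb(code)}")
```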
147. Common mistakes in data reporting
Fig 1I
“We found that relative to WT mice, the luminal
microbiota of Il10−/− mice exhibited a ~100-fold
increase in E. coli (Fig. 1I)”
Arthur et al. (2012) Science 338(6103):120-3
152. Common mistakes in data reporting
(figure: two charts of Percent Return on Investment (0 to 40) by year, year1 to year4, for Group A and Group B)
153. Thank you!
“The important thing is not to stop questioning.
Curiosity has its own reason for existing.”
- Albert Einstein