This document discusses key concepts in statistics including:
- Descriptive statistics such as measures of central tendency (mean, median, mode), measures of dispersion (range, interquartile range, standard deviation, variance), and measures of shape.
- The difference between parameters and statistics, and how statistics are used to estimate population parameters.
- Types of data including primary data, secondary data, and how probability and non-probability samples are collected.
- Key aspects of statistical studies such as populations, samples, and how statistics can be used to make inferences about populations.
This document discusses measures of central tendency, including the mean, median, and mode. It provides definitions and formulas for calculating each measure for both grouped and ungrouped data. For the mean, it addresses how outliers can influence the value and introduces the trimmed mean. The median is described as the middle value of a data set and is not impacted by outliers. The mode is defined as the most frequent observation. Examples are given to demonstrate calculating each measure. Key differences between the measures are summarized.
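As a rough illustration of the outlier effect described above, the sketch below compares the plain mean, a trimmed mean, and the median. The helper `trimmed_mean`, its parameter `k` (how many values to drop from each end), and the sample data are assumptions made for this example, not taken from the summarized document.

```python
from statistics import mean, median

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must leave at least one value")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

data = [1, 2, 3, 4, 100]      # hypothetical data; 100 is the outlier
print(mean(data))             # plain mean, pulled up by the outlier
print(trimmed_mean(data, 1))  # mean of the middle values [2, 3, 4]
print(median(data))           # middle value, unaffected by the outlier
```

With this data the plain mean lands far above most of the values, while the trimmed mean and the median stay near the middle of the data.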
This document discusses analyzing and summarizing data. It defines key terms like data, variables, and different types of data including quantitative, qualitative, discrete, and continuous data. It also discusses different types of data analysis including descriptive, exploratory, inferential, predictive, causal, and mechanistic. Finally, it explains measures of central tendency including the mean, median, and mode. It provides examples and formulas for calculating each as well as their advantages and disadvantages.
This document provides a lesson on measures of central tendency and dispersion. It defines mean, median, mode, range, quartiles, interquartile range, and outliers. Examples are provided to demonstrate how to calculate and interpret these measures for data sets. The document also explains how to construct box-and-whisker plots and choose the best measure of central tendency depending on the presence of outliers. Students are then quizzed on applying these concepts to analyze sample data sets.
The document defines and provides examples of various statistical measures used to summarize data, including measures of central tendency (mean, median, mode), measures of variation (variance, standard deviation, coefficient of variation), and shape of data distribution. It explains how to calculate and interpret these measures and when each is most appropriate to use. Examples are provided to demonstrate calculating various measures for different datasets.
This document discusses various measures of central tendency and variability used in statistics. It describes the three main measures of central tendency as the mode, median, and mean. For measures of variability, it defines concepts like range, variance, and standard deviation. The range is described as the highest score minus the lowest score and provides a simple measure of variation. Variance is defined as the mean of the squared deviations from the mean and standard deviation is the square root of the variance, providing a measure of how data points cluster around the mean. Examples are provided to demonstrate calculating each of these statistical measures.
This document summarizes an R boot camp focusing on statistics. It includes an agenda that covers introducing the lab component, R basics, descriptive statistics in R, revisiting installation instructions, and measures of variability in R. Descriptive statistics are presented as ways to characterize data through measures of central tendency, shape, and variability. Examples are provided in R for calculating the mean, median, mode, range, percentiles, variance, standard deviation, and coefficient of variation. The central limit theorem and standardizing scores are also discussed. Real-world applications of R for clean and messy data are mentioned.
This document provides an overview and objectives for Chapter 3 of the textbook "Statistical Techniques in Business and Economics" by Lind. The chapter covers describing data through numerical measures of central tendency (mean, median, mode) and dispersion (range, variance, standard deviation). It includes examples of computing various measures like the weighted mean, median, mode, and interpreting their relationships. The document also lists learning activities for students such as reading the chapter, watching video lectures, completing practice problems in the book, and participating in an online discussion forum.
This document provides an overview of key topics in statistics for management. It covers statistical surveys, classification and presentation of data, measures used to summarize data, probabilities, theoretical distributions, sampling and sampling distributions, estimation, hypothesis testing for large and small samples, and chi-square, F-distribution, analysis of variance, correlation, regression, business forecasting, and time series analysis. The document serves as an introduction to important statistical concepts and methods relevant for management.
3 descritive statistics measure of central tendency variatio
Lama K Banna
This document provides an overview of descriptive statistics and properties of numerical data, including measures of central tendency (mean, median, mode), variation (range, variance, standard deviation), and shape (skewness, kurtosis). It explains how to calculate the mean, median, and mode. The mean is the average and is calculated by summing all values and dividing by the total number. The median is the middle value when data is arranged in order. The mode is the most frequent value. Extreme values affect the mean more than the median.
Answer the questions in one paragraph 4-5 sentences. · Why did t.docx
boyfieldhouse
Answer the questions in one paragraph 4-5 sentences.
· Why did the class collectively sign a blank check? Was this a wise decision; why or why not? The whole class took the decision without hesitation.
· What is something that I said individuals should always do; what is it; why wasn't it done this time? Which mitigation strategies were used; what other strategies could have been used/considered? Individuals should always participate in one group and take one decision.
SAMPLING MEAN:
DEFINITION:
The term sampling mean is a statistical term used to describe the properties of statistical distributions. In statistical terms, the sample mean from a group of observations is an estimate of the population mean. Given a sample of size $n$, consider $n$ independent random variables $X_1, X_2, \ldots, X_n$, each corresponding to one randomly selected observation. Each of these variables has the distribution of the population, with mean $\mu$ and standard deviation $\sigma$. The sample mean is defined to be $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
WHAT IT IS USED FOR:
It is also used to measure the central tendency of the numbers in a data set. It can also be described as a balance point between the high numbers and the low numbers.
HOW TO CALCULATE IT:
To calculate this, just add up all the numbers, then divide by how many numbers there are.
Example: what is the mean of 2, 7, and 9?
Add the numbers: 2 + 7 + 9 = 18
Divide by how many numbers (i.e., we added 3 numbers): 18 ÷ 3 = 6
So the Mean is 6
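The same calculation can be sketched in Python. The function name `mean` is chosen here for illustration; the numbers are the ones from the example above.

```python
def mean(values):
    """Add up all the numbers, then divide by how many numbers there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

numbers = [2, 7, 9]
total = sum(numbers)   # 2 + 7 + 9 = 18
result = mean(numbers) # 18 / 3
print(total, result)
```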
SAMPLE VARIANCE:
DEFINITION:
The sample variance, $s^2$, is used to measure how varied a sample is. A sample is a select number of items taken from a population. For example, if you are measuring American people’s weights, it wouldn’t be feasible (from either a time or a monetary standpoint) for you to measure the weights of every person in the population. The solution is to take a sample of the population, say 1000 people, and use that sample to estimate the actual weights of the whole population.
WHAT IT IS USED FOR:
The sample variance helps you to figure out the spread in the data you have collected or are going to analyze. In statistical terminology, it can be defined as the average of the squared differences from the mean.
HOW TO CALCULATE IT:
Given below are steps of how a sample variance is calculated:
· Determine the mean
· Then for each number: subtract the Mean and square the result
· Then work out the mean of those squared differences.
To work out the mean, add up all the values then divide by the number of data points.
First add up all the values from the previous step.
But how do we say "add them all up" in mathematics? We use the Greek letter Sigma: Σ
The handy Sigma Notation says to sum up as many terms as we want.
· Next we need to divide by the number of data points, which is simply done by multiplying by "1/N":
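Statistically it can be stated by the following:
· $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, where $\mu$ is the mean of the $N$ data points. When the data are a sample drawn from a larger population, the divisor $N - 1$ is normally used instead of $N$, which gives the sample variance $s^2$.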
· This value is the variance
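The steps above can be sketched as a short Python function. This is a minimal sketch, not the original author's code: the function name `variance` and the `sample` flag are assumptions for illustration, and the data reuses the numbers 2, 7, 9 from the mean example because the rose bush data is not available here. With `sample=False` it divides by N as in the steps; with `sample=True` it divides by N - 1, the usual convention for a sample.

```python
def variance(values, sample=False):
    """Mean of the squared differences from the mean.

    sample=False divides by N; sample=True divides by N - 1.
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data points")
    m = sum(values) / n                           # step 1: the mean
    squared_diffs = [(x - m) ** 2 for x in values]  # step 2: subtract, square
    divisor = n - 1 if sample else n
    return sum(squared_diffs) / divisor           # step 3: average them

numbers = [2, 7, 9]
print(variance(numbers))               # divides by N
print(variance(numbers, sample=True))  # divides by N - 1
```

The two results differ noticeably for small data sets and converge as N grows, which is why it matters to state which divisor was used.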
EXAMPLE:
Sam has 20 Rose Bushes.
The number of flowers on each b.
The document provides instructions for learning about measures of central tendency. It discusses finding the mean, median, and mode of ungrouped data. The mean is calculated by adding all values and dividing by the number of values. The median is the middle value when data is arranged in order. The mode is the most frequent value. Examples are provided to demonstrate calculating the mean, median, and mode of various data sets.
This document provides an outline and overview of Chapter 3: Descriptive Statistics from a statistics textbook. It discusses key concepts in descriptive statistics including measures of central tendency (mean, median, mode), measures of variability (range, standard deviation), measures of shape (skewness, kurtosis), and correlation. The chapter will cover calculating these statistics for both ungrouped and grouped data, and interpreting them to describe data distributions. It emphasizes that descriptive statistics are used to numerically summarize and characterize data sets.
Here are the steps to find the quartiles for this data set:
1. Order the data from lowest to highest: 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7
2. The number of observations is 14. To find the quartiles, we split the data into 4 equal parts.
3. n/4 = 14/4 = 3.5
4. Q1 is the median of the lower half of the data (the first 7 values), which is the 4th observation: 2
5. Q2 is the median of all the data, which is the average of the 7th and 8th observations: 3
6. Q3 is the median of the upper half
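The median-of-halves procedure in these steps can be sketched in Python, taking the 14 values exactly as listed in step 1. The function name `quartiles` is an assumption for illustration, and other quartile conventions (for example interpolation-based ones used by some libraries) can give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

data = [1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```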
1. Statistics is used to analyze data beyond what can be seen in maps and diagrams by using mathematical manipulation, which can reveal patterns that may otherwise go unnoticed.
2. It is important to justify any statistical techniques used and to only use techniques that are appropriate for the type of data.
3. Common methods for summarizing large data sets include calculating the mean, mode, and median. The mean is the average, the mode is the most frequent value, and the median is the middle value when the data is arranged from lowest to highest.
1. Statistics is used to analyze data beyond what can be seen in maps and diagrams by using mathematical manipulation, which can reveal patterns that may otherwise go unnoticed.
2. It is important to justify any statistical techniques used and to ensure the data is appropriate for the technique. Students should ask what the technique can prove and if the data is in the right format before performing calculations.
3. Common methods for summarizing a large data set are the mean, median, and mode. The mean is the average, the median is the middle value, and the mode is the most frequent value. These give a single value for the data but do not show the variation around that value.
This document provides an introduction to statistics. It discusses what statistics is, the two main branches of statistics (descriptive and inferential), and the different types of data. It then describes several key measures used in statistics, including measures of central tendency (mean, median, mode) and measures of dispersion (range, mean deviation, standard deviation). The mean is the average value, the median is the middle value, and the mode is the most frequent value. The range is the difference between highest and lowest values, the mean deviation is the average distance from the mean, and the standard deviation measures how spread out values are from the mean. Examples are provided to demonstrate how to calculate each measure.
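A brief sketch of the three dispersion measures named above, assuming a small made-up data set. The function names and the data are hypothetical, and the standard deviation here divides by the number of values (the population form).

```python
import math

def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

def mean_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def standard_deviation(values):
    """Square root of the mean squared distance from the mean."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(value_range(data))
print(mean_deviation(data))
print(standard_deviation(data))
```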
This document provides an overview of descriptive statistics concepts and methods. It discusses numerical summaries of data like measures of central tendency (mean, median, mode) and variability (standard deviation, variance, range). It explains how to calculate and interpret these measures. Examples are provided to demonstrate calculating measures for sample data and interpreting what they say about the data distribution. Frequency distributions and histograms are also introduced as ways to visually summarize and understand the characteristics of data.
Outlier Management, BASIC STATISTICS, Error, Accuracy, How to find Outliers, quartile, Data Management, Reporting and Evaluation, Communication & Corrective Action, Documentation
This document provides an overview of basic statistics concepts including descriptive statistics, measures of central tendency, variability, sampling, and distributions. It defines key terms like mean, median, mode, range, standard deviation, variance, and quantiles. Examples are provided to demonstrate how to calculate and interpret these common statistical measures.
This document discusses analytical representation of data through descriptive statistics. It begins by showing raw, unorganized data on movie genre ratings. It then demonstrates organizing this data into a frequency distribution table and bar graph to better analyze and describe the data. It also calculates averages for each movie genre. The document then discusses additional descriptive statistics measures like the mean, median, mode, and percentiles to further analyze data through measures of central tendency and dispersion.
This document provides an overview of descriptive statistics and numerical summary measures. It discusses measures of central tendency including the mean, median, and mode. It also covers measures of relative standing such as percentiles and quartiles. Additionally, the document outlines measures of dispersion like variance, standard deviation, coefficient of variation, range, and interquartile range. Graphs and charts are presented as ways to describe data using these numerical summary measures.
This document provides a summary of key concepts from Chapter 3 of a statistics textbook, including:
- How to calculate measures of central tendency like the mean, median, mode, and weighted mean
- The characteristics and properties of each measure
- How the positions of the mean, median and mode relate to the shape of the distribution
- How to calculate the mean, median and mode for grouped data
- What the geometric mean represents and how it is calculated
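For the weighted mean and geometric mean mentioned in this list, a minimal Python sketch is given below. The function names and the sample numbers are assumptions for illustration and do not come from the summarized chapter.

```python
import math

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))

# Hypothetical example: two prices weighted by units sold.
print(weighted_mean([10, 20], [1, 3]))
# Hypothetical example: two growth factors.
print(geometric_mean([2, 8]))
```

The geometric mean is often preferred for averaging rates of change, because it multiplies the values rather than adding them.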
Statistics is the study of collecting, analyzing, and presenting quantitative data. It involves planning data collection through surveys and experiments, as well as analyzing the data using measures of central tendency like the mean, median, and mode. The mean is the average value found by summing all values and dividing by the total number of values. The median is the middle value when data is arranged in order. The mode is the most frequent value. Statistics has limitations as it does not study qualitative data or individuals, and statistical laws may not be universally applicable. Frequency distributions organize data values and their frequencies to understand patterns in the data.
This document provides an introduction to statistics and probability. It discusses key concepts such as data, levels of measurement, population and sampling, measures of central tendency and dispersion, and outliers. Some key points covered include:
- Statistics helps draw inferences about populations based on random samples.
- Data can be continuous or categorical and is important for understanding relationships and making predictions.
- There are different levels of measurement for data: nominal, ordinal, interval, and ratio.
- A population is the whole group, while a sample is a subset used to make inferences.
- Common measures of central tendency are the mean, median, and mode, while measures of dispersion include range, variance, and standard deviation.
- Quart
Descriptions of data statistics for research
Harve Abella
This document defines and describes various measures of central tendency and variation that are used to summarize and describe sets of data. It discusses the mean, median, mode, midrange, percentiles, quartiles, range, variance, standard deviation, interquartile range, coefficient of variation, measures of skewness and kurtosis. Examples are provided to demonstrate how to compute and interpret these statistical measures.
This document provides an overview of key concepts in descriptive statistics including measures of central tendency (mode, median, mean), measures of dispersion (range, variance, standard deviation), the normal distribution, z-scores, hypothesis testing, and the t-distribution. It defines each concept and provides examples of calculating and interpreting common statistics.
Basic Statistical Descriptions of Data.pptx
Anusuya123
This document provides an overview of 7 basic statistical concepts for data science: 1) descriptive statistics such as mean, mode, median, and standard deviation, 2) measures of variability like variance and range, 3) correlation, 4) probability distributions, 5) regression, 6) normal distribution, and 7) types of bias. Descriptive statistics are used to summarize data, variability measures dispersion, correlation measures relationships between variables, and probability distributions specify likelihoods of events. Regression models relationships, normal distribution is often assumed, and biases can influence analyses.
This document provides an overview of key concepts in descriptive statistics including measures of central tendency (mode, median, mean), measures of dispersion (range, variance, standard deviation), the normal distribution, z-scores, hypothesis testing, and the t-distribution. It defines each concept and provides examples of calculating and interpreting common statistics.
Basic Statistical Descriptions of Data.pptxAnusuya123
This document provides an overview of 7 basic statistical concepts for data science: 1) descriptive statistics such as mean, mode, median, and standard deviation, 2) measures of variability like variance and range, 3) correlation, 4) probability distributions, 5) regression, 6) normal distribution, and 7) types of bias. Descriptive statistics are used to summarize data, variability measures dispersion, correlation measures relationships between variables, and probability distributions specify likelihoods of events. Regression models relationships, normal distribution is often assumed, and biases can influence analyses.
Lecture 4 Probability Distributions.pptxABCraftsman
This document discusses probability distributions and expected value. It defines discrete and continuous random variables and their corresponding probability distributions. Discrete distributions assign probabilities to distinct outcomes while continuous distributions use probability densities over intervals. Expected value is the average value of a distribution, calculated by taking a weighted average of all possible outcomes. Several examples are provided to illustrate expected value calculations for games of chance.
1. Part I
Each slide has its own narration in an audio file.
For the explanation of any slide click on the audio icon to start it.
Professor Friedman's Statistics Course by H & L Friedman is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
2. In this lecture we discuss using descriptive
statistics, as opposed to inferential statistics.
Here we are interested only in summarizing
the data in front of us, without assuming that
it represents anything more.
We will look at both numerical and categorical
data.
We will look at both quantitative and
graphical techniques.
The basic overall idea is to turn data into
information.
Descriptive Statistics I 2
3. Numerical data may be summarized according to several
characteristics. We will work with each in turn and then all
of them together.
◦ Measures of Location
Measures of central tendency: Mean; Median; Mode
Measures of noncentral tendency - Quantiles
Quartiles; Quintiles; Percentiles
◦ Measures of Dispersion
Range
Interquartile range
Variance
Standard Deviation
Coefficient of Variation
◦ Measures of Shape
◦ Skewness
5-number summary; Box-and-whisker; Stem-and-leaf
Standardizing Data
4. Measures of location place the data set on the scale
of real numbers.
Measures of central tendency (i.e., central location)
help find the approximate center of the dataset.
These include the mean, the median, and the
mode.
5. The sample mean is the sum of all the observations
(∑Xi) divided by the number of observations (n):
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
where ∑Xi = X1 + X2 + X3 + X4 + … + Xn
Example. 1, 2, 2, 4, 5, 10. Calculate the mean.
Note: n = 6 (six observations)
∑Xi = 1 + 2 + 2 + 4 + 5 + 10 = 24
$\bar{X}$ = 24 / 6 = 4.0
6. Example.
For the data: 1, 1, 1, 1, 51. Calculate the mean.
Note: n = 5 (five observations)
∑Xi = 1 + 1 + 1 + 1 + 51 = 55
$\bar{X}$ = 55 / 5 = 11.0
Here we see that the mean is affected by extreme
values.
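The two mean calculations above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on the slides; the variable names are illustrative and not part of the lecture.

```python
# Data from the two examples above.
typical = [1, 2, 2, 4, 5, 10]
with_extreme_value = [1, 1, 1, 1, 51]

# Sample mean = sum of the observations divided by n.
mean_typical = sum(typical) / len(typical)                        # 24 / 6
mean_extreme = sum(with_extreme_value) / len(with_extreme_value)  # 55 / 5

print(mean_typical)  # 4.0
print(mean_extreme)  # 11.0
```

In the second data set four of the five observations equal 1, yet the single value 51 pulls the mean up to 11.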
7. The median is the middle value of the ordered data
To get the median, we must first rearrange the
data into an ordered array (in ascending or
descending order). Generally, we order the data
from the lowest value to the highest value.
Therefore, the median is the data value such that
half of the observations are larger and half are
smaller. It is also the 50th percentile (we will be
learning about percentiles in a bit).
If n is odd, the median is the middle observation of
the ordered array. If n is even, it is midway between
the two central observations.
8. Example: 0 2 3 5 20 99 100
Note: Data has been ordered from lowest to highest. Since n is
odd (n=7), the median is the (n+1)/2 ordered observation, or the
4th observation.
Answer: The median is 5.
The mean and the median are unique for a given set of
data. There will be exactly one mean and one median.
Unlike the mean, the median is not affected by extreme
values.
Q: What happens to the median if we change the 100 to 5,000?
Not a thing, the median will still be 5. Five is still the middle
value of the data set.
9. Example: 10 20 30 40 50 60
Note: Data has been ordered from lowest to highest. Since n
is even (n=6), the median is the (n+1)/2 ordered observation,
or the 3.5th observation, i.e., the average of observation 3 and
observation 4.
Answer: The median is 35.
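As a quick check of the odd-n and even-n median examples, here is a minimal sketch using Python's standard `statistics.median`. It also repeats the experiment of changing 100 to 5,000; the variable names are illustrative.

```python
from statistics import median

odd_data = [0, 2, 3, 5, 20, 99, 100]   # n = 7, middle value is the 4th
even_data = [10, 20, 30, 40, 50, 60]   # n = 6, average of the 3rd and 4th

median_odd = median(odd_data)
median_even = median(even_data)

# Replace the largest value (100) with 5000 and recompute.
changed = [5000 if x == 100 else x for x in odd_data]
median_changed = median(changed)

print(median_odd, median_even, median_changed)  # 5 35.0 5
```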
10. The median has 3 interesting characteristics:
◦ 1. The median is not affected by extreme values,
only by the number of observations.
◦ 2. Any observation selected at random is just as
likely to be greater than the median as less than
the median.
◦ 3. Summation of the absolute value of the
differences about the median is a minimum:
$\sum_{i=1}^{n} |X_i - \text{Median}| = \text{minimum}$
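The third characteristic can be checked numerically. The sketch below reuses the seven-value data set from the earlier median example and searches the integers from 0 to 100 for the center with the smallest sum of absolute differences. The helper `sum_abs_dev` and the integer search grid are assumptions made for illustration.

```python
from statistics import median

def sum_abs_dev(data, center):
    """Sum of absolute differences between each observation and `center`."""
    return sum(abs(x - center) for x in data)

data = [0, 2, 3, 5, 20, 99, 100]
med = median(data)                     # 5
mean = sum(data) / len(data)           # about 32.7

at_median = sum_abs_dev(data, med)
at_mean = sum_abs_dev(data, mean)

# Try every integer center from 0 to 100 and keep the best one.
best_center = min(range(0, 101), key=lambda c: sum_abs_dev(data, c))

print(at_median)     # 214
print(best_center)   # 5
```

The sum about the mean is larger than the sum about the median, and the best integer center is the median itself.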
11. The mode is the value of the data that
occurs with the greatest frequency.
Example. 1, 1, 1, 2, 3, 4, 5
Answer. The mode is 1 since it occurs three times. The
other values each appear only once in the data set.
Example. 5, 5, 5, 6, 8, 10, 10, 10.
Answer. The mode is: 5, 10.
There are two modes. This is a bi-modal dataset.
12. The mode is different from the mean and the
median in that those measures always exist and are
always unique. For any numeric data set there will
be one mean and one median.
The mode may not exist.
◦ Data: 1, 2, 3, 4, 5, 6, 7, 8, 9, 0
◦ Here you have 10 observations and they are all
different.
The mode may not be unique.
◦ Data: 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7
◦ Mode = 1, 2, 3, 4, 5, and 6. There are six modes.
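The mode cases above (one mode, several modes, no mode) can be sketched with a small helper. The function name `modes` is hypothetical. It follows the convention used on these slides that a data set in which every value is different has no mode, which is why it returns an empty list in that case.

```python
from collections import Counter

def modes(data):
    """Return the most frequent values; an empty list if no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # slide convention: all values distinct, so no mode
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([1, 1, 1, 2, 3, 4, 5]))                         # [1]
print(modes([5, 5, 5, 6, 8, 10, 10, 10]))                   # [5, 10]
print(modes([1, 2, 3, 4, 5, 6, 7, 8, 9, 0]))                # []
print(modes([0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]))    # [1, 2, 3, 4, 5, 6]
```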
13. Quantiles are measures of non-central location used to
summarize a set of data.
Examples of commonly used quantiles:
◦ Quartiles
◦ Quintiles
◦ Deciles
◦ Percentiles
14. Quartiles split a set of ordered data into four parts.
◦ Imagine cutting a chocolate bar into four equal pieces… How
many cuts would you make? (yes, 3!)
Q1 is the First Quartile
◦ 25% of the observations are smaller than Q1 and 75% of the
observations are larger
Q2 is the Second Quartile
◦ 50% of the observations are smaller than Q2 and 50% of the
observations are larger. Same as the Median. It is also the 50th
percentile.
Q3 is the Third Quartile
◦ 75% of the observations are smaller than Q3 and 25% of the
observations are larger
Some books use a formula to determine the quartiles. We prefer a
quick-and-dirty approximation method outlined in the next slide.
15. A quartile, like the median, either takes the value of one of the observations,
or the value halfway between two observations.
The simple method we like to use is just to first split the data set into two
equal parts to get the median (Q2) and then get the median of each resulting
subset.
The method we are using is an approximation. If you solve this in MS Excel, which
relies on a formula, you may get an answer that is slightly different.
16. Computer Sales (n = 12 salespeople)
Original Data: 3, 10, 2, 5, 9, 8, 7, 12, 10, 0, 4, 6
Compute the mean, median, mode, quartiles.
First order the data:
0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 12
∑Xi = 76
$\bar{X}$ = 76 / 12 = 6.33 computers sold
Median = 6.5 computers
Mode = 10 computers
Q1 = 3.5 computers, Q3 = 9.5 computers
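The quick approximation for quartiles (split the ordered data in half, then take the median of each half) might be sketched as follows. The function name `quartiles_by_halves` is hypothetical. Leaving the middle observation out of both halves when n is odd is an assumption inferred from the worked examples in this lecture, not a rule stated on the slides.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-each-half approximation."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]        # for odd n the middle observation is left out
    upper = ordered[n - half:]
    return median(lower), median(ordered), median(upper)

computer_sales = [3, 10, 2, 5, 9, 8, 7, 12, 10, 0, 4, 6]
q1, q2, q3 = quartiles_by_halves(computer_sales)
print(q1, q2, q3)  # 3.5 6.5 9.5
```

Software that interpolates between observations can return slightly different quartiles for the same data.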
17. Similar to what we just learned about quartiles,
where 3 quartiles split the data into 4 equal parts,
◦ There are 9 deciles dividing the distribution into 10
equal portions (tenths).
◦ There are four quintiles dividing the population into 5
equal portions.
◦ … and 99 percentiles (next slide)
In all these cases, the convention is the same. The
point, be it a quartile, decile, or percentile, takes
the value of one of the observations or it has a
value halfway between two adjacent observations.
It is never necessary to split the difference between
two observations more finely.
18. We use 99 percentiles to divide a data set into
100 equal portions.
Percentiles are used in analyzing the results of
standardized exams. For instance, a score of
40 on a standardized test might seem like a
terrible grade, but if it is the 99th percentile,
don’t worry about telling your parents. ☺
Which percentile is Q1? Q2 (the median)? Q3?
We will always use computer software to obtain
the percentiles.
20. Data – number of absences (n=13) :
0, 5, 3, 2, 1, 2, 4, 3, 1, 0, 0, 6, 12
Compute the mean, median, mode, quartiles.
Answer. First order the data:
0, 0, 0,┋ 1, 1, 2, 2, 3, 3, 4,┋ 5, 6, 12
Mean = 39/13 = 3.0 absences
Median = 2 absences
Mode = 0 absences
Q1 = 0.5 absences
Q3 = 4.5 absences
21. Data: Reading Levels of 16 eighth graders.
5, 6, 6, 6, 5, 8, 7, 7, 7, 8, 10, 9, 9, 9, 9, 9
Answer. First, order the data:
5 5 6 6 ┋ 6 7 7 7 ┋ 8 8 9 9 ┋ 9 9 9 10
Sum=120.
Mean= 120/16 = 7.5 This is the average reading
level of the 16 students.
Median = Q2 = 7.5
Q1 = 6, Q3 = 9
Mode = 9
22. 5 5 6 6 ┋ 6 7 7 7 ┋ 8 8 9 9 ┋ 9 9 9 10
Alternate method: The data can also be set up as a
frequency distribution. Measures can be computed using
methods for grouped data.
Reading Level (Xi)    Frequency (fi)
5                     2
6                     3
7                     3
8                     2
9                     5
10                    1
Total                 16
Note that the sum of the frequencies, ∑fi = n = 16
23. $\bar{X}$ = ∑Xifi/n = 120 / 16 = 7.5
Xi       fi    (Xi)(fi)
5        2     10
6        3     18
7        3     21
8        2     16
9        5     45
10       1     10
Total    16    120
We see that the column total, ∑Xifi, is the
sum of the ungrouped data.
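The grouped-data calculation for the reading levels can be sketched in Python by storing the frequency distribution as a dictionary. The dictionary layout and variable names are illustrative choices.

```python
# Frequency distribution of reading levels: value Xi -> frequency fi.
frequency = {5: 2, 6: 3, 7: 3, 8: 2, 9: 5, 10: 1}

n = sum(frequency.values())                              # sum of fi
weighted_sum = sum(x * f for x, f in frequency.items())  # sum of Xi * fi
grouped_mean = weighted_sum / n

# Expanding the table back into raw observations gives the same total.
raw = [x for x, f in frequency.items() for _ in range(f)]

print(n, weighted_sum, grouped_mean)  # 16 120 7.5
print(sum(raw))                       # 120
```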
24. Dispersion is the amount of spread, or
variability, in a set of data.
Why do we need to look at measures of
dispersion?
Consider this example:
A company is about to buy computer chips that must
have an average life of 10 years. The company has a
choice of two suppliers. Whose chips should they buy?
They take a sample of 10 chips from each of the
suppliers and test them. See the data on the next slide.
25. We see that supplier B’s chips have a longer average life.
However, what if the company offers a 3-year warranty?
Then, computers manufactured using the chips from supplier A
will have no returns, while using supplier B will result in
4/10 or 40% returns.
Supplier A chips (life in years)    Supplier B chips (life in years)
11                                  170
11                                  1
10                                  1
10                                  160
11                                  2
11                                  150
11                                  150
11                                  170
10                                  2
12                                  140
$\bar{X}_A$ = 10.8 years            $\bar{X}_B$ = 94.6 years
MedianA = 11 years                  MedianB = 145 years
sA = 0.63 years                     sB = 80.6 years
RangeA = 2 years                    RangeB = 169 years
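The supplier comparison can be reproduced with the sketch below. The `summarize` helper is hypothetical. Treating a chip as "returned" when its life is shorter than the warranty period is an assumption about how the 40% figure was obtained.

```python
from statistics import median, stdev

supplier_a = [11, 11, 10, 10, 11, 11, 11, 11, 10, 12]
supplier_b = [170, 1, 1, 160, 2, 150, 150, 170, 2, 140]

def summarize(lives, warranty_years=3):
    """Summary statistics plus the share of chips failing inside the warranty."""
    failures = sum(1 for life in lives if life < warranty_years)
    return {
        "mean": sum(lives) / len(lives),
        "median": median(lives),
        "s": stdev(lives),                 # sample standard deviation (n - 1)
        "range": max(lives) - min(lives),
        "return_rate": failures / len(lives),
    }

a = summarize(supplier_a)
b = summarize(supplier_b)

print(round(a["mean"], 1), round(a["s"], 2), a["return_rate"])  # 10.8 0.63 0.0
print(round(b["mean"], 1), round(b["s"], 1), b["return_rate"])  # 94.6 80.6 0.4
```

Supplier B wins on the mean and the median, but its much larger standard deviation and range are what signal the warranty problem.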
26. We will study these five measures of
dispersion
◦ Range
◦ Interquartile Range
◦ Standard Deviation
◦ Variance
◦ Coefficient of Variation
27. Range = Largest Value – Smallest Value
Example: 1, 2, 3, 4, 5, 8, 9, 21, 25, 30
Answer: Range = 30 – 1 = 29.
The range is simple to use and to explain to
others.
One problem with the range is that it is
influenced by extreme values at either end.
28. IQR = Q3 – Q1
Example (n = 15):
0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98
Q1 = 3, Q3 = 22
IQR = 22 – 3 = 19 (Range = 98)
This is basically the range of the central 50% of
the observations in the distribution.
Problem: The interquartile range does not take
into account the variability of the total data (only
the central 50%). We are “throwing out” half of
the data.
29. The standard deviation, s, measures a kind of
“average” deviation about the mean. It is not really
the “average” deviation, even though we may think
of it that way.
Why can’t we simply compute the average deviation
about the mean, if that’s what we want?
$\frac{\sum_{i=1}^{n} (X_i - \bar{X})}{n}$
If you take a simple mean, and then add up the
deviations about the mean, as above, this sum will
be equal to 0. Therefore, a measure of “average
deviation” will not work.
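Why the deviations always cancel can be seen in one line. The derivation below is a sketch that uses only the definition of the sample mean given earlier in the lecture, which implies that the sum of the observations equals n times the mean:

```latex
\sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0
```

For example, for the earlier data 1, 2, 2, 4, 5, 10 (mean 4), the deviations are -3, -2, -2, 0, 1, 6, which sum to 0.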
30. Instead, we use:
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
This is the “definitional formula” for standard deviation.
The standard deviation has lots of nice properties,
including:
◦ By squaring the deviation, we eliminate the problem of the
deviations summing to zero.
◦ In addition, this sum is a minimum. No other value subtracted
from each Xi and squared will result in a smaller sum of squared
deviations than the mean does. This is called the “least squares property.”
Note we divide by (n-1), not n. This will be referred to
as a loss of one degree of freedom.
31. Example. Two data sets, X and Y. Which of
the two data sets has greater variability?
Calculate the standard deviation for each.
Xi    Yi
1     0
2     0
3     0
4     5
5     10
We note that both sets of data have the
same mean:
$\bar{X}$ = 3
$\bar{Y}$ = 3
(continued…)
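One way to carry out this calculation is sketched below. It applies the definitional formula for s directly. The function name `sample_std` is illustrative, and the standard library's `statistics.stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

def sample_std(data):
    """Definitional formula: sqrt of squared deviations summed, over n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return sqrt(squared_deviations / (n - 1))

x = [1, 2, 3, 4, 5]
y = [0, 0, 0, 5, 10]

s_x = sample_std(x)   # sqrt(10 / 4)
s_y = sample_std(y)   # sqrt(80 / 4)

print(round(s_x, 2), round(s_y, 2))  # 1.58 4.47
```

Both data sets have a mean of 3, but Y is considerably more spread out than X.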
33. Note that $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
and $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
You divide by N only when you have taken a census and therefore
know the population mean. This is rarely the case.
Normally, we work with a sample and calculate sample measures,
like the sample mean and the sample standard deviation.
The reason we divide by n-1 instead of n is to make s² an
unbiased estimator of σ² (s itself is then only approximately unbiased for σ).
◦ We have taken a shortcut: in the second formula we are using the sample
mean, $\bar{X}$, a statistic, in lieu of μ, a population parameter. Without a
correction, this formula would have a tendency to understate the true
variability. We divide by n-1, which increases s and removes the bias in s².
◦ We will refer to this as “losing one degree of freedom” (to be explained
more fully later on in the course).
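The effect of dividing by n-1 can be illustrated exactly on a tiny example. The sketch below is not from the lecture: it assumes a made-up population of five values, lists every possible ordered sample of size 2 drawn with replacement, and averages the sample variance computed with each divisor. Fractions are used so the comparison is exact.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]          # hypothetical population for illustration
N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # population variance

def variance(sample, divisor):
    """Sum of squared deviations about the sample mean, over `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))   # all 25 ordered samples

avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(sigma_sq, avg_div_n, avg_div_n_minus_1)  # 2 1 2
```

Averaged over all samples, dividing by n understates the population variance (1 versus 2), while dividing by n-1 recovers it exactly.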
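34. The variance, s², is the standard deviation (s)
squared. Conversely, $s = \sqrt{\text{variance}}$.
Definitional formula: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Computational formula: $s^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n-1}$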
This is what computer software
(e.g., MS Excel or your calculator key) uses.
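That the definitional and computational formulas agree can be checked on the earlier data 1, 2, 2, 4, 5, 10. This is a minimal sketch with illustrative function names. It makes no claim about how any particular software package implements the variance internally.

```python
def variance_definitional(data):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def variance_computational(data):
    """(Sum of squares minus (sum squared)/n), divided by n - 1."""
    n = len(data)
    sum_x = sum(data)
    sum_x_squared = sum(x * x for x in data)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

data = [1, 2, 2, 4, 5, 10]
v_def = variance_definitional(data)    # 54 / 5
v_comp = variance_computational(data)  # (150 - 576/6) / 5

print(v_def, v_comp)  # 10.8 10.8
```

With floating-point numbers the computational form can lose precision when the values are large relative to their spread, so the definitional form is the safer choice when writing your own code.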
35. The problem with s² and s is that they both depend on the units
of the original data: s is in the “original” units, like the mean,
and s² is in those units squared.
This makes it difficult to compare the variability of two data
sets that are in different units or where the magnitude of the
numbers is very different in the two sets. For example,
◦ Suppose you wish to compare two stocks and one is in dollars and the other is in yen;
if you want to know which one is more volatile, you should use the coefficient of
variation.
◦ It is also not appropriate to compare two stocks of vastly different prices even if both
are in the same units.
◦ The standard deviation for a stock that sells for around $300 is going to be very
different from one with a price of around $0.25.
The coefficient of variation will be a better measure of
dispersion in these cases than the standard deviation (see
example on the next slide).
36. $CV = \frac{s}{\bar{X}} \times 100\%$
CV is in terms of a percent. What we are in
effect calculating is what percent of the
sample mean is the standard deviation. If
CV is 100%, this indicates that the sample
mean is equal to the sample standard
deviation. This would demonstrate that
there is a great deal of variability in the
data set. 200% would obviously be even
worse.
37. Which stock is more volatile?
Closing prices over the last 8 months:
        Stock A    Stock B
JAN     $1.00      $180
FEB     1.50       175
MAR     1.90       182
APR     .60        186
MAY     3.00       188
JUN     .40        190
JUL     5.00       200
AUG     .20        210
Mean    $1.70      $188.88
s²      2.61       128.41
s       $1.62      $11.33
CVA = ($1.62 / $1.70) x 100% = 95.3%
CVB = ($11.33 / $188.88) x 100% = 6.0%
Answer: The standard deviation of B is higher than for A,
but A is more volatile.
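The stock comparison can be sketched with the standard `statistics` module. The helper name `coefficient_of_variation` is illustrative. Because the code works with unrounded values, it gives about 95.1% for stock A rather than the 95.3% obtained from the rounded figures $1.62 and $1.70. The conclusion is unchanged.

```python
from statistics import mean, stdev

stock_a = [1.00, 1.50, 1.90, 0.60, 3.00, 0.40, 5.00, 0.20]
stock_b = [180, 175, 182, 186, 188, 190, 200, 210]

def coefficient_of_variation(prices):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(prices) / mean(prices) * 100

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

print(round(stdev(stock_a), 2), round(stdev(stock_b), 2))  # 1.62 11.33
print(round(cv_a, 1), round(cv_b, 1))                      # 95.1 6.0
```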
38. Data (n=10): 0, 0, 40, 50, 50, 60, 70, 90, 100, 100
Compute the mean, median, mode, quartiles (Q1, Q2, Q3), range,
interquartile range, variance, standard deviation, and coefficient of
variation. We shall refer to all these as the descriptive (or
summary) statistics for a set of data.
Answer. First order the data:
0, 0, 40, 50, 50 ┋ 60, 70, 90, 100, 100
◦ Mean: ∑Xi = 560 and n = 10, so $\bar{X}$ = 560/10 = 56.
◦ Median = Q2 = 55
◦ Q1 = 40 ; Q3 = 90 (Note: Excel gives these as Q1 = 42.5, Q3 = 85.)
◦ Mode = 0, 50, 100
◦ Range = 100 – 0 = 100
◦ IQR = 90 – 40 = 50
◦ s² = 11,840/9 = 1315.56
◦ s = √1315.56 = 36.27
◦ CV = (36.27/56) x 100% = 64.8%
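All of the summary statistics for this data set can be computed in one short script. This is a sketch: the quartiles use the median-of-halves approximation from the lecture, and the last line uses `statistics.quantiles` with `method="inclusive"` (Python 3.8+), an interpolating method that happens to reproduce the Excel figures quoted above.

```python
from collections import Counter
from statistics import median, quantiles, stdev, variance

data = [0, 0, 40, 50, 50, 60, 70, 90, 100, 100]
ordered = sorted(data)
n = len(ordered)
half = n // 2

mean = sum(ordered) / n
q1 = median(ordered[:half])        # median of the lower half
q2 = median(ordered)
q3 = median(ordered[n - half:])    # median of the upper half

counts = Counter(ordered)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top)

data_range = ordered[-1] - ordered[0]
iqr = q3 - q1
s2 = variance(ordered)             # sample variance (n - 1)
s = stdev(ordered)
cv = s / mean * 100

interpolated = quantiles(ordered, n=4, method="inclusive")

print(mean, q1, q2, q3, modes)       # 56.0 40 55.0 90 [0, 50, 100]
print(data_range, iqr)               # 100 50
print(round(s2, 2), round(s, 2), round(cv, 1))  # 1315.56 36.27 64.8
print(interpolated)                  # [42.5, 55.0, 85.0]
```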