Abstract: This PDSG workshop introduces basic concepts of statistics. Concepts covered are mean (average), median, mode, standard deviation, discrete vs. continuous, normal distribution, sampling distribution, Z-scores and boxplots.
Level: Fundamental
Requirements: No prior programming or statistics knowledge required.
Statistics - Basics
1. Statistics
Mean, Median, Mode, Standard Deviation, Normal and Sampling Distribution, and Z-Score
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
July, 2017
2. Mean
• The mean is the average of a set of samples or a population distribution.
• Sum (add) up all the samples, then divide the summation by the number of samples:
\[ \mu = \frac{1}{n} \sum_{i=1}^{n} x_i \]
where µ (mu) is the symbol for the mean and n is the number of samples.
Example:
Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }
\[ \mu = \frac{1 + 2 + 2.5 + 2.5 + 3 + 3 + 3.5}{7} = \frac{17.5}{7} = 2.5 \]
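The mean calculation can be checked with a few lines of Python. This is a minimal sketch using the seven samples from the example; the variable names are illustrative only.

```python
from statistics import fmean

samples = [1, 2, 2.5, 2.5, 3, 3, 3.5]

# Sum (add) up all the samples, then divide by the number of samples.
mean_by_hand = sum(samples) / len(samples)

# The standard library computes the same average.
mean_stdlib = fmean(samples)

print(mean_by_hand, mean_stdlib)
```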
3. Median
• The median is the mid-point in a sorted (frequency) distribution of samples (population).
• Odd Number of Samples – the median is the sample at the midpoint (center).
• Even Number of Samples – the median is the average of the two samples at the midpoint (center).
Seven Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }
Median = 2.5 (the 4th sample is the midpoint)
Eight Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5, 4 }
Median = ( 2.5 + 3 ) / 2 = 2.75 (the average of the 4th and 5th samples)
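The odd and even cases can be sketched in Python as follows. The helper name median_by_hand is hypothetical; the standard library function statistics.median is shown alongside it for comparison.

```python
from statistics import median

def median_by_hand(values):
    """Return the midpoint of the sorted values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd number of samples: the sample at the center.
        return ordered[mid]
    # Even number of samples: average of the two center samples.
    return (ordered[mid - 1] + ordered[mid]) / 2

seven = [1, 2, 2.5, 2.5, 3, 3, 3.5]
eight = [1, 2, 2.5, 2.5, 3, 3, 3.5, 4]

print(median_by_hand(seven), median(seven))
print(median_by_hand(eight), median(eight))
```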
4. Discrete vs. Continuous
• The values of a population can be classified as either discrete or continuous values.
• Discrete – the values in a sample (population) are discrete if the selected values are from a finite set of values. Examples: a fixed set of values for a categorical variable (US States), or a finite set of numbers (person’s age in years as whole numbers).
Ex., Age = { 0, 1, 2, … 99 }
• Continuous – the values in a sample (population) are continuous if the selected values are from an infinite set of values. Examples: real values such as the dollar value in a checking account, or a person’s age as a real number (not rounded).
Ex., Checking = { $1, $10, $1046.37, $2,000,300.12, etc … }
5. Mode
• The mode is the value that occurs most frequently in a set of samples (population distribution). On a bar chart, it is the tallest bar.
• For discrete samples, it is the value that occurs most frequently.
Samples = { 1, 2, 2, 2, 3, 3, 4, 5, 7 }
Mode = 2 (the discrete value that occurs most frequently)
• For continuous samples, it is the range that occurs most frequently, where the values are grouped into ranges.
Steps:
1. Select a Range Size (e.g., 10)
2. Partition the samples into sequential steps of the range (e.g., 10, 20, 30)
3. Assign each sample to the range that it is within.
4. Select the range with the largest number of samples.
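Both kinds of mode can be sketched in Python. The function names and the list of ages are hypothetical and chosen only for illustration; the range size of 10 follows the example in the steps.

```python
from collections import Counter

def discrete_mode(values):
    """Return the value that occurs most frequently."""
    return Counter(values).most_common(1)[0][0]

def binned_mode(values, range_size):
    """Return the (low, high) range holding the most values."""
    # Steps 2 and 3: assign each value to the lower edge of its range.
    counts = Counter((v // range_size) * range_size for v in values)
    # Step 4: select the range with the largest number of samples.
    low = counts.most_common(1)[0][0]
    return low, low + range_size

samples = [1, 2, 2, 2, 3, 3, 4, 5, 7]
ages = [3, 12, 15, 18, 21, 24, 27, 29, 35, 41]  # hypothetical data

print(discrete_mode(samples))
print(binned_mode(ages, 10))
```

If two values or ranges tie for the highest count, this sketch returns whichever was seen first; a fuller version might report all of them.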
6. Standard Deviation
• The standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of samples (population).
\[ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\mu - x_i)^2} \]
where σ (sigma) is the symbol for standard deviation.
• Sum (add) up the squared difference between the mean and each sample.
• Divide the summation by the number of samples, then take the square root.
Example:
Seven Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 } , µ = 2.5
\[ \sigma = \sqrt{\frac{1}{7} \left[ (2.5 - 1)^2 + (2.5 - 2)^2 + (2.5 - 2.5)^2 + (2.5 - 2.5)^2 + (2.5 - 3)^2 + (2.5 - 3)^2 + (2.5 - 3.5)^2 \right]} \]
\[ = \sqrt{\frac{1}{7} \left( 2.25 + 0.25 + 0 + 0 + 0.25 + 0.25 + 1 \right)} = \sqrt{\frac{4}{7}} \approx 0.76 \]
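The same arithmetic can be checked in Python. This is a minimal sketch of the population standard deviation (dividing by n) for the seven samples, where the squared differences sum to 4 and the square root of 4/7 is about 0.76; statistics.pstdev is the standard library equivalent.

```python
import math
from statistics import pstdev

samples = [1, 2, 2.5, 2.5, 3, 3, 3.5]
mu = sum(samples) / len(samples)

# Squared difference between the mean and each sample.
squared_diffs = [(mu - x) ** 2 for x in samples]

# Divide the summation by the number of samples, then take the square root.
sigma = math.sqrt(sum(squared_diffs) / len(samples))

print(sum(squared_diffs), round(sigma, 2), round(pstdev(samples), 2))
```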
7. Normal Distribution
• The normal (Gaussian) distribution is a distribution that is used in probability for the expected random distribution of samples in a population.
• Based on distributions of naturally occurring things.
• About 68% of the samples should be within 1 standard deviation of the mean.
• About 95% of the samples should be within 2 standard deviations of the mean.
• About 99.7% of the samples should be within 3 standard deviations of the mean.
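These percentages can be reproduced from the error function: for a normal distribution, the fraction of values within k standard deviations of the mean is erf(k / √2). A minimal sketch, with a hypothetical helper name:

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.2%}")
```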
8. Population vs. Sample
Population (Distribution), described by Parameters:
• µ (mean)
• σ (std. dev)
• N (size)
• Can be any distribution
Random Sample, described by Statistics:
• x̅ (mean)
• s (std. dev)
• n (size)
Probability: one can calculate the probability that a sample is in the population, when the population is known.
9. Sampling Distribution
• Random samples are drawn from the population, and each sample has a mean ( x̅, x̅, x̅, … ).
• The distribution of the means of these randomly chosen samples is called a sampling distribution.
\[ \mu_{\bar{x}} = \mu \quad \text{(mean)} \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{(std. dev)} \]
where n is the size of each sample.
[Figure: Plot of Sample Means]
Central Limit Theorem
• As the sample size n increases, the plot of the sample means will approach a normal distribution, whatever the shape of the population distribution.
• The mean of a sampling distribution will approach the mean of the population.
• The central limit theorem only specifies that the central part of a distribution of averages will approach a normal distribution as the number of trials goes to infinity.
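A small simulation can illustrate the sampling distribution. This is a sketch under stated assumptions: the population is a hypothetical uniform distribution on [0, 100] (clearly not normal), and the sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: uniform on [0, 100].
population_mean = 50.0
population_std = 100 / math.sqrt(12)  # std. dev of a uniform(0, 100)

n = 30              # size of each random sample
num_samples = 5000  # number of random samples drawn

# Draw the samples and keep the mean of each one.
sample_means = [
    fmean(rng.uniform(0, 100) for _ in range(n))
    for _ in range(num_samples)
]

print("mean of sample means:    ", round(fmean(sample_means), 2))
print("population mean:         ", population_mean)
print("std. dev of sample means:", round(pstdev(sample_means), 2))
print("sigma / sqrt(n):         ", round(population_std / math.sqrt(n), 2))
```

Plotting a histogram of sample_means would show the bell shape, even though the population itself is flat.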
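10. Z-Score
• The Z-Score is the number of standard deviations that a value lies from the mean in a normal distribution.
\[ Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]
• A Z-Score of -2 is two standard deviations below the mean µ, and a Z-Score of 2 is two standard deviations above it. A Z-Score can be an arbitrary value (e.g., 1.5).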
11. Standard Normal Probabilities
• The probability that a sample will fall within the area of a normal distribution to the left of a given Z-Score can be looked up in the Standard Normal Probabilities Table - http://www.stat.ufl.edu/~athienit/Tables/Ztable.pdf
• At the mean µ (Z-Score = 0), there is a 50% probability that a sample falls into the area of the distribution.
• The probability of a sample falling within the area of the distribution increases with the number of standard deviations (the Z-Score).
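Instead of a printed table, the same probabilities can be computed. The sketch below assumes the table gives the cumulative area to the left of the Z-Score, which matches the 50% value at the mean; the function names are hypothetical.

```python
import math

def z_score(sample_mean, mu, sigma, n):
    """Z-Score of a sample mean, using sigma / sqrt(n) as its std. dev."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

def standard_normal_probability(z):
    """Area of the standard normal distribution to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (-2, 0, 1.5, 2):
    print(f"Z = {z}: {standard_normal_probability(z):.2%}")
```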
12. Robot Example
• Warehouse of Boxes: Mean Weight of 50 lbs, Standard Deviation of 10 lbs.
• Pallet of Boxes: Need to move a pallet of 10 boxes of unknown weight.
• Robot: Has a lift limit of 560 lbs.
• Question: What is the probability that the Robot can lift this pallet?
Population (Weight Distribution of Boxes):
µ (mean) = 50 lbs
σ (std. dev) = 10 lbs
Pallet of 10 Boxes (Weight of Boxes Unknown):
\[ \mu_{\bar{x}} = \mu = 50 \]
Calculate the Std. Dev. of the Pallet:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{10}} = 3.16 \]
Maximum mean weight of 10 boxes the robot can lift:
\[ \bar{x}_{max} = \frac{560 \text{ lbs}}{10 \text{ boxes}} = 56 \]
Z-Score of the maximum mean weight:
\[ Z = \frac{\bar{x}_{max} - \mu}{\sigma_{\bar{x}}} = \frac{6}{3.16} = 1.9 \]
Standard Normal Probability of 1.9 = 97.13 %
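The robot example can be worked end to end in Python. This is a sketch, assuming the box weights are independent and that the mean weight of 10 boxes is close enough to normal for the Z-Score to apply; statistics.NormalDist stands in for the printed table.

```python
import math
from statistics import NormalDist

mu = 50           # mean box weight (lbs)
sigma = 10        # std. dev of box weight (lbs)
n = 10            # boxes on the pallet
lift_limit = 560  # robot lift limit (lbs)

# Std. dev of the mean weight of n boxes.
sigma_xbar = sigma / math.sqrt(n)

# Maximum mean weight per box that the robot can lift.
xbar_max = lift_limit / n

# Z-Score of that maximum, and the probability of being at or below it.
z = (xbar_max - mu) / sigma_xbar
probability = NormalDist().cdf(z)

print(f"sigma_xbar = {sigma_xbar:.2f}, Z = {z:.2f}, probability = {probability:.1%}")
```

Because the code does not round Z to 1.9 before the lookup, its probability differs from the table value in the last digits.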
13. Null Hypothesis
• The Null Hypothesis H0 is the opposite of what one is trying to prove.
H0 = The mean price of a transaction has not increased (e.g., µ ≤ $25), i.e., nothing changed
H1 = The mean price of a transaction has increased (e.g., µ > $25)
• To Prove the Alternative Hypothesis H1 :
• Disprove the Null Hypothesis
• Within a Level of Statistical Significance
• Example: Transaction History has µ = $25 with σ = $5
Transaction Sample has x̅ = $26.50
Transaction Sample Size = 10
Calculate Std. Dev. of Transaction:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{10}} = 1.58 \]
Z-Score of Transaction:
\[ Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{26.5 - 25}{1.58} = 0.95 \]
Standard Normal Probability of 0.95 = 82.89 % (Confidence Level)
Transaction Sample Size = 100
Calculate Std. Dev. of Transaction:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5 \]
Z-Score of Transaction:
\[ Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{26.5 - 25}{0.5} = 3 \]
Standard Normal Probability of 3 = 99.87 % (Confidence Level)
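The effect of the sample size on the Z-Score can be sketched in Python. The function name is hypothetical, and statistics.NormalDist is used in place of the printed table. For a sample size of 10 the unrounded Z-Score is about 0.949, which gives a probability of about 82.9%; for a sample size of 100 the Z-Score is 3.

```python
import math
from statistics import NormalDist

def z_and_probability(sample_mean, mu, sigma, n):
    """Z-Score of a sample mean and the standard normal probability below it."""
    sigma_xbar = sigma / math.sqrt(n)
    z = (sample_mean - mu) / sigma_xbar
    return z, NormalDist().cdf(z)

# Transaction history: mu = $25, sigma = $5; transaction sample mean = $26.50.
for n in (10, 100):
    z, p = z_and_probability(26.50, 25, 5, n)
    print(f"n = {n}: Z = {z:.2f}, probability = {p:.2%}")
```

The same sample mean gives much stronger evidence against H0 when it comes from 100 transactions rather than 10.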
14. Box (and Whisker) Plot
• A method used to visualize the spread of data.
• Split the data into quartiles (quarters).
• A box is drawn around the middle two quarters, from the 1st quartile to the 3rd quartile. The length of the box is the interquartile range (IQR).
• The whiskers are drawn at the end points (lowest value and highest value).
[Figure: box plot along a Data Values (x) axis, marking the lowest value, 1st quartile (median of lower half), 2nd quartile (median), 3rd quartile (median of upper half), highest value, the Box (IQR), and the Whiskers]
Steps:
1. Calculate the median of the entire dataset, splitting the dataset into halves.
2. Calculate the median of the top and lower half of the dataset, splitting them into quarters.
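The two steps can be sketched in Python. One assumption is labeled here: when the dataset has an odd number of values, this sketch leaves the median out of both halves, which is one common convention among several. The function name and the data are illustrative.

```python
from statistics import median

def box_plot_values(values):
    """Return lowest value, 1st quartile, median, 3rd quartile, highest value."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Step 1: the median splits the dataset into halves.
    q2 = median(ordered)
    # Step 2: the median of each half splits the halves into quarters.
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return ordered[0], q1, q2, q3, ordered[-1]

data = [1, 2, 2.5, 2.5, 3, 3, 3.5, 4]
lowest, q1, q2, q3, highest = box_plot_values(data)
print(lowest, q1, q2, q3, highest, "IQR =", q3 - q1)
```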
15. Box (and Whisker) Plot - Outliers
• A variation of a box plot to show outliers.
• The whiskers are replaced with an inner and outer fence at 1.5 x IQR (inner) and 3 x IQR (outer) beyond the ends of the box.
• Values between 1.5 and 3 IQR from the box are suspected outliers (white).
• Values more than 3 IQR from the box are outliers (black).
[Figure: box plot along a Data Values (x) axis, marking the Box (IQR), the Inner Fences (1.5 IQR), the Outer Fences (3 IQR), suspected outliers, and outliers]
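The fences can be sketched in Python. The assumptions are labeled: the fences are measured from the ends of the box (below the 1st quartile and above the 3rd quartile), the quartiles are the medians of the lower and upper halves, and the data values are hypothetical.

```python
from statistics import median

def fences(values):
    """Return the (low, high) inner fences and the (low, high) outer fences."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer

def classify(values):
    """Split values into suspected outliers and outliers."""
    inner, outer = fences(values)
    suspected, outliers = [], []
    for v in values:
        if v < outer[0] or v > outer[1]:
            outliers.append(v)    # outside the outer fence
        elif v < inner[0] or v > inner[1]:
            suspected.append(v)   # between the inner and outer fences
    return suspected, outliers

data = [1, 10, 11, 12, 12, 13, 14, 15, 19, 30]  # hypothetical data
suspected, outliers = classify(data)
print("suspected outliers:", suspected)
print("outliers:", outliers)
```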