The document provides an agenda and overview for a Champion Training module on basic analytics. It includes definitions of descriptive and inferential statistics, how to monitor descriptive statistics over time, examples of numeric display terms like mean, median, and standard deviation, definitions of defects and defective units, and a table for calculating sigma levels and associated defects per million. The training aims to provide champions with knowledge and skills to lead process improvement projects.
Descriptive statistics are methods of describing the characteristics of a data set. It includes calculating things such as the average of the data, its spread and the shape it produces.
1. Action Plan and SOP for Special Cause Variation
Determine new goals (UCL, LCL).
2. Module 8: Basic Analytics
Welcome to Champion Training
Facilitated by Kaplan’s Process Improvement Team
3. Agenda of Champion Training Modules
No. Module Pages
1 Introduction 19
2 Project Selection and Engaging Process Improvement 22
3 Champion Role through Project Lifecycle 26
4 Calculating Financial Benefit/ the Cost of Poor Quality 13
5 Define Overview & Tools 26
6 Measure Overview & Tools 27
7 Analyze Overview & Tools 18
8 Basic Analytics 21
9 Improve Overview & Tools 33
10 Control Overview & Tools 28
11 How to Effectuate Change Using Change Management 31
12 Kaplan’s Work Out 15
4. Purpose of This Training
Provide Kaplan champions with the knowledge and skills to be effective leaders and coaches to their people engaged in Process Improvement/Six Sigma projects.
“Everything should be made as simple as possible, but not too simple.” – Albert Einstein
5. Types of Statistics
Descriptive Statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data. With descriptive statistics you are simply describing what is, what the data show.
Inferential Statistics investigate questions, models and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population thinks, or to judge the probability that an observed difference between groups is a dependable one rather than one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.
6. Monitor Descriptive Statistics
• Monitor performance of the Xs and Ys over time.
• Verify that the improvement actions on the Xs have made the desired improvement in the Y.
Key measures: Mean, Median, Mode; Standard Deviation.
7. Numeric Display Terms
• N: the number of data points with non-missing values in the data set.
• Mean (Arithmetic Mean): the average.
• Median (50th Percentile): the middle data point in the data set.
• Mode: the value that occurs most frequently in a data set.
• StDev (Standard Deviation): the average distance from the mean.
• Q1 (First Quartile or 25th Percentile): the highest value from the lowest 25% of the ranked data.
• Q3 (Third Quartile or 75th Percentile): the lowest value from the highest 25% of the ranked data.
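Python's standard statistics module can reproduce each of these summary terms; a minimal sketch with a small hypothetical data set (note that the exact quartile values depend on the interpolation method, here the library's default "exclusive" method):

```python
import statistics

# Hypothetical data set (e.g. cycle times in days)
data = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]

n = len(data)                        # N: non-missing data points
mean = statistics.mean(data)         # arithmetic mean
median = statistics.median(data)     # 50th percentile
mode = statistics.mode(data)         # most frequent value
stdev = statistics.stdev(data)       # sample standard deviation

# quantiles() with n=4 returns the three quartile cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)

print(n, mean, median, mode, round(stdev, 2), q1, q3)
```

For this data set the mean is 21.8 while the median is 21.0; comparing the two is a quick check for skew.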
8. Defects
• A DEFECT is a failure to conform to customer requirements.
• A unit is DEFECTIVE when the entire unit fails to meet acceptance criteria, regardless of the number of defects within the unit.
10. Shift The Mean And Reduce Variation
• Calculate the new process capability after implementing the improvement or design.
• Determine if the new process capability (process sigma) meets stated goals.
• See if you achieved the desired shift, variance reduction, or DPMO reduction.
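As a sketch of how the DPMO reduction might be quantified, the conventional formulas (DPMO = defects / (units × opportunities) × 1,000,000, and a long-term sigma level that adds the customary 1.5-sigma shift) can be computed with Python's standard library; the inspection counts below are illustrative, not from the training material:

```python
from statistics import NormalDist

# Hypothetical inspection results
defects = 125          # defects observed
units = 500            # units inspected
opportunities = 10     # defect opportunities per unit

# Defects per million opportunities
dpmo = defects / (units * opportunities) * 1_000_000

# Long-term sigma level, using the conventional 1.5-sigma shift
yield_frac = 1 - dpmo / 1_000_000
sigma_level = NormalDist().inv_cdf(yield_frac) + 1.5

print(dpmo, round(sigma_level, 2))
```

A process at 25,000 DPMO corresponds to roughly a 3.5 sigma level under this convention.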
11. Sigma and Normal Distribution
The normal curve is divided into a series of equal increments, each representing one standard deviation from the mean.
13. Causes for Greenbelts Not Completing Project (Pareto analysis)

Cause                                          Count  Cumulative %
Not enough time                                   55      55%
Sponsor does not understand value                 25      80%
Did not pass exam                                  8      88%
Unclear of what needed to be done on template      6      94%
Lost template                                      3      97%
Office closed for a month                          3     100%
14. Central Limit Theorem
http://www.intuitor.com/statistics/CentralLim.html
If a random sample is drawn from any population, the sampling distribution of the sample mean is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of the sample mean will resemble a normal distribution.
(Illustrated with sample sizes of 1, 3, 15, and 30.)
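The theorem can be demonstrated with a short simulation. The sketch below draws repeated samples from a decidedly non-normal (uniform) population and shows the spread of the sample means shrinking, roughly like σ/√n, through the same 1, 3, 15, 30 progression:

```python
import random
import statistics

random.seed(42)

def sample_means(sample_size, n_samples=2000):
    """Draw repeated samples from a uniform (non-normal) population
    and return the collection of their sample means."""
    return [
        statistics.mean(random.uniform(0, 1) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

# Standard deviation of the sample-mean distribution shrinks as n grows
for n in (1, 3, 15, 30):
    means = sample_means(n)
    print(n, round(statistics.stdev(means), 3))
```

A histogram of the n = 30 means would already look close to a bell curve, even though the underlying population is flat.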
15. Yields: Rolled Throughput Yield

Process flow: Receive request for Financial Aid → Step 1 in Financial Aid → Step 2 in Awarding Financial Aid → Financial Aid Awarded Right First Time

Receive request for Financial Aid: 45,000 DPMO wasted (95.5% yield, YTP)
Step 1 in Financial Aid: 28,650 DPMO wasted (97% yield, YTP)
Step 2 in Awarding Financial Aid: 51,876 DPMO wasted (94.4% yield, YTP)
Total: 125,526 DPMO wasted opportunities

Yields can be multiplied across many process steps, assuming independent sources of defects:
YRT = 0.955 × 0.97 × 0.944 ≈ 87.4%
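A minimal sketch of the rolled throughput yield calculation, using the step yields from this slide (the DPMO-to-yield conversion is shown for the first step):

```python
import math

def yield_from_dpmo(dpmo):
    """Convert defects per million opportunities to throughput yield."""
    return 1 - dpmo / 1_000_000

# Step yields (YTP) for the Financial Aid example;
# the first is derived from its 45,000 DPMO figure
step_yields = [yield_from_dpmo(45_000), 0.97, 0.944]

# Rolled throughput yield: the product of the step yields,
# assuming independent sources of defects at each step
rty = math.prod(step_yields)
print(f"YRT = {rty:.1%}")
```

Because yields multiply, even three steps at 94–97% individually drop the end-to-end right-first-time yield to roughly 87%.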
16. Correlations are not Necessarily Causal
• City of Oldenburg, Germany
• 1930–1936
• X-axis: stork population
• Y-axis: human population
What your mother told you about babies when you were three is still not right, despite the strong correlation “evidence”.
Causal means that one variable results in the other occurring. In general, it is extremely difficult to establish causality between two correlated events or observations. There are many statistical tools for establishing a statistically significant correlation.
Source: Box, Hunter & Hunter, Statistics for Experimenters, 1978
17. Regression
Regression can be used for prediction, inference, hypothesis testing, and modeling of causal relationships.
The procedure calculates estimates of the relationship between the independent variables (advertising, price, etc.) and the dependent variable (sales).
Simple Linear Regression Analysis: Y = b0 + b1X
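A minimal sketch of how the b0 and b1 estimates are obtained by ordinary least squares, using hypothetical advertising (X) and sales (Y) figures:

```python
# Hypothetical data: advertising spend (X) and sales (Y)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.2, 5.9, 8.1, 9.9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
# Intercept: b0 = y_bar - b1 * x_bar
b0 = mean_y - b1 * mean_x

def predict(xi):
    """Fitted model: Y = b0 + b1*X."""
    return b0 + b1 * xi

print(round(b0, 2), round(b1, 2))
```

Once fitted, the model predicts sales at any advertising level via predict(); the same estimates are what statistical packages report for simple linear regression.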
18. The Control Chart or Shewhart Chart
Control chart of Recycle process (individual values over 28 observations):
• Center (usually the mean): X̄ = 29.06
• Control limits: UCL = 55.24, LCL = 2.87
A point beyond the control limits signals detected special cause variation; points varying within the limits reflect common cause variation.
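One common way to compute these limits for an individuals (I) chart is from the average moving range, using X̄ ± 2.66 × MR̄; a sketch with hypothetical process measurements:

```python
import statistics

# Hypothetical individual measurements from a process
data = [28, 31, 25, 40, 29, 33, 22, 30, 27, 35]

mean_x = statistics.mean(data)

# Moving ranges: absolute differences between consecutive points
moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
mr_bar = statistics.mean(moving_ranges)

# Standard I-chart limits: X-bar +/- 2.66 * MR-bar
ucl = mean_x + 2.66 * mr_bar
lcl = mean_x - 2.66 * mr_bar

# Flag potential special cause variation: points beyond the limits
special_cause = [x for x in data if x > ucl or x < lcl]
print(round(lcl, 2), mean_x, round(ucl, 2), special_cause)
```

For this sample no point falls outside the limits, so the variation shown would be attributed to common cause.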
19. Common Distributions
Sample size – Normal Distributions
• As the number of samples measured increases toward 30, the sample distribution becomes more representative of the population.
NORMAL DISTRIBUTION’S IMPORTANCE
Many variables are approximately normally distributed. This means we can use the normal distribution as a model to help us better understand these variables.
NORMALITY TESTS
Normality tests are used to determine whether a group of data fits a normal distribution.
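Formal normality tests (e.g. Shapiro-Wilk or Anderson-Darling) are usually run in a statistics package; as an informal pure-Python illustration, sample skewness should sit near zero for normal data and well away from zero for skewed data:

```python
import random
import statistics

random.seed(1)

def skewness(data):
    """Sample skewness: near 0 for symmetric (e.g. normal) data,
    clearly nonzero for skewed data."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    n = len(data)
    return sum((x - m) ** 3 for x in data) / (n * s ** 3)

normal_data = [random.gauss(50, 5) for _ in range(5000)]
skewed_data = [random.expovariate(1.0) for _ in range(5000)]

print(round(skewness(normal_data), 2))  # close to 0
print(round(skewness(skewed_data), 2))  # clearly positive
```

This is only a rough screen, not a substitute for a proper normality test, but it captures the idea: the further a summary like skewness is from its normal-distribution value, the less appropriate the normal model is.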