(C) The School of Continuous Improvement v1.0
Exploratory Data Analysis
Disclaimer
This module on Exploratory Data Analysis is offered free of charge to individuals who wish to learn how to use these tools to understand their datasets better.
We recommend using these tools with the help of a mentor. Please contact us at
vishy@theschoolofci.org if you need mentoring on Exploratory Data Analysis.
Reproducing, distributing, or selling this module for financial gain will invite stringent action by the institution facilitating this module, under the applicable law of the jurisdiction.
Body of knowledge
1. Stem and Leaf Plot
2. Box Plot
3. Median Polish
4. Resistant Line
5. Resistant Smooth
6. Rootogram
Introduction to Exploratory Data Analysis
Exploratory Data Analysis is an approach built on a set of techniques that help you understand your data better without the need for significance testing or confidence levels.
Exploratory Data Analysis can be used to:
1. Get detailed insight into your dataset.
2. Understand the critical variables that influence the dataset.
3. Detect any outliers present in the dataset.
4. Test the underlying assumptions of the dataset.
Exploratory Data Analysis can be done in a matter of 3 minutes using Minitab or any other statistical software package.
Be surprised, though: we will use Microsoft Excel® to build these tools.
Stem and Leaf Plot
Stem and Leaf Plot
A contact center quality team evaluates 100 calls. The Quality Manager decides to review the quality scores of the operations floor.
Let us draw a stem and leaf plot to understand the data.
A snapshot of the data sheet is shown here. The full data sheet can be found in the file EDA.xls.
Stem and Leaf Plot
Step 1 – Sort the data in ascending order.
Step 2 – Find the minimum and maximum values using the MIN and MAX functions.
Step 3 – Find the range using the formula MAX – MIN.
Step 4 – Construct the stems, starting from 0 and ending with 8. Rule for constructing stems: for two-digit values such as these scores, each stem is the tens digit (so 35 sits on stem 3); for a dataset with 3-digit values, the stems would be constructed on the hundreds place.
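The four steps above can be sketched in Python for readers who want to check their worksheet. This is a minimal sketch, assuming two-digit integer scores where the stem is the tens digit; the `scores` list is made-up sample data, not the contents of EDA.xls.

```python
# Hypothetical quality scores; the real 100 calls are in EDA.xls.
scores = [72, 35, 48, 30, 81, 57, 63, 44, 39, 66, 52, 70, 8, 85]

# Step 1: sort in ascending order.
data = sorted(scores)

# Steps 2 and 3: minimum, maximum, and range (MAX - MIN).
lo, hi = min(data), max(data)
data_range = hi - lo

# Step 4: for two-digit values the stem is the tens digit,
# so the stems run from the smallest tens digit to the largest.
stems = list(range(lo // 10, hi // 10 + 1))

print(lo, hi, data_range)  # 8 85 77
print(stems)               # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```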
Stem and Leaf Plot
Step 5 – Now write the formula that computes the leaves. Take Stem 3, highlighted with a yellow background, as an example: we need to find every value from 30 to 39.
First, write the formula that handles values equal to 30 and press Enter. The leaf shows up as 0, which means the dataset contains one value of 30.
To test the formula, change the first value of the dataset to 30, purely for the sake of simulation.
The leaf now shows two values of 30, so the formula works!
Stem and Leaf Plot
Step 6 – Now extend the formula so that it covers every number in the 30s, i.e. 31, 32, 33, 34, 35, and so on up to 39.
That formula seems never-ending, doesn't it? You only have to write it once, and after that it is easy.
Yes, it takes some effort, but it is worth it!
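The leaf formula for Steps 5 and 6 appears only in the screenshot, so the sketch below rests on an assumption: that each leaf digit d is repeated once for every value equal to 10 × stem + d (in Excel, something like one REPT/COUNTIF pair per digit from 30 to 39). `leaves_for_stem` is a hypothetical helper and the scores are made up.

```python
from collections import Counter

def leaves_for_stem(values, stem):
    """Leaf string for one stem; stem 3 covers the values 30 to 39.

    Leaf digit d is written once for every value equal to 10*stem + d,
    which is what a count-and-repeat formula per digit would produce.
    """
    counts = Counter(values)
    return "".join(str(d) * counts[10 * stem + d] for d in range(10))

scores = [72, 30, 35, 48, 31, 39, 35]   # hypothetical data
print(leaves_for_stem(scores, 3))        # 01559

# Step 5's check: change the first value to 30, so two values of 30 exist.
scores[0] = 30
print(leaves_for_stem(scores, 3))        # 001559
```

Writing the digit loop once and reusing it for every stem is the Python counterpart of writing the long formula once and copying it down.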
Stem and Leaf Plot
Step 7 – The completed Stem and Leaf Plot is shown here.
Step 8 – Use the LEN and SUBSTITUTE functions together to add the interpretation.
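Step 8 does not spell out the formula or what the interpretation contains. One plausible reading, assumed here, is counting how many values sit on each stem: if the leaves are separated by spaces, =LEN(SUBSTITUTE(cell, " ", "")) strips the spaces and counts the remaining digits. In Python, `str.replace` plays the role of SUBSTITUTE; `leaf_count` and the leaf strings are hypothetical.

```python
def leaf_count(leaf: str) -> int:
    """Number of values on a stem: the length of the leaf string once the
    spaces are removed, i.e. LEN(SUBSTITUTE(leaf, " ", "")) in Excel terms."""
    return len(leaf.replace(" ", ""))

# Hypothetical leaf strings, written with spaces between the digits.
plot = {3: "0 1 5 5 9", 4: "4 8", 5: ""}
for stem, leaf in plot.items():
    print(f"Stem {stem}: {leaf_count(leaf)} value(s)")
```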
Stem and Leaf Plot
1. You have an easier option: run a macro to generate the Stem and Leaf Plot. But VBA coding is not everyone's cup of tea.
2. You could use statistical software, but that may turn out to be expensive.
3. With a few simple Excel formulas, you have built tool 1, which shows the information in the dataset at a granular level.
4. That is the Stem and Leaf Plot for you.
Box Plot
Box Plot
The granularity provided by the Stem and Leaf Plot is useful, but at times you need a graph that shows the shape, distribution, and spread of the data. That is where we use the Box Plot.
Let us draw a Box Plot to understand the data.
Five teams in a factory produce homogeneous units. The sampled cycle times are shown below.
Box Plot
Step 1 – Set up the table as shown here. We already know how to calculate the Minimum and Maximum values.
Step 2 – Calculate the Median and the Quartile values using the formulas below:
Median: =MEDIAN(Data range)
1st Quartile: =PERCENTILE(Data range, 25%)
3rd Quartile: =PERCENTILE(Data range, 75%)
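To check the Summary Range table outside Excel, the sketch below reproduces the interpolation rule used by Excel's PERCENTILE (equivalently PERCENTILE.INC): the p-th percentile sits at zero-based position (n - 1) × p in the sorted data, interpolating linearly between neighbours. `percentile_inc`, `summary`, and the cycle times are illustrative, not taken from the deck.

```python
import math

def percentile_inc(values, p):
    """Linear-interpolation percentile, as in Excel's PERCENTILE / PERCENTILE.INC."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p            # zero-based fractional position
    lower = math.floor(pos)
    frac = pos - lower
    if lower + 1 >= len(xs):            # p == 1 lands exactly on the maximum
        return xs[-1]
    return xs[lower] + frac * (xs[lower + 1] - xs[lower])

def summary(values):
    """Minimum, 1st quartile, median, 3rd quartile, and maximum."""
    return (min(values),
            percentile_inc(values, 0.25),
            percentile_inc(values, 0.50),
            percentile_inc(values, 0.75),
            max(values))

team = [12.0, 15.5, 18.0, 20.0, 22.5, 30.0, 41.0]   # hypothetical cycle times
print(summary(team))   # (12.0, 16.75, 20.0, 26.25, 41.0)
```

At p = 0.5 the same rule gives the middle value (or the mean of the two middle values), so it agrees with MEDIAN.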
Box Plot
Step 3 – Although you have prepared the basic data needed, we are not ready to draw the Box Plot yet. We need to prepare a second table, shown here.
Step 4 – In the row titled Series 1, fetch the minimum value for each team.
In the row titled Series 2, subtract the Minimum value from the 1st Quartile value, both taken from the Summary Range table.
Box Plot
Step 5 – In the row titled Series 3, subtract the 1st Quartile value from the Median value.
In the row titled Series 4, subtract the Median from the 3rd Quartile value.
In the row titled Series 5, subtract the 3rd Quartile value from the Maximum value.
Let us now draw the Box Plot.
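Steps 4 and 5 turn the five-number summary into the rows of the chart table. A minimal sketch of that arithmetic follows; `box_plot_series` is a hypothetical helper and the summary values belong to a made-up team.

```python
def box_plot_series(minimum, q1, median, q3, maximum):
    """Rows of the chart table for a stacked-column box plot.

    Series 1 is the invisible base; Series 2 to 4 stack up to the 3rd quartile;
    Series 5 is kept aside for the upper whisker.
    """
    return {
        "Series 1": minimum,
        "Series 2": q1 - minimum,
        "Series 3": median - q1,
        "Series 4": q3 - median,
        "Series 5": maximum - q3,
    }

rows = box_plot_series(12.0, 16.75, 20.0, 26.25, 41.0)
print(rows)
# Stacking Series 1 to 4 ends exactly at the 3rd quartile.
print(rows["Series 1"] + rows["Series 2"] + rows["Series 3"] + rows["Series 4"])  # 26.25
```

Because Series 1 to Series 4 stack back up to the 3rd quartile, only those four rows are charted, and Series 5 is held back for the whiskers.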
Box Plot
Step 6 – Select the data from Series 1 to Series 4. Don't select Series 5 yet; we will use it later.
Select 2-D Column – Stacked Column Chart.
Box Plot
Step 7 – The chart is not yet a complete Box Plot; we need to work around a few things in Excel. First, hide Series 1 in the generated chart.
To do this, right-click Series 1 on the chart.
Click Format Data Series.
Click Fill and select No fill.
Click Border Color and select No line.
See how the blue bars for Series 1 disappear.
Box Plot
Step 8 – Repeat the actions from Step 7 for Series 2, and leave Series 2 selected on the chart.
Step 9 – Now define the whiskers. Click Layout, click Error Bars, and click More Error Bar Options.
Step 10 – In the dialog box that opens, select Minus for Direction and set the Percentage to 100%. Because Series 2 spans from the Minimum to the 1st Quartile, a Minus error bar of 100% reaches down exactly to the Minimum, forming the lower whisker.
Box Plot
Step 11 – After Steps 9 and 10, the chart changes shape as seen here. Take a look at the chart.
Step 12 – Repeat Steps 9 and 10 for Series 4, with one small change: in More Error Bar Options, set the Direction to Plus.
You will see how the lower and upper whiskers are now defined.
Box Plot
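Step 13 – Oops, something went wrong with the chart here: the upper whiskers do not reach the Maximum values. A Plus error bar of 100% on Series 4 extends above the 3rd Quartile only by the length of Series 4 (3rd Quartile – Median); we have not defined the Maximum values yet.
Step 14 – Click the error bars at the top. Click Layout, then More Error Bar Options, and in the window that opens, select Custom and specify the values.
Select the Series 5 values (Maximum – 3rd Quartile) from the data for chart table.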
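The arithmetic behind Steps 9 to 14 can be checked with a short sketch. Using made-up summary values and a hypothetical `whisker_ends` helper, it computes where each error bar ends: a Minus 100% bar on Series 2 drops from the 1st quartile to the minimum, a Plus 100% bar on Series 4 generally misses the maximum, and a custom Plus bar equal to Series 5 lands on it exactly.

```python
def whisker_ends(minimum, q1, median, q3, maximum):
    """Where each error bar ends on the stacked-column box plot."""
    s1, s2 = minimum, q1 - minimum
    s3, s4, s5 = median - q1, q3 - median, maximum - q3
    top_of_s2 = s1 + s2               # top of Series 2 = 1st quartile
    top_of_s4 = s1 + s2 + s3 + s4     # top of Series 4 = 3rd quartile
    return {
        "lower, Minus 100% of Series 2": top_of_s2 - 1.00 * s2,
        "upper, Plus 100% of Series 4": top_of_s4 + 1.00 * s4,
        "upper, custom Plus = Series 5": top_of_s4 + s5,
    }

for label, end in whisker_ends(12.0, 16.75, 20.0, 26.25, 41.0).items():
    print(f"{label}: {end}")
```

For these values the Plus 100% bar stops at 32.5 instead of 41.0, which is the problem spotted in Step 13; the custom Series 5 bar ends at 41.0.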
Box Plot
Step 15 – The Box Plot is ready, and we can start interpreting it. We spent some time making this Box Plot, but it is a one-time effort: once you have constructed it, you can reuse it as a Box Plot template.
Box Plot
Interpretation
1. The median cycle time for Team C seems the lowest, at approximately 20 minutes.
2. Team A shows the greatest spread in the data.
3. The data for Team A is also heavily skewed.
4. Team E seems to have a good share of its data at the lower end of the cycle time range.
Box Plot
1. A Box Plot doesn't confirm anything, so it is not a confirmatory data analysis tool.
2. Because a Box Plot shows the central tendency, spread, and shape of the data, you can use this EDA tool almost anywhere you have stratified data.
3. You can also use this tool when you have just one sample of data and wish to study the properties of that sample.
Median Polish
Median Polish
In inferential statistics, Analysis of Variance is a hypothesis test that fits an additive model to a two-way design and identifies data patterns not explained by the row and column effects.
Median Polish does something similar, except that it uses medians instead of means, which makes it resistant to outliers.
A company wishes to conduct a Median Polish on the percentage scores achieved by students in each course of an IT institution.
Table 1
Median Polish
Step 1 – First find the median of each course's scores, then take the difference between each individual score and its course median. This is known as the 1st sweep.
Step 2 – Now do the 2nd sweep. Subtract the column medians (last row of Table 2) and the row medians (last column of Table 2), both highlighted, from the values in Table 1.
For the column medians, subtract each 2nd-sweep cell value from the corresponding cell in the 1st sweep.
Table 2
Table 3
Median Polish
Step 3 – Now do the 3rd sweep. Subtract the row values obtained in Table 3 from the row medians, and identify the new column medians within the 3rd sweep itself. The new row medians = Change Median – Median from Table 3.
Table 4
Step 4 – Time for the 4th sweep. Subtract each row value in Table 4 from the 3rd-sweep column median. This gives you the row values for the new table we are constructing.
Also add the Column Median value to the 3rd-sweep Column Median.
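The sweeps above follow the general idea of Tukey's median polish: repeatedly remove row medians and column medians until the residuals settle. Below is a sketch of the standard alternating row/column scheme, which may differ in sweep order and bookkeeping from the worksheet tables; `median_polish` and the score table are hypothetical. It returns the overall effect, row effects, column effects, and residuals, so that each cell equals overall + row effect + column effect + residual.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Median polish of a two-way table given as a list of rows.

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    n_rows, n_cols = len(table), len(table[0])
    resid = [[float(x) for x in row] for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: remove each row's median and add it to that row's effect.
        r_med = [median(row) for row in resid]
        for i in range(n_rows):
            resid[i] = [x - r_med[i] for x in resid[i]]
            row_eff[i] += r_med[i]
        # Keep the column effects centred by moving their median into overall.
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Column sweep: remove each column's median and add it to that column's effect.
        c_med = [median([resid[i][j] for i in range(n_rows)]) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [x - m for x, m in zip(resid[i], c_med)]
        col_eff = [c + m for c, m in zip(col_eff, c_med)]
        # Keep the row effects centred in the same way.
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once the row and column medians of the residuals are all (near) zero.
        if max(abs(m) for m in r_med + c_med) < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical percentage scores: rows are courses, columns are attendance bands.
scores = [[40, 45, 50],
          [30, 33, 36],
          [42, 47, 90]]
overall, rows, cols, resid = median_polish(scores)
print("overall:", overall)
print("row effects:", rows)
print("column effects:", cols)
print("residuals:", resid)
```

For this made-up table the unusually high score of 90 ends up almost entirely in its residual (38.0) instead of pulling the row and column effects, which is the resistance that medians buy over means.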
Median Polish
Table 5
Median Polish
Table 6 – Final Residual Table
Median Polish
Interpretations
1. The overall effect, i.e. the typical test score across all the courses, was 44.25%.
2. Students who take the Java program alone score approximately 13 points less than those who take .NET.
3. Also look at the column effects in the Residual table: students with 90% attendance outscore those with 70% attendance by 5 points.
Median Polish
Final Notes
1. Don't let the tedious calculations keep you away from this wonderful tool.
2. In a two-way design where one of the factors may be categorical, Median Polish comes in very handy for establishing relationships.
3. Because Median Polish separates each value into an overall effect, row and column effects, and a residual, you can also use it to predict what could happen in the future.
Histogram
Histogram
The Histogram is another important EDA tool, which you can use when you wish to check the shape of the data. Importantly, a histogram will reveal issues in the data such as:
1. Modality issues
2. Skew issues
3. Mixed distribution issues
Let us go back to the cycle time data and plot the histogram with the help of Excel.
Histogram
Step 1 – First calculate the descriptive statistics for all the teams. As the table shown here makes clear, most of the formulas are basic, except for the ones shaded with a light amber background:
IQR = 3rd Quartile – 1st Quartile
Bin width = 2*IQR/Count^(1/3)
Number of bins = (Maximum – Minimum)/Bin width
Histogram
Step 2 – Now define the bins. Start with the minimum value. For example, for Team A the first bin would be 0.32. The next bin will be 0.32 + Bin width (7.26). The third bin would be 7.53 + 7.26, and so on. Continue until you reach 7 bins.
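Steps 1 and 2 can be sketched as below. Two assumptions are labeled in the code: the bin width formula is read as 2 × IQR / Count^(1/3) (the Freedman–Diaconis rule), and the number of bins is rounded up to a whole number. The quartiles use the same linear interpolation as Excel's PERCENTILE; `histogram_bins` and the cycle times are hypothetical.

```python
import math

def percentile_inc(values, p):
    """Linear-interpolation percentile, as in Excel's PERCENTILE."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p
    lower = math.floor(pos)
    if lower + 1 >= len(xs):
        return xs[-1]
    return xs[lower] + (pos - lower) * (xs[lower + 1] - xs[lower])

def histogram_bins(values):
    """Bin width, number of bins, and bin boundaries for Steps 1 and 2."""
    iqr = percentile_inc(values, 0.75) - percentile_inc(values, 0.25)
    # Assumed reading of the formula: 2 * IQR / Count^(1/3).
    width = 2 * iqr / len(values) ** (1 / 3)
    # Assumed: round the number of bins up to a whole number.
    n_bins = math.ceil((max(values) - min(values)) / width)
    # Step 2: start at the minimum and keep adding the bin width.
    # One extra boundary (also an assumption) lets the last bin reach the maximum.
    edges = [min(values) + k * width for k in range(n_bins + 1)]
    return width, n_bins, edges

cycle_times = [0.32, 5.1, 8.0, 12.4, 15.0, 18.2, 22.7, 30.1, 35.5, 41.0, 48.3, 56.0]  # hypothetical
width, n_bins, edges = histogram_bins(cycle_times)
print(f"bin width = {width:.2f}, bins = {n_bins}")
print([round(e, 2) for e in edges])
```

A dataset with a zero IQR (many tied values) would make the width zero, so real worksheets may need a fallback rule.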
Histogram
Step 3 – First draw the Histogram for one team, e.g. Team A.
Steps to draw a Histogram
1. Click Data, then click Data Analysis (if this option is not available, enable the Analysis ToolPak add-in).
2. From the Data Analysis dialog window, choose Histogram.
3. In the Input Range box, select the data for Team A.
4. In the Bin Range box, select the bin range for Team A.
5. Tick Chart Output and click OK.
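Excel's Histogram tool treats each bin value as an inclusive upper boundary: a value is counted in the first bin whose boundary is greater than or equal to it, and anything above the last boundary goes into a "More" bucket. The sketch below mimics that rule so you can verify the frequency table; `excel_histogram` is a hypothetical name and the data are made up.

```python
from bisect import bisect_left

def excel_histogram(values, bins):
    """Frequencies per bin using Excel's Histogram rule: each bin value is an
    inclusive upper boundary, and values above the last one go to 'More'."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)           # final slot is the 'More' bucket
    for v in values:
        counts[bisect_left(bins, v)] += 1    # index of the first boundary >= v
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))

print(excel_histogram([1, 2, 2, 5, 7, 9, 12], [2, 6, 10]))
# [('2', 3), ('6', 1), ('10', 2), ('More', 1)]
```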
Histogram
We achieved this clean-looking Histogram by reducing the Gap Width to 0% on the chart.
Histogram
Interpretations
1. Bimodality is observed at 7.53 and 56. Is this due to an external issue?
2. If the bimodality were resolved, we would get something close to a normal distribution, but what is the reason for this bimodality?
3. It could be a difference in suppliers, in changeovers, or in raw materials. Anything else?
Rootogram
Interpretations
1. Here is a new tool. Instead of plotting the frequencies on the vertical axis, take the square root of each frequency and plot that; what you get is known as a Rootogram.
2. The x-axis shows the response variable instead of the bins used in a Histogram.
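The Rootogram transformation itself is one line per bar. A minimal sketch, with a hypothetical `rootogram_heights` helper and made-up frequencies; in Excel the equivalent column is =SQRT() applied to each frequency cell.

```python
import math

def rootogram_heights(frequencies):
    """Bar heights for a Rootogram: the square root of each histogram frequency."""
    return [math.sqrt(f) for f in frequencies]

freqs = [1, 4, 9, 16, 4]           # hypothetical histogram frequencies
print(rootogram_heights(freqs))    # [1.0, 2.0, 3.0, 4.0, 2.0]
```

Taking square roots compresses large counts more than small ones, so sparsely populated bins stay visible next to tall ones.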
Histogram
Based on the 4 Histograms drawn for the other teams, what can you infer?
Which team's data distribution is closest to a normal distribution?
Scatter Plot
Scatter Plot
In many projects, we find that some x impacts y; in other words, y = f(x). Using a scatter plot, you can visually check whether there is a relationship between x and y.
To see how a scatter plot works, let us use data on two variables from a factory: machine downtime and production capacity. Downtime is expressed in % and production capacity in tons.
Scatter Plot
Step 1 – Select the data, click Insert, click Scatter, and click Scatter with only Markers.
Step 2 – Voilà, you are done. There you have the scatter chart, as seen here.
Scatter Plot
Step 3 – Add a regression equation.
This is where an EDA tool can be used as an inferential statistics tool. Right-click any point on the chart and click Add Trendline. Select Linear, then tick Display Equation on chart and Display R-squared value on chart.
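A Linear trendline is an ordinary least-squares fit, so its equation and R-squared can be reproduced with the textbook formulas below. This is a sketch, not Excel's internal code; `linear_trendline` and the downtime and capacity numbers are hypothetical.

```python
def linear_trendline(xs, ys):
    """Least-squares line y = slope * x + intercept and its R-squared,
    the quantities a Linear trendline displays."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # equals r squared for a straight-line fit
    return slope, intercept, r_squared

downtime_pct = [1, 2, 3, 4, 5]              # hypothetical
capacity_tons = [150, 130, 140, 120, 130]   # hypothetical
slope, intercept, r2 = linear_trendline(downtime_pct, capacity_tons)
print(f"y = {slope:.2f}x + {intercept:.2f}, R-squared = {r2:.3f}")
```

For these made-up numbers the fit is y = -5.00x + 149.00 with an R-squared of about 0.481, below the 0.64 threshold used in Step 4.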
Scatter Plot
Step 4 – Interpretation
The scatter chart already revealed visually that there is no strong correlation between downtime and production capacity; the regression statistics merely confirm it.
The R-squared value needs to be greater than 0.64 (i.e. a correlation coefficient of magnitude greater than 0.8) for us to conclude a strong correlation.
Final Notes
1. This module covers most of the tools used in Exploratory Data Analysis.
2. Some other tools are:
a. Parallel Coordinates
b. Run Charts
c. Odds Ratio
d. Principal Components Analysis
e. Ordination
If you have questions about using EDA tools, please write to us at vishy@theschoolofci.org, or follow The School of Continuous Improvement on LinkedIn.
Thank you.

More Related Content

What's hot

Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data AnalysisUmair Shafique
 
3.5 Exploratory Data Analysis
3.5 Exploratory Data Analysis3.5 Exploratory Data Analysis
3.5 Exploratory Data Analysismlong24
 
Introduction to Data Visualization
Introduction to Data VisualizationIntroduction to Data Visualization
Introduction to Data VisualizationStephen Tracy
 
1. Data Analytics-introduction
1. Data Analytics-introduction1. Data Analytics-introduction
1. Data Analytics-introductionkrishna singh
 
Linear Regression With R
Linear Regression With RLinear Regression With R
Linear Regression With REdureka!
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analyticsSSaudia
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data VisualizationCenterline Digital
 
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...Simplilearn
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
Machine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixMachine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixAndrew Ferlitsch
 

What's hot (20)

Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
Eda sri
Eda sriEda sri
Eda sri
 
Missing data handling
Missing data handlingMissing data handling
Missing data handling
 
3.5 Exploratory Data Analysis
3.5 Exploratory Data Analysis3.5 Exploratory Data Analysis
3.5 Exploratory Data Analysis
 
Introduction to Data Visualization
Introduction to Data VisualizationIntroduction to Data Visualization
Introduction to Data Visualization
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
R programming
R programmingR programming
R programming
 
1. Data Analytics-introduction
1. Data Analytics-introduction1. Data Analytics-introduction
1. Data Analytics-introduction
 
Linear Regression With R
Linear Regression With RLinear Regression With R
Linear Regression With R
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Introduction to data analytics
Introduction to data analyticsIntroduction to data analytics
Introduction to data analytics
 
Data visualization
Data visualizationData visualization
Data visualization
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
The Importance of Data Visualization
The Importance of Data VisualizationThe Importance of Data Visualization
The Importance of Data Visualization
 
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...What Is Data Science? | Introduction to Data Science | Data Science For Begin...
What Is Data Science? | Introduction to Data Science | Data Science For Begin...
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Machine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixMachine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion Matrix
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 

Viewers also liked

Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysisthinrhino
 
Sampling: An an often overlooked art in exploratory data analysis
Sampling: An an often overlooked art in exploratory data analysisSampling: An an often overlooked art in exploratory data analysis
Sampling: An an often overlooked art in exploratory data analysisEli Bressert
 
Exploratory data analysis handbook (from www.nist.gov, Engineering Statistic...
Exploratory data analysis handbook (from www.nist.gov,  Engineering Statistic...Exploratory data analysis handbook (from www.nist.gov,  Engineering Statistic...
Exploratory data analysis handbook (from www.nist.gov, Engineering Statistic...Stella Tsank
 
Data Over Matter: Innovating the next generation of products
Data Over Matter: Innovating the next generation of productsData Over Matter: Innovating the next generation of products
Data Over Matter: Innovating the next generation of productsEli Bressert
 
Circuit City Report1 (2) Earning Call Example
Circuit City Report1 (2) Earning Call ExampleCircuit City Report1 (2) Earning Call Example
Circuit City Report1 (2) Earning Call Examplewcampagn
 
Steam & Leaf Diagram
Steam & Leaf DiagramSteam & Leaf Diagram
Steam & Leaf Diagramnikkisimonson
 
03.statistica psihologica m_popa (2) (1)
03.statistica psihologica m_popa (2) (1)03.statistica psihologica m_popa (2) (1)
03.statistica psihologica m_popa (2) (1)Florina
 
Data input and transformation
Data input and transformationData input and transformation
Data input and transformationMohsin Siddique
 
Qualitative data analysis: many approaches to understand user insights
Qualitative data analysis: many approaches to understand user insightsQualitative data analysis: many approaches to understand user insights
Qualitative data analysis: many approaches to understand user insightsAgnieszka Szóstek
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitionsOwen Zhang
 
XGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionXGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionJaroslaw Szymczak
 

Viewers also liked (18)

Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
 
Sampling: An an often overlooked art in exploratory data analysis
Sampling: An an often overlooked art in exploratory data analysisSampling: An an often overlooked art in exploratory data analysis
Sampling: An an often overlooked art in exploratory data analysis
 
Exploratory data analysis handbook (from www.nist.gov, Engineering Statistic...
Exploratory data analysis handbook (from www.nist.gov,  Engineering Statistic...Exploratory data analysis handbook (from www.nist.gov,  Engineering Statistic...
Exploratory data analysis handbook (from www.nist.gov, Engineering Statistic...
 
Exploratory data analysis coursera
Exploratory data analysis courseraExploratory data analysis coursera
Exploratory data analysis coursera
 
Data Over Matter: Innovating the next generation of products
Data Over Matter: Innovating the next generation of productsData Over Matter: Innovating the next generation of products
Data Over Matter: Innovating the next generation of products
 
Ryd.io
Ryd.ioRyd.io
Ryd.io
 
Circuit City Report1 (2) Earning Call Example
Circuit City Report1 (2) Earning Call ExampleCircuit City Report1 (2) Earning Call Example
Circuit City Report1 (2) Earning Call Example
 
Steam & Leaf Diagram
Steam & Leaf DiagramSteam & Leaf Diagram
Steam & Leaf Diagram
 
03.statistica psihologica m_popa (2) (1)
03.statistica psihologica m_popa (2) (1)03.statistica psihologica m_popa (2) (1)
03.statistica psihologica m_popa (2) (1)
 
Data input and transformation
Data input and transformationData input and transformation
Data input and transformation
 
Qualitative data analysis: many approaches to understand user insights
Qualitative data analysis: many approaches to understand user insightsQualitative data analysis: many approaches to understand user insights
Qualitative data analysis: many approaches to understand user insights
 
Xgboost
XgboostXgboost
Xgboost
 
Exploring Data
Exploring DataExploring Data
Exploring Data
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 
XGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionXGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competition
 
Qualitative data analysis
Qualitative data analysisQualitative data analysis
Qualitative data analysis
 
Culture
CultureCulture
Culture
 

Similar to Exploratory data analysis v1.0

How to create graphs for science
How to create graphs for scienceHow to create graphs for science
How to create graphs for scienceBrad Kremer
 
Using Microsoft Excel for Weibull Analysis by William Dorner
Using Microsoft Excel for Weibull Analysis by William DornerUsing Microsoft Excel for Weibull Analysis by William Dorner
Using Microsoft Excel for Weibull Analysis by William DornerMelvin Carter
 
De vry math 399 all ilabs latest 2016 november
De vry math 399 all ilabs latest 2016 novemberDe vry math 399 all ilabs latest 2016 november
De vry math 399 all ilabs latest 2016 novemberlenasour
 
Using microsoft excel for weibull analysis
Using microsoft excel for weibull analysisUsing microsoft excel for weibull analysis
Using microsoft excel for weibull analysisMelvin Carter
 
Class6 term2 2019-20
Class6 term2 2019-20Class6 term2 2019-20
Class6 term2 2019-20Andrew Raj
 
House Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachHouse Price Estimation as a Function Fitting Problem with using ANN Approach
House Price Estimation as a Function Fitting Problem with using ANN ApproachYusuf Uzun
 
computer applications in business unit 3
computer applications in business unit 3computer applications in business unit 3
computer applications in business unit 3Dr T.Sivakami
 
De vry math221 all ilabs latest 2016 november
De vry math221 all ilabs latest 2016 novemberDe vry math221 all ilabs latest 2016 november
De vry math221 all ilabs latest 2016 novemberlenasour
 
De vry math 221 all ilabs latest 2016 november
De vry math 221 all ilabs latest 2016 novemberDe vry math 221 all ilabs latest 2016 november
De vry math 221 all ilabs latest 2016 novemberlenasour
 
Itm310 problem solving #7 complete solutions correct answers key
Itm310 problem solving #7 complete solutions correct answers keyItm310 problem solving #7 complete solutions correct answers key
Itm310 problem solving #7 complete solutions correct answers keySong Love
 
qc-tools.ppt
qc-tools.pptqc-tools.ppt
qc-tools.pptAlpharoot
 
MAT 240 Random Sampling in Excel Tutorial This tutorial wi
MAT 240 Random Sampling in Excel Tutorial This tutorial wiMAT 240 Random Sampling in Excel Tutorial This tutorial wi
MAT 240 Random Sampling in Excel Tutorial This tutorial wiAbramMartino96
 
MAT 240 Random Sampling in Excel Tutorial This tutorial wi
MAT 240 Random Sampling in Excel Tutorial This tutorial wiMAT 240 Random Sampling in Excel Tutorial This tutorial wi
MAT 240 Random Sampling in Excel Tutorial This tutorial wiAbramMartino96
 
ENGR 102B Microsoft Excel Proficiency LevelsPlease have your in.docx
ENGR 102B Microsoft Excel Proficiency LevelsPlease have your in.docxENGR 102B Microsoft Excel Proficiency LevelsPlease have your in.docx
ENGR 102B Microsoft Excel Proficiency LevelsPlease have your in.docxYASHU40
 
Name _______________________________ Class time __________.docx
Name _______________________________    Class time __________.docxName _______________________________    Class time __________.docx
Name _______________________________ Class time __________.docxrosemarybdodson23141
 
ENGR 131 Elementary Computer ProgrammingTeam IN – Instructor
ENGR 131  Elementary Computer ProgrammingTeam IN – InstructorENGR 131  Elementary Computer ProgrammingTeam IN – Instructor
ENGR 131 Elementary Computer ProgrammingTeam IN – InstructorTanaMaeskm
 

Similar to Exploratory data analysis v1.0 (20)

How to create graphs for science
How to create graphs for scienceHow to create graphs for science
How to create graphs for science
 

Exploratory data analysis v1.0

  • 1. (C) The School of Continuous Improvement v1.0 1 Exploratory Data Analysis
  • 2. Disclaimer: This module on Exploratory Data Analysis is offered free of charge to individuals who wish to learn how to use these tools to understand their datasets better. We recommend using these tools with the help of a mentor. Please write to us at vishy@theschoolofci.org if you need our mentoring on Exploratory Data Analysis. Reproducing, distributing or selling this module for financial benefit will invite stringent action, under the law of the concerned jurisdiction, by the institution facilitating this module.
  • 3. Body of knowledge: 1. Stem and Leaf Plot 2. Box Plot 3. Median Polish 4. Resistant Line 5. Resistant Smooth 6. Rootogram
  • 4. Introduction to Exploratory Data Analysis: Exploratory Data Analysis (EDA) is an approach built on a set of techniques that help you understand your data better without significance testing or confidence levels. EDA can be used to: 1. Get detailed insight into your dataset. 2. Understand the critical variables that influence the dataset. 3. Detect any outliers present in the dataset. 4. Test the underlying assumptions of the dataset. Exploratory Data Analysis can be done in about three minutes using Minitab or any other statistical software package. You may be surprised, though: we will use Microsoft Excel® to build all of these tools.
  • 5. Stem and Leaf Plot
  • 6. Stem and Leaf Plot: A contact center quality team evaluates 100 calls, and the Quality Manager decides to review the quality scores of the operations floor. Let us draw a stem and leaf plot to understand the data. A snapshot of the data sheet is attached here; the full data sheet can be found in the file EDA.xls.
  • 7. Stem and Leaf Plot: Step 1 – Sort the data in ascending order. Step 2 – Find the minimum and maximum values using the MIN and MAX functions. Step 3 – Find the range using the formula MAX – MIN. Step 4 – Construct the stems, starting from 0 and ending with 8. Rule for constructing stems: if your dataset has 3-digit values, the stems need to be built on the hundreds place.
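For readers who want to check the spreadsheet logic outside Excel, here is a minimal Python sketch of Steps 1 to 4. It assumes two-digit scores, so the stem is the tens digit; the function name stem_summary and the sample scores are hypothetical and not taken from EDA.xls.

```python
def stem_summary(scores, stem_unit=10):
    """Return the sorted data, MIN, MAX, range and the stems to draw.

    stem_unit=10 assumes two-digit data (stem = tens digit);
    use stem_unit=100 for three-digit data (stem = hundreds digit).
    """
    data = sorted(scores)              # Step 1: ascending order
    lo, hi = data[0], data[-1]         # Step 2: equivalent of MIN() and MAX()
    spread = hi - lo                   # Step 3: range = MAX - MIN
    # Step 4: one stem per group, from 0 up to the stem of the maximum value
    stems = list(range(0, hi // stem_unit + 1))
    return data, lo, hi, spread, stems


scores = [72, 35, 58, 30, 81, 47, 66, 39, 54, 88]   # hypothetical quality scores
data, lo, hi, spread, stems = stem_summary(scores)
print(lo, hi, spread)   # 30 88 58
print(stems)            # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Starting the stems at 0, as the slide does, keeps empty low stems visible, which is itself informative about where the scores do not fall.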
  • 8. Stem and Leaf Plot: Step 5 – Next we write the formula that computes the leaves. For example, take Stem 3, highlighted with a yellow background: it has to capture every value in the 30s. Let us first write the formula that picks up the values equal to 30 and press Enter. The leaf shows up as 0, which means we have one value of 30. For the sake of simulation, let us change the first value of the dataset to 30. As we see here, there are now two values of 30, so the formula works.
  • 9. Stem and Leaf Plot: Step 6 – Now build the formula that covers all the numbers in the 30s, i.e. 31, 32, 33, 34, 35 and so on up to 39. That formula seems never-ending, doesn't it? You only have to write it once, and after that it is easy. Yes, it takes some effort, but it is worth it.
  • 10. Stem and Leaf Plot: Step 7 – The Stem and Leaf Plot is shown here. Step 8 – Let us use the LEN and SUBSTITUTE functions together to add the interpretation.
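Steps 5 to 8 can be sketched the same way. This is not the deck's formula set: instead of one formula per leaf value, the sketch splits each value into stem and leaf with divmod and appends one leaf digit per occurrence, so two values of 30 show up as two 0 leaves. The per-stem count is only one plausible reading of what the LEN and SUBSTITUTE step adds (the slide does not show that formula); the function names and data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(scores, stem_unit=10):
    """Map each stem to a string of leaves, one digit per value, in sorted order."""
    leaves = defaultdict(str)
    for value in sorted(scores):
        stem, leaf = divmod(value, stem_unit)   # e.g. 35 -> stem 3, leaf 5
        leaves[stem] += str(leaf)
    stems = range(0, max(scores) // stem_unit + 1)
    return {stem: leaves[stem] for stem in stems}   # empty stems kept as ""


def leaf_counts(plot):
    """Number of values on each stem (the length of its leaf string)."""
    return {stem: len(leaf_str) for stem, leaf_str in plot.items()}


scores = [72, 35, 58, 30, 81, 47, 66, 39, 54, 88, 30]   # hypothetical, two 30s
plot = stem_and_leaf(scores)
counts = leaf_counts(plot)
for stem, leaf_str in plot.items():
    print(f"{stem} | {leaf_str:<6} ({counts[stem]})")
```

With this sample, stem 3 reads 0059: the two 30s appear as two 0 leaves, followed by 35 and 39.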
  • 11. Stem and Leaf Plot: 1. An easier option is to run a macro that generates the Stem and Leaf Plot, but VBA coding is not everyone's cup of tea. 2. You could use statistical software, but that may turn out to be somewhat expensive. 3. With a few simple Excel formulas, you have built tool 1, which shows the granular detail in a dataset. 4. That is the Stem and Leaf Plot for you.
  • 12. Box Plot
  • 13. Box Plot: The granularity provided by the Stem and Leaf Plot is useful, but at times you need a graph that shows the shape, distribution and spread of the data. That is where we use the Box Plot. Let us draw a Box Plot to understand the data. Five teams in a factory produce homogeneous units. Their sampled cycle times are shown below.
  • 14. Box Plot: Step 1 – Set up the table as seen here. We already know how to calculate the Minimum and Maximum values. Step 2 – Calculate the Median and the Quartile values using these formulas: Median: =MEDIAN(Data range); 1st Quartile: =PERCENTILE(Data range, 25%); 3rd Quartile: =PERCENTILE(Data range, 75%).
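The same summary can be computed in Python. Excel's PERCENTILE (PERCENTILE.INC in newer versions) interpolates linearly between sorted values at position p × (n − 1), counting from 0; the sketch re-implements that rule by hand so the quartiles should line up with the spreadsheet. The function names and the sample cycle times are hypothetical.

```python
import math


def percentile_inc(values, p):
    """Excel PERCENTILE / PERCENTILE.INC: linear interpolation between ranks."""
    data = sorted(values)
    rank = p * (len(data) - 1)             # 0-based position of the percentile
    lower = math.floor(rank)
    upper = min(lower + 1, len(data) - 1)  # stay inside the list when p = 1
    return data[lower] + (rank - lower) * (data[upper] - data[lower])


def five_number_summary(values):
    """Minimum, quartiles and maximum as laid out in the Summary Range table."""
    return {
        "Minimum": min(values),
        "1st Quartile": percentile_inc(values, 0.25),
        "Median": percentile_inc(values, 0.50),   # matches MEDIAN()
        "3rd Quartile": percentile_inc(values, 0.75),
        "Maximum": max(values),
    }


team = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical cycle times for one team
print(five_number_summary(team))
```

For the values 1 to 10 this gives a 1st Quartile of 3.25, a Median of 5.5 and a 3rd Quartile of 7.75, the same values =PERCENTILE(...,25%), =MEDIAN(...) and =PERCENTILE(...,75%) return for that range.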
  • 15. Box Plot: Step 3 – Although you have prepared the basic data, we are not ready to draw the Box Plot yet. We need to prepare a second table, the one shown here. Step 4 – In the row titled Series 1, fetch the minimum value for each team. In the row titled Series 2, subtract the Minimum value from the 1st Quartile value, both taken from the Summary Range table.
  • 16. Box Plot: Step 5 – In the row titled Series 3, subtract the 1st Quartile value from the Median. In the row titled Series 4, subtract the Median from the 3rd Quartile value. In the row titled Series 5, subtract the 3rd Quartile value from the Maximum value. Now let us draw the Box Plot.
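The Series rows are successive differences of the five-number summary, so stacking Series 1 to Series 4 reaches the 3rd Quartile and adding Series 5 reaches the Maximum. A minimal sketch with a hypothetical function name; the comments describe how the following slides use each series.

```python
def box_plot_series(minimum, q1, median, q3, maximum):
    """Heights of the five series used to build a box plot from stacked columns."""
    return {
        "Series 1": minimum,        # base bar, later hidden (No fill, no border)
        "Series 2": q1 - minimum,   # hidden bar; its Minus 100% error bar is the lower whisker
        "Series 3": median - q1,    # lower half of the box
        "Series 4": q3 - median,    # upper half of the box
        "Series 5": maximum - q3,   # custom Plus error bar on Series 4 (upper whisker)
    }


series = box_plot_series(1, 3.25, 5.5, 7.75, 10)   # hypothetical summary values
print(series)
```

Keeping the chart table as differences is what lets a stacked column chart place each box edge at the right height.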
  • 17. Box Plot: Step 6 – Select the data from Series 1 to Series 4. Don't select Series 5 yet; we will use it later. Select 2D Column – Stacked Column chart.
  • 18. Box Plot: Step 7 – The chart is obviously not a finished Box Plot yet, so we need to work around a few things in Excel. First, hide Series 1 in the chart: right-click Series 1 on the chart, click Format Data Series, click Fill and select No fill, then click Border Color and select No color. See how the blue bars for Series 1 disappear.
  • 19. Box Plot: Step 8 – Repeat Step 7 from the previous slide for Series 2, but leave Series 2 selected. Step 9 – Now define the whiskers: click Layout, click Error Bars and then click More Error Bars Options. Step 10 – In the dialog box that opens, select Minus for Direction and change the Percentage to 100.
  • 20. Box Plot: Step 11 – After Steps 9 and 10, the chart changes shape to what is seen here. Take a look at the chart. Step 12 – Repeat Steps 9 and 10 for Series 4, with one small change: in More Error Bars Options, set the Direction to Plus. You will see that the lower and upper whiskers are now defined.
  • 21. Box Plot
  • 22. Box Plot: Step 13 – Oops, something went wrong with the chart here: we have not defined the Maximum values. Step 14 – Click the lines at the top, click Layout, then Error Bars and More Error Bars Options, and in the window that opens select Custom and Specify Value. Select the maximum values from the chart data table, i.e. Series 5.
  • 23. Box Plot: Step 15 – The Box Plot is now ready and we can start interpreting it. Making this Box Plot took some time, but it is a one-time effort: once you can construct it, you can reuse it as a Box Plot template. Box Plot Interpretation: 1. The median cycle time for Team C seems the lowest, at approximately 20 minutes. 2. Team A shows the greatest spread in the data. 3. Team A's data is also heavily skewed. 4. Team E seems to have a good share of its data at the lower end of the cycle time.
  • 24. Box Plot: 1. A Box Plot does not confirm anything, so it is not a confirmatory data analysis tool. 2. Because a Box Plot shows the central tendency, spread and shape of the data, you can use this EDA tool almost anywhere you have stratified data. 3. You can also use it when you have just one sample of data and wish to study the properties of that sample.
  • 25. Median Polish
  • 26. Median Polish: In inferential statistics, Analysis of Variance (ANOVA) is a hypothesis testing method that fits an additive model to a two-way design and identifies data patterns not explained by the row and column effects. Median Polish does something similar, except that it uses medians instead of means. A company wishes to conduct a Median Polish on the percentage scores achieved by students in each course of an IT institution. Table 1
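To make the comparison concrete, both methods can be written as the same additive model for the score in row i and column j. The notation (m for the overall effect, a_i for row effects, b_j for column effects, e_ij for residuals) is an assumption of this sketch, not the deck's: ANOVA estimates the effects from means of a complete two-way table, while Median Polish sweeps out medians until each row and each column of residuals has a median of roughly zero.

```latex
\begin{aligned}
y_{ij} &= m + a_i + b_j + e_{ij} \\
\text{ANOVA (means):}\quad \hat m &= \bar y_{\cdot\cdot}, \quad
  \hat a_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}, \quad
  \hat b_j = \bar y_{\cdot j} - \bar y_{\cdot\cdot} \\
\text{Median Polish (medians):}\quad \operatorname{med}_j\, e_{ij} &\approx 0, \quad
  \operatorname{med}_i\, e_{ij} \approx 0, \quad
  \operatorname{med}_i\, a_i \approx \operatorname{med}_j\, b_j \approx 0
\end{aligned}
```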
  • 27. Median Polish: Step 1 – First find the median of each course's scores and take the difference between each individual mean performance score and that median. This is known as the 1st sweep. Step 2 – Now do the 2nd sweep: from the values of Table 1, subtract the medians in the last row of Table 2 and the row medians in the last column of Table 2 (both highlighted). For the column median, take the difference between the 2nd-sweep value of any cell and the corresponding cell in the 1st sweep. Table 2 Table 3
  • 28. Median Polish: Step 3 – Now do the 3rd sweep. Subtract the row values obtained in Table 3 from the row medians, and identify the new column medians within the 3rd sweep itself. The new row medians = Change Median – Median from Table 3. Table 4 Step 4 – Time for the 4th sweep. Subtract all the row values in Table 4 from the 3rd-sweep column median; this gives the row values for the new table we will construct. Also add the Column Median value to the 3rd-sweep Column Median.
  • 29. Median Polish: Table 5
  • 30. Median Polish: Table 6 – Final Residual Table
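The sweeps on slides 27 to 30 are a hand-worked version of Tukey's median polish. The sketch below uses the common iterative formulation (alternate row and column sweeps, moving the median of the effects into the overall term each time, in the style of R's medpolish()); its bookkeeping is not guaranteed to match the deck's Tables 2 to 6 cell for cell, and the function name, iteration limit and toy table are assumptions.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Fit y[i][j] = overall + row[i] + col[j] + resid[i][j] using medians."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for _ in range(max_iter):
        # Row sweep: move each row's median from the residuals into its row effect.
        row_meds = [median(row) for row in resid]
        for i, m in enumerate(row_meds):
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Keep the column effects centred by moving their median into the overall term.
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift

        # Column sweep: move each column's median into its column effect.
        col_meds = [median([resid[i][j] for i in range(n_rows)]) for j in range(n_cols)]
        for j, m in enumerate(col_meds):
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        # Keep the row effects centred the same way.
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift

        # Stop once a full pass no longer moves anything meaningful.
        if max(abs(m) for m in row_meds + col_meds) < tol:
            break
    return overall, row_eff, col_eff, resid


scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # perfectly additive toy table
overall, rows, cols, resid = median_polish(scores)
print(overall, rows, cols)   # 5.0 [-3.0, 0.0, 3.0] [-1.0, 0.0, 1.0]
```

Because every sweep only moves a median from the residuals into an effect, overall + row effect + column effect + residual always reproduces the original cell, which is a handy check on a hand-built Excel version.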
  • 31. Median Polish: Interpretations 1. The overall effect, i.e. the typical test score across all the courses, was 44.25%. 2. Students who take only the JAVA program score approximately 13 points lower than those who take .NET. 3. Now look at the column effects in the Residual table: students with 90% attendance outscore those with 70% attendance by 5 points.
  • 32. Median Polish: Final Notes 1. The tedious calculations shouldn't scare you away from this wonderful tool. 2. In a two-factor (2*2) design where one of the factors may be categorical, Median Polish comes in very handy for establishing relationships. 3. Because Median Polish gives you residuals, you can also use it to predict what could happen in the future.
  • 33. Histogram
  • 34. Histogram: The histogram is another important EDA tool, which you can use when you wish to check the shape of the data. Importantly, a histogram will highlight issues in the data such as 1. modality issues, 2. skew issues and 3. mixed distribution issues. Let us go back to the cycle time data and plot the histogram with the help of Excel.
  • 35. Histogram: Step 1 – First calculate the descriptive statistics for all the teams. As the table shows, most of the formulas are basic, except for the ones shaded with a light amber background: IQR = 3rd Quartile – 1st Quartile; Bin width = 2 × IQR / Count^(1/3); Number of bins = (Maximum – Minimum) / Bin width.
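The bin width formula above is the Freedman-Diaconis rule. A minimal sketch, assuming Excel's PERCENTILE interpolation for the quartiles, a nonzero IQR, and that the number of bins is rounded up so the maximum is covered (the slide leaves the rounding unspecified); the function names and data are hypothetical.

```python
import math


def percentile_inc(values, p):
    """Excel PERCENTILE / PERCENTILE.INC: interpolate at position p * (n - 1)."""
    data = sorted(values)
    rank = p * (len(data) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (rank - lower) * (data[upper] - data[lower])


def freedman_diaconis(values):
    """Return IQR, bin width = 2 * IQR / n^(1/3), and the number of bins."""
    iqr = percentile_inc(values, 0.75) - percentile_inc(values, 0.25)
    width = 2 * iqr / len(values) ** (1 / 3)                  # assumes IQR > 0
    n_bins = math.ceil((max(values) - min(values)) / width)   # rounded up
    return iqr, width, n_bins


team = list(range(1, 11))    # hypothetical cycle times 1..10
iqr, width, n_bins = freedman_diaconis(team)
print(iqr, round(width, 3), n_bins)
```

Using the IQR rather than the standard deviation keeps the bin width from being inflated by a few extreme cycle times, which suits the resistant spirit of EDA.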
  • 36. Histogram: Step 2 – Now define the bins, starting with the minimum value. For Team A, for example, the first bin is 0.32, the next bin is 0.32 + bin width (7.26), the third bin adds another 7.26, and so on. Continue until you reach 7 bins.
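Building the bin column is a running addition from the minimum. In this sketch each bin is computed as minimum + k × width rather than by repeated addition, and rounded to two decimals to match a two-decimal display; both choices are assumptions, as is the function name. The 0.32 and 7.26 inputs are the Team A figures quoted on the slide.

```python
def bin_edges(minimum, width, n_bins):
    """Bin column for the Histogram tool: minimum, minimum + width, ..."""
    # minimum + k * width avoids the drift of adding the width over and over;
    # rounding to 2 decimals only mirrors how the sheet displays the bins.
    return [round(minimum + k * width, 2) for k in range(n_bins)]


team_a_bins = bin_edges(0.32, 7.26, 7)   # Team A figures quoted on the slide
print(team_a_bins)
```

For Team A this yields 0.32, 7.58, 14.84, 22.1, 29.36, 36.62 and 43.88.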
  • 37. Histogram: Step 3 – Let us first draw the histogram for one team's performance, e.g. Team A. Steps to draw a histogram: 1. Click Data, then Data Analysis (if this option is not available, enable the Analysis ToolPak add-in). 2. In the Data Analysis dialog, choose Histogram. 3. In the Input Range box, select the data for Team A. 4. In the Bin Range box, select the bin range for Team A. 5. Tick Chart Output and click OK.
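Behind the dialog, the Histogram tool counts how many values fall in each bin, treating each bin value as an upper boundary and adding a final More row for values above the last bin. The sketch below reproduces that counting rule as I understand it from Excel's documentation; treat it as an approximation of the ToolPak rather than its exact implementation. The function name and data are hypothetical.

```python
from bisect import bisect_left


def excel_histogram(values, bins):
    """Count values per bin the way the ToolPak Histogram reports them.

    Each bin value is an upper boundary: a value is counted in the first bin
    that is >= the value. Values above the last bin go into a "More" row.
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)        # one slot per bin plus "More"
    for x in values:
        counts[bisect_left(bins, x)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))


cycle_times = [1, 2, 3, 4, 5, 7]          # hypothetical data
for label, count in excel_histogram(cycle_times, [2, 4, 6]):
    print(label, count)
```

Because the first bin value is the minimum, the first row counts only the values equal to the minimum; that is a side effect of starting the bins at the minimum, not an error in the counting.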
  • 38. Histogram: We got this cleaner-looking histogram by reducing the Gap Width to 0% on the chart.
  • 39. Histogram: Interpretations 1. Bimodality is observed at 7.53 and 56. Is this due to an external issue? 2. If the bimodality were resolved, we would get close to a well-behaved, single-peaked distribution, so what is the reason for it? 3. It could be a difference in suppliers, in changeovers or in raw materials. Anything else?
  • 40. Rootogram: Interpretations 1. Here we introduce a new tool. Instead of plotting the frequencies on the vertical axis, take the square root of each frequency and plot that; the result is known as a Rootogram. 2. The x-axis is the response variable instead of the bins used in a histogram.
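Converting a histogram into a rootogram only changes the bar heights. A minimal sketch with hypothetical counts and a hypothetical function name; one common reason for the square root is that it evens out the variability of counts, so short bars and tall bars are easier to compare.

```python
import math


def rootogram_heights(frequencies):
    """Rootogram bar heights: the square root of each histogram frequency."""
    return [math.sqrt(f) for f in frequencies]


freqs = [1, 4, 9, 16, 9, 4]          # hypothetical histogram counts
print(rootogram_heights(freqs))      # [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
```

In Excel the same step is a helper column of =SQRT(frequency) charted in place of the raw frequencies.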
  • 41. Histogram: Based on the 4 histograms drawn for each of the teams, what can you infer? Which team's data distribution is close to being a normal distribution?
  • 42. Rootogram
  • 43. Scatter Plot
  • 44. Scatter Plot: In many projects we find that some x impacts y; in other words, y = f(x). A scatter plot lets you see visually whether there is a relationship between x and y. Let us use data for two variables, machine downtime and production capacity of a factory, to understand how a scatter plot works. Downtime is expressed in % and production capacity in tons.
  • 45. Scatter Plot: Step 1 – Select the data, click Insert, click Scatter and choose Scatter with only Markers. Step 2 – Voilà, you are done: there is the scatter chart, as seen here.
  • 46. Scatter Plot: Step 3 – Extending the chart to a regression equation. This is where an EDA tool can be used as an inferential statistics tool. Right-click any point in the chart and click Add Trendline. Select Linear, then tick Display Equation on chart and Display R-squared value on chart.
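A Linear trendline is an ordinary least-squares line, and the R-squared it displays is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes the same three numbers by hand; the downtime and capacity values and the function name are hypothetical, and the 0.64 cut-off is the deck's own rule of thumb from slide 47.

```python
def linear_trendline(x, y):
    """Least-squares line y = slope * x + intercept and its R-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


downtime = [2, 4, 6, 8]          # hypothetical downtime, %
capacity = [100, 96, 92, 88]     # hypothetical production capacity, tons
slope, intercept, r2 = linear_trendline(downtime, capacity)
print(slope, intercept, r2)      # -2.0 104.0 1.0
strong = r2 > 0.64               # deck's rule of thumb, i.e. |r| > 0.8
```

With these perfectly linear toy values the fit is exact, so R-squared is 1.0; real downtime data will scatter around the line and give a smaller value.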
  • 47. Scatter Plot: Step 4 – Interpretation. The scatter chart already showed visually that there is no strong correlation between downtime and production capacity; the regression statistics simply confirm it. The R-squared value needs to be greater than 0.64 (equivalently, |r| > 0.8) for us to conclude a strong correlation.
  • 48. Final Notes: 1. This module covers most of the tools used in exploratory data analysis. 2. Some other tools are: a. Parallel Coordinates b. Run Charts c. Odds Ratio d. Principal Components Analysis e. Ordination. Please write to us at vishy@theschoolofci.org if you have questions about using EDA tools, and follow The School of Continuous Improvement on LinkedIn.
  • 49. Thank you.