Epson has developed a toolkit to help users analyze data and make decisions. The toolkit outlines a 5-step process: 1) define the problem and data collection plan, 2) collect and clean the data, 3) interpret the data, 4) develop recommendations, and 5) monitor improvements. It also provides guidance on descriptive statistics, data relationships, grouping data, and identifying trends to analyze problems. The overall goal is to help users turn data into actionable insights and impactful decisions.
Data Analysis Toolkit_Final v1.0
1. Epson Data Analysis and Decision-Making Toolkit
An unwavering commitment to drive innovation and performance
EPSON INNOVATION ENGINE
Version 1.0 – January 2016
EAI Confidential
2. Data Analysis and Decision Making Process
The steps below show how to develop a plan to turn data into actionable or impactful results. Each step aligns with a problem-solving (DMAIC) phase, shown in parentheses.
Define Problem and Data Collection Plan (Define)
• Define the business problem
• Select exploratory or hypothesis-driven approach
• Define problem statement or hypothesis to test
• Identify sources of data / information to explore or test the hypothesis
• Think with the end in mind
Collect, Validate and Clean Data (Measure)
• Collect the data in the format needed for analysis
• Validate and test the data – make sure it is correct
• Clean data and format as required – most commonly as a flat file that can be used in Excel
Interpret the Data (Analyze)
• Interpret the data (e.g. whether it supports or does not support your hypothesis)
• This step may be iterative; however, it will be insightful
• Once confident in interpretation, brainstorm ways to improve the situation (further analysis may be needed)
Develop Recommendations or Make Decisions (Improve)
• Visually depict data to support your conclusions
• Create recommendations or alternatives if required
• Make decisions and proceed with experiment – fail fast and adjust
• Monitor any improvement by continually collecting and measuring data on the process or problem
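The validate-and-clean step above can be sketched in a few lines of Python. This is a minimal illustration only: the field names (region, units_sold), the sample values, and the validation rule are assumptions made for the example, not part of the toolkit.

```python
import csv
import io

# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"region": " West ", "units_sold": "120"},
    {"region": "East", "units_sold": "n/a"},  # fails validation below
    {"region": "west", "units_sold": "95"},
]


def clean(records):
    """Keep rows whose units_sold is a whole number and normalize region text."""
    cleaned = []
    for row in records:
        units = row["units_sold"].strip()
        if not units.isdigit():
            continue  # validation: skip values that are not non-negative integers
        cleaned.append({"region": row["region"].strip().title(), "units_sold": int(units)})
    return cleaned


def to_flat_file(rows):
    """Serialize rows as CSV text, a flat format that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["region", "units_sold"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


cleaned_rows = clean(raw_records)
flat_file_text = to_flat_file(cleaned_rows)
print(flat_file_text)
```

In practice the validation rules would come from the data collection plan defined in the first step; silently dropping rows, as done here, is only one possible policy.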
3. Discovery (Exploratory) vs Hypothesis-driven Research
A problem is a deviation from a standard or expectation.
[Figure: actual performance vs expected/desired performance, from past to present; the gap between the two is the deviation.]
Investigation approaches:
• Hypothesis-based method: “I have an idea, let me verify.”
  • Begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition.
• Discovery-based method: “I have no clue, let me explore.”
  • Finds patterns, associations, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by an organization.
Often, some preliminary research (discovery) is needed in order to create a hypothesis.
4. Hypothesis-Testing
1) Pain & Suffering
Symptoms
- Low energy
- Headaches
- Fever
Impact
- Can’t do my job
- Can’t exercise
- Can’t take care of family
2) Analysis & Interpretation
Hypotheses (potential causes)
- Cold?
- Flu?
- Tuberculosis?
3) Testing & Proof
Hypothesis testing to identify root causes
- Virus A
- Virus B
- Virus C
5. Hypothesis-Testing with Logic Trees
Steps:
1) State the problem
2) Generate hypotheses
3) Keep decomposing them
4) Prioritize them for analysis
Example logic tree:
How can Epson increase sales-force productivity?
• Epson can increase selling time as a proportion of total available time
  • Epson can transfer or outsource many non-value-added tasks (e.g. admin)
  • Epson can reduce or eliminate many non-value-added tasks (e.g. travel, error correction)
• Epson can increase sales volumes from available selling time
  • Epson can improve generation of sales leads
  • Epson can improve the proportion of leads converted to sales
    • Epson can improve the sales conversion skills of the sales force
    • Epson can provide the sales force with better tools for lead conversion
Postulate an overall hypothesis as to the solution, with the minimum efficient rationale to validate it.
Each hypothesis:
• Can be proven right or wrong
• Is not obvious
• Points directly to an action or actions you can take
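A logic tree like the one on this slide can be represented as a simple nested structure. The sketch below is one possible encoding, not something the toolkit prescribes; the Hypothesis class and its leaves method are hypothetical names introduced for illustration. Collecting the leaves gives the most specific, directly actionable hypotheses to prioritize for analysis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    """One node of a logic tree (hypothetical helper class for illustration)."""
    statement: str
    children: List["Hypothesis"] = field(default_factory=list)

    def leaves(self):
        """Return the statements of all nodes that are not decomposed further."""
        if not self.children:
            return [self.statement]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result


# The tree from the slide, encoded top-down.
tree = Hypothesis("How can Epson increase sales-force productivity?", [
    Hypothesis("Epson can increase selling time as a proportion of total available time", [
        Hypothesis("Epson can transfer or outsource many non-value-added tasks"),
        Hypothesis("Epson can reduce or eliminate many non-value-added tasks"),
    ]),
    Hypothesis("Epson can increase sales volumes from available selling time", [
        Hypothesis("Epson can improve generation of sales leads"),
        Hypothesis("Epson can improve the proportion of leads converted to sales", [
            Hypothesis("Epson can improve the sales conversion skills of the sales force"),
            Hypothesis("Epson can provide the sales force with better tools for lead conversion"),
        ]),
    ]),
])

actions = tree.leaves()
for statement in actions:
    print("-", statement)
```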
6. Our Core Data Analysis and Visualization Tools
Descriptive Statistics
• Summarize large amounts of data so that the main features of the data can be easily understood.
Data Collection Techniques
• Systematically gather information to be analyzed in order to develop a deeper understanding of an issue.
Data Distribution: Histogram & Pareto Chart
• Graphically displays data grouped into ranges or categories so that the frequency/quantity of each one can be better analyzed.
Data Relationships: Scatter Plot & Correlation
• Identifies and visually displays the relationship between two variables.
Data Grouping: 2x2 Matrix
• Categorizes items into a 2x2 matrix using two variables in order to clarify the desirability of options and simplify decision making.
Data Trends: Trend (Run) Charts
• Graphically displays data over time to identify process trends, cycles, changes/shifts, abnormalities, or problems.
7. Descriptive Statistics
Why use these tools?
To synthesize large amounts of data so that it can be presented in a quantitative, easy-to-understand manner (either numerically or graphically). The most popular descriptive statistics show the central tendency (mean, median, and mode) and the spread of the data (standard deviation and variance). Note that descriptive statistics only describe or summarize data; more advanced statistics are needed for hypothesis testing.
What results can you expect?
• Make data easier to understand and share
• Develop a deeper understanding of the issue(s) at hand
• Point the direction for further investigation and analysis
• Provide a basis for more advanced statistical analysis
Further Learning
• Creating Descriptive Statistics in Excel
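As a small illustration of the measures named on this slide, the sketch below computes them with Python's standard statistics module. The toolkit itself points to Excel; Python is used here only as an example, and the data values are hypothetical.

```python
import statistics

# Hypothetical sample: number of support tickets closed per day.
tickets_per_day = [4, 8, 8, 5, 10]

# Central tendency
mean_value = statistics.mean(tickets_per_day)
median_value = statistics.median(tickets_per_day)
mode_value = statistics.mode(tickets_per_day)

# Spread (sample variance and sample standard deviation)
variance_value = statistics.variance(tickets_per_day)
stdev_value = statistics.stdev(tickets_per_day)

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
print("sample variance:", variance_value)
print("sample standard deviation:", round(stdev_value, 2))
```

The sample forms (dividing by n - 1) are used here; statistics.pvariance and statistics.pstdev give the population forms when the data covers every case of interest.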
8. Data Collection Techniques
Why use this tool?
To systematically gather information in order to develop a deeper understanding of an issue and answer relevant questions. Typical data collection methods include observations (note taking and check sheets), surveys (questionnaires), interviews, and focus groups.
What results can you expect?
• Develop a better understanding of the issue from new perspectives
• Identify and validate beliefs
• Generate and test hypotheses
• Discover previously unknown factors
• Improve the probability of developing effective solutions
Further Learning
• Surveys
• Check Sheet
• Qualitative vs Quantitative Data
9. Data Distribution: Histogram
Why use this tool?
To graphically illustrate the distribution of numerical data by grouping the data into ranges (bins), with the frequency of each bin shown as a vertical bar. Histograms are frequently used when there is a large data set.
What results can you expect?
• Graphically display the distribution of a data set
• Quickly identify ranges with unusually high or low frequencies
• Point the direction for further research
• Communicate data to stakeholders in a simple format
Further Learning
• Dot Plots
• Box & Whisker Plots
• Comparing dot plots, histograms, and box plots
• Creating Histograms in Excel
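The binning behind a histogram can be sketched as follows. This is a minimal text-only illustration that assumes integer data and equal-width bins; the histogram function name and the sample values are hypothetical.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count integer values per [start, start + bin_width) range, keeping empty bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return {start: counts.get(start, 0) for start in range(first, last + bin_width, bin_width)}


# Hypothetical call-handling times in minutes.
call_minutes = [3, 7, 12, 14, 15, 18, 21, 22, 22, 38]
bins = histogram(call_minutes, bin_width=10)

# Print one row of '#' marks per bin; a chart would draw these as vertical bars.
for start, count in bins.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")
```

Choosing the bin width is a judgment call: bins that are too wide hide the shape of the distribution, while bins that are too narrow make it look noisy.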
10. Data Distribution: Pareto Chart
Why use this tool?
To graphically identify the key issues of a problem by following the 80/20 rule: 80% of the effects can often be attributed to 20% of the causes. The Pareto chart combines a bar chart and a line chart. The bars show the frequency of the items/events in descending order of magnitude, while the line shows the cumulative frequency.
What results can you expect?
• Identify the major issues that need to be addressed or further investigated
• Focus analysis where it will have the greatest impact
• Communicate data to stakeholders in a simple format
Further Learning
• Creating Pareto Charts in Excel
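The two series of a Pareto chart (bars in descending order and the cumulative line) can be computed as in the sketch below. The pareto_table function name and the defect counts are hypothetical, chosen only to illustrate the calculation.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows sorted by descending count."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


# Hypothetical defect counts by cause.
defect_counts = {
    "Scratches": 12,
    "Misalignment": 45,
    "Missing part": 8,
    "Wrong label": 30,
    "Other": 5,
}

pareto_rows = pareto_table(defect_counts)
for category, count, cumulative in pareto_rows:
    print(f"{category:<14}{count:>4}{cumulative:>8}%")
```

In this made-up data set the top two of five causes account for 75% of the defects, which is the kind of concentration the chart is meant to expose.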
11. Data Relationships: Scatter Plot (Diagram)
Why use this tool?
To graphically display the relationship between two variables. Each variable is plotted on one axis of an XY plot. If the variables are correlated (i.e. a relationship exists), the points will form a pattern. The shape of the pattern indicates the type of relationship between the variables. Better-defined patterns indicate stronger relationships.
What results can you expect?
• Indicate the type and strength of relationship between two variables
• Eliminate unimportant variables from further analysis
• Communicate data to stakeholders in a simple format
Further Learning
• Correlation Analysis
• Regression Analysis
• Creating Scatter Plots in Excel
[Figure: example scatter plot, Vehicle Price vs Age]
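The strength of a linear relationship seen in a scatter plot is commonly summarized by the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it from first principles; the pearson_r function name and the vehicle age and price values are hypothetical and are not the data behind the slide's chart.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(squares_x * squares_y)


# Hypothetical vehicle ages (years) and prices (thousands of dollars).
ages = [1, 2, 3, 4, 5]
prices = [20, 18, 15, 13, 10]

r = pearson_r(ages, prices)
print("correlation between age and price:", round(r, 3))
```

A value close to -1, as with this made-up data, corresponds to a tight downward-sloping pattern on the plot. Correlation alone does not show that one variable causes the other.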
12. Data Grouping: 2x2 Matrix
Why use this tool?
To categorize items using two data variables in order to clarify the options and simplify decision making. Generally, the matrix is structured so that the least desirable options fall into the lower left quadrant and the most desirable options fall into the upper right quadrant.
What results can you expect?
• Rapidly sort options into categories to facilitate decision making
• Organize data into memorable categories or groups
• Assess the situation using more than a single variable
• Communicate data to stakeholders in a simple format
Further Learning
• Cluster Analysis
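Sorting options into a 2x2 matrix amounts to comparing each option's two scores against a cut-off per axis. The sketch below assumes two hypothetical variables (ease of implementation on the horizontal axis, business impact on the vertical axis), hypothetical 1-10 scores, and a cut-off of 5 on both axes.

```python
def quadrant(x, y, x_cut, y_cut):
    """Name the quadrant for a point, given one cut-off per axis."""
    horizontal = "right" if x >= x_cut else "left"
    vertical = "upper" if y >= y_cut else "lower"
    return f"{vertical} {horizontal}"


# Hypothetical options scored 1-10 as (ease of implementation, business impact).
options = {
    "Automate reporting": (8, 9),
    "Redesign intake form": (7, 3),
    "Replace legacy system": (2, 8),
    "Rename shared folders": (3, 2),
}

placement = {name: quadrant(ease, impact, x_cut=5, y_cut=5)
             for name, (ease, impact) in options.items()}

for name, cell in placement.items():
    print(f"{name:<24}{cell}")
```

With both axes oriented so that higher is better, the upper right quadrant holds the most desirable options and the lower left the least desirable, matching the layout described on the slide.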
13. Data Trends: Trend (Run) Charts
Why use these tools?
To graphically display time series data (data sequenced over time). The horizontal axis displays time and the vertical axis displays the values of the data. Trend/run charts are often used to identify process trends, cycles, changes/shifts, abnormalities, or problems.
What results can you expect?
• Identify trends, changes, or abnormalities with processes
• Increase the understanding of processes
• Determine if a process change resulted in improved process performance
• Determine if improved performance has been maintained
• Monitor and compare processes
Further Learning
• Statistical Process Control
• Control Charts
• Creating Trend/Run Charts in Excel
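One way to look for a change/shift in run-chart data is to use the median as a center line and find the longest run of consecutive points on one side of it. The sketch below does that; the function name and data are hypothetical, and the threshold of six points is a commonly cited rule of thumb rather than something the toolkit specifies.

```python
import statistics


def longest_run_one_side(values):
    """Return the median and the longest run of consecutive points on one side of it."""
    center = statistics.median(values)
    longest = 0
    current = 0
    previous_side = 0
    for value in values:
        side = (value > center) - (value < center)  # 1 above, -1 below, 0 on the median
        if side == 0:
            continue  # points on the median neither extend nor break a run
        if side == previous_side:
            current += 1
        else:
            current = 1
            previous_side = side
        longest = max(longest, current)
    return center, longest


# Hypothetical weekly defect counts; a process change is assumed after week 6.
weekly_defects = [12, 11, 13, 12, 14, 13, 8, 7, 9, 8, 7, 8]

center_line, longest_run = longest_run_one_side(weekly_defects)
SHIFT_THRESHOLD = 6  # assumed rule of thumb, not taken from the toolkit
print("median:", center_line)
print("longest run on one side:", longest_run)
print("possible shift:", longest_run >= SHIFT_THRESHOLD)
```

A run this long on one side of the median suggests the process level changed, which is the kind of signal to confirm by continuing to collect and plot data.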
14. Sources
Topic Link
Descriptive Statistics
https://www.youtube.com/watch?v=Mpl_v96dlfg
https://www.khanacademy.org/math/probability/descriptive-statistics
https://www.youtube.com/watch?v=MhDH9jsyzBA
Data Collection Techniques
http://www.sciencebuddies.org/science-fair-projects/project_ideas/Soc_survey.shtml
http://qualityamerica.com/LSS-Knowledge-Center/qualityimprovementtools/check_sheets.php
http://regentsprep.org/regents/math/algebra/ad1/qualquant.htm
http://blog.socialcops.com/resources/4-data-collection-techniques-ones-right
https://www.youtube.com/watch?v=B2nmh_kEF98
Data Distribution: Histogram
https://www.khanacademy.org/math/cc-sixth-grade-math/cc-6th-data-statistics/dot-plot/v/frequency-tables-and-dot-plots
https://www.khanacademy.org/math/probability/descriptive-statistics/box-and-whisker-plots/v/reading-box-and-whisker-plots
https://www.khanacademy.org/math/cc-sixth-grade-math/cc-6th-data-statistics/cc-7th-compare-data-displays/v/comparing-dot-plots-histograms-and-box-plots
https://www.youtube.com/watch?v=YYRkWKJIc9k
https://www.moresteam.com/toolbox/histogram.cfm
https://www.youtube.com/watch?v=gSEYtAjuZ-Y
Data Distribution: Pareto Chart
https://www.youtube.com/watch?v=i_XZzady-dQ
http://asq.org/learn-about-quality/cause-analysis-tools/overview/pareto.html
https://www.youtube.com/watch?v=GVGdtlnZ7xM
15. Sources
Topic Link
Data Relationships: Scatter Plot (Diagram)
https://explorable.com/statistical-correlation
https://www.moresteam.com/toolbox/regression-analysis.cfm
https://www.youtube.com/watch?v=uvJNfRmfAys
http://asq.org/learn-about-quality/cause-analysis-tools/overview/scatter.html
https://www.youtube.com/watch?v=CWnfwZRAuaY
Data Grouping: 2x2 Matrix
https://www.youtube.com/watch?v=zqKFH7WNmfE
https://www.youtube.com/watch?v=PLr3CT79pSc
Data Trends: Trend (Run) Charts
https://www.moresteam.com/toolbox/statistical-process-control-spc.cfm
http://asq.org/learn-about-quality/data-collection-analysis-tools/overview/control-chart.html
https://www.youtube.com/watch?v=jWlM9z8iFZI
https://www.moresteam.com/toolbox/trend-chart.cfm
https://www.youtube.com/watch?v=YQd1QoMHYwU