9. • Analytics Life Cycle
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
11. • Business understanding – What does
the business need?
• Data understanding – What data do
we have / need? Is it clean?
• Data preparation – How do we
organize the data for modeling?
12. • Modeling – What modeling techniques
should we apply?
• Evaluation – Which model best meets
the business objectives?
• Deployment – How do stakeholders
access the results?
15. • Ensures that the project is aligned with
the business objectives
• Helps identify the data sources relevant
to the problem, saving time and
resources later
• Establishes the success criteria for the
project
20. Data Understanding: a phase that involves
gaining familiarity with the data available
for analysis
21. • Data Collection - involves understanding
the sources, formats, and access methods
• Data Description - understanding its
structure and composition
• Data Exploration - performing initial
exploratory data analysis
22. • Verify Data Quality - assessing the quality
of the data (for example, missing or
inconsistent values)
• Initial Insights and Hypotheses
Generation - starting point for further
analysis
• Documentation - ensures that the insights
gained are captured
23. The primary objective of the Data Understanding
phase is to gain a comprehensive understanding
of the data available for analysis. This
understanding informs subsequent phases of the
CRISP-DM methodology, particularly data
preparation and modeling.
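The description, exploration, and quality-verification tasks can be sketched in a few lines of Python. This is only an illustration using the standard library: the `records` sample, its column names, and the helper functions are hypothetical and not part of CRISP-DM.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "region": "north"},
    {"customer_id": 2, "age": None, "region": "south"},
    {"customer_id": 3, "age": 51, "region": "north"},
    {"customer_id": 3, "age": 51, "region": "north"},  # exact duplicate
]

def describe(rows):
    """Data description: row count and column names."""
    columns = sorted({key for row in rows for key in row})
    return {"rows": len(rows), "columns": columns}

def explore(rows, numeric_column, category_column):
    """Data exploration: mean of a numeric column, counts of a category."""
    values = [row[numeric_column] for row in rows
              if row[numeric_column] is not None]
    categories = Counter(row[category_column] for row in rows)
    return {"mean": mean(values), "categories": dict(categories)}

def missing_counts(rows):
    """Data quality: number of missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] += 1
    return dict(counts)

def duplicate_count(rows):
    """Data quality: number of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            duplicates += 1
        seen.add(fingerprint)
    return duplicates

summary = describe(records)
profile = explore(records, "age", "region")
quality = {"missing": missing_counts(records),
           "duplicates": duplicate_count(records)}
```

Here `quality` flags one missing `age` and one duplicated row, which are the kinds of findings that feed into data preparation.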
25. Data preparation is an important
step in data analytics. It aims at
assessing and improving the
quality of data for secondary
statistical analysis. With this, the
data is better understood and
the data analysis is performed
more accurately and efficiently.
26. TASKS FOR DATA PREPARATION
• Data Cleaning - This step deals with missing data, noise,
and outliers, and corrects inconsistencies in the data,
making sure that it is accurate and correct.
• Data Integration - It is a process of combining data
derived from various data sources (such as databases,
flat files, etc.) into a consistent data set for both
operational and analytical use.
• Data Transformation - It aims to transform the data
values into a format, scale, or unit that is more
suitable for analysis.
• Data Reduction - Data reduction is a process of obtaining
a reduced representation of the data set that is much
smaller in volume yet produces the same (or almost the
same) analytical results.
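The four tasks can be chained as a small pipeline. The sketch below is one possible illustration in standard-library Python: the `customers` and `orders` data, the median fill, the min-max scaling, and the column selection are all assumed choices, and each task has many other valid techniques.

```python
from statistics import median

# Hypothetical inputs from two separate sources; None marks a missing value.
customers = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 51}]
orders = [{"id": 1, "total": 120.0}, {"id": 2, "total": 80.0},
          {"id": 3, "total": 200.0}]

def clean(rows, column):
    """Data cleaning: fill missing values with the column median."""
    known = [row[column] for row in rows if row[column] is not None]
    fill = median(known)
    return [{**row, column: fill if row[column] is None else row[column]}
            for row in rows]

def integrate(left, right, key):
    """Data integration: inner join of two sources on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

def transform(rows, column):
    """Data transformation: min-max scale a column to the range [0, 1]."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid division by zero for constant columns
    return [{**row, column: (row[column] - low) / span} for row in rows]

def reduce_columns(rows, keep):
    """Data reduction (simplest form): keep only the columns needed."""
    return [{column: row[column] for column in keep} for row in rows]

cleaned = clean(customers, "age")
merged = integrate(cleaned, orders, "id")
scaled = transform(merged, "total")
prepared = reduce_columns(scaled, ["age", "total"])
```

Each function returns new rows instead of modifying its input, so the intermediate results (`cleaned`, `merged`, `scaled`) stay available for inspection.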
29. FOUR TASKS:
• Selecting modeling techniques
• Generating test design
• Building model(s)
• Assessing model(s)
30. As the first step in modeling, select the actual
modeling technique that is to be used initially.
During the business understanding stage, you may
already have picked a tool.
31. Deliverables for this task include two reports:
• Modeling technique: refers to the actual
modeling technique that is used.
• Modeling assumptions: specific assumptions
about the data, data quality or the data format.
32. Prior to building a model, a procedure
needs to be defined to test the model’s
quality and validity.
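One common test procedure is a holdout split: part of the prepared data is set aside and used only to test the model. The sketch below assumes a 25% test fraction and a fixed seed; the function name and its defaults are illustrative, not prescribed by CRISP-DM.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows reproducibly, then split off a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: 20 row identifiers.
rows = list(range(20))
train, test = train_test_split(rows)
```

Fixing the seed means every candidate model is tested on the same held-out rows, which keeps later comparisons fair.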
34. Run the modelling tool on the prepared dataset to create one
or more models.
• Parameter settings – With any modelling tool there are often a
large number of parameters that can be adjusted.
• Models – These are the actual models produced by the modelling
tool, not a report on the models.
• Model descriptions – Describe the resulting models, report on
the interpretation of the models and document any difficulties
encountered with their meanings.
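The three outputs (parameter settings, models, model descriptions) can be kept together for each run. As a sketch only, the example below uses a one-feature k-nearest-neighbours regressor as a stand-in modeling technique; the `train` data and the choice of `k` values are hypothetical.

```python
def build_knn(train, k):
    """Build a 1-D k-nearest-neighbours regressor and document the run."""
    def predict(x):
        # Average the targets of the k training points closest to x.
        nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
        return sum(target for _, target in nearest) / len(nearest)

    return {
        "parameters": {"k": k},   # parameter settings used for this run
        "model": predict,         # the actual model, not a report on it
        "description": f"Average of the {k} nearest training targets",
    }

# Hypothetical prepared dataset of (feature, target) pairs.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# One model per parameter setting.
models = [build_knn(train, k) for k in (1, 3)]
```

Storing the settings next to each model makes it possible to reproduce a run and to explain why one model behaves differently from another.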
35. At this stage you should rank the models and assess
them according to the evaluation criteria.
In most data mining projects a single technique is
applied more than once and data mining results are
generated with several different techniques.
• Model assessment
• Revised parameter settings
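Ranking can be as simple as scoring every candidate on the same held-out data and sorting. The sketch below assumes mean absolute error as the evaluation criterion; the candidate models and `test_rows` are hypothetical placeholders, and a real project would use the criteria agreed during business understanding.

```python
from statistics import mean

def mean_absolute_error(model, rows):
    """Average absolute difference between predictions and targets."""
    return mean(abs(model(x) - y) for x, y in rows)

def rank_models(candidates, test_rows):
    """Score each model on the test rows and sort from best to worst."""
    scored = [(name, mean_absolute_error(model, test_rows))
              for name, model in candidates.items()]
    return sorted(scored, key=lambda item: item[1])  # lower error ranks first

# Hypothetical candidate models and held-out (feature, target) pairs.
candidates = {
    "constant": lambda x: 5.0,
    "double": lambda x: 2.0 * x,
    "double_plus_one": lambda x: 2.0 * x + 1.0,
}
test_rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.5)]

ranking = rank_models(candidates, test_rows)
best_name, best_error = ranking[0]
```

With this data the `double` model ranks first because it has the lowest error on the test rows; the lower-ranked models are candidates for revised parameter settings.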
46. Any of these:
• Improved project
outcomes
• Reduced risk
• Efficient use of
resources
• Improved
communication
47. Any of these:
• Identifying the Business
Problem
• Defining project objectives
• Determining success criteria
• Assessing project feasibility
• Identifying data sources