Dr. C.V. Suresh Babu
(Centre for Knowledge Transfer) institute
DISCUSSION TOPICS
• Data Analytics Lifecycle
• Importance of Data Analytics Lifecycle
• Phase 1: Discovery
• Phase 2: Data Preparation
• Phase 3: Model Planning
• Phase 4: Model Building
• Phase 5: Communicate Results
• Phase 6: Operationalize
• Data Analytics Lifecycle Example
DATA ANALYTICS LIFECYCLE
• The Data Analytics Lifecycle is designed for Big Data problems
and data science projects.
• The cycle is iterative, reflecting how real projects move back and forth between phases.
• To address the distinct requirements of performing analysis
on Big Data, a step-by-step methodology is needed to
organize the activities and tasks involved in:
  • acquiring,
  • processing,
  • analyzing, and
  • repurposing data.
IMPORTANCE OF DATA ANALYTICS LIFECYCLE
• The Data Analytics Lifecycle defines the roadmap of how data is generated, collected,
processed, used, and analyzed to achieve business goals.
• It offers a systematic way to manage data and convert it into information that
can be used to fulfil organizational and project goals.
• The process provides the direction and methods needed to extract information from the
data in a way that serves those goals.
• Because the lifecycle is circular, data professionals can move through it
in either the forward or the backward direction.
• Based on newly received insights, they can decide whether to proceed with
their existing research or scrap it and redo the complete analysis.
• The Data Analytics Lifecycle guides them throughout this process.
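The forward and backward movement through the cycle can be sketched in a few lines of Python. This is only an illustration: the `Phase` enum, the `next_phase` helper, and the single `ready` flag are hypothetical names introduced here, not part of the lifecycle definition.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six lifecycle phases, in their nominal order."""
    DISCOVERY = 1
    DATA_PREPARATION = 2
    MODEL_PLANNING = 3
    MODEL_BUILDING = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


def next_phase(current: Phase, ready: bool) -> Phase:
    """Move forward when the team is ready, otherwise step back one phase.

    The `ready` flag is a hypothetical stand-in for the team's judgement;
    a real project may jump back more than one phase at a time.
    """
    if ready:
        return Phase(min(current + 1, Phase.OPERATIONALIZE))
    return Phase(max(current - 1, Phase.DISCOVERY))
```

For example, a team in model planning that finds its data inadequate would call `next_phase(Phase.MODEL_PLANNING, ready=False)` and return to data preparation.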
PHASE 1: DISCOVERY
• The data science team learns about and investigates the problem.
• The team develops context and understanding.
• The team identifies the data sources needed and available for the project.
• The team formulates initial hypotheses that can later be tested with data.
PHASE 2: DATA PREPARATION
• This phase covers the steps to explore, preprocess, and condition data prior to modeling and analysis.
• It requires an analytic sandbox; the team performs extract, load, and
transform (ELT) to get data into the sandbox.
• Data preparation tasks are likely to be performed multiple times and not in a
predefined order.
• Tools commonly used in this phase include Hadoop, Alpine Miner,
OpenRefine, etc.
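As a rough illustration of what "conditioning" data can mean, the sketch below cleans a handful of sales records using only the Python standard library. The records, the field names, and the choice to fill missing prices with the median are all assumptions made for this example; real projects would pick rules that fit their own data.

```python
from statistics import median

# Hypothetical raw sales records, as they might arrive in the sandbox.
raw_rows = [
    {"store": " S01 ", "price": "19.99", "units": "3"},
    {"store": "S02", "price": "", "units": "5"},         # missing price
    {"store": "S01", "price": "21.50", "units": "abc"},  # bad unit count
    {"store": "s03", "price": "18.00", "units": "2"},
]


def to_float(text):
    """Return the value as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def condition(rows):
    """Normalize store codes, drop rows with unusable unit counts, and
    fill missing prices with the median of the known prices."""
    known_prices = [p for p in (to_float(r["price"]) for r in rows) if p is not None]
    fill_price = median(known_prices)
    cleaned = []
    for row in rows:
        if not row["units"].strip().isdigit():
            continue  # unit count is unusable, so skip the row
        price = to_float(row["price"])
        cleaned.append({
            "store": row["store"].strip().upper(),
            "price": fill_price if price is None else price,
            "units": int(row["units"]),
        })
    return cleaned


clean_rows = condition(raw_rows)
```

Because preparation tasks tend to be repeated, keeping the rules in one function such as `condition` makes it easy to rerun them whenever new data lands in the sandbox.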
PHASE 3: MODEL PLANNING
• The team explores the data to learn about the relationships between variables and
subsequently selects the key variables and the most suitable models.
• The data sets for training, testing, and production purposes are developed, and the
models are built and executed, in the next phase, based on the work done in this model planning
phase.
• Tools commonly used in this phase include MATLAB and STATISTICA.
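One simple way to explore relationships between variables is to rank candidate variables by their correlation with the target. The sketch below computes the Pearson correlation by hand; the variable names and values are made up for illustration, and correlation is only one of several possible selection criteria.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical candidate variables and the target (units sold).
candidates = {
    "price": [10, 12, 14, 16, 18],
    "shelf_position": [3, 1, 2, 3, 1],
}
units_sold = [50, 44, 41, 33, 30]

# Rank variables by the strength of their linear relationship with the target.
ranked = sorted(
    candidates,
    key=lambda name: abs(pearson(candidates[name], units_sold)),
    reverse=True,
)
```

In this toy data, `price` ranks first because units sold fall almost linearly as price rises, which would make it a natural key variable to carry into model building.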
PHASE 4: MODEL BUILDING
• The team develops data sets for testing, training, and production purposes.
• The team also considers whether its existing tools will suffice for running the models
or whether it needs a more robust environment for executing them.
• Free or open-source tools: R and PL/R, Octave, WEKA.
• Commercial tools: MATLAB, STATISTICA.
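Splitting data into training and testing sets and then fitting a model might look like the sketch below. It assumes a simple least-squares line as the model and uses synthetic (price, units sold) data; neither choice comes from the slides, and the 75/25 split is just a common convention.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Average absolute gap between predicted and observed y values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# Synthetic (price, units sold) observations: demand falls as price rises.
rng = random.Random(0)
data = [(p, 100 - 3 * p + rng.uniform(-2, 2)) for p in range(5, 25)]

rng.shuffle(data)
train, test = data[:15], data[15:]  # 75% for training, 25% held out

model = fit_line(train)
test_error = mean_abs_error(model, test)
```

Measuring the error on the held-out `test` set, rather than on the data used for fitting, gives a more honest picture of how the model may behave on new data.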
PHASE 5: COMMUNICATE RESULTS
• After executing the model, the team needs to compare the outcomes of the modeling to the criteria
established for success and failure.
• The team considers how best to articulate findings and outcomes to the various team
members and stakeholders, taking into account caveats and assumptions.
• The team should identify key findings, quantify the business value, and develop a narrative
to summarize and convey the findings to stakeholders.
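Comparing outcomes against pre-agreed criteria can be made explicit with a small check like the one below. The metric names, thresholds, and outcome values are hypothetical; the point is only that criteria set in advance turn "success" into something that can be checked mechanically.

```python
# Hypothetical success criteria agreed on during discovery.
criteria = {
    "mean_abs_error": ("max", 2.5),    # must not exceed this value
    "revenue_lift_pct": ("min", 3.0),  # must reach at least this value
}

# Hypothetical outcomes measured after running the model.
outcomes = {"mean_abs_error": 1.8, "revenue_lift_pct": 2.4}


def evaluate(outcomes, criteria):
    """Return {metric: True/False} for each success criterion."""
    results = {}
    for metric, (kind, threshold) in criteria.items():
        value = outcomes[metric]
        results[metric] = value <= threshold if kind == "max" else value >= threshold
    return results


report = evaluate(outcomes, criteria)
for metric, passed in report.items():
    print(f"{metric}: {outcomes[metric]} ({'met' if passed else 'not met'})")
```

A report like this gives the narrative for stakeholders a factual backbone: which criteria were met, which were not, and by how much.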
PHASE 6: OPERATIONALIZE
• The team communicates the benefits of the project more broadly and sets up a pilot
project to deploy the work in a controlled way before broadening it to the full
enterprise of users.
• This approach enables the team to learn about the performance and related constraints
of the model in a production environment on a small scale, and to make adjustments
before full deployment.
• The team delivers final reports, briefings, and code.
• Free or open-source tools: Octave, WEKA, SQL, MADlib.
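A pilot can be as simple as picking a small random subset of users or sites and gating the wider rollout on how the model performs there. The sketch below assumes hypothetical outlet identifiers, a 5% pilot fraction, and a mean-error threshold; all three are illustrative choices rather than recommendations.

```python
import random


def pick_pilot_outlets(outlet_ids, fraction, seed=0):
    """Randomly choose a fixed fraction of outlets for the pilot."""
    rng = random.Random(seed)
    k = max(1, round(len(outlet_ids) * fraction))
    return sorted(rng.sample(outlet_ids, k))


def ready_for_full_rollout(pilot_errors, max_mean_error):
    """Widen the deployment only if the pilot's mean error is acceptable."""
    return sum(pilot_errors) / len(pilot_errors) <= max_mean_error


# Hypothetical outlet identifiers for a chain with 200 outlets.
outlets = [f"outlet-{i:03d}" for i in range(1, 201)]
pilot = pick_pilot_outlets(outlets, fraction=0.05)
```

Fixing the random seed keeps the pilot group reproducible, which helps when the team later has to explain which outlets were included and why.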
DATA ANALYTICS LIFECYCLE EXAMPLE
• Consider a retail store chain that wants to optimize its product prices to boost its
revenue.
• The chain has thousands of products across hundreds of outlets, making this a highly complex scenario.
• Once you have identified the chain’s objective, you find the data you need, prepare it, and go through the
Data Analytics Lifecycle process.
• You observe different types of customers, such as ordinary customers and customers like contractors who
buy in bulk.
• You suspect that treating the various types of customers differently could lead to the solution.
• However, you don’t have enough information about this and need to discuss it with the client team.
• In this case, you need to agree on the customer-type definitions, find the data, and conduct hypothesis testing to check whether
customer type affects the model results.
• Once you are convinced by the model results, you can deploy the model, integrate it into the business,
and roll out the prices you consider optimal across the chain’s outlets.
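The hypothesis test in this example could take many forms. As one possibility, the sketch below uses a two-sided permutation test to check whether two customer types differ in their mean response to price. The numbers are invented, and the permutation test is an assumption of this sketch, not a method named in the example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, rounds=5_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # randomly reassign the group labels
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(mean(a) - mean(b)) >= observed:
            extreme += 1
    return extreme / rounds


# Hypothetical price sensitivity per customer (percent change in units
# bought for a 1% price cut), for the two customer types.
ordinary = [1.1, 0.9, 1.3, 1.0, 1.2, 0.8, 1.1, 1.0]
contractors = [2.0, 2.4, 1.9, 2.2, 2.5, 2.1, 2.3, 1.8]

p_value = permutation_p_value(ordinary, contractors)
```

A small `p_value` would suggest that the difference between the groups is unlikely to be due to chance, which supports modeling the two customer types separately.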
