Methodology for
Analytics and
Solution Implementation
What is analytics?
Analytics is a broad, multidimensional
field that uses mathematics, statistics,
predictive modeling and machine learning
(a branch of artificial intelligence) to find
meaningful patterns and knowledge in
recorded data.
Rupak Roy
Why is it so important?
• Analytics helps a business be proactive
rather than merely retrospective.
• Analytics is not data entry, data
warehousing, ETL or software agents.
Analytics is an extension of statistics.
• It is very important to track and measure
your results over time.
Algorithms
• Machine learning and statistical
algorithms commonly used in analytics today include:
Random Forest,
Logistic Regression,
Linear Regression,
K-means Clustering,
Neural Networks,
Survival Analysis and many more.
How can we conduct a successful analysis?
• By following a general, proven
methodology:
1. Business objective
2. Determining data sources
3. Data preparation
4. Solution design
5. Evaluation
6. Solution monitoring
However, every organization has its own
customized methodology behind its success.
1. Business Objectives:
A clear and sound business goal comes first,
because without it, it is very hard to
plan the next steps.
2. Determining Data Sources:
Primary source: data that someone collected
first hand from the original source. It is
generally more reliable and authentic, because
it has not been changed or altered.
Examples: surveys, questionnaires, interviews,
observations, etc.
However, procuring primary data is costly and
time consuming.
Secondary source: data that has already been
collected by, and is available from, other
sources. It can be obtained from
published printed sources, newspapers,
media, etc.
Secondary data is inexpensive, easily
accessible and immediately available, but it
may lack authenticity and may be incomplete.
So it is advisable to choose the source of
data that suits your analysis.
3. Data preparation:
Includes data exploration, where we
explore the data and search for any anomalies
or missing values, and then prepare and
assemble the data according to the need.
4. Solution design:
Of all the stages, this is the one
where the majority of the time is spent: designing
an effective solution, or the methods to carry out,
to solve the problem. It is an iterative
process.
5. Evaluation:
Also known as validation, where we
validate the model's accuracy and
effectiveness.
6. Solution Monitoring:
This is a continuous process where we assess
and track the effectiveness of the implemented
solution over time. It is a very important
stage that is often ignored.
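As a rough illustration of solution monitoring, the sketch below recomputes a model's accuracy for each reporting period and flags any period that falls noticeably below the accuracy measured at evaluation time. The monthly batches, the baseline of 0.90, the tolerance of 0.05 and the function names are all hypothetical values chosen for this example, not part of the original material.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the observed outcomes."""
    matches = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return matches / len(actuals)


def monitor(periods, baseline, tolerance=0.05):
    """Return (label, accuracy, needs_review) for each period.

    A period is flagged when its accuracy falls more than `tolerance`
    below `baseline`, the accuracy measured at evaluation time.
    """
    report = []
    for label, predictions, actuals in periods:
        acc = accuracy(predictions, actuals)
        report.append((label, acc, acc < baseline - tolerance))
    return report


# Hypothetical monthly batches: (label, model predictions, observed outcomes).
periods = [
    ("month 1", [1, 0, 1, 1, 0, 1, 0, 0, 1, 1], [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]),
    ("month 2", [1, 1, 0, 1, 0, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]),
    ("month 3", [0, 1, 0, 1, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]),
]

report = monitor(periods, baseline=0.90)
for label, acc, needs_review in report:
    status = "review" if needs_review else "ok"
    print(f"{label}: accuracy={acc:.2f} [{status}]")
```

A fixed tolerance is only one possible rule; in practice the metric and the threshold would follow from the business objective set in stage 1.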
Solution implementation
Typical steps that can be categorized under
solution implementation are:
1. Data exploration
2. Data preparation
3. Data partitioning
4. Model building
5. Model iteration and final model
6. Validation
7. Solution implementation
1. Data exploration: as already discussed,
we explore the data to understand it,
which helps us prepare the
required data for the next phase.
2. Data preparation: where we take the
necessary data, run sanity checks on it and
structure it according to the need.
3. Data partitioning: splits the data into a
training dataset and a test dataset. The
model is built on the training dataset and its
accuracy is assessed on the test dataset.
4. Model building: here we build the
model, applying algorithms to the data that we
processed in the earlier stages.
5. Model iteration and final model: through
repeated iterations, the model that performs
best is selected as the final model.
6. Validation: where we check the model's
effectiveness and accuracy to ensure a proper solution.
7. Solution implementation: lastly, this is the stage
where the model is deployed in the
actual environment, with constant monitoring
to check its effectiveness over time.
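The steps above can be sketched end to end in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the records (advertising spend against sales) are invented, the 70/30 split and the random seed are arbitrary choices, and simple linear regression stands in for whichever algorithm a real project would select. Step 5 (model iteration) is not shown; it would repeat steps 4 and 6 for several candidate models and keep the best one.

```python
import random

# Hypothetical records: (advertising spend, sales); None marks a missing value.
raw = [(1.0, 3.1), (2.0, 4.9), (3.0, None), (4.0, 9.2), (5.0, 10.8),
       (6.0, 13.1), (None, 7.0), (7.0, 15.0), (8.0, 16.9), (9.0, 19.2),
       (10.0, 20.8), (11.0, 23.1)]

# 1. Data exploration: count records that contain a missing value.
n_missing = sum(1 for x, y in raw if x is None or y is None)

# 2. Data preparation: keep only complete records.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 3. Data partitioning: shuffle, then hold out about 30% as the test set.
rng = random.Random(42)  # fixed seed (arbitrary) so the split is reproducible
shuffled = clean[:]
rng.shuffle(shuffled)
n_test = max(1, round(0.3 * len(shuffled)))
test, train = shuffled[:n_test], shuffled[n_test:]


# 4. Model building: least-squares fit of y = slope * x + intercept.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# 6. Validation: mean squared error on data the model has not seen.
def mse(points, slope, intercept):
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return sum(errors) / len(errors)


slope, intercept = fit_line(train)
test_mse = mse(test, slope, intercept)

print(f"records with missing values: {n_missing}")
print(f"train size: {len(train)}, test size: {len(test)}")
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
print(f"test MSE: {test_mse:.3f}")
```

Dropping incomplete records is the simplest preparation choice; imputing the missing values is a common alternative when data is scarce.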
Next we will learn
The types of statistics.
To be continued.
