Data Science Process
1. Setting the research goal
1. Spend time understanding the goals and context of your research
2. Create a project charter
2. Retrieving Data
1. Data within company
a. This data can be stored in ocial data repositories such as databases, data marts, data
warehouses, and data lakes maintained by a team of IT professionals.
b. The primary goal of a database is data storage, while a data warehouse is designed for
reading and analyzing that data.
c. A data mart is a subset of the data warehouse and geared toward serving a specific business
unit. While data warehouses and data marts are home to preprocessed data, data lakes
contain data in its natural or raw format.
2. Open source data
3. Data Preparation
1. Cleansing
Why should errors be corrected as soon as possible?
● Not everyone spots data anomalies. Decision-makers may make costly mistakes
when they act on information from applications that fail to correct for the
faulty data.
● If errors are not corrected early on in the process, the cleansing will have to be done
for every project that uses that data.
● Data errors may point to defective equipment, such as broken transmission lines
and defective sensors.
● Data errors can point to bugs in software or in the integration of software that may
be critical to the company. While doing a small project at a bank, we discovered that
two software applications used different locale settings. This caused problems with
numbers greater than 1,000. For one app the number 1.000 meant one, and for the
other it meant one thousand.
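The bank example above can be reproduced with a minimal sketch. The helper parse_number and its separator arguments are hypothetical names chosen for illustration; the source does not say how either application parsed numbers.

```python
from decimal import Decimal


def parse_number(text: str, decimal_sep: str, thousands_sep: str) -> Decimal:
    """Parse a number string under explicitly stated separator conventions.

    Hypothetical helper for illustration; not taken from the original slides.
    """
    # Drop the grouping separator, then normalise the decimal separator to ".".
    cleaned = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


raw = "1.000"

# App A assumes "." is the decimal mark: the value equals one.
as_app_a = parse_number(raw, decimal_sep=".", thousands_sep=",")

# App B assumes "." is the grouping mark: the value equals one thousand.
as_app_b = parse_number(raw, decimal_sep=",", thousands_sep=".")

print(as_app_a == 1, as_app_b == 1000)
```

The same string yields two values that differ by a factor of 1,000, which is why such a mismatch is worth catching at the source rather than in every downstream project.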
Combining Data
1. Joining Tables
2. Appending Tables
3. Creating views
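The three ways of combining data listed above can be sketched with Python's built-in sqlite3 module. The tables clients, sales_jan, and sales_feb and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER, name TEXT);
    CREATE TABLE sales_jan (client_id INTEGER, amount REAL);
    CREATE TABLE sales_feb (client_id INTEGER, amount REAL);
    INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO sales_jan VALUES (1, 100.0), (2, 50.0);
    INSERT INTO sales_feb VALUES (1, 25.0);
""")

# 1. Joining: enrich each January sale with the client's name via a shared key.
joined = conn.execute("""
    SELECT c.name, s.amount
    FROM sales_jan AS s
    JOIN clients AS c ON c.client_id = s.client_id
    ORDER BY c.name
""").fetchall()

# 2. Appending: stack the rows of two tables that have the same columns.
appended = conn.execute("""
    SELECT client_id, amount FROM sales_jan
    UNION ALL
    SELECT client_id, amount FROM sales_feb
""").fetchall()

# 3. View: a stored query, so the combined rows are not physically duplicated.
conn.execute("""
    CREATE VIEW sales_all AS
    SELECT client_id, amount FROM sales_jan
    UNION ALL
    SELECT client_id, amount FROM sales_feb
""")
totals = conn.execute("""
    SELECT client_id, SUM(amount) FROM sales_all
    GROUP BY client_id ORDER BY client_id
""").fetchall()

print(joined, len(appended), totals)
```

Joining adds columns from another table, appending adds rows, and a view gives the combined result a name without storing a second copy of the data.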
Data Transformation
4. Exploratory Data Analysis (EDA)
5. Build the Model
Building a model is an iterative process. The way you build your model depends on
whether you go with classic statistics or the somewhat more recent machine
learning school, and on the type of technique you want to use. Either way, building
most models involves the following main steps:
1. Selection of a modeling technique and variables to enter in the model
2. Execution of the model
3. Diagnosis and model comparison
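The three steps above can be illustrated with a small sketch in plain Python. The toy data, the choice of a constant baseline versus simple linear regression, and the use of RMSE on a hold-out set are all assumptions made for illustration, not the method of the original slides.

```python
from math import sqrt
from statistics import mean

# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x_train, y_train = x[:4], y[:4]   # used to fit the models
x_test, y_test = x[4:], y[4:]     # held out for diagnosis


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


# Step 1: select techniques and variables. Two candidates are compared:
# a constant baseline (no variables) and a linear regression on x.

# Step 2: execute (fit) each model on the training data.
baseline = mean(y_train)
x_bar, y_bar = mean(x_train), mean(y_train)
slope = (
    sum((a - x_bar) * (b - y_bar) for a, b in zip(x_train, y_train))
    / sum((a - x_bar) ** 2 for a in x_train)
)
intercept = y_bar - slope * x_bar

# Step 3: diagnose and compare the models on the held-out data.
rmse_baseline = rmse(y_test, [baseline] * len(y_test))
rmse_linear = rmse(y_test, [slope * a + intercept for a in x_test])

print(f"baseline RMSE: {rmse_baseline:.3f}, linear RMSE: {rmse_linear:.3f}")
```

Because the process is iterative, the result of step 3 typically feeds back into step 1: a model that compares poorly prompts a different technique or a different set of variables.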
6. Presentation and automation
Presenting your results to the stakeholders and industrializing your analysis
process for repetitive reuse and integration with other tools.
