DSA – 105 Introduction to
Data Science
Week 3 – Steps involved in Data Science
Ferdin Joe John Joseph, PhD
Faculty of Information Technology
Thai-Nichi Institute of Technology
Week 3
Agenda
• Steps involved in Data Science
Faculty of Information Technology, Thai - Nichi Institute of
Technology
Process in Data Science Life Cycle (DSLC)
DSLC
• Business understanding
• Data acquisition and understanding
• Modeling
• Deployment
• Customer acceptance
Business Understanding
Data Acquisition and Understanding
Data Modelling
Data Modelling (Contd)
Types of Data Models
• Conceptual: Defines WHAT the system contains. This model is typically
created by business stakeholders and data architects. Its purpose is to
organize, scope and define business concepts and rules.
• Logical: Defines HOW the system should be implemented regardless of the
DBMS. This model is typically created by data architects and business
analysts. Its purpose is to develop a technical map of rules and data
structures.
• Physical: Describes HOW the system will be implemented using a specific
DBMS. This model is typically created by DBAs and developers. Its purpose
is the actual implementation of the database.
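To make the three levels concrete, here is a minimal sketch in Python. The Customer and Order entities, their attributes, the table names and the choice of SQLite as the DBMS are all assumptions made for illustration; they do not come from the lecture.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: WHAT the system contains.
# Hypothetical entities and one relationship between them.
CONCEPTUAL_MODEL = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}


# Logical level: attributes and keys, still independent of any DBMS.
@dataclass
class Customer:
    customer_id: int  # primary key
    name: str


@dataclass
class Order:
    order_id: int     # primary key
    customer_id: int  # foreign key referring to Customer.customer_id
    amount: float


# Physical level: the same structure expressed for one specific DBMS (SQLite).
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the physical model."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(PHYSICAL_DDL)
    return conn
```

With foreign keys enabled, inserting an order for a customer that does not exist is rejected by the database, which is the physical model enforcing a rule first stated at the conceptual level.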
Advantages and Disadvantages of Data Model
Advantages of a data model:
• The main goal of designing a data model is to make certain that the data objects offered by the functional team are represented
accurately.
• The data model should be detailed enough to be used for building the physical database.
• The information in the data model can be used to define the relationships between tables, primary and foreign keys, and stored
procedures.
• A data model helps the business communicate within and across organizations.
• A data model helps document data mappings in the ETL process.
• It helps identify the correct sources of data to populate the model.
Disadvantages of a data model:
• To develop a data model, one should know the characteristics of the physically stored data.
• It is a navigational system, which makes application development and management complex, so it requires detailed knowledge of
how the data is actually stored.
• Even a small change in the structure may require modifications across the entire application.
• There is no set data manipulation language in the DBMS.
Data Model - Nutshell
• Data modeling is the process of developing a data model for the data to be stored in a database.
• Data models ensure consistency in naming conventions, default values, semantics and security while
ensuring the quality of the data.
• The data model structure helps to define the relational tables, primary and foreign keys, and stored
procedures.
• There are three types of data models: conceptual, logical, and physical.
• The main aim of the conceptual model is to establish the entities, their attributes, and their
relationships.
• The logical data model defines the structure of the data elements and sets the relationships between
them.
• A physical data model describes the database-specific implementation of the data model.
• The main goal of designing a data model is to make certain that the data objects offered by the
functional team are represented accurately.
• The biggest drawback is that even a small change in the structure requires modification of the
entire application.
Data Vs Meta Data
Data Model Definition
Data Model Representation
Representation of Data Model
Scope of Data Modelling
Data Model Aspects
• Business
• Technical
Data Model Aspects: Business
Data Model Aspects: Technical
Levels of Data Models
• Logical
• Enterprise
• Conceptual
• Physical
Logical Data Modelling Components
Steps in Data Science Process
Define the Project Objective
• Goal: Clearly and explicitly specify the model target as a sharp
question, which is used to drive the customer engagement.
• Responsibility: This will be customer driven to maximize business value,
with guidance from the data science team to make the end objective
answerable and actionable.
• The first step towards a successful data science project is to define the
question we are interested in answering. This is where we define a
hypothesis we’d like to test, or the objective of the project. It helps to
describe what the expected end result of the engagement would be, so
that we can use these results to add business value.
Define the Project Objective
• A key component of successful data science projects is defining the project objective with a sharp
question. A sharp question is well defined and can be answered with a name or number.
Remember that data science can only be used to answer five different types of questions:
How much or how many? (regression)
Which category? (classification)
Which group? (clustering)
Is this weird? (anomaly detection)
Which option should be taken? (recommendation)
• The type (or class) of the question restricts and informs the following:
Which algorithms the data scientist can use to address the problem.
How to measure the algorithm's accuracy.
Data requirements.
A success metric is typically determined by which question is asked. The metric is defined by how
we measure accuracy within that question class. Once we have an idea of the measure, we can
discuss what success would look like in terms of this metric.
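The link between question class and success metric can be sketched in Python as follows. The five question types are taken from the slide; the example metric named for each class is an assumption chosen for illustration, since the slides do not prescribe particular metrics.

```python
from enum import Enum


class QuestionType(Enum):
    REGRESSION = "How much or how many?"
    CLASSIFICATION = "Which category?"
    CLUSTERING = "Which group?"
    ANOMALY_DETECTION = "Is this weird?"
    RECOMMENDATION = "Which option should be taken?"


# One commonly used metric per question class (illustrative assumptions only).
EXAMPLE_METRIC = {
    QuestionType.REGRESSION: "mean absolute error",
    QuestionType.CLASSIFICATION: "accuracy",
    QuestionType.CLUSTERING: "silhouette score",
    QuestionType.ANOMALY_DETECTION: "precision and recall on flagged cases",
    QuestionType.RECOMMENDATION: "hit rate of recommended options",
}


def mean_absolute_error(actual, predicted):
    """Regression metric: average size of the prediction error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy(actual, predicted):
    """Classification metric: share of predictions that match the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

Once the metric is fixed, success can be stated as a threshold on it, for example a mean absolute error below some value agreed with the customer.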
Deliverable
• Deliverable: Project Objective. This is usually a single-page document
clearly stating the question of interest and how the expected answer
will look. The document should also include some criteria for
customer acceptance of the final solution and an expected
implementation of the solution.
• We can think of this as an initial contract that defines the customer
expectations in terms of an achievable end point of the engagement.
This is often an exercise that is completed in collaboration between
the customer and data science team. This deliverable will prove to be
valuable as it encourages customer engagement in the process.
Identifying Data Sources
Goal: Clearly specifying where to find the data sources of interest. Define the machine
learning target in this step and determine if we need to bring in ancillary data from other
sources.
Responsibility: Typically, the customer comes with data in hand. With a sharp question, the
data science team can begin formulating an answer by locating the data required to answer
that question.
Having a lot of data does not mean we will use it all, or that it contains everything
we need to answer the question. In addition, not all data sources are equally helpful in
answering the specific question of interest. We are looking for:
• Data that is Relevant to the question. Do we have measures of the target and features
that are related to the target?
• Data that is an Accurate measure of our model target and the features of interest.
Identifying Data Sources
We are typically using data sources that are collected for reasons other than
answering our specific question. This means we are collecting data sources
opportunistically, so some information that could be extremely helpful in
answering the question may not have been collected. We also are not
controlling the environment of observations, which means we are only able
to determine correlations between collected information and the outcome
of interest, not specific causal inferences.
Deliverable: Data Sources. Usually a single-page document clearly stating
where the data resides. This could include one or more data sources and
possibly the associated entity-relation diagrams. This document should also
include the target variable definition.
Initial Data Exploration
Goal: To determine if the data we have can be used to answer the question.
If not, we may need to collect more data.
Responsibility: Data science team begins to evaluate the data.
Once we know where to find the data, this initial pass will help us determine
the quality of the data provided to answer the question. Here we are looking
to determine if the data is:
• Connected to the target.
• Large enough to move forward.
Initial Data Exploration
• At this point graphical methods are extremely helpful. Have we measured the features
consistently enough for them to be useful or are there a lot of missing values in the data?
Has the data been consistently collected over the time period of interest or are there
blocks of missing observations? If the data does not pass this quality check, we may need
to go back to the previous step to correct or get more data.
• We also need enough observations to build a meaningful model and enough features for
our methods to differentiate between different observations. If we’re trying to
differentiate between groups or categories, are there enough examples of all possible
outcomes?
• The initial data exploration step (step 3) is done in parallel with identifying data
sources (step 2). As we determine if the data is connected or if we have enough data, we
may need to find new data sources with more accurate or more relevant data to
complete the data set initially identified in step 2.
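The quality checks described above (missing values per feature, enough examples of every outcome) could be sketched as follows. The function names, the row format (a list of dictionaries with None for missing values) and the default thresholds are assumptions for illustration only.

```python
from collections import Counter


def missing_fraction(rows, features):
    """Fraction of rows in which each feature is missing (None or absent)."""
    if not rows:
        raise ValueError("rows must not be empty")
    total = len(rows)
    return {
        feature: sum(1 for row in rows if row.get(feature) is None) / total
        for feature in features
    }


def class_counts(rows, target):
    """Number of observations for each observed value of the target."""
    return Counter(row[target] for row in rows if row.get(target) is not None)


def quality_report(rows, features, target, max_missing=0.2, min_per_class=2):
    """Flag features with too many gaps and outcomes with too few examples."""
    missing = missing_fraction(rows, features)
    counts = class_counts(rows, target)
    return {
        "features_too_sparse": sorted(
            f for f, fraction in missing.items() if fraction > max_missing
        ),
        "classes_too_rare": sorted(
            c for c, n in counts.items() if n < min_per_class
        ),
    }
```

A non-empty entry in either list is a signal to return to step 2 and look for better or additional data sources.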
Initial Data Exploration
Deliverables: Data Exploration. This step should produce the initial draft of the following documents:
• Exploratory Data Analysis Report: A document detailing data requirements, quality (accuracy, connectedness)
and relevance to the target and the ability to answer the question of interest. It is best to use graphical
methods to clearly show data features in an understandable way. Additionally, we should have an idea of whether
there is enough data to answer the question of interest with some confidence in the end result.
• Analytics Architecture Diagram (initial draft): With the data sources in hand, we can start to define how the
machine learning pipeline will work. How often will the data sources be updated? What actions should be
taken on those updates? Are there retraining criteria as we collect and label new observations? Documenting
this now can help us define and capture the required artifacts for use in later steps.
Checkpoint Decision
Before we begin to do the full feature engineering and model building process, we can reevaluate the project
to determine value in continuing this effort. We may be ready to proceed, need to collect more data, or it’s
possible the data does not exist to answer the question.
Construction of Analysis Data
Goal: Construct the analysis data set, with associated feature
engineering, for building the machine learning model.
Responsibility: The data science team, usually made up of data engineers
(experts in getting data from disparate sources) and data scientists
performing additional quality and quantity checks.
Construction of Analysis Data
The analysis data set is defined by the following:
Inclusion/Exclusion criteria: Evaluate observations on multiple levels to determine if they are part of the
population of interest. Are they connected in time? Are there observations that are missing large chunks of
information? We look at both business reasons and data quality reasons for observation inclusion/exclusion
criteria.
Feature engineering involves inclusion, aggregation and transformation of raw variables to create the features
used in the analysis. If we want insight into what is driving the model, then we need to take care in how
features are related to each other, and how the machine learning method will be using those features. This is a
balancing act of including informative variables without including too many unrelated variables. Informative
variables will improve our result; unrelated variables will introduce unnecessary noise into the model.
Avoid leakage: Leakage is caused by including variables that can perfectly predict the target. These are usually
variables that may have been used to detect the target initially. As the target is redefined, these dependencies
can be hidden from the original definition. Avoiding leakage often requires iterating between building an analysis
data set, creating a model, and evaluating its accuracy. Leakage is a major reason data scientists get
nervous when they get really good predictive results.
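A crude first check for leakage is to look for any single feature that fully determines the target. The sketch below is one possible heuristic, not a complete leakage test; the function names and the row format (a list of dictionaries) are assumptions for illustration.

```python
from collections import defaultdict


def perfectly_determines(rows, feature, target):
    """Return True if every value of `feature` maps to exactly one target value.

    Features whose values never repeat (for example row identifiers) are
    skipped, because they trivially satisfy the condition.
    """
    targets_by_value = defaultdict(set)
    for row in rows:
        targets_by_value[row[feature]].add(row[target])
    has_repeats = len(targets_by_value) < len(rows)
    return has_repeats and all(
        len(targets) == 1 for targets in targets_by_value.values()
    )


def suspect_leaks(rows, features, target):
    """List the features that, on their own, predict the target perfectly."""
    return [f for f in features if perfectly_determines(rows, f, target)]
```

A feature flagged here is not necessarily leaking, but it deserves a conversation with the customer about how and when that variable is recorded relative to the target.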
Construction of Analysis Data
Deliverable: Feature Engineering
This step produces the following initial draft artifacts:
• The analysis data set itself, which will be used to train and test the machine learning
model in the next step.
• A document describing the feature engineering required to construct the analysis data
set.
• The source code to build the analysis data set, including queries or other source code to
produce the model features and the model targets. The model features should be held
separate from the target calculations for use when predicting on new observations in a
production setting. This artifact will be directly used in the production pipeline of step 7.
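Keeping feature code separate from target code might look like the sketch below. The record layout (a list of purchase amounts and a cancelled flag) and all function names are hypothetical, chosen only to show the separation.

```python
def build_features(record):
    """Compute model features from information available at prediction time.

    This function never reads the outcome field, so it can be reused
    unchanged on new observations in production.
    """
    purchases = record["purchases"]
    return {
        "n_purchases": len(purchases),
        "total_spend": sum(purchases),
        "mean_spend": sum(purchases) / len(purchases) if purchases else 0.0,
    }


def build_target(record):
    """Compute the model target; only possible once the outcome is known."""
    return 1 if record["cancelled"] else 0


def build_analysis_row(record):
    """Combine features and target into one row of the analysis data set."""
    row = build_features(record)
    row["target"] = build_target(record)
    return row
```

In production only `build_features` is called, because the outcome of a new observation is not yet known.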
Machine Learning Model
Goal: Answer the question by constructing and evaluating an
informative model to predict the target.
Responsibility: Data science team.
After a large amount of data-specific work, we are now ready to start
building a model. This machine learning step is often executed in
parallel with constructing the analysis data set as information from our
model can be used to build better features in the analysis data set.
Machine Learning Model
The process involves:
• Split the analysis data into training and testing data sets.
• Evaluate (train and test) a series of competing machine learning
methods that are geared toward answering the question of interest
with the data we currently have at hand.
• Determine the “best” solution to answer the question by comparing
the success metric between alternative methods.
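The three steps above can be sketched with the standard library alone. The two competing methods (a constant mean baseline and a one-variable least-squares line), the use of mean absolute error as the success metric, and all function names are assumptions for illustration; a real project would normally rely on a machine learning library.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (x, y) pairs reproducibly and split them into train and test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(train):
    """Baseline method: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Competing method: least-squares line through the training data."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def holdout_mae(model, test):
    """Success metric: mean absolute error on the held-out test data."""
    return sum(abs(model(x) - y) for x, y in test) / len(test)


def pick_best(rows, candidates, seed=0):
    """Train every candidate on the same split and keep the lowest test error."""
    train, test = train_test_split(rows, seed=seed)
    scores = {name: holdout_mae(fit(train), test) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Calling `pick_best(rows, {"mean": fit_mean, "line": fit_line})` returns the name of the winning method together with the test error of every candidate, so the comparison itself can be reported to the customer.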
Machine Learning Model
Deliverables: Machine Learning
• The machine learning model which can be used to predict the target for new
observations. This artifact will be directly used in the production pipeline of step 7.
• A document describing the model, how to use the model and findings from the
modelling process. What do these initial results look like? What do these tell us about
our hypotheses and about the data we are using? Additionally, we can define
visualizations of the model results here.
Checkpoint Decision
• Again, we can reevaluate here whether to move on to a production system. Does the model
answer the question sufficiently given the test data? Should we go back and collect more
data (step 2) or change how the data is being used (step 4)?
Validation and Customer Acceptance
Goal: To finalize the machine learning deliverable by confirming the
model and the evidence for the model acceptance.
Responsibility: Customer focused evaluation of the project artifacts.
In order to get to this point, the data science team has some
confidence that the project has progressed in answering the question
of interest. The answer may not be perfect, but given the data sources,
data exploration, the analysis data set, and the machine learning
model, the data science team has some estimates of the ability and
accuracy of the model attaining the project objective.
Validation and Customer Acceptance
This step formalizes the delivery of the engagement artifacts and results to the customer for final review before
committing to building out the production pipeline. The customer can then determine if the model meets the
success metrics and whether the production pipeline would add business value.
Deliverable:
The following finalized documents and artifacts from each of the project milestones:
• Project Objective (step 1)
• Data Sources (step 2)
• Data Exploration (step 3)
• Feature Engineering (step 4)
• Machine Learning (step 5)
Validation and Customer Acceptance
Checkpoint Decision
For the most part, the customer should be familiar with all of these
deliverables, and be aware of the current state of the project
throughout the process. The validation and customer acceptance step
gives the customer a chance to evaluate the validity and value of the
data science solution from a business perspective, before committing
to continue with the production implementation.
Production Pipeline Implementation
Goal: Implement the full process to use the model and insights
obtained from the engagement. The pipeline is the actual delivery of
the business value to the customer.
Responsibility: The data science team, typically data engineers building
out the system first described in the initial data exploration step.
Production Pipeline Implementation
Deliverable: The deliverable here is defined by how the customer
intends to use the results of this engagement. This could and should
include delivery of actionable insights obtained throughout the
engagement. These insights can be delivered through:
• Data and machine learning visualizations.
• An operationalized data/machine learning pipeline to predict outcomes on
new observations as they become available.
Goals of Data Science Process
• The goal of this process is to continue to move a data science project
forward towards a clear engagement end point.
• We recognize that data science is a research activity and that progress
often entails an approach that moves two steps forward and one step
(or worse) backwards.
• Being able to clearly communicate this to customers can help avoid
misunderstanding and frustration for all parties involved, and increase
the odds of success.
Activity
• Apply the data science process to the Olympic medal tally for events
after World War II
Next Week…
• Tools and Technologies in Data Science

More Related Content

What's hot

Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
Arnab Majumdar
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —
swethaT16
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Edureka!
 
Introduction to Data Science - Week 2 - Predictive Analytics
Introduction to Data Science - Week 2 - Predictive AnalyticsIntroduction to Data Science - Week 2 - Predictive Analytics
Introduction to Data Science - Week 2 - Predictive Analytics
Ferdin Joe John Joseph PhD
 
Data science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyData science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi Periasamy
Peter Kua
 
Data Analytics.03. Data processing
Data Analytics.03. Data processingData Analytics.03. Data processing
Data Analytics.03. Data processing
Alex Rayón Jerez
 
An overview of big data analytics
An overview of big data analytics An overview of big data analytics
An overview of big data analytics
LuisaFernandaParraTabares
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science project
bodaceacat
 
Data Analytics: From Basic Skills to Executive Decision-Making
Data Analytics: From Basic Skills to Executive Decision-MakingData Analytics: From Basic Skills to Executive Decision-Making
Data Analytics: From Basic Skills to Executive Decision-Making
Training Industry Conference & Expo
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
Mark West
 
Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data science
ShilpaKrishna6
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
Edureka!
 
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
LOS BANOS NATIONAL HIGH SCHOOL
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
Konpal Darakshan
 
8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
Mahesh Kumar CV
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
bhavesh lande
 
Data Science Lifecycle
Data Science LifecycleData Science Lifecycle
Data Science Lifecycle
SwapnilDahake2
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
Spotle.ai
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
Vala Ali Rohani
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Anastasiia Kornilova
 

What's hot (20)

Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Introduction to Data Science - Week 2 - Predictive Analytics
Introduction to Data Science - Week 2 - Predictive AnalyticsIntroduction to Data Science - Week 2 - Predictive Analytics
Introduction to Data Science - Week 2 - Predictive Analytics
 
Data science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyData science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi Periasamy
 
Data Analytics.03. Data processing
Data Analytics.03. Data processingData Analytics.03. Data processing
Data Analytics.03. Data processing
 
An overview of big data analytics
An overview of big data analytics An overview of big data analytics
An overview of big data analytics
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science project
 
Data Analytics: From Basic Skills to Executive Decision-Making
Data Analytics: From Basic Skills to Executive Decision-MakingData Analytics: From Basic Skills to Executive Decision-Making
Data Analytics: From Basic Skills to Executive Decision-Making
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data science
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
BIG DATA ANALYTICS-OPPORTUNITIES,CHALLENGES AND THE FUTURE:CERTIFICATE OF ACH...
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 
Data Science Lifecycle
Data Science LifecycleData Science Lifecycle
Data Science Lifecycle
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 

Similar to Introduction to Data Science - Week 3 - Steps involved in Data Science

2019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 32019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 3
Ferdin Joe John Joseph PhD
 
Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by hu
wekineheshete
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
KumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
VamsiNihal
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
saitejavella
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
Nithinsunil1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
VamsiNihal
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
SayyedYusufali
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
SaiprasadVella
 
online data science training
online data science trainingonline data science training
online data science training
DIGITALSAI1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
VamsiNihal
 
data science online training in hyderabad
data science online training in hyderabaddata science online training in hyderabad
data science online training in hyderabad
VamsiNihal
 
Best data science training in Hyderabad
Best data science training in HyderabadBest data science training in Hyderabad
Best data science training in Hyderabad
KumarNaik21
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
Nithinsunil1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
KumarNaik21
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
AkhilGGM
 
Challenges in adapting predictive analytics
Challenges  in  adapting  predictive  analyticsChallenges  in  adapting  predictive  analytics
Challenges in adapting predictive analytics
Prasad Narasimhan
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)
SayyedYusufali
 

Introduction to Data Science - Week 3 - Steps involved in Data Science

  • 1. DSA – 105 Introduction to Data Science Week 3 – Steps involved in Data Science Ferdin Joe John Joseph, PhD Faculty of Information Technology Thai-Nichi Institute of Technology
  • 2. Week 3 Agenda • Steps involved in Data Science
  • 3. Process in Data Science Life Cycle (DSLC)
  • 4. DSLC • Business understanding • Data acquisition and understanding • Modeling • Deployment • Customer acceptance
  • 5. Business Understanding
  • 6. Data Acquisition and Understanding
  • 7. Data Modelling
  • 8. Data Modelling (Contd): Types of Data Models
    • Conceptual: Defines WHAT the system contains. This model is typically created by business stakeholders and data architects. Its purpose is to organize, scope, and define business concepts and rules.
    • Logical: Defines HOW the system should be implemented regardless of the DBMS. This model is typically created by data architects and business analysts. Its purpose is to develop a technical map of rules and data structures.
    • Physical: Describes HOW the system will be implemented using a specific DBMS. This model is typically created by DBAs and developers. Its purpose is the actual implementation of the database.
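The difference between the logical and physical levels can be illustrated with a minimal sketch. The `Customer` and `Order` entities, their attributes, and the table names are hypothetical and do not come from the slides. The logical model is written as DBMS-independent Python dataclasses, and the physical model as SQLite DDL executed through the standard `sqlite3` module.

```python
import sqlite3
from dataclasses import dataclass

# Logical model (hypothetical entities): attributes and relationships,
# described without reference to any particular DBMS.
@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # relationship: each order belongs to one customer
    amount: float

# Physical model: the same structure implemented for one specific DBMS
# (SQLite), with concrete column types, primary keys and a foreign key.
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
"""

def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the physical model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(PHYSICAL_DDL)
    return conn

conn = build_database()
conn.execute("INSERT INTO customer VALUES (?, ?)", (1, "Alice"))
conn.execute("INSERT INTO customer_order VALUES (?, ?, ?)", (10, 1, 25.0))
tables = sorted(
    row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
```

Moving to another DBMS would change only `PHYSICAL_DDL`; the dataclasses, like a logical model, would stay the same.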
  • 9. Advantages and Disadvantages of Data Model
    • Advantages of a data model:
      • The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
      • The data model should be detailed enough to be used for building the physical database.
      • The information in the data model can be used for defining the relationships between tables, primary and foreign keys, and stored procedures.
      • A data model helps the business communicate within and across organizations.
      • A data model helps to document data mappings in the ETL process.
      • It helps to recognize the correct sources of data to populate the model.
    • Disadvantages of a data model:
      • To develop a data model, one must know the characteristics of the physically stored data.
      • It is a navigational system, which makes application development and management complex. Thus, it requires a knowledge of the biographical truth.
      • Even a small change to the structure requires modification of the entire application.
      • There is no set data manipulation language in the DBMS.
  • 10. Data Model - Nutshell
    • Data modeling is the process of developing a data model for the data to be stored in a database.
    • Data models ensure consistency in naming conventions, default values, semantics, and security while ensuring the quality of the data.
    • The data model structure helps to define the relational tables, primary and foreign keys, and stored procedures.
    • There are three types of data model: conceptual, logical, and physical.
    • The main aim of the conceptual model is to establish the entities, their attributes, and their relationships.
    • The logical data model defines the structure of the data elements and sets the relationships between them.
    • The physical data model describes the database-specific implementation of the data model.
    • The main goal of designing a data model is to make certain that data objects offered by the functional team are represented accurately.
    • The biggest drawback is that even a small change to the structure requires modification of the entire application.
  • 11. Data Vs Meta Data
  • 12. Data Model Definition
  • 13. Data Model Representation
  • 14. Representation of Data Model
  • 15. Scope of Data Modelling
  • 16. Data Model Aspects • Business • Technical
  • 17. Data Model Aspects: Business
  • 18. Data Model Aspects: Technical
  • 19. Levels of Data Models • Logical • Enterprise • Conceptual • Physical
  • 20. Logical Data Modelling Components
  • 21. Logical Data Modelling Components
  • 22.
  • 23. Steps in Data Science Process
  • 24. Define the Project Objective
    • Goal: Clearly and explicitly specify the model target as a sharp question, which is used to drive the customer engagement.
    • Responsibility: This will be customer driven to maximize business value, with guidance from the data science team to make the end objective answerable and actionable.
    • The first step towards a successful data science project is to define the question we are interested in answering. This is where we define a hypothesis we’d like to test, or the objective of the project. It helps to describe what the expected end result of the engagement would be, so that we can use these results to add business value.
  • 25. Define the Project Objective
    • A key component of successful data science projects is defining the project objective with a sharp question. A sharp question is well defined and can be answered with a name or a number. Remember that data science can only be used to answer five different types of questions:
      • How much or how many? (regression)
      • Which category? (classification)
      • Which group? (clustering)
      • Is this weird? (anomaly detection)
      • Which option should be taken? (recommendation)
    • The type (or class) of the question restricts and informs the following:
      • Which algorithms the data scientist can use to address the problem.
      • How to measure the algorithm's accuracy.
      • Data requirements.
    • A success metric is typically determined by which question is asked. The metric is defined by how we measure accuracy within that question class. Once we have an idea of the measure, we can discuss what success would look like in terms of this metric.
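As a rough sketch of how the question class informs the success metric, the five question types can be written as a lookup table. The technique names come from the slide; the example metrics are illustrative assumptions chosen for this sketch, not prescribed by the course. Two of the metrics are implemented in plain Python to show what "measuring accuracy within a question class" might look like.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuestionClass:
    technique: str       # taken from the slide
    example_metric: str  # illustrative assumption, other metrics are possible

QUESTION_CLASSES = {
    "How much or how many?": QuestionClass("regression", "mean absolute error"),
    "Which category?": QuestionClass("classification", "accuracy"),
    "Which group?": QuestionClass("clustering", "silhouette score"),
    "Is this weird?": QuestionClass("anomaly detection", "precision and recall"),
    "Which option should be taken?": QuestionClass("recommendation", "precision at k"),
}

def mean_absolute_error(actual, predicted):
    """Regression metric: average absolute difference between pairs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def accuracy(actual, predicted):
    """Classification metric: share of predictions that match the label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical toy values, used only to exercise the two metrics.
mae = mean_absolute_error([10, 20, 30], [12, 18, 33])
acc = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
technique = QUESTION_CLASSES["Which category?"].technique
```

With a metric fixed in this way, the customer and the data science team can agree on a threshold that counts as success before any model is built.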
  • 26. Deliverable
    • Deliverable: Project Objective. This is usually a single-page document clearly stating the question of interest and how the expected answer will look. The document should also include some criteria for customer acceptance of the final solution and an expected implementation of the solution.
    • We can think of this as an initial contract that defines the customer expectations in terms of an achievable end point of the engagement. This is often an exercise that is completed in collaboration between the customer and the data science team. This deliverable will prove to be valuable as it encourages customer engagement in the process.
  • 27. Identifying Data Sources
    • Goal: Clearly specify where to find the data sources of interest. Define the machine learning target in this step and determine whether we need to bring in ancillary data from other sources.
    • Responsibility: Typically, the customer comes with data in hand. With a sharp question, the data science team can begin formulating an answer by locating the data required to answer that question.
    • Just because we have a lot of data does not mean we will use it all, or that it contains all that we need to answer the question. In addition, not all data sources are equally helpful in answering the specific question of interest. We are looking for:
      • Data that is relevant to the question. Do we have measures of the target and features that are related to the target?
      • Data that is an accurate measure of our model target and the features of interest.
  • 28. Identifying Data Sources
    • We are typically using data sources that are collected for reasons other than answering our specific question. This means we are collecting data sources opportunistically, so some information that could be extremely helpful in answering the question may not have been collected. We also are not controlling the environment of observations, which means we are only able to determine correlations between collected information and the outcome of interest, not specific causal inferences.
    • Deliverable: Data Sources. Usually a single-page document clearly stating where the data resides. This could include one or more data sources and possibly the associated entity-relation diagrams. This document should also include the target variable definition.
  • 29. Initial Data Exploration
    • Goal: To determine if the data we have can be used to answer the question. If not, we may need to collect more data.
    • Responsibility: The data science team begins to evaluate the data.
    • Once we know where to find the data, this initial pass will help us determine the quality of the data provided to answer the question. Here we are looking to determine if the data is:
      • Connected to the target.
      • Large enough to move forward.
  • 30. Initial Data Exploration
    • At this point graphical methods are extremely helpful. Have we measured the features consistently enough for them to be useful, or are there a lot of missing values in the data? Has the data been consistently collected over the time period of interest, or are there blocks of missing observations? If the data does not pass this quality check, we may need to go back to the previous step to correct or get more data.
    • We also need enough observations to build a meaningful model and enough features for our methods to differentiate between different observations. If we’re trying to differentiate between groups or categories, are there enough examples of all possible outcomes?
    • The initial data exploration step (step 3) is done in parallel with identifying data sources (step 2). As we determine if the data is connected or if we have enough data, we may need to find new data sources with more accurate or more relevant data to complete the data set initially identified in step 2.
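Two of the quality checks described above, missing values per feature and the number of examples per outcome, can be sketched in a few lines of standard-library Python. The column names (`age`, `income`, `churned`) and the toy rows are hypothetical; a real project would likely use a data-frame library and plots instead.

```python
from collections import Counter

def missing_fraction(rows, columns):
    """Fraction of rows in which each column is missing (None)."""
    if not rows:
        raise ValueError("no observations to explore")
    total = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }

def class_counts(rows, target):
    """Number of observations available for each outcome of the target."""
    return Counter(row[target] for row in rows if row.get(target) is not None)

# Hypothetical observations, for illustration only.
rows = [
    {"age": 34, "income": 52000, "churned": "no"},
    {"age": None, "income": 61000, "churned": "no"},
    {"age": 45, "income": None, "churned": "yes"},
    {"age": 29, "income": None, "churned": "no"},
]

missing = missing_fraction(rows, ["age", "income"])
counts = class_counts(rows, "churned")
```

A feature with a high missing fraction, or an outcome with very few examples, is a signal to return to step 2 and look for more or better data.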
  • 31. Initial Data Exploration
    • Deliverables: Data Exploration. This step should produce the initial draft of the following documents:
      • Exploratory Data Analysis Report: A document detailing data requirements, quality (accuracy, connectedness), relevance to the target, and the ability to answer the question of interest. It is best to use graphical methods to clearly show data features in an understandable way. Additionally, we should have an idea of whether there is enough data to answer the question of interest with some confidence in the end result.
      • Analytics Architecture Diagram (initial draft): With the data sources in hand, we can start to define how the machine learning pipeline will work. How often will the data sources be updated? What actions should be taken on those updates? Are there retraining criteria as we collect and label new observations? Documenting this now can help us define and capture the required artifacts for use in later steps.
    • Checkpoint Decision: Before we begin the full feature engineering and model building process, we can reevaluate the project to determine the value of continuing this effort. We may be ready to proceed, we may need to collect more data, or it’s possible the data does not exist to answer the question.
  • 32. Construction of Analysis Data Goal: Construct the analysis data set, with associated feature engineering, for building the machine learning model. Responsibility: The data science team, usually made up of data engineers (experts in getting data from disparate sources) and data scientists (who perform additional quality and quantity checks).
  • 33. Construction of Analysis Data The analysis data set is defined by the following: • Inclusion/exclusion criteria: Evaluate observations on multiple levels to determine whether they are part of the population of interest. Are they connected in time? Are there observations that are missing large chunks of information? We consider both business reasons and data quality reasons when setting inclusion/exclusion criteria. • Feature engineering: This involves inclusion, aggregation and transformation of raw variables to create the features used in the analysis. If we want insight into what is driving the model, we need to take care in how features relate to each other and how the machine learning method will use them. This is a balancing act of including informative variables without including too many unrelated ones. Informative variables improve our result; unrelated variables introduce unnecessary noise into the model. • Avoid leakage: Leakage is caused by including variables that can perfectly predict the target. These are usually variables that may have been used to detect the target initially. As the target is redefined, these dependencies can be hidden from the original definition. Avoiding leakage often requires iterating between building an analysis data set, creating a model, and evaluating its accuracy. Leakage is a major reason data scientists get nervous when they get really good predictive results.
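One rough way to screen for leakage as described on this slide is to flag any candidate feature whose values map one-to-one onto the target. The sketch below is a heuristic under stated assumptions, not a complete leakage test: it only handles categorical values, and the column names (`age_band`, `refund_issued`, `churned`) are hypothetical. Note that a unique identifier column would also be flagged, so each suspect still needs a human review.

```python
from collections import defaultdict


def perfectly_predicts(rows, feature, target):
    """True if every observed value of `feature` co-occurs with
    exactly one value of `target` in `rows`."""
    seen = defaultdict(set)
    for r in rows:
        seen[r[feature]].add(r[target])
    return all(len(targets) == 1 for targets in seen.values())


def leakage_suspects(rows, features, target):
    """Features that perfectly predict the target and deserve a closer look."""
    return [f for f in features if perfectly_predicts(rows, f, target)]


# Hypothetical example: a refund is only ever issued after a customer churns,
# so `refund_issued` encodes the outcome instead of predicting it.
rows = [
    {"age_band": "young", "refund_issued": "yes", "churned": "yes"},
    {"age_band": "young", "refund_issued": "no", "churned": "no"},
    {"age_band": "old", "refund_issued": "yes", "churned": "yes"},
    {"age_band": "old", "refund_issued": "no", "churned": "no"},
]

print(leakage_suspects(rows, ["age_band", "refund_issued"], "churned"))
```

Variables flagged this way are candidates for exclusion from the analysis data set, after checking with the business whether they would really be known at prediction time.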
  • 34. Construction of Analysis Data Deliverable: Feature Engineering This step produces the following initial draft artifacts: • The analysis data set itself, which will be used to train and test the machine learning model in the next step. • A document describing the feature engineering required to construct the analysis data set. • The source code to build the analysis data set, including queries or other source code to produce the model features and the model targets. The model features should be held separate from the target calculations for use when predicting on new observations in a production setting. This artifact will be directly used in the production pipeline of step 7.
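The advice to keep model features separate from target calculations can be illustrated with two independent functions: one that builds features and one that builds the target. This is a minimal sketch; the raw record fields `visits` and `cancelled`, and all function names, are hypothetical and chosen only for the example.

```python
def build_features(raw):
    """Derive model inputs from one raw record.
    This function must never read the outcome field."""
    visits = raw["visits"]
    return {
        "total_visits": sum(visits),
        "mean_visits": sum(visits) / len(visits) if visits else 0.0,
        "active_weeks": sum(1 for v in visits if v > 0),
    }


def build_target(raw):
    """Derive the training label; only available for historical records."""
    return 1 if raw["cancelled"] else 0


def build_analysis_row(raw):
    """Training time: features plus target in one row."""
    row = build_features(raw)
    row["target"] = build_target(raw)
    return row


# Training: historical record with a known outcome.
print(build_analysis_row({"visits": [2, 0, 4], "cancelled": True}))

# Production: new observation, outcome not yet known, features only.
print(build_features({"visits": [1, 1, 0]}))
```

Because `build_features` never touches the outcome, the same code can run unchanged on new observations in the production pipeline.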
  • 35. Machine Learning Model Goal: Answer the question by constructing and evaluating an informative model to predict the target. Responsibility: Data science team. After a large amount of data-specific work, we are now ready to start building a model. This machine learning step is often executed in parallel with constructing the analysis data set, as information from our model can be used to build better features in the analysis data set.
  • 36. Machine Learning Model The process involves: • Split the analysis data into training and testing data sets. • Evaluate (train and test) a series of competing machine learning methods that are geared toward answering the question of interest with the data we currently have at hand. • Determine the “best” solution to answer the question by comparing the success metric between alternative methods.
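The three steps on this slide (split, evaluate competing methods, pick the best by a success metric) can be sketched in plain Python. Everything here is an illustrative assumption: the two toy methods (a majority-class baseline and a one-feature threshold rule), accuracy as the success metric, and the synthetic data. A real project would normally use a machine learning library and a metric agreed with the customer.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority(train):
    """Baseline method: always predict the most common training label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority


def fit_threshold(train):
    """Predict 1 when x exceeds the midpoint between the class means.
    Assumes class 1 tends to have larger x than class 0."""
    ones = [x for x, y in train if y == 1]
    zeros = [x for x, y in train if y == 0]
    if not ones or not zeros:
        return fit_majority(train)
    cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
    return lambda x: 1 if x > cut else 0


def accuracy(model, rows):
    """Success metric: share of held-out rows predicted correctly."""
    return sum(1 for x, y in rows if model(x) == y) / len(rows)


# Synthetic (x, label) pairs: label 0 for small x, label 1 for large x.
data = [(x, 0) for x in range(5)] + [(x, 1) for x in range(10, 15)]
train, test = train_test_split(data)

methods = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: accuracy(fit(train), test) for name, fit in methods.items()}
best = max(scores, key=scores.get)

print(scores)
print(best)
```

Keeping the test rows out of the fitting step is what makes the comparison between methods meaningful; the metric is computed only on data the models have not seen.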
  • 37. Machine Learning Model Deliverables: Machine Learning • The machine learning model, which can be used to predict the target for new observations. This artifact will be directly used in the production pipeline of step 7. • A document describing the model, how to use it, and the findings from the modelling process. What do these initial results look like? What do they tell us about our hypotheses and about the data we are using? Additionally, we can define visualizations of the model results here. Checkpoint Decision • Again, we can reevaluate here whether to move on to a production system. Does the model answer the question sufficiently given the test data? Should we go back and collect more data (step 2) or change how the data is being used (step 4)?
  • 38. Validation and Customer Acceptance Goal: To finalize the machine learning deliverable by confirming the model and the evidence for the model acceptance. Responsibility: Customer-focused evaluation of the project artifacts. By this point, the data science team has some confidence that the project has progressed in answering the question of interest. The answer may not be perfect, but given the data sources, data exploration, the analysis data set, and the machine learning model, the data science team has some estimate of the model's ability and accuracy in attaining the project objective.
  • 39. Validation and Customer Acceptance This step formalizes the delivery of the engagement artifacts and results to the customer for final review before committing to building out the production pipeline. The customer can then determine if the model meets the success metrics and whether the production pipeline would add business value. Deliverable: The following finalized documents and artifacts from each of the project milestones: • Project Objective (step 1) • Data Sources (step 2) • Data Exploration (step 3) • Feature Engineering (step 4) • Machine Learning (step 5)
  • 40. Validation and Customer Acceptance Checkpoint Decision For the most part, the customer should be familiar with all of these deliverables and be aware of the current state of the project throughout the process. The validation and customer acceptance step gives the customer a chance to evaluate the validity and value of the data science solution from a business perspective before committing to continue with the production implementation.
  • 41. Production Pipeline Implementation Goal: Implement the full process to use the model and insights obtained from the engagement. The pipeline is the actual delivery of the business value to the customer. Responsibility: The data science team, typically data engineers building out the system first described in the initial data exploration step.
  • 42. Production Pipeline Implementation Deliverable: The deliverable here is defined by how the customer intends to use the results of this engagement. This could and should include delivery of actionable insights obtained throughout the engagement. These insights can be delivered through: • Data and machine learning visualizations. • An operationalized data/machine learning pipeline to predict outcomes on new observations as they become available.
  • 43. Goals of Data Science Process • The goal of this process is to continue to move a data science project forward towards a clear engagement end point. • We recognize that data science is a research activity and that progress often entails an approach that moves two steps forward and one step (or worse) backwards. • Being able to clearly communicate this to customers can help avoid misunderstanding and frustration for all parties involved, and increase the odds of success.
  • 44. Activity • Perform the data science process on the Olympic medal tally for events held after World War II.
  • 45. Next Week… • Tools and Technologies in Data Science