SlideShare a Scribd company logo
1 of 24
Manoj Mishra
November 23, 2017
DataScience Lifecycle
 Introduction
 LifeCycle
 SkillTree
 Questions
AGENDA
DATA SCIENTIST
60%
19%
9%
7%
5%
Effort Organize & Clean Data
Collect data / Dataset
Data Mining to draw
pattern
Model Selection ,
training and refining
Other Tasks
Data
Acquisition
Data
Preparation
Hypothesis &
Modelling
DATA SCIENCE LIFE CYCLE
Evaluation &
Interpretation
DeploymentOperations Optimization
Business Understanding
DATA ACQUISITION
Static
• Feedback system
• CSV Data sets / text files
Live
• Logs data, memory dumps
• Sensors, controllers etc.
Virtual
• Data Virtualization
• Caching , Storing
DATA SAMPLE DATASET INVENTORY
PROJECT - PREDICTING FAILURE – PROACTIVE
MAINTENANCE
• Baseline normal operational
patterns by modelling the
unstructured Log data.
• Use Domain Experts to identify
patterns before failures.
• Use statistical measurements &
Machine Learning to determine
threshold.
• Identify patterns of activity to
anticipate and react to
circumstances that might
otherwise disrupt operations
SME -
Domain
DATA PREPARATION
• Need for Data Preparation
• Bad data or poor quality data can alter
accuracy & lead to incorrect Insights
• Gartner- Poor quality data costs an avg.
organization $13.5M / year.
• Dataset might contain discrepancies in the
names or codes.
• Dataset might contain outliers or errors.
• Dataset lacks your attributes of interest for
analysis.
• All in all the dataset is not qualitative but is
just quantitative.
• Steps Involved
DATA PREPARATION
• Includes steps to explore, preprocess, and condition data
• Create robust environment – analytics sandbox
• Data preparation tends to be the most labor-intensive step in the
analytics lifecycle
• Often at least 50 – 60% of the data science project’s time
• The data preparation phase is generally the most iterative and the one
that data scientists tend to underestimate most often 
Database :
Understand the data
Understand the Business
Airlines :
NYC FLIGHTS 13
e.g. tailnum :
A tail number refers to an
identification number painte
d on an aircraft, frequently
on the tail
Goal : Predicting
flight delays Modelling
PREDICTING FLIGHT DELAYS - NYC FLIGHTS 13
• Exploratory Data Analysis
of the flight data for
inbound and outbound
flights for year 2013 in
NYC.
• Find patterns, benchmark,
model and find
predictors.
• To predict the flight
delays for NYC Inbound/
Outbound flights.
• Ref. DataSet
PREDICTING FLIGHT DELAYS - NYC FLIGHTS 13
• Exploratory Data Analysis
of the flight data for
inbound and outbound
flights for year 2013 in
NYC.
• Find patterns, benchmark,
model and find
predictors.
• To predict the flight
delays for NYC Inbound/
Outbound flights.
• Ref. DataSet
OBSERVATIONS - NYC FLIGHTS 13
REF. DATASET
COMMON TOOLS - FOR DATA
PREPARATION
• Alpine Miner provides a graphical user interface for creating analytic
workflows
• OpenRefine (formerly Google Refine) is a free, open source tool for
working with messy data
• Similar to OpenRefine, Data Wrangler is an interactive tool for data
cleansing an transformation
• Alteryx and Informatica also can be tried.
HYPOTHESIS & MODELLING
• There are three main tasks addressed in this stage:
• Feature engineering: Create data features from the raw data to
facilitate model training.
• Model training: Find the model that answers the question most
accurately by comparing their success metrics.
• Determine if your model is suitable for production.
FEATURE SELECTION & ENGINEERING
Select
Features
Research
feature
relevance
Experiment
& Validate
Change the
feature set
if required
Go back to
feature
selection
FEATURE ENGINEERING
Date # footfalls in Dubai Mall
01/07/2017 124532
02/07/2017 65434
03/07/2017 12333
04/07/2017 60009
05/07/2017 46567
06/07/2017 98001
07/07/2017 146543
08/07/2017 112345
09/07/2017 76543
Date # footfalls in Dubai Mall IsHoliday?
01/07/2017 124532Yes
02/07/2017 65434No
03/07/2017 12333No
04/07/2017 60009No
05/07/2017 46567No
06/07/2017 98001No
07/07/2017 146543yes
08/07/2017 112345yes
09/07/2017 76543No
FEATURE ENGINEERING
Seasonality (holiday season):
Jun, Jul & Dec account for the
highest avg. delays
MODELLING
CREATE YOUR MODEL & EVALUATE
• Split the input data randomly for modeling into a training data set and a test data
set.
• Build the models by using the training data set.
• Evaluate the training and the test data set. Use a series of competing machine-
learning algorithms along with the various associated tuning parameters (known as
a parameter sweep) that are geared toward answering the question of interest with
the current data.
• Determine the “best” solution to answer the question by comparing the success
metrics between alternative methods.
CREATE YOUR MODEL & EVALUATE
• Supervised Learning
• Naive Bayes
• KNN
• Support Vector Machines
(SVM)
• Linear Regression
• Unsupervised Learning
• Principal Component Analysis.
• K Means
• Classification Metrics
• Accuracy Score
• Classification Report
• Confusion Matrix
• Regression Metrics
• Mean Absolute Error.
• Mean Squared Error
• R2 Score
• Clustering Metrics
• Adjusted Rand Index.
• Homogeneity
• V - measure
DEPLOYMENT
After you have a set of models that perform well, you can operationalize
them for other applications through APIs or other interface to consume
from various applications, such as:
• Online websites
• Spreadsheets
• Dashboards
• Line-of-business applications
• Back-end applications
THANK YOU !

More Related Content

What's hot

introduction to data science
introduction to data scienceintroduction to data science
introduction to data sciencebhavesh lande
 
Data Science Introduction
Data Science IntroductionData Science Introduction
Data Science IntroductionGang Tao
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceEdureka!
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptxSadhanaParameswaran
 
1. Data Analytics-introduction
1. Data Analytics-introduction1. Data Analytics-introduction
1. Data Analytics-introductionkrishna singh
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and workAmr Abd El Latief
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...Edureka!
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecasesSreenatha Reddy K R
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 

What's hot (20)

introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 
Data Science Introduction
Data Science IntroductionData Science Introduction
Data Science Introduction
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Introduction to data science.pptx
Introduction to data science.pptxIntroduction to data science.pptx
Introduction to data science.pptx
 
Data analytics
Data analyticsData analytics
Data analytics
 
1. Data Analytics-introduction
1. Data Analytics-introduction1. Data Analytics-introduction
1. Data Analytics-introduction
 
Data Science
Data ScienceData Science
Data Science
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data Science
Data ScienceData Science
Data Science
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 

Similar to Data science life cycle

Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDeploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDatabricks
 
Data science workflow v1.1
Data science workflow v1.1Data science workflow v1.1
Data science workflow v1.1Jessie_N
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptxXanGwaps
 
So your boss says you need to learn data science
So your boss says you need to learn data scienceSo your boss says you need to learn data science
So your boss says you need to learn data scienceSusan Ibach
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Bhakthi Liyanage
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Databricks
 
Fractional Chief AI Officer Services For Hire
Fractional Chief AI Officer Services For HireFractional Chief AI Officer Services For Hire
Fractional Chief AI Officer Services For HireValue Amplify Consulting
 
Project Management Careers in Data Science
Project Management Careers in Data ScienceProject Management Careers in Data Science
Project Management Careers in Data ScienceGanes Kesari
 
Lecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptLecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptAsadkhan47384
 
What is the Value of SAS Analytics?
What is the Value of SAS Analytics?What is the Value of SAS Analytics?
What is the Value of SAS Analytics?SAS Canada
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your EnterpriseWSO2
 
The machine learning process: From ideation to deployment with Azure Machine ...
The machine learning process: From ideation to deployment with Azure Machine ...The machine learning process: From ideation to deployment with Azure Machine ...
The machine learning process: From ideation to deployment with Azure Machine ...Francesca Lazzeri, PhD
 
Audience Measurement: Nielsen Online vs Google Analytics
Audience Measurement: Nielsen Online vs Google AnalyticsAudience Measurement: Nielsen Online vs Google Analytics
Audience Measurement: Nielsen Online vs Google AnalyticsPrefix Technologies
 
Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015DataKitchen
 

Similar to Data science life cycle (20)

Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDeploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
 
Data science workflow v1.1
Data science workflow v1.1Data science workflow v1.1
Data science workflow v1.1
 
big-data-anallytics.pptx
big-data-anallytics.pptxbig-data-anallytics.pptx
big-data-anallytics.pptx
 
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx351315535-Module-1-Intro-to-Data-Science-pptx.pptx
351315535-Module-1-Intro-to-Data-Science-pptx.pptx
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
So your boss says you need to learn data science
So your boss says you need to learn data scienceSo your boss says you need to learn data science
So your boss says you need to learn data science
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
Integrating Azure Machine Learning and Predictive Analytics with SharePoint O...
 
Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)Vadlamudi saketh30 (ml)
Vadlamudi saketh30 (ml)
 
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...Efficiently Building Machine Learning Models for Predictive Maintenance in th...
Efficiently Building Machine Learning Models for Predictive Maintenance in th...
 
Fractional Chief AI Officer Services For Hire
Fractional Chief AI Officer Services For HireFractional Chief AI Officer Services For Hire
Fractional Chief AI Officer Services For Hire
 
Project Management Careers in Data Science
Project Management Careers in Data ScienceProject Management Careers in Data Science
Project Management Careers in Data Science
 
Lecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.pptLecture 10 - DataMiningEngineering.ppt
Lecture 10 - DataMiningEngineering.ppt
 
What is the Value of SAS Analytics?
What is the Value of SAS Analytics?What is the Value of SAS Analytics?
What is the Value of SAS Analytics?
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Analytics in Your Enterprise
Analytics in Your EnterpriseAnalytics in Your Enterprise
Analytics in Your Enterprise
 
The machine learning process: From ideation to deployment with Azure Machine ...
The machine learning process: From ideation to deployment with Azure Machine ...The machine learning process: From ideation to deployment with Azure Machine ...
The machine learning process: From ideation to deployment with Azure Machine ...
 
Audience Measurement: Nielsen Online vs Google Analytics
Audience Measurement: Nielsen Online vs Google AnalyticsAudience Measurement: Nielsen Online vs Google Analytics
Audience Measurement: Nielsen Online vs Google Analytics
 
Data kitchen 7 agile steps - big data fest 9-18-2015
Data kitchen   7 agile steps - big data fest 9-18-2015Data kitchen   7 agile steps - big data fest 9-18-2015
Data kitchen 7 agile steps - big data fest 9-18-2015
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 

Recently uploaded

办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 

Recently uploaded (20)

办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 

Data science life cycle

  • 1. Manoj Mishra November 23, 2017 DataScience Lifecycle
  • 2.  Introduction  LifeCycle  SkillTree  Questions AGENDA
  • 3. DATA SCIENTIST 60% 19% 9% 7% 5% Effort Organize & Clean Data Collect data / Dataset Data Mining to draw pattern Model Selection , training and refining Other Tasks
  • 4. Data Acquisition Data Preparation Hypothesis & Modelling DATA SCIENCE LIFE CYCLE Evaluation & Interpretation DeploymentOperations Optimization Business Understanding
  • 5. DATA ACQUISITION Static • Feedback system • CSV Data sets / text files Live • Logs data, memory dumps • Sensors, controllers etc. Virtual • Data Virtualization • Caching , Storing
  • 7. PROJECT - PREDICTING FAILURE – PROACTIVE MAINTENANCE • Baseline normal operational patterns by modelling the unstructured Log data. • Use Domain Experts to identify patterns before failures. • Use statistical measurements & Machine Learning to determine threshold. • Identify patterns of activity to anticipate and react to circumstances that might otherwise disrupt operations SME - Domain
  • 8. DATA PREPARATION • Need for Data Preparation • Bad data or poor quality data can alter accuracy & lead to incorrect Insights • Gartner- Poor quality data costs an avg. organization $13.5M / year. • Dataset might contain discrepancies in the names or codes. • Dataset might contain outliers or errors. • Dataset lacks your attributes of interest for analysis. • All in all the dataset is not qualitative but is just quantitative. • Steps Involved
  • 9. DATA PREPARATION • Includes steps to explore, preprocess, and condition data • Create robust environment – analytics sandbox • Data preparation tends to be the most labor-intensive step in the analytics lifecycle • Often at least 50 – 60% of the data science project’s time • The data preparation phase is generally the most iterative and the one that data scientists tend to underestimate most often 
  • 10. Database : Understand the data Understand the Business Airlines : NYC FLIGHTS 13 e.g. tailnum : A tail number refers to an identification number painte d on an aircraft, frequently on the tail Goal : Predicting flight delays Modelling
  • 11. PREDICTING FLIGHT DELAYS - NYC FLIGHTS 13 • Exploratory Data Analysis of the flight data for inbound and outbound flights for year 2013 in NYC. • Find patterns, benchmark, model and find predictors. • To predict the flight delays for NYC Inbound/ Outbound flights. • Ref. DataSet
  • 12. PREDICTING FLIGHT DELAYS - NYC FLIGHTS 13 • Exploratory Data Analysis of the flight data for inbound and outbound flights for year 2013 in NYC. • Find patterns, benchmark, model and find predictors. • To predict the flight delays for NYC Inbound/ Outbound flights. • Ref. DataSet
  • 13. OBSERVATIONS - NYC FLIGHTS 13 REF. DATASET
  • 14. COMMON TOOLS - FOR DATA PREPARATION • Alpine Miner provides a graphical user interface for creating analytic workflows • OpenRefine (formerly Google Refine) is a free, open source tool for working with messy data • Similar to OpenRefine, Data Wrangler is an interactive tool for data cleansing an transformation • Alteryx and Informatica also can be tried.
  • 15. HYPOTHESIS & MODELLING • There are three main tasks addressed in this stage: • Feature engineering: Create data features from the raw data to facilitate model training. • Model training: Find the model that answers the question most accurately by comparing their success metrics. • Determine if your model is suitable for production.
  • 16. FEATURE SELECTION & ENGINEERING Select Features Research feature relevance Experiment & Validate Change the feature set if required Go back to feature selection
  • 17. FEATURE ENGINEERING Date # footfalls in Dubai Mall 01/07/2017 124532 02/07/2017 65434 03/07/2017 12333 04/07/2017 60009 05/07/2017 46567 06/07/2017 98001 07/07/2017 146543 08/07/2017 112345 09/07/2017 76543 Date # footfalls in Dubai Mall IsHoliday? 01/07/2017 124532Yes 02/07/2017 65434No 03/07/2017 12333No 04/07/2017 60009No 05/07/2017 46567No 06/07/2017 98001No 07/07/2017 146543yes 08/07/2017 112345yes 09/07/2017 76543No
  • 18. FEATURE ENGINEERING Seasonality (holiday season): Jun, Jul & Dec account for the highest avg. delays
  • 20. CREATE YOUR MODEL & EVALUATE • Split the input data randomly for modeling into a training data set and a test data set. • Build the models by using the training data set. • Evaluate the training and the test data set. Use a series of competing machine- learning algorithms along with the various associated tuning parameters (known as a parameter sweep) that are geared toward answering the question of interest with the current data. • Determine the “best” solution to answer the question by comparing the success metrics between alternative methods.
  • 21. CREATE YOUR MODEL & EVALUATE • Supervised Learning • Naive Bayes • KNN • Support Vector Machines (SVM) • Linear Regression • Unsupervised Learning • Principal Component Analysis. • K Means • Classification Metrics • Accuracy Score • Classification Report • Confusion Matrix • Regression Metrics • Mean Absolute Error. • Mean Squared Error • R2 Score • Clustering Metrics • Adjusted Rand Index. • Homogeneity • V - measure
  • 22. DEPLOYMENT After you have a set of models that perform well, you can operationalize them for other applications through APIs or other interface to consume from various applications, such as: • Online websites • Spreadsheets • Dashboards • Line-of-business applications • Back-end applications
  • 23.