DATA SCIENCE: WHY, WHAT, HOW?
MUHAMMAD SHAHID
Data Science with Dr Shahid
FACEBOOK.COM/DRSHAHID.PHD

Qualitative data
• Nominal
• Ordinal
• Binary

Quantitative data
• Discrete
• Continuous
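
To make the qualitative/quantitative split concrete, here is a minimal sketch of a heuristic that labels a column of values with one of the five kinds above. The function name `infer_data_kind` and its `ordered_levels` hint are hypothetical, introduced only for illustration; real projects should let domain knowledge override any such rule.

```python
def infer_data_kind(values, ordered_levels=None):
    """Heuristically label a column as binary, nominal, ordinal, discrete, or continuous.

    `ordered_levels` is a hypothetical hint: the natural order of a categorical
    column (e.g. ["Low", "Medium", "High"]), which marks it as ordinal.
    """
    distinct = set(values)
    # bool is a subclass of int in Python, so booleans are not counted as numbers.
    is_number = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)

    # Qualitative: two categories (yes/no, True/False, 0/1) are binary.
    if len(distinct) == 2 and (distinct <= {0, 1} or not is_number):
        return "binary"
    # Quantitative: whole-number values (counts) are discrete, others continuous.
    if is_number:
        if all(float(v).is_integer() for v in values):
            return "discrete"
        return "continuous"
    # Qualitative with a known order is ordinal; otherwise nominal.
    if ordered_levels is not None and distinct <= set(ordered_levels):
        return "ordinal"
    return "nominal"


print(infer_data_kind(["red", "green", "blue"]))                             # nominal
print(infer_data_kind(["Low", "High", "Medium"], ["Low", "Medium", "High"]))  # ordinal
print(infer_data_kind(["yes", "no", "yes"]))                                 # binary
print(infer_data_kind([3, 0, 7, 2]))                                         # discrete
print(infer_data_kind([36.6, 37.2, 38.0]))                                   # continuous
```

The rule is deliberately simple: a temperature recorded in whole degrees would be reported as discrete even though the underlying quantity is continuous.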

Volume
• Amount of data

Variety
• Different types (structured, semi-structured, unstructured), sources, and resolutions
• e.g., text, images, videos, audio

Velocity
• Speed of data generation and handling

Veracity
• Data in doubt (varying levels of noise and processing errors)

Statistics
• Traditionally concerned with analyzing primary (e.g., experimental) data collected to check specific hypotheses (ideas)
• Primary data analysis, or top-down (confirmatory) analysis
• Hypothesis evaluation or testing

Data Science
• Typically concerned with analyzing secondary (e.g., observational) data collected for other reasons
• Secondary data analysis, or bottom-up (exploratory) analysis
• Hypothesis generation
• Knowledge discovery
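
The top-down (confirmatory) style starts from a hypothesis and then tests it. As a sketch of what "hypothesis testing" can look like in code, here is a two-sided permutation test for a difference in means. The permutation test is one choice among many (a t-test is another), and the measurements below are hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, estimated p-value). The p-value is the share of
    random relabelings whose absolute difference is at least the observed one.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel under the null hypothesis of no group effect
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance so float round-off does not hide ties with the observed value.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Counting the observed labeling itself keeps the estimate above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


control = [12.1, 11.8, 12.5, 12.0, 11.9]    # hypothetical measurements
treatment = [13.0, 12.8, 13.4, 12.9, 13.1]
diff, p_value = permutation_test(treatment, control)
print(f"difference in means = {diff:.2f}, p = {p_value:.4f}")
```

A bottom-up (exploratory) analysis would instead scan many variables for patterns, and any hypothesis it generates would then need a confirmatory test on fresh data.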

Data science is an interdisciplinary field.
It uses computing tools to extract knowledge from data by applying statistical methods.
Multiple definitions exist because creating value requires cross-disciplinary skills.
The "holy grail" of data science is often illustrated with Venn diagrams, e.g., Drew Conway's.
[Figure: Data science as portrayed by Drew Conway]
[Figure after Stephan Kolassa on StackExchange, relating big data, artificial neural networks, machine learning, data mining, and deep learning]
[Figure: relationship between machine learning, deep learning, data science, artificial intelligence, and big data]

Gregory Piatetsky-Shapiro, Ph.D.:
From knowledge discovery to data mining to predictive analytics, and now to data science.
The essence is always the discovery of what is true and useful.
How?
• Asking the right questions!
• Requirements for data collection
• Analysis/modeling
• Conveying results
(Source: Microsoft Azure documentation)

Business Understanding
Goals
• Specify key variables (model targets, metrics of success)
• Identify relevant data sources
How?
• Define *objectives (business problems, stakeholders)
• Define **SMART metrics
• Find the data
Artifacts
• Project charter (iterated)
• Data sources
• Data dictionaries

*Objectives
• How much / how many? Regression
• Which category? Classification
• Which group? Clustering
• Is it weird? Anomaly detection
• Which option should be taken? Recommendation

**SMART
• Specific
• Measurable
• Achievable
• Relevant
• Time-bound

Data Acquisition
Goals
• Clean, high-quality data
• Architecture of the data pipeline (refresh & score)
How?
• Data ingestion
• Explore the data (quality, EDA)
• Set up a data pipeline (batch-based, streaming/real-time, or hybrid)
Artifacts
• Data quality report
• Solution architecture
• Checkpoint decision (re-evaluate before full feature engineering and model building)
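
A data quality report is often the first output of exploring the data. The sketch below summarizes a list of dict records column by column: missing values, distinct values, and basic numeric statistics. Treating `None`, empty strings, and NaN as "missing" is an assumption, and the records are hypothetical.

```python
import math
from statistics import mean, median


def is_missing(value):
    """Treat None, empty strings, and NaN as missing (an assumption; adapt per source)."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def data_quality_report(rows):
    """Per-column missing counts, distinct counts, and numeric summaries for dict records."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys count as missing
        present = [v for v in values if not is_missing(v)]
        summary = {"missing": len(values) - len(present), "distinct": len(set(present))}
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        # Only add numeric statistics when every present value is a number.
        if present and len(numeric) == len(present):
            summary.update(min=min(numeric), max=max(numeric),
                           mean=mean(numeric), median=median(numeric))
        report[col] = summary
    return report


rows = [  # hypothetical records
    {"age": 34, "city": "north", "income": 52000.0},
    {"age": None, "city": "south", "income": 61000.0},
    {"age": 29, "city": "", "income": float("nan")},
    {"age": 41, "city": "north", "income": 58000.0},
]
for column, summary in data_quality_report(rows).items():
    print(column, summary)
```

In practice the same summary is usually produced with a dataframe library; the standard-library version simply shows what such a report computes.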

Modeling
Goals
• Optimal features
• Informative model
• Production-ready model
How?
• Feature engineering
• Model training
• Is the model production-ready?
Artifacts
• Feature sets
• Model report
• Checkpoint decision (evaluate for production)
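
Feature engineering turns raw columns into model inputs, and the right transform depends on the data type. The sketch below shows three common choices, picked here for illustration rather than taken from the slides: one-hot encoding for nominal data, rank encoding for ordinal data, and min-max scaling for continuous data.

```python
def one_hot(values, categories=None):
    """Nominal feature -> one 0/1 indicator column per category."""
    if categories is None:
        categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def ordinal_encode(values, order):
    """Ordinal feature -> integer ranks that preserve its natural order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]  # raises KeyError on an unknown level


def min_max_scale(values):
    """Continuous feature -> values rescaled to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


encoded, cats = one_hot(["red", "blue", "red"])
print(cats, encoded)                                                         # ['blue', 'red'] [[0, 1], [1, 0], [0, 1]]
print(ordinal_encode(["Low", "High", "Medium"], ["Low", "Medium", "High"]))  # [0, 2, 1]
print(min_max_scale([10.0, 20.0, 15.0]))                                     # [0.0, 1.0, 0.5]
```

One-hot encoding avoids implying an order that nominal categories do not have, while rank encoding keeps the order that ordinal categories do have.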

Model Training
Raw data is turned into features (the starting data), which are split into:
• Training split (70-80%): the model gets trained
• Validation split (10-15%): hyperparameters are tuned
• Test split (10-15%): the model gets evaluated
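
The three-way split can be sketched as a shuffle followed by slicing. The default 80/10/10 proportions fall within the ranges on the slide; the function name is hypothetical, and a random split assumes independent records (time-ordered data usually needs a chronological split instead).

```python
import random


def train_val_test_split(records, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle records and split them into training, validation, and test sets.

    Whatever remains after the training and validation shares becomes the test set.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

The test split should be touched only once, at the end; repeatedly checking it while tuning hyperparameters leaks information and makes the final estimate optimistic.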

Deployment
Goals
• Deploy models with a data pipeline to a production environment
How?
• Operationalize the model
Artifacts
• Status dashboard (system health & KPIs)
• Final modeling report
• Final solution architecture document

Customer Acceptance
Goals
• Finalize project deliverables: confirm that the pipeline, the model, and their deployment in a production environment satisfy the customer's objectives.
How?
• System validation
• Project hand-off
Artifacts
• Exit report of the project for the customer

(Source: UC Berkeley School of Information)

Kirk Borne's levels of analytics:
• Descriptive [Hindsight]
• Diagnostic [Oversight]
• Predictive [Foresight]
• Prescriptive [Insight]
• Cognitive [Rightsight]
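
Three of these levels can be contrasted on one toy problem. In the sketch below, descriptive analytics summarizes past sales, predictive analytics fits a least-squares linear trend and forecasts the next period, and prescriptive analytics turns that forecast into an order quantity. The sales figures and the ordering rule are hypothetical; the diagnostic and cognitive levels are not shown.

```python
import math


def describe(history):
    """Descriptive analytics (hindsight): what happened?"""
    return {"total": sum(history), "mean": sum(history) / len(history)}


def forecast_next(history):
    """Predictive analytics (foresight): least-squares linear trend, one step ahead."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_bar = (n - 1) / 2            # mean of time indices 0..n-1
    y_bar = sum(history) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(history))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * n    # value of the fitted line at the next index


def recommend_order(history, safety_stock=5):
    """Prescriptive analytics (insight): turn the forecast into an action.

    Hypothetical rule: order the forecast, rounded up, plus a fixed safety stock.
    """
    return math.ceil(forecast_next(history)) + safety_stock


weekly_sales = [100, 110, 120, 130]  # hypothetical data
print(describe(weekly_sales))        # {'total': 460, 'mean': 115.0}
print(forecast_next(weekly_sales))   # 140.0
print(recommend_order(weekly_sales)) # 145
```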
What does it take?

Math/Statistics
• Linear algebra, calculus
• Probability theory, graph theory
• Distributions, summary statistics, hypothesis testing

Machine learning
• Supervised learning
• Unsupervised learning
• Validation, model comparison

Software engineering
• Algorithms and data structures
• Data visualization
• Data processing
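
"Validation, model comparison" is commonly done with k-fold cross-validation: each fold serves once as validation data while the rest is used for training, and the scores are averaged. The sketch below compares a constant-mean baseline with a 1-nearest-neighbour model on hypothetical noise-free data; the `fit`/`score` callable interface is an assumption made for illustration, not a specific library's API.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_val_score(fit, score, X, y, k=5, seed=0):
    """Average validation score over k folds (fit and score are caller-supplied)."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return sum(scores) / len(scores)


X = list(range(20))
y = [2 * x + 1 for x in X]  # hypothetical noise-free data


def fit_mean(X_train, y_train):
    m = sum(y_train) / len(y_train)
    return lambda x: m  # baseline: always predict the training mean


def fit_nearest(X_train, y_train):
    pairs = list(zip(X_train, y_train))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]  # 1-nearest neighbour


def mse(model, X_val, y_val):
    return sum((model(x) - t) ** 2 for x, t in zip(X_val, y_val)) / len(y_val)


print(cross_val_score(fit_mean, mse, X, y), cross_val_score(fit_nearest, mse, X, y))
```

Both models are scored on the same folds (same seed), so the comparison reflects the models rather than a lucky split.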

[Figure: data scientists alongside related roles: data analyst, ML engineer, data engineer, data architect, BI developer]
Python for Data Science
Contact me!
https://www.facebook.com/drshahid.phd
https://www.linkedin.com/in/muhammad-shahid-67876212
muhammad.shahid@ieee.org
Thank You!

Editor's Notes

  1. Continuous features: there is a measurable difference between the values a continuous feature takes on, and those values usually lie in a subset of the real numbers. Examples: distance, time, cost, temperature.
     Categorical features: a categorical feature takes one of a fixed number of discrete possible values, which may or may not have an ordering. If the values have a natural ordering, the feature is called an ordinal categorical feature; if there is no intrinsic ordering, it is called a nominal categorical feature.
     Nominal examples: car models, colors, TV shows.
     Ordinal examples: High-Medium-Low; 1-10 years old, 11-20 years old, 30-40 years old; Happy, Neutral, Sad.