zekeLabs
Master Guide to Become a Data Scientist
Learning made Simpler !
www.zekeLabs.com
“Goal - Become a Data Scientist”
info@zekeLabs.com | www.zekeLabs.com | +91 8095465880
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Complete Data Science / AI / ML in 20 Modules - 50 hours
1. Numerical Computation using NumPy
2. Essential Statistics & Maths
3. Pandas & scipy for Data Wrangling & Statistics
4. Data Visualization
5. Introducing Machine Learning & Knowing Datasets
6. Data Preprocessing
7. Feature Engineering
8. Feature Selection Techniques
9. Model Evaluation
10. Model Selection
11. Linear Regression
12. Logistic Regression
13. Naive Bayes
14. Trees
15. Ensemble Methods
16. Nearest Neighbors
17. Support Vector Machines
18. Clustering
19. Machine Learning at Scale & Deployment
20. 10 Projects
0. Prerequisite
● Basic Programming using Python
● Object Oriented Programming in Python
● Connecting databases & SQL
● Web scraping
● Parsing
1. Numerical Computation using NumPy - 3 hrs
● Why NumPy ?
● Performance
● Creation
● Access
● Concat & Split
● Axes
● Understanding Vectors
● Reshape
● Matrix Operations
● Utility functions
● Common NumPy utilities
● Broadcasting
2. Essential Statistics & Maths - 5 hrs
● Relationships - Deterministic vs Statistical
● Statistics - Descriptive vs Inferential
● Sampling
● Variables
● Distribution
● Summarizing Distribution
● Correlation, Collinearity, Causation
● Probability
● Normal Distribution
● Confidence Interval
● Hypothesis Testing
● Calculus
● Linear Algebra
● Matrix Ops
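As a small illustration of the Confidence Interval topic, here is a minimal sketch that estimates a 95% interval for a sample mean using a normal approximation. The function name `mean_confidence_interval` and the height values are made up for this example; for small samples a t-distribution would usually be preferred.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the sample mean using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical data, for illustration only.
heights_cm = [162.0, 170.5, 168.2, 175.1, 166.4, 171.3, 169.8, 173.0]
low, high = mean_confidence_interval(heights_cm)
print(f"95% CI for the mean: [{low:.2f}, {high:.2f}]")
```

Raising `confidence` widens the interval, and a larger sample narrows it.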
3. Pandas & scipy for Data Wrangling & Statistics - 5 hrs
● Series vs DataFrames
● Loading CSV, JSON, DB etc.
● Access & Filters
● DataFrame
● Exploratory Data Analysis
● Finding & Handling Missing Data
● Duplicate Handling
● Rolling averages
● Applying functions
● Handling Time Series Data
● Merging & Grouping Data
● Pivot Table & Crosstab
● Random data using scipy
● Comparing datasets using scipy
● Analyzing sample using scipy
● Kernel Density Estimation using scipy
4. Data Visualization - 4 hrs
● Understanding matplotlib
● Plotting Quantitative data
● Plotting Qualitative data
● Histograms
● Frequency Polygons
● Box-Plots
● Bar charts
● Line Graphs
● Scatter Plots
● 3D Plots
● Exploring seaborn & Bokeh
● Introduction to Tableau
● Plotting scatter plot
● Bubble chart
● Bullet chart
● Gantt chart
5. Introducing Machine Learning & Knowing Datasets - 1 hr
● Introduction to Machine Learning
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
● Regression
● Classification
● Clustering
● Machine Learning in Big Companies
● Machine Learning in Small Companies
● Machine Learning in startups
● UCI
● Kaggle
● Inbuilt scikit-learn datasets
● Generating datasets
6. Data Preprocessing - 4 hrs
● Standardize feature
● Normalize
● Encoding categorical features
● Encoding Ordinal Features
● Non-linear transformation
● Polynomial features
● Handling Time Feature
● Rolling Time window
● Custom Transformers
● DictVectorizer, CountVectorizer, TF-IDF
● NLTK - stemming, lemmatization, stop words
● Skimage library for image processing
● Crop, resize, gray
● Outlier detection
● Handling Outlier data
● Handling Imbalanced classes
7. Feature Engineering - 3 hrs
● Principal Component Analysis
● Linear Discriminant Analysis
● Generalized Discriminant Analysis
● FastICA
● Non-negative Matrix Factorization
● TruncatedSVD
8. Feature Selection - 2 hrs
● SelectKBest for Regression
● SelectKBest for Classification
● Variance Threshold
● Drop Highly correlated features
● Dropping features based on non-null values
● SelectFromModel
● Feature Selection using RandomForest
● Based on correlation with target
● Univariate Feature Selection
● Recursive Feature Elimination
9. Model Evaluation - 1 hr
● Why do we need to evaluate at all ?
● Metrics for Classification
● Metrics for Regression
● Clustering metrics
● Probability Calibration
● Pairwise metrics
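To make the classification metrics concrete, the sketch below computes precision, recall, and F1 from two label lists in plain Python. The helper name `precision_recall_f1` and the labels are illustrative assumptions, not material from the course.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Precision asks how many predicted positives were correct; recall asks how many actual positives were found.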
10. Model Selection - 1 hr
● Motivation
● KFold
● StratifiedKFold
● Splitting training testing data
● Cross Validate
● GridSearchCV
● RandomizedSearchCV
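The idea behind KFold can be sketched without any library: split the sample indices into k contiguous folds and use each fold once as the test set. The generator `k_fold_indices` is a hypothetical helper written for this example; in practice scikit-learn's `KFold` would normally be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample lands in exactly one test fold, so each observation is used for validation once.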
11. Linear Regression - 3 hrs
● Understanding Ordinary Least Squares
● Cost Function
● Bias & Variance
● Coefficients & Intercept
● Simple Linear Regression
● Polynomial Linear Regression
● Ridge
● Lasso
● Elastic Net
● Stochastic Gradient Descent
● Robust Regression
● Problem - Insurance Payout Prediction
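For Simple Linear Regression, Ordinary Least Squares has a closed-form solution: the slope is the covariance of x and y divided by the variance of x. The sketch below assumes a single feature; the function names and the data points are invented for illustration.

```python
def fit_simple_ols(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Cost function: average squared gap between prediction and target."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_ols(xs, ys)
print(slope, intercept, mean_squared_error(xs, ys, slope, intercept))
```

Ridge, Lasso, and Elastic Net keep the same squared-error cost and add a penalty on the size of the coefficients.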
12. Logistic Regression - 2 hrs
● Basics of Logistic Regression
● Sigmoid
● Cost Function
● Understanding important hyperparameters
● Predicting linear separator
● Predicting nonlinear decision boundary
● Handling Imbalanced classes
● Project - Predicting if income is less than 50K or more
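The Sigmoid and Cost Function topics fit in a few lines: the sigmoid maps a linear score to a probability, and the log loss (binary cross-entropy) penalizes confident wrong predictions. This is a minimal sketch; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log() never receives 0.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(sigmoid(0.0))                        # score 0 maps to probability 0.5
print(log_loss([1, 0], [0.9, 0.1]))        # confident and correct: small loss
print(log_loss([1, 0], [0.1, 0.9]))        # confident and wrong: large loss
```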
13. Naive Bayes - 2 hrs
● Bayes Theorem
● Gaussian Naive Bayes
● Multinomial Naive Bayes
● Bernoulli Naive Bayes
● Out-of-core Naive Bayes using partial_fit
● Limitations of Naive Bayes
● Choosing the right variant
● Problem - Mail data classification
14. Trees - 2 hrs
● Understanding Information Theory
● Entropy
● Decision Tree creation
● Tree for Classification
● Tree for Regression
● Advantages of Decision Tree
● Important Hyper-parameters
● Limitations of Decision Tree
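Entropy drives how a decision tree picks its splits: a good split lowers the weighted entropy of the child nodes, and that reduction is the information gain. The sketch below uses made-up labels, and the helper names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical labels, for illustration only.
parent = ["yes", "yes", "no", "no"]
print(entropy(parent))
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))  # perfect split
print(information_gain(parent, ["yes", "no"], ["yes", "no"]))  # uninformative split
```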
15. Ensemble Methods - 3 hrs
● Bagging vs Boosting
● Forests
● AdaBoost
● XGBoost
● Gradient Tree Boosting
● Voting Classifier
● The role weak estimators play
● Problem - Attack detection on network data
16. Nearest Neighbors - 2 hrs
● Unsupervised Nearest Neighbor
● Nearest Neighbor for Classification
● Nearest Neighbor for Regression
● Effect of k
● Nearest Neighbor Algorithms
● Choosing algorithm
● Nearest Centroid Classifier
● Developing recommendation engine
17. Support Vector Machine - 3 hrs
● Understanding SVM
● Classification
● Regression
● OneClassSVM
● Imbalanced Classes
● Kernel Functions
● Understanding Maths behind it
● Problem - Face recognition
17b. Novelty & Outlier Detection - 1 hr
● Novelty vs Outlier
● OneClassSVM
● Fitting data in an Elliptic Envelope
● Isolation Forest
● Local Outlier Factor
● When to use what
18. Clustering - 3 hrs
● Objectives of clustering
● Agglomerative clustering
● DBSCAN clustering
● KMeans
● Affinity Propagation
● Mean Shift clustering
● Spectral clustering
● Hierarchical clustering
● Birch
● Clustering evaluation
19. Deployment & Scaling - 3 hrs
● Bottom-Up approach for dealing with large data
● Extracting features using Hashing Techniques
● Incremental learning
● Serializing data for quicker access
● Running as a Python .egg or wheel
● Model behind REST server
● Persisting & Loading model
● Deploying model behind web application
20. Use Cases
● Credit Risk - Predicting Defaulters
● Amazon Food Review Sentiment
● Predicting Employee Attrition
● Identifying characters of an unknown language
● Predicting insurance payout amount
● Text Categorization
● Churn Prediction
● Attack Prediction on network data
● Identifying faces
● Predict patient stay in hospital
Way Forward - Deep Learning
Complete Deep Learning in 10 Modules - 50 hours
● Basics of TensorFlow & Keras
● Foundations of Neural Network
● Activation Functions & Optimizers
● Regularization Techniques & Loss Functions
● Implementing a Deep Neural Network for Fashion-MNIST
● Introduction to Convolutional Neural Network
● Filters, pooling, strides
● Different initialization techniques
● Implement CNN for Fashion-MNIST
● Hyper-parameter tuning CNN
● Understanding popular pretrained models
● Transfer Learning & Fine Tuning
● Understanding Recurrent Neural Networks
● LSTM
● GRU
● Implement Text Classification using LSTM
● Autoencoders
● GAN
● Implement GAN & DCGAN
● Implementing image captioning
● Implementing chatbot
● Implementing MNIST generator
● Hyperparameter tuning
Repositories
● https://github.com/zekelabs/machine-learning-for-beginners
● https://github.com/zekelabs/tensorflow-tutorial/
● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/
● Python learning course - https://www.edyoda.com/resources/videolisting/98/
Thank You !!!
Visit : www.zekeLabs.com for more details
Let us know how we can help your organization upskill its employees to stay current in the ever-evolving IT industry.
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com

 

Master guide to become a data scientist

  • 1. zekeLabs Master Guide to become a Data Scientist Learning made Simpler ! www.zekeLabs.com
  • 2. “Goal - Become a Data Scientist” info@zekeLabs.com | www.zekeLabs.com | +91 8095465880 “A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
  • 3. “The Plan” “A Goal without a Plan is just a wish”
  • 4. Complete Data Science / AI / ML in 20 Modules - 50 hours ● Numerical Computation using NumPy ● Essential Statistics & Maths ● Pandas & scipy for Data Wrangling & Statistics ● Data Visualization ● Introducing Machine Learning & Knowing Datasets ● Data Preprocessing ● Feature Engineering ● Feature Selection Techniques ● Model Evaluation ● Model Selection ● Linear Regression ● Logistic Regression ● Naive Bayes ● Trees ● Ensemble Methods ● Nearest Neighbors ● Support Vector Machines ● Clustering ● Machine Learning at Scale & Deployment ● 10 Projects
  • 5. 0. Prerequisite ● Basic Programming using Python ● Object Oriented Programming in Python ● Connecting databases & SQL ● Web scraping ● Parsing
  • 6. 1. Numerical Computation using NumPy - 3 hrs ● Why NumPy? ● Performance ● Creation ● Access ● Concat & Split ● Axes ● Understanding Vectors ● Reshape ● Matrix Operation ● Utility functions ● Common NumPy utilities ● Broadcasting
  • 7. 2. Essential Statistics & Maths - 5 hrs ● Relationships - Deterministic vs Statistical ● Statistics - Descriptive vs Inferential ● Sampling ● Variables ● Distribution ● Summarizing Distribution ● Correlation, Collinearity, Causation ● Probability ● Normal Distribution ● Confidence Interval ● Hypothesis Testing ● Calculus ● Linear Algebra ● Matrix Ops
  • 8. 3. Pandas & scipy for Data Wrangling & Statistics - 5 hrs ● Series vs DataFrames ● Loading CSV, JSON, DB etc. ● Access & Filters ● DataFrame ● Exploratory Data Analysis ● Finding & Handling Missing Data ● Duplicate Handling ● Rolling averages ● Applying functions ● Handling Time Series Data ● Merging & Grouping Data ● Pivot Table & Crosstab ● Random data using scipy ● Comparing datasets using scipy ● Analyzing sample using scipy ● Kernel Density Estimation using scipy
  • 9. 4. Data Visualization - 4 hrs ● Understanding matplotlib ● Plotting Quantitative data ● Plotting Qualitative data ● Histograms ● Frequency Polygons ● Box-Plots ● Bar charts ● Line Graphs ● Scatter Plots ● 3D Plots ● Exploring seaborn & Bokeh ● Introduction to Tableau ● Plotting scatter plot ● Bubble chart ● Bullet chart ● Gantt chart
  • 10. 5. Introducing Machine Learning & Knowing Datasets - 1 hr ● Introduction to Machine Learning ● Supervised Learning ● Unsupervised Learning ● Reinforcement Learning ● Regression ● Classification ● Clustering ● Machine Learning in Big Companies ● Machine Learning in Small Companies ● Machine Learning in startups ● UCI ● Kaggle ● Inbuilt scikit-learn datasets ● Generating datasets
  • 11. 6. Data Preprocessing - 4 hrs ● Standardize features ● Normalize ● Encoding categorical features ● Encoding Ordinal Features ● Non-linear transformation ● Polynomial features ● Handling Time Feature ● Rolling Time window ● Custom Transformers ● DictVectorizer, CountVectorizer, TF-IDF ● NLTK - stemming, lemmatization, stop-words ● Skimage library for image processing ● Crop, resize, grayscale ● Outlier detection ● Handling Outlier data ● Handling Imbalanced classes
  • 12. 7. Feature Engineering - 3 hrs ● Principal Component Analysis ● Linear Discriminant Analysis ● Generalized Discriminant Analysis ● FastICA ● Non-negative Matrix Factorization ● TruncatedSVD
  • 13. 8. Feature Selection - 2 hrs ● SelectKBest for Regression ● SelectKBest for Classification ● Variance Threshold ● Drop Highly correlated features ● Dropping based on non-null values ● SelectFromModel ● Feature Selection using RandomForest ● Based on correlation with target ● Univariate Feature Selection ● Recursive Feature Elimination
  • 14. 9. Model Evaluation - 1 hr ● Why do we need to evaluate at all? ● Metrics for Classification ● Metrics for Regression ● Clustering metrics ● Probability Calibration ● Pairwise metrics
  • 15. 10. Model Selection - 1 hr ● Motivation ● KFold ● StratifiedKFold ● Splitting training testing data ● Cross Validate ● GridSearchCV ● RandomizedSearchCV
  • 16. 11. Linear Regression - 3 hrs ● Understanding Ordinary Least Squares ● Cost Function ● Bias & Variance ● Coefficients & Intercept ● Simple Linear Regression ● Polynomial Linear Regression ● Ridge ● Lasso ● Elastic Net ● Stochastic Gradient Descent ● Robustness Regression ● Problem - Insurance Payout Prediction
  • 17. 12. Logistic Regression - 2 hrs ● Basics of Logistic Regression ● Sigmoid ● Cost Function ● Understanding important hyperparameters ● Predicting linear separator ● Predicting nonlinear decision boundary ● Handling Imbalanced classes ● Project - Predicting if income is less than 50K or more
  • 18. 13. Naive Bayes - 2 hrs ● Bayes Theorem ● Gaussian Naive Bayes ● Multinomial Naive Bayes ● Bernoulli Naive Bayes ● Out-of-core Naive Bayes using partial_fit ● Limitations of Naive Bayes ● Choosing right ● Problem - Mail data classification
  • 19. 14. Trees - 2 hrs ● Understanding Information Theory ● Entropy ● Decision Tree creation ● Tree for Classification ● Tree for Regression ● Advantages of Decision Tree ● Important Hyper-parameters ● Limitations of Decision Tree
  • 20. 15. Ensemble Methods - 3 hrs ● Bagging vs Boosting ● Forests ● AdaBoost ● XGBoost ● Gradient Tree Boosting ● Voting Classifier ● Role weak estimators play ● Problem - Attack detection on network data
  • 21. 16. Nearest Neighbors - 2 hrs ● Unsupervised Nearest Neighbor ● Nearest Neighbor for Classification ● Nearest Neighbor for Regression ● Effect of k ● Nearest Neighbor Algorithms ● Choosing algorithm ● Nearest Centroid Classifier ● Developing recommendation engine
  • 22. 17. Support Vector Machine - 3 hrs ● Understanding SVM ● Classification ● Regression ● OneClassSVM ● Imbalanced Classes ● Kernel Functions ● Understanding Maths behind it ● Problem - Face recognition
  • 23. 17b. Novelty & Outlier Detection - 1 hr ● Novelty vs Outlier ● OneClassSVM ● Fitting data in Elliptic Envelope ● Isolation Forest ● Local Outlier Factor ● When to use what
  • 24. 18. Clustering - 3 hrs ● Objectives of clustering ● Agglomerative clustering ● DBSCAN clustering ● KMeans ● Affinity Propagation ● Meanshift clustering ● Spectral clustering ● Hierarchical clustering ● Birch ● Clustering evaluation
  • 25. 19. Deployment & Scaling - 3 hrs ● Bottom-Up approach for dealing with large data ● Extracting features using Hashing Techniques ● Incremental learning ● Serializing data for quicker access ● Running as a Python .egg or wheel ● Model behind REST server ● Persisting & Loading model ● Deploying model behind web application
  • 26. 20. Use Cases ● Credit Risk - Predicting Defaulters ● Amazon Food Review Sentiment ● Predicting Employee Attrition ● Identifying characters in an unknown language ● Predicting insurance payout amount ● Text Categorization ● Churn Prediction ● Attack Prediction on network data ● Identifying faces ● Predicting patient stay in hospital
  • 27. Way Forward - Deep Learning
  • 28. Complete Deep Learning in 10 Modules - 50 hours ● Basics of TensorFlow & Keras ● Foundations of Neural Network ● Activation Functions & Optimizers ● Regularization Techniques & Loss Functions ● Implementing a Deep Neural Network for Fashion-MNIST ● Introduction to Convolutional Neural Network ● Filters, pooling, strides ● Different initialization techniques ● Implementing a CNN for Fashion-MNIST ● Hyper-parameter tuning for CNN ● Understanding popular pre-trained models ● Transfer Learning & Fine Tuning ● Understanding Recurrent Neural Networks ● LSTM ● GRU ● Implementing Text Classification using LSTM ● Autoencoders ● GAN ● Implementing GAN & DCGAN ● Implementing image captioning ● Implementing chatbot ● Implementing MNIST generator ● Hyperparameter tuning
  • 29. Repositories ● https://github.com/zekelabs/machine-learning-for-beginners ● https://github.com/zekelabs/tensorflow-tutorial/ ● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9D D17AA47/ ● Python learning course - https://www.edyoda.com/resources/videolisting/98/
  • 31. Visit www.zekeLabs.com for more details. Let us know how we can help your organization upskill its employees to stay updated in the ever-evolving IT industry. www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com