zekeLabs
Master Guide to become a
Data Scientist
Learning made Simpler !
www.zekeLabs.com
“Goal - Become a Data Scientist”
support@zekeLabs.com | www.zekeLabs.com | +91 8095465880
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
“The Plan”
“A Goal without a Plan is just a wish”
Complete Data Science in 20 Modules - 50 hours
1. Numerical Computation using NumPy
2. Essential Statistics & Maths
3. Pandas & scipy for Data Wrangling & Statistics
4. Data Visualization
5. Introducing Machine Learning & Knowing Datasets
6. Data Preprocessing
7. Feature Engineering
8. Feature Selection Techniques
9. Model Evaluation
10. Model Selection
11. Linear Regression
12. Logistic Regression
13. Naive Bayes
14. Trees
15. Ensemble Methods
16. Nearest Neighbors
17. Support Vector Machines
18. Clustering
19. Machine Learning at Scale & Deployment
20. 10 Projects
0. Prerequisite
● Basic Programming using Python
● Object Oriented Programming in Python
● Connecting databases & SQL
● Web scraping
● Parsing
1. Numerical Computation using NumPy - 3 hrs
● Why NumPy ?
● Performance
● Creation
● Access
● Concat & Split
● Axes
● Understanding Vectors
● Reshape
● Matrix Operation
● Utility functions
● Common NumPy utilities
● Broadcasting
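Several of the topics above (creation, reshape, axes, concat and split, matrix operations, broadcasting) can be seen together in one short sketch. The array contents are made up purely for illustration.

```python
import numpy as np

# Creation and reshape: 12 values arranged as 3 rows x 4 columns.
a = np.arange(12).reshape(3, 4)

# Axes: axis=0 collapses the rows (one result per column),
# axis=1 collapses the columns (one result per row).
col_sums = a.sum(axis=0)   # shape (4,)
row_sums = a.sum(axis=1)   # shape (3,)

# Broadcasting: the (4,) vector of column means is stretched across all 3 rows.
centered = a - a.mean(axis=0)

# Concat & split: stack two copies vertically, then split them apart again.
stacked = np.concatenate([a, a], axis=0)     # shape (6, 4)
top, bottom = np.split(stacked, 2, axis=0)   # two (3, 4) arrays

# Matrix operation: (3, 4) @ (4, 3) gives a (3, 3) matrix.
gram = a @ a.T
```

Broadcasting avoids writing an explicit loop over rows, which is one of the main reasons NumPy code tends to be faster than the equivalent pure-Python loop.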
2. Essential Statistics & Maths - 5 hrs
● Relationships - Deterministic vs Statistical
● Statistics - Descriptive vs Inferential
● Sampling
● Variables
● Distribution
● Summarizing Distribution
● Correlation, Collinearity, Causation
● Probability
● Normal Distribution
● Confidence Interval
● Hypothesis Testing
● Calculus
● Linear Algebra
● Matrix Ops
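As one illustration of the confidence-interval topic, here is a minimal sketch using only the Python standard library. It assumes a normal approximation; for a sample this small a t-distribution would usually be more appropriate. The function name `mean_confidence_interval` and the height values are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


heights_cm = [168, 172, 171, 165, 180, 177, 169, 174, 173, 171]  # made-up sample
low, high = mean_confidence_interval(heights_cm)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```

A higher confidence level widens the interval, and a larger sample narrows it because the standard error shrinks with the square root of the sample size.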
3. Pandas & scipy for Data Wrangling & Statistics - 5 hrs
● Series vs DataFrames
● Loading CSV, JSON, DB etc.
● Access & Filters
● DataFrame
● Exploratory Data Analysis
● Finding & Handling Missing Data
● Duplicate Handling
● Rolling averages
● Applying functions
● Handling Time Series Data
● Merging & Grouping Data
● Pivot Table & Crosstab
● Random data using scipy
● Comparing datasets using scipy
● Analyzing sample using scipy
● Kernel Density Estimation using scipy
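The data-wrangling steps listed above (duplicate handling, missing data, rolling averages, grouping) might look like this in pandas. The table is made-up sales data, and mean imputation is just one of several possible strategies for filling gaps.

```python
import numpy as np
import pandas as pd

# Made-up daily sales with one duplicated row and one missing value.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02",
                            "2024-01-03", "2024-01-04"]),
    "store": ["A", "A", "A", "B", "B"],
    "sales": [10.0, 20.0, 20.0, np.nan, 50.0],
})

df = df.drop_duplicates()                                # duplicate handling
df["sales"] = df["sales"].fillna(df["sales"].mean())     # missing data: mean imputation
df["rolling_2d"] = df["sales"].rolling(window=2).mean()  # rolling average over 2 rows
per_store = df.groupby("store")["sales"].sum()           # grouping
```

The first value of `rolling_2d` is NaN because a two-row window is not yet full at the first row.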
4. Data Visualization - 4 hrs
● Understanding matplotlib
● Plotting Quantitative data
● Plotting Qualitative data
● Histograms
● Frequency Polygons
● Box-Plots
● Bar charts
● Line Graphs
● Scatter Plots
● 3D Plots
● Exploring seaborn & Bokeh
● Introduction to Tableau
● Plotting scatter plot
● Bubble chart
● Bullet chart
● Gantt chart
5. Introducing Machine Learning & Knowing Datasets - 1 hr
● Introduction to Machine Learning
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
● Regression
● Classification
● Clustering
● Machine Learning in Big Companies
● Machine Learning in Small Companies
● Machine Learning in startups
● UCI
● Kaggle
● Inbuilt scikit-learn datasets
● Generating datasets
6. Data Preprocessing - 4 hrs
● Standardize feature
● Normalize
● Encoding categorical features
● Encoding Ordinal Features
● Non-linear transformation
● Polynomial features
● Handling Time Feature
● Rolling Time window
● Custom Transformers
● DictVectorizer, CountVectorizer, TF-IDF
● NLTK - stemming, lemmatization, stop-words
● scikit-image (skimage) library for image processing
● Crop, resize, grayscale
● Outlier detection
● Handling Outlier data
● Handling Imbalanced classes
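Two of the preprocessing steps above, standardizing numeric features and encoding categorical ones, can be sketched with scikit-learn as follows. The feature values are made up, and the choice of `StandardScaler` and `OneHotEncoder` is an assumption about which transformers the module has in mind.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up numeric features: age in years, income in thousands.
X_num = np.array([[25.0, 40.0], [35.0, 60.0], [45.0, 80.0]])
X_std = StandardScaler().fit_transform(X_num)  # each column: mean 0, unit variance

# Made-up categorical feature, encoded as one indicator column per category.
X_cat = np.array([["red"], ["green"], ["red"]])
encoder = OneHotEncoder()
X_onehot = encoder.fit_transform(X_cat).toarray()  # default output is sparse
```

In practice the scaler and encoder are fitted on the training split only and then applied to the test split, so that no information leaks from test data into the preprocessing.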
7. Feature Engineering - 3 hrs
● Principal Component Analysis
● Linear Discriminant Analysis
● Generalized Discriminant Analysis
● FastICA
● Non-negative Matrix Factorization
● TruncatedSVD
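As a small illustration of Principal Component Analysis from the list above, the sketch below reduces three features to two. The data is synthetic and deliberately built so that the third feature is almost a linear combination of the first two.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features. The third feature is nearly
# first - second, so two directions carry almost all of the variance.
base = rng.normal(size=(200, 2))
third = base @ np.array([1.0, -1.0]) + 0.01 * rng.normal(size=200)
X = np.column_stack([base, third])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)                 # shape (200, 2)
explained = pca.explained_variance_ratio_.sum()  # share of variance kept
```

Checking `explained_variance_ratio_` is a common way to decide how many components to keep.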
8. Feature Selection - 2 hrs
● SelectKBest for Regression
● SelectKBest for Classification
● Variance Threshold
● Dropping highly correlated features
● Dropping features based on non-null values
● SelectFromModel
● Feature Selection using RandomForest
● Based on correlation with target
● Univariate Feature Selection
● Recursive Feature Elimination
9. Model Evaluation - 1 hr
● Why do we need to evaluate at all ?
● Metrics for Classification
● Metrics for Regression
● Clustering metrics
● Probability Calibration
● Pairwise metrics
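The usual classification metrics can be computed by hand to see what they measure. This is a plain-Python sketch for the binary case; the function name `binary_metrics` and the labels are made up, and scikit-learn provides equivalent functions in `sklearn.metrics`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up predictions
metrics = binary_metrics(y_true, y_pred)
```

Precision answers "of the items predicted positive, how many were right", while recall answers "of the truly positive items, how many were found". On imbalanced data these are usually more informative than accuracy alone.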
10. Model Selection - 1 hr
● Motivation
● KFold
● StratifiedKFold
● Splitting training testing data
● Cross Validate
● GridSearchCV
● RandomizedSearchCV
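A train/test split, stratified k-fold cross-validation and a grid search can be combined as in the sketch below. The iris dataset, the k-nearest-neighbors estimator and the parameter grid are illustrative choices, not ones prescribed by the module.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Each candidate value is scored with 5-fold stratified cross-validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=cv,
)
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
test_accuracy = search.score(X_test, y_test)  # accuracy of the refitted best model
```

`RandomizedSearchCV` follows the same pattern but samples a fixed number of parameter combinations instead of trying all of them, which helps when the grid is large.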
11. Linear Regression - 3 hrs
● Understanding Ordinary Least Squares
● Cost Function
● Bias & Variance
● Coefficients & Intercept
● Simple Linear Regression
● Polynomial Linear Regression
● Ridge
● Lasso
● Elastic Net
● Stochastic Gradient Descent
● Robust Regression
● Problem - Insurance Payout Prediction
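Ordinary least squares, the cost function and the effect of a ridge penalty can be sketched with NumPy alone. The data is made up (generated from y = 2x + 1 without noise), and the penalty strength `alpha = 1.0` is an arbitrary illustrative value.

```python
import numpy as np

# Made-up data generated from y = 2x + 1, with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: a column of ones for the intercept, then the feature.
A = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: choose w to minimise ||A w - y||^2.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = w
mse = np.mean((A @ w - y) ** 2)  # mean squared error cost at the solution

# Ridge: add alpha * slope^2 to the cost (the intercept is left unpenalised).
alpha = 1.0
penalty = alpha * np.diag([0.0, 1.0])
w_ridge = np.linalg.solve(A.T @ A + penalty, A.T @ y)
ridge_intercept, ridge_slope = w_ridge
```

The ridge slope comes out smaller than the OLS slope: the penalty trades a little bias for lower variance, which is the bias and variance trade-off listed above.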
12. Logistic Regression - 2 hrs
● Basics of Logistic Regression
● Sigmoid
● Cost Function
● Understanding important hyperparameters
● Predicting linear separator
● Predicting nonlinear decision boundary
● Handling Imbalanced classes
● Project - Predicting if income is less than 50K or more
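The sigmoid and the logistic regression cost function (log loss) are short enough to write out directly. This is a plain-Python sketch; the weight, bias and input values are hypothetical and chosen only to show the calculation.

```python
import math


def sigmoid(z):
    """Map any real number to (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(y_true, probs, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical one-feature model: p = sigmoid(weight * x + bias).
weight, bias = 1.5, -3.0
xs = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]
probs = [sigmoid(weight * x + bias) for x in xs]
loss = log_loss(labels, probs)
```

The decision boundary is where the probability equals 0.5, which here is where `weight * x + bias` is zero, at x = 2.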
13. Naive Bayes - 2 hrs
● Bayes Theorem
● Gaussian Naive Bayes
● Multinomial Naive Bayes
● Bernoulli Naive Bayes
● Out-of-core Naive Bayes using partial_fit
● Limitations of naive bayes
● Choosing right
● Problem - Mail data classification
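Bayes' theorem, the starting point of this module, can be checked with a small worked example. The probabilities below are made up for a mail-classification setting, and the function name `posterior` is only illustrative.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(class | evidence) from Bayes' theorem for a two-class problem."""
    # Total probability of seeing the evidence under either class.
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Made-up numbers: 20% of mail is spam; the word "offer" appears in
# 60% of spam messages and in 5% of non-spam messages.
p_spam_given_offer = posterior(prior=0.2, likelihood=0.6,
                               likelihood_given_not=0.05)
```

With these numbers the posterior is 0.12 / (0.12 + 0.04) = 0.75, so seeing the word raises the spam probability from 20% to 75%. Naive Bayes applies the same rule to many words at once by assuming they are conditionally independent given the class.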
14. Trees - 2 hrs
● Understanding Information Theory
● Entropy
● Decision Tree creation
● Tree for Classification
● Tree for Regression
● Advantages of Decision Tree
● Important Hyper-parameters
● Limitations of Decision Tree
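Entropy and information gain, which drive how a classification tree picks its splits, can be sketched in plain Python. The function names and the tiny label lists are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Drop in entropy when `parent` is split into the lists in `children`."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children look like the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A tree-building algorithm evaluates candidate splits like these and keeps the one with the highest gain at each node.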
15. Ensemble Methods - 3 hrs
● Bagging vs Boosting
● Forests
● AdaBoost
● XGBoost
● Gradient Tree Boosting
● Voting Classifier
● Role of weak estimators
● Problem - Attack detection on network data
16. Nearest Neighbors - 2 hrs
● Unsupervised Nearest Neighbor
● Nearest Neighbor for Classification
● Nearest Neighbor for Regression
● Effect of k
● Nearest Neighbor Algorithms
● Choosing algorithm
● Nearest Centroid Classifier
● Developing recommendation engine
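Nearest-neighbor classification is simple enough to write as a brute-force sketch in plain Python. The points, labels and the function name `knn_predict` are made up; library implementations add faster search structures, which is what the "Nearest Neighbor Algorithms" topic refers to.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Brute force: sort every training point by Euclidean distance to the query.
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 2-D points forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(points, labels, query=(2, 2))
near_b = knn_predict(points, labels, query=(7, 8))
```

A small k follows the training data closely and is sensitive to noise, while a large k smooths the decision boundary. With k equal to the number of training points the prediction is simply the majority class.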
17. Support Vector Machine - 3 hrs
● Understanding SVM
● Classification
● Regression
● OneClassSVM
● Imbalanced Classes
● Kernel Functions
● Understanding Maths behind it
● Problem - Face recognition
17b. Novelty & Outlier Detection - 1 hr
● Novelty vs Outlier
● OneClassSVM
● Fitting data in an Elliptic Envelope
● Isolation Forest
● Local Outlier Factor
● When to use what
18. Clustering - 3 hrs
● Objectives of clustering
● Agglomerative clustering
● DBSCAN clustering
● KMeans
● Affinity Propagation
● Meanshift clustering
● Spectral clustering
● Hierarchical clustering
● Birch
● Clustering evaluation
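KMeans and one clustering evaluation measure can be tried on synthetic data as in the sketch below. The two blobs are generated for the example, and the silhouette score is just one of several possible evaluation metrics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic data: two well-separated 2-D blobs of 50 points each.
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Silhouette score lies in [-1, 1]; values near 1 mean tight, well-separated clusters.
score = silhouette_score(X, labels)
```

Because KMeans needs the number of clusters up front, a common approach is to fit it for several values and compare a score such as this one.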
19. Deployment & Scaling - 3 hrs
● Bottom-Up approach for dealing with large data
● Extracting features using Hashing Techniques
● Incremental learning
● Serializing data for quicker access
● Running as a Python .egg or wheel
● Model behind REST server
● Persisting & Loading model
● Deploying model behind web application
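Persisting a trained model and serving predictions from it can be sketched as follows. The training data is made up, `predict_endpoint` is a hypothetical handler rather than part of any web framework, and `pickle` is only one of the available serialization options.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Train a tiny model on made-up data (y = 2x + 1).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Persist: serialise the fitted model to bytes (these could be written to a file).
blob = pickle.dumps(model)

# Load: restore the model, for example when a server process starts.
restored = pickle.loads(blob)


def predict_endpoint(payload):
    """Hypothetical request handler: parsed JSON in, JSON-serialisable dict out."""
    features = np.array(payload["features"], dtype=float).reshape(1, -1)
    return {"prediction": float(restored.predict(features)[0])}


response = predict_endpoint({"features": [4.0]})
```

Pickled files should only be loaded from trusted sources, since unpickling can execute arbitrary code. It is also safest to load a model with the same library version that was used to save it.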
20. Use Cases
● Credit Risk - Predicting Defaulters
● Amazon Food Review Sentiment
● Predicting Employee Attrition
● Identifying characters in an unknown language
● Predicting insurance payout amount
● Text Categorization
● Churn Prediction
● Attack Prediction on network data
● Identifying faces
● Predict patient stay in hospital
Way Forward - Deep Learning
Complete Deep Learning in 10 Modules - 50 hours
● Basics of TensorFlow & Keras
● Foundations of Neural Network
● Activation Functions & Optimizers
● Regularization Techniques & Loss Functions
● Implementing a Deep Neural Network for Fashion-MNIST
● Introduction to Convolutional Neural Network
● Filters, pooling, strides
● Different initialization techniques
● Implement CNN for Fashion-MNIST
● Hyper-parameter tuning CNN
● Understanding popular pretrained models
● Transfer Learning & Fine Tuning
● Understanding Recurrent Neural Networks
● LSTM
● GRU
● Implement Text Classification using LSTM
● Autoencoders
● GAN
● Implement GAN & DCGAN
● Implementing image captioning
● Implementing chatbot
● Implementing MNIST generator
● Hyperparameter tuning
Repositories
● https://github.com/zekelabs/machine-learning-for-beginners
● https://github.com/zekelabs/tensorflow-tutorial/
● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/
● Python learning course - https://www.edyoda.com/resources/videolisting/98/
Thank You !!!
Visit : www.zekeLabs.com for more details
Let us know how we can help your organization upskill its employees to stay updated in the ever-evolving IT industry.
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com

Master guide to become a Data Scientist -by zekeLabs

  • 1. zekeLabs Master Guide to become a Data Scientist Learning made Simpler ! www.zekeLabs.com
  • 2. “Goal - Become a Data Scientist” support@zekeLabs.com | www.zekeLabs.com | +91 8095465880 “A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett
  • 3. “The Plan” support@zekeLabs.com | www.zekeLabs.com | +91 8095465880 “A Goal without a Plan is just a wish”
  • 4. Complete Data Science in 20 Modules - 50 hours support@zekeLabs.com | www.zekeLabs.com | +91 8095465880 Numerical Computation using NumPy Linear Regression Essential Statistics & Maths Logistic Regression Pandas & scipy for Data Wrangling & Statistics Naive Bayes Data Visualization Trees Introducing Machine Learning & Knowing Datasets Ensemble Methods Data Preprocessing Nearest Neighbors Feature Engineering Support Vector Machines Feature Selection Techniques Clustering Model Evaluation Machine Learning at Scale & Deployment Model Selection 10 Projects
  • 5. 0. Prerequisite support@zekeLabs.com | www.zekeLabs.com | +91 8095465880 ● Basic Programming using Python ● Object Oriented Programming in Python ● Connecting databases & SQL ● Web scraping ● Parsing
  • 6. 1. Numerical Computation using NumPy - 3 hrs support@zekeLabs.com | www.zekeLabs.com | +91 8095465880 ● Why NumPy ? ● Performance ● Creation ● Access ● Concat & Split ● Axes ● Understanding Vectors ● Reshape ● Matrix Operation ● Utility functions ● Common NumPy utilities ● Broadcasting
  • 7. 2. Essential Statistics & Maths - 5 hrs support@zekeLabs.com | www.zekeLabs.com | +91 8095465880 ● Relationships - Deterministic vs Statistical ● Statistics - Descriptive vs Inferential ● Sampling ● Variables ● Distribution ● Summarizing Distribution ● Correlation, Collinearity, Causation ● Probability ● Normal Distribution ● Confidence Interval ● Hypothesis Testing ● Calculus ● Linear Algebra ● Matrix Ops
  • 8. 3. Pandas & scipy for Data Wrangling & Statistics - 5 hrs ● Series vs DataFrames ● Loading CSV, JSON, DB etc. ● Access & Filters ● DataFrame ● Exploratory Data Analysis ● Finding & Handling Missing Data ● Duplicate Handling ● Rolling averages ● Applying functions ● Handling Time Series Data ● Merging & Grouping Data ● Pivot Table & Crosstab ● Random data using scipy ● Comparing datasets using scipy ● Analyzing sample using scipy ● Kernel Density Estimation using scipy
  • 9. 4. Data Visualization - 4 hrs ● Understanding matplotlib ● Plotting Quantitative data ● Plotting Qualitative data ● Histograms ● Frequency Polygons ● Box-Plots ● Bar charts ● Line Graphs ● Scatter Plots ● 3D Plots ● Exploring seaborn & Bokeh ● Introduction to Tableau ● Plotting scatter plot ● Bubble chart ● Bullet chart ● Gantt chart
  • 10. 5. Introducing Machine Learning & Knowing Datasets - 1 hr ● Introduction to Machine Learning ● Supervised Learning ● Unsupervised Learning ● Reinforcement Learning ● Regression ● Classification ● Clustering ● Machine Learning in Big Companies ● Machine Learning in Small Companies ● Machine Learning in startups ● UCI ● Kaggle ● Inbuilt scikit-learn datasets ● Generating datasets
  • 11. 6. Data Preprocessing - 4 hrs ● Standardizing features ● Normalizing ● Encoding categorical features ● Encoding ordinal features ● Non-linear transformation ● Polynomial features ● Handling Time Feature ● Rolling Time window ● Custom Transformers ● DictVectorizer, CountVectorizer, TF-IDF ● NLTK - stemming, lemmatization, stop-words ● scikit-image (skimage) library for image processing ● Crop, resize, grayscale ● Outlier detection ● Handling Outlier data ● Handling Imbalanced classes
  • 12. 7. Feature Engineering - 3 hrs ● Principal Component Analysis ● Linear Discriminant Analysis ● Generalized Discriminant Analysis ● FastICA ● Non-negative Matrix Factorization ● TruncatedSVD
  • 13. 8. Feature Selection - 2 hrs ● SelectKBest for Regression ● SelectKBest for Classification ● Variance Threshold ● Dropping highly correlated features ● Dropping based on non-null values ● SelectFromModel ● Feature Selection using RandomForest ● Based on correlation with target ● Univariate Feature Selection ● Recursive Feature Elimination
  • 14. 9. Model Evaluation - 1 hr ● Why do we need to evaluate at all? ● Metrics for Classification ● Metrics for Regression ● Clustering metrics ● Probability Calibration ● Pairwise metrics
  • 15. 10. Model Selection - 1 hr ● Motivation ● KFold ● StratifiedKFold ● Splitting training & testing data ● Cross Validate ● GridSearchCV ● RandomizedSearchCV
  • 16. 11. Linear Regression - 3 hrs ● Understanding Ordinary Least Squares ● Cost Function ● Bias & Variance ● Coefficients & Intercept ● Simple Linear Regression ● Polynomial Linear Regression ● Ridge ● Lasso ● Elastic Net ● Stochastic Gradient Descent ● Robustness Regression ● Problem - Insurance Payout Prediction
  • 17. 12. Logistic Regression - 2 hrs ● Basics of Logistic Regression ● Sigmoid ● Cost Function ● Understanding important hyperparameters ● Predicting linear separator ● Predicting nonlinear decision boundary ● Handling Imbalanced classes ● Project - Predicting if income is less than 50K or more
  • 18. 13. Naive Bayes - 2 hrs ● Bayes Theorem ● Gaussian Naive Bayes ● Multinomial Naive Bayes ● Bernoulli Naive Bayes ● Out-of-core Naive Bayes using partial_fit ● Limitations of Naive Bayes ● Choosing right ● Problem - Mail data classification
  • 19. 14. Trees - 2 hrs ● Understanding Information Theory ● Entropy ● Decision Tree creation ● Tree for Classification ● Tree for Regression ● Advantages of Decision Tree ● Important Hyper-parameters ● Limitations of Decision Tree
  • 20. 15. Ensemble Methods - 3 hrs ● Bagging vs Boosting ● Forests ● AdaBoost ● XGBoost ● Gradient Tree Boosting ● Voting Classifier ● Role of weak estimators ● Problem - Attack detection on network data
  • 21. 16. Nearest Neighbors - 2 hrs ● Unsupervised Nearest Neighbor ● Nearest Neighbor for Classification ● Nearest Neighbor for Regression ● Effect of k ● Nearest Neighbor Algorithms ● Choosing algorithm ● Nearest Centroid Classifier ● Developing recommendation engine
  • 22. 17. Support Vector Machine - 3 hrs ● Understanding SVM ● Classification ● Regression ● OneClassSVM ● Imbalanced Classes ● Kernel Functions ● Understanding Maths behind it ● Problem - Face recognition
  • 23. 17b. Novelty & Outlier Detection - 1 hr ● Novelty vs Outlier ● OneClassSVM ● Fitting data in an Elliptic Envelope ● Isolation Forest ● Local Outlier Factor ● When to use what
  • 24. 18. Clustering - 3 hrs ● Objectives of clustering ● Agglomerative clustering ● DBSCAN clustering ● KMeans ● Affinity Propagation ● Mean Shift clustering ● Spectral clustering ● Hierarchical clustering ● BIRCH ● Clustering evaluation
  • 25. 19. Deployment & Scaling - 3 hrs ● Bottom-Up approach for dealing with large data ● Extracting features using Hashing Techniques ● Incremental learning ● Serializing data for quicker access ● Running as a Python .egg or wheel ● Model behind REST server ● Persisting & Loading model ● Deploying model behind web application
  • 26. 20. Use Cases ● Credit Risk - Predicting Defaulters ● Amazon Food Review Sentiment ● Predicting Employee Attrition ● Identifying characters in an unknown language ● Predicting insurance payout amount ● Text Categorization ● Churn Prediction ● Attack Prediction on network data ● Identifying faces ● Predicting patient stay in hospital
  • 27. Way Forward - Deep Learning
  • 28. Complete Deep Learning in 10 Modules - 50 hours ● Basics of TensorFlow & Keras ● Foundations of Neural Network ● Activation Functions & Optimizers ● Regularization Techniques & Loss Functions ● Implementing a Deep Neural Network for Fashion-MNIST ● Introduction to Convolutional Neural Network ● Filters, pooling, strides ● Different initialization techniques ● Implement CNN for Fashion-MNIST ● Hyper-parameter tuning CNN ● Understanding popular pre-trained models ● Transfer Learning & Fine Tuning ● Understanding Recurrent Neural Networks ● LSTM ● GRU ● Implement Text Classification using LSTM ● Autoencoders ● GAN ● Implement GAN & DCGAN ● Implementing image captioning ● Implementing chatbot ● Implementing MNIST generator ● Hyperparameter tuning
  • 29. Repositories ● https://github.com/zekelabs/machine-learning-for-beginners ● https://github.com/zekelabs/tensorflow-tutorial/ ● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/ ● Python learning course - https://www.edyoda.com/resources/videolisting/98/
  • 31. Visit www.zekeLabs.com for more details. Let us know how we can help your organization upskill its employees to stay updated in the ever-evolving IT industry. www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com