Master guide to become a data scientist

zekeLabs
Master Guide to Become a Data Scientist
Learning made Simpler!
www.zekeLabs.com

“Goal - Become a Data Scientist”
info@zekeLabs.com | www.zekeLabs.com | +91 8095465880
“A Dream becomes a Goal when action is taken towards its achievement” - Bo Bennett

“The Plan”
“A Goal without a Plan is just a wish”

Complete Data Science / AI / ML in 20 Modules - 50 hours
1. Numerical Computation using NumPy
2. Essential Statistics & Maths
3. Pandas & scipy for Data Wrangling & Statistics
4. Data Visualization
5. Introducing Machine Learning & Knowing Datasets
6. Data Preprocessing
7. Feature Engineering
8. Feature Selection Techniques
9. Model Evaluation
10. Model Selection
11. Linear Regression
12. Logistic Regression
13. Naive Bayes
14. Trees
15. Ensemble Methods
16. Nearest Neighbors
17. Support Vector Machines
18. Clustering
19. Machine Learning at Scale & Deployment
20. 10 Projects

0. Prerequisite
● Basic Programming using Python
● Object Oriented Programming in Python
● Connecting databases & SQL
● Web scraping
● Parsing

1. Numerical Computation using NumPy - 3 hrs
● Why NumPy?
● Performance
● Creation
● Access
● Concat & Split
● Axes
● Understanding Vectors
● Reshape
● Matrix Operation
● Utility functions
● Common NumPy utilities
● Broadcasting

2. Essential Statistics & Maths - 5 hrs
● Relationships - Deterministic vs Statistical
● Statistics - Descriptive vs Inferential
● Sampling
● Variables
● Distribution
● Summarizing Distribution
● Correlation, Collinearity, Causation
● Probability
● Normal Distribution
● Confidence Interval
● Hypothesis Testing
● Calculus
● Linear Algebra
● Matrix Ops

3. Pandas & scipy for Data Wrangling & Statistics - 5 hrs
● Series vs DataFrames
● Loading CSV, JSON, DB etc.
● Access & Filters
● DataFrame
● Exploratory Data Analysis
● Finding & Handling Missing Data
● Duplicate Handling
● Rolling averages
● Applying functions
● Handling Time Series Data
● Merging & Grouping Data
● Pivot Table & Crosstab
● Random data using scipy
● Comparing datasets using scipy
● Analyzing sample using scipy
● Kernel Density Estimation using scipy

4. Data Visualization - 4 hrs
● Understanding matplotlib
● Plotting Quantitative data
● Plotting Qualitative data
● Histograms
● Frequency Polygons
● Box-Plots
● Bar charts
● Line Graphs
● Scatter Plots
● 3D Plots
● Exploring seaborn & Bokeh
● Introduction to Tableau
● Plotting scatter plot
● Bubble chart
● Bullet chart
● Gantt chart

5. Introducing Machine Learning & Knowing Datasets - 1 hr
● Introduction to Machine Learning
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
● Regression
● Classification
● Clustering
● Machine Learning in Big Companies
● Machine Learning in Small Companies
● Machine Learning in startups
● UCI
● Kaggle
● Inbuilt scikit-learn datasets
● Generating datasets

6. Data Preprocessing - 4 hrs
● Standardize feature
● Normalize
● Encoding categorical features
● Encoding Ordinal Features
● Non-linear transformation
● Polynomial features
● Handling Time Feature
● Rolling Time window
● Custom Transformers
● DictVectorizer, CountVectorizer, TF-IDF
● NLTK - stemming, lemmatization, stop-words
● Skimage library for image processing
● Crop, resize, grayscale
● Outlier detection
● Handling Outlier data
● Handling Imbalanced classes

7. Feature Engineering - 3 hrs
● Principal Component Analysis
● Linear Discriminant Analysis
● Generalized Discriminant Analysis
● FastICA
● Non-negative Matrix Factorization
● TruncatedSVD

8. Feature Selection - 2 hrs
● SelectKBest for Regression
● SelectKBest for Classification
● Variance Threshold
● Drop Highly correlated features
● Dropping based on non null values
● SelectFromModel
● Feature Selection using RandomForest
● Based on correlation with target
● Univariate Feature Selection
● Recursive Feature Elimination

9. Model Evaluation - 1 hr
● Why do we need to evaluate at all?
● Metrics for Classification
● Metrics for Regression
● Clustering metrics
● Probability Calibration
● Pairwise metrics

10. Model Selection - 1 hr
● Motivation
● KFold
● StratifiedKFold
● Splitting training testing data
● Cross Validate
● GridSearchCV
● RandomizedSearchCV

11. Linear Regression - 3 hrs
● Understanding Ordinary Least Squares
● Cost Function
● Bias & Variance
● Coefficients & Intercept
● Simple Linear Regression
● Polynomial Linear Regression
● Ridge
● Lasso
● Elastic Net
● Stochastic Gradient Descent
● Robust Regression
● Problem - Insurance Payout Prediction

12. Logistic Regression - 2 hrs
● Basics of Logistic Regression
● Sigmoid
● Cost Function
● Understanding important hyperparameters
● Predicting linear separator
● Predicting nonlinear decision boundary
● Handling Imbalanced classes
● Project - Predicting whether income is less than 50K or more

13. Naive Bayes - 2 hrs
● Bayes Theorem
● Gaussian Naive Bayes
● Multinomial Naive Bayes
● Bernoulli Naive Bayes
● Out-of-core Naive Bayes using partial_fit
● Limitations of naive bayes
● Choosing right
● Problem - Mail data classification

14. Trees - 2 hrs
● Understanding Information Theory
● Entropy
● Decision Tree creation
● Tree for Classification
● Tree for Regression
● Advantages of Decision Tree
● Important Hyper-parameters
● Limitations of Decision Tree

15. Ensemble Methods - 3 hrs
● Bagging vs Boosting
● Forests
● AdaBoost
● XGBoost
● Gradient Tree Boosting
● Voting Classifier
● Role weak estimators play
● Problem - Attack detection on network data

16. Nearest Neighbors - 2 hrs
● Unsupervised Nearest Neighbor
● Nearest Neighbor for Classification
● Nearest Neighbor for Regression
● Effect of k
● Nearest Neighbor Algorithms
● Choosing algorithm
● Nearest Centroid Classifier
● Developing recommendation engine

17. Support Vector Machine - 3 hrs
● Understanding SVM
● Classification
● Regression
● OneClassSVM
● Imbalanced Classes
● Kernel Functions
● Understanding Maths behind it
● Problem - Face recognition

17b. Novelty & Outlier Detection - 1 hr
● Novelty vs Outlier
● OneClassSVM
● Fitting data in an Elliptic Envelope
● Isolation Forest
● Local Outlier Factor
● When to use what

18. Clustering - 3 hrs
● Objectives of clustering
● Agglomerative clustering
● DBSCAN clustering
● KMeans
● Affinity Propagation
● Mean Shift clustering
● Spectral clustering
● Hierarchical clustering
● Birch
● Clustering evaluation

19. Deployment & Scaling - 3 hrs
● Bottom-Up approach for dealing with large data
● Extracting features using Hashing Techniques
● Incremental learning
● Serializing data for quicker access
● Running as a Python .egg or wheel
● Model behind REST server
● Persisting & Loading model
● Deploying model behind web application

20. Use Cases
● Credit Risk - Predicting Defaulters
● Amazon Food Review Sentiment
● Predicting Employee Attrition
● Identifying characters in an unknown language
● Predicting insurance payout amount
● Text Categorization
● Churn Prediction
● Attack Prediction on network data
● Identifying faces
● Predict patient stay in hospital

Complete Deep Learning in 10 Modules - 50 hours
● Basics of TensorFlow & Keras
● Foundations of Neural Network
● Activation Functions & Optimizers
● Regularization Techniques & Loss Functions
● Implementing a Deep Neural Network for Fashion-MNIST
● Introduction to Convolutional Neural Network
● Filters, pooling, strides
● Different initialization techniques
● Implement CNN for Fashion-MNIST
● Hyper-parameter tuning CNN
● Understanding popular pre-trained models
● Transfer Learning & Fine Tuning
● Understanding Recurrent Neural Networks
● LSTM
● GRU
● Implement Text Classification using LSTM
● Autoencoders
● GAN
● Implement GAN & DCGAN
● Implementing image captioning
● Implementing chatbot
● Implementing MNIST generator
● Hyperparameter tuning

Repositories
● https://github.com/zekelabs/machine-learning-for-beginners
● https://github.com/zekelabs/tensorflow-tutorial/
● Dog breed prediction - https://www.edyoda.com/resources/watch/54AEA4CDC35394F1183A9DD17AA47/
● Python learning course - https://www.edyoda.com/resources/videolisting/98/

Visit www.zekeLabs.com for more details.
Let us know how we can help your organization upskill its employees to stay current in the ever-evolving IT industry.
