Presentation on Machine Learning with
Scikit-Learn
Sanjay Nayak
IKST-Bangalore
Traditional Programming Vs. Machine Learning
Courtesy: Internet
Workflow of Machine Learning
Courtesy: Internet
Machine Learning Algorithms
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Commonly used algorithms (regression, classification, clustering, dimensionality reduction)
I. Linear Regression
II. Logistic Regression
III. Decision Tree
IV. SVM
V. Naive Bayes
VI. kNN
VII. K-Means
VIII. Random Forest
IX. Dimensionality Reduction Algorithms
X. Gradient Boosting algorithms
XI. Artificial Neural Network
Linear Regression
• Regression: a statistical technique for estimating the relationships among variables
$y = X\beta + \varepsilon$
• X is the feature matrix (a tensor in general ML; in our work mostly a two-dimensional matrix) whose rows are the feature vectors of the samples
• y is the target, i.e. what we want to predict (e.g. adsorption energy, barrier height, band gap, dielectric loss)
• β is the vector of coefficients
• ε is the error in prediction
• Goal: find the β for which the norm of ε is minimum
• X and y are multidimensional, so the solution is obtained by least squares
(Ordinary) Least Squares Solution
• When the number of equations is larger than the number of unknown variables there is, in general, no exact solution
• Approximation in solution: least squares
• Least squares: the overall solution minimizes the sum of the squares of the residuals made in the results of every single equation (Source: Wikipedia)
• If the number of equations is larger than the number of unknown variables, the solution for β is
$\beta = (X^{T} X)^{-1} X^{T} y$
• If the number of equations is smaller than or equal to the number of unknown variables, the minimum-norm solution is
$\beta = X^{T} (X X^{T})^{-1} y$
• Each solution is valid only if the corresponding inverse matrix exists (collinearity?)
Minimization of the residual sum of squares: $\mathrm{RSS}(\beta) = \lVert y - X\beta \rVert_2^2$
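The two closed-form solutions above can be checked numerically. The following is a minimal sketch on synthetic data (the array sizes, the random seed and the names `beta_true`, `X_wide`, `y_wide` are assumptions made for illustration, not part of the slides). It solves the normal equations as a linear system rather than forming the inverse explicitly, and compares the result with NumPy's own least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, hypothetical data: 50 equations (samples), 3 unknowns (features).
X = rng.normal(size=(50, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=50)

# Overdetermined case: beta = (X^T X)^{-1} X^T y,
# solved as a linear system instead of inverting X^T X.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual sum of squares ||y - X beta||^2 at the solution.
rss = float(np.sum((y - X @ beta_normal) ** 2))

# Underdetermined case: 3 equations, 5 unknowns.
# beta = X^T (X X^T)^{-1} y is the minimum-norm exact solution.
X_wide = rng.normal(size=(3, 5))
y_wide = rng.normal(size=3)
beta_min_norm = X_wide.T @ np.linalg.solve(X_wide @ X_wide.T, y_wide)
```

In the underdetermined case the equations are satisfied exactly, so `X_wide @ beta_min_norm` reproduces `y_wide` up to rounding.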
Collinearity in Matrix
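1. Solving for β requires the inversion of a matrix
2. If the columns of the matrix are collinear, its determinant is zero and the inverse does not exist
Solutions?
Remove the collinearity
1. Look at the correlation coefficient between features and remove the features which are highly correlated
2. Add a penalty term to the least-squares objective (Lasso, Ridge, etc.)
Pearson's correlation coefficient: $r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$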
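The first remedy, filtering features by their pairwise Pearson correlation, might look like the sketch below. The data are synthetic, and the cut-off of 0.95 as well as the greedy "keep the first, drop the later duplicate" rule are assumptions chosen for illustration; other selection rules are equally possible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical features: x2 is almost a copy of x0, x1 is independent.
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + 1e-3 * rng.normal(size=200)
X = np.column_stack([x0, x1, x2])

# Near-collinearity makes X^T X almost singular (very large condition number).
cond_full = np.linalg.cond(X.T @ X)

# Pearson correlation matrix between the features (columns of X).
corr = np.corrcoef(X, rowvar=False)

# Greedy filter: keep a feature only if its |r| with every
# already-kept feature stays below the threshold.
threshold = 0.95  # assumed cut-off
keep = []
for j in range(X.shape[1]):
    if all(abs(corr[j, k]) < threshold for k in keep):
        keep.append(j)

X_reduced = X[:, keep]
cond_reduced = np.linalg.cond(X_reduced.T @ X_reduced)
```

Comparing `cond_full` with `cond_reduced` shows how much better conditioned the matrix to be inverted becomes once the near-duplicate feature is dropped.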
Python Code
Scikit-Learn Library
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)
What if the features available to us are highly collinear,
i.e. there are not enough features to afford eliminating any of them?
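For completeness, a minimal sketch of how the model above could be fitted and inspected. The data set here is synthetic and the true coefficients (3, -1) and intercept (4) are assumptions chosen so that the fit can be checked by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Hypothetical data set: 100 samples, 2 features, known coefficients and intercept.
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 4.0 + 0.05 * rng.normal(size=100)

model = LinearRegression(fit_intercept=True)
model.fit(X, y)

coefficients = model.coef_    # estimated beta, one entry per feature
intercept = model.intercept_  # estimated constant term
r2 = model.score(X, y)        # coefficient of determination R^2 on the training data
```

Note that `score` is evaluated here on the training data only, so it says nothing yet about how the model generalizes.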
Partial Least Squares (PLS) Solution
• Find new latent variables from the old features, in the spirit of principal component analysis (PCA)
• PCA: find an orthonormal matrix P with U = PX such that the covariance matrix $\frac{1}{n-1} U U^{T}$ is diagonal
• Rows of P are the principal components of X
• The new variables are chosen to simultaneously satisfy three conditions:
1. They are highly correlated with the dependent variables
2. They model as much of the variance among the independent variables as possible (maximum signal-to-noise ratio)
3. They are uncorrelated with each other (minimizes the number of variables)
Disadvantage: latent variables are abstract and difficult to interpret
Scikit-Learn Library
from sklearn.cross_decomposition import PLSRegression
model = PLSRegression(n_components=5)
Optimization of n_components is required!
Ridge Regression (L2 regularization)
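1. Developed to overcome the collinearity problem
2. Adds a penalty term to the least-squares loss function, which adds λI to the matrix being inverted
3. Ridge function:
$L_{\mathrm{ridge}}(\beta, \lambda) = \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2$
4. Solution for β is
$\beta = (X^{T} X + \lambda I_{p \times p})^{-1} X^{T} y$
5. In practice we have to optimize λ (a hyperparameter, called alpha in Scikit-Learn)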
Scikit-Learn Library
from sklearn.linear_model import Ridge
model = Ridge(alpha=0.0000001, max_iter=10000, tol=0.001)
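The closed-form ridge solution can be compared directly with Scikit-Learn, and λ can be tuned by cross-validation. This is a sketch on synthetic collinear data; the value λ = 1, the logarithmic grid and the use of `RidgeCV` are assumptions made for illustration. The intercept is switched off so that the fit matches the closed form, which has no constant term.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

rng = np.random.default_rng(4)

# Hypothetical collinear features: the third column nearly repeats the first.
X = rng.normal(size=(60, 3))
X[:, 2] = X[:, 0] + 1e-3 * rng.normal(size=60)
y = X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=60)

lam = 1.0  # lambda; scikit-learn calls this parameter `alpha`

# Closed form: beta = (X^T X + lambda * I)^{-1} X^T y
p = X.shape[1]
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Same model in scikit-learn (no intercept, to match the closed form above).
ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y)

# Choose lambda by cross-validation over a logarithmic grid (assumed range).
alphas = np.logspace(-6, 2, 20)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
best_lambda = ridge_cv.alpha_
```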
Lasso Regression (L1 regularization)
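• The difference between Lasso and Ridge is the nature of the penalty term: the L1 norm of β instead of its squared L2 norm
• Lasso function:
$L_{\mathrm{lasso}}(\beta, \lambda) = \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$
• Solution for β, in closed form only when the columns of X are orthonormal, in terms of the least-squares solution $\beta^{LS}$:
$\beta_i = \operatorname{sgn}(\beta_i^{LS}) \, (\lvert \beta_i^{LS} \rvert - \lambda/2)_{+}$
• The threshold is λ/2 for the loss as written; it becomes λ if the squared error is scaled by 1/2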
Scikit-Learn Library
from sklearn.linear_model import Lasso
model = Lasso(alpha=0.00001,max_iter=100000)
sgn is the signum function of a real number: sgn(x) = -1 for x < 0, 0 for x = 0, and +1 for x > 0
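The soft-thresholding form of the Lasso solution can be illustrated as follows. This sketch assumes a design matrix with orthonormal columns (constructed here by a QR decomposition), since the closed form holds only in that case; the helper `soft_threshold`, the data and the value of `alpha` are hypothetical. Scikit-Learn scales the squared error by 1/(2n), so with its convention the threshold is n times alpha.

```python
import numpy as np
from sklearn.linear_model import Lasso

def soft_threshold(b, t):
    """Return sgn(b) * max(|b| - t, 0), applied element-wise."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

rng = np.random.default_rng(5)

# Hypothetical design with orthonormal columns (X^T X = I), built by QR.
n, p = 40, 4
X, _ = np.linalg.qr(rng.normal(size=(n, p)))
beta_true = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ beta_true + 0.01 * rng.normal(size=n)

beta_ls = X.T @ y  # least-squares solution when X^T X = I

# scikit-learn minimizes (1 / (2 n)) * ||y - X b||^2 + alpha * ||b||_1,
# so for orthonormal columns the threshold is n * alpha.
alpha = 0.01
beta_soft = soft_threshold(beta_ls, n * alpha)

lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=100000).fit(X, y)
```

Coefficients whose least-squares value lies below the threshold are set exactly to zero, which is why Lasso also acts as a feature selector.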
Preprocessing of Data
• Before constructing any ML model, the data need to be preprocessed (most of the time goes here)
• All NaN data should be removed
• Normalize the features (not the target)
• Creating the feature vector is essentially our job (the difference between ML in other fields and in materials science)
• Expertise is extremely important (using elemental properties does not always work)
• Structural and chemical descriptors are needed for better precision
• We should look for a minimal number of effective features
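The NaN removal and feature normalization steps could be sketched as below. The tiny data set is invented for illustration, and the choice of `StandardScaler` (zero mean, unit variance) is an assumption; the slides do not say which normalization was used.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: rows are samples, columns are features on different scales.
X_raw = np.array([
    [1.0, 200.0],
    [2.0, np.nan],   # sample with a missing feature value
    [3.0, 400.0],
    [4.0, 500.0],
    [5.0, 100.0],
])
y_raw = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Remove every sample (row) that contains a NaN in its features or target.
mask = ~np.isnan(X_raw).any(axis=1) & ~np.isnan(y_raw)
X, y = X_raw[mask], y_raw[mask]

# Normalize the features only; the target y is left untouched.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Inside a pipeline the scaler is refitted on whatever data the model is
# trained on, which keeps test data out of the scaling statistics.
model = make_pipeline(StandardScaler(), Ridge(alpha=1e-3)).fit(X, y)
```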
Concept of Overfitting
• Overfitting is a modeling error which occurs when a function is fit too closely to a limited set of data points
• Limited amount of data: high probability of overfitting (this is our case, so we should be very careful)
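Overfitting on a small data set can be made visible by comparing training and test scores. The sketch below is an assumed toy setup, not the authors' data: 20 noisy points on a straight line, half held out for testing, fitted once with a straight line and once with a degree-9 polynomial that has enough freedom to pass through every training point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(6)

# Hypothetical small data set: a noisy straight line with only 20 points.
x = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * x.ravel() + 0.3 * rng.normal(size=20)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

# For each polynomial degree store (training R^2, test R^2).
results = {}
for degree in (1, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    results[degree] = (model.score(x_train, y_train), model.score(x_test, y_test))
```

A training score close to 1 combined with a clearly lower test score, as is typical for the degree-9 model here, is the usual signature of overfitting.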
Conclusions
• Basic overview of Machine Learning
• Briefly discussed least-squares regression
• Issues of collinearity
• Discussed PLS, Lasso and Ridge regression
• A small discussion on preprocessing of data
• Presented a discussion on overfitting of models
