WMD Journal Club
2nd October 2018
Suzanne Wallace
Motivation?
Machine learning (ML) overview
 Subfield of artificial intelligence (AI)
 Involves algorithms whose performance improves with data
 Identify/exploit non-randomness in data and use it for prediction or analysis
 Types
 Supervised
 A mapping function from inputs x to labels y is learned from a training data set; the mapping function is then used to predict labels for new data
 cf. the company in the Click episode with employees labelling pixels of images for image recognition models
 e.g. regression
 Unsupervised
 e.g. dimensionality reduction
ML for quantum chemistry?
Examples of applications in QM
 Ab initio molecular dynamics to learn potential energy surface (PES)
 Orbital-free DFT to learn mapping from electron densities to their kinetic energy
 Molecular property prediction to map molecules to property values
 cf. Dan’s work using ML to predict band gaps?
Principles of ML for quantum chemistry
 ‘Similarity principle’ – exploit redundancy
 In QM, could avoid having to repeat calculations for similar systems
 Interpolate between calculations to obtain approximate solutions
for the remaining systems
 Decisive factor is control of the interpolation error!
 How far could we push this...?
 ...train models with other models?
 E.g. use ML to map out the PES of a molecule… but use various PESs to predict the PES of other molecules based on similarity of species and coordination environments...?
Main technical topics of the tutorial
 Kernel-based ML methods + assessing model performance
 (more details to follow…)
 Numerical representation of the system (the descriptor)
 This is where domain knowledge for the specific system is important?
 ‘The problem of learning a function from a finite sample of its values has no
unique solution (there are infinitely many functions that are compatible with
the training data) […] Essentially, one chooses the simplest model that is
compatible with the data (Occam’s razor)’
 cf. the Berkeley blog for an interesting discussion of underfitting and overfitting (w.r.t. Fukushima)  https://ml.berkeley.edu/blog/2017/07/13/tutorial-4/
 Overfitting may represent training data well, but perform poorly for unseen data
 ‘too much predictive power to quirks in our training data’ (Berkeley blog)
 Underfitting will just give nonsense for training and unseen data
Importance of choice of method + testing the model once built.
But now onto the nuts and bolts of this learning algorithm...
Kernel-based ML methods
(alternatives include artificial neural networks)
 Central idea  derive non-linear versions of linear ML algorithms by mapping inputs into a higher-dimensional space and applying the linear algorithm there
 ‘Kernel trick’  re-write linear ML algorithms to use only inner products
between inputs (norms, angles, distances between inputs)
 Functions called kernels operate on input-space vectors, but give the same results as evaluating inner products in feature space
 Essentially, we are able to avoid explicit calculations in a high-dimensional feature space
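As a concrete illustration of the trick (my own sketch in Python, not from the tutorial): for the degree-2 polynomial kernel, evaluating the kernel purely in input space gives exactly the same number as taking the inner product after an explicit feature map.

```python
# Minimal sketch of the kernel trick (assumed example, not from the tutorial).
# For the 2D degree-2 polynomial kernel k(x, y) = (x . y)^2, the explicit
# feature map is phi(x) = (x1^2, x2^2, sqrt(2) x1 x2).
import numpy as np

def phi(x):
    """Explicit feature map into 3D feature space."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

def k(x, y):
    """Same inner product, evaluated purely in input space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(y)))  # 16.0 -- inner product in feature space
print(k(x, y))                 # 16.0 -- same value, no feature map needed
```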
Dusting off the mathematical cobwebs…
 $\forall$ = for all
 $\in$ = is an element of
 $\mathbb{R}$ = real numbers
 $\langle x, y \rangle$ = dot product of vectors (inner product), a way of multiplying two vectors together to obtain a scalar (I was initially massively confused by the use of the cross here…)
 $\|x\|$ = non-negative norm of a vector (a scalar value)
 $\|x\|_2$ = Euclidean norm (see later)
 $\|x\|_1$ = 1-norm (see later)
Kernel functions
 Section outlines various general conditions for an inner product of vectors in a
given vector space
 A kernel is a function that corresponds to an inner product in a feature space
 A function is only a kernel if there exists a map between the vector space and the feature space (but we do not need to know its form; existence is sufficient)
 A vector space with an inner product = an ‘inner product space’
 Kernel functions allow replacing computations in high-dimensional feature space
by computations in input space
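The existence of such a map can be probed indirectly: a valid kernel must yield a symmetric positive semi-definite Gram matrix for any set of inputs. A quick numerical (not rigorous) sanity check, my own sketch using the Gaussian kernel:

```python
# Sanity check that a kernel's Gram matrix is positive semi-definite
# (necessary for a feature-space map to exist); inputs are made up.
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))  # 10 random 3D input vectors
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])

eigvals = np.linalg.eigvalsh(K)  # eigenvalues of the symmetric Gram matrix
print(eigvals.min() >= -1e-10)   # True: PSD to within numerical tolerance
```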
Specific kernels: linear
Simple, linear kernel  $k(x, y) = \langle x, y \rangle$
 Input space and feature space are identical
 Equivalent to using the original linear algorithm (use as an initial test for a new system?)
 Gives a linear regression model, $f(\tilde{x}) = \sum_{i=1}^{n} \alpha_i \langle x_i, \tilde{x} \rangle$, with regression coefficients $\alpha_i$, training inputs $x_i$, and inputs to predict $\tilde{x}$
www.matlabsolutions.com/blog/Tensorflow-Linear-regression-understanding-the-concept.php
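A quick consistency check of that equivalence (my own sketch; scikit-learn is assumed tooling, not something the tutorial prescribes): kernel ridge regression with the linear kernel should match ordinary ridge regression fitted without an intercept.

```python
# With the linear kernel, kernel ridge regression reduces to regularized
# linear regression; synthetic data, illustrative only.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

krr = KernelRidge(alpha=1.0, kernel="linear").fit(X, y)
ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)  # KRR fits no intercept

X_new = rng.normal(size=(5, 3))
print(np.allclose(krr.predict(X_new), ridge.predict(X_new)))  # True
```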
Specific kernels: Gaussian
(or squared exponential kernel or radial basis function kernel)
 Non-linear kernel  $k(x, y) = \exp\!\left(-\frac{\|x - y\|_2^2}{2\sigma^2}\right)$
 (non-linear  change of output not proportional to change of input)
 Maps into an infinite-dimensional feature space
 𝝈 > 0  hyperparameter determining the length scale on which the kernel operates
 Something to tune for optimal model performance
 Limiting cases 𝜎  0 and 𝜎  ∞ relate to overfitting and underfitting respectively
 For intermediate values of 𝜎, the kernel value depends on the distance $\|x - y\|_2$
 Kernel approaches 1 as $\|x - y\|_2 \to 0$
 Kernel approaches 0 as $\|x - y\|_2 \to \infty$
 Samples close in input space are correlated in feature space
 Samples far away, however, are mapped to orthogonal subspaces
The Gaussian kernel is a local approximator whose scale depends on 𝝈.
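A tiny numerical illustration of the limiting behaviour described above (my own sketch; the points and 𝜎 values are arbitrary):

```python
# How the Gaussian kernel value between two fixed points changes with sigma.
import numpy as np

def gaussian_kernel(x, y, sigma):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

x, y = np.array([0.0, 0.0]), np.array([1.0, 1.0])
for sigma in [0.1, 1.0, 10.0]:
    print(sigma, gaussian_kernel(x, y, sigma))
# small sigma -> value ~ 0 even for nearby points (overfitting limit)
# large sigma -> value ~ 1 for almost everything (underfitting limit)
```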
Specific kernels: Laplacian
 Similar to the Gaussian  $k(x, y) = \exp\!\left(-\frac{\|x - y\|_1}{\sigma}\right)$
 Same exponential form, but using the 1-norm instead of the Euclidean norm (see next)
 Demonstrated to perform better for molecular properties in refs 45 and 59–61
Aside: 1-norm vs. Euclidean norm
(source wikipedia)
The one we all know and love, the Euclidean norm: $\|x\|_2 = \sqrt{\sum_i x_i^2}$
Summat to do with taxi drivers and the American road system, the 1-norm (a.k.a. taxicab or Manhattan norm): $\|x\|_1 = \sum_i |x_i|$
(kind of like a constrained norm?)
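The two norms side by side in numpy, for a vector where the difference is visible:

```python
import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x, ord=2))  # Euclidean norm: sqrt(3^2 + 4^2) = 5.0
print(np.linalg.norm(x, ord=1))  # 1-norm (taxicab):  |3| + |-4|   = 7.0
```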
Regression methods
Multiple linear regression  $f(x) = \beta_0 + \sum_{j=1}^{d} \beta_j x_j$ (a linear model in d dimensions, each dimension weighted by a regression coefficient $\beta_j$; the $\beta_0$ term allows modelling functions that do not pass through the origin)
 Find coefficients to minimize the generalization error (average error on new inputs)
 However, in practice, due to the finite size of the training set one can only minimize the empirical error  care must be taken to avoid over- or underfitting
Ridge regression
 Adds regularization to avoid overfitting
 Increases bias but reduces variance
 cf. the bias–variance tradeoff, e.g. https://ml.berkeley.edu/blog/2017/07/13/tutorial-4/
 Adds a penalty term $\lambda \|\beta\|_2^2$ whose strength is set by the hyperparameter 𝜆 (larger values give simpler and smoother models)
Kernel ridge regression
 Applying the ‘kernel trick’ to linear ridge regression  a nonlinear version, $f(\tilde{x}) = \sum_{i=1}^{n} \alpha_i k(x_i, \tilde{x})$
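A minimal from-scratch sketch of kernel ridge regression, assuming the standard closed-form solution $\alpha = (K + \lambda I)^{-1} y$ (consistent with the tutorial's equations, though this exact code is my own):

```python
# Kernel ridge regression with a Gaussian kernel, from scratch.
import numpy as np

def gaussian_kernel_matrix(A, B, sigma):
    """Pairwise Gaussian kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # squared distances
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_krr(X_train, y_train, sigma, lam):
    """Solve (K + lambda*I) alpha = y for the regression coefficients."""
    K = gaussian_kernel_matrix(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict_krr(X_new, X_train, alpha, sigma):
    """Prediction is a kernel-weighted sum over the training inputs."""
    return gaussian_kernel_matrix(X_new, X_train, sigma) @ alpha

# toy usage: learn cos(x) from 20 samples
X = np.linspace(0, 2 * np.pi, 20)[:, None]
y = np.cos(X).ravel()
alpha = fit_krr(X, y, sigma=1.0, lam=1e-3)
print(predict_krr(np.array([[np.pi]]), X, alpha, sigma=1.0))  # close to -1
```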
Implementation
Importance of model selection
 How to choose between different ML models
 Choice of kernel, k
 How to choose hyperparameters?
 e.g. 𝜆 for regularisation
 and 𝜎 if using Gaussian or Laplacian kernels
 Regression coefficients 𝛼 and 𝛽 for a given set of hyperparameters are determined by the kernel (and the training data)?
 Therefore the choice is dependent upon the quality of the training set? (methods such as bootstrapping and cross-validation allow for re-use of data if the set is small)
 Occam’s razor as general guiding principle  use simplest model that fits data
Estimating model performance
 ‘Risk’ of a model f  $R[f] = \int L(f(x), y)\, dP(x, y)$, where $L$ is a loss function measuring the error of a prediction
 R has to be estimated from a finite set of training data as the empirical risk, $R_{\mathrm{emp}}[f] = \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i)$
 Again, use regularization to avoid over-fitting to the training set
Kernel ridge regression fits to 5 data points representing a cosine function, with different values of the hyperparameter 𝜎.
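A sketch that should reproduce this kind of plot (plotting details are my own; note that scikit-learn parameterizes the Gaussian/'rbf' kernel via gamma = 1/(2𝜎²)):

```python
# KRR fits to 5 samples of a cosine for small, intermediate and large sigma.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.kernel_ridge import KernelRidge

X = np.linspace(0, 2 * np.pi, 5)[:, None]        # 5 training points
y = np.cos(X).ravel()
X_plot = np.linspace(0, 2 * np.pi, 200)[:, None]

for sigma in [0.1, 1.0, 10.0]:
    model = KernelRidge(alpha=1e-3, kernel="rbf", gamma=1.0 / (2 * sigma**2))
    model.fit(X, y)
    plt.plot(X_plot, model.predict(X_plot), label=f"sigma = {sigma}")

plt.scatter(X, y, color="k", zorder=3, label="training data")
plt.plot(X_plot, np.cos(X_plot), "k--", label="cos(x)")
plt.legend()
plt.show()
# small sigma overfits (spikes at the data); large sigma underfits (flat)
```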
E.g. for predicting atomization energies
 Use 1k reference DFT calculations of atomization energies of organic molecules to estimate those of the remaining molecules in a full set of 7k molecules (dataset and notebook for this e.g. included in the SI)
Considerations for…
Preparation of training dataset
 How large and homogeneous?
 In this e.g., the set is inhomogeneous w.r.t. the number of non-H atoms, so all molecules with four or fewer had to be included  requires insight into the relevant inhomogeneities?
 Split into a training set and a ‘hold-out’ set  requires insight into the sizes of the sets?
 Can use methods such as cross-validation to reuse data if the dataset is fairly small (see the sketch below)
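A minimal sketch of the hold-out split and cross-validation reuse mentioned above (scikit-learn assumed; X and y are random placeholders standing in for the descriptor matrix and target property, e.g. atomization energies):

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # placeholder descriptors
y = rng.normal(size=200)         # placeholder property values

# hold out 20% as a final test set the model never sees during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 5-fold cross-validation reuses the training data for model selection
scores = cross_val_score(KernelRidge(kernel="rbf"), X_train, y_train, cv=5)
print(scores.mean())  # average R^2 across folds
```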
Representation of data
Considerations for…
The model
 Choose kernel
 Choose hyperparameters 𝜆 and 𝜎 for the chosen kernel
 Compute kernel matrices
 Algorithm computes regression coefficients
 Compute prediction performance statistics (using ‘hold out dataset’)
 Perform grid search to determine values of 𝜆 and 𝜎 for best
performance
 … although is this not also influenced by the regression coefficients, which are determined by the initial choice of hyperparameters?
 Try different kernel (and repeat above steps)
 Compare performance with different kernels
Grid search to determine values of 𝜆 and 𝜎
for best performance
Comparing predictions using different
kernels
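A hedged sketch of both steps together: grid-searching 𝜆 (KernelRidge's alpha) and 𝜎 (via gamma) while also comparing Gaussian ('rbf') and Laplacian kernels. Grid values and data are illustrative only, and this conflates the two figures above into one search.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # placeholder descriptors
y = rng.normal(size=100)         # placeholder property values

param_grid = {
    "kernel": ["rbf", "laplacian"],   # Gaussian vs Laplacian kernel
    "alpha": np.logspace(-6, 0, 7),   # regularization strength (lambda)
    "gamma": np.logspace(-3, 1, 5),   # kernel length-scale parameter (sigma)
}
search = GridSearchCV(KernelRidge(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # hyperparameters with best cross-validated score
```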
Key themes/ central ideas to method
 Exploit non-randomness in data to avoid having to perform additional QM calculations
 Minimising interpolation error when using some QM calculations to predict results of
others
 Kernel-based ML methods systematically derive nonlinear versions of linear ML
algorithms
 Avoid costly evaluations in a high-dimensional feature space through use of inner
products
 Avoiding underfitting or overfitting to the training data (since we want a general model, but due to the finite training data set this will always be an empirical fit)
 Principle of Occam’s razor
 Choice of kernel
 Choice of kernel hyperparameters to minimise underfitting or overfitting
 Regularization to penalize for overfitting
 Build, optimize, tweak, repeat, optimize, tweak, repeat!
…verdict!
How useful as a starting point?
 A little hard to follow at the start of the more technical bits / hard to gauge what the point of all the definitions was + some notation hard to follow, especially the use of a cross when discussing dot products
 Had to google a fair bit!
 Later sections kind of easier to follow + nice example at the end
 Possibly easier to follow by not reading in order? Or re-reading the start after!
 …but nice plots to explain use of different kernels and influence of
hyperparameters on fits!