201907 AutoML and Neural Architecture Search - DaeJin Kim
Brief introduction of NAS
Review of EfficientNet (Google Brain), RandWire (FAIR) papers
NAS flow slide from KihoSuh's slideshare (https://www.slideshare.net/KihoSuh/neural-architecture-search-with-reinforcement-learning-76883153)
[References]
[1] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (https://arxiv.org/abs/1905.11946)
[2] Exploring Randomly Wired Neural Networks for Image Recognition (https://arxiv.org/abs/1904.01569)
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018 - Codemotion
In machine learning, training large models on a massive amount of data usually improves results. Our customers report, however, that training such models and deploying them is either operationally prohibitive or outright impossible for them. We created a collection of machine learning algorithms that scale to any amount of data, including k-means clustering for data segmentation, factorization machines for recommendations, time-series forecasting, linear regression, topic modeling, and image classification. This talk will discuss those algorithms and explain where and how they can be used.
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures - MLAI2
Regularization and transfer learning are two popular techniques to enhance generalization on unseen data, which is a fundamental problem of machine learning. Regularization techniques are versatile, as they are task- and architecture-agnostic, but they do not exploit a large amount of data available. Transfer learning methods learn to transfer knowledge from one domain to another, but may not generalize across tasks and architectures, and may introduce new training cost for adapting to the target task. To bridge the gap between the two, we propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data. MetaPerturb is implemented as a set-based lightweight network that is agnostic to the size and the order of the input, which is shared across the layers. Then, we propose a meta-learning framework, to jointly train the perturbation function over heterogeneous tasks in parallel. As MetaPerturb is a set-function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures. We validate the efficacy and generality of MetaPerturb trained on a specific source domain and architecture, by applying it to the training of diverse neural architectures on heterogeneous target datasets against various regularizers and fine-tuning. The results show that the networks trained with MetaPerturb significantly outperform the baselines on most of the tasks and architectures, with a negligible increase in the parameter size and no hyperparameters to tune.
Deep Learning Fast MRI Using Channel Attention in Magnitude Domain - Joonhyung Lee
My presentation on how we participated in the fastMRI Challenge in 2019.
Aside from theoretical considerations, it also explains key implementation issues that arise in all deep learning for MRI such as disk I/O and CPU/GPU load balancing.
Used for presentation at ISBI 2020 Oral session.
Accidentally wrote the title as "Deep Learning Sum-of-Squares Images in Accelerated Parallel MRI". Sorry for the mistake!
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/bas3-Ue2qxc.
Abstract:
Auto Visualization involves the problem of producing meaningful graphics when presented with data. Relevant to this task are the strategies that expert statisticians and data analysts use to gain insights through visualization, as well as the portfolio of diagnostic methods devised by statisticians in the last 50 years. While some researchers and companies may claim to do automatic visualization, the problem is much deeper than simply producing collections of histograms, bar charts, and scatterplots. The deeper problem is what subset of these graphics is critical to recognizing anomalies, outliers, unusual distributions, missing values, and so on. This talk will cover aspects of this deeper problem and will introduce H2O software that implements some of these algorithms.
Leland Wilkinson is Chief Scientist at H2O.ai and Adjunct Professor of Computer Science at the University of Illinois Chicago. He received an A.B. degree from Harvard in 1966, an S.T.B. degree from Harvard Divinity School in 1969, and a Ph.D. from Yale in 1975. Wilkinson wrote the SYSTAT statistical package and founded SYSTAT Inc. in 1984. After the company grew to 50 employees, he sold SYSTAT to SPSS in 1994 and worked there for ten years on research and development of visualization systems. Wilkinson subsequently worked at Skytree and Tableau before joining H2O.ai. Wilkinson is a Fellow of the American Statistical Association, an elected member of the International Statistical Institute, and a Fellow of the American Association for the Advancement of Science. He has won best speaker award at the National Computer Graphics Association and the Youden prize for best expository paper in the statistics journal Technometrics. He has served on the Committee on Applied and Theoretical Statistics of the National Research Council and is a member of the Boards of the National Institute of Statistical Sciences (NISS) and the Institute for Pure and Applied Mathematics (IPAM). In addition to authoring journal articles, the original SYSTAT computer program and manuals, and patents in visualization and distributed analytic computing, Wilkinson is the author (with Grant Blank and Chris Gruber) of Desktop Data Analysis with SYSTAT. He is also the author of The Grammar of Graphics, the foundation for several commercial and open-source visualization systems (IBM RAVE, Tableau, R ggplot2, and Python Bokeh).
Generative Adversarial Networks: Basic architecture and variants - ananth
In this presentation we review the fundamentals behind GANs and look at different variants. We quickly review the theory such as the cost functions, training procedure, challenges and go on to look at variants such as CycleGAN, SAGAN etc.
Visual diagnostics for more effective machine learning - Benjamin Bengfort
The model selection process is a search for the best combination of features, algorithm, and hyperparameters that maximize F1, R2, or silhouette scores after cross-validation. This view of machine learning often leads us toward automated processes such as grid searches and random walks. Although this approach allows us to try many combinations, we are often left wondering if we have actually succeeded.
By enhancing model selection with visual diagnostics, data scientists can inject human guidance to steer the search process. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance allows us a peek into the high-dimensional realm in which our models operate. As we continue to tune our models, trying to minimize both bias and variance, these glimpses allow us to be more strategic in our choices. The result is more effective modeling, speedier results, and greater understanding of underlying processes.
Visualization is an integral part of the data science workflow, but visual diagnostics are directly tied to machine learning transformers and models. The Yellowbrick library extends the scikit-learn API providing a Visualizer object, an estimator that learns from data and produces a visualization as a result. In this talk, we will explore feature visualizers, visualizers for classification, clustering, and regression, as well as model analysis visualizers. We'll work through several examples and show how visual diagnostics steer model selection, making machine learning more effective.
One-shot learning is an object categorization problem in computer vision. Whereas most machine-learning-based object categorization algorithms require training on hundreds or thousands of images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training images.
A recent trend in the machine learning community is to construct nonlinear versions of linear
algorithms through the ‘kernel method’, for example kernel principal component analysis, kernel
Fisher discriminant analysis, support vector machines (SVMs), and recent kernel clustering
algorithms. Typically, in unsupervised kernel clustering, a nonlinear mapping first maps the data
into a much higher-dimensional feature space, and clustering is then performed there. A drawback of
these kernel clustering algorithms is that the cluster prototypes reside in the high-dimensional
feature space and therefore lack clear, intuitive descriptions unless an additional approximate
projection from the feature space back to the data space is used, as is done in the literature.
This paper uses the ‘kernel method’ to propose a novel clustering algorithm, based on the
conventional fuzzy c-means algorithm (FCM) and called the kernel fuzzy c-means algorithm (KFCM).
The method adopts a novel kernel-induced metric in the data space to replace the original Euclidean
norm metric. The cluster prototypes therefore still reside in the data space, so the clustering
results can be interpreted and reformulated in the original space. This property is used for
clustering incomplete data. Experiments on artificial data show that KFCM has better clustering
performance and is more robust than other variants of FCM for clustering incomplete data.
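The kernel-induced metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel (the abstract does not name the kernel) and uses the standard FCM membership form with the Euclidean distance swapped for the kernel-induced one; the paper's exact update rules may differ. The names `gaussian_kernel`, `kernel_distance_sq`, and `memberships`, and the parameters `sigma` and `m`, are illustrative only.

```python
import math

def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / sigma ** 2)

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared feature-space distance ||phi(x) - phi(v)||^2.

    It expands to K(x, x) - 2 K(x, v) + K(v, v). For a Gaussian kernel
    K(x, x) = 1, so it reduces to 2 * (1 - K(x, v)). Both x and v live in
    the original data space, which keeps the prototypes interpretable.
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

def memberships(x, prototypes, m=2.0, sigma=1.0, eps=1e-12):
    """Fuzzy memberships of x in each cluster (assumed FCM-style form)."""
    # eps guards against division by zero when x coincides with a prototype.
    dists = [max(kernel_distance_sq(x, v, sigma), eps) for v in prototypes]
    weights = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

# Two prototypes in the data space; the point lies close to the first one.
prototypes = [(0.0, 0.0), (3.0, 3.0)]
u = memberships((0.5, 0.2), prototypes)
print(u)
```

The memberships sum to one, and the prototype nearer to the point in the kernel-induced metric receives the larger share.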
Scikit-Learn is a powerful machine learning library implemented in Python with numeric and scientific computing powerhouses NumPy, SciPy, and matplotlib for extremely fast analysis of small to medium-sized data sets. It is open source, commercially usable and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason Scikit-Learn is often the first tool in a data scientist's toolkit for machine learning of incoming data sets.
The purpose of this one-day course is to serve as an introduction to Machine Learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs or algorithms, rather than as simply a research or investigation methodology.
Optimal Feature Selection from VMware ESXi 5.1 Feature Set - ijccmsjournal
A study of VMware ESXi 5.1 server has been carried out to find the optimal set of parameters which suggest usage of different resources of the server. Feature selection algorithms have been used to extract the optimum set of parameters of the data obtained from VMware ESXi 5.1 server using esxtop command. Multiple virtual machines (VMs) are running in the mentioned server. K-means algorithm is used for clustering the VMs. The goodness of each cluster is determined by Davies Bouldin index and Dunn index respectively. The best cluster is further identified by the determined indices. The features of the best cluster are considered into a set of optimal parameters.
A general frame for building optimal multiple SVM kernels - infopapers
Dana Simian, Florin Stoica, A General Frame for Building Optimal Multiple SVM Kernels, Large-Scale Scientific Computing, Lecture Notes in Computer Science, 2012, Volume 7116/2012, 256-263, DOI: 10.1007/978-3-642-29843-1_29
Approaches to online quantile estimation - Data Con LA
Data Con LA 2020
Description
This talk will explore and compare several compact data structures for estimation of quantiles on streams, including a discussion of how they balance accuracy against computational resource efficiency. A new approach providing more flexibility in specifying how computational resources should be expended across the distribution will also be explained. Quantiles (e.g., median, 99th percentile) are fundamental summary statistics of one-dimensional distributions. They are particularly important for SLA-type calculations and characterizing latency distributions, but unlike their simpler counterparts such as the mean and standard deviation, their computation is somewhat more expensive. The increasing importance of stream processing (in observability and other domains) and the impossibility of exact online quantile calculation together motivate the construction of compact data structures for estimation of quantiles on streams. In this talk we will explore and compare several such data structures (e.g., moment-based, KLL sketch, t-digest) with an eye towards how they balance accuracy against resource efficiency, theoretical guarantees, and desirable properties such as mergeability. We will also discuss a recent variation of the t-digest which provides more flexibility in specifying how computational resources should be expended across the distribution. No prior knowledge of the subject is assumed. Some familiarity with the general problem area would be helpful but is not required.
Speaker
Joe Ross, Splunk, Principal Data Scientist
Brief introduction to Monte Carlo cation disorder model for CZTS, candidate photoferroic absorber materials for solar cells and ideas for using theory predictions to accelerate optimisation of devices.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V... - Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity and then from this we derive the Poiseuille flow equation, the transition flow equation and the turbulent flow equation. In the situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes equation of terminal velocity and turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol... - Studia Poinsotiana
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... - University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... - Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Seminar of U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
4. Machine learning (ML) overview
Subfield of artificial intelligence (AI)
Involves algorithms whose performance improves with data
Identify/ exploit non-randomness in data and use for prediction or analysis
Types
Supervised
Mapping function is learned to map inputs x to labels y with a training data set,
mapping function is then used to predict labels for new data
c.f. company on click episode with employees labelling pixels of images for image recognition models
e.g. regression
Unsupervised
e.g. dimensionality reduction
5. ML for quantum chemistry?
Examples of applications in QM
Ab initio molecular dynamics to learn potential energy surface (PES)
Orbital-free DFT to learn mapping from electron densities to their kinetic energy
Molecular property prediction to map molecules to property values
c.f. Dan’s work using ML to predict band gaps?
6. Principles of ML for quantum chemistry
‘Similarity principle’ – exploit redundancy
In QM, could avoid having to repeat calculations for similar systems
Interpolate between calculations to obtain approximate solutions
for the remaining systems
Decisive factor is control of the interpolation error!
How far could we push this...?
...train models with other models?
E.g. use ML to map out PES of a molecule… but use various PES’s to
predict PES of other molecules based on similarity of species and
coordination environments...?
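The interpolation idea above can be sketched in a few lines. This is only an illustration under stated assumptions: a Morse potential stands in for an expensive quantum-mechanical calculation, piecewise-linear interpolation stands in for the ML model, and every name and parameter value (`morse_potential`, `build_reference`, the range 0.5 to 3.0) is hypothetical. It shows how the interpolation error depends on how many reference calculations are available.

```python
import math

def morse_potential(r, depth=1.0, width=1.0, r_eq=1.0):
    """Hypothetical stand-in for an expensive PES calculation."""
    return depth * (1.0 - math.exp(-width * (r - r_eq))) ** 2

def build_reference(n_points, r_min=0.5, r_max=3.0):
    """Run the 'expensive' calculation on a uniform grid of geometries."""
    step = (r_max - r_min) / (n_points - 1)
    rs = [r_min + i * step for i in range(n_points)]
    return rs, [morse_potential(r) for r in rs]

def interpolate(r, rs, energies):
    """Piecewise-linear interpolation between neighbouring reference points."""
    for i in range(len(rs) - 1):
        if rs[i] <= r <= rs[i + 1]:
            t = (r - rs[i]) / (rs[i + 1] - rs[i])
            return (1.0 - t) * energies[i] + t * energies[i + 1]
    raise ValueError("r lies outside the reference range")

def max_interpolation_error(n_points, n_test=500):
    """Largest absolute error over test geometries inside the reference range."""
    rs, energies = build_reference(n_points)
    span = rs[-1] - rs[0]
    # Midpoints of a dense grid, so every test point is strictly interior.
    tests = [rs[0] + (k + 0.5) * span / n_test for k in range(n_test)]
    return max(abs(interpolate(r, rs, energies) - morse_potential(r))
               for r in tests)

coarse = max_interpolation_error(5)   # few reference calculations
fine = max_interpolation_error(21)    # more reference calculations
print(coarse, fine)
```

With more reference points the worst-case error shrinks, which is the sense in which controlling the interpolation error decides how many explicit calculations can be skipped.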
7. Main technical topics of the tutorial
Kernel-based ML methods + assessing model performance
(more details to follow…)
Numerical representation of system (descriptor), e.g.
Where domain knowledge for specific system is important?
‘The problem of learning a function from a finite sample of its values has no
unique solution (there are infinitely many functions that are compatible with
the training data) […] Essentially, one chooses the simplest model that is
compatible with the data (Occam’s razor)’
c.f. Berkeley blog for interesting discussion of underfitting and overfitting (wrt
Fukushima) https://ml.berkeley.edu/blog/2017/07/13/tutorial-4/
An overfitted model may represent training data well, but performs poorly on unseen data
‘too much predictive power to quirks in our training data’ [cite Berkeley blog]
Underfitting will just give nonsense for training and unseen data
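A minimal sketch of the two failure modes, assuming NumPy and a made-up toy problem (noisy samples of a sine curve fitted with polynomials). None of the data, degrees or names come from the tutorial; they are only there to make the idea concrete. A degree-0 polynomial underfits (large error everywhere), while a degree-9 polynomial through 10 points reproduces the training data almost exactly, which says nothing about how it behaves on unseen points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 10 noisy samples of sin(2*pi*x) on [0, 1].
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.standard_normal(x_train.size)

# Unseen inputs, compared against the noise-free target.
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2.0 * np.pi * x_test)


def rmse(coeffs, x, y):
    """Root-mean-square error of a fitted polynomial on (x, y)."""
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))


# degree -> (training error, error on unseen data)
errors = {}
for degree in (0, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (rmse(coeffs, x_train, y_train), rmse(coeffs, x_test, y_test))
    print(degree, errors[degree])
```

The training error can only go down as the polynomial degree goes up, so it is the error on unseen data that has to be watched when choosing model complexity.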
Importance of choice of method + testing model once built
But now onto nuts and bolts of this learning algorithm...
8. Kernel-based ML methods
(alternatives include artificial neural networks)
Central idea: derive non-linear versions of ML algorithms by mapping
inputs into a higher-dimensional space and applying the linear algorithm there
‘Kernel trick’: rewrite linear ML algorithms to use only inner products
between inputs (norms, angles, distances between inputs)
Functions called kernels operate on input-space vectors, but give the same
results as evaluating inner products in feature space
Essentially, are able to avoid explicit calculations in a high-dimensional
feature space
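To see the kernel trick in action, here is a small sketch assuming NumPy. The degree-2 polynomial kernel and its explicit feature map are a textbook illustration chosen for this note, not an example taken from the tutorial. The kernel is evaluated on the 2-D inputs only, yet it gives the same number as the inner product of the explicitly mapped 3-D feature vectors.

```python
import numpy as np


def feature_map(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, evaluated in input space."""
    return float(np.dot(x, z)) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(feature_map(x), feature_map(z)))  # inner product in feature space
implicit = poly2_kernel(x, z)                             # same value, no mapping needed
print(explicit, implicit)
```

For kernels whose feature space is very high (or infinite) dimensional, only the second route is practical.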
9. Dusting off the mathematical cobwebs…
∀ = for all
∈ = is an element of
ℝ = real numbers
⟨x, z⟩ = dot product of vectors (inner product), way of multiplying
two vectors together to obtain a scalar (I was initially massively confused by
use of the cross here…)
‖x‖ = non-negative norm of a vector (a scalar value)
‖x‖₂ = Euclidean norm (see later)
‖x‖₁ = 1-norm (see later)
10. Kernel functions
Section outlines various general conditions for an inner product of vectors in a
given vector space
A kernel is a function that corresponds to an inner product in a feature space
A function is only a kernel if there exists a map between the vector space and feature
space (but do not need to know form, existence is sufficient)
A vector space with an inner product = an ‘inner product space’
Kernel functions allow replacing computations in high-dimensional feature space
by computations in input space
11. Specific kernels: linear
Simple, linear kernel
Has identical input and feature space
Equivalent of using original linear algorithm (use as initial test for new system?)
Gives a linear regression model
(figure source: www.matlabsolutions.com/blog/Tensorflow-Linear-regression-understanding-the-concept.php)
Equation annotations: regression co-effs, training inputs, inputs to predict
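A hedged sketch of what a linear-kernel prediction looks like, assuming NumPy and the usual kernel expansion f(z) = Σᵢ αᵢ ⟨xᵢ, z⟩ (regression co-effs αᵢ, training inputs xᵢ, input to predict z). The numbers, including the coefficients, are made up for illustration rather than fitted. The point is that with identical input and feature space the kernel model collapses to an ordinary linear model with weights w = Σᵢ αᵢ xᵢ.

```python
import numpy as np


def linear_kernel(A, B):
    """Matrix of inner products between the rows of A and the rows of B."""
    return A @ B.T


# Hypothetical training inputs and regression coefficients (not fitted here).
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
alpha = np.array([0.5, -1.0, 2.0])

# Inputs to predict.
X_new = np.array([[2.0, 3.0], [-1.0, 4.0]])

# Kernel form: f(z) = sum_i alpha_i * <x_i, z>
f_kernel = alpha @ linear_kernel(X_train, X_new)

# Equivalent explicit linear model: f(z) = <w, z> with w = sum_i alpha_i * x_i
w = X_train.T @ alpha
f_linear = X_new @ w

print(f_kernel, f_linear)
```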
12. Specific kernels: Gaussian
(or squared exponential kernel or radial basis function kernel)
Non-linear kernel
(non-linear: change of output is not proportional to change of input)
Maps into an infinite-dimensional feature space
𝝈>0 hyperparameter determining the length scale on which the kernel
operates
Something to tune for optimal model performance
Limiting cases 𝜎 → 0 and 𝜎 → ∞ relate to overfitting and underfitting respectively
For intermediate values of 𝜎, kernel value depends on ‖x − z‖/𝜎
Kernel approaches 1 as ‖x − z‖/𝜎 → 0
Kernel approaches 0 as ‖x − z‖/𝜎 → ∞
Samples close in input space are correlated in feature space
Samples faraway however are mapped to orthogonal subspaces
Gaussian kernel is a local approximator where the scale depends on 𝝈
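A small sketch of the Gaussian kernel, assuming NumPy and the common convention k(x, z) = exp(−‖x − z‖₂² / (2𝜎²)). The exact normalization used on the slide did not survive, so the factor of 2 is an assumption. It shows the length-scale behaviour: for the same pair of points the kernel value goes from essentially 0 to essentially 1 as 𝜎 grows.

```python
import numpy as np


def gaussian_kernel(x, z, sigma):
    """Gaussian (RBF) kernel, assuming the exp(-||x - z||^2 / (2 sigma^2)) convention."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])  # squared Euclidean distance to x is 2

for sigma in (0.1, 1.0, 100.0):
    print(sigma, gaussian_kernel(x, z, sigma))
```

With a tiny 𝜎 every sample looks unrelated to every other one, and with a huge 𝜎 every sample looks the same, which is why both extremes give poor models.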
14. Specific kernels: Laplacian
Similar to Gaussian
Uses exp, but with the 1-norm instead of the
Euclidean norm (see next)
Demonstrated to perform better for
molecular properties in refs 45 and 59-61
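For comparison, a sketch of the Laplacian kernel, assuming NumPy and the usual form k(x, z) = exp(−‖x − z‖₁ / 𝜎). The formula is the standard textbook one rather than a copy of the slide, and the test points are made up.

```python
import numpy as np


def laplacian_kernel(x, z, sigma):
    """Laplacian kernel: exponential of the negative 1-norm distance over sigma."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.sum(np.abs(diff)) / sigma))


x = np.array([0.0, 0.0])
z = np.array([3.0, 4.0])  # 1-norm distance to x is 3 + 4 = 7

print(laplacian_kernel(x, z, sigma=7.0))  # exp(-1)
```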
15. Aside: 1-norm vs. Euclidean norm
(source wikipedia)
The one we all know and love:
Summat to do with taxi drivers and the American road system:
(kind of like a constrained norm?)
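For reference, the two norms written out in the standard textbook form. The vector x with d components is generic notation assumed here, not copied from the slide images.

```latex
\|x\|_2 = \sqrt{\sum_{i=1}^{d} x_i^2}
\qquad\text{(Euclidean norm)}
\qquad\qquad
\|x\|_1 = \sum_{i=1}^{d} |x_i|
\qquad\text{(1-norm, ``taxicab'' norm)}
% Worked example: x = (3, -4) gives \|x\|_2 = 5 and \|x\|_1 = 7.
```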
16.
17. Regression methods
Multiple linear regression
Find co-effs to minimize generalization error (av. error on new inputs)
However, in practice, due to finite size of training set, can only minimize empirical
error → care must be taken to avoid over- or underfitting
Ridge regression
Added regularization to avoid overfitting
Increases bias but reduces variance
(c.f. bias-variance tradeoff, e.g. https://ml.berkeley.edu/blog/2017/07/13/tutorial-4/)
Adds a penalty term where strength of regularization is determined by
hyperparameter, 𝜆 (larger values give simpler and smoother models)
Kernel ridge regression
Applying ‘kernel trick’ to linear ridge regression → nonlinear version
Linear ridge regression model = (linear model in d dimensions, each weighted by regression co-eff)
+ (term to allow modelling functions that do not pass through origin)
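A minimal kernel ridge regression sketch, assuming NumPy, a Gaussian kernel with the 2𝜎² convention, and the textbook closed-form solution 𝛼 = (K + 𝜆I)⁻¹ y. The function names and the toy data are hypothetical, and the offset term is left out for brevity (which amounts to assuming the labels need no separate bias). This is an illustration of the idea, not the tutorial's notebook.

```python
import numpy as np


def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression coefficients alpha."""
    K = gaussian_gram(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Prediction f(z) = sum_i alpha_i * k(x_i, z) for each row z of X_new."""
    return gaussian_gram(X_new, X_train, sigma) @ alpha


# Hypothetical toy data.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])

alpha_weak = krr_fit(X, y, sigma=1.0, lam=1e-10)  # almost no regularization
alpha_strong = krr_fit(X, y, sigma=1.0, lam=10.0)  # heavy regularization

print(krr_predict(X, alpha_weak, X, sigma=1.0))    # close to y
print(krr_predict(X, alpha_strong, X, sigma=1.0))  # shrunk towards zero
```

Increasing 𝜆 shrinks the coefficients, trading a worse fit on the training points for a smoother model.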
18. Implementation
Importance of model selection
How to choose between different ML models
Choice of kernel, k
How to choose hyperparameters?
e.g. 𝜆 for regularisation
and 𝜎 if using Gaussian or Laplacian kernels
Regression coefficients 𝛼 and 𝛽 for set hyperparameters determined by kernel?
Therefore choice is dependent upon quality of training set? (methods such as bootstrapping and
cross-validation allow for re-use of data if set is small)
Occam’s razor as general guiding principle: use simplest model that fits data
Estimating model performance
‘Risk of model’, f
R has to be estimated from a finite set of training data as the empirical risk
Again, use regularization to avoid over-fitting to training set
loss function measuring the error of a prediction
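The risk and its empirical estimate are usually written as below. The symbols ℓ for the loss, n for the number of training samples and R_emp for the empirical risk are generic textbook notation assumed here, since the slide's own equation did not survive.

```latex
R(f) = \mathbb{E}\big[\ell\big(f(x), y\big)\big]
\qquad\qquad
R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)
% A common choice is the squared-error loss: \ell(f(x), y) = (f(x) - y)^2
```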
19. Kernel ridge regression fits to 5 data points to
represent cosine function with different values of
hyperparameter, 𝜎
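A rough re-creation of this kind of experiment, assuming NumPy, a Gaussian kernel with the 2𝜎² convention and no offset term. The 5 sample positions, the three 𝜎 values and 𝜆 = 1e-3 are choices made for this sketch and are not read off the slide's plot. It fits the same 5 cosine samples with a very small, a moderate and a very large length scale and reports the error against the true cosine on a dense grid.

```python
import numpy as np


def gram_1d(a, b, sigma):
    """Gaussian kernel matrix for 1-D inputs."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))


# 5 training points on one period of the cosine (hypothetical positions).
x_train = np.linspace(0.0, 2.0 * np.pi, 5)
y_train = np.cos(x_train)
x_grid = np.linspace(0.0, 2.0 * np.pi, 200)
lam = 1e-3  # hypothetical regularization strength


def fit_predict(sigma):
    """Kernel ridge regression fit on the 5 points, evaluated on the grid."""
    K = gram_1d(x_train, x_train, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(x_train.size), y_train)
    return gram_1d(x_grid, x_train, sigma) @ alpha


# sigma -> root-mean-square error against the true cosine
errors = {}
for sigma in (0.05, 1.5, 1000.0):
    errors[sigma] = float(np.sqrt(np.mean((fit_predict(sigma) - np.cos(x_grid)) ** 2)))
    print(sigma, errors[sigma])
```

In this setup the small 𝜎 gives narrow spikes at the training points and the huge 𝜎 gives a nearly constant curve, while the intermediate value follows the cosine closely.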
20. E.g. for predicting atomization energies
Use 1k reference DFT calculations for atomization energy of organic
molecules to estimate for remaining molecules in full set of 7k molecules
(dataset and notebook for e.g. included in SI)
21. Considerations for…
Preparation of training dataset
How large and homogeneous?
In this e.g., inhomogeneous wrt no. of non-H atoms (so had to include all with
four or fewer) → requires insights for relevant inhomogeneities?
Split into training set and ‘hold out’ set → requires insights for size of sets?
Can use methods such as cross-validation to reuse data if dataset is fairly small
Representation of data
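A sketch of the two data-splitting ideas mentioned above, assuming NumPy. The helper names are hypothetical, and the split is purely random, so it does not reproduce the special handling of small molecules described in the example. The sizes 7000 and 1000 are the ones quoted for the atomization-energy dataset.

```python
import numpy as np


def holdout_split(n_samples, n_train, seed=0):
    """Random split of sample indices into a training set and a hold-out set."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    return perm[:n_train], perm[n_train:]


def kfold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


# 1k training molecules out of 7k, the rest held out.
train_idx, holdout_idx = holdout_split(7000, 1000)
print(len(train_idx), len(holdout_idx))

# 5-fold cross-validation within the training set reuses every sample for validation once.
for fold_train, fold_val in kfold_indices(len(train_idx), 5):
    print(len(fold_train), len(fold_val))
```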
22. Considerations for…
The model
Choose kernel
Choose hyperparameters 𝜆 and 𝜎; then, for the chosen params:
Compute kernel matrices
Algorithm computes regression coefficients
Compute prediction performance statistics (using ‘hold out dataset’)
Perform grid search to determine values of 𝜆 and 𝜎 for best
performance
… Although is this not also influenced by regression coefficients which are
determined by initial choice of hyperparameters?
Try different kernel (and repeat above steps)
Compare performance with different kernels
23. Grid search to determine values of 𝜆 and 𝜎
for best performance
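A compact grid-search sketch, assuming NumPy, a Gaussian-kernel ridge regression without offset term, and a made-up 1-D dataset (noisy cosine samples). The grid ranges, the data and the helper names are all hypothetical. For every (𝜎, 𝜆) pair the regression coefficients are solved afresh on the training set and the model is scored on the hold-out set, and the pair with the lowest hold-out error is kept.

```python
import numpy as np


def gram(a, b, sigma):
    """Gaussian kernel matrix for 1-D inputs."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * sigma ** 2))


def holdout_rmse(x_tr, y_tr, x_ho, y_ho, sigma, lam):
    """Fit kernel ridge regression on the training set, score on the hold-out set."""
    alpha = np.linalg.solve(gram(x_tr, x_tr, sigma) + lam * np.eye(x_tr.size), y_tr)
    pred = gram(x_ho, x_tr, sigma) @ alpha
    return float(np.sqrt(np.mean((pred - y_ho) ** 2)))


# Hypothetical dataset: 60 noisy cosine samples, 40 for training, 20 held out.
rng = np.random.default_rng(0)
x_all = rng.uniform(0.0, 2.0 * np.pi, 60)
y_all = np.cos(x_all) + 0.05 * rng.standard_normal(60)
x_tr, y_tr = x_all[:40], y_all[:40]
x_ho, y_ho = x_all[40:], y_all[40:]

# Logarithmic grids for the two hyperparameters.
sigmas = 2.0 ** np.arange(-4, 5)
lams = 10.0 ** np.arange(-6, 1)

scores = np.array(
    [[holdout_rmse(x_tr, y_tr, x_ho, y_ho, s, l) for l in lams] for s in sigmas]
)
i, j = np.unravel_index(np.argmin(scores), scores.shape)
best_sigma, best_lam = float(sigmas[i]), float(lams[j])
print(best_sigma, best_lam, scores[i, j])
```

Because the coefficients are recomputed at every grid point, they are an outcome of each hyperparameter choice rather than an extra thing to tune.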
25. Key themes/ central ideas to method
Exploit non-randomness in data to avoid having to perform additional QM calculations
Minimising interpolation error when using some QM calculations to predict results of
others
Kernel-based ML methods systematically derive nonlinear versions of linear ML
algorithms
Avoid costly evaluations in a high-dimensional feature space through use of inner
products
Avoiding underfitting or overfitting to training data (since we want a general model but
due to finite training data set, this will always be an empirical fit)
Principle of Occam’s razor
Choice of kernel
Choice of kernel hyperparameters to minimise underfitting or overfitting
Regularization to penalize for overfitting
Build, optimize, tweak, repeat, optimize, tweak, repeat!
27. …verdict!
How useful as a starting point?
A little hard to follow at the start of the more technical bits/ gauge what the
point of all the definitions was + some notation hard to follow, especially use of
cross when discussing dot products
Had to google a fair bit!
Later sections kind of easier to follow + nice example at the end
Possibly easier to follow by not reading in order? Or re-reading the start after!
…but nice plots to explain use of different kernels and influence of
hyperparameters on fits!