Automated machine learning lectures given at the Advanced Course on Data Science & Machine Learning. Topics: AutoML, hyperparameter optimization, Bayesian optimization, neural architecture search, meta-learning, and MAML.
The key challenge in making AI technology more accessible to the broader community is the scarcity of AI experts. Most businesses simply don’t have the much-needed resources or skills for modeling and engineering. This is why automated machine learning and deep learning technologies (AutoML and AutoDL) are increasingly valued by academia and industry. The core of AI is model design. Automated machine learning technology reduces the barriers to AI application, enabling developers with no AI expertise to independently and easily develop and deploy AI models. Automated machine learning is expected to completely overturn the AI industry in the next few years, making AI ubiquitous.
Using AI to build AI is a promising way to bring the power of AI to those who cannot afford it the way multinational corporations can. The technology is also known as Automated Machine Learning (AutoML). OneClick.ai is the first deep learning AutoML platform that makes the latest AI technology accessible to anyone, with or without an AI background. The deck gives a 30-minute overview of the recent history of AutoML and how OneClick.ai innovates on it. Check out our platform at http://www.oneclick.ai
- What is Meta-Learning?
- What are the pros and cons of Meta-Learning?
- Different Methods for Meta-Learning in AutoML.
Copyright reserved by Mohamed Maher, University of Tartu.
"Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods that make the dataset amenable for machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. Automating the end-to-end process of applying machine learning offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform models that were designed by hand."
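The end-to-end loop described above (preprocess, select an algorithm, tune its hyperparameters, keep the best model by validation score) can be sketched in a few lines. This is a toy illustration on synthetic data, not any particular AutoML library; the model space (a constant baseline versus 1-D ridge regression) is chosen purely for brevity.

```python
import random

# Toy sketch of the AutoML loop: jointly search over algorithm choice
# and a hyperparameter, scoring each candidate on a validation split.
random.seed(0)

# Synthetic 1-D regression data: y ~ 3x + noise
data = [(float(x), 3.0 * x + random.gauss(0, 0.1)) for x in range(20)]
train, valid = data[:15], data[15:]

def fit_ridge(rows, lam):
    # Closed-form 1-D ridge regression (no intercept): w = Sxy / (Sxx + lam)
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)

def val_mse(predict):
    return sum((y - predict(x)) ** 2 for x, y in valid) / len(valid)

candidates = []

# Algorithm 1: constant baseline (predict the training mean).
mean_y = sum(y for _, y in train) / len(train)
candidates.append(("mean", None, val_mse(lambda x: mean_y)))

# Algorithm 2: ridge regression with randomly sampled regularization.
for _ in range(10):
    lam = 10 ** random.uniform(-3, 1)
    w = fit_ridge(train, lam)
    candidates.append(("ridge", lam, val_mse(lambda x, w=w: w * x)))

# "Algorithm selection and hyperparameter optimization" in one line:
algo, lam, err = min(candidates, key=lambda c: c[2])
print(algo)
```

On this data the tuned ridge model easily beats the constant baseline; real AutoML systems run the same loop over far larger pipeline and hyperparameter spaces.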
In this talk we will discuss how QuSandbox and the Model Analytics Studio can be used in the selection of machine learning models. We will also illustrate AutoML frameworks through demos and examples and show you how to get started.
The Power of Auto ML and How It Works (Ivo Andreev)
Automated ML is an approach to minimizing the need for data science effort by enabling domain experts to build ML models without deep knowledge of algorithms, mathematics, or programming. The mechanism works by letting end users simply provide data; the system does the rest, automatically determining an approach to the particular ML task. At first this may sound discouraging to those aiming at the “sexiest job of the 21st century” - the data scientists. However, Auto ML should be considered a democratization of ML rather than automatic data science.
In this session we will talk about how Auto ML works, how it is implemented by Microsoft, and how it could improve the productivity of even professional data scientists.
And then there were ... Large Language Models (Leon Dohmen)
Even in the ICT world, it is not often that one witnesses a revolution. The rise of the Personal Computer, the rise of mobile telephony and, of course, the rise of the Internet are some of those revolutions. So what is ChatGPT really? Is ChatGPT also such a revolution? And like any revolution, does ChatGPT have its winners and losers? And who are they? How do we ensure that ChatGPT contributes to a positive impulse for "Smart Humanity"?
During keynotes on April 3 and 13, 2023, Piek Vossen explained the impact of Large Language Models like ChatGPT.
Prof. Dr. Piek Th.J.M. Vossen is Full Professor of Computational Lexicology at the Faculty of Humanities, Department of Language, Literature and Communication (LCC), VU Amsterdam:
What is ChatGPT? What technology and thought processes underlie it? What are its consequences? What choices are being made? In the presentation, Piek elaborates on the basic principles behind Large Language Models and how they are used as a basis for deep learning, in which they are fine-tuned for specific tasks. He also discusses a specific variant, GPT, that underlies ChatGPT. The talk covers what ChatGPT can and cannot do, what it is good for, and what the risks are.
As the complexity of choosing optimised, task-specific processing steps and ML models is often beyond non-experts, the rapid growth of machine learning applications has created demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the resulting research area, which targets progressive automation of machine learning, AutoML.
Although it focuses on end users without expert knowledge, AutoML also offers new tools to machine learning experts, for example to:
1. Perform architecture search over deep representations
2. Analyse the importance of hyperparameters.
Scalable Automatic Machine Learning in H2O (Sri Ambati)
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. Although H2O and other tools have made it easier for practitioners to train and deploy machine learning models at scale, there is still a fair bit of knowledge and background in data science that is required to produce high-performing machine learning models. Deep Neural Networks, in particular, are notoriously difficult for a non-expert to tune properly.
In this presentation, Erin LeDell (Chief Machine Learning Scientist, H2O.ai), provides an overview of the field of "Automatic Machine Learning" and introduces the new AutoML functionality in H2O. Erin also provides simple code examples to get you started using AutoML.
H2O's AutoML provides an easy-to-use interface which automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top performing model in the AutoML Leaderboard.
H2O AutoML (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html) is available in all the H2O interfaces including the h2o R package, Python module, Scala/Java library, and the Flow web GUI.
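As a rough illustration of the stacked-ensemble idea behind the top leaderboard entry (not H2O's actual implementation), here is a pure-Python sketch: two base learners with different regularization strengths are combined by a meta-learner whose blend weight is fitted on held-out predictions.

```python
import random

# Sketch of stacking: fit base models on training data, then fit a
# meta-learner (here a single blend weight) on a holdout split.
random.seed(1)
data = [(x / 10.0, 2.0 * (x / 10.0) + random.gauss(0, 0.05)) for x in range(30)]
train, hold = data[:20], data[20:]

def fit_ridge(rows, lam):
    # 1-D ridge regression without intercept: w = Sxy / (Sxx + lam)
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)

# Two base learners with different regularization strengths.
w_a = fit_ridge(train, 0.01)
w_b = fit_ridge(train, 5.0)

def blend_err(alpha):
    # Holdout MSE of the convex blend of the two base predictions.
    return sum((y - (alpha * w_a * x + (1 - alpha) * w_b * x)) ** 2
               for x, y in hold) / len(hold)

# Meta-learner: pick the blend weight that minimizes holdout error.
alpha = min((a / 100 for a in range(101)), key=blend_err)
ensemble_mse = blend_err(alpha)
print(round(ensemble_mse, 6))
```

Because the endpoints alpha = 0 and alpha = 1 are in the search grid, the blend can never do worse on the holdout than either base model alone; that is the intuition for why stacked ensembles so often top AutoML leaderboards.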
Speaker Bio:
Erin LeDell is the Chief Machine Learning Scientist at H2O.ai, the company that produces the open source machine learning platform, H2O. Erin received her Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from UC Berkeley. Before joining H2O.ai, she was the Principal Data Scientist at Wise.io (acquired by GE in 2016) and Marvin Mobile Security (acquired by Veracode in 2012) and the founder of DataScientific, Inc.
An Introduction to XAI! Towards Trusting Your ML Models! (Mansour Saffar)
Machine learning (ML) is currently disrupting almost every industry and is being used as the core component in many systems. The decisions made by these systems may have a great impact on society and on specific individuals, so the decision-making process has to be clear and explainable for humans to trust it. Explainable AI (XAI) is a rather new field in ML in which researchers try to develop methods that can explain the decision-making process of ML models. In this talk, we'll learn about the fundamentals of XAI and discuss why we need to start integrating XAI with our ML models!
Presented in Edmonton DataScience Meetup on October 2nd, 2019. Learn more: https://youtu.be/gEkPXOsDt_w
The session is about creating, training, evaluating, and deploying machine learning models with a no-code approach using Azure AutoML.
* NO MACHINE LEARNING EXPERIENCE REQUIRED *
Agenda:
1. Introduction to Machine Learning
2. What is AutoML (Automated Machine Learning) ?
3. AutoML versus Conventional ML practices
4. Intro to Azure Automated Machine Learning
5. Hands-on demo
6. Contest
7. Learning resources
8. Conclusion
A tremendous backlog of predictive modeling problems in the industry and short supply of trained data scientists have spiked interest in automation over the last few years. A new academic field, AutoML, has emerged. However, there is a significant gap between the topics that are academically interesting and automation capabilities that are necessary to solve real-world industrial problems end-to-end. An even greater challenge is enabling a non-expert to build a robust and trustworthy AI solution for their company. In this talk, we’ll discuss what an industry-grade AutoML system consists of and the scientific and engineering challenges of building it.
Feature Engineering in Machine Learning (Knoldus Inc.)
In this Knolx we are going to explore data preprocessing and feature engineering techniques. We will also cover what feature engineering is, its importance in machine learning, and how it can help in getting the best results from the algorithms.
Presentation at the Vietnam Japan AI Community on 2019-05-26.
The presentation summarizes what I've learned about Regularization in Deep Learning.
Disclaimer: The presentation was given at a community event, so it wasn't thoroughly reviewed or revised.
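One core regularization technique from that topic, L2 regularization (weight decay), can be shown on a one-parameter model: adding lam * w^2 to the loss shrinks the fitted weight toward zero. The data and step sizes below are illustrative only.

```python
# Gradient descent on MSE + lam * w^2 for a 1-parameter linear model.
def fit(lam, steps=500, lr=0.01):
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # roughly y = 2x
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error term.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        grad += 2 * lam * w          # gradient of the L2 penalty
        w -= lr * grad
    return w

w_plain = fit(lam=0.0)
w_reg = fit(lam=1.0)
print(round(w_plain, 2), round(w_reg, 2))
# A larger lam pulls the weight below the unregularized solution (~2.0).
```

The same penalty applied to every weight of a deep network is what frameworks expose as "weight decay"; the shrinkage effect is identical, just in many dimensions.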
MLflow is an MLOps tool that enables data scientists to quickly productionize their machine learning projects. To achieve this, MLflow has four major components: Tracking, Projects, Models, and Registry. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps. MLflow is designed to work with any machine learning library and requires minimal changes to integrate into an existing codebase. In this session, we will cover the common pain points of machine learning developers, such as experiment tracking, reproducibility, deployment tooling, and model versioning. Get your hands dirty by doing a quick ML project using MLflow and releasing it to production to understand the MLOps lifecycle.
BERT: Bidirectional Encoder Representations from Transformers.
BERT is a model pretrained by Google for state-of-the-art NLP tasks.
BERT can take into account the syntactic and semantic meaning of text.
Presented at #H2OWorld 2017 in Mountain View, CA.
Enjoy the video: https://youtu.be/42Oo8TOl85I.
Learn more about H2O.ai: https://www.h2o.ai/.
Follow @h2oai: https://twitter.com/h2oai.
- - -
Abstract:
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. Although H2O has made it easier for practitioners to train and deploy machine learning models at scale, there is still a fair bit of knowledge and background in data science that is required to produce high-performing machine learning models. Deep Neural Networks, in particular, are notoriously difficult for a non-expert to tune properly. In this presentation, we provide an overview of the field of "Automatic Machine Learning" and introduce the new AutoML functionality in H2O. H2O's AutoML provides an easy-to-use interface which automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top performing model in the AutoML Leaderboard. H2O AutoML is available in all the H2O interfaces including the h2o R package, Python module and the Flow web GUI. We will also provide simple code examples to get you started using AutoML.
Erin's Bio:
Erin is a Statistician and Machine Learning Scientist at H2O.ai. She is the main author of H2O Ensemble. Before joining H2O, she was the Principal Data Scientist at Wise.io and Marvin Mobile Security (acquired by Veracode in 2012) and the founder of DataScientific, Inc. Erin received her Ph.D. in Biostatistics with a Designated Emphasis in Computational Science and Engineering from University of California, Berkeley. Her research focuses on ensemble machine learning, learning from imbalanced binary-outcome data, influence curve based variance estimation and statistical computing. She also holds a B.S. and M.A. in Mathematics.
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe... (PyData)
People talk about a Moore's Law for gene sequencing, a Moore's Law for software, etc. This talk is about *the* Moore's Law, the bull that the other "Laws" ride, and how Python-powered ML helps drive it. How do we keep making ever-smaller devices? How do we harness atomic-scale physics? Large-scale machine learning is key. The computation drives new chip designs, and those new chip designs are used for new computations, ad infinitum. High-dimensional regression, classification, active learning, optimization, ranking, clustering, density estimation, scientific visualization, massively parallel processing -- it all comes into play, and Python is powering it all.
PR-232: AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (Sunghoon Joo)
Paper link: https://arxiv.org/abs/2003.03384
Video presentation link: https://youtu.be/J__uJ79m01Q
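A toy flavor of the paper's idea, evolving a tiny "program" rather than tuning a fixed model, can be sketched in pure Python. The op set, program length, and evolution loop below are drastically simplified illustrations, not the paper's actual setup; the target is the function 2x + 1.

```python
import random

random.seed(3)
OPS = ["add1", "double", "noop"]

def run_program(program, x):
    # Interpret a program as a sequence of ops on one register.
    v = x
    for op in program:
        if op == "add1":
            v += 1.0
        elif op == "double":
            v *= 2.0
    return v

def fitness(program):
    # Negative squared error against the target function 2x + 1.
    return -sum((run_program(program, x) - (2 * x + 1)) ** 2
                for x in [0.0, 1.0, 2.0, 3.0])

def mutate(program):
    # Point mutation: replace one op with a random one.
    child = list(program)
    child[random.randrange(len(child))] = random.choice(OPS)
    return child

# Regularized-evolution-style loop: mutate the winner of a small
# tournament, add the child, age out the oldest individual.
population = [[random.choice(OPS) for _ in range(3)] for _ in range(20)]
best = max(population, key=fitness)
for _ in range(300):
    parent = max(random.sample(population, 5), key=fitness)
    child = mutate(parent)
    if fitness(child) > fitness(best):
        best = child
    population.append(child)
    population.pop(0)

print(best, fitness(best))
```

The exact program (double, then add1) is discoverable here within a few hundred evaluations; the paper's contribution is making the same search work over a vastly larger instruction space.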
This talk was presented at Startup Master Class 2017 (http://aaiitkblr.org/smc/) at Christ College, Bangalore. Hosted by the IIT Kanpur Alumni Association and co-presented by the IIT KGP Alumni Association, IITACB, PanIIT, IIMA, and IIMB alumni.
My co-presenter was Biswa Gourav Singh, and the contributor was Navin Manaswi.
A timeline for neural networks: http://dataconomy.com/2017/04/history-neural-networks/
Accelerating the Development of Efficient CP Optimizer Models (Philippe Laborie)
The IBM Constraint Programming optimization system CP Optimizer was designed to provide automatic search and a simple modeling of discrete optimization problems, with a particular focus on scheduling applications. It is used in industry for solving operational planning and scheduling problems. We will give an overview of CP Optimizer and then describe in further detail a set of features such as input/output file format, warm-start or conflict refinement that help accelerate the development of efficient models.
Funda Gunes, Senior Research Statistician Developer, & Patrick Koch, Principal... (MLconf)
Local Search Optimization for Hyper-Parameter Tuning: Many machine learning algorithms are sensitive to their hyper-parameter settings and lack good universal rule-of-thumb defaults. In this talk we discuss the use of black-box local search optimization (LSO) for machine learning hyper-parameter tuning. Viewed as a black-box objective function of hyper-parameters, machine learning algorithms create a difficult class of optimization problems. The corresponding objective functions tend to be nonsmooth, discontinuous, and unpredictably computationally expensive, and they require support for continuous, categorical, and integer variables. Further, evaluations can fail for a variety of reasons, such as early exits due to node failure or hitting a maximum time. Additionally, not all hyper-parameter combinations are compatible (creating so-called “hidden constraints”). In this context, we apply a parallel hybrid derivative-free optimization algorithm that can make progress despite these difficulties, providing significantly improved results over default settings with minimal user interaction. Further, we will address efficient parallel paradigms for different types of machine learning problems, while exploring the importance of validation to avoid overfitting and emphasizing that even for small data problems, the need to perform cross-validation can create computationally intense functions that benefit from a distributed/threaded environment.
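The ingredients above (a noisy black-box objective over mixed continuous/integer hyper-parameters, failed evaluations from hidden constraints, greedy local moves) can be sketched in a few lines. The objective and the hyper-parameter names below are entirely synthetic stand-ins, not the system described in the talk.

```python
import random

random.seed(2)

def objective(lr, depth):
    # Hidden constraint: some combinations are simply invalid.
    if depth > 8 and lr > 0.5:
        raise ValueError("incompatible hyper-parameters")
    # Toy validation loss: nonsmooth in depth, noisy everywhere.
    return (lr - 0.1) ** 2 + abs(depth - 5) * 0.05 + random.gauss(0, 0.001)

def local_search(steps=200):
    lr, depth = 0.9, 2                  # arbitrary starting point
    best = objective(lr, depth)
    for _ in range(steps):
        # Propose a local move: continuous step in lr, +/-1 in depth.
        cand_lr = min(1.0, max(0.001, lr + random.gauss(0, 0.05)))
        cand_depth = max(1, min(10, depth + random.choice([-1, 0, 1])))
        try:
            score = objective(cand_lr, cand_depth)
        except ValueError:
            continue                    # failed evaluation: skip, don't crash
        if score < best:                # greedy accept on improvement
            lr, depth, best = cand_lr, cand_depth, score
    return lr, depth, best

lr, depth, best = local_search()
print(depth, round(best, 3))
```

Even this naive greedy search tolerates evaluation failures and mixed variable types; the talk's hybrid derivative-free algorithm layers parallelism and smarter step selection on the same skeleton.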
ChatGPT
Data analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making. It involves applying various techniques and methods to extract insights from data sets, often with the goal of uncovering patterns, trends, relationships, or making predictions.
Here's an overview of the key steps and techniques involved in data analysis:
Data Collection: The first step in data analysis is gathering relevant data from various sources. This can include structured data from databases, spreadsheets, or surveys, as well as unstructured data such as text documents, social media posts, or sensor readings.
Data Cleaning and Preprocessing: Once the data is collected, it often needs to be cleaned and preprocessed to ensure its quality and suitability for analysis. This involves handling missing values, removing duplicates, addressing inconsistencies, and transforming data into a suitable format for analysis.
Exploratory Data Analysis (EDA): EDA involves examining and understanding the data through summary statistics, visualizations, and statistical techniques. It helps identify patterns, distributions, outliers, and potential relationships between variables. EDA also helps in formulating hypotheses and guiding further analysis.
Data Modeling and Statistical Analysis: In this step, various statistical techniques and models are applied to the data to gain deeper insights. This can include descriptive statistics, inferential statistics, hypothesis testing, regression analysis, time series analysis, clustering, classification, and more. The choice of techniques depends on the nature of the data and the research questions being addressed.
Data Visualization: Data visualization plays a crucial role in data analysis. It involves creating meaningful and visually appealing representations of data through charts, graphs, plots, and interactive dashboards. Visualizations help in communicating insights effectively and spotting trends or patterns that may be difficult to identify in raw data.
Interpretation and Conclusion: Once the analysis is performed, the findings need to be interpreted in the context of the problem or research objectives. Conclusions are drawn based on the results, and recommendations or insights are provided to stakeholders or decision-makers.
Reporting and Communication: The final step is to present the results and findings of the data analysis in a clear and concise manner. This can be in the form of reports, presentations, or interactive visualizations. Effective communication of the analysis results is crucial for stakeholders to understand and make informed decisions based on the insights gained.
Data analysis is widely used in various fields, including business, finance, marketing, healthcare, social sciences, and more. It plays a crucial role in extracting value from data, supporting evidence-based decision-making, and driving actionable insights.
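The cleaning and summary steps described above can be sketched in plain Python; the toy records and field names below are purely illustrative:

```python
import statistics

# Toy records with a missing value and a duplicate, standing in for collected data
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},   # missing value
    {"id": 3, "age": 41, "income": 61000},
    {"id": 3, "age": 41, "income": 61000},     # duplicate
]

# Cleaning: drop duplicates (by id), then impute missing ages with the median
seen, cleaned = set(), []
for r in records:
    if r["id"] not in seen:
        seen.add(r["id"])
        cleaned.append(dict(r))
median_age = statistics.median(r["age"] for r in cleaned if r["age"] is not None)
for r in cleaned:
    if r["age"] is None:
        r["age"] = median_age

# Exploratory summary statistics
ages = [r["age"] for r in cleaned]
summary = {"n": len(cleaned),
           "mean_age": statistics.mean(ages),
           "income_range": (min(r["income"] for r in cleaned),
                            max(r["income"] for r in cleaned))}
```

Real projects would use a dataframe library for this, but the sequence — deduplicate, impute, then summarize — is the same.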
Meta-learning, or learning how to learn, is our innate ability to learn new, ever more complex tasks very efficiently by building on prior experience. It is a very exciting direction for machine learning (and AI in general). In this tutorial, I introduce the main concepts and state of the art.
OpenML: Making machine learning research more reproducible (and easier) by bringing it online.
From the ICML 2017 Reproducibility in Machine Learning Workshop
Building machine learning systems remains something of an art, from gathering and transforming the right data to selecting and finetuning the most fitting modeling techniques. If we want to make machine learning more accessible and foster skilful use, we need novel ways to share and reuse findings, and streamline online collaboration. OpenML is an open science platform for machine learning, allowing anyone to easily share data sets, code, and experiments, and collaborate with people all over the world to build better models. It shows, for any known data set, which are the best models, who built them, and how to reproduce and reuse them in different ways. It is readily integrated into several machine learning environments, so that you can share results with the touch of a button or a line of code. As such, it enables large-scale, real-time collaboration, allowing anyone to explore, build on, and contribute to the combined knowledge of the field. Ultimately, this provides a wealth of information for a novel, data-driven approach to machine learning, where we learn from millions of previous experiments to either assist people while analyzing data (e.g., which modeling techniques will likely work well and why), or automate the process altogether.
Presentation on the OpenML initiative to enable open, collaborative machine learning during the data@Sheffield event. We discuss how data, machine learning algorithms and experiments can be analysed collaboratively by data scientists and domain scientists, as well as citizen scientists.
Tutorial given at the European Conference for Machine Learning (ECMLPKDD 2015). It covers OpenML, how to use it in your research, interfaces in Java, R, Python, use through machine learning tools such as WEKA and MOA. Also covers topics in open science and reproducible research.
OpenML Tutorial: Networked Science in Machine Learning (Joaquin Vanschoren)
Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and mine information that is too detailed to be printed in journals. In this presentation, we introduce OpenML, a place for machine learning researchers to share and organize data in fine detail, so that they can collaborate more effectively with others to tackle harder problems. We discuss what benefits it brings for machine learning research, individual scientists, as well as students and practitioners. We show practical use cases and APIs for interacting with the system from machine learning software.
Tutorial on data science, what's it like to be a data scientist, big data, the data scientific method, probabilistic algorithms, map-reduce, sensor data analysis, visualization of twitter and foursquare feeds, open source tools (R, Python, NoSQL)
1. Joaquin Vanschoren
Eindhoven University of Technology
j.vanschoren@tue.nl @joavanschoren http://bit.ly/openml
Advanced Course on Data Science & Machine Learning
Siena, 2019
Automatic Machine Learning & Meta-Learning
slides!
part 1
1
2. Slides / Book
Open access book
PDF (free): www.automl.org/book
www.amazon.de/dp/3030053172
More slides:
www.automl.org/events ->
AutoML Tutorial NeurIPS 2018
Video:
www.youtube.com/watch?v=0eBR8a4MQ30
http://bit.ly/openml
2
3. Learning takes experimentation
!3
It can take a lot of trial and error
Task
Models
performance
Learning
ModelsModelsModelsModels
Learning
episodes
4. Learning to learn takes experimentation
!4
Building machine learning systems requires a lot of expertise and trials
Neural networks
- design architecture (operators/layers)
- many sensitive hyperparameters
- optimization algorithm, learning rate,…
- regularization, dropout,…
- batch size, epochs, early stopping, …
- data augmentation, …
Task
Models
performance
Learning
ModelsModelsModelsModels
7. Learning to learn takes experimentation
!7
Building machine learning systems requires a lot of expertise and trials
Task
Models
performance
Learning
ModelsModelsModelsModels
Machine learning pipelines
- design pipeline architecture (operators)
- clean, preprocess data (!)
- select and/or engineer features
- select and tune models
- adjust to evolving input data (concept drift),…
8. !8
Task Models
performance
Human expert ModelsModels
chooses architecture
and hyperparameters
Task Models
performance
Meta-level learning and
optimization ModelsModels
chooses architecture
and hyperparameters
Replace manual model building by automation
Automated machine learning
Current practice
ModelsModelsModels
10. !10
Task Models
inner loop
Meta-level learning and
optimization ModelsModels
chooses architecture
and hyperparameters
Human-in-the-loop AutoML (semi-AutoML)
Domain knowledge and human expertise are very valuable
e.g. unknown unknowns, preference learning,…
ModelsModelsModels
Human expert
outer loop
control
adapt Models +
explanations
Sutton 2019
ask
11. !11
Human-in-the-loop AutoML (semi-AutoML)
Explain model in human language: automatic statistician
Steinruecken 2019
search over grammar
of models (kernels)
data
input
model component to
English translation
report
generation
12. !12
Task ModelsOptimization ModelsModels
optimize architecture
and hyperparameters
Overview
AutoML introduction, promising optimization techniques
Meta-learning
part 1
part 2
part 3 Neural architecture search, meta-learning on neural nets
Tasks ModelsMeta-learning ModelsModels
optimize architecture
and hyperparameters
TasksTasks
Tasks ModelsNAS, meta-learning ModelsModels
optimize neural architecture,
model/hyperparameters
TasksTasks
14. !14
AutoML Definition
Let:
{A(1), A(2), …, A(n)} be a set of algorithms (operators)
Λ(i) = λ1 × λ2 × … × λm be the hyperparameter space for A(i)
𝒜 be the space of possible architectures of one or more algorithms
Λ𝒜 = Λ(1) × Λ(2) × … × Λ(m) be the combined configuration space
λ ∈ Λ𝒜 be a specific configuration (of architecture and hyperparameters)
ℒ(λ, Dtrain, Dvalid) be the loss of the model created by λ, trained on data Dtrain and validated on data Dvalid
Find the configuration λ* that minimizes the expected loss over datasets drawn from 𝒟:
λ* = argmin{λ ∈ Λ𝒜} 𝔼(Dtrain,Dvalid)∼𝒟 [ℒ(λ, Dtrain, Dvalid)]
15. !15
Types of hyperparameters
• Continuous (e.g. learning rate, SVM_C,…)
• Integer (e.g. number of hidden units, number of boosting iterations,…)
• Categorical
• e.g. choice of algorithm (SVM, RandomForest, Neural Net,…)
• e.g. choice of operator (Convolution, MaxPooling, DropOut,…)
• e.g. activation function (ReLU, Leaky ReLU, tanh,…)
• Conditional
• e.g. SVM kernel if SVM is selected, kernel width if RBF kernel is selected
• e.g. Convolution kernel size if Convolution layer is selected
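These four hyperparameter types can be illustrated with a toy sampler for a conditional search space; all names and ranges below are illustrative, not tied to any particular AutoML library:

```python
import random

def sample_config(rng=random):
    """Sample one configuration from a toy conditional search space."""
    config = {"algorithm": rng.choice(["SVM", "RandomForest", "NeuralNet"])}
    if config["algorithm"] == "SVM":
        config["C"] = 10 ** rng.uniform(-3, 3)            # continuous, log scale
        config["kernel"] = rng.choice(["linear", "rbf"])  # categorical
        if config["kernel"] == "rbf":
            config["gamma"] = 10 ** rng.uniform(-4, 1)    # conditional on kernel
    elif config["algorithm"] == "RandomForest":
        config["n_estimators"] = rng.randint(10, 500)     # integer
    else:
        config["learning_rate"] = 10 ** rng.uniform(-5, 0)
        config["activation"] = rng.choice(["relu", "tanh"])
    return config

configs = [sample_config() for _ in range(5)]
```

Note how `gamma` only exists when an RBF-kernel SVM is selected — exactly the conditional structure that makes AutoML search spaces tree-shaped rather than flat.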
16. • We can identify two subproblems:
• Architecture search: search space of all possible architectures
• Pipelines: Fixed predefined pipeline, grammars, genetic
programming, planning, Monte-Carlo Tree Search
• Neural architectures: See part 3
• Hyperparameter optimization: optimize remaining hyperparameters
• Optimization: grid/random search, Bayesian optimization, evolution,
multi-armed bandits, gradient descent (only NNs)
• Meta-learning: See part 2
• Can be solved consecutively, simultaneously or interleaved
• Compositionality: breaking down the learning process into smaller reusable
tasks makes it easier, more transferable, more robust
Architecture vs hyperparameters
!16
17. • Parameterize best-practice (linear) pipelines
• Introduce conditional hyperparameters
• Combined Algorithm Selection and Hyperparameter optimization (CASH):
Pipeline search: fixed pipelines
λr ∈ {A(1), …, A(k), ∅}
Λ𝒜 = Λ(1) × Λ(2) × … × Λ(m) × λr
Could such a pipeline be learned?
Feurer et al. 2015, Thornton et al. 2013
!17
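The CASH idea can be sketched with plain random search over the joint space: a root-level hyperparameter picks the algorithm, which activates that algorithm's own subspace. The "validation score" below is a synthetic stand-in for actually training and validating a model:

```python
import random

def sample_config():
    """Root hyperparameter lambda_r selects the algorithm, which activates
    that algorithm's own (conditional) hyperparameter subspace."""
    if random.random() < 0.5:
        return {"algorithm": "SVM", "logC": random.uniform(-2, 2)}
    return {"algorithm": "RF", "n_estimators": random.randint(10, 500)}

def validation_score(config):
    """Synthetic response surface standing in for train + validate."""
    if config["algorithm"] == "SVM":
        return 0.90 - 0.10 * abs(config["logC"] - 1.0)
    return 0.85 - 0.001 * abs(config["n_estimators"] - 200)

random.seed(0)
best_config, best_score = None, -1.0
for _ in range(50):
    config = sample_config()
    score = validation_score(config)
    if score > best_score:
        best_config, best_score = config, score
```

Systems like Auto-WEKA and auto-sklearn replace this random sampler with Bayesian optimization over the same conditional space.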
18. Pipeline search: genetic programming
• Tree-based pipeline optimization
• Start with random pipelines,
best of every generation will
cross-over or mutate
• GP primitives include data copy
and feature joins: trees
• Multi-objective optimization:
accurate but short
• Easy to parallelize
• Asynchronous evolution (GAMA)
Olson, Moore 2016, 2019
copy
change
insert
shrink
Gijsbers, Vanschoren 2019
(TPOT)
!18
20. Pipeline search: genetic programming
• Grammar-based genetic programming (RECIPE) [de Sa et al. 2017]
• Encodes background knowledge, avoids invalid pipelines
• Cross-over and mutation respect the grammar
production rule non-terminaloptional
terminal
!20
21. Pipeline search: Monte Carlo Tree Search
• Use MCTS to search for optimal pipelines
• Optimize the structure and hyperparameters simultaneously by
building a surrogate model to predict configuration performance
• Bayesian surrogate model: MOSAIC [Rakotoarison et al. 2019]
• Neural network: AlphaD3M [Drori et al. 2018]
[Chaslot et al., 2008]!21
22. Pipeline search: planning
• Hierarchical planners: ML-Plan [Mohr et al. 2018], P4ML [Gil et al. 2018]
• Graph search problem, can be solved with best first search
• Use random path completion (as in MCTS) to evaluate each node
• Hyperparameter optimization interleaved as possible actions
score
!22 tune
23. Hyperparameter optimization (HPO)
!23
Black box optimization: expensive for machine learning.
Sample efficiency is important!
λ* = argmin{λ ∈ Λ𝒜} 𝔼(Dtrain,Dvalid)∼𝒟 [ℒ(λ, Dtrain, Dvalid)]
24. HPO: grid and random search
!24
Bergstra & Bengio, JMLR 2012
• Random search handles unimportant dimensions better
• Easily parallelizable, but uninformed (no learning)
25. • Start with a few (random) hyperparameter configurations
• Build a surrogate model to predict how well other configurations will
work: mean and standard deviation (blue band)
• Any probabilistic regression model: e.g. Gaussian processes
• To avoid a greedy search, use an acquisition function to trade off
exploration and exploitation, e.g. Expected Improvement (EI)
• Sample for the best configuration under that function
μ σ
HPO: Bayesian Optimization
Mockus [1974]
μ
μ + σ
μ −σ
26. • Repeat
• Stopping criterion:
• Fixed budget (time,
evaluations)
• Min. distance
between configs
• Threshold for
acquisition function
• Still overfits easily
• Convergence results
• Good for non-convex,
noisy objectives
• Used in AlphaGo
!
HPO: Bayesian Optimization
Srinivas et al. 2010, Freitas et al.
2012, Kawaguchi et al. 2016
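The loop above can be sketched in a few lines, assuming scikit-learn's GaussianProcessRegressor as the probabilistic surrogate and a toy one-dimensional "hyperparameter"; the black-box function is a synthetic stand-in, and Expected Improvement is written in its maximization form:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):                                   # expensive black box (toy stand-in)
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X = rng.uniform(0, 4, size=(3, 1))          # a few random initial configurations
y = f(X).ravel()
grid = np.linspace(0, 4, 200).reshape(-1, 1)

for _ in range(10):
    # Surrogate model: predictive mean and standard deviation everywhere
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    # Expected Improvement trades off exploitation (high mu) and exploration (high sigma)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    # Evaluate the configuration that maximizes the acquisition function
    x_next = grid[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())
```

With only 13 evaluations of the black box, the loop reliably homes in on the good region — the sample efficiency that makes this family of methods attractive for HPO.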
31. Jeroen van Hoof
(under review)
Gradient Boosting Surrogate
Estimate uncertainty with
quantile regression
!31
32. !32
HPO: Tree of Parzen Estimators
1. Test some hyperparameters
2. Separate into good and bad hyperparameters (with some quantile)
3. Fit non-parametric KDEs for p(λ = good) and p(λ = bad)
4. For a few samples, evaluate p(λ = good) / p(λ = bad)
Shown to be equivalent to EI!
Efficient, parallelizable, robust, but less sample efficient than GPs
Used in HyperOpt-sklearn [Komer et al. 2019]
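The four TPE steps can be sketched with SciPy's Gaussian KDE on a toy one-dimensional objective; the quantile and ranges below are illustrative:

```python
import numpy as np
from scipy.stats import gaussian_kde

def loss(lam):                       # toy objective: best near lam = 2
    return (lam - 2.0) ** 2

rng = np.random.default_rng(1)
lam = rng.uniform(-5, 5, 100)        # 1. test some hyperparameters
y = loss(lam)

gamma = 0.25                         # 2. split at a quantile of the losses
cut = np.quantile(y, gamma)
good = gaussian_kde(lam[y <= cut])   # 3. KDE for p(lambda | good) ...
bad = gaussian_kde(lam[y > cut])     #    ... and p(lambda | bad)

cands = rng.uniform(-5, 5, 50)       # 4. evaluate the density ratio on samples
score = good(cands) / np.maximum(bad(cands), 1e-12)
lam_next = cands[np.argmax(score)]   # next configuration to evaluate
```

Because only the ratio of two densities is needed, no explicit model of the response surface is fit — which is what makes TPE cheap and easy to parallelize.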
33. HPO: population-based methods
!33
• Less sample efficient, but easy to parallelize, and adapts to changes
• Genetic programming
• Mutations: add, mutate/tune, remove HP
• Particle swarm optimization
• Covariance matrix adaptation evolution (CMA-ES)
• Purely continuous, expensive
• Very competitive to optimize deep neural nets
Olson, Moore 2016, 2019
Mantovani et al 2015
mutate
Hansen 2015
[Loshchilov, Hutter 2016]
35. !35
Multi-fidelity approaches
General techniques for cheap approximations of black-box
λ* = argmin{λ ∈ Λ𝒜} 𝔼(Dtrain,Dvalid)∼𝒟 [ℒ(λ, Dtrain, Dvalid)]
data subsets
fewer epochs, shorter MCMC chains,
fewer RL trials,…
downsampled images
λmf
36. !36
Multi-fidelity approaches
Cheap low-fidelity surrogates: train on small subsets, infer which regions
may be interesting to evaluate in more depth
• Evaluate random samples on smallest set
• Update a Bayesian surrogate model
• Select fewer points based on surrogate, evaluate on more data, repeat
Swersky et al. 2013, 2014; Klein et al 2017; Kandasamy et al 2017
37. !37
Multi-fidelity approaches
Successive halving:
• Randomly sample candidates and evaluate on a small data sample
• Retrain the best half of the candidates on twice the data
Jamieson & Talwalkar 2016
(data fractions: 1/16, 1/8, 1/4, 1/2)
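Successive halving can be sketched as follows; the evaluation function is a synthetic stand-in whose noise shrinks as the budget grows, mimicking how estimates on larger data samples are more reliable:

```python
import random

def eval_config(config, budget):
    """Stand-in for training on a `budget` fraction of the data: a noisy
    estimate of the config's true quality that sharpens with more budget."""
    return config["quality"] + random.gauss(0, 0.1) * (1 - budget)

random.seed(42)
configs = [{"id": i, "quality": random.random()} for i in range(16)]

budget, total_evals = 1 / 16, 0
while len(configs) > 1:
    scores = {c["id"]: eval_config(c, budget) for c in configs}
    total_evals += len(configs)
    configs.sort(key=lambda c: scores[c["id"]], reverse=True)
    configs = configs[: len(configs) // 2]   # keep the best half...
    budget *= 2                              # ...and retrain it on twice the data
winner = configs[0]
```

Sixteen candidates cost only 30 partial evaluations in total, versus 16 full trainings for a naive comparison — the efficiency that Hyperband later builds on.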
39. !39
Multi-fidelity approaches
Hyperband: Repeated, decreasingly aggressive successive halving
• Minimizes (doesn’t eliminate) the chance that a candidate was pruned too early
• Strong anytime performance, easy to implement, scalable, parallelizable
Li et al. 2017
40. !40
Multi-fidelity approaches
Combined Bayesian Optimization and Hyperband (BOHB)
• BayesOpt: choose which configurations to evaluate
• Hyperband: allocate budgets more efficiently
• Strong anytime and final performance
Falkner et al. 2018
41. !41
Early stopping
Learning curve prediction
• Model learning curves as parametric functions, fit
• Train a Bayesian Neural Net to predict learning curves
[Domhan et al 2015]
[Klein et al 2017]
Data allocation using upper bounds (DAUB)
• Fit linear function through latest estimates
• Compute upper performance bound
Sabharwal et al. 2015
CNN learning curves
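The DAUB-style bound can be illustrated by fitting a line through the latest learning-curve estimates and extrapolating to the full data size; this is a simplified sketch of the idea, not the full algorithm, and the numbers are made up:

```python
import numpy as np

def upper_bound(sizes, accs, full_size):
    """Fit a line through the last few (data size, accuracy) estimates and
    extrapolate an optimistic accuracy bound at the full data size."""
    slope, intercept = np.polyfit(sizes[-3:], accs[-3:], deg=1)
    # For a concave learning curve the linear extrapolation overshoots the
    # true curve, so it serves as an upper bound; cap at perfect accuracy.
    return min(1.0, intercept + slope * full_size)

sizes = [1000, 2000, 4000, 8000]
accs = [0.70, 0.78, 0.83, 0.86]        # diminishing returns with more data
bound = upper_bound(sizes, accs, full_size=16000)
```

Candidates whose bound already falls below the current best's observed accuracy can be discarded without training on the full data.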
42. Hyperparameter gradient descent
• Bilevel program
• Outer objective: optimize λ
• Inner objective: optimize weights ω given λ
• Derive through the entire SGD
• Get hypergradients wrt. the validation loss
• Expensive! But useful if you have many hyperparameters and high parallelism
• Interleave optimization steps
• Alternate SGD steps for ω and λ
[Franceschi et al 2018]
Optimize neural network hyperparameters and weights simultaneously
[MacLaurin et al 2015]
[Luketina et al 2016]
!42
43. !43
Meta-learning
Meta-learning can drastically speed up the architecture and hyperparameter
search, in combination with the optimization techniques we just discussed
• Learn which hyperparameters are really important
• Learn which hyperparameter values should be tried first
• Learn which architectures will most likely work
• Learn how to clean and annotate data
• Learn which feature representations to use
• …
44. Thank you! To be continued…
Part 2: meta-learning
Image: Pesah et al. 2018!44
45. Joaquin Vanschoren
Eindhoven University of Technology
j.vanschoren@tue.nl @joavanschoren http://bit.ly/openml
Advanced Course on Data Science & Machine Learning
Siena, 2019
Automatic Machine Learning & Meta-Learning
slides!
part 2
46. !2
Task ModelsOptimization ModelsModels
optimize architecture
and hyperparameters
Overview
AutoML introduction, promising optimization techniques
Meta-learning
part 1
part 2
part 3 Neural architecture search, meta-learning on neural nets
Tasks ModelsMeta-learning ModelsModels
optimize architecture
and hyperparameters
TasksTasks
Tasks ModelsNAS, meta-learning ModelsModels
optimize neural architecture,
model/hyperparameters
TasksTasks
47. Learning is a never-ending process
!3
Humans don’t learn from scratch
48. Learning is a never-ending process
!4
Learning humans also seek/create related tasks
E.g. find many similar puzzles, solve them in different ways,…
49. Learning is a never-ending process
Task 1 Task 2 Task 3
Models ModelsModelsModels
Model
performance performance performance
Learning
ModelsModelsModelsModels
Learning Learning
Learning
episodes
meta-
learning
meta-
learning
meta-
learning
!5
Humans learn across tasks
Why? Requires less trial-and-error, less data
50. Task 1 Task 2 Task 3
ModelsModelsModels
performance performance performance
LearningLearning Learning
ModelsModelsModels
ModelsModelsModels
inductive bias
!6
Inductive bias: the assumptions a learner adds, beyond the training data, to generalize effectively
If prior tasks are similar, we can transfer prior knowledge to new tasks
Learning to learn
New Task
performance
ModelsModelsModels
Learning
prior beliefs
constraints
model parameters
representations
training data
51. Task 1 Task 2 Task 3
ModelsModelsModels
performance performance performance
LearningLearningLearners
LearningLearningLearners
LearningLearningLearners
ModelsModelsModels
ModelsModelsModels
}meta-data
!7
Meta-learning
New Task
performance
ModelsModelsModels
meta-learner
base-learner
additional
experiments
Meta-learner learns a (base-)learning algorithm, based on meta-data
52. Meta-data
New Task
meta-learner
ModelsModelsModels
Task 1 Task j
ModelsModelsModels
performance performance
LearningLearningLearningLearning
ModelsModelsModels
…
performance
!8
Learners Learners
mj : description of task j (meta-features)
λi : configuration i (architecture + hyperparameters)
θi,j,k : model parameters (e.g. weights)
Pi,j : performance estimate of the model built with λi on task j
53. How?
New Task
meta-learner
ModelsModelsModels
Task 1 Task j
ModelsModelsModels
performance performance
LearningLearningLearningLearning
ModelsModelsModels
…
performance
1. Learn from observing learning algorithms (for any task)
2. Learn what may likely work (for somewhat similar tasks)
3. Learn from previous models (for very similar tasks)
!9
Learners Learners
Pi,j
λi
mj
!i,j,k
Pi,jλi
mj
!i,j,k
Pi,jλi
Pi,jλi(mj)
54. 1. Learn from observing learning algorithms
Define, store and use meta-data:
- configurations: settings that uniquely define the model
- performance (e.g. accuracy) on specific tasks
New Task
meta-learner
ModelsModelsModels
configurations
performances
Task 1 Task j
ModelsModelsModels
performance performance
LearningLearningLearningLearning
ModelsModelsModels
…
performance
λi
Pi,j
!10
Learners Learners
(hyperparameters, pipeline,
neural architecture,…)
55. 1. Learn from observing learning algorithms
Define, store and use meta-data:
- configurations: settings that uniquely define the model
- performance (e.g. accuracy) on specific tasks
New Task
meta-learner
ModelsModelsModels
configurations
performances
Task 1 Task j
ModelsModelsModels
performance performance
LearningLearningLearningLearning
ModelsModelsModels
…
performance
λi
Pi,j
!10
Learners Learners
(hyperparameters, pipeline,
neural architecture,…)
task | model config λ   | score P
0    | alg=SVM, C=0.1   | 0.876
0    | alg=NN, lr=0.9   | 0.921
0    | alg=RF, mtry=0.1 | 0.936
1    | alg=SVM, C=0.2   | 0.674
1    | alg=NN, lr=0.5   | 0.777
1    | alg=RF, mtry=0.4 | 0.791
56. Rankings
• Build a global (multi-objective) ranking, recommend the top-K
• Can be used as a warm start for optimization techniques
• E.g. Bayesian optimization, evolutionary techniques,…
Tasks
ModelsModelsModels
performance
LearningLearningLearning
1. λa
2. λb
3. λc
4. λd
5. λe
6. …
New Task
meta-learner
ModelsModelsModels
performance
Global ranking
(task agnostic)
λa..k
warm
start
Leite et al. 2012
Abdulrahman et al. 2018
}
(discrete)
(multi-objective)
Pi,j
!11
λi
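Building such a global ranking can be sketched from a meta-database of performances; the matrix below is illustrative (e.g. accuracies one might pull from OpenML):

```python
import numpy as np

# Performance of 5 configurations (rows) on 4 prior tasks (columns)
P = np.array([
    [0.91, 0.78, 0.85, 0.88],   # config a
    [0.89, 0.81, 0.83, 0.90],   # config b
    [0.70, 0.65, 0.72, 0.69],   # config c
    [0.93, 0.60, 0.88, 0.71],   # config d
    [0.80, 0.79, 0.80, 0.82],   # config e
])
names = ["a", "b", "c", "d", "e"]

# Rank configurations per task (0 = best), then average the ranks across tasks
ranks = (-P).argsort(axis=0).argsort(axis=0)
avg_rank = ranks.mean(axis=1)
global_ranking = [names[i] for i in np.argsort(avg_rank)]
top_k = global_ranking[:3]       # warm-start set, e.g. for Bayesian optimization
```

Averaging ranks rather than raw scores makes the recommendation robust to tasks with very different accuracy scales.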
57. • What if prior configurations are not optimal?
• Per task, fit a differentiable plugin estimator on all evaluated configurations
• Do gradient descent to find optimized configurations, recommend those
Tasks
ModelsModelsModels
performance
LearningLearningLearning
New Task
meta-learner
ModelsModelsModels
performance
Warm-starting with plugin estimators
λ*i
per task:
task 1: λ*1
task 2: λ*2
task 3: λ*3
…
Wistuba et al. 2015
λi
}Pi,j
!12
58. • Functional ANOVA 1
Select hyperparameters that cause variance in the evaluations.
Tasks
ModelsModelsModels
performance
LearningLearningLearning
}
New Task
meta-learner
ModelsModelsModels
performance
Hyperparameter behavior
hyperparameter
importance
1 van Rijn & Hutter 2018
λ1
λ2
λ3
λ4 constraints
priors
Pi,j
!13
λi
59. 1 van Rijn & Hutter 2018
SVM (RBF)
Hyperparameter behavior
• Functional ANOVA 1
Select hyperparameters that cause variance in the evaluations.
AdaBoost
60. 1 van Rijn & Hutter 2018
ResNets for image classification (under review)
• Functional ANOVA 1
Select hyperparameters that cause variance in the evaluations.
Hyperparameter behavior
61. 1 van Rijn & Hutter 2018
Priors (KDE) for the most important hyperparameters
• Functional ANOVA 1
Select hyperparameters that cause variance in the evaluations.
Learn priors for hyperparameter tuning (e.g. Bayesian optimization)
Hyperparameter behavior
62. • Functional ANOVA 1
Select hyperparameters that cause variance in the evaluations.
• Tunability 2,3,4
Learn good defaults, measure improvement from tuning over defaults
1 van Rijn & Hutter 2018
2 Probst et al. 2018
!17
3 Weerts et al. 2018
Hyperparameter behavior
4 van Rijn et al. 2018
63. • Search space pruning
Exclude regions yielding bad performance on (similar) tasks
Tasks
ModelsModelsModels
performance
LearningLearningLearning
}
New Task
meta-learner
ModelsModelsModels
performance
constraints
Pi,j
Wistuba et al. 2015
P
λ1
!18
λi
λ2
Hyperparameter behavior
64. • Experiments on the new task can tell us how it is similar to previous tasks
• Tasks are similar if the observed performance of configurations is similar
• Use this to recommend new configurations, run, repeat
Tasks
ModelsModelsModels
performance
LearningLearningLearning
}
New Task
meta-learner
ModelsModelsModels
performance
Learning task similarity
λc
Pi,j
!19
λi
Pc,new ≅ Pc,j
65. • Learn task similarity while tuning configurations
• Tournament-style selection, warm-start with overall best config λbest
• Next candidate λc : the one that beats current λbest on similar tasks
Tasks
ModelsModelsModels
performance
LearningLearningLearning
}
New Task
meta-learner
ModelsModelsModels
performance
Active testing
Leite et al. 2012
λc
Select λc > λbest
on similar tasks
(discrete)
Pi,j
!20
λi
Sim(tj,tnew) = Corr([RLa,b,j],[RLa,b,new])
Pc,j
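The task-similarity idea can be sketched as rank correlation between the performances of shared configurations; the numbers below are illustrative, and the original papers correlate relative landmarks rather than raw scores:

```python
from scipy.stats import spearmanr

# Performance of the same six configurations on two prior tasks and on the
# (partially evaluated) new task
task_1 = [0.91, 0.89, 0.70, 0.93, 0.80, 0.75]
task_2 = [0.60, 0.62, 0.74, 0.58, 0.66, 0.71]
new_task = [0.88, 0.85, 0.65, 0.90, 0.78, 0.72]

# Tasks on which configurations rank the same way are considered similar
sim_1, _ = spearmanr(task_1, new_task)
sim_2, _ = spearmanr(task_2, new_task)
most_similar = "task_1" if sim_1 > sim_2 else "task_2"
```

Configurations that beat the current best on the most similar prior tasks then become the next candidates to test, as in active testing.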
66. • If task j is similar to the new task, its surrogate model Sj will likely transfer well
• Sum up all Sj predictions, weighted by task similarity (as in active testing)1
• Build combined Gaussian process, weighted by current performance on new task2
Tasks
ModelsModelsModels
performance
LearningLearningLearning
New Task
meta-learner
ModelsModelsModels
performance
per task tj:
Pi,j
}
Surrogate model transfer
1 Wistuba et al. 2018
λi
P
Sj
2 Feurer et al. 2018
S= ∑ wj Sj
+
+
S1
S2
S3
!21
λi
67. • Multi-task Gaussian processes: train surrogate model on t tasks simultaneously1
• If tasks are similar: transfers useful info
• Not very scalable
• Bayesian Neural Networks as surrogate model2
• Multi-task, more scalable
• Stacking Gaussian Process regressors (Google Vizier)3
• Continual learning (sequential similar tasks)
• Transfers a prior based on residuals of previous GP
Multi-task Bayesian optimization
1 Swersky et al. 2013
Independent GP predictions Multi-task GP predictions
2 Springenberg et al. 2016
3 Golovin et al. 2017
!22
68. • Bayesian linear regression (BLR) surrogate model on every task
• Use neural net to learn a suitable basis expansion ϕz(λ) for all tasks
• Scales linearly in # observations, transfers info on configuration space
Tasks
ModelsModelsModels
performance
LearningLearningLearning New Task
meta-learner
ModelsModelsModels
performance
}
More scalable variants
Perrone et al. 2018
P
BLR
surrogate
(λi,Pi,j)
φz(λ)i
warm-start (pre-train)
λi
Bayesian optimization
φz(λ)
BLR hyperparameters
Pi,j
!23
λi
69. 2. Learn what may likely work (for partially similar tasks)
Meta-features: measurable properties of the tasks
(number of instances and features, class imbalance, feature skewness,…)
configurations
performances
similar mj ?Task 1 Task N
ModelsModelsModels
performance performance
LearningLearningLearning
LearningLearningLearning
ModelsModelsModels
… meta-features New Task
meta-learner
ModelsModelsModels
performance
mj
Pi,j
!24
λi
70. 2. Learn what may likely work (for partially similar tasks)
Meta-features: measurable properties of the tasks
(number of instances and features, class imbalance, feature skewness,…)
configurations
performances
similar mj ?Task 1 Task N
ModelsModelsModels
performance performance
LearningLearningLearning
LearningLearningLearning
ModelsModelsModels
… meta-features New Task
meta-learner
ModelsModelsModels
performance
mj
Pi,j
!24
λi
meta-features | model config λ   | score P
60000,5,0.2   | alg=SVM, C=0.1   | 0.876
60000,5,0.2   | alg=NN, lr=0.9   | 0.921
60000,5,0.2   | alg=RF, mtry=0.1 | 0.936
11233,4,0.1   | alg=SVM, C=0.2   | 0.674
11233,4,0.1   | alg=NN, lr=0.5   | 0.777
11233,4,0.1   | alg=RF, mtry=0.4 | 0.791
71. • Hand-crafted (interpretable) meta-features1
• Number of instances, features, classes, missing values, outliers,…
• Statistical: skewness, kurtosis, correlation, covariance, sparsity, variance,…
• Information-theoretic: class entropy, mutual information, noise-signal ratio,…
• Model-based: properties of simple models trained on the task
• Landmarkers: performance of fast algorithms trained on the task
• Domain specific task properties
• Optimized (recommended) representations exist 2
Meta-features
1 Vanschoren 2018
!25
2 Rivolli et al 2019
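A few of the hand-crafted meta-features above can be computed directly. A minimal illustration (the function name and the exact feature set are my own, not from any meta-learning library):

```python
import numpy as np

def meta_features(X, y):
    """A small sample of the meta-feature families listed above."""
    n, p = X.shape
    _, counts = np.unique(y, return_counts=True)
    probs = counts / n
    # class imbalance: ratio of minority to majority class size
    imbalance = counts.min() / counts.max()
    # class entropy (information-theoretic meta-feature)
    class_entropy = -np.sum(probs * np.log2(probs))
    # mean feature skewness (statistical meta-feature), computed by hand
    centred = X - X.mean(axis=0)
    skew = np.mean((centred ** 3).mean(axis=0) / (X.std(axis=0) ** 3 + 1e-12))
    return {"n_instances": n, "n_features": p,
            "imbalance": imbalance, "class_entropy": class_entropy,
            "mean_skewness": skew}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.array([0] * 150 + [1] * 50)
mf = meta_features(X, y)
```

Landmarkers and model-based meta-features would additionally train fast models on (X, y) and record their scores or properties.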
72. Meta-features: learning a joint task representation
• Deep metric learning: learn a representation hmf using a ground-truth distance
• Siamese networks: similar tasks get similar representations 1
• Feed data through a pretrained CNN and extract a vector from the weights 2
• Recommend neural architectures, or warm-start neural architecture search
1 Kim et al. 2017   2 Achille et al. 2019
73. Warm-starting from similar tasks (Feurer et al. 2015)
• Find the k most similar tasks (via meta-features mj) and warm-start the search with their best λi
• Auto-sklearn: Bayesian optimization (SMAC), warm-started with λ1..k
• Meta-learning yields better models, faster
• Winner of the AutoML Challenges
[Figure: Bayesian optimization on the new task starts from configurations λ1..λ4 that performed best on similar tasks]
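Warm-starting by task similarity reduces to a nearest-neighbour lookup over meta-features. A toy sketch with invented meta-data (not auto-sklearn's actual implementation):

```python
import numpy as np

# Toy meta-data: per prior task, its meta-features m_j and the best
# configuration found on it (values are illustrative, not from any benchmark).
task_meta = np.array([[1000.0, 10, 0.9],    # task 0
                      [60000.0, 5, 0.2],    # task 1
                      [500.0, 50, 0.5]])    # task 2
best_config = [{"alg": "SVM", "C": 0.1},
               {"alg": "RF", "mtry": 0.1},
               {"alg": "NN", "lr": 0.5}]

def warm_start(new_meta, k=2):
    """Return the best configurations of the k tasks most similar to the
    new one (L2 distance on standardised meta-features) to seed the search."""
    mu, sd = task_meta.mean(axis=0), task_meta.std(axis=0)
    dist = np.linalg.norm((task_meta - mu) / sd - (new_meta - mu) / sd, axis=1)
    return [best_config[i] for i in np.argsort(dist)[:k]]

# A new task that resembles task 1: its best config is proposed first.
seeds = warm_start(np.array([55000.0, 6, 0.25]))
```

Bayesian optimization then evaluates these seeds first and continues from the resulting surrogate model instead of starting cold.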
74. Warm-starting from similar tasks (Fusi et al. 2017)
• Collaborative filtering: configurations λi are "rated" by tasks tj
• Learn a latent representation for tasks and configurations from the (discrete) matrix of performances Pi,j
• Use meta-features mj to warm-start on a new task tnew with λ1..k
• Returns probabilistic predictions p(P|λi) for Bayesian optimization
[Figure: the task × configuration performance matrix is factorized into a latent representation; the new task is placed in this space and warm-started]
75. Global surrogate models
• Train a task-independent surrogate model with meta-features mj in the inputs (mj ⊕ λi → P)
• SCOT: predict the ranking of λi with a surrogate ranking model + mj 1
• Predict Pi,j with multilayer perceptron surrogates + mj 2
• Build a joint GP surrogate model on the most similar tasks 3
• Scalability is often an issue
1 Bardenet et al. 2013   2 Schilling et al. 2015   3 Yogatama et al. 2014
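A minimal task-independent surrogate can be sketched by regressing P on the concatenation mj ⊕ λi. Here a linear least-squares model stands in for the ranking, MLP, and GP surrogates cited above, on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Meta-dataset rows: task meta-features m_j, configuration lambda_i, score P.
m = rng.uniform(size=(200, 3))          # meta-features of the task of each row
lam = rng.uniform(size=(200, 2))        # configuration tried on that task
X = np.hstack([m, lam, np.ones((200, 1))])
true_w = np.array([0.5, -0.2, 0.1, 0.8, -0.4, 0.3])   # synthetic ground truth
P = X @ true_w + 0.01 * rng.normal(size=200)

# Fit the shared surrogate across all tasks at once.
w, *_ = np.linalg.lstsq(X, P, rcond=None)

# For a new task, score candidate configurations through the same surrogate.
new_m = np.array([0.2, 0.7, 0.4])
cands = rng.uniform(size=(50, 2))
scores = np.hstack([np.tile(new_m, (50, 1)), cands, np.ones((50, 1))]) @ w
best = cands[np.argmax(scores)]
```

The scalability issue mentioned above shows up when the surrogate is a GP: its cost grows cubically with the number of meta-dataset rows, which is why linear, tree, or neural surrogates are common here.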
76. Meta-models
• Learn a direct mapping between meta-features mj and performances Pi,j
• Zero-shot meta-models: predict the best λi given meta-features 1
• Ranking models: return a ranking λ1..k 2
• Predict which algorithms / configurations Λ to consider / tune 3
• Predict performance / runtime for a given λi and task 4
• Can be integrated into larger AutoML systems: warm-starting, guiding the search,…
1 Brazdil et al. 2009, Lemke et al. 2015   2 Sun and Pfahringer 2013, Pinto et al. 2017
3 Sanders and Giraud-Carrier 2017   4 Yang et al. 2018
77. Learning to learn through self-play (Drori et al. 2017)
• Build pipelines by inserting, deleting, or replacing components (actions)
• A neural network (LSTM) receives task meta-features mj, pipelines λi, and their evaluations
• It predicts pipeline performance and action probabilities
• Monte Carlo Tree Search builds pipelines and runs simulations against the LSTM
78. 3. Learn from previous models (for very similar tasks)
• Transfer the models themselves (model parameters θk, features,…), not only configurations λi and performances Pi,j
• Requires tasks that are intrinsically (very) similar, e.g. admitting a shared representation
[Figure: models trained on tasks 1..N pass their parameters to a meta-learner that builds models for the new task]
79. 3. Learn from previous models (for very similar tasks)
Example meta-dataset with model parameters:

model param θ    model config λ     score P
0.34, 0.43, …    alg=NN, lr=0.9     0.876
0.04, 0.03, …    alg=NN, lr=0.9     0.921
0.94, 0.03, …    alg=NN, lr=0.9     0.936
0.34, 0.23, …    alg=NN, lr=0.5     0.674
0.04, 0.23, …    alg=NN, lr=0.5     0.777
0.94, 0.23, …    alg=NN, lr=0.5     0.791
85. Neural Architecture Search space (Elsken et al. 2019)
Sequential (chain):
• choose the number of layers, the type of each layer (dense, convolutional, max-pooling,…), and the hyperparameters of each layer
+ easier to search
- sometimes too simple
Arbitrary graph:
• choose branching, joins, skip connections, the types of layers, and their hyperparameters
+ more flexible
- much harder to search
86. Neural Architecture Search space
Observation: successful deep networks have repeated motifs (cells),
e.g. Inception v4.
87. Cell search space (Zoph et al. 2018)
• Parameterize the building blocks (cells), e.g. a regular cell + a resolution-reduction cell
• Stack cells together into a macro-architecture, usually a chain; this can be manual or learned
+ smaller search space
+ cells can be learned on a small dataset and transferred to a larger dataset
- you can't learn entirely new architectures
88. Within-cell search space (Zoph et al. 2018)
• Different strategies to construct a cell lead to different parameterizations
• Can be parameterized as a set of categorical hyperparameters:
  • which existing layer (hidden state Hi, e.g. the cell input) to build on
  • which operation (e.g. 3×3 conv) to perform on it
  • how to combine the results into a new hidden state
• Iterate over B phases (blocks)
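The categorical parameterization above is easy to make concrete. A hypothetical sketch (the operation and combiner sets are illustrative, not the exact NASNet choices):

```python
import random

random.seed(0)

OPS = ["3x3_conv", "5x5_conv", "3x3_maxpool", "identity"]
COMBINE = ["add", "concat"]

def sample_cell(B=5, n_inputs=2):
    """Sample one cell from the categorical within-cell search space: each of
    B blocks picks two existing hidden states, an operation to apply to each,
    and a way to combine them into a new hidden state."""
    cell = []
    for b in range(B):
        n_states = n_inputs + b            # states available to block b
        block = {"inputs": [random.randrange(n_states) for _ in range(2)],
                 "ops": [random.choice(OPS) for _ in range(2)],
                 "combine": random.choice(COMBINE)}
        cell.append(block)
    return cell

cell = sample_cell()
```

Any search strategy (RL, evolution, Bayesian optimization) then operates over exactly these categorical choices rather than over raw graphs.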
89. Neural architecture optimization
Standard hyperparameter optimization techniques:
• Image classification pipelines 1: 238 hyperparameters, tuned with TPE
• Kernels to optimize neural nets with GP-based Bayesian optimization:
  • Arc kernel 2: DNNs, 23 hyperparameters, 6 kernels
  • NASBOT 3: DNNs and CNNs
1 Bergstra et al. 2013   2 Swersky et al. 2013   3 Kandasamy et al. 2018
90. Neural architecture optimization
Standard hyperparameter optimization techniques:
• Auto-Net 1: DNNs, 63 hyperparameters, tuned with SMAC
• Joint NAS + HPO 2: ResNets, tuned with BO-HB
• PNAS (Progressive NAS) 3: cell search space, optimized with SMAC, HPO afterwards; state of the art on ImageNet and CIFAR
1 Mendoza et al. 2016   2 Zela et al. 2018   3 Liu et al. 2018
91. Deep RL recap
1. Observe the state
2. Take action u based on policy πθ (θ are the weights of an RNN)
3. A new state is formed
4. Take further actions based on the new state
5. After a trajectory τ of actions, adjust the policy based on the total reward R(τ)
Policy gradient estimate (H steps, m sampled trajectories):
∇θ J(θ) ≈ (1/m) Σᵢ₌₁..ₘ [ Σₜ₌₁..H ∇θ log πθ(uₜⁱ | sₜⁱ) ] R(τⁱ)
[Images: Hui, Levine]
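The recap can be made concrete on the smallest possible case: a 3-armed bandit with trajectories of length one, optimized with REINFORCE. This is a gradient-bandit sketch of the policy-gradient update, not the NAS controller itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Environment": a 3-armed bandit with deterministic rewards (H = 1).
rewards = np.array([0.2, 0.5, 0.9])
theta = np.zeros(3)                      # policy parameters (softmax preferences)
baseline, lr = 0.0, 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(2000):
    p = softmax(theta)
    a = rng.choice(3, p=p)               # sample action from pi_theta
    r = rewards[a]
    baseline += 0.01 * (r - baseline)    # running baseline reduces variance
    # REINFORCE: grad log pi(a) = one_hot(a) - p for a softmax policy
    grad_log = -p
    grad_log[a] += 1.0
    theta += lr * (r - baseline) * grad_log

probs = softmax(theta)                   # policy concentrates on the best arm
```

In NAS the "action" is one architecture decision, a trajectory is a full architecture, and R(τ) is the validation accuracy of the trained network.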
92. NAS with Reinforcement learning (Zoph & Le 2017)
RNN policy network πθ:
• generate the architecture step by step
• evaluate it to get a reward
• update πθ with the policy gradient
2-layer LSTM (REINFORCE), chains of convolutional layers
• State of the art on CIFAR-10 and Penn Treebank
• 800 GPUs for 3-4 weeks, 12,800 architectures
93. NAS with Reinforcement learning (Zoph et al. 2018)
RNN policy network πθ:
• generate the architecture step by step
• evaluate it to get a reward
• update πθ with the policy gradient
1-layer LSTM (PPO), cell search space
• State of the art on ImageNet
• 450 GPUs, 20,000 architectures
94. Neuroevolution (Angeline et al. 1994, Stanley and Miikkulainen 2002)
Earlier solutions tried to:
• optimize both the configuration and the weights simultaneously with genetic algorithms
• optimize individual nodes and connections
- doesn't scale to deep networks
95. Neuroevolution (Real et al. 2017; Miikkulainen et al. 2017)
Better:
• learn the architecture and configuration with evolutionary techniques
• train the network with SGD
• mutation steps: add, change, or remove a layer
[Figure: a population of networks over time; networks are kept alive or killed]
97. Neuroevolution (Real et al. 2019)
AmoebaNet: state of the art on ImageNet and CIFAR-10
• Cell search space, aging evolution (kill the oldest networks)
• More efficient than reinforcement learning
98. Network morphisms
• Change the architecture, but not the modelled function (e.g. initialize new layers as the identity)
• Non-black-box: allows much more efficient architecture search
Chen et al. 2016, Wei et al. 2016, Cai et al. 2017, Cai et al. 2018,
Elsken et al. 2017, Cortes et al. 2017
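A function-preserving morphism is easy to verify numerically. Below is a Net2WiderNet-style widening sketch on a tiny two-layer ReLU network (illustrative, not any paper's exact operator):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: 3 inputs -> 4 ReLU hidden units -> 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2 = rng.normal(size=(2, 4))

def forward(x, W1, b1, W2):
    h = np.maximum(W1 @ x + b1, 0)                 # ReLU hidden layer
    return W2 @ h

def widen(W1, b1, W2, j):
    """Duplicate hidden unit j and halve both copies' outgoing weights:
    the widened network computes exactly the same function, so training
    can continue from where it left off."""
    W1w = np.vstack([W1, W1[j:j + 1]])             # copy incoming weights
    b1w = np.append(b1, b1[j])
    W2w = np.hstack([W2, W2[:, j:j + 1]])          # copy outgoing weights
    W2w[:, j] /= 2.0                               # then split their effect
    W2w[:, -1] /= 2.0
    return W1w, b1w, W2w

x = rng.normal(size=3)
W1w, b1w, W2w = widen(W1, b1, W2, j=1)
```

Because the output is unchanged, architecture search can apply such morphisms and keep training, instead of evaluating every candidate from scratch.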
99. Weight sharing
• One-shot models (convolutional neural fabrics) 1
  • Search space: a ConvNet = a path through the fabric, with shared weights
  • Train an ensemble of all paths
• Path dropout 2: ensure that individual paths do well on their own
• ENAS 3: use RL to sample paths from the one-shot model and train their weights; other paths inherit these weights
• SMASH 4: a shared hypernetwork predicts the weights of a given architecture
1 Saxena & Verbeek 2016   2 Bender et al. 2018   3 Pham et al. 2018   4 Brock et al. 2017
100. DARTS: Differentiable NAS (Liu et al. 2018)
• Fixed (one-shot) structure; learn which operators to use on each edge
• Give every candidate operator (convolution, max pooling, zero,…) a weight αi
• Optimize the architecture weights αi and model weights ωj with bilevel programming (interleaved SGD steps on αi and ωj)
• Discretize at the end: keep the argmax-αi operator on each edge
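The continuous relaxation on a single edge can be sketched as follows; the candidate operations here are stand-ins, and the bilevel training loop is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# DARTS relaxation on one edge: the edge output is a softmax-weighted mixture
# of all candidate operations, which makes the architecture differentiable.
x = rng.normal(size=4)
ops = [lambda v: v,                      # identity / skip connection
       lambda v: np.tanh(v),             # stand-in for a parametric op
       lambda v: np.zeros_like(v)]       # the "zero" op prunes the edge

alpha = np.array([0.2, 1.5, -1.0])       # architecture weights for this edge

def mixed_op(v, alpha):
    w = np.exp(alpha - alpha.max())
    w /= w.sum()                         # softmax over operations
    return sum(wi * op(v) for wi, op in zip(w, ops)), w

out, w = mixed_op(x, alpha)

# After bilevel optimization of alpha and the model weights, the discrete
# architecture keeps only the argmax operation on each edge.
chosen = int(np.argmax(alpha))
```

Since `out` is differentiable in `alpha`, gradients from the validation loss can update the architecture weights directly, which is the core trick of DARTS.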
101. DARTS: Differentiable NAS
• Lots of further refinements:
  • SNAS: use a (differentiable) Gumbel softmax on αi [Xie et al. 2019]
  • ProxylessNAS: memory-efficient DARTS, trains sparse architectures only [Cai et al. 2019]
  • …
102. Learning to learn from previous models
• For models trained on intrinsically (very) similar tasks, e.g. sharing a representation
• Transfer model parameters θk, features,… rather than only configurations λi and performances Pi,j
[Figure: models from tasks 1..N pass their parameters to a meta-learner that builds models for the new task]
103. Learning to learn
• Learn a better gradient update rule for a group of tasks
• Learn a better initialization (and representation) for a group of tasks
104. Learning to learn by gradient descent
• Our brains probably don't do backprop; replace it with:
  • a simple parametric (bio-inspired) rule to update the weights 1
  • a single-layer neural network that learns the weight updates 2
• Learn the rule's parameters across tasks, by gradient descent (a meta-gradient)
[Figure: the learner updates its weights λi as Δλi = η·f(presynaptic activity, reinforcing signal,…) with learning rate η; the meta-learner tunes the rule's parameters from performance across tasks]
1 Bengio et al. 1995   2 Runarsson and Jonsson 2000
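As a minimal instance of learning an update rule by meta-gradient, one can meta-learn a single parameter of the rule (its learning rate) across a family of quadratic tasks; the analytic meta-gradient below is specific to this toy setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# The "update rule" is theta' = theta - eta * g, and its one parameter eta
# is learned across tasks by gradient descent on the post-update loss.
c = 2.0                                    # curvature shared by all tasks
eta = 0.05                                 # initial learning rate
for step in range(500):
    a = rng.normal()                       # task: minimise 0.5 * c * (theta - a)^2
    theta = rng.normal()                   # random start on this task
    g = c * (theta - a)                    # inner gradient
    # post-update loss is 0.5 * c * (1 - eta*c)^2 * (theta - a)^2, so its
    # derivative w.r.t. eta is:
    meta_grad = -c ** 2 * (1 - eta * c) * (theta - a) ** 2
    eta -= 0.01 * meta_grad                # meta-gradient step on the rule
    eta = float(np.clip(eta, 0.0, 1.0))    # keep the rate in a sane range

# For quadratics with curvature c, the optimal one-step rate is 1/c = 0.5.
```

The parametric rules cited above generalize this: instead of one scalar, a whole function of local quantities (activity, error signal) is parameterized and meta-trained the same way.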
105. Learning to learn gradient descent by gradient descent
• Replace backprop with a recurrent neural net (LSTM) 1; not very scalable
• Use a coordinatewise LSTM for scalability/flexibility (cf. ADAM, RMSprop) 2
  • Optimizee: receives the weight update gt from the optimizer
  • Optimizer: receives the gradient estimate ∇t from the optimizee
  • A single network: the LSTM parameters are shared across all coordinates θ
• Learns how to do gradient descent across tasks
1 Hochreiter 2001   2 Andrychowicz et al. 2016
106. Learning to learn gradient descent by gradient descent (Andrychowicz et al. 2016)
• Left: optimizer and optimizee trained to do style transfer
• Right: the optimizer solves similar tasks (different style, content, and resolution) without any additional training data
107. Application: Few-shot learning
• Learn how to learn from few examples (given similar tasks)
• The meta-learner must learn how to train a base-learner based on prior experience
• Parameterize the base-learner model M and learn its parameters θi
• Each task Tj has a small training set Ttrain (X_train, y_train) and a test set Ttest (X_test, y_test); e.g. 1-shot, 5-class, with new classes at meta-test time
• Cost(θi) = (1/|Ttest|) Σ_{t∈Ttest} loss(θi, t)
[Image: Hugo Larochelle]
108. Few-shot learning: approaches
• Existing algorithm as meta-learner:
  • LSTM + gradient descent (Ravi and Larochelle 2017)
  • Learn θinit + gradient descent (Finn et al. 2017)
  • kNN-like: memory + similarity (Vinyals et al. 2016)
  • Learn an embedding + classifier (Snell et al. 2017)
  • …
• Black-box meta-learner:
  • Neural Turing machine (with memory) (Santoro et al. 2016)
  • Neural attentive learner (Mishra et al. 2018)
  • …
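The kNN-like family can be illustrated with a nearest-centroid classifier (in the spirit of prototypical networks, but with an identity embedding instead of a learned one):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(n_way=5, k_shot=1, dim=8):
    """Synthetic 1-shot, 5-way episode: one well-separated cluster per class,
    one support and one query example each."""
    centres = rng.normal(scale=3.0, size=(n_way, dim))
    support = centres + 0.3 * rng.normal(size=(n_way, dim))
    query = centres + 0.3 * rng.normal(size=(n_way, dim))
    return support, query

def nearest_centroid(support, query):
    """Classify each query by its nearest support example (= class centroid
    in the 1-shot case). A learned embedding would be applied first."""
    d = np.linalg.norm(query[:, None, :] - support[None, :, :], axis=2)
    return d.argmin(axis=1)

support, query = make_task()
pred = nearest_centroid(support, query)
acc = np.mean(pred == np.arange(5))
```

Meta-training then consists of learning the embedding (across many such episodes) so that real images become as separable as this synthetic data.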
109. LSTM meta-learner + gradient descent (Ravi and Larochelle 2017)
• The gradient descent update of θt is similar to the LSTM cell-state update of ct (with forget and input gates)
• Hence, training a meta-learner LSTM yields an update rule for training M
• Start from an initial θ0, train the model on the first batch, and obtain the gradient and loss
• Predict θt+1 and continue to t = T; compute Cost(θT) = (1/|Ttest|) Σ_{t∈Ttest} loss(θT, t) and backpropagate to learn the LSTM weights and the optimal θ0
111. Learning to learn
• Learn a better gradient update rule for a group of tasks
• Learn a better initialization (and representation) for a group of tasks
112. Transfer learning
• Pre-train weights on:
  • large image datasets (e.g. ImageNet) 1
  • large text corpora (e.g. Wikipedia) 2
• Use these weights as the initialization for the new task
• Fails if the tasks are not similar enough 3
• Feature extraction (small target task): freeze the pre-trained convnet, remove the last layers, and use the output as features; if the task is quite different, remove more layers
• Fine-tuning (large, similar target task): unfreeze the last layers and tune them on the new task
• End-to-end tuning (large, different target task): train the whole network from the initialized weights
1 Razavian et al. 2014   2 Mikolov et al. 2013   3 Yosinski et al. 2014
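The feature-extraction regime can be sketched with a frozen random layer standing in for real pre-trained filters; only the new head is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# A frozen "pre-trained" layer (a stand-in for real convnet filters: in
# practice these weights would come from ImageNet pre-training).
W_frozen = rng.normal(size=(32, 10))

def features(X):
    return np.maximum(X @ W_frozen.T, 0)   # frozen ReLU feature extractor

# A small target task: too little data to train the full network.
X = rng.normal(size=(100, 10))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

# Train only the new head on top of the frozen features (ridge regression).
F = features(X)
head = np.linalg.solve(F.T @ F + 1e-2 * np.eye(32), F.T @ y)
pred = features(X) @ head
mse = np.mean((pred - y) ** 2)
```

Fine-tuning would additionally unfreeze `W_frozen` (or its last layers) and continue training both parts with a small learning rate.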
113. Model-agnostic meta-learning (Finn et al. 2017)
• Solve new tasks faster by learning a model initialization θ from similar tasks
• Current initialization θ
• On K examples per task Ti, evaluate ∇θ L_Ti(fθ)
• Update the weights for each task: θi' = θ − α ∇θ L_Ti(fθ)
• Update θ to minimize the sum of per-task losses: θ ← θ − β ∇θ Σ_Ti L_Ti(f_θi')
• Repeat
[Figure: θ moves toward a point from which the task optima θ1*, θ2*, θ3* are each reachable in a few gradient steps]
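MAML's two nested updates can be followed analytically on one-dimensional quadratic tasks; the task family and learning rates below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each task T_i is "minimise L_i(theta) = (theta - a_i)^2"; both the inner
# update and the meta-gradient have closed forms for this family.
a = rng.normal(size=10)        # one optimum a_i per training task
alpha, beta = 0.25, 0.05       # inner and outer (meta) learning rates
theta = 5.0                    # current initialisation

for step in range(300):
    # inner update theta_i' = theta - alpha * grad L_i(theta), for every task
    inner = theta - alpha * 2 * (theta - a)
    # meta-gradient of sum_i (theta_i' - a_i)^2 w.r.t. the initialisation
    # (the factor (1 - 2*alpha) is d theta_i' / d theta)
    meta_grad = np.sum(2 * (inner - a) * (1 - 2 * alpha))
    theta -= beta * meta_grad

# The learned initialisation sits where one gradient step reaches any task's
# optimum fastest: for this symmetric family, the mean of the task optima.
```

With neural networks the same two loops apply, except that the meta-gradient is obtained by backpropagating through the inner gradient step instead of by hand.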
116. Model-agnostic meta-learning
• More resilient to overfitting
• Generalizes better than LSTM approaches for few-shot learning
• Universality 1: no theoretical downsides in expressivity compared to alternative meta-learning models
• Many variants and applications:
  • REPTILE 2: do SGD for k steps in one task, and only then update the initial weights
  • PLATIPUS: probabilistic MAML
  • Bayesian MAML (Bayesian ensemble)
  • Online MAML
  • …
1 Finn et al. 2018   2 Nichol et al. 2018   Finn & Levine 2019
118. Neural Architecture Meta-Learning (Wong et al. 2018)
• Warm-start a deep-RL controller based on prior tasks
• Much faster than the single-task equivalent
• Another idea: warm-start DARTS with a task-specific one-shot model
119. Learning to reinforcement learn
• Humans often learn to play new games much faster than RL techniques do
• Reinforcement learning is well suited to learning-to-learn: build a learner, then use the performance of that learner as the reward
• Learning to reinforcement learn 1,2
  • Use RNN-based deep RL to train a recurrent network on many tasks
  • It learns to implement a "fast" RL agent, encoded in its weights
[Figure: a meta-RL algorithm trained across environments produces a policy πθ that acts as a fast RL agent on similar environments]
1 Duan et al. 2017   2 Wang et al. 2017
120. Learning to reinforcement learn
• Also works for few-shot learning 3
• Condition on the observation + an upcoming demonstration
• You don't know what someone is trying to teach you, but you prepare for the lesson
3 Duan et al. 2017
121. Learning to learn more tasks
• Active learning (Pang et al. 2018)
  • Deep network (learns a representation) + policy network
  • Receives state and reward, says which points to query next
• Density estimation (Reed et al. 2017)
  • Learn a distribution over a small set of images, and generate new ones
  • Uses a MAML-based few-shot learner
• Matrix factorization (Vartak et al. 2017)
  • A deep learning architecture that makes recommendations
  • The meta-learner learns how to adjust the biases for each user (task)
• Replace hand-crafted algorithms with learned ones: look at problems through a meta-learning lens!
123. Meta-data sharing (Vanschoren et al. 2014)
• Build a shared memory: models built by humans and models built by AutoML bots
• Train and run AutoML bots on any OpenML dataset
• Never-ending learning (from bots and humans)
124. AutoML tools and benchmarks (Gijsbers et al. 2019)
Open-source benchmark at https://openml.github.io/automlbenchmark/
• Runs on OpenML datasets, will be updated regularly, and anyone can add to it
125. AutoML open source tools

Tool             | Architecture search        | Hyperparameter search       | Improvements          | Meta-learning
Auto-WEKA        | Parameterized pipeline     | Bayesian optimization (RF)  |                       |
auto-sklearn     | Parameterized pipeline     | Bayesian optimization (RF)  | Ensembling            | warm-start
BO-HB            | Parameterized pipeline     | Tree of Parzen Estimators   | Ensembling, Hyperband |
hyperopt-sklearn | Parameterized pipeline     | Tree of Parzen Estimators   |                       |
TPOT             | Evolving pipelines (trees) | Genetic programming         |                       |
GAMA             | Evolving pipelines (trees) | Asynchronous evolution      | Ensembling, Hyperband |
H2O AutoML       | Single algorithms          | Random search               | Stacking              |
ML-Plan          | Planning-based pipeline    | Hierarchical planner        |                       |
OBOE             | Single algorithms          | Low-rank approximation      | Ensembling            | Runtime predictor
126. AutoML tools and benchmarks
Observations so far (for pipeline search):
• No AutoML system outperforms all others (Auto-WEKA performs worse)
• Auto-sklearn, H2O, and TPOT can outperform each other on given datasets
• Tuned random forests are a strong baseline
• High numbers of classes and unbalanced classes are a problem for most tools
• Little support for anytime prediction…
We need good benchmarks for NAS as well.
127. AutoML Gym
An environment to train general-purpose AutoML algorithms
• Open benchmark, meta-learning, continual learning, efficient NAS
Join us! Open PostDoc position: http://bit.ly/automl-gym
128. Towards human-like learning to learn
• Learning-to-learn gives humans a significant advantage
  • Learning how to learn any task empowers us far beyond knowing how to learn specific tasks
  • It is a universal aspect of life, and of how it evolves
• A very exciting field with many unexplored possibilities
  • Many aspects are not understood (e.g. task similarity); we need more experiments
• Challenges:
  • Build learners that never stop learning and that learn from each other
  • Fast learning by slow meta-learning
  • Build a global memory for learning systems to learn from
  • Let them explore / experiment by themselves
129. Thank you!
Special thanks to Pavel Brazdil, Matthias Feurer, Frank Hutter, Erin Grant,
Hugo Larochelle, Raghu Rajan, Jan van Rijn, and Jane Wang
More to learn: http://www.automl.org/book/ (Chapter 2: Meta-Learning)
Never stop learning