Best Practices for Hyperparameter Tuning with MLflow (Databricks)
Hyperparameter tuning and optimization is a powerful tool in the area of AutoML, for both traditional statistical learning models and deep learning. There are many existing tools to help drive this process, including both blackbox and whitebox tuning. In this talk, we'll start with a brief survey of the most popular techniques for hyperparameter tuning (e.g., grid search, random search, Bayesian optimization, and Parzen estimators) and then discuss the open source tools that implement each of these techniques. Finally, we will discuss how we can leverage MLflow with these tools and techniques to analyze how our search is performing and to productionize the best models.
Speaker: Joseph Bradley
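The talk itself is not reproduced here, but as a rough sketch of the workflow it describes, the following runs a random hyperparameter search and logs each trial to MLflow so runs can be compared in the MLflow UI. The model and search space are invented for the example; only standard mlflow and scikit-learn calls are used.

```python
# Minimal sketch: random search with each trial logged as an MLflow run.
import random

import mlflow
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

for trial in range(20):
    # Sample a configuration at random from a simple, made-up search space.
    params = {
        "n_estimators": random.randint(50, 500),
        "max_depth": random.randint(2, 16),
    }
    with mlflow.start_run():
        mlflow.log_params(params)
        score = cross_val_score(
            RandomForestClassifier(**params, random_state=0), X, y, cv=3
        ).mean()
        mlflow.log_metric("cv_accuracy", score)
```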
This Edureka Recurrent Neural Networks tutorial will help you understand why we need Recurrent Neural Networks (RNNs) and what exactly they are. It also explains a few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use case of an LSTM to predict the next word using a sample short story (a minimal sketch follows the topic list below).
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradients
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
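As referenced above, here is a minimal sketch of the next-word use case. The "story", vocabulary handling, and model sizes are invented for illustration; this is not the tutorial's actual code.

```python
# Minimal sketch: an LSTM that predicts the next word of a tiny corpus.
import numpy as np
from tensorflow import keras

story = "the cat sat on the mat the cat ate the fish".split()
vocab = sorted(set(story))
idx = {w: i for i, w in enumerate(vocab)}

# Build (previous word -> next word) training pairs.
x = np.array([idx[w] for w in story[:-1]])
y = np.array([idx[w] for w in story[1:]])

model = keras.Sequential([
    keras.layers.Embedding(len(vocab), 8),
    keras.layers.LSTM(16),
    keras.layers.Dense(len(vocab), activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(x[:, None], y, epochs=300, verbose=0)  # each sample is a length-1 sequence

probs = model.predict(np.array([[idx["the"]]]), verbose=0)
print(vocab[int(probs.argmax())])  # most likely word after "the"
```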
Feature Engineering - Getting the Most Out of Data for Predictive Models (Gabriel Moreira)
How should data be preprocessed for use in machine learning algorithms? How do we identify the most predictive attributes of a dataset? What features can we generate to improve the accuracy of a model?
Feature Engineering is the process of extracting and selecting, from raw data, features that can be used effectively in predictive models. As the quality of the features greatly influences the quality of the results, knowing the main techniques and pitfalls will help you to succeed in the use of machine learning in your projects.
In this talk, we will present methods and techniques that allow us to extract the maximum potential of the features of a dataset, increasing the flexibility, simplicity and accuracy of the models. We will cover the analysis of feature distributions and their correlations, and the transformation of numeric attributes (such as scaling, normalization, log-based transformation, binning), categorical attributes (such as one-hot encoding, feature hashing), temporal attributes (date/time), and free-text attributes (text vectorization, topic modeling).
Python, Scikit-learn, and Spark SQL examples will be presented, along with how to use domain knowledge and intuition to select and generate features relevant to predictive models.
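As a minimal sketch of several of the transformations listed above, using scikit-learn (the column names and data are hypothetical):

```python
# Minimal sketch: numeric scaling, one-hot encoding, and text vectorization
# combined in one scikit-learn ColumnTransformer.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "price": [10.0, 200.0, 35.0],
    "quantity": [1, 3, 2],
    "country": ["IT", "US", "IT"],
    "description": ["red shoes", "blue laptop bag", "green shoes"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["price", "quantity"]),            # numeric scaling
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),  # one-hot encoding
    ("txt", TfidfVectorizer(), "description"),                   # text vectorization
])

features = preprocess.fit_transform(df)
print(features.shape)
```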
In this talk, Dmitry shares his approach to feature engineering, which he has used successfully in various Kaggle competitions. He covers common techniques used to convert features into the numeric representations consumed by ML algorithms.
Meta-learning, or learning how to learn, is our innate ability to learn new, ever more complex tasks very efficiently by building on prior experience. It is a very exciting direction for machine learning (and AI in general). In this tutorial, I introduce the main concepts and state of the art.
"Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods that make the dataset amenable for machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. Automating the end-to-end process of applying machine learning offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform models that were designed by hand."
In this talk we will discuss how QuSandbox and the Model Analytics Studio can be used in the selection of machine learning models. We will also illustrate AutoML frameworks through demos and examples and show you how to get started.
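As a rough, hand-rolled illustration of two of the AutoML steps named above (algorithm selection and hyperparameter optimization), the following scikit-learn sketch searches over both the choice of estimator and each estimator's settings. Real AutoML frameworks (auto-sklearn, TPOT, and the tools demoed in the talk) automate far more of the pipeline.

```python
# Minimal sketch: treat the final pipeline step itself as a hyperparameter,
# so the grid searches over algorithms and their settings jointly.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

grid = [
    {"clf": [SVC()], "clf__C": [0.1, 1, 10]},
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1, 10]},
]
search = GridSearchCV(pipe, grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```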
A high-level introduction to text mining analytics, covering the building blocks and most commonly used techniques of text mining, along with useful additional references/links for background/literature and R code to get you started.
An enhanced adaptive scoring job scheduling algorithm with replication strate... (eSAT Publishing House)
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
While much of the recent literature in spatial statistics has revolved around addressing the big data issue, practical implementations of these methods on high performance computing systems for truly large data are still rare. We discuss our explorations in this area at the National Center for Atmospheric Research for a range of applications which can benefit from large scale computing infrastructure. These applications include extreme value analysis, approximate spatial methods, spatial localization methods, and statistically-based data compression, and are implemented in different programming languages. We will focus on timing results and practical considerations, such as speed vs. memory trade-offs, limits of scaling, and ease of use.
Many-Objective Performance Enhancement in Computing Clusters (Tarik Reza Toha)
In a heterogeneous computing cluster, cluster objectives conflict with one another. Selecting the right combination of machines is necessary to enhance cluster performance and to optimize all the cluster objectives. In this paper, we perform empirical performance analyses of a real cluster with our year-long collected data, formulate a new many-objective optimization problem for clusters, and integrate a greedy approach with the existing NSGA-III algorithm to solve this problem. From our experimental results, we find that our approach performs better than existing optimization approaches.
Transfer Learning for Improving Model Predictions in Robotic Systems (Pooyan Jamshidi)
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
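As a purely hypothetical illustration of the idea (not the speaker's actual method), the sketch below fits a cheap "source" performance model on many simulator samples and then learns a simple linear correction from a handful of expensive real measurements. All data and coefficients are invented.

```python
# Minimal sketch: learn from the simulator, correct with few real samples.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
w_sim = np.array([2.0, -1.0, 0.5, 0.0])

# Many cheap simulator measurements of configuration -> performance.
configs_sim = rng.uniform(0, 1, size=(500, 4))
perf_sim = configs_sim @ w_sim + rng.normal(0, 0.1, 500)
source_model = RandomForestRegressor(random_state=0).fit(configs_sim, perf_sim)

# Only a handful of real measurements; "reality" is shifted from the simulator.
configs_real = rng.uniform(0, 1, size=(10, 4))
perf_real = configs_real @ np.array([2.2, -1.1, 0.6, 0.0]) + 0.3
transfer = LinearRegression().fit(
    source_model.predict(configs_real).reshape(-1, 1), perf_real
)

def predict_real(configs):
    # Map the simulator model's prediction to an estimate of real performance.
    return transfer.predict(source_model.predict(configs).reshape(-1, 1))
```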
Low Power High-Performance Computing on the BeagleBoard Platform (a3labdsp)
The ever increasing energy requirements of supercomputers and server farms are driving the scientific and industrial communities to take into deeper consideration the energy efficiency of computing equipment. This contribution addresses the issue by proposing a cluster of ARM processors for high-performance computing. The cluster is composed of five BeagleBoard-xM boards, with one board managing the cluster and the other boards executing the actual processing. The software platform is based on the Angstrom GNU/Linux distribution and is equipped with a distributed file system to ease sharing data and code among the nodes of the cluster, and with tools for managing tasks and monitoring the status of each node. The computational capabilities of the cluster have been assessed through High-Performance Linpack and a cluster-wide speaker diarization algorithm, while power consumption has been measured using a clamp meter. Experimental results obtained in the speaker diarization task showed that the energy efficiency of the BeagleBoard-xM cluster is comparable to that of a laptop computer equipped with an Intel Core2 Duo T8300 running at 2.4 GHz. Furthermore, removing the bottleneck due to the Ethernet interface, the BeagleBoard-xM cluster is able to achieve superior energy efficiency.
"Attention Is All You Need" Grazie a queste semplici parole, nel 2017 il Deep Learning ha subito un profondo cambiamento. I Transformers, inizialmente introdotti nel campo del Natural Language Processing, si sono recentemente dimostrati estremamente efficaci anche al di fuori di questo settore, ottenendo un enorme - e forse inaspettato - successo nel campo della Computer Vision. I Vision Transformers e moltissime delle sue varianti stanno ridefinendo oggi lo stato dell'arte su molti task di visione artificiale, dalla classificazione di immagini fino ai sistemi di visione per la guida autonoma. Ma cosa sono i Transformers? In che cosa consiste il meccanismo della self-attention che è alla base del loro funzionamento? Quali sono i suoi limiti? Saranno in grado di rimpiazzare le famose reti convoluzionali che hanno, a loro tempo, rivoluzionato la Computer Vision? In questo talk cercheremo di rispondere a tutte queste domande, offrendo un'ampia panoramica sulle idee fondanti, sulle architetture Transformer più utilizzate, e sulle applicazioni più promettenti.
Thanks to Machine Learning (ML), a large amount of data has been put to good use in recent years: detecting patterns, extracting insights and providing valuable predictions to decision makers.
However, having actionable knowledge is only part of the picture. Be it a robot or a human, once the data is in, a course of action, ideally the best one, has to be found while satisfying several requirements.
For example, deciding which stocks to buy and when, how to schedule deliveries, which widget to produce and in which machinery to invest, are all decision problems.
Operations Research (OR) tries to reliably answer those questions by explicitly modelling decisions and finding the optimal ones for the chosen goals.
Given the tremendous impact decision optimization can yield for a business, OR is one of the best ways to exploit and capitalize on ML models!
In this talk I will present Mathematical Programming, a versatile decision modelling method, and its application to an example Power Plant scheduling problem, solved via open source solvers and Python libraries.
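To make the modelling style concrete, here is a toy dispatch sketch in the spirit of the Power Plant example, written with the open source PuLP library. All plant names, capacities and costs are invented for the example; this is not the talk's actual model.

```python
# Minimal sketch: meet electricity demand at minimum cost as a linear program.
from pulp import LpMinimize, LpProblem, LpVariable, lpSum

demand = 120.0  # MW to be served
# plant name -> (capacity in MW, cost per MWh)
plants = {"coal": (80, 30.0), "gas": (60, 50.0), "hydro": (40, 10.0)}

prob = LpProblem("dispatch", LpMinimize)
gen = {p: LpVariable(p, lowBound=0, upBound=cap) for p, (cap, _) in plants.items()}

prob += lpSum(cost * gen[p] for p, (_, cost) in plants.items())  # objective: total cost
prob += lpSum(gen.values()) == demand                            # constraint: meet demand

prob.solve()
print({p: v.value() for p, v in gen.items()})
```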
While smaller problems can be solved in reasonable time with a generic solver on a single machine, the same technology is routinely scaled in real-world applications to larger and more complex problems via generic “decomposition methods”.
I will then present an example of decomposition for the Power Plant problem.
Many experiments in the natural sciences are based on counting biological structures of interest, e.g. the number of cells that interact with specific reagents under different experimental conditions. Although recognizing these structures typically requires no particular expertise, manual inspection of the samples by human operators is very costly in terms of time and personnel. Moreover, the process is prone to errors of various kinds, due to operator fatigue and to the subjective interpretation of borderline cases. Automating this process is therefore essential to accelerate progress in the field and to enable fairer comparisons between experiments.
In this meetup we will take a closer look at cell-ResUnet (c-ResUnet), a Deep Learning approach that detects and counts neuronal cells in fluorescence microscopy images, exploiting an architecture and training strategies designed specifically for this application.
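The model itself is beyond a short snippet, but the counting step it enables can be illustrated: given a binary segmentation mask predicted by any network (here invented), cells can be counted as connected components.

```python
# Minimal sketch: count cells as connected components of a segmentation mask.
import numpy as np
from scipy import ndimage

# Hypothetical predicted mask; in practice this comes from thresholding the
# network's output probabilities.
mask = np.zeros((64, 64), dtype=bool)
mask[5:12, 5:12] = True    # one "cell"
mask[30:40, 30:38] = True  # another "cell"

labels, n_cells = ndimage.label(mask)
print(n_cells)  # -> 2
```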
In recent years robotics is finally leaving the factories to populate the cities we live in. Self-driving cars, drones and food-delivery robots, quadrupeds for street surveillance: these are just a few examples of what can already be found today in many neighborhoods around the world. The revolution triggered by deep learning since 2012 is only one element of this spread, which also rests on complex market dynamics and on decades of earlier research on robotic systems, from both the software and the hardware point of view. How far have we come? What challenges do researchers and companies face in this field today? What market mechanisms drive the development of these systems? In this talk we will answer these questions, providing a complete overview of the state of the art in urban mobile robotics.
Anomaly detection is an increasingly popular topic that is being tackled on several fronts. In general, an anomaly is an entity, event or characteristic that does not conform to the standard of normality. Anomalies are an obstacle, and sometimes a dangerous one: in cybersecurity, for example, the intrusion of untrusted actors into IT systems can become critical for a company or an institution; in industry, anomalies can degrade product quality, causing heavy economic losses. For this reason, numerous techniques have been devised to detect anomalies and reduce the risks and damage they cause, or simply to monitor quality and manage maintenance.
In an image context, anomaly detection is a Computer Vision problem. Reconstruction methods such as Autoencoders and generative methods such as GANs address this problem. Among the GAN-based models, Ganomaly stands out: it detects whether an image is anomalous.
Building on it, Patch-Ganomaly was created to improve Ganomaly's behaviour, localizing the anomalous region of an image at the pixel level and improving effectiveness and efficiency.
By means of transfer learning based on the VGG16 network, a more precise model, TL-Ganomaly, can be obtained. It localizes the anomalous region precisely, in terms of pixels correctly recognized as anomalous.
In the post-processing phase, a further contribution comes from the Conv-Processing model, which learns which convolutional kernel best improves the segmentation of the anomalies during post-processing.
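As a generic, much-simplified stand-in for the reconstruction-based approaches described above (not Ganomaly itself), the sketch below scores anomalies by reconstruction error: samples the model reconstructs poorly get high scores. PCA plays the role of the autoencoder purely to keep the example short.

```python
# Minimal sketch: reconstruction-error anomaly scoring with a PCA "autoencoder".
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 20))  # "normal" training data
pca = PCA(n_components=5).fit(normal)

def anomaly_score(x):
    # Per-sample mean squared reconstruction error.
    recon = pca.inverse_transform(pca.transform(x))
    return np.mean((x - recon) ** 2, axis=1)

print(anomaly_score(normal[:3]))                      # low scores
print(anomaly_score(rng.normal(5, 3, size=(3, 20))))  # higher scores
```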
Towards Quantum Machine Learning Hands-on
Machine Learning (ML) has gained a lot of momentum in the last ten years, mostly thanks to advancements in non-linear pattern discovery and, more specifically, in Deep Learning (DL). But those who think that DL is going to address all possible problems might be terribly wrong. DL and ML tasks, in general, are categorized as Non-Polynomial problems, which means that the number of possible solutions for a given problem can grow exponentially, making it intractable using the classical algorithmic approach. Here, Quantum Computing (QC) techniques have the potential to address these issues and help ML methods solve problems faster, and sometimes better, than their classical counterparts. The conjunction of these two disciplines has resulted in a new and exciting research direction to explore: Quantum Machine Learning (QML).
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated afterwards. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
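As a minimal sketch of the first optimization mentioned above (skipping vertices whose ranks have converged), assuming a graph given as adjacency lists with no dangling vertices:

```python
# Minimal sketch: power-iteration PageRank that skips converged vertices.
def pagerank(adj, d=0.85, eps=1e-8):
    n = len(adj)
    rank = [1.0 / n] * n
    out_deg = [len(nbrs) for nbrs in adj]  # assumes every vertex has out-links
    # Build incoming-edge lists so each vertex's rank uses its in-links.
    inc = [[] for _ in range(n)]
    for u, nbrs in enumerate(adj):
        for v in nbrs:
            inc[v].append(u)
    converged = [False] * n
    while not all(converged):
        for v in range(n):
            if converged[v]:
                continue  # skip recomputation for converged vertices
            new = (1 - d) / n + d * sum(rank[u] / out_deg[u] for u in inc[v])
            if abs(new - rank[v]) < eps:
                converged[v] = True
            rank[v] = new
    return rank

print(pagerank([[1], [2], [0]]))  # 3-cycle: all ranks equal 1/3
```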
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
2. ● Gilberto Batres-Estrada
Senior Data Scientist @ Trell Technologies
● AIFI: Graduate teaching fellow
● Co-author: Big Data and Machine Learning in Quantitative Investment, Wiley (chapter on LSTM)
● MSc in Theoretical Physics, Stockholm University
● MSc in Engineering: Applied Mathematics and Statistics, KTH Royal Institute of Technology, Stockholm
3. Goals for today’s talk
1. Make the training process of neural networks faster
2. Get better-performing, more accurate neural networks (better test error)
3. To get more time for exploring different architectures
4. Agenda
● Random Search for Hyper-Parameter Optimization
● Bayesian optimization
● Hyperband
● Other methods
● Implementations and examples
5. Random Search
Proposed by James Bergstra and Yoshua Bengio
http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
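A minimal sketch of the technique via scikit-learn's RandomizedSearchCV; the model and search space are illustrative, not from the slides.

```python
# Minimal sketch: random search samples hyperparameters from distributions
# instead of a fixed grid, as in Bergstra & Bengio (2012).
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
search = RandomizedSearchCV(
    SVC(),
    # Sample C and gamma log-uniformly over several orders of magnitude.
    {"C": loguniform(1e-2, 1e3), "gamma": loguniform(1e-4, 1e-1)},
    n_iter=20, cv=3, random_state=0,
).fit(X, y)
print(search.best_params_, search.best_score_)
```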
6. Bayesian Optimization
Model the conditional probability p(y | x), where y is an evaluation metric such as test error and x is a set of hyperparameters.
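A hand-rolled sketch of the loop this implies, assuming a Gaussian-process surrogate for p(y | x) and an expected-improvement acquisition function. The one-dimensional objective is a stand-in for a real train-and-evaluate step.

```python
# Minimal sketch: Bayesian optimization with a GP surrogate and EI acquisition.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):  # stand-in for "train with hyperparameter x, return test error"
    return (x - 0.3) ** 2

X = np.array([[0.0], [0.5], [1.0]])  # hyperparameters evaluated so far
y = objective(X).ravel()

for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = np.linspace(0, 1, 200).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = cand[np.argmax(ei)]                          # most promising point
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next[0]))

print(X[np.argmin(y)])  # best hyperparameter found
```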
7. Sequential Model-Based Algorithm Configuration SMAC
SMAC uses random forests to model p(y | x) as a Gaussian distribution (Hutter et al., 2011).
8. Tree Structured Parzen Estimator (TPE)
TPE is a non-standard Bayesian optimization algorithm based on tree-structured Parzen density estimators (Bergstra et al., 2011).
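A minimal usage sketch via the hyperopt library, which implements the algorithm from the cited paper; the loss function here is a stand-in for training a model and returning its validation error.

```python
# Minimal sketch: TPE search over a small hyperparameter space with hyperopt.
from hyperopt import fmin, hp, tpe

def loss(params):
    # Stand-in for "train with these hyperparameters, return validation error".
    return (params["lr"] - 0.01) ** 2 + (params["units"] - 64) ** 2 * 1e-6

space = {
    "lr": hp.loguniform("lr", -7, 0),        # bounds are in log space: e^-7 .. e^0
    "units": hp.quniform("units", 16, 256, 16),
}
best = fmin(fn=loss, space=space, algo=tpe.suggest, max_evals=50)
print(best)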
11. Hyperband
Successive Halving
Hyperband extends Successive Halving (Jamieson and Talwalkar, 2016) and uses it as a subroutine.
● Uniformly allocate a budget to a set of hyperparameter configurations
● Evaluate the performance of all configurations
● Throw out the worst half
● Repeat until one configuration remains
The algorithm allocates exponentially more resources to more promising configurations (a minimal sketch of successive halving follows below).
Lisha Li et al. (2018) http://jmlr.org/papers/volume18/16-558/16-558.pdf
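A minimal sketch of successive halving as the bullets describe; train_for is a hypothetical stand-in for run_then_return_val_loss, and the learning-rate space is invented.

```python
# Minimal sketch: evaluate all configurations, keep the best half, double the
# budget, and repeat until one configuration remains.
import random

def successive_halving(train_for, n=16, budget=1):
    configs = [{"lr": 10 ** random.uniform(-4, -1)} for _ in range(n)]
    while len(configs) > 1:
        losses = [train_for(c, budget) for c in configs]  # evaluate all configs
        ranked = sorted(zip(losses, range(len(configs))))
        configs = [configs[i] for _, i in ranked[: len(configs) // 2]]  # best half
        budget *= 2  # survivors get exponentially more resources
    return configs[0]

# Toy usage: a fake training function whose loss shrinks with more budget.
best = successive_halving(lambda c, b: abs(c["lr"] - 0.01) / b)
print(best)
```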
12. Hyperband
● get_hyperparameter_configuration(n): returns a set of n i.i.d. samples from some distribution defined over the hyperparameter configuration space. Uniformly sample the hyperparameters from a predefined space (a hypercube with min and max bounds for each hyperparameter).
● run_then_return_val_loss(t, r): a function that takes a hyperparameter configuration t and a resource allocation r as input and returns the validation loss after training the configuration for the allocated resources.
● top_k(configs, losses, k): a function that takes a set of configurations as well as their associated losses and returns the top k performing configurations.
14. Finding the right hyperparameter configuration
Takeaways from Figure 2: more resources are needed to differentiate between the two configurations when either:
1. The envelope functions are wider
2. The terminal losses are closer together
Lisha Li et al. (2018) http://jmlr.org/papers/volume18/16-558/16-558.pdf
17. Experiment in the Paper
CNN used in Snoek et al. (2012) and Domhan et al. (2015)
Data-sets
● CIFAR-10 (40k, 10k, 10k)
● Rotated MNIST with Background images (MRBI)
(Larochelle et al., 2007) (10k, 2k, 50k)
● Street View House Numbers (SVHN) (600k, 6k, 26k)
18. Keras Tuner: Hyperparameter search
https://keras-team.github.io/keras-tuner/
Source code for Hyperband:
https://github.com/keras-team/keras-tuner/blob/master/kerastuner/tuners/hyperband.py
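A minimal usage sketch of the Hyperband tuner. Note the package has since been renamed keras_tuner, so details may differ from the version the slides link to; the model and search space below are invented for the example.

```python
# Minimal sketch: Hyperband search over a small Keras model with Keras Tuner.
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(hp.Int("units", 32, 512, step=32), activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.Hyperband(build_model, objective="val_accuracy", max_epochs=30, factor=3)
(x, y), _ = keras.datasets.mnist.load_data()
tuner.search(x / 255.0, y, epochs=30, validation_split=0.2)
print(tuner.get_best_hyperparameters(1)[0].values)
```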