15-618 Parallel Computer Architecture and Programming 
Meta Machine Learning: Hyperparameter optimization 
https://pbollimp.github.io/Meta-Machine-Learning/ 
Priyatham Bollimpalli (pbollimp) 
Mohit Deep Singh (mohitdes) 
 
Abstract
In this project, we implemented various hyperparameter optimization algorithms, primarily for 
deep neural networks (multi-layer perceptrons). We used OpenMP as the baseline for grid 
search, random search, and a simple evolutionary algorithm. We then implemented three 
variants of evolutionary algorithms of our own using pthreads that not only give up to 9x 
speedups on 16 cores and scale well with increasing numbers of threads, but also explore a 
much larger hyperparameter space and find hyperparameters that give good accuracy much 
faster than the baselines. Even though our experiments are specific to MLPs, we are confident 
that these methods apply to hyperparameter optimization for any machine learning problem. 
We also performed an extensive breakdown of the execution times for our algorithms and 
reasoned about the performance of each. 
 
Background
The task of Hyperparameter search  
In the recent past, deep learning has been incredibly successful at tasks such as image 
classification and tracking, machine translation, and speech recognition. One of the biggest 
challenges faced by machine learning researchers today (especially in deep learning) is 
selecting hyperparameters for their models. A hyperparameter is a parameter whose value is set 
before the learning process starts. In contrast, model parameters are optimized as part of the 
learning process. 
 
Hyperparameters usually have a huge impact on how long a model takes to train and how well 
it ultimately performs. Because they are so often critical to model performance, researchers 
need to find good values for them. In practice today, most researchers manually tweak the 
hyperparameters and train the networks to see which settings give the best results. This process 
is extremely tedious and time consuming, and automating it is called hyperparameter 
optimization. The main approaches that exist today include: 
 
1. Grid Search: This involves exhaustively searching through the space of hyperparameters, 
evaluating how each set of hyperparameters performs in training, and choosing the best 
combination. This approach is extremely heavyweight, since it performs a brute-force 
search over the space of hyperparameters. The advantage is that it parallelizes well, and 
given enough resources (and good intuition), one usually ends up with the best set of 
hyperparameters. 
2. Random Search: This approach randomly samples hyperparameter sets from the search 
space and keeps searching until it reaches the desired accuracy. It is very successful in 
practice, since it does not exhaustively search the entire space and still gets good results. 
3. Bayesian Optimization: This approach uses the hyperparameter sets already tested to 
learn where to sample better sets to evaluate next. Evolutionary search and Gaussian 
processes are the most common Bayesian optimization techniques. The biggest challenge 
is that the optimization step makes these methods inherently sequential and harder to 
parallelize. 
Evolutionary Search Algorithm 
In the artificial intelligence literature, an evolutionary search algorithm is a generic 
population-based metaheuristic optimization algorithm. An evolutionary strategy uses methods 
inspired by evolution, such as mutation, reproduction, recombination, and selection. The 
high-level idea is that, given a population of individuals, we have a fitness function to rank 
which individuals are the best performers. Using this function, we rank the individuals, kill a 
certain fraction of the weakest part of the population, and have the stronger part of the 
population reproduce and mutate to arrive at better solutions. 
 
In our case, individual hyperparameter sets are the candidates in a population, and the 
accuracy of a neural network trained with those hyperparameters is our fitness function. The 
algorithm's inputs are the population size, an initial random population of candidates, and the 
mutation and crossover mechanisms. The output is the candidate with the highest accuracy 
(fitness). Mutations can vary heavily with the application. In our particular case, we randomly 
chose 3 candidates from the pool of the strongest candidates, took the average of their 
hyperparameters, and added some noise to encourage the algorithm to explore more. 
Initial Analysis: Workload and Data Structures 
The main components of our program are the neural network (training and inference), the 
dataset the network is trained on, and the set of hyperparameters we are searching over. In the 
algorithms described above, the computationally expensive part is training and evaluating 
multiple neural networks to get the accuracies of different parameter sets. Parallelizing this 
gives us huge benefits, since we can test the fitness of multiple candidates in parallel. 
There are a few dependencies in this program. First, for evolutionary search algorithms, we 
want to search the hyperparameter space carefully, regularly using past results to restrict the 
future search space and get better results. This requires us to frequently synchronize all parallel 
threads, evaluate the current population, rank it, and mutate it. Furthermore, depending on 
which parameters we are searching and what their values are, we can end up with very skewed 
workloads, where some hyperparameter settings take much longer to train and evaluate than 
others. These algorithms therefore benefit greatly from data parallelism in the first few 
approaches, where we evaluate the fitness of the population in parallel, synchronize, and then 
parallelize the mutations as well. 
The key data structures are the neural networks, which we cannot share (since we need to train 
each network individually through backpropagation to evaluate how well its hyperparameters 
perform), and the array of all candidates of a population (or of multiple populations). The main 
operations on this array are updating accuracies, ranking candidates against each other, killing 
the weakest candidates, and mutating and reproducing the strongest ones to produce new 
candidates. 
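For concreteness, a minimal sketch of what such a candidate array and its ranking/culling operations might look like is shown below. The field names and helpers (Candidate, Population, rank_and_cull) are illustrative, not taken from the actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One candidate hyperparameter set plus its measured fitness (illustrative fields).
struct Candidate {
    int hidden_units;      // example hyperparameters being searched over
    int num_layers;
    double learning_rate;
    double fitness;        // validation accuracy, filled in after training
};

using Population = std::vector<Candidate>;

// Rank candidates by fitness (descending) and discard the weakest k.
void rank_and_cull(Population &pop, std::size_t k) {
    std::sort(pop.begin(), pop.end(),
              [](const Candidate &a, const Candidate &b) {
                  return a.fitness > b.fitness;
              });
    if (k < pop.size()) pop.resize(pop.size() - k);
}
```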
Main Challenge 
 
Hyperparameter search is a very tedious task, especially since we need to train multiple neural 
networks from scratch with different hyperparameter configurations. As networks become more 
complex and the number of hyperparameters grows, the task becomes harder still: the search 
space increases exponentially, and the more applicable methods, such as Bayesian optimization 
techniques, are difficult to parallelize. 
The main challenge in parallelizing these algorithms was not just speeding them up, but also 
striking the right tradeoff between accuracy and how quickly and effectively we can search the 
hyperparameter space. Any Bayesian optimization has an inherently sequential part, where we 
use past results to come up with better estimates of the hyperparameters. We first tried 
parallelizing the algorithms in phases. We then aimed to optimize that sequential part as well, 
trading off the quality of the Bayesian optimization for a faster, albeit slightly less accurate, 
search. 
 
Approach 
In this section, we describe the approaches and techniques we used to parallelize the algorithms. 
Grid Search (GS) 
The high-level algorithm is as follows: 
 
For grid search, we loop through all possible configurations of hyperparameters. In our case, the 
fitness function trains a neural network and measures its accuracy on the validation set. 
Parallelizing grid search was mostly straightforward, other than lines 5 and 6 of the algorithm. 
We created an array that keeps track of the accuracies of all configurations and parallelized the 
fitness computation for each possible configuration. At the end, we take the maximum accuracy 
and select that configuration. Since this does not scale well with the size of the hyperparameter 
space, we broke the parameter space into chunks, computed the fitness of each chunk in 
parallel, and incrementally kept track of the most accurate configuration. 
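A minimal sketch of this parallel grid search is shown below, assuming a hypothetical train_and_evaluate() fitness function and the Candidate type from the earlier sketch; the chunked variant simply calls this on one slice of the grid at a time.

```cpp
#include <omp.h>
#include <cstddef>
#include <vector>

// Hypothetical fitness function: trains an MLP with configuration c and
// returns its accuracy on the validation set.
double train_and_evaluate(const Candidate &c);

// Evaluate every configuration in parallel, then reduce to the best one.
// Dynamic scheduling helps because configurations with large hidden layers
// train much more slowly than small ones.
Candidate grid_search(const std::vector<Candidate> &grid) {
    std::vector<double> acc(grid.size(), 0.0);

    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)grid.size(); ++i)
        acc[i] = train_and_evaluate(grid[i]);

    // Sequential reduction: pick the configuration with the highest accuracy.
    std::size_t best = 0;
    for (std::size_t i = 1; i < grid.size(); ++i)
        if (acc[i] > acc[best]) best = i;

    Candidate result = grid[best];
    result.fitness = acc[best];
    return result;
}
```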
Random Search (RS)
The high-level algorithm is as follows:
As can be seen in the algorithm, random search randomly samples hyperparameters from their 
search space and tests them for fitness. It keeps track of the best parameter configuration at 
any given point, and once the fitness exceeds a threshold, it stops searching the parameter 
space. Parallelizing this was not entirely straightforward; our strategy was to create batches of 
threads that randomly sample from the hyperparameter space and report their accuracies to an 
array in global memory. Every few chunks, the master thread reads the best accuracy from this 
array, and if it exceeds the threshold it stops the program; otherwise it keeps spawning new 
blocks of threads to evaluate other random configurations. In practice this is always much faster 
than grid search, and the parallelized version usually explores the space more efficiently and 
reaches the desired accuracy quite quickly. 
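A sketch of this batched parallel random search is given below, again with hypothetical sample_random_candidate() and train_and_evaluate() helpers; the batch size and seeding scheme are illustrative.

```cpp
#include <omp.h>
#include <random>
#include <vector>

// Hypothetical helpers assumed for this sketch: sample_random_candidate()
// draws a configuration uniformly from the search space, and
// train_and_evaluate() returns its validation accuracy.
Candidate sample_random_candidate(std::mt19937 &rng);
double train_and_evaluate(const Candidate &c);

Candidate random_search(double threshold, int batch_size, int max_batches) {
    Candidate best{};          // best configuration seen so far (fitness starts at 0)

    for (int b = 0; b < max_batches && best.fitness < threshold; ++b) {
        std::vector<Candidate> batch(batch_size);

        // Each thread samples and evaluates candidates independently and
        // writes the results into its own slots of the shared batch array.
        #pragma omp parallel
        {
            std::mt19937 rng(1234u * (b + 1) + omp_get_thread_num());
            #pragma omp for schedule(dynamic)
            for (int i = 0; i < batch_size; ++i) {
                batch[i] = sample_random_candidate(rng);
                batch[i].fitness = train_and_evaluate(batch[i]);
            }
        }

        // The master checks the batch and stops once the threshold is hit.
        for (const Candidate &c : batch)
            if (c.fitness > best.fitness) best = c;
    }
    return best;
}
```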
 
Evolutionary Search - Basic Algorithm
 
As can be seen in the algorithm, basic evolutionary search leverages past evaluations of the 
fitness function on explored parameters to optimize over the search space more effectively. 
The idea is the following. We start with a population of candidate hyperparameter sets and 
evaluate all of them. After evaluating them, we rank all candidates in the population by fitness 
and kill/discard the last k (least fit) candidates. We then generate new children using the 
top-performing hyperparameters. Our mutation works as follows: we randomly sample 3 of the 
top-performing hyperparameter sets and take their average, then add uniform noise to these 
averages to encourage the algorithm to explore. 
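The sketch below shows what one such generation might look like, building on the Candidate, Population, and rank_and_cull definitions from the earlier sketch; the noise range is illustrative.

```cpp
#include <cstddef>
#include <random>

// One generation of the basic evolutionary search: rank, cull the weakest k,
// then refill the population by averaging three random survivors and adding
// uniform noise (the mutation rule described in the text).
void evolve_one_generation(Population &pop, std::size_t kill_k,
                           std::mt19937 &rng) {
    rank_and_cull(pop, kill_k);          // sort by fitness, drop the weakest k

    // pick selects only among the survivors (the first pop.size() entries).
    std::uniform_int_distribution<std::size_t> pick(0, pop.size() - 1);
    std::uniform_real_distribution<double> noise(-0.1, 0.1);

    for (std::size_t i = 0; i < kill_k; ++i) {
        const Candidate &a = pop[pick(rng)];
        const Candidate &b = pop[pick(rng)];
        const Candidate &c = pop[pick(rng)];
        Candidate child;
        child.hidden_units  = (a.hidden_units + b.hidden_units + c.hidden_units) / 3;
        child.num_layers    = (a.num_layers + b.num_layers + c.num_layers) / 3;
        child.learning_rate = (a.learning_rate + b.learning_rate + c.learning_rate) / 3.0
                              * (1.0 + noise(rng));   // noise encourages exploration
        child.fitness = 0.0;                          // evaluated in the next generation
        pop.push_back(child);
    }
}
```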
Parallel Evolutionary Search 1 (EV1) 
Our first approach was to parallelize the basic evolutionary search algorithm, starting with the 
evaluation of the fitness functions of the individual candidates. We parallelized this with 
OpenMP, creating threads to divide up the work. We noticed that with certain hyperparameters, 
such as the number of hidden units, workload imbalance was very high, and OpenMP static 
scheduling did not give us good speedups. We therefore switched to dynamic scheduling, and 
despite the remaining sequential code (which ranks the population and mutates it), we got 
decent speedups (discussed in the results section). 
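The core of EV1's parallel fitness evaluation can be sketched as follows, reusing the Population type and the hypothetical train_and_evaluate() from the earlier sketches.

```cpp
#include <omp.h>

// EV1 fitness evaluation sketch: the candidates of one population are trained
// and scored in parallel. schedule(dynamic) absorbs the workload imbalance,
// e.g. candidates with 512 hidden units taking far longer than those with 32.
void evaluate_population(Population &pop) {
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)pop.size(); ++i)
        pop[i].fitness = train_and_evaluate(pop[i]);
}
```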
Evolutionary Search with Islands - Our adaptation of the basic algorithm 
The earlier algorithm does well as long as the dimensionality of the hyperparameter space is not 
too high, but it is not ideal when we search over many hyperparameters. Because of random 
sampling, if it does not start with good enough points, basic evolutionary search can get stuck in 
local optima. It helps to create multiple local evolutionary searches with specific starting points 
and have them communicate their best parameters to each other after a few generations of 
local optimization. We call this variant the "island" approach: we have islands of local 
populations that intermingle every few iterations. 
The basic idea is that we keep n different "islands", each with its own population of candidate 
hyperparameters. Each island separately carries out evolutionary search on its candidate 
population for a few generations; that is, each island evaluates its candidates, ranks them within 
its population, and kills and repopulates the weakest members. After a few iterations, we take 
the top-k candidates from each island, merge them, and apply a global evolution step: the 
merged candidates are sorted, and the weakest in this global population are killed and 
repopulated using the mutations explained in the evolutionary search section. After this, we 
reseed the local populations with these globally mutated seeds, plus some random seeds, so 
that the search explores the high-dimensional space better. 
 
 
We explain how we parallelized this algorithm in the next few sections. 
Parallel Evolutionary Search 2 (EV2) 
We wanted to parallelize the island approach so we could scale better and search dense 
hyperparameter spaces more efficiently. We decided to use a fork-join model with shared 
memory to implement this parallel approach. 
The idea is that the global top candidates are stored in shared memory. The master spawns the 
requisite number of threads, and every thread acts as an island. Every island reads its chunk of 
"seed" candidates from the global top candidates (which the master shuffles after mutation). 
Each thread then runs a certain number of iterations of the evolutionary search algorithm 
locally. After running a few local generations, each island reports its top-k candidates to the 
global array. The master waits for all threads to complete their local "generations" using join. 
Once every thread finishes, the master sorts the reported local winners in the global top array 
and applies a "global evolution" to it. The master then shuffles this global array and spawns 
local threads again to run more iterations of the local evolutionary searches. 
We chose to parallelize over islands rather than over individual candidates, so that it is easier to 
do a global intermingling of the best candidates. 
Parallel Evolutionary Search 3 (EV3) 
Independently of EV2, we also tried a threading model where, instead of using join every time, 
we use a pool of threads with shared variables and mutexes for synchronization between them. 
The idea was to compare the performance of the two and see which would do better. 
 
We used a similar model for splitting the islands between the threads. We have a shared count 
variable, a mutex per thread, and a mutex for the count variable. The master thread creates a 
thread pool of the specified size and sets the count variable to the number of threads. Each 
thread works in an infinite loop: at every iteration it locks its respective mutex, reads the 
requisite global "seed" candidates, and runs its local evolutionary search. When an individual 
thread is done, it decrements the count variable by one and loops back to try grabbing its own 
lock again (where it waits). The master thread waits for the count variable to reach 0; when it 
does, it sorts the global tops array, applies the global mutations, shuffles the array, and resets 
the count variable to the number of threads. The master then unlocks the respective mutex for 
each thread so they can start running again. This runs until either the master thread finds an 
accuracy above the threshold or an upper limit of iterations is reached. Once the stopping 
condition is hit, the master signals the other threads to quit through a global variable. 
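The synchronization structure can be sketched as follows. This is an approximation rather than the exact code: each per-thread mutex is paired with a condition variable so the master/worker hand-off is race-free, and the evolutionary-search work itself is elided.

```cpp
#include <pthread.h>

enum { NUM_THREADS = 16 };

pthread_mutex_t worker_mtx[NUM_THREADS];
pthread_cond_t  worker_cv[NUM_THREADS];
bool            go[NUM_THREADS];        // master sets this to start a round
bool            quit[NUM_THREADS];      // master sets this at the stopping condition

pthread_mutex_t count_mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  count_cv  = PTHREAD_COND_INITIALIZER;
int             pending   = 0;          // workers still running this round

void *ev3_worker(void *arg) {
    long id = (long)arg;                // thread index passed through the void*
    while (true) {
        pthread_mutex_lock(&worker_mtx[id]);
        while (!go[id] && !quit[id])
            pthread_cond_wait(&worker_cv[id], &worker_mtx[id]);
        bool quitting = quit[id];
        go[id] = false;
        pthread_mutex_unlock(&worker_mtx[id]);
        if (quitting) break;

        // ... read this island's seed candidates from the global array and
        //     run a few generations of local evolutionary search ...

        pthread_mutex_lock(&count_mtx);
        if (--pending == 0) pthread_cond_signal(&count_cv);  // last worker wakes the master
        pthread_mutex_unlock(&count_mtx);
    }
    return nullptr;
}

// Master side of one global round: reset the count, release every worker,
// then wait until all of them report completion before sorting, mutating,
// and shuffling the global array.
void ev3_master_round() {
    pthread_mutex_lock(&count_mtx);
    pending = NUM_THREADS;
    pthread_mutex_unlock(&count_mtx);
    for (long i = 0; i < NUM_THREADS; ++i) {
        pthread_mutex_lock(&worker_mtx[i]);
        go[i] = true;
        pthread_cond_signal(&worker_cv[i]);
        pthread_mutex_unlock(&worker_mtx[i]);
    }
    pthread_mutex_lock(&count_mtx);
    while (pending > 0) pthread_cond_wait(&count_cv, &count_mtx);
    pthread_mutex_unlock(&count_mtx);
    // ... sort the global tops array, apply the global mutation, shuffle ...
}

// Pool creation: initialize the per-thread state and spawn the workers once.
void ev3_start_pool(pthread_t tids[NUM_THREADS]) {
    for (long i = 0; i < NUM_THREADS; ++i) {
        pthread_mutex_init(&worker_mtx[i], nullptr);
        pthread_cond_init(&worker_cv[i], nullptr);
        go[i] = quit[i] = false;
        pthread_create(&tids[i], nullptr, ev3_worker, (void *)i);
    }
}
```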
 
The motivation was that pthread forks and joins may have extra overhead, and using mutexes 
might avoid it. As discussed in the results section, this ended up being slower than EV2 for 
smaller numbers of threads, mostly because of the synchronization overhead from the mutexes 
and shared variables. At 16 threads, however, it outperformed EV2, because of EV2's overhead 
of spawning new threads every iteration. 
 
Finally, we wanted to reduce the time the island program spends in its sequential section. The 
approach we took is described below. 
 
Parallel Evolutionary Search 4 (EV4) 
In the last couple of approaches we noticed a few things. First, the workload distribution 
sometimes ends up skewed, causing some threads to sit idle waiting for other threads to catch 
up. Second, the global evolution happens sequentially, which is a definite bottleneck for further 
speedups. 
 
For this, we went back to the drawing board and realized that the whole point of the island 
approach is to have different local evolutionary populations communicate and intermingle, 
encouraging more random exploration and ultimately better evolution. We realized that this 
communication does not need to be exact (or precisely timed), and we could do it 
asynchronously. We built this on top of EV2 but removed the synchronization, so every thread, 
including the master, works exclusively on its local population. We created three shared arrays 
of global populations. The first is the "read array", where the master places the initial "seed" for 
all islands. The second is the "write array", where all threads report the best candidates from 
their populations. The third is a buffer array, used for intermediate computation. 
 
The master first initializes the "read array" and then spawns all other threads. Each thread runs 
the local evolutionary search algorithm until either the master signals it to stop (through a 
shared variable) or it hits a certain number of iterations. The master also runs a local 
evolutionary search, but every few iterations it performs the global sorting and mutation step 
without blocking the other threads: it copies the "write array" results to the buffer, sorts the 
buffer, applies the global evolutionary step to it, and shuffles it. It then swaps the pointers of the 
read array and the buffer, so the threads now read the newly evolved array. This lets all threads 
operate asynchronously; the only drawback is that the intermingling does not happen at exactly 
the same time for all threads. This is acceptable, because the main goal is to exchange 
information between islands, and making this asynchronous did not cost us much accuracy. 
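A sketch of this three-array hand-off is shown below, reusing the earlier Candidate/Population/evolve_one_generation sketches. The names and the atomic pointer swap are illustrative, and reclamation of the old read buffer is deliberately glossed over (a full implementation must ensure a retired buffer is not overwritten while a worker may still be copying from it).

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <random>
#include <vector>

// Three shared population arrays: workers copy seeds from the read array and
// publish their winners to the write array; the master periodically snapshots
// the write array into the buffer, evolves it, and swaps the read pointer.
std::vector<Candidate> storage_a, storage_b, storage_c;
std::atomic<std::vector<Candidate> *> read_array{&storage_a};
std::vector<Candidate> *write_array = &storage_b;
std::vector<Candidate> *buffer      = &storage_c;

// Worker side: grab the current seeds without blocking. They may be slightly
// stale, which is the freshness tradeoff accepted in the text.
Population take_seeds() { return *read_array.load(); }

// Master side, run every few of its own local generations, without blocking
// the worker threads.
void master_global_step(std::mt19937 &rng, std::size_t kill_k) {
    *buffer = *write_array;                       // snapshot the reported winners
    evolve_one_generation(*buffer, kill_k, rng);  // sort, cull, and mutate (earlier sketch)
    std::shuffle(buffer->begin(), buffer->end(), rng);
    buffer = read_array.exchange(buffer);         // publish; reuse the old read array next time
}
```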
 
The only problem we faced was that the master thread now had extra work, so it lagged severely 
behind, causing the other threads to work on stale data for much longer. As a fix, we significantly 
reduced the population size of the master's local evolutionary search; this way, the master was 
always either ahead of the other threads (which did us no harm) or at par with them. We kept 
executing the master until either it hit the accuracy threshold or some other thread exited, which 
signaled all other threads to quit via a shared variable. This way, we completely removed 
sequential code from our algorithm. 
 
We did trade off a little freshness in the global evolution, but since the local evolutions still follow 
the process correctly, this did not cause a big drop in accuracy. We just wanted information to 
flow between threads so that the search explores a larger space and does not get stuck in local 
optima, which this model achieves. It gave us further speedups, as shown in the results section. 
Parallel Evolutionary Search 5 (EV5)
For this approach we used the same model as EV4, but we also parallelized the part where the 
master sorts and evolves the global array, using an OpenMP parallel for loop over the evolutions 
carried out by the master thread. 
 
In addition, we padded the three global arrays to cache-line size to prevent false sharing. This is 
our best-performing model and gave us the most speedup. 
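A sketch of the padding and of the parallelized global mutation is given below. The 64-byte line size is an assumption (it holds on the Xeon instances used here), and the struct and function names are illustrative; the mutation rule is the same as in the serial sketch.

```cpp
#include <omp.h>
#include <cstddef>
#include <random>
#include <vector>

// Pad each slot of the shared arrays to a cache-line boundary so that threads
// writing neighbouring slots do not false-share a line.
struct alignas(64) PaddedCandidate {
    Candidate c;     // alignas rounds sizeof(PaddedCandidate) up to 64 bytes
};

// Parallel version of the master's global mutation: each discarded slot is
// refilled by a different OpenMP thread, averaging three surviving candidates
// and adding noise. The survivors are assumed to occupy the front of the
// already-sorted array.
void parallel_global_mutation(std::vector<PaddedCandidate> &global,
                              std::size_t survivors, unsigned base_seed) {
    #pragma omp parallel for schedule(static)
    for (long i = (long)survivors; i < (long)global.size(); ++i) {
        std::mt19937 rng(base_seed + (unsigned)i);      // per-slot RNG, nothing shared
        std::uniform_int_distribution<std::size_t> pick(0, survivors - 1);
        std::uniform_real_distribution<double> noise(-0.1, 0.1);
        const Candidate &a = global[pick(rng)].c;
        const Candidate &b = global[pick(rng)].c;
        const Candidate &d = global[pick(rng)].c;
        Candidate child;
        child.hidden_units  = (a.hidden_units + b.hidden_units + d.hidden_units) / 3;
        child.num_layers    = (a.num_layers + b.num_layers + d.num_layers) / 3;
        child.learning_rate = (a.learning_rate + b.learning_rate + d.learning_rate) / 3.0
                              * (1.0 + noise(rng));
        child.fitness = 0.0;
        global[i].c = child;            // each write lands on its own cache line
    }
}
```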
Experimental Setup
Tiny DNN 
Our initial approach was to use a highly optimized neural network library to run 
hyperparameter search on convolutional neural networks. We wanted to stick with C/C++ for 
our implementation, since it is much easier to parallelize than Python (which has much better 
frameworks for deep learning). 
 
We found a library called tiny-dnn, which provides a high-level abstraction of neural networks. It 
is a highly optimized, header-only library, but it requires C++14, which prevented us from 
running it on the GHC machines. We implemented most of our techniques on top of tiny-dnn, 
but when we ran tests with multiple threads, we saw no performance improvements. We 
hypothesize that tiny-dnn performs its own optimizations with OpenMP and pthreads under the 
hood, and these clashed with our parallelization; the library was consuming most of the 
resources on the limited machines we used for our experiments. We could not profile it because 
we could not run it on GHC (due to C++14) and AWS blocked proper profiling on its instances. At 
this point, we decided to abandon this library and use a simpler DNN (a multi-layer perceptron) 
for training. 
MLP  
We eventually decided to either use an existing simple multi-layer perceptron codebase or 
reimplement our own version of a simple multi-layer perceptron with backpropagation. We 
ended up taking some inspiration and code from [2] and reimplemented parts of it to fit our 
needs. 
 
This meant we could not test our algorithms on very complicated networks (just MLPs), but 
given our resource constraints and how long even a single complicated network takes to train, 
it would have taken considerable time and resources to write the C++ code and train such 
networks to good accuracy. So we scoped the problem down to multi-layer perceptrons, which 
were easier to implement and adapt for hyperparameter search. We still spent a considerable 
amount of time getting a simple MLP working in C++, so we decided to focus the rest of our 
time on parallelizing hyperparameter search rather than building more complicated networks 
such as convolutional networks. 
Datasets 
We primarily tested our results on two datasets. We used the Iris dataset first because it is a 
rather small classification dataset (150 points, 3 labels), which made it very quick to train 
multiple networks in parallel with different hyperparameters to test speedups and results. Once 
we verified our approaches and got considerable speedups with Iris, we switched to the MNIST 
dataset, which is considerably larger (60,000 images). We initially wanted to report all results on 
CIFAR-10, but since we resorted to using MLPs, it was very hard to get any substantial accuracy 
on CIFAR, and that was throwing our hyperparameter search off. 
 
Machine Specifications 
All experiments were run on AWS c5.4xlarge instances, which have Intel Xeon Platinum 
8000-series processors with 16 virtual cores and 32 GB of memory. We also tested on the GHC 
machines, which have eight-core 3.2 GHz Intel Core i7 processors, and we observed similar 
speedups. 
Results and Analysis
Speedups 
Wall Clock Times 
The wall-clock times for the different algorithms are given here. Observations: 
1. GS, RS, and EV1 are faster because the parameter space they search is much smaller than 
for the other algorithms. We deliberately restricted them so we could measure speedups 
without wasting time on large hyperparameter spaces. 
2. EV3 takes the longest, since its synchronization cost is much higher than the other 
techniques'. 
3. For random search, we report the average of three runs; since the algorithm stops when it 
reaches a given accuracy threshold (75% for MNIST and 97% for Iris), its total execution 
time is much lower. 
4. EV4's execution time is low because it has near-zero synchronization cost. 
5. EV5's execution time is the lowest among EV2-EV5, since it benefits the most from 
parallelization while maintaining the same search space. 
Relative Speedups 
 
The relative speedups of the different algorithms on both datasets are given here. Observations: 
1. RS does not scale beyond a point: the time needed to reach a minimum good accuracy is 
already achieved with a certain number of threads (8 in this case), and more threads do 
not improve the execution time once we hit the steady state in the expected number of 
iterations to reach the accuracy threshold. 
2. EV3, EV4, and EV5 scale similarly, since their parallelization technique is similar. EV5 scales 
best, since it has no synchronization overhead. 
3. As the number of threads increases, the overhead of repeatedly creating pthreads grows 
for EV2, so its speedup at the end is not as good as the others'. EV3 creates a pool of 
threads once at the start and uses mutexes to synchronize; it did not achieve perfect 
speedup either, but it did better than EV2. 
4. EV1 does not scale well: contention between threads increases because they operate on 
the same array, leading to false sharing, and there is extra synchronization overhead due 
to some workload imbalance (explained in the next section). Moreover, the search space 
(the amount of work) is small, so more threads do not give ideal speedups. 
5. EV5 scales the best since it has no synchronization (building on EV4); additionally, we 
padded the shared structures to reduce false sharing and parallelized the global evolution 
on top of the parallel parameter search, which gives the best efficiency at the end. 
 
To summarize, the overhead of creating threads, the synchronization cost, the partly sequential 
work inherent in each algorithm (needed to reconcile results and pick the best 
hyperparameters), and other costs (like initial data loading) prevent us from achieving perfect 
speedups. This is shown in more detail for EV2, EV3, and EV4 in the 'Normalized Execution Time 
Breakdown' section. 
Scaling with different parameters  
This section gives the relative change in execution time within each algorithm. The experiments 
use the best execution time achieved with 16 threads for each algorithm. 
Dataset size
The relative execution time (degradation on this scale) when scaling the dataset size is shown 
here. Note that the dataset sizes are increased exponentially, so an exponential slowdown is 
observed (except for RS, since the time to reach the accuracy threshold becomes almost 
constant after a certain amount of data). Since the data is passed through the neural network 
regardless of the algorithm, we should observe the same proportional change in execution 
time: the forward pass, backward pass, and testing (on 10% of the dataset) all scale in exactly 
the same way for all the algorithms, which is why the same trend is observed. 
Number of Initial Hyperparameters
The relative execution time (degradation on this scale) when scaling the hyperparameter search 
size is shown here. There are several interesting observations. Note that the number of 
hyperparameter settings is also changed exponentially. 
1. Grid search is an exhaustive search, so an exponential increase in the hyperparameters 
gives an exponential increase in execution time. 
2. When the space of random hyperparameter sampling is larger, random search tends to 
find a good model and reach its accuracy threshold more quickly, so its runtime actually 
decreases. 
3. The whole idea of evolutionary search is to retain only the best hyperparameters. Even 
though we sample from a large hyperparameter space, we discard the bad candidates and 
only operate on the best ones, so the execution time stays roughly constant, with a slight 
increase due to the higher sampling cost over a larger range. 
Size of Neural Network
The relative execution time (degradation on this scale) when scaling the size of the neural 
network is shown here. As with the dataset-size experiment, the cost is the same across all the 
algorithms: training and testing times increase proportionally, and since the depth of the 
network increases exponentially from 1 to 8, so do the relative execution times. 
Workload distribution between the threads in the algorithms 
The primary difference between EV2 and EV5 is that in EV2 the pthreads are forked and joined 
repeatedly. A major problem with this is the unequal execution times between threads: for 
instance, a hidden layer of 512 units takes much longer to train and test than one of 32 units, 
which leaves several threads idle waiting for the long-running ones. In EV5, since the algorithm 
is modified to remove synchronization, all threads stay busy and still obtain the best result. In 
the figure above, for the 16-thread run, the 'blue' bars from EV5 are better distributed than the 
'orange' ones from EV2. This is also confirmed by the speedup and wall-clock time graphs, in 
which EV5 is shown to be the best. 
 
Note that all the results for GS, RS, and EV1 are reported with OpenMP dynamic scheduling, 
since a similar trend is observed. 
Normalized Execution Time Breakdown
For each algorithm, a detailed normalized execution time breakdown is given below. The 
relative trends and speedups in execution time were already explained above, but this graph 
shows exactly why they occur. 
EV3 has much more synchronization cost associated with it, since the threads stall far more 
often to acquire the mutexes. This cost adds up and grows quickly as the number of threads 
increases. 
EV2, in contrast, has some synchronization cost during the 'join' part of the pthreads, but it is 
almost constant, since the master executes a fixed number of joins (specific to the algorithm). 
For EV4 (and EV5, which is built on top of EV4), the lack of synchronization time makes them 
very efficient. 
The algorithms have a similar proportion of total parallel work (shown in orange) across threads, 
but we see speedups because this work is divided between the threads, and each thread has 
less to do as we increase the thread count. 
Moreover, we can see that the inherently sequential part of the algorithms prevents us from 
achieving perfect speedup. 
Accuracy vs. Time
For this chart, we set a threshold for each algorithm to stop searching the parameter space as 
soon as it hits 75% accuracy (a good accuracy on MNIST with an MLP). We had significantly 
constrained the parameter space for grid search and random search to prevent long runtimes, 
so these numbers are the proportion of time taken to reach the required accuracy relative to 
the total time taken (for an iteration-bounded or exhaustive search for that particular 
algorithm). 
 
As we can see, grid search needs to sweep all parameters before it can check the accuracies, so 
it takes the most time. Random search checks the accuracy threshold after every block of 
parallel execution, and it reaches 75% accuracy in about 20% of the time a longer sweep would 
take, with similar results. 
 
EV4 takes the least time, since it searches asynchronously while still sharing information across 
threads (islands). EV2 and EV3 are very close, and the difference can even be put down to the 
random initialization of the populations, but the point is that they reach a good solution 
significantly faster than grid or random search. EV3 is slightly higher due to its synchronization 
cost (explained in the previous section). 
Credit Distribution
The work was split equally between both partners. We worked together on figuring out the 
approaches, setting up the frameworks, and getting a basic version of the project working. We 
split the implemented algorithms equally and contributed equally to the analysis and the 
report. 
References
[1] https://github.com/tiny-dnn
[2] https://github.com/davidalbertonogueira/MLP/blob/master/README.md
[3] https://deepmind.com/blog/population-based-training-neural-networks/
[4] http://geco.mines.edu/workshop/aug2011/05fri/parallelGA.pdf
[5] https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/
Two images were taken from:
[6] https://www.slideshare.net/alirezaandalib77/evolutionary-algorithms-69187390
[7] https://www.slideshare.net/alirezaandalib77/evolutionary-algorithms-69187390

 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
 

Evolutionary Search Algorithm
In the artificial intelligence literature, an evolutionary search algorithm is a generic population-based metaheuristic optimization algorithm. It uses methods inspired by biological evolution, such as mutation, reproduction, recombination and selection. The high-level idea is that, given a population of individuals, a fitness function ranks which individuals perform best. Using this ranking, we discard a fraction of the weakest part of the population and let the stronger part reproduce and mutate to arrive at better solutions.

In our case, the individual hyperparameter sets are the candidates in a population, and the accuracy of a neural network trained with those hyperparameters is our fitness function. The inputs to the algorithm are the population size, an initial random population of candidates, and the mutation and crossover mechanisms; the output is the candidate with the highest accuracy (fitness). The mutations can vary heavily with the application. In our particular case, we randomly chose three candidates from the pool of the strongest candidates, took the average of their hyperparameters, and added some noise to encourage the algorithm to explore more.

Initial Analysis: Workload and Data Structures
The main components of our program are the neural network (training and inference), the dataset the network is trained on, and the set of hyperparameters we are searching over. In the algorithms described above, the computationally expensive part is training and evaluating multiple neural networks to obtain the accuracies of different parameter sets, so parallelizing this step gives the largest benefit: we can test the fitness of multiple candidates in parallel.

There are a few dependencies in this program. First, for evolutionary search, we want to search the hyperparameter space carefully, regularly using past results to restrict the future search space and obtain better results. This requires us to frequently synchronize all parallel threads in order to evaluate the current population, rank it and mutate it. Furthermore, depending on which hyperparameters we search over and what their values are, the workload can be very skewed: some hyperparameter settings take much longer to train and evaluate than others. These algorithms therefore benefit greatly from data parallelism in the first few approaches, where we evaluate the fitness of the population in parallel, synchronize, and then parallelize the mutations as well.

The key data structures are the neural networks, which we could not share (each network has to be trained individually through backpropagation to evaluate how well its hyperparameters perform), and the array of all candidates in a population (or in multiple populations). The main operations on this array are updating the accuracies, ranking the candidates against each other, killing the weakest candidates, and mutating and reproducing the strongest ones to produce new candidates.
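As a rough illustration of the candidate array and the mutation step described above, the sketch below (hypothetical names such as Candidate, NUM_PARAMS and noise_range; this is not our actual code) averages three randomly chosen candidates from the strongest part of the population and perturbs the result with uniform noise. Integer-valued hyperparameters such as the hidden-layer size would additionally need rounding and clamping to their valid range.

#include <cstdlib>
#include <vector>

// Hypothetical candidate: a fixed-length vector of real-valued hyperparameters
// plus the fitness (validation accuracy) of a network trained with them.
const int NUM_PARAMS = 4;
struct Candidate {
    double params[NUM_PARAMS];
    double fitness;
};

// Uniform noise in [-range, range].
static double rand_uniform(double range) {
    return (2.0 * std::rand() / RAND_MAX - 1.0) * range;
}

// Mutation: average three randomly chosen survivors and add noise so the search
// keeps exploring instead of collapsing onto a single point.
Candidate mutate(const std::vector<Candidate>& survivors, double noise_range) {
    Candidate child;
    for (int p = 0; p < NUM_PARAMS; p++) {
        double sum = 0.0;
        for (int k = 0; k < 3; k++)
            sum += survivors[std::rand() % survivors.size()].params[p];
        child.params[p] = sum / 3.0 + rand_uniform(noise_range);
    }
    child.fitness = 0.0;   // filled in later by training a network with child.params
    return child;
}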
Main Challenge
The task of hyperparameter search is very tedious, especially since we need to train multiple neural networks from scratch with different hyperparameter configurations. As networks become more complex and the number of hyperparameters grows, the task becomes harder still: the search space grows exponentially, and the more applicable methods, such as Bayesian optimization techniques, are difficult to parallelize.

The main challenge in parallelizing these algorithms was to speed up not just the algorithms themselves, but also to get the right tradeoff between accuracy and how fast and effectively we can search the hyperparameter space. Any Bayesian optimization has an inherently sequential part, where past results are used to come up with better estimates of the hyperparameters. We first parallelized the algorithms in phases. We then aimed to optimize that sequential part as well, trading off some quality in the Bayesian optimization for a faster, albeit slightly less accurate, search.

Approach
In this section, we describe the approaches and techniques we used to parallelize the algorithms.

Grid Search (GS)
The high-level algorithm is simple: loop through every possible configuration of hyperparameters and evaluate its fitness, which in our case means training a neural network and measuring its accuracy on the validation set. Parallelizing grid search was mostly straightforward, apart from the update of the best configuration found so far. We created an array that keeps track of the accuracies of all configurations and parallelized the fitness calculation for each configuration; at the end, we take the maximum accuracy and select that configuration. Since this does not scale with the size of the hyperparameter space, we broke the parameter space into chunks, compute the fitness of each chunk in parallel, and incrementally keep track of the most accurate configuration.
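A minimal sketch of this chunked parallelization is shown below. Config and evaluate_fitness are hypothetical stand-ins (the stub does not train a network) rather than our actual interfaces.

#include <omp.h>
#include <vector>

// Hypothetical hyperparameter configuration.
struct Config { double learning_rate; int hidden_units; int epochs; };

// Stub: in the real program this trains an MLP with 'c' and returns validation accuracy.
double evaluate_fitness(const Config& c) { return 0.0; }

// Grid search over one chunk of the parameter space: every configuration is
// independent, so the fitness evaluations are a straightforward parallel for.
int grid_search_chunk(const std::vector<Config>& chunk, double& best_accuracy) {
    if (chunk.empty()) { best_accuracy = 0.0; return -1; }
    std::vector<double> accuracy(chunk.size(), 0.0);

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)chunk.size(); i++)
        accuracy[i] = evaluate_fitness(chunk[i]);

    // Short sequential pass to pick the best configuration in this chunk; the caller
    // keeps track of the best configuration across chunks incrementally.
    int best = 0;
    for (int i = 1; i < (int)chunk.size(); i++)
        if (accuracy[i] > accuracy[best]) best = i;
    best_accuracy = accuracy[best];
    return best;
}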
Random Search (RS)
Random search repeatedly samples hyperparameter configurations from the search space and tests their fitness. It keeps track of the best configuration seen so far, and once the fitness crosses a threshold it stops searching. Parallelizing this was not entirely straightforward; our strategy was to create batches of threads that each randomly sample from the hyperparameter space and report their accuracy to an array in global memory. Every few batches, the master thread reads the best accuracy from this array and, if it is over the threshold, stops the program; otherwise it keeps spawning new blocks of threads to evaluate further random configurations. In practice this is always much faster than grid search, and the parallelized version usually explores the space more efficiently and reaches the desired accuracy quickly.
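A sketch of this batched strategy, reusing the same hypothetical Config and evaluate_fitness stand-ins; sample_random_config and its ranges are made up for illustration.

#include <omp.h>
#include <cstdlib>
#include <vector>

struct Config { double learning_rate; int hidden_units; int epochs; };

// Stub: in the real program this trains an MLP and returns its validation accuracy.
double evaluate_fitness(const Config& c) { return 0.0; }

// Draw one configuration uniformly at random from the search space (illustrative ranges).
Config sample_random_config() {
    Config c;
    c.learning_rate = (std::rand() % 1000 + 1) / 1000.0;
    c.hidden_units  = 8 << (std::rand() % 6);      // 8, 16, ..., 256
    c.epochs        = 10 + std::rand() % 90;
    return c;
}

// Random search: evaluate batches of random configurations in parallel; after every
// batch the master checks the best accuracy so far and stops once it crosses the threshold.
Config random_search(double threshold, int batch_size, int max_batches) {
    Config best{};
    double best_acc = 0.0;
    for (int b = 0; b < max_batches && best_acc < threshold; b++) {
        std::vector<Config> batch(batch_size);
        std::vector<double> acc(batch_size, 0.0);
        for (int i = 0; i < batch_size; i++) batch[i] = sample_random_config();

        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < batch_size; i++)
            acc[i] = evaluate_fitness(batch[i]);

        for (int i = 0; i < batch_size; i++)
            if (acc[i] > best_acc) { best_acc = acc[i]; best = batch[i]; }
    }
    return best;
}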
Evolutionary Search - Basic Algorithm
The basic evolutionary search leverages past evaluations of the fitness function on explored parameters to optimize the search space more effectively. The idea is the following: we start with a population of candidate hyperparameter sets and evaluate all of them. We then rank the candidates by their fitness and discard the k least fit. Finally, we generate new children from the best-performing candidates. Our mutation works as described earlier: we randomly sample three of the top-performing candidates, take the average of their hyperparameters, and add uniform noise to the averages to encourage exploration.

Parallel Evolutionary Search 1 (EV1)
Our first approach was to parallelize the basic evolutionary search algorithm, starting with the evaluation of the fitness functions of the individual candidates. We parallelized this using OpenMP, creating threads to divide up the work. We noticed that for certain hyperparameters, such as the number of hidden units, the workload imbalance was very high and OpenMP static scheduling did not give good speedups. We therefore switched to dynamic scheduling and, despite the sequential code that ranks and mutates the population, obtained decent speedups (discussed in the results section).
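The sketch below shows one EV1 generation with the same hypothetical Candidate type and stubbed evaluate_fitness and mutate helpers used earlier: the fitness evaluations use a dynamically scheduled OpenMP loop, while ranking and mutation remain sequential.

#include <omp.h>
#include <algorithm>
#include <vector>

struct Candidate { double params[4]; double fitness; };

// Stubs: the real versions train an MLP / average three survivors plus noise (see above).
double evaluate_fitness(const Candidate& c) { return 0.0; }
Candidate mutate(const std::vector<Candidate>& survivors) { return survivors[0]; }

// One generation of EV1: dynamic scheduling because training time varies strongly
// with hyperparameters such as the hidden-layer size.
void ev1_generation(std::vector<Candidate>& population, int kill_count) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < (int)population.size(); i++)
        population[i].fitness = evaluate_fitness(population[i]);

    // Sequential part: rank, discard the weakest kill_count candidates, repopulate.
    std::sort(population.begin(), population.end(),
              [](const Candidate& a, const Candidate& b) { return a.fitness > b.fitness; });
    std::vector<Candidate> survivors(population.begin(), population.end() - kill_count);
    for (int i = (int)population.size() - kill_count; i < (int)population.size(); i++)
        population[i] = mutate(survivors);
}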
Evolutionary Search with Islands - Our adaptation of the basic algorithm
The basic algorithm does well as long as the dimensionality of the hyperparameter space is not too high. When the number of hyperparameters grows, however, it is no longer ideal: because of random sampling, if it does not start from good enough points, a basic evolutionary search gets stuck in local optima. It helps to create multiple local evolutionary searches with different starting points and have them communicate their best parameters to each other after a few generations of local optimization. We call this variant the "island" approach: islands of local populations that intermingle every few iterations.

The basic idea is that we keep n different islands, each with its own population of candidate hyperparameters. Each island separately carries out evolutionary search on its candidate population for a few generations, i.e., it evaluates its candidates, ranks them within its population, and kills and repopulates the weakest members. After a few iterations, we take the top-k candidates from each island, merge them, and apply a global evolution: the merged candidates are sorted, the weakest are killed, and they are repopulated with the mutations described in the evolutionary search section. We then reseed each local population with these globally mutated seeds, plus some random seeds, so that it can explore the high-dimensional space better. The next few sections describe how we parallelized this island scheme; first, a sketch of the global intermingling step itself.
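This is a compact sketch of the global intermingling step, with the same hypothetical Candidate type and a stubbed mutate; the split between survivors and discarded candidates is illustrative.

#include <algorithm>
#include <vector>

struct Candidate { double params[4]; double fitness; };

// Stub: the real version averages three random survivors and adds noise (see earlier sketch).
Candidate mutate(const std::vector<Candidate>& survivors) { return survivors[0]; }

// Global evolution: merge the top-k candidates reported by every island, rank them,
// discard the weakest half, refill by mutation, and return the result, which (after a
// shuffle) reseeds the islands along with some fresh random candidates.
std::vector<Candidate> global_evolution(const std::vector<std::vector<Candidate>>& island_tops) {
    std::vector<Candidate> merged;
    for (const std::vector<Candidate>& tops : island_tops)
        merged.insert(merged.end(), tops.begin(), tops.end());
    if (merged.size() < 2) return merged;

    std::sort(merged.begin(), merged.end(),
              [](const Candidate& a, const Candidate& b) { return a.fitness > b.fitness; });

    size_t keep = merged.size() / 2;     // illustrative: keep the stronger half
    std::vector<Candidate> survivors(merged.begin(), merged.begin() + keep);
    for (size_t i = keep; i < merged.size(); i++)
        merged[i] = mutate(survivors);   // repopulate the weak half
    return merged;
}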
Parallel Evolutionary Search 2 (EV2)
We wanted to parallelize the island approach in order to scale better and search dense hyperparameter spaces more efficiently, and we used a fork-join model with shared memory to implement it. The global top candidates are stored in shared memory. The master spawns the requisite number of threads, and each thread acts as an island: it reads its chunk of "seed" candidates from the global top candidates (which the master shuffles after mutation) and runs a certain number of iterations of the evolutionary search locally. Having run a few generations, each island reports its top-k candidates back to the global array. The master waits for all threads to finish their local generations using join; it then sorts the reported local winners in the global array, applies a "global evolution" to it, shuffles it, and spawns the local threads again for more iterations of local evolutionary search.

We chose to parallelize across islands rather than across individual candidates, since this makes it easier to do a global intermingling of the best candidates.
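The fork-join structure is sketched below with hypothetical types; the local search and the master's global evolution and shuffle are elided into comments.

#include <pthread.h>
#include <vector>

struct Candidate { double params[4]; double fitness; };

// Arguments for one island thread: a pointer to the shared global array of seeds /
// reported winners, and the chunk of that array this island owns.
struct IslandArg {
    std::vector<Candidate>* global_tops;
    int begin, end;
};

void* run_island(void* p) {
    IslandArg* arg = static_cast<IslandArg*>(p);
    // ... read seeds from (*arg->global_tops)[arg->begin .. arg->end), run a few local
    //     generations, and write this island's top candidates back to the same slots ...
    (void)arg;
    return nullptr;
}

void ev2(std::vector<Candidate>& global_tops, int num_islands, int rounds) {
    int chunk = (int)global_tops.size() / num_islands;
    std::vector<pthread_t> tids(num_islands);
    std::vector<IslandArg> args(num_islands);

    for (int r = 0; r < rounds; r++) {
        // Fork: one pthread per island.
        for (int t = 0; t < num_islands; t++) {
            args[t] = IslandArg{&global_tops, t * chunk, (t + 1) * chunk};
            pthread_create(&tids[t], nullptr, run_island, &args[t]);
        }
        // Join: wait for every island to report its local winners.
        for (int t = 0; t < num_islands; t++)
            pthread_join(tids[t], nullptr);

        // Sequential step by the master: sort the reported winners, apply the global
        // evolution (see earlier sketch), and shuffle before the next round of forks.
    }
}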
Parallel Evolutionary Search 3 (EV3)
Independently of EV2, we also implemented a threading model where, instead of forking and joining every iteration, we use a persistent pool of threads and synchronize them with shared variables and mutexes, in order to compare the performance of the two approaches. The islands are split between the threads in the same way as in EV2. We keep a shared count variable, a mutex per thread, and a mutex for the count variable. The master creates a thread pool of the specified size and sets the count variable to the number of threads. Each thread runs an infinite loop: at every iteration it locks its own mutex, reads its global "seed" candidates, and runs its local evolutionary search. When it finishes, it decrements the count variable and loops back to grab its own lock again (where it waits). The master waits for the count to reach zero; it then sorts the global tops array, applies the global mutations, shuffles the array, resets the count to the number of threads, and unlocks each thread's mutex so the threads can start the next round. This runs until either the master finds an accuracy above the threshold or an upper limit of iterations is reached, at which point the master signals the other threads to quit through a global variable.

The motivation was that pthread fork and join might have extra overhead, and reusing threads coordinated with mutexes might avoid it. As discussed in the results section, this ended up being slower than EV2 for smaller thread counts, mostly because of the synchronization overhead of the mutexes and shared variables; for 16 threads, however, it outperformed EV2 because EV2 pays the cost of spawning new threads every iteration.
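A simplified sketch of this round-based coordination follows. For compactness it uses a single shared mutex and condition variable instead of the per-thread mutex handoff we actually used, but the structure is the same: a persistent pool of island threads, a shared counter the workers decrement, and a master that waits for the counter to reach zero before running the global step and releasing the next round. The local search and global evolution are elided into comments.

#include <pthread.h>

pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  pool_cv   = PTHREAD_COND_INITIALIZER;
int  pending  = 0;      // islands still working in the current round
int  round_id = 0;      // bumped by the master to release the next round
bool done     = false;  // set by the master when the search should stop

void* island_worker(void*) {
    int seen_round = 0;
    while (true) {
        pthread_mutex_lock(&pool_lock);
        while (round_id == seen_round && !done)
            pthread_cond_wait(&pool_cv, &pool_lock);     // wait for the next round
        if (done) { pthread_mutex_unlock(&pool_lock); return nullptr; }
        seen_round = round_id;
        pthread_mutex_unlock(&pool_lock);

        // ... read global seed candidates, run a few local generations, report winners ...

        pthread_mutex_lock(&pool_lock);
        if (--pending == 0) pthread_cond_broadcast(&pool_cv);  // last island wakes the master
        pthread_mutex_unlock(&pool_lock);
    }
}

// The pool is created once (pthread_create with island_worker for each island) before this runs.
void ev3_master(int num_islands, int rounds) {
    for (int r = 0; r < rounds; r++) {
        pthread_mutex_lock(&pool_lock);
        pending = num_islands;
        round_id++;                                   // release the workers
        pthread_cond_broadcast(&pool_cv);
        while (pending > 0)
            pthread_cond_wait(&pool_cv, &pool_lock);  // wait for all islands to finish
        pthread_mutex_unlock(&pool_lock);

        // ... sort the global winners, apply the global evolution, shuffle (sequential) ...
    }
    pthread_mutex_lock(&pool_lock);
    done = true;                                      // tell the pool to exit
    pthread_cond_broadcast(&pool_cv);
    pthread_mutex_unlock(&pool_lock);
}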
Finally, we wanted to reduce the time the island algorithm spends in its sequential section. The approach we took is described next.

Parallel Evolutionary Search 4 (EV4)
In the previous approaches we noticed a few things. First, the workload distribution is sometimes skewed, which leaves some threads idle while they wait for others to catch up. Second, the global evolution happens sequentially, which is a definite bottleneck for further speedups.

We went back to the drawing board and recalled that the whole point of the island approach is to let different local evolutionary populations communicate and intermingle, encouraging more exploration and eventually better evolution. We realized that this communication does not need to be exact in time, so we can do it asynchronously. We built this on top of EV2 but removed the synchronization: every thread, including the master, works exclusively on its own local population. We created three shared global arrays. The "read array" is where the master places the initial seeds for all islands; the "write array" is where all threads report the best candidates from their populations; the third is a buffer array used for intermediate calculations.

The master first initializes the read array and then spawns the other threads. Each thread runs the local evolutionary search until either the master signals it to stop (via a shared variable) or it hits a fixed number of iterations. The master also runs a local evolutionary search, but every few iterations it performs the global sorting and mutation step without blocking the other threads: it copies the write array into the buffer, sorts the buffer, applies the global evolution to it, shuffles it, and then swaps the pointers of the read array and the buffer so the threads can read the newly evolved array. All threads therefore operate asynchronously; the only drawback is that the intermingling does not happen at exactly the same time for all threads. This is acceptable, because the main goal is simply to exchange information between islands, and making this asynchronous did not cost us much accuracy.

The one problem we faced was that the master now has extra work, so it lagged behind severely, causing the other threads to work on stale data for much longer. As a fix, we significantly reduced the population size of the master's local evolutionary search, so that the master was always either ahead of the other threads (which does no harm) or at par with them. The master keeps executing until it either hits the accuracy threshold or some other thread exits, which signals all threads to quit via a shared variable. In this way, we removed essentially all sequential code from the algorithm.

We did trade off a little freshness in the global evolution, but since the local evolutions still follow the process correctly, this did not cause a big drop in accuracy. The goal was simply to let information flow between threads so that we explore a larger space and avoid getting stuck in local optima, which this model achieves. It also gave us further speedups, as shown in the results section.
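This is a rough sketch of the three shared arrays and the master's non-blocking global step; names and sizes are illustrative. In line with the design above, reads of possibly stale or concurrently updated slots are tolerated rather than fully synchronized, so a production version would need more care than shown here.

#include <algorithm>
#include <atomic>

struct Candidate { double params[4]; double fitness; };

const int GLOBAL_SIZE = 64;                 // illustrative size of the global arrays
Candidate arr_a[GLOBAL_SIZE], arr_b[GLOBAL_SIZE], arr_c[GLOBAL_SIZE];

std::atomic<Candidate*> read_arr(arr_a);    // islands read (possibly stale) seeds from here
Candidate*              write_arr = arr_b;  // islands report winners here, one slot per island
Candidate*              buffer    = arr_c;  // master-only scratch space
std::atomic<bool>       stop(false);        // set when some thread hits the accuracy threshold

// The master's asynchronous global step: no other thread blocks while this runs.
void master_global_step() {
    // Snapshot whatever the islands have reported so far (stale entries are acceptable).
    std::copy(write_arr, write_arr + GLOBAL_SIZE, buffer);

    std::sort(buffer, buffer + GLOBAL_SIZE,
              [](const Candidate& a, const Candidate& b) { return a.fitness > b.fitness; });
    // ... kill the weakest entries, repopulate them by mutation, shuffle (as sketched earlier) ...

    // Publish the new seeds by swapping the read pointer with the buffer. Islands in the
    // middle of an iteration keep using the old seeds until their next read.
    Candidate* old_read = read_arr.exchange(buffer);
    buffer = old_read;
}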
Parallel Evolutionary Search 5 (EV5)
This approach uses the same model as EV4, but we additionally parallelized the part of the code where the master sorts and evolves the global array, using an OpenMP parallel for loop over the evolutions carried out by the master thread. In addition, we padded the three global arrays so that each slot occupies a full cache line, in order to prevent false sharing. This is our best-performing model and gave us the highest speedups.
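A sketch of the two EV5 changes, with illustrative names: a 64-byte cache line is assumed, and mutate_from is a stand-in for the mutation described earlier.

#include <omp.h>

// Pad each slot of the shared global arrays to a full cache line (64 bytes assumed)
// so that threads writing to neighbouring slots do not false-share a cache line.
struct alignas(64) PaddedCandidate {
    double params[4];
    double fitness;
};

// Stub: the real version averages three random survivors and adds noise.
PaddedCandidate mutate_from(const PaddedCandidate* survivors, int num_survivors) {
    (void)num_survivors;
    return survivors[0];
}

// EV5 also parallelizes the master's global evolution: after sorting the merged winners,
// repopulating the discarded slots is independent per slot, so it becomes a parallel for.
void parallel_global_repopulate(PaddedCandidate* merged, int size, int keep) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = keep; i < size; i++)
        merged[i] = mutate_from(merged, keep);   // survivors occupy merged[0 .. keep)
}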
Experimental Setup
Tiny DNN
Our initial approach was to use a highly optimized neural network library to run hyperparameter search on convolutional neural networks. We wanted to stick with C/C++ for our implementation, since it is much easier to parallelize than Python (which has better deep learning frameworks). We found tiny-dnn, a highly optimized library that provides a high-level abstraction of neural networks and can be linked into a project through header-only compilation. The library requires C++14, which prevented us from running it on the GHC machines. We implemented most of our techniques on top of tiny-dnn, but when we ran the tests on multiple threads we saw no performance improvement. We hypothesize that tiny-dnn performs its own optimizations with OpenMP and pthreads under the hood, which clashed with our parallelization and consumed most of the resources on the limited machines we were running our experiments on. We could not profile it, because we could not run it on GHC (due to C++14) and AWS blocked proper profiling on its instances. At this point, we decided to abandon the library and use a simpler network (a multi-layer perceptron) instead.

MLP
We decided to either use an existing simple multi-layer perceptron codebase or reimplement our own MLP with backpropagation. We ended up taking motivation and code from [2] and reimplementing parts of it to fit our needs. This meant we could not test our algorithms on very complicated networks (just MLPs), but given our resource constraints and the time it takes to train even a single complicated network, writing the C++ code and training such networks to good accuracies would have taken considerably more resources and time. We therefore scoped the problem down to multi-layer perceptrons, which were easier to implement and use for hyperparameter search. Even so, we spent a considerable amount of time getting a simple MLP working in C++, so we decided to focus the rest of our time on parallelizing hyperparameter search rather than building more complicated networks such as convolutional networks.

Datasets
We primarily tested on two datasets. We started with the Iris dataset, a small classification dataset (150 points, 3 labels), because it let us train many networks in parallel with different hyperparameters very quickly to verify speedups and results. Once we had verified our approaches and obtained considerable speedups on Iris, we switched to the MNIST dataset, which is considerably larger (60,000 images). We initially wanted to report all results on CIFAR-10, but since we were limited to MLPs it was very hard to reach any substantial accuracy on CIFAR, and that was throwing off our hyperparameter search.

Machine Specifications
All experiments were run on AWS c5.4xlarge instances, which have Intel Xeon Platinum 8000 series processors with 16 virtual cores and 32 GB of memory. We also tested on the GHC machines, which have eight 3.2 GHz Intel Core i7 cores, and observed similar speedups.
Results and Analysis
Speedups
Wall Clock Times
The wall-clock times for the different algorithms are shown here. Observations:
1. GS, RS and EV1 are faster because the parameter space they search is much smaller than that of the other algorithms; we deliberately restricted them so we could measure speedups without wasting time running them on large hyperparameter spaces.
2. EV3 takes the longest, since its synchronization cost is much higher than that of the other techniques.
3. For random search, the average of three runs is reported; since the algorithm stops when it reaches a given accuracy threshold (75% for MNIST and 97% for Iris), its total execution time is much lower.
4. EV4's execution time is low because it has near-zero synchronization cost.
5. EV5's execution time is the lowest among EV2-EV5, since it benefits the most from parallelization while maintaining the same search space.
Relative Speedups
The relative speedups of the different algorithms on both datasets are shown here. Observations:
1. RS stops scaling after a point: the time needed to reach a minimum good accuracy is already achieved with a certain number of threads (8 in this case), and additional threads do not improve execution time because we have reached the steady state in the expected number of iterations needed to hit the accuracy threshold.
2. EV3, EV4 and EV5 scale similarly, since their parallelization techniques are similar. EV5 scales best because it has no synchronization overhead.
3. As the number of threads increases, the overhead of repeatedly creating pthreads grows for EV2, so its speedup at higher thread counts is not as good as the others. EV3 creates a pool of threads once at the start and synchronizes with mutexes; it also did not achieve perfect speedup, but it did better than EV2.
4. EV1 does not scale well because contention between threads increases as they operate on the same array, leading to false sharing and extra synchronization overhead from workload imbalance (explained in the next section). Moreover, its search space (the amount of work) is small, so adding threads does not give ideal speedups.
5. EV5 scales the best since it has no synchronization (being built from EV4); additionally, we padded the shared structures to reduce false sharing, and it contains extra parallelism on top of the parallel parameter search, which gives it the best efficiency at high thread counts.

To summarize, the overhead of creating threads, the synchronization cost, the partly sequential work inherent in each algorithm (needed to reconcile results and pick the best hyperparameters), and other costs such as initial data loading prevent us from achieving perfect speedups. This is shown in more detail for EV2, EV3 and EV4 in the 'Normalized Execution Time Breakdown' section.
Scaling with different parameters
This section shows the relative change in execution time within each algorithm. All experiments use the best execution time achieved with 16 threads for each algorithm.

Dataset size
The relative execution time (degradation, in this scale) as a function of dataset size is shown here. The dataset sizes increase exponentially, so an exponential slowdown is observed (except for RS, where the time to reach the accuracy threshold becomes almost constant beyond a certain amount of data). Since the data is passed through the neural network regardless of which search algorithm is used, we expect the same proportional change in execution time for all of them: the forward pass, backward pass, and testing (on 10% of the dataset) are all done inside the neural network and scale identically, which is why the same trend is observed across algorithms.
Number of Initial Hyperparameters
The relative execution time (degradation, in this scale) as a function of the size of the hyperparameter search space is shown here; the number of hyperparameter settings is again increased exponentially. There are several interesting observations:
1. Grid search is an exhaustive search, so an exponential increase in the number of hyperparameter settings gives an exponential increase in execution time.
2. When the space random search samples from is larger, random search tends to find a good model and reach its accuracy threshold more quickly, so its runtime actually decreases.
3. The whole idea of evolutionary search is to retain only the best hyperparameters. Even though we sample from a larger hyperparameter space, we discard the bad candidates and operate only on the best ones, so the execution time remains roughly constant, with a slight increase due to the higher sampling cost over the larger range.
Size of Neural Network
The relative execution time (degradation, in this scale) as a function of the size of the neural network is shown here. As with dataset size, the cost is incurred equally by all algorithms: training and testing times increase proportionally, and since the depth of the network grows exponentially from 1 to 8 layers, so do the relative execution times.
Workload distribution between the threads in the algorithms
The primary difference between EV2 and EV5 is that in EV2 the pthreads are forked and joined repeatedly. A major problem with this is the unequal execution time across threads: for instance, a hidden layer of 512 units takes far longer to train and test than one with 32 units, which leaves several threads idle while they wait for the long-running ones. In EV5, since the algorithm is modified to remove synchronization, all threads stay busy. In the figure above, for the 16-thread run, the blue bars (EV5) are much more evenly distributed than the orange ones (EV2). This is also consistent with the speedup and wall-clock time graphs, where EV5 is shown to be the best.

Note that all results for GS, RS and EV1 are reported with OpenMP dynamic scheduling, since they show a similar trend.
Normalized Execution Time Breakdown
A detailed normalized execution time breakdown for each algorithm is given below. The relative trends and speedups were already explained above, but this graph shows exactly why they occur.
EV3 has much more synchronization cost associated with it, since its threads stall far more often to acquire mutexes; this time adds up and grows quickly as the number of threads increases.
EV2, in contrast, has some synchronization cost during the pthread joins, but it is almost constant since the master performs them a fixed number of times (specific to the algorithm).
For EV4 (and EV5, which is built on top of EV4), the lack of synchronization time makes them very efficient.
The algorithms have a similar proportion of total parallel work (shown in orange) across threads, but we see speedups because this work is divided between the threads, so each thread has less to do as the number of threads increases.
Finally, the inherent sequential part of each algorithm prevents us from achieving perfect speedup.
Accuracy vs. Time
For this chart, we set a threshold for each algorithm to stop searching the parameter space as soon as it hits 75% accuracy (a good accuracy on MNIST with an MLP). We had significantly constrained the parameter space for grid search and random search to avoid long runtimes, so these numbers are the proportion of time taken to reach the required accuracy relative to the total time taken (for an iteration-bound or exhaustive search with that particular algorithm).

Grid search needs to sweep all parameters before it can check the accuracies, so it takes the most time. Random search checks the accuracy threshold after every block of parallel execution and reaches 75% accuracy in about 20% of the time of the longer sweep, with similar results.

EV4 takes the least time, since it searches asynchronously while still sharing information across threads (islands). EV2 and EV3 are very close, and the difference can even be attributed to the random initialization of the populations; the key point is that they reach a good solution significantly faster than grid or random search. EV3 is slightly slower due to the synchronization cost involved (explained in the previous section).
Credit Distribution
The work was split equally between both partners. We worked together on figuring out the approaches, setting up the frameworks, and getting a basic version of the project working. We divided the algorithms we implemented equally and contributed equally to the analysis and the report.

References
[1] https://github.com/tiny-dnn
[2] https://github.com/davidalbertonogueira/MLP/blob/master/README.md
[3] https://deepmind.com/blog/population-based-training-neural-networks/
[4] http://geco.mines.edu/workshop/aug2011/05fri/parallelGA.pdf
[5] https://blog.floydhub.com/guide-to-hyperparameters-search-for-deep-learning-models/
Two images were taken from:
[6] https://www.slideshare.net/alirezaandalib77/evolutionary-algorithms-69187390
[7] https://www.slideshare.net/alirezaandalib77/evolutionary-algorithms-69187390