2. Neural Network Background
● A statistical model
○ Given Inputs, computes an Output
○ Inspired by the Human Brain
○ Consists of computational Units with Parameters
■ Uses complex nonlinear functions to achieve strong performance
○ Relies on computational power and dataset size
○ Backpropagation can be used to “train” the network
■ Works by comparing the current output to the target output in the dataset and modifying the network's parameters to increase accuracy (see the sketch below)
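As a concrete illustration, here is a minimal sketch of a small network trained by backpropagation. scikit-learn and the synthetic dataset are my assumptions; the deck does not name a library.

```python
# Minimal sketch: training a small neural network with backpropagation.
# scikit-learn and the synthetic data are assumptions, not from the deck.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# fit() runs backpropagation: it repeatedly compares the network's current
# outputs to the dataset labels and adjusts the parameters to reduce the error.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # accuracy on the training data
```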
3. Neural Network Hyperparameters
● Hyperparameters
○ Constants chosen before Training
● Regularisation Constant (Alpha)
○ Throttles the complexity of the trained model
○ Larger constant values result in simpler models
○ Is generally selected at random
● Size of Hidden Layers
○ Affects computational complexity, and a poor choice can hurt performance
○ Is generally a human decision
○ Larger layers increase computational expense
● Random selection and arbitrary decisions can lead to suboptimal performance and failure to reach the global minimum of the neural network's error (both hyperparameters appear in the sketch below)
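A hedged sketch of where these two hyperparameters live in code, assuming scikit-learn's MLPClassifier (the deck specifies no implementation):

```python
# Hypothetical placement of the two hyperparameters discussed above,
# assuming scikit-learn's MLPClassifier; both are fixed before training.
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
    alpha=1e-4,                   # regularisation constant: larger -> simpler model
    hidden_layer_sizes=(32, 32),  # hidden layer sizes: affect cost and capacity
)
```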
4. Q Learning Algorithm
● In order to tackle the problem of suboptimal
hyperparameters, a reinforcement learning algorithm can
be used
○ Q-learning is an algorithm that makes decisions given a state
○ The algorithm defines a Q-matrix that acts as a
decision matrix
■ The algorithm iteratively improves the Q-matrix
given a training dataset
○ Q-learning can use a dynamic Q-matrix that grows whenever a new state is encountered, so the number of states has no fixed limit (see the sketch below)
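A minimal sketch of tabular Q-learning with a dynamic Q-matrix, as described above. The action names, learning rate, and discount factor are illustrative assumptions, not values from the deck.

```python
# Minimal sketch of tabular Q-learning with a dynamic Q-matrix: the dict
# grows whenever an unseen state appears, so the state count is unbounded.
# Action names, learning rate and discount factor are illustrative assumptions.
from collections import defaultdict

ACTIONS = ["increase", "decrease", "keep"]
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})  # Q[state][action] -> value

def update(state, action, reward_value, next_state, lr=0.1, gamma=0.9):
    """Standard Q-learning update:
    Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values())
    Q[state][action] += lr * (reward_value + gamma * best_next - Q[state][action])
```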
5. Q Learning Algorithm
● Q-learning will be applied to the Neural Network Hyperparameter Selection task as follows:
○ The Q-Matrix will be a multidimensional array, depending on the number of
hyperparameters and chosen states
■ In this case, the chosen hyperparameter was the regularisation constant
■ The state will be defined as an array of the following:
● The current regularisation constant value
● The current bias-variance metrics
○ The reward will be computed as the F-score of the trained network (an accuracy metric; see the encoding sketch after this list)
○ When trained on multiple networks, the algorithm should be able to generalise to any
Neural Network
■ The goal will be to converge to Optimal Hyperparameters in shorter computational
time
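A hedged sketch of the state and reward encoding described above. Discretising the continuous bias-variance metrics into bins is my assumption; the deck only specifies what the state array contains.

```python
# Hedged sketch of the state/reward encoding above. Binning the continuous
# bias-variance metrics is an assumption; the deck only lists the state's contents.
def encode_state(alpha, bias, variance, n_bins=10):
    """Map (alpha, bias, variance) to a hashable state for the Q-matrix."""
    def bucket(x):
        # Clamp a metric assumed to lie roughly in [0, 1] into a discrete bin.
        return max(0, min(int(x * n_bins), n_bins - 1))
    return (round(alpha, 6), bucket(bias), bucket(variance))

def reward(f_score_value):
    """Per the deck, the reward is simply the network's F-score."""
    return f_score_value
```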
6. Hypothesis
● Q-learning, with hyperparameters and bias-variance metrics given as states and the F-score as reward, can be used to develop a general algorithm for tuning the hyperparameters of a neural network.
● General neural network algorithms will:
○ Eliminate the need for human intervention in neural network training
○ Maximise the capabilities of the neural network concept
7. Bias Variance Metrics and F-Score
● Bias and Variance metrics will be defined by the formulas on the slide (reconstructed in the sketch after this list)
○ They provide the intuition behind the Q-learning algorithm's decisions
○ Bias - how well the algorithm fits the training data (high bias = underfitting)
○ Variance - how well the algorithm generalises to unseen test data (high variance = overfitting)
○ Given these metrics, the Q-learning system is tasked with
scaling the regularisation constant
■ Increasing - Simpler system
■ Decreasing - Complex System
● F-score is a more rigorous alternative to accuracy
○ Handles skewed datasets more strictly
○ Eliminates the need for heavy pre-processing, such as rebalancing skewed classes
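The slide's formulas are not reproduced in this outline, so the sketch below reconstructs them from the definitions above: bias as training error, variance as the test-train error gap. Treat these exact formulas, and the use of scikit-learn, as assumptions.

```python
# Reconstruction of the slide's (unshown) formulas from the definitions above:
# bias as training error, variance as the test-train error gap. These exact
# formulas are assumptions, as is the use of scikit-learn.
from sklearn.metrics import f1_score

def bias_variance(model, X_train, y_train, X_test, y_test):
    train_err = 1.0 - model.score(X_train, y_train)  # high bias = underfitting
    test_err = 1.0 - model.score(X_test, y_test)
    return train_err, test_err - train_err           # (bias, variance)

def f_score(model, X, y):
    """Harmonic mean of precision and recall; stricter than accuracy on skewed data."""
    return f1_score(y, model.predict(X))
```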
8. Procedure
1. Create a Neural Network function that takes the regularisation constant Alpha as input and outputs the F-score, Bias, and Variance.
2. Write a function that selects an action based on the Q-matrix reward and the Euclidean distance to possible hyperparameter values
3. Write the Q-learning iterative learning loop that updates the Q-matrix based on the computed reward and the chosen action (sketched after this list)
4. Train Algorithm on Train Data, which I have chosen to be a poisonous mushroom classification
dataset
5. Benchmark algorithm on Test Data
a. Record Time and Final Accuracy of Algorithm (Average across 10 Runs)
6. Run Random Selection algorithm on Test Data as Control (Average data from 10 Runs)
a. Since random selection can narrow the search to an arbitrarily small range of hyperparameters, the results compare computational time while accuracy is held as a controlled variable
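An illustrative sketch of the learning loop from steps 2-3, reusing Q, ACTIONS, update(), encode_state(), and reward() from the earlier sketches. The epsilon-greedy exploration scheme and the multiplicative step size are my assumptions, and train_network() is a hypothetical stand-in for the function from step 1.

```python
# Illustrative sketch of the loop from steps 2-3, building on earlier sketches.
# Epsilon-greedy exploration and the multiplicative step are assumptions;
# train_network() is a hypothetical stand-in for step 1's function.
import random

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy: mostly exploit the Q-matrix, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def q_learning_loop(alpha=1e-3, episodes=50, step=2.0):
    for _ in range(episodes):
        f, bias, variance = train_network(alpha)   # step 1: F-score, bias, variance
        state = encode_state(alpha, bias, variance)
        action = choose_action(state)
        # Scale the regularisation constant (slide 7: increase -> simpler model).
        if action == "increase":
            alpha *= step
        elif action == "decrease":
            alpha /= step
        f_new, b_new, v_new = train_network(alpha)
        update(state, action, reward(f_new), encode_state(alpha, b_new, v_new))
    return alpha
```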
9. Results
● Q-learning yielded a decrease in Average Computational Time
when compared to Random Selection
○ 28.65% on the test dataset
○ 47.05% on the train dataset
● The algorithm was also tasked with generalising from the training dataset to the test dataset
○ The datasets involved similar concepts, yet entirely different tasks
■ Safe/Poisonous Mushroom Classification (Train)
■ Malignant/Benign Cancer Classification (Test)
○ The algorithm was prone to overfitting on the training dataset
■ It was trained on only one dataset due to hardware and time limitations
10. Conclusion
● The hypothesis was validated
○ The Q-learning algorithm was able to fit hyperparameters when given an unknown dataset
■ Additionally, the Q-learning algorithm outperformed the baseline algorithm (Random Selection) in terms of computational time
○ The improvements in computational time came from proper selection of hyperparameters alone, without modifying the core of the neural network
11. Future Improvements
● Large improvements can be gained from more complex reinforcement learning algorithms
○ One example is Deep Q-Learning, which Google DeepMind used to master Atari games; related deep reinforcement learning methods underpin AlphaGo
● A wider variety of hyperparameters would validate the experiment further
○ Hyperparameters such as hidden layer size and number of layers could be used, as well as the choice of the computational units' activation function
○ This would come with an increase in training time and overall computational time, because the algorithm would have to consider exponentially more options
● Training on a variety of datasets would mitigate the overfitting observed in the results
12. Applications
● General neural network algorithms would:
○ Eliminate the need for human intervention
○ Maximise Performance of neural networks
○ Have a chance to reach the Bayes optimal error, defined as the smallest error any classifier can achieve on a dataset
● Hyperparameter training is applicable to other, more complex, types of Neural Networks
○ Convolutional Neural Networks
■ Autonomous Driving
■ Face Recognition / Verification
○ Recurrent Neural Networks
■ Speech Recognition
■ Music Composition