In this presentation, we give a brief introduction to Keras and neural networks, and use examples to show how to build and train neural network models with this framework.
Talk given as part of an event by Rio Machine Learning Meetup.
3. Intro
● Neural nets are versatile, but there was a need for a simple
framework to design and experiment with them.
● Neural nets (particularly those with multiple layers) take a long
time to train.
● Recent advances in algorithms (layer-wise training, contrastive
divergence, etc.) and in hardware (leveraging GPUs for tensor
operations), as well as the massive amounts of available data, have
made deep learning popular.
4. Neural Networks
● Generally speaking, neural networks are nonlinear machine
learning models.
● They can be used for supervised or unsupervised learning.
● Deep learning refers to training neural nets with multiple layers.
○ They are more powerful but only if you have lots of data to train
them on.
● Keras is used to create neural network models
11. Keras
● Models created by Keras can be executed on a backend:
○ Tensorflow (default)
○ Theano
○ CNTK (Beta)
○ MxNet (Beta)
● Keras has built-in GPU support through CUDA
○ CUDA is a framework for using the GPU on NVIDIA video cards
for mathematical (tensor) operations
12. Keras
● Keras is the de facto deep learning frontend
Source: @fchollet, Jun 3 2017
13. Keras
● Keras is among the libraries supported by Apple’s CoreML
Source: @fchollet, Jun 5 2017
14. Example #1
● The MNIST dataset contains 60,000 labelled handwritten digits (for
training) and 10,000 for testing.
15. Example #1
● We can train a neural net to classify a digit’s pixels into one of the
10 digit classes:
NOTEBOOK - MNIST MLP
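The notebook itself is not reproduced here. As a rough idea of what such a model can look like, here is a minimal sketch of a multi-layer perceptron for 10-class digit classification. It assumes TensorFlow's bundled Keras is installed; the layer sizes, the optimizer choice and the random stand-in data (used instead of the real MNIST download so the sketch runs offline) are assumptions, not the notebook's actual settings.

```python
import numpy as np
from tensorflow import keras

# Stand-in data with MNIST's shape: 28x28 images flattened to 784 values.
# Assumption: random pixels and labels replace the real dataset.
rng = np.random.default_rng(0)
x_train = rng.random((256, 784)).astype("float32")
y_train = keras.utils.to_categorical(rng.integers(0, 10, size=256), num_classes=10)

# One hidden layer with dropout, then a 10-way softmax (one unit per digit).
mlp = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation="softmax"),
])
mlp.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
mlp.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)

# One row of class probabilities per input sample.
probs = mlp.predict(x_train[:5], verbose=0)
```

With real data, the pixel values would typically be scaled to the 0-1 range before training.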
16. Example #2
● A multi-layer convolutional neural network (CNN) can also be
trained on the MNIST dataset.
○ The results with a regular NN are already good, but it is useful
to show how to train a CNN
● NOTEBOOK - MNIST CNN
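A small CNN for 28x28 grayscale images might be sketched as follows. This is not the notebook's code: the filter count, kernel size, dropout rate and the random stand-in data are assumptions chosen to keep the example short and runnable offline.

```python
import numpy as np
from tensorflow import keras

# Convolution -> pooling -> dropout -> dense classifier.
cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),          # height, width, one grayscale channel
    keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Assumption: random images and cyclic labels stand in for MNIST.
x = np.random.default_rng(0).random((64, 28, 28, 1)).astype("float32")
y = keras.utils.to_categorical(np.arange(64) % 10, num_classes=10)
cnn.fit(x, y, epochs=1, batch_size=16, verbose=0)

cnn_probs = cnn.predict(x[:4], verbose=0)
```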
17. Example #2 - What are CNNs
● While the model is being trained, let’s understand what a CNN
looks like and what it’s good for.
● CNNs use convolutional operations to extract features that are
position invariant.
○ In other words, they make it possible to train models that detect
features no matter what position they are in the input samples
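The idea can be illustrated without any framework. In the plain-Python sketch below, a hypothetical 1-D filter slides over two signals that contain the same pattern at different positions. The strongest response is the same in both cases, and taking the maximum over positions (global max pooling) discards where the pattern occurred.

```python
def correlate1d(signal, kernel):
    """Slide the kernel over the signal and return one response per position.

    This is the sliding dot product that deep learning libraries call
    "convolution" (strictly speaking, cross-correlation).
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

kernel = [1, -1, 1]                 # hypothetical filter tuned to the pattern 1, 0, 1
early = [1, 0, 1, 0, 0, 0, 0, 0]    # pattern at the start of the signal
late = [0, 0, 0, 0, 0, 1, 0, 1]     # same pattern at the end of the signal

resp_early = correlate1d(early, kernel)
resp_late = correlate1d(late, kernel)

# The peak moves with the pattern, but its value does not change.
peak_early = max(resp_early)
peak_late = max(resp_late)
```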
18. Example #2 - What are CNNs
● For this reason, they are often used for image classification:
19. Example #3
● CNNs can also be used for text classification
○ In fact, they produce state-of-the-art results in tasks such as:
■ Text classification
■ Sentiment analysis
● Let’s train a CNN model to classify documents in the
newsgroup_20 dataset
● NOTEBOOK - IMDB CNN
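A text CNN usually maps word indices to embeddings and then applies 1-D convolutions over the word sequence. The sketch below shows that shape of model. The vocabulary size, sequence length, class count and the random integer "documents" are all hypothetical placeholders, not values taken from the notebook.

```python
import numpy as np
from tensorflow import keras

vocab_size, seq_len, n_classes = 1000, 50, 20   # hypothetical sizes

text_cnn = keras.Sequential([
    keras.Input(shape=(seq_len,), dtype="int32"),   # one word index per position
    keras.layers.Embedding(input_dim=vocab_size, output_dim=32),
    keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    keras.layers.GlobalMaxPooling1D(),              # keep the strongest response per filter
    keras.layers.Dense(n_classes, activation="softmax"),
])
text_cnn.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",         # integer labels, no one-hot needed
    metrics=["accuracy"],
)

# Assumption: random word indices and labels stand in for tokenized documents.
rng = np.random.default_rng(0)
docs = rng.integers(0, vocab_size, size=(64, seq_len)).astype("int32")
labels = rng.integers(0, n_classes, size=64)
text_cnn.fit(docs, labels, epochs=1, batch_size=16, verbose=0)

text_probs = text_cnn.predict(docs[:3], verbose=0)
```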
20. Keras: Models
● The most important part of Keras is the model.
● Model = layers, a loss and an optimizer
● Models are the objects that you add layers to and call compile()
and fit() on.
● Models can be saved and checkpointed for later use
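One way to persist a trained model is to store its weights and load them into a freshly built model with the same architecture. The sketch below assumes a recent TensorFlow/Keras install with HDF5 support; the tiny architecture, the build_model helper and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

def build_model():
    """Hypothetical helper: builds the same small architecture every time."""
    model = keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="sgd", loss="categorical_crossentropy")
    return model

original = build_model()
x = np.ones((2, 4), dtype="float32")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.weights.h5")
    original.save_weights(path)      # write the weights to disk
    restored = build_model()         # new model, new random weights
    restored.load_weights(path)      # overwrite them with the saved ones

# Both models should now produce the same predictions.
same = np.allclose(original.predict(x, verbose=0), restored.predict(x, verbose=0))
```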
21. Keras: Layers
● Layers are used to define what your architecture looks like
● Examples of layers are:
○ Dense layers (this is the normal, fully-connected layer)
○ Convolutional layers (applies convolution operations on the
previous layer)
○ Pooling layers (used after convolutional layers)
○ Dropout layers (these are used for regularization, to avoid
overfitting)
22. Keras: Loss Functions
● Loss functions compare the network’s predicted output with the
real output in each pass of the backpropagation algorithm
○ The gradient of the loss tells the model how the weights
should be updated
● Common loss functions are:
○ Mean squared error
○ Cross-entropy
○ etc.
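To make the two losses concrete, here is a plain-Python sketch of both, written from their textbook definitions rather than taken from Keras. The example target and predictions are made up for illustration.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Negative log-probability assigned to the true class (one-hot target)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

target = [0.0, 1.0, 0.0]     # one-hot: the true class is index 1
good = [0.1, 0.8, 0.1]       # confident and correct
bad = [0.6, 0.2, 0.2]        # confident and wrong

mse_good = mean_squared_error(target, good)
mse_bad = mean_squared_error(target, bad)
ce_good = categorical_cross_entropy(target, good)
ce_bad = categorical_cross_entropy(target, bad)
```

Both losses are lower for the better prediction, which is what lets the optimizer use them as a training signal.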
23. Keras: Optimizers
● Optimizers are strategies used to update the network’s weights in
the backpropagation algorithm.
● The simplest optimizer is stochastic gradient descent (SGD), but
there are many others you can choose from, such
as:
○ RMSProp
○ Adagrad
24. Keras: Optimizers
● Most optimizers can be tuned using hyperparameters, such as:
○ The learning rate to use
○ Whether or not to use momentum
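The effect of these two hyperparameters can be seen in a plain-Python sketch of the classical gradient descent update with momentum, applied to a one-dimensional toy function. The function, starting point and values below are made up for illustration. In Keras the corresponding settings are passed to the optimizer, for example keras.optimizers.SGD(learning_rate=..., momentum=...).

```python
def sgd_minimize(grad, w0, learning_rate, momentum=0.0, steps=50):
    """Gradient descent with optional classical momentum on a single weight."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # Momentum keeps a fraction of the previous step; the learning rate
        # scales how far the current gradient moves the weight.
        velocity = momentum * velocity - learning_rate * grad(w)
        w = w + velocity
    return w

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

w_plain = sgd_minimize(grad, w0=0.0, learning_rate=0.1)
w_momentum = sgd_minimize(grad, w0=0.0, learning_rate=0.1, momentum=0.5)
```

Both runs end close to the minimum at 3. With a learning rate that is too large the same loop would overshoot and diverge, which is why this value usually needs tuning.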
25. Keras: CPU / GPU
● If your computer has a good graphics card, it can be used to speed
up model training
● All models up to now were trained using the GPU.
○ Let’s see what happens if we disable the GPU and force
Keras to use the CPU instead.
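One common way to do this with a CUDA-based backend is to hide the GPU from the process through the CUDA_VISIBLE_DEVICES environment variable. This is a sketch of that approach, not necessarily the method used in the talk.

```python
import os

# Hide all CUDA devices from this process. This must run before the
# backend (e.g. TensorFlow) is imported, otherwise the GPU may already
# have been initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# Any Keras/TensorFlow import placed after this point sees no GPU
# and falls back to the CPU.
```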
26. Keras: Other information
● Feature preprocessing
○ Although you can use any other method for feature
preprocessing, Keras has a couple of utilities to help, such as:
■ to_categorical (to one-hot encode data)
■ Text preprocessing utilities, such as tokenizing
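As a short illustration of one-hot encoding, the sketch below calls to_categorical on three made-up integer labels, assuming TensorFlow's bundled Keras is installed.

```python
from tensorflow import keras

labels = [0, 2, 1]   # made-up integer class labels

# Each label becomes a row with a 1 in the column of its class.
one_hot = keras.utils.to_categorical(labels, num_classes=3)
```

The result is a NumPy array with one row per label and one column per class.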
27. Keras: Other information
● You can integrate Keras models into a Scikit-learn Pipeline.
○ There are special wrapper functions available on Keras to help
you implement the methods that are expected by a scikit-learn
classifier, such as fit(), predict(), predict_proba(),
etc.
○ You can also use tools like scikit-learn’s grid search
(GridSearchCV) to do model selection on Keras models and find
the best hyperparameters for a given task.
28. Keras: Other information
● Nearly everything in Keras can be regularized. In addition to the
Dropout layer, there are all sorts of other regularizers available,
such as:
○ Weight regularizers
○ Bias regularizers
○ Activity regularizers
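These three kinds of regularizer are set through keyword arguments on a layer. The sketch below attaches all of them to a single Dense layer; the penalty strengths, layer sizes and input shape are arbitrary placeholders.

```python
from tensorflow import keras

regularized = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(
        16,
        activation="relu",
        kernel_regularizer=keras.regularizers.L2(1e-4),    # penalizes large weights
        bias_regularizer=keras.regularizers.L2(1e-4),      # penalizes large biases
        activity_regularizer=keras.regularizers.L1(1e-5),  # penalizes large outputs
    ),
    keras.layers.Dense(1, activation="sigmoid"),
])
regularized.compile(optimizer="adam", loss="binary_crossentropy")

# The first (and only regularized) layer of the model.
dense = regularized.layers[0]
```

The penalties are added to the training loss automatically, so no change to the call to fit() is needed.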