How I Made Zoom In and Enhance - Seattle Mobile .NET

HOW I MADE
ZOOM IN AND ENHANCE
CONVOLUTIONAL NEURAL NETWORKS

1. SOMEONE ELSE DID IT
2. FINALLY THERE WAS A LIBRARY I COULD UNDERSTAND

THEORY
▸ A way to compute functions, inspired by the human brain
▸ Much simpler than the brain!
▸ Input -> Network -> Output
▸ Composed of many neurons
▸ These neurons are interconnected
▸ Some are “inputs” and some are “outputs”
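
A minimal sketch of Input -> Network -> Output in Python. This is not from the talk: the sigmoid activation and the hand-picked weights are assumptions made purely for illustration, and a real network would learn its weights.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, squashed into (0, 1) by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def tiny_network(inputs):
    """Two input values -> two hidden neurons -> one output neuron."""
    # Hypothetical weights chosen by hand; training would normally set these.
    h1 = neuron(inputs, [0.5, -0.5], 0.0)
    h2 = neuron(inputs, [-0.5, 0.5], 0.0)
    return neuron([h1, h2], [1.0, 1.0], -1.0)

output = tiny_network([1.0, 0.0])
```

The two values passed in play the role of the "inputs", and the single value that comes back is the "output".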

THEORY
DEEP NETWORKS
▸ Nowadays we have so many neurons that we no longer describe networks in
terms of individual neurons
▸ Instead, we have layers that contain many neurons
▸ When layers are connected to each other, the constituent neurons get
connected
▸ Neurons behave differently depending on the type of layer they belong to
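
One way to picture layers, sketched in Python. The layer names (`dense`, `relu`) and the weights are assumptions for illustration, not anything shown in the talk.

```python
def dense(weights, biases):
    """A fully connected layer: every output neuron sees every input."""
    def layer(values):
        return [sum(v * w for v, w in zip(values, row)) + b
                for row, b in zip(weights, biases)]
    return layer

def relu(values):
    """An activation layer: each neuron only looks at its own input."""
    return [max(0.0, v) for v in values]

def run(layers, values):
    """Connecting layers means feeding one layer's output to the next."""
    for layer in layers:
        values = layer(values)
    return values

# Hypothetical three-layer network built from two layer types.
network = [
    dense([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    relu,
    dense([[1.0, 1.0]], [0.0]),
]
result = run(network, [3.0, 1.0])
```

Nobody wires up individual neurons here: the network is just a list of layers, and the neuron-to-neuron connections follow from how each layer type is defined.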

THEORY
CONVOLUTIONAL NEURAL NETWORKS
▸ CNNs use a layer type called a Spatial Convolution to operate on images
▸ This is a way of connecting neurons so that nearby pixels in 2D images are
connected to nearby neurons
▸ The operation the neurons perform is called convolution and is a generalized
technique for manipulating images
▸ The benefits are that local information gets local connections and the
operation itself is very powerful
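
A rough sketch of the convolution operation on a grayscale image stored as a list of rows. The edge-detecting kernel is a hand-written assumption; in a CNN the kernel values are the learned weights. Like most neural network libraries, this slides the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding ("valid" output)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Each output pixel only depends on a small local neighborhood.
            total = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    total += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(total)
        output.append(row)
    return output

# Dark left half, bright right half: a vertical edge in the middle.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hypothetical kernel that responds to a dark-to-bright step.
edges = convolve2d(image, [[-1.0, 1.0]])
```

The output is non-zero only where the brightness changes, which is the "local information gets local connections" idea in miniature.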

THEORY
RECURRENT NEURAL NETWORKS
▸ RNNs use feedback to predict the future!
▸ Most networks are pure functions that process inputs and produce outputs
▸ RNNs feed outputs back into the network to model time
▸ It’s scarily effective
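
A minimal sketch of the feedback idea, assuming a single recurrent neuron with a tanh activation and made-up weights (`w_in`, `w_state` are hypothetical names, not from the talk).

```python
import math

def rnn_step(x, state, w_in=0.5, w_state=0.9):
    """One time step: mix the new input with the fed-back previous output."""
    return math.tanh(w_in * x + w_state * state)

def run_rnn(sequence):
    """Process a sequence one item at a time, carrying state forward."""
    state = 0.0
    outputs = []
    for x in sequence:
        state = rnn_step(x, state)  # the output becomes the next step's state
        outputs.append(state)
    return outputs

outputs = run_rnn([1.0, 0.0, 0.0])
```

Only the first input is non-zero, yet the later outputs are non-zero too: the fed-back state is how the network remembers what happened earlier in time.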

THEORY
NETWORK NOTATION
▸ Networks are written like a chemistry formula
▸ Instead of atoms, layer types are used

THEORY
TRAINING
▸ We train the network by showing it a bunch of inputs and desired outputs
▸ The training algorithm is called backpropagation and involves a lot of
number crunching
▸ Neurons assign weights to the neurons they’re connected to
▸ These weights control how much the neighbors influence that neuron
▸ Training is the process of determining these weights
▸ One full pass over all the training pairs is called an epoch; the number of
epochs is how many times each pair has been reused
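
A toy version of training, assuming a "network" that is a single linear neuron with one weight and one bias. For something this small, backpropagation reduces to the plain gradient rule in the comments; the data, learning rate, and epoch count are made up for illustration.

```python
def train(pairs, epochs=500, learning_rate=0.05):
    """Fit prediction = weight * x + bias by nudging the weights to cut error."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):          # one epoch = one pass over every pair
        for x, target in pairs:
            prediction = weight * x + bias
            error = prediction - target
            # Gradient of 0.5 * error**2 is error * x for the weight
            # and error for the bias; step against the gradient.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias

# Inputs and desired outputs generated from y = 2x + 1.
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias = train(pairs)
```

After enough epochs the weight and bias settle close to 2 and 1, the values that produced the examples. Real networks do the same thing with millions of weights, which is where the number crunching comes from.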

THEORY
MINIMIZE ERROR BY ADJUSTING WEIGHTS

THEORY
GENERALIZING
▸ What’s the point of training a network if we already have a bunch of inputs and
outputs?
▸ The hope is that the network will learn how to solve problems it hasn’t yet seen
▸ When we train, we always reserve a set of validation inputs and outputs that
the network never sees during training
▸ We then use those inputs and outputs to rate the network
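
A sketch of holding back validation examples, assuming a 20% split and mean squared error as the rating (both are illustrative choices, and `model` is a stand-in for a trained network).

```python
import random

def split_examples(pairs, validation_fraction=0.2, seed=0):
    """Shuffle the pairs, then set aside a slice the network never trains on."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def mean_squared_error(model, pairs):
    """Rate a model by how far its outputs are from the desired outputs."""
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)

pairs = [(float(x), 2.0 * x + 1.0) for x in range(10)]
training, validation = split_examples(pairs)

def model(x):
    # Hypothetical stand-in for a network trained on `training` only.
    return 2.0 * x + 1.0

score = mean_squared_error(model, validation)
```

A low score on the validation pairs is the evidence that the network learned something general rather than memorizing its training examples.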

PRACTICE
NEURAL NETWORK DISTRIBUTION
▸ While the concepts are universal…
▸ Neural networks are released as source code that can train and execute the network
▸ The code can be in any number of languages and use any number of support libraries
▸ Python with TensorFlow
▸ Lua with Torch
▸ Networks may only work with some hardware
▸ NVIDIA CUDA is used a lot (they invest in academia)
▸ Cloud solutions exist, ranging from virtualized hardware to proprietary languages

PRACTICE
PIX2PIX https://github.com/phillipi/pix2pix
▸ 2 Networks: Generator + Discriminator

PRACTICE
PREREQUISITES - HARDWARE
LOTS OF
PROCESSING
POWER

PRACTICE
INSTALLATION
▸ Install NVIDIA drivers
▸ CUDA - GPU programming SDK
▸ cuDNN - GPU library that helps with writing nets
▸ I did this on Mac and Linux
▸ Install Torch
▸ Lua libraries that can use cuDNN
▸ Install pix2pix

PRACTICE
TRAINING
$ DATA_ROOT=./datasets/facades \
  name=facades_generation \
  which_direction=AtoB \
  th train.lua

PRACTICE
TRAINING COMPLETE
▸ We get a trained model file
▸ Contains the structure of the network
▸ Along with the learned weights

PRACTICE
RUNNING THE NETWORK
▸ Now that the network is trained, we can run it against new inputs
▸ Put images you want to test in a special folder
▸ Run test.lua instead of train.lua
▸ Now you have outputs!

ZOOM AND ENHANCE
GOOD TRAINING EXAMPLES
▸ It took some fooling around to learn what the network can and can’t do
▸ You can’t just throw images at it and hope for the best
▸ You must spend time to give it good training examples
▸ Inputs and outputs should only differ by what you want the network to
learn
▸ Other differences will cause slow or impossible learning
▸ Examples: aspect ratio, cropping / region of interest, backgrounds

ZOOM AND ENHANCE
FACE ZOOMING EXAMPLE GENERATION
▸ Narrowed the task down to zooming in on faces
▸ Wrote an app that extracts faces from images using Apple’s CoreImage
framework
▸ Wrote another app that downsamples those faces by 8x, then 16x
▸ This only simulates the problem of Z&E since noise is drastically reduced by
this downsampling
▸ Ideally, my inputs and outputs would be taken with the same camera using
two different zoom levels. But alas…
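
A rough sketch of how such training pairs could be generated, in Python rather than the apps from the talk. Block averaging and nearest-neighbor upsampling are assumptions; scaling the small image back up is also an assumption, made so the input and target have the same size and differ only in detail.

```python
def downsample(image, factor):
    """Shrink a grayscale image by averaging factor x factor pixel blocks."""
    height = len(image) // factor
    width = len(image[0]) // factor
    small = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [image[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

def upsample(image, factor):
    """Blow the small image back up by repeating pixels (nearest neighbor)."""
    return [[value for value in row for _ in range(factor)]
            for row in image for _ in range(factor)]

def make_pair(face, factor=8):
    """Return (blurry input, sharp target) at the same size."""
    return upsample(downsample(face, factor), factor), face

# Hypothetical 16x16 "face": a simple gradient standing in for real pixels.
face = [[float(x + y) for x in range(16)] for y in range(16)]
blurry, sharp = make_pair(face)
```

Averaging each block is also why this only simulates the real problem: the averaging smooths away the sensor noise that a genuinely zoomed-out photo would contain.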

ZOOM AND ENHANCE
300 EXAMPLES - NOT JUST ME

FUTURE
MOBILE LIBRARIES
▸ TensorFlow is a C++ library that runs on Android & iOS
▸ Miguel de Icaza is binding TensorFlow to .NET (hope it works on mobile!)
▸ https://github.com/migueldeicaza/TensorFlowSharp
▸ Apple includes Metal Performance Shaders, which contain basic CNN routines
▸ I’m porting Torch, for now…
