Deep Learning has revolutionized the field of image processing. I'll show real-world examples using CNTK, from anomaly classification using CNNs to generation using Generative Adversarial Networks.
Products/Technologies: AI (Artificial Intelligence) / Deep Learning / Microsoft Azure / Machine Learning
Michael Lanzetta
Microsoft Corporation
Developer Experience and Evangelism
Principal Software Development Engineer
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
3. Goals
My goal is not to teach you Cognitive Toolkit programming
I will have some code samples
But they are illustrative
CNTK has great Tutorials for that: https://aka.ms/cntk_tut
I want to get you excited for Deep Learning
I want you to think about the possibilities
I want you to understand how far it’s come
In so short a time
And what that means for the future
4. Agenda
The old “state of the art”
Deep Learning
What happened? Why?
What does Deep Learning enable?
Image Recognition, Transfer Learning
Object Detection, Image Understanding
Semantic Segmentation
Image Generation, Style Transfer, Adversarial Networks
Issues
The Future
5. Evolution vs. Revolution
Image Processing had been evolving for years
Chaining complex “convolutions” helped
However, it was only making incremental improvements
State-of-the-art custom featurizers
Could not perform complex image recognition tasks
Still fell well short of human performance even on simple ones
21. Deep Neural Networks
Deeper Networks
Represent a large non-linear function
Easy to train by "pushing" weights in the right direction
"Smarter" Neurons
CNNs learn those complex convolution filters "automagically"
RNNs learn to pay attention to the right bits of history
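The weight-"pushing" is just gradient descent. A minimal sketch in plain Python/numpy (a hypothetical one-weight model, not CNTK): each step nudges the weight against the gradient of the loss, so the loss shrinks.

```python
import numpy as np

# Toy illustration: one gradient-descent step "pushes" a weight in the
# direction that reduces the loss. Model: y_hat = w * x, loss = (y_hat - y)^2.
def gradient_step(w, x, y, lr=0.1):
    y_hat = w * x
    grad = 2 * (y_hat - y) * x   # d(loss)/dw
    return w - lr * grad         # move w against the gradient

w = 0.0
for _ in range(50):
    w = gradient_step(w, x=2.0, y=6.0)  # true relationship: y = 3 * x
# w has been "pushed" very close to 3.0
```

Backpropagation does the same thing at scale: it computes this gradient for every weight in a deep network in one backward pass.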
25. * Feature visualization images from "Visualizing and Understanding Convolutional Networks", Zeiler and Fergus, ECCV 2014.
27. … in CNTK
import cntk as C

def create_basic_model(input, out_dims):
    # three conv + max-pool blocks, then two dense layers
    with C.layers.default_options(init=C.glorot_uniform(), activation=C.relu):
        net = C.layers.Convolution((5,5), 32, pad=True)(input)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)
        net = C.layers.Convolution((5,5), 32, pad=True)(net)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)
        net = C.layers.Convolution((5,5), 64, pad=True)(net)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)
        net = C.layers.Dense(64)(net)
        net = C.layers.Dense(out_dims, activation=None)(net)
    return net
30. What is it?
DNNs are uncommonly good at learning abstractions
Trained CNNs can be adapted to alternate domains
Far less data required
More data is needed the further the target domain is from the training domain
Custom Vision Service
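The mechanics of transfer learning can be sketched in plain numpy (hypothetical data, not CNTK): treat the pretrained CNN as a frozen feature extractor and train only a small new classification head on its outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: pretend these 64-d vectors are activations from a
# frozen, pretrained CNN. In transfer learning only the small new head below
# is trained; the pretrained weights stay fixed.
features = rng.normal(size=(200, 64))
labels = (features[:, 0] > 0).astype(float)    # toy binary task

w, b = np.zeros(64), 0.0                       # the new classification head
for _ in range(300):                           # plain logistic regression
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    w -= 0.5 * (features.T @ (p - labels)) / len(labels)
    b -= 0.5 * np.mean(p - labels)

p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
accuracy = np.mean((p > 0.5) == labels)        # high: the toy task is separable
```

Because only the tiny head is learned, far fewer labeled examples are needed than training the whole network from scratch.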
32. Example: Leak Detection
https://aka.ms/leak_detection
Convert audio into images
Use FFT to convert to frequency domain
Aggregate across time window and frequency into image:
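The audio-to-image step above can be sketched with numpy alone (a simplified spectrogram, without the windowing functions a production pipeline would use): slice the signal into windows, FFT each window, and stack the magnitude spectra into a 2-D "image".

```python
import numpy as np

# Slice a 1-D signal into overlapping windows, FFT each window into the
# frequency domain, and stack the magnitudes: one column per time window,
# one row per frequency bin.
def spectrogram(signal, window=256, hop=128):
    frames = [signal[i:i + window]
              for i in range(0, len(signal) - window + 1, hop)]
    return np.abs(np.array([np.fft.rfft(f) for f in frames])).T

t = np.linspace(0, 1, 4096, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)   # a 440 Hz tone as stand-in "audio"
img = spectrogram(sig)              # shape: (freq_bins, time_windows)
```

The resulting 2-D array can then be fed to an ordinary image-classification CNN, which is exactly the trick the leak-detection example exploits.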
33. Example: Using Custom Vision Service
Detecting food using mobile apps
https://aka.ms/cvs_food
"Layered" models to reduce confusion
36. Regions of Interest
Selection of bounding boxes
Often random sizes and locations
Major differentiator between methods
Region Proposal Network (RPN)
Initially done with a separate SVM – super slow!
Faster R-CNN trains the classifier and the RPN simultaneously
YOLO9000
Lots of tricks to make it fast
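Two building blocks behind region proposals can be sketched in a few lines of numpy (a simplified illustration, not any particular detector's implementation): intersection-over-union (IoU) to score how much two boxes overlap, and non-maximum suppression (NMS) to prune near-duplicate proposals.

```python
import numpy as np

# Boxes are (x1, y1, x2, y2). IoU = overlap area / union area.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# Non-maximum suppression: keep the best-scoring box, drop any box that
# overlaps an already-kept box too heavily, repeat.
def nms(boxes, scores, threshold=0.5):
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
# The second box heavily overlaps the first, so only boxes 0 and 2 survive.
```

Detectors like Faster R-CNN and YOLO run variants of exactly this pruning over thousands of candidate boxes per image.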
45. How Does It Work?
Trade off content vs. style
Loss function
L(x)=αC(x)+βS(x)+T(x)
Content weighs more heavily in early layers
Style weighs more heavily in later layers
Tutorial on CNTK
46. …in CNTK
# Excerpt from the CNTK style-transfer tutorial; assumes numpy as np,
# cntk as C, and the model, images, and helpers defined earlier in it.
y = C.input_variable((3, SIZE, SIZE), needs_gradient=True)
z, intermediate_layers = model(y, layers)
content_activations = ordered_outputs(intermediate_layers, {y: [[content]]})
style_activations = ordered_outputs(intermediate_layers, {y: [[style]]})
style_output = np.squeeze(z.eval({y: [[style]]}))
# normalize so changing the decay doesn't change the content/style magnitude
total = (1 - decay**(n + 1)) / (1 - decay)
loss = (1.0/total * content_weight * content_loss(y, content)
        + 1.0/total * style_weight * style_loss(z, style_output)
        + total_variation_loss(y))
48. Generative Adversarial Networks
Two networks
Generator takes input, generates images
Discriminator takes images, determines if fake
Train both at same time – generator learns to fool discriminator
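The two-player training loop can be sketched in plain numpy (a hypothetical 1-D toy, not CNTK): a linear generator G(z) = a·z + b tries to mimic samples from N(3, 1), while a logistic discriminator D(x) = sigmoid(w·x + c) tries to tell real from fake; each update pushes the other's target.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0   # generator parameters: G(z) = a*z + b
w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
lr = 0.05

for _ in range(2000):
    x_real = rng.normal(3.0, 1.0)   # real data ~ N(3, 1)
    z = rng.normal()
    x_fake = a * z + b

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    s_r, s_f = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * ((1 - s_r) * x_real - s_f * x_fake)
    c += lr * ((1 - s_r) - s_f)

    # Generator step: push D(fake) -> 1, i.e. learn to fool D.
    s_f = sigmoid(w * x_fake + c)
    a += lr * (1 - s_f) * w * z
    b += lr * (1 - s_f) * w

# Over training, samples a*z + b tend to drift toward the real distribution.
```

Real GANs replace the two linear models with deep networks and a framework's autodiff, but the alternating push-pull structure is the same.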
50. Super Resolution GANs
GANs up-res images
Just shown small and large versions, so training data is easy
https://arxiv.org/pdf/1609.04802.pdf
51. GANs on Steroids
CycleGAN is currently the state of the art
GANs fighting GANs
Changing so fast it has its own zoo
https://www.youtube.com/watch?v=IbjF5VjniVE
Yann LeCun says
“The most important idea in ML in the last 10 years”
52. How fast are GANs changing?
Credit: Bruno Gavranović
55. Neural Networks Can Be Fooled
https://blog.openai.com/adversarial-example-research/
Networks learn a large non-linear function
"Intelligent" noise finds an "alternate" input that the function gets wrong
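The "intelligent noise" idea can be illustrated in the style of the fast-gradient-sign attack from the adversarial-examples literature, on a tiny hand-set logistic model (hypothetical weights, not a real trained network): a small perturbation aligned with the loss gradient flips the prediction.

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# A tiny "network": logistic model with hand-picked (hypothetical) weights.
w = np.array([1.0, -2.0, 0.5])
b = 0.0

x = np.array([0.3, -0.2, 0.4])     # clean input, classified positive
p_clean = sigmoid(w @ x + b)       # > 0.5

# For the true class "positive", the loss gradient w.r.t. x points along -w,
# so stepping each coordinate by -eps * sign(w) maximally lowers the score
# for a given per-coordinate budget eps.
eps = 0.5
x_adv = x - eps * np.sign(w)       # small, structured "noise"
p_adv = sigmoid(w @ x_adv + b)     # < 0.5: prediction flipped
```

Deep networks are vulnerable in the same way: their gradients are cheap to compute, and an imperceptibly small but carefully aligned perturbation can change the output class.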
56. Regulations, Liability, Social Issues
NNs doing medical diagnoses
How do you regulate? If it’s wrong, where does the fault lie?
Garbage in, garbage out
FaceApp “Whitewashing”
Economic Displacement
Job losses
Privacy
Better, faster facial recognition, voiceprinting
#FakeNews
Autogeneration of fake audio, video
58. Deep Learning Toolkits
CNTK
1-bit SGD for better distributed training
TensorFlow
Strong momentum (community and adoption)
Chainer
Torch
Caffe
MXNet
Azure Linux and Windows GPU VMs
60. What Does the Future Hold?
Adversaries
Adversarial networks
Adversarial examples
Reinforcement
Deep RL
“Training” DL
Mixture of Networks
RNNs + CNNs
Conv + Deconv
61. Call to action
CNTK Tutorials
In GitHub: https://aka.ms/cntk_tut
On Notebooks.azure.com/cntk
Finding the latest papers
arXiv (https://arxiv.org/) …
And even more important, arXiv Sanity Preserver: http://www.arxiv-sanity.com/
Re-visit de:code session recordings on Channel 9.
Continue your education at
Microsoft Virtual Academy online.