Deep Learning has revolutionized the field of image processing. I'll show real-world examples using CNTK, from anomaly classification using CNNs to generation using Generative Adversarial Networks.
Products/Technologies: AI (Artificial Intelligence) / Deep Learning / Microsoft Azure / Machine Learning
Michael Lanzetta
Microsoft Corporation
Developer Experience and Evangelism
Principal Software Development Engineer
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
3. Goals
My goal is not to teach you Cognitive Toolkit programming
I will have some code samples
But they are illustrative
CNTK has great Tutorials for that: https://aka.ms/cntk_tut
I want to get you excited for Deep Learning
I want you to think about the possibilities
I want you to understand how far it’s come
In so short a time
And what that means for the future
4. Agenda
The old “state of the art”
Deep Learning
What happened? Why?
What does Deep Learning enable?
Image Recognition, Transfer Learning
Object Detection, Image Understanding
Semantic Segmentation
Image Generation, Style Transfer, Adversarial Networks
Issues
The Future
5. Evolution vs. Revolution
Image Processing had been evolving for years
Chaining complex “convolutions” helped
However, it was only making incremental improvements
State-of-the-art custom featurizers
Could not perform complex image recognition tasks
Still fell well short of human performance even on simple ones
21. Deep Neural Networks
Deeper Networks
Represent a large non-linear function
Easy to train by "pushing" weights in the right direction
"Smarter" Neurons
CNNs learn those complex convolution filters "automagically"
RNNs learn to pay attention to the right bits of history
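The weight-"pushing" is just gradient descent. A minimal sketch in plain Python/numpy (a hypothetical one-weight model, not CNTK): each step nudges the weight against the gradient of the loss, so the loss shrinks.

```python
import numpy as np

# Toy illustration: one gradient-descent step "pushes" a weight in the
# direction that reduces the loss. Model: y_hat = w * x, loss = (y_hat - y)^2.
def gradient_step(w, x, y, lr=0.1):
    y_hat = w * x
    grad = 2 * (y_hat - y) * x   # d(loss)/dw
    return w - lr * grad         # move w against the gradient

w = 0.0
for _ in range(50):
    w = gradient_step(w, x=2.0, y=6.0)  # true relationship: y = 3 * x
# w has been "pushed" very close to 3.0
```

Backpropagation does the same thing at scale: it computes this gradient for every weight in a deep network in one backward pass.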
25. * Feature visualization images from "Visualizing and Understanding Convolutional Networks", Zeiler and Fergus, ECCV 2014.
27. … in CNTK
import cntk as C

def create_basic_model(input, out_dims):
    # three conv + max-pool blocks, then two dense layers
    with C.layers.default_options(init=C.glorot_uniform(), activation=C.relu):
        net = C.layers.Convolution((5,5), 32, pad=True)(input)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)
        net = C.layers.Convolution((5,5), 32, pad=True)(net)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)
        net = C.layers.Convolution((5,5), 64, pad=True)(net)
        net = C.layers.MaxPooling((3,3), strides=(2,2))(net)
        net = C.layers.Dense(64)(net)
        net = C.layers.Dense(out_dims, activation=None)(net)
    return net
30. What is it?
DNNs are uncommonly good at learning abstractions
Trained CNNs can be adapted to alternate domains
Far less data required
More data is needed the further the target domain is from the training domain
Custom Vision Service
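The mechanics of transfer learning can be sketched in plain numpy (hypothetical data, not CNTK): treat the pretrained CNN as a frozen feature extractor and train only a small new classification head on its outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: pretend these 64-d vectors are activations from a
# frozen, pretrained CNN. In transfer learning only the small new head below
# is trained; the pretrained weights stay fixed.
features = rng.normal(size=(200, 64))
labels = (features[:, 0] > 0).astype(float)    # toy binary task

w, b = np.zeros(64), 0.0                       # the new classification head
for _ in range(300):                           # plain logistic regression
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    w -= 0.5 * (features.T @ (p - labels)) / len(labels)
    b -= 0.5 * np.mean(p - labels)

p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
accuracy = np.mean((p > 0.5) == labels)        # high: the toy task is separable
```

Because only the tiny head is learned, far fewer labeled examples are needed than training the whole network from scratch.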
32. Example: Leak Detection
https://aka.ms/leak_detection
Convert audio into images
Use FFT to convert to frequency domain
Aggregate across time window and frequency into image:
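The audio-to-image step above can be sketched with numpy alone (a simplified spectrogram, without the windowing functions a production pipeline would use): slice the signal into windows, FFT each window, and stack the magnitude spectra into a 2-D "image".

```python
import numpy as np

# Slice a 1-D signal into overlapping windows, FFT each window into the
# frequency domain, and stack the magnitudes: one column per time window,
# one row per frequency bin.
def spectrogram(signal, window=256, hop=128):
    frames = [signal[i:i + window]
              for i in range(0, len(signal) - window + 1, hop)]
    return np.abs(np.array([np.fft.rfft(f) for f in frames])).T

t = np.linspace(0, 1, 4096, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)   # a 440 Hz tone as stand-in "audio"
img = spectrogram(sig)              # shape: (freq_bins, time_windows)
```

The resulting 2-D array can then be fed to an ordinary image-classification CNN, which is exactly the trick the leak-detection example exploits.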
33. Example: Using Custom Vision Service
Detecting food using mobile apps
https://aka.ms/cvs_food
"Layered" models to reduce confusion
36. Regions of Interest
Selection of bounding boxes
Often random sizes and locations
Major differentiator between methods
Region Proposal Network (RPN)
Initially done with a separate SVM – super slow!
Faster R-CNN trains the classifier and the RPN simultaneously
YOLO9000
Lots of tricks to make it fast
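Two building blocks behind region proposals can be sketched in a few lines of numpy (a simplified illustration, not any particular detector's implementation): intersection-over-union (IoU) to score how much two boxes overlap, and non-maximum suppression (NMS) to prune near-duplicate proposals.

```python
import numpy as np

# Boxes are (x1, y1, x2, y2). IoU = overlap area / union area.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# Non-maximum suppression: keep the best-scoring box, drop any box that
# overlaps an already-kept box too heavily, repeat.
def nms(boxes, scores, threshold=0.5):
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
# The second box heavily overlaps the first, so only boxes 0 and 2 survive.
```

Detectors like Faster R-CNN and YOLO run variants of exactly this pruning over thousands of candidate boxes per image.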
45. How Does It Work?
Trade off content vs. style
Loss function
L(x)=αC(x)+βS(x)+T(x)
Content weighs more heavily in early layers
Style weighs more heavily in later layers
Tutorial on CNTK
46. …in CNTK
# Excerpt from the CNTK style-transfer tutorial; assumes numpy as np,
# cntk as C, and the model, images, and helpers defined earlier in it.
y = C.input_variable((3, SIZE, SIZE), needs_gradient=True)
z, intermediate_layers = model(y, layers)
content_activations = ordered_outputs(intermediate_layers, {y: [[content]]})
style_activations = ordered_outputs(intermediate_layers, {y: [[style]]})
style_output = np.squeeze(z.eval({y: [[style]]}))
# normalize so changing the decay doesn't change the content/style magnitude
total = (1 - decay**(n + 1)) / (1 - decay)
loss = (1.0/total * content_weight * content_loss(y, content)
        + 1.0/total * style_weight * style_loss(z, style_output)
        + total_variation_loss(y))
48. Generative Adversarial Networks
Two networks
Generator takes input, generates images
Discriminator takes images, determines if fake
Train both at same time – generator learns to fool discriminator
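The two-player training loop can be sketched in plain numpy (a hypothetical 1-D toy, not CNTK): a linear generator G(z) = a·z + b tries to mimic samples from N(3, 1), while a logistic discriminator D(x) = sigmoid(w·x + c) tries to tell real from fake; each update pushes the other's target.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0   # generator parameters: G(z) = a*z + b
w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
lr = 0.05

for _ in range(2000):
    x_real = rng.normal(3.0, 1.0)   # real data ~ N(3, 1)
    z = rng.normal()
    x_fake = a * z + b

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    s_r, s_f = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    w += lr * ((1 - s_r) * x_real - s_f * x_fake)
    c += lr * ((1 - s_r) - s_f)

    # Generator step: push D(fake) -> 1, i.e. learn to fool D.
    s_f = sigmoid(w * x_fake + c)
    a += lr * (1 - s_f) * w * z
    b += lr * (1 - s_f) * w

# Over training, samples a*z + b tend to drift toward the real distribution.
```

Real GANs replace the two linear models with deep networks and a framework's autodiff, but the alternating push-pull structure is the same.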
50. Super Resolution GANs
GANs up-res images
Just shown small and large versions, so training data is easy
https://arxiv.org/pdf/1609.04802.pdf
51. GANs on Steroids
CycleGAN is currently the state of the art
GANs fighting GANs
Changing so fast it has its own zoo
https://www.youtube.com/watch?v=IbjF5VjniVE
Yann LeCun says
“The most important idea in ML in the last 10 years”
52. How fast are GANs changing?
Credit: Bruno Gavranović
55. Neural Networks Can Be Fooled
https://blog.openai.com/adversarial-example-research/
Networks learn a large non-linear function
"Intelligent" noise finds an "alternate" input that the function gets wrong
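The "intelligent noise" idea can be illustrated in the style of the fast-gradient-sign attack from the adversarial-examples literature, on a tiny hand-set logistic model (hypothetical weights, not a real trained network): a small perturbation aligned with the loss gradient flips the prediction.

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# A tiny "network": logistic model with hand-picked (hypothetical) weights.
w = np.array([1.0, -2.0, 0.5])
b = 0.0

x = np.array([0.3, -0.2, 0.4])     # clean input, classified positive
p_clean = sigmoid(w @ x + b)       # > 0.5

# For the true class "positive", the loss gradient w.r.t. x points along -w,
# so stepping each coordinate by -eps * sign(w) maximally lowers the score
# for a given per-coordinate budget eps.
eps = 0.5
x_adv = x - eps * np.sign(w)       # small, structured "noise"
p_adv = sigmoid(w @ x_adv + b)     # < 0.5: prediction flipped
```

Deep networks are vulnerable in the same way: their gradients are cheap to compute, and an imperceptibly small but carefully aligned perturbation can change the output class.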
56. Regulations, Liability, Social Issues
NNs doing medical diagnoses
How do you regulate? If it’s wrong, where does the fault lie?
Garbage in, garbage out
FaceApp “Whitewashing”
Economic Displacement
Job losses
Privacy
Better, faster facial recognition, voiceprinting
#FakeNews
Autogeneration of fake audio, video
58. Deep Learning Toolkits
CNTK
1-bit SGD for better distributed training
TensorFlow
Strong momentum (community and adoption)
Chainer
Torch
Caffe
MXNet
Azure Linux and Windows GPU VMs
60. What Does the Future Hold?
Adversaries
Adversarial networks
Adversarial examples
Reinforcement
Deep RL
“Training” DL
Mixture of Networks
RNNs + CNNs
Conv + Deconv
61. Call to action
CNTK Tutorials
In GitHub: https://aka.ms/cntk_tut
On Notebooks.azure.com/cntk
Finding the latest papers
arXiv (https://arxiv.org/) …
And even more important, arXiv Sanity Preserver: http://www.arxiv-sanity.com/
Re-visit de:code session recordings on Channel 9.
Continue your education at
Microsoft Virtual Academy online.