BTP Learning Outcome.pdf

Project SHRINGAR (Learning Outcomes)
BTech Project, under the guidance of Dr. Anand Mishra
Nivedit Jain, Mitul Patel, Rajat Sharma
Department of Computer Science and Engineering
Indian Institute of Technology, Jodhpur
December 2020 - January 2021
Nivedit, Mitul, Rajat (IITJ) Project SHRINGAR(Learning Outcomes) December 2020 - January 2021

Summary
1 Gaussian Mixture Model
2 UNet
3 Mask RCNN
4 Generative Models
5 AR: Placing
6 AR: Scaling
7 AR: Occlusion
8 AR: Light and Color

Gaussian Mixture Model

Idea

Mathematical Details
$$p(x) = \sum_{i=1}^{K} \phi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)$$

$$\int p(x)\,dx = \sum_{i=1}^{K} \phi_i \int \mathcal{N}(x \mid \mu_i, \Sigma_i)\,dx$$
⇓
$$1 = \sum_{i=1}^{K} \phi_i$$

$$p(C_i \mid X) = \frac{\phi_i \, \mathcal{N}(X \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{K} \phi_j \, \mathcal{N}(X \mid \mu_j, \Sigma_j)}$$
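The posterior above can be sketched in a few lines. This is a minimal illustration, assuming a 1-D mixture with a scalar standard deviation in place of the covariance; the function names and the two-component mixture are hypothetical and not taken from the project.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Posterior p(C_i | x) for every component of a 1-D Gaussian mixture."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)  # the mixture density p(x)
    return [j / total for j in joint]


# Hypothetical two-component mixture; the weights sum to 1.
weights, means, stds = [0.3, 0.7], [0.0, 5.0], [1.0, 1.0]
post = responsibilities(0.5, weights, means, stds)
```

The entries of `post` sum to one, and a point close to the first mean is assigned almost entirely to the first component even though that component has the smaller weight.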

Super Pixels

SuperPixel SLIC Algorithm

UNet

Intro
U-Net was introduced for biomedical image segmentation using
convolutional networks.

Architecture

UNet-Loss

UNet-Example

Conclusions
1 Fast predictions: segmenting a 512x512 image takes less than a second
on a recent GPU.
2 Very good performance across very different biomedical segmentation
applications.

Mask RCNN

Intro
Mask RCNN is a deep learning model that predicts a segmentation mask
for each object instance in an image. It is an extension of the RCNN
(Region-based Convolutional Neural Network) family.

RCNN
Graph-based segmentation is used to generate the initial candidate regions.
The Selective Search algorithm generates about 2000 region proposals by
merging smaller regions into larger ones.
Each of the 2000 proposals is warped to a fixed size and fed into a CNN
that outputs a 4096-dimensional feature vector.
A separate SVM for each class classifies whether that object is present
in a proposal.
A bounding box is generated for each region that contains an object.

Fast-RCNN
RCNN classifies about 2000 region proposals per image, each with its own
CNN pass, which is both time-consuming and wasteful.
Fast RCNN reduces this time by feeding the whole input image to the CNN
once, and then mapping the proposed regions onto the convolutional
feature map.
The Regions of Interest (RoIs) that are identified are then passed
through an RoI pooling layer, which reshapes each of them to a fixed
size.
The pooled RoIs are fed into fully connected layers, where a softmax
layer is used for classification, and linear regression is performed for
the bounding-box offset values.
The entire network is trained using Log Loss (for classification) +
Smooth L1 Loss (for Bbox regression).
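The combined training loss can be sketched as follows. This is only an illustration for a single RoI; the function names are hypothetical, and the convention that class 0 is background and contributes no box loss is an assumption taken from the Fast RCNN paper rather than from these slides.

```python
import math


def log_loss(class_probs, true_class):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(class_probs[true_class])


def smooth_l1(pred, target):
    """Smooth L1 loss summed over the box offsets.

    Quadratic for errors below 1, linear beyond that.
    """
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def fast_rcnn_loss(class_probs, true_class, box_pred, box_target, background=0):
    """Log loss plus smooth L1; the box term is skipped for background RoIs."""
    loss = log_loss(class_probs, true_class)
    if true_class != background:  # assumption: class 0 is background
        loss += smooth_l1(box_pred, box_target)
    return loss
```

The linear tail of the smooth L1 term keeps a single badly localised box from dominating the gradient, which is the usual argument for preferring it over a plain squared error.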

Fast RCNN - Architecture

Faster RCNN - Architecture

Faster RCNN
Region proposal algorithms are heuristic and slow, and hence form
a bottleneck during training and testing.
Faster RCNN uses a Region Proposal Network (RPN) to propose
regions from the convolutional feature map.
It places anchor boxes of several scales and aspect ratios at each anchor
point to account for the different sizes and shapes of objects in an image.
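A small sketch of how anchors of several scales and aspect ratios could be generated around one anchor point. The function name, the default scales and the default ratios are illustrative assumptions, not values taken from this project.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) boxes centred at the anchor point (cx, cy).

    Each box has area scale**2 and a height/width ratio equal to ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Nine anchors (3 scales x 3 ratios) around a hypothetical point.
anchors = make_anchors(300, 200)
```

Repeating this at every location of the feature map gives the full anchor set that the RPN scores and refines.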

Faster RCNN - Anchor Points

Mask RCNN
An extension of Faster RCNN that also predicts a mask for each RoI.
The mask branch has a K · m^2 dimensional output for each RoI, which
encodes K binary masks of resolution m × m, one for each of the K classes.
Masks are predicted from the RoI-pooled feature map and need to be
aligned with the input RoI. The RoIAlign method is therefore used: it uses
bilinear interpolation to align the pooled feature map with the input
feature map.
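The bilinear interpolation step can be sketched as below. This is a simplified single-channel illustration: the function names are hypothetical, the coordinates are assumed to lie inside the feature map, and averaging a small grid of samples per bin is an assumption about the pooling rule.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at real (y, x)."""
    h, w = len(feature), len(feature[0])
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align_bin(feature, y_start, x_start, y_end, x_end, samples=2):
    """Average a samples x samples grid of bilinear samples inside one bin."""
    total = 0.0
    for i in range(samples):
        for j in range(samples):
            y = y_start + (i + 0.5) * (y_end - y_start) / samples
            x = x_start + (j + 0.5) * (x_end - x_start) / samples
            total += bilinear_sample(feature, y, x)
    return total / (samples * samples)


# Hypothetical 2x2 feature map.
feature = [[0.0, 1.0], [2.0, 3.0]]
```

Because the bin boundaries stay real-valued and are never rounded to the grid, the pooled values remain aligned with the RoI at sub-pixel precision.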

Mask RCNN - Architecture

Mask RCNN - Loss
The Mask RCNN loss function is a multi-task loss, since it incorporates
the prediction losses of classes, bounding boxes and segmentation masks.
Thus it can be represented as L = L_cls + L_box + L_mask, where
L_cls: the classification log loss over the classes, as in Fast RCNN.
L_box: the smooth L1 loss, which is used as the bounding-box
regression loss.
L_mask: the average per-pixel binary cross-entropy loss, used for
predicting the binary masks; for each RoI, only the mask of its
ground-truth class contributes.
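A minimal sketch of the mask term for one RoI. It assumes the mask branch outputs one sigmoid probability per pixel and per class, and that only the mask of the ground-truth class enters the loss, as defined in the Mask RCNN paper; the function and argument names are hypothetical.

```python
import math


def mask_loss(mask_probs, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on the ground-truth class mask.

    mask_probs[k][i][j] is the predicted probability that pixel (i, j)
    belongs to class k; gt_mask[i][j] is 0 or 1.
    """
    pred = mask_probs[gt_class]  # masks of the other classes are ignored
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt_mask):
        for p, y in zip(pred_row, gt_row):
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count
```

Selecting a single mask per RoI means the classes do not compete inside the mask branch; the class decision is left to the classification branch.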

Conclusions
1 Mask RCNN is highly suitable for use in real time applications
because of its fast runtime.
2 Easy-to-use implementations are available (Detectron2 by Facebook
and the Matterport implementation in TensorFlow 1.x).

Generative Models

Idea

PixelRNN
$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$$
⇓
$$p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$$
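The factorisation can be illustrated with a toy autoregressive model over binary pixels. The conditional below is a hand-written stand-in for the learned network, purely an assumption for illustration; a PixelRNN would compute it with a recurrent network.

```python
import math


def conditional_prob_one(previous):
    """Toy stand-in for p_theta(x_i = 1 | x_1, ..., x_{i-1}).

    It simply favours repeating the previous pixel.
    """
    if not previous:
        return 0.5
    return 0.8 if previous[-1] == 1 else 0.2


def log_likelihood(pixels):
    """log p(x) as the sum of the log conditionals, one term per pixel."""
    total = 0.0
    for i, value in enumerate(pixels):
        p_one = conditional_prob_one(pixels[:i])
        total += math.log(p_one if value == 1 else 1.0 - p_one)
    return total
```

Since every conditional is a valid distribution, the probabilities of all possible pixel sequences sum to one; this is what makes the likelihood tractable, and the pixel-by-pixel loop is also why generation is sequential.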

PixelRNN

Drawbacks
1 Slow
2 Hard to learn
3 Pixels are probably less related

PixelCNN

Idea
So far we have been working at the pixel level, which might not be a good
idea. For example, a classifier for humans vs. no humans does not reason
about every pixel directly; it uses the pixels to extract some features and
then uses those features to make a prediction.

Idea
We should be able to generate an image of a human given some features
such as height, body shape and skin color, together with some other
parameters of the image such as lighting and exposure.

Autoencoders

Autoencoders
$$p_\theta(x) = \int p_\theta(z) \, p_\theta(x \mid z) \, dz$$

Autoencoders
Loss = ||x̂ − x||

Loss Autoencoders
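$$L(\theta, \phi \mid x, z) = \min\left(\lVert \hat{x} - x \rVert + KL(q(z) \,\|\, p(z))\right)$$
⇓
$$L(\theta, \phi \mid x, z) = \min\left(\lVert \hat{x} - x \rVert + KL(q(z) \,\|\, \mathcal{N}(0, I))\right)$$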

Kullback–Leibler (KL) Divergence
$$KL(p(x) \,\|\, q(x)) \text{ or } D_{KL}(p(x) \,\|\, q(x)) = -\sum_{x \in X} p(x) \log\left(\frac{q(x)}{p(x)}\right)$$
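For discrete distributions the definition translates directly into code. This is a small sketch that assumes p and q are lists of probabilities over the same outcomes and that q is non-zero wherever p is non-zero; the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as probability lists.

    Outcomes with p(x) = 0 contribute nothing to the sum.
    """
    return -sum(pi * math.log(qi / pi) for pi, qi in zip(p, q) if pi > 0)


# Two hypothetical distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]
```

Evaluating `kl_divergence(p, q)` and `kl_divergence(q, p)` gives different values, which shows that the KL divergence is not symmetric and therefore not a distance in the usual sense.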

Kullback–Leibler (KL) Divergence
KL(p || q) is the information gain achieved if p is used instead of q,
which is currently used. It is also called the relative entropy of p with
respect to q.

Gibbs Inequality
KL(p(x) || q(x)) ≥ 0
KL(p(x) || q(x)) = 0 iff p(x) = q(x) for all x.

Multi-Variable KL Divergence
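For k-dimensional Gaussians p = N(µ0, Σ0) and q = N(µ1, Σ1):
$$KL(p(\vec{x}) \,\|\, q(\vec{x})) = \frac{1}{2}\left[\operatorname{tr}(\Sigma_1^{-1}\Sigma_0) + (\mu_1 - \mu_0)^T \Sigma_1^{-1} (\mu_1 - \mu_0) - k + \ln\left(\frac{|\Sigma_1|}{|\Sigma_0|}\right)\right]$$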

Multi-Variable KL Divergence for Normal Distributions
$$KL(\mathcal{N}(\vec{\mu}, (\sigma_1^2, \ldots, \sigma_k^2)) \,\|\, \mathcal{N}(0, I)) = \frac{1}{2}\sum_{i=1}^{k}\left(\sigma_i^2 + \mu_i^2 - 1 - \ln(\sigma_i^2)\right)$$
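The closed form can be checked numerically against the general Gaussian formula. The sketch below is an illustration with hypothetical function names; the second function is the k = 1 case of the general formula, which is enough because a diagonal covariance makes the dimensions independent.

```python
import math


def kl_diag_gaussian_to_standard(mu, var):
    """KL(N(mu, diag(var)) || N(0, I)) using the closed form per dimension."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))


def kl_univariate_gaussians(mu0, var0, mu1, var1):
    """KL(N(mu0, var0) || N(mu1, var1)): the general formula with k = 1."""
    return 0.5 * (var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0 + math.log(var1 / var0))
```

Summing `kl_univariate_gaussians(m, v, 0.0, 1.0)` over the dimensions reproduces `kl_diag_gaussian_to_standard`, and the divergence is zero exactly when every mean is 0 and every variance is 1.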

Loss Autoencoders Revisit
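$$L(\theta, \phi \mid x, z) = \min\left(\lVert \hat{x} - x \rVert + KL(q(z) \,\|\, p(z))\right)$$
⇓
$$L(\theta, \phi \mid x, z) = \min\left(\lVert \hat{x} - x \rVert + KL(\mathcal{N}(\vec{\mu}_z, (\sigma_{z_1}^2, \ldots, \sigma_{z_k}^2)) \,\|\, \mathcal{N}(0, I))\right)$$
⇓
$$L(\theta, \phi \mid x, z) = \min\left(\lVert \hat{x} - x \rVert + \frac{1}{2}\sum_{i=1}^{k}\left(\sigma_{z_i}^2 + \mu_{z_i}^2 - 1 - \ln(\sigma_{z_i}^2)\right)\right)$$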

Quiz?

MSE Problem

KL Divergence Problem

GANs

GANs
The Generator is like a person trying to make copies of famous
paintings, and the Discriminator is like an expert telling fake paintings
from real ones. Each improves the other.

Loss
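$$L_D(\hat{y}, y) = \max\left(y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y})\right)$$
⇓
$$L_D(\hat{y}, y = 1) = \log(D(x))$$
⇓
$$L_D(\hat{y}, y = 0) = \log(1 - D(G(z)))$$
⇓
$$\max_D \left[E_{x \sim p_{data}(x)} \log(D(x)) + E_{z \sim p_z(z)} \log(1 - D(G(z)))\right]$$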

Loss
$$L_G = \min_G \log(1 - D(G(z)))$$
⇓
$$\min_G \left[E_{x \sim p_{data}(x)} \log(D(x)) + E_{z \sim p_z(z)} \log(1 - D(G(z)))\right]$$
⇓
$$\min_G \max_D \left[E_{x \sim p_{data}(x)} \log(D(x)) + E_{z \sim p_z(z)} \log(1 - D(G(z)))\right]$$
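The bracketed objective can be estimated from a batch of discriminator outputs. This is a minimal sketch with hypothetical names and made-up outputs; it assumes every discriminator output is a probability strictly between 0 and 1.

```python
import math


def value_function(d_real, d_fake):
    """Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real holds D(x) for real samples, d_fake holds D(G(z)) for
    generated samples.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs.
confident = value_function(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled = value_function(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The discriminator pushes this value up, as in `confident`, while the generator pushes it down; when the discriminator can only guess 0.5 everywhere, the value is 2 log(0.5).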

Generative Adversarial Nets, Goodfellow et al., NIPS 2014

GANs
The Discriminator usually trains much faster than the Generator; in
such cases the network might not get trained properly.

Loss Function Problem - Vanishing Gradient

Loss Function Problem - Mode Collapse

Critic Model (Wasserstein Loss)
$$\min_G \max_C \left(E(c(x)) - E(c(g(z)))\right)$$
The critic output c(·) can be any real number (it is not restricted to
[0, 1]). The loss approximates the Earth Mover's Distance.
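A short sketch of the two sides of this objective, estimated from a batch of critic scores. The function names and the scores used to try it out are hypothetical.

```python
def critic_objective(c_real, c_fake):
    """Estimate E[c(x)] - E[c(g(z))]; the critic tries to maximise this."""
    return sum(c_real) / len(c_real) - sum(c_fake) / len(c_fake)


def generator_objective(c_fake):
    """The generator only affects the second term, so it minimises -E[c(g(z))]."""
    return -sum(c_fake) / len(c_fake)
```

For example, `critic_objective([3.0, 5.0], [-1.0, 1.0])` evaluates to 4.0. No logarithm or sigmoid is involved, so the scores are free to take any real value.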

Earth Movers Distance

1-Lipschitz continuity
A differentiable function is said to be 1-Lipschitz continuous if the norm
of its gradient is at most one everywhere.

1-Lipschitz continuity
The critic's neural network must be 1-Lipschitz continuous, as this ensures
that the W-Loss is efficiently approximating the Earth Mover's Distance.

Including 1-Lipschitz continuity - Weight Clipping
We clip the weights of the critic to a fixed range so that 1-Lipschitz
continuity is enforced; however, this adversely affects its ability to learn.
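Weight clipping itself is a one-line operation applied after each critic update. The sketch below works on a flat list of weights; the function name and the default limit of 0.01 are illustrative assumptions.

```python
def clip_weights(weights, limit=0.01):
    """Clamp every weight into the range [-limit, limit]."""
    return [max(-limit, min(limit, w)) for w in weights]
```

For instance, `clip_weights([0.5, -0.3, 0.005])` returns `[0.01, -0.01, 0.005]`. Large weights are cut back to the boundary, which hints at why a tight range restricts what the critic can represent.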

Including 1-Lipschitz continuity - Loss Function
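$$\min_G \max_C \left(E(c(x)) - E(c(g(z))) - \lambda \, \text{reg}\right)$$
⇓
$$\min_G \max_C \left(E(c(x)) - E(c(g(z))) - \lambda E\left(\left(\lVert \nabla c(\epsilon x' + (1 - \epsilon) g(z')) \rVert - 1\right)^2\right)\right)$$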
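The regularisation term can be sketched without any deep learning library. This is an illustration only: the critic is a plain Python function, its gradient is estimated with finite differences (real training would use automatic differentiation), and the two linear critics and the sample points are hypothetical.

```python
import math
import random


def numerical_gradient(critic, point, step=1e-5):
    """Central-difference estimate of the gradient of critic at point."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += step
        down[i] -= step
        grad.append((critic(up) - critic(down)) / (2 * step))
    return grad


def gradient_penalty(critic, real, fake, rng=random):
    """Average of (gradient norm - 1)^2 at random mixes of real and fake samples."""
    total = 0.0
    for x, g in zip(real, fake):
        eps = rng.random()  # mixing coefficient in [0, 1)
        x_hat = [eps * a + (1 - eps) * b for a, b in zip(x, g)]
        grad = numerical_gradient(critic, x_hat)
        norm = math.sqrt(sum(v * v for v in grad))
        total += (norm - 1.0) ** 2
    return total / len(real)


def unit_critic(x):
    """Hypothetical linear critic whose gradient norm is 1 everywhere."""
    return 0.6 * x[0] + 0.8 * x[1]


def steep_critic(x):
    """Hypothetical linear critic whose gradient norm is 5 everywhere."""
    return 3.0 * x[0] + 4.0 * x[1]


real = [[1.0, 2.0], [0.0, 1.0]]
fake = [[0.5, 0.5], [2.0, -1.0]]
```

The penalty is essentially zero for `unit_critic` and about 16 for `steep_critic`, so the term pulls the critic towards gradient norm 1 as a soft constraint instead of hard-limiting its weights.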

Including 1-Lipschitz continuity - Loss Function

Controllable GANs

Controllable GANs
We need to find the direction in Z-space in which to move to change a
particular feature.
Features can be correlated (for example, male face and beard).
There can be a lot of entanglement between features.
Pre-trained classifiers can be used to find these directions.

Conditional GANs

Evaluating GANs
No Universal Discriminator
Fidelity
Diversity

AR
Placing

Scaling

Occlusion

Light and Color

Thank You!
