1. OUT-OF-DISTRIBUTION DETECTION
Applied AI Research
by the Data & AI Team
PRESENTATION │ C0 │ AUGUST 2023
INTRODUCTION
• AI models are increasingly used in critical applications
• AI models must therefore be robust and safe
• Most AI models are not safe by default: they tend to be overconfident on unfamiliar inputs
Source: https://researchleap.com/research-in-autonomous-driving-a-historic-bibliometric-view-of-the-research-development-in-autonomous-driving/
Source: https://hexanika.com/ai-in-regulatory-compliance-if-you-are-not-in-then-you-will-be-out/
Source: https://health.economictimes.indiatimes.com/news/hospitals/the-importance-of-ai-for-future-ready-hospitals/97672951
INTRODUCTION
• A trustworthy AI model should not only produce accurate predictions in known contexts, but also detect
unknown examples and reject them (or hand them over to human users for safe handling).
• Most existing machine learning models are trained under the closed-world assumption, where the test
data is assumed to be drawn i.i.d. from the same distribution as the training data, known as in-distribution
(ID). However, when models are deployed in an open-world scenario, test samples can be out-of-distribution
(OOD) and should therefore be handled with caution.
• The distributional shifts can be caused by semantic shift (e.g., OOD samples are drawn from different
classes), or covariate shift (e.g., OOD samples from a different domain).
• Potential use cases
• In safety-critical applications such as autonomous driving, the driving system must issue a warning and
hand control over to the driver when it detects unusual scenes or objects it has never seen during
training.
• In healthcare applications, where classifying non-diseased patients as diseased, and vice versa, is dangerous.
• In regulatory compliance, where classifying an unknown document as one of the known document types is risky (e.g., the KYC use case).
A SIMPLE DISCRIMINATIVE MODEL
Decision boundary: P(y|X)
CNI vs. TSJ
• Learns only the decision boundary; has no notion of the population characteristics
• Overconfident; dangerous to deploy in critical applications
What if this classifier decides:
• whether the object in front of an AV is a person or not?
• whether a person has a critical disease or not?
• whether a client is compliant as per a regulator or not?
"It's a CNI" (even for an out-of-distribution input)
In-Distribution
Out-of-Distribution
Source: https://infonet.fr/kbis/
Source: https://ingroupe.com/product/residence-permit/
A SIMPLE GENERATIVE MODEL
KBIS
Population characteristics: P(X, y)
1. What is my population?
2. Large data requirements; costly data gathering
3. Large models needed to capture all the nuances of the population
4. Heavy computational resources
5. Is the technology mature enough?
"I don't know"
In-Distribution
Out-of-Distribution
PROBLEM STATEMENT
• Building robust and safe AI models is essential for critical applications
• The technology is not yet mature, and further research is required
• In this work, we attempt to build an OOD detection model for banking applications using
generative AI models
We need an AI model that can say "I don't know"!
LITERATURE SURVEY
OOD Approaches
• Discriminative approaches
  • Softmax-score based
  • Distance based
  • Energy-based methods
  • Conformal scores
• Generative approaches
  • VAEs
  • GANs
  • Likelihood ratio
SUMMARY OF LITERATURE SURVEY
Two types of approaches: Discriminative and Generative
Classifier based approaches:
1. OOD inputs tend to have lower Soft-max scores (P(y|x)) than in-distribution data [Hendrycks & Gimpel]
2. High entropy of the predicted class distribution implies high predictive uncertainty, which suggests that the input may be
OOD.
3. The ODIN method proposed by Liang et al. (2017). ODIN uses temperature scaling (Guo et al., 2017), adds small perturbations to
the input, and applies a threshold to the resulting softmax score to distinguish in- and out-of-distribution inputs. This method
improves over Hendrycks & Gimpel.
4. The Mahalanobis distance of the input to the nearest class-conditional Gaussian distribution estimated from the in-distribution data.
Lee et al. (2018) fit class-conditional Gaussian distributions to the activations from the last layer of the neural network.
5. The classifier-based ensemble method that uses the average of the predictions from multiple independently trained models with
random initialization of network parameters and random shuffling of training inputs (Lakshminarayanan et al., 2017).
6. The log-odds of a binary classifier trained to distinguish between in-distribution inputs from all classes as one class and randomly
perturbed in-distribution inputs as the other.
7. The maximum class probability over K in-distribution classes of a (K+1)-class classifier, where the additional class consists of
perturbed in-distribution inputs.
Generative model based approaches
1. Auto-regressive models: PixelCNN (likelihood ratio)
2. VAE
3. GAN
DATASET CREATION
• Benchmark datasets exist (e.g., CIFAR vs. SVHN) but are not suitable for banking applications
• Inliers: 46 document classes of BDDF
• Outliers: Trade Finance (10 classes), CIFAR-10 (10 classes), ImageNet (100 classes)

In-Distribution
Dataset               No. of Samples
46 document classes   1,000,000

Out-of-Distribution
Trade Finance         150
CIFAR-10              170
ImageNet              1,800
DISCRIMINATIVE APPROACHES
1. Softmax scores method
• softmax(x)_i = exp(P_i) / Σ_j exp(P_j)
• Not true probabilities; overconfident
• Lacks uncertainty estimation
2. Temperature scaling
• softmax(x)_i = exp(P_i / T) / Σ_j exp(P_j / T)
• T > 1: softmax probabilities become smoother; T < 1: softmax probabilities become sharper
3. Energy: E = −log(P)
• Higher energy values mean higher uncertainty (lower confidence); lower energy values mean higher confidence
• Sensitive to small changes
• Robust to calibration issues
4. Entropy: H = −Σ_i P_i log(P_i)
• Sensitivity to uncertainty: high entropy indicates a more uncertain or ambiguous prediction
• Softmax scores are directly affected by temperature scaling, whereas the entropy-based ranking is less sensitive to it
• Considers all classes
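The four discriminative scores above can be sketched in a few lines. This is a minimal illustration assuming `logits` is a model's pre-softmax output for one input; the energy score here follows the slide's definition (negative log of the maximum softmax probability), and all names are illustrative.

```python
# Four discriminative OOD scores computed from a single logit vector.
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T=1 recovers the plain softmax."""
    z = logits / T
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ood_scores(logits, T=1000.0):
    p = softmax(logits)
    return {
        "max_softmax": p.max(),                     # high for ID, lower for OOD
        "temp_scaled": softmax(logits, T).max(),    # smoother probabilities for T > 1
        "energy": -np.log(p.max()),                 # higher energy = lower confidence
        "entropy": -(p * np.log(p + 1e-12)).sum(),  # high entropy = ambiguous prediction
    }

# A confident prediction yields low energy and low entropy;
# a flat (uncertain) prediction yields the opposite.
confident = ood_scores(np.array([10.0, 0.0, 0.0]))
flat = ood_scores(np.array([1.0, 1.0, 1.0]))
```

Thresholding any of these scores (calibrated on in-distribution data) turns a classifier into a basic OOD detector.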
CONFORMAL SCORES METHOD
Source: Angelopoulos, A. N., & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511.
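The conformal-scores idea can be sketched as a split conformal p-value: hold out a calibration set of in-distribution nonconformity scores (e.g., 1 minus the max softmax probability), then judge a test input by how extreme its score is relative to that set. The calibration values below are illustrative, not from the actual BDDF data.

```python
# Split conformal p-value for OOD detection: a small p-value means the test
# score is more extreme than almost all calibration scores, i.e. likely OOD.
import numpy as np

def conformal_pvalue(cal_scores, test_score):
    """Fraction of calibration scores at least as extreme as the test score
    (with the +1 correction for finite-sample validity)."""
    cal_scores = np.asarray(cal_scores)
    n = len(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (n + 1)

# Illustrative in-distribution nonconformity scores (e.g., 1 - max softmax).
cal = np.array([0.05, 0.10, 0.02, 0.08, 0.12])

p_id = conformal_pvalue(cal, 0.04)    # score similar to calibration set
p_ood = conformal_pvalue(cal, 0.90)   # score far above calibration set
```

Rejecting inputs whose p-value falls below a chosen level gives a distribution-free guarantee on the in-distribution rejection rate.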
LIKELIHOOD SCORES METHOD
Pipeline: input image → PixelCNN (semantic model); input image → add noise → PixelCNN (background model)
• An image carries both semantic information and background information
• Background information confounds the PixelCNN likelihood, so it should be factored out
• Two models are trained: a semantic model (on clean inputs) and a background model (on noise-perturbed inputs)
• The likelihood ratio is the ratio of the likelihood scores from the two models:

LLR(x) = log [ p_θF(x) / p_θB(x) ]
       = log [ p_θF(x_B) p_θF(x_S) / ( p_θB(x_B) p_θB(x_S) ) ]
       ≈ log p_θF(x_S) − log p_θB(x_S)

where θF is the semantic (foreground) model, θB the background model, and x_B, x_S the background and semantic components of x.
EXPERIMENTS & RESULTS

Approach           True Positive Rate (%)   False Positive Rate (%)
Likelihood Score   98.00                    13.75
Likelihood Ratio   98.15                    35.19
Conformal Scores   97.28                    37.30
Energy Scores      98.00                    43.68
Entropy Scores     98.00                    38.82
Softmax Scores     98.00                    64.33

* True Positive Rate: inliers correctly predicted as inliers (should be as high as possible).
* False Positive Rate: outliers incorrectly predicted as inliers (should be as low as possible).
FPR@TPRx is the commonly used metric for evaluating OOD models.
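The FPR@TPRx metric used in the table can be computed as follows: pick the score threshold that retains x% of the inliers, then measure how many outliers still pass it. The synthetic score distributions below are illustrative only; higher score is assumed to mean "more in-distribution".

```python
# FPR@TPRx: false positive rate at the threshold achieving the target TPR.
import numpy as np

def fpr_at_tpr(inlier_scores, outlier_scores, tpr=0.95):
    """Threshold so that `tpr` of inliers score above it, then return the
    fraction of outliers that also score above it."""
    inlier_scores = np.asarray(inlier_scores)
    # The (1 - tpr) quantile of inlier scores is the acceptance threshold.
    threshold = np.quantile(inlier_scores, 1.0 - tpr)
    outlier_scores = np.asarray(outlier_scores)
    return float(np.mean(outlier_scores >= threshold))

# Illustrative overlapping score distributions.
rng = np.random.default_rng(0)
inliers = rng.normal(2.0, 1.0, 10_000)    # inliers tend to score high
outliers = rng.normal(0.0, 1.0, 10_000)   # outliers tend to score low

fpr = fpr_at_tpr(inliers, outliers, tpr=0.95)
```

A better OOD detector pushes the two score distributions apart, which drives FPR@TPR95 toward zero.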
CONCLUSION
• We observe that the likelihood method performs better than the likelihood ratio method.
• This observation is in line with our hypothesis that document images are fundamentally different from
real-world images such as CIFAR and ImageNet.
• In real-world images, low-level, high-frequency features are irrelevant for classification and add noise.
• For document data, however, high-frequency features are more important, so cancelling them out
deteriorates the model's performance.
FUTURE WORK
• Exploration of other generative approaches such as VAEs and GANs
• Industrialization of the model: adding it to our document AI pipeline
REFERENCES
1. Angelopoulos, A. N., & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty
quantification. arXiv preprint arXiv:2107.07511.
2. Hendrycks, D., & Gimpel, K. (2016). A baseline for detecting misclassified and out-of-distribution examples in neural
networks. arXiv preprint arXiv:1610.02136.
3. Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., & Lakshminarayanan, B. (2018). Do deep generative models know what
they don't know? arXiv preprint arXiv:1810.09136.
4. Ren, J., Liu, P. J., Fertig, E., Snoek, J., Poplin, R., DePristo, M., Dillon, J., & Lakshminarayanan, B. (2019). Likelihood ratios
for out-of-distribution detection. Advances in Neural Information Processing Systems.
5. Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., & Graves, A. (2016). Conditional image generation with
PixelCNN decoders. Advances in Neural Information Processing Systems.