1
ICLR 2020 Recap
Selected Paper summaries and discussions
Sanyam Bhutani
ML Engineer & AI Content Creator
bhutanisanyam1
🎙: ctdsshow
Democratizing AI
Our mission to use AI for good permeates everything we do.
AI Transformation (Trusted Partner): Bringing AI to industry by helping companies transform their businesses with H2O.ai.
AI4Good (Impact/Social): Bringing AI to impact by augmenting non-profits and social ventures with technological resources and capabilities.
Open Source (Community): An industry leader in providing open-source, cutting-edge AI & ML platforms (H2O-3).
3
We are Established: Founded in Silicon Valley in 2012. Funding: $147M, Series D. Investors: Goldman Sachs, Ping An, Wells Fargo, NVIDIA, Nexus Ventures.
We Make World-class AI Platforms: H2O Open Source Machine Learning; H2O Driverless AI: Automatic Machine Learning; H2O Q: AI platform for business users.
We are Global: Mountain View, NYC, London, Paris, Ottawa, Prague, Chennai, Singapore.
1K Universities
20K Companies Using H2O Open Source
180K Meetup Members
220+ Experts
H2O.ai Snapshot
We are Passionate about Customers
4x customer growth in 2 years, across all industries and all continents
Aetna/CVS, Allergan, AT&T, Capital One, CBA, Citi, Coca-Cola, Bradesco, Dish, Disney, Franklin Templeton, Genentech, Kaiser Permanente, Lego, Merck, Pepsi, Reckitt Benckiser, Roche
4
Our Team is Made up of the World’s Leading Data Scientists
Your projects are backed by 10% of the world’s data science Grandmasters, who are relentless in solving your critical problems.
Make Your Company an AI Company
ICLR 2020
What is ICLR?
7
AGENDA
• What is ICLR?
• Paper Selection
• 8 Paper Summaries
• Q & A
8
9
Paper Summaries
• GAN-related use cases
• Deployment discussions
• Adversarial attacks
• Sesame Street (Transformers)
10
The Cutting edge of DL is about Engineering
- Jeremy Howard
11
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
- Junho Kim et al
12
• Image-to-Image Translation:
- Selfie2Anime
- Horse2Zebra
- Dog2Cat
- Photo2VanGogh
• A method for unsupervised image-to-image translation
• Attention! (Attention is all you need)
• Adaptive Layer-Instance Normalisation (AdaLIN)
U-GAT-IT
13
Architecture
• Appreciating the problem
• Attention! (Attention is all you need)
• Adaptive Layer-Instance Normalisation (AdaLIN); a sketch follows below
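The slides describe AdaLIN at a high level only; the following is a minimal PyTorch sketch of the normalisation as described in the U-GAT-IT paper. The class and variable names are my own, and gamma/beta are assumed to be produced elsewhere in the generator (the paper derives them from attention features via fully connected layers).

```python
# A minimal sketch of Adaptive Layer-Instance Normalization (AdaLIN);
# names and the 0.9 initial value of rho are assumptions for illustration.
import torch
import torch.nn as nn

class AdaLIN(nn.Module):
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        # rho interpolates between Instance Norm and Layer Norm; learned per channel.
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.9))

    def forward(self, x, gamma, beta):
        # Instance Norm statistics: per sample, per channel (over H, W).
        in_mean = x.mean(dim=[2, 3], keepdim=True)
        in_var = x.var(dim=[2, 3], keepdim=True, unbiased=False)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        # Layer Norm statistics: per sample (over C, H, W).
        ln_mean = x.mean(dim=[1, 2, 3], keepdim=True)
        ln_var = x.var(dim=[1, 2, 3], keepdim=True, unbiased=False)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        # Adaptive mix, then scale/shift with gamma and beta supplied by the generator.
        rho = self.rho.clamp(0.0, 1.0)
        x_hat = rho * x_in + (1.0 - rho) * x_ln
        return x_hat * gamma.view(x.size(0), -1, 1, 1) + beta.view(x.size(0), -1, 1, 1)

x = torch.randn(2, 64, 8, 8)
gamma, beta = torch.ones(2, 64), torch.zeros(2, 64)
print(AdaLIN(64)(x, gamma, beta).shape)  # torch.Size([2, 64, 8, 8])
```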
14
• Using attention to guide different geometric transforms
• Introduction of a new normalising function
• Image-to-image translation (and backwards!)
To Summarise
15
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
- Dan Hendrycks et al
16
• Why do you need image augmentations?
• Train and test splits should come from similar distributions
• Comparison of recent techniques
• Why is AugMix promising?
Image Augmentations
17
How does it work?
18
• AugMix mixes augmented images and enforces consistent embeddings of the augmented images, which results in increased robustness and improved uncertainty calibration (sketch below)
• AutoAugment
• AugMix does not require tuning to work correctly, enabling plug-and-play data augmentation
To Summarise
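As a rough illustration of the mixing procedure summarised above, here is a toy numpy sketch. The augmentation operations are simple stand-ins (the real implementation uses PIL-based ops), and the Jensen-Shannon consistency loss used during training is not shown.

```python
# A toy sketch of AugMix-style mixing, assuming a float image in [0, 1].
import numpy as np

def toy_ops():
    # Stand-ins for the real augmentation operations (rotate, posterize, ...).
    return [
        lambda img: np.roll(img, shift=2, axis=0),   # crude "translate"
        lambda img: np.clip(img * 1.2, 0.0, 1.0),    # crude "brightness"
        lambda img: img[:, ::-1],                    # horizontal mirror
    ]

def augmix(image, width=3, depth=2, alpha=1.0, rng=np.random.default_rng(0)):
    ops = toy_ops()
    w = rng.dirichlet([alpha] * width)   # mixing weights across augmentation chains
    m = rng.beta(alpha, alpha)           # skip-connection weight with the original
    mix = np.zeros_like(image)
    for i in range(width):
        chained = image.copy()
        for _ in range(depth):           # apply a short random chain of ops
            chained = ops[rng.integers(len(ops))](chained)
        mix += w[i] * chained
    # Convex combination of the original image and the mixed augmentations.
    return m * image + (1.0 - m) * mix

img = np.random.default_rng(1).random((32, 32, 3))
print(augmix(img).shape)  # (32, 32, 3)
```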
19
ELECTRA: Pre-Training Text Encoders as Discriminators Rather Than Generators
- Kevin Clark et al
20
21
• Progress in NLP, measured by GLUE score
• What is the GLUE score?
Pre-Training Progress
22
23
• Progress in NLP, measured by GLUE score
• What is the GLUE score?
• Normalised by pre-training FLOPs
Pre-Training Progress
24
• The BERT family uses MLM (masked language modelling)
• Suggested: a bidirectional model that learns from all of the tokens rather than only the masked percentage
Masked LM & ELECTRA
25
• The BERT family uses MLM (masked language modelling)
• Suggested: a bidirectional model that learns from all of the tokens rather than only the masked percentage
Masked LM & ELECTRA
26
• The BERT family uses MLM (masked language modelling)
• Suggested: a bidirectional model that learns from all of the tokens rather than only the masked percentage
Masked LM & ELECTRA
ELECTRA pre-training outperforms MLM pre-training
27
• Replaced token detection: a new self-supervised task for language representation learning (toy sketch below)
• Trains a text encoder to distinguish input tokens from high-quality negative samples produced by a small generator network
• Works well even with relatively small amounts of compute
• 45x/8x training/inference speedup compared to BERT-Base
To Summarise
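To make the replaced-token-detection objective concrete, here is a toy sketch of how discriminator training examples could be built. The small generator is faked with random sampling, and all names are hypothetical; the point is that every token gets a label, so the model learns from all positions rather than only the masked ones.

```python
# Toy construction of ELECTRA-style replaced-token-detection examples.
import random

def make_rtd_example(tokens, vocab, mask_rate=0.15, rng=random.Random(0)):
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            # In ELECTRA a small generator proposes a plausible replacement;
            # here a random vocabulary item is a stand-in.
            replacement = rng.choice(vocab)
            corrupted.append(replacement)
            # Label 1 only if the token actually changed.
            labels.append(int(replacement != tok))
        else:
            corrupted.append(tok)
            labels.append(0)  # original token
    return corrupted, labels

tokens = "the chef cooked the meal".split()
vocab = ["the", "chef", "ate", "meal", "cooked", "painter", "car"]
print(make_rtd_example(tokens, vocab))
```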
28
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- Zhenzhong Lan et al
29
• At some point, further increases in model size become harder due to GPU/TPU memory limitations
• Is getting better NLP models as easy as training larger models?
• How can we reduce parameters?
Introduction
30
• Token embeddings are sparsely populated -> reduce their size with factorised projections (sketch below)
• Re-use parameters across repeated layers
Proposed Changes
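A minimal PyTorch sketch of the two reductions listed above, assuming a toy configuration; the class name and dimensions are illustrative, not ALBERT's actual implementation.

```python
# Factorized embeddings (V x E then E -> H) plus cross-layer parameter sharing.
import torch
import torch.nn as nn

class TinyAlbertEncoder(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden_dim=768,
                 num_layers=12, num_heads=12):
        super().__init__()
        # Factorized embedding: a small V x E table followed by a projection E -> H,
        # instead of a single V x H matrix as in BERT.
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        self.embed_proj = nn.Linear(embed_dim, hidden_dim)
        # One transformer layer whose weights are re-used at every depth.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.num_layers = num_layers

    def forward(self, token_ids):
        h = self.embed_proj(self.token_embed(token_ids))
        for _ in range(self.num_layers):   # the same parameters applied repeatedly
            h = self.shared_layer(h)
        return h

x = torch.randint(0, 30000, (2, 16))
print(TinyAlbertEncoder()(x).shape)  # torch.Size([2, 16, 768])
```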
31
• Sentence Order Prediction (SOP) for capturing inter-sentence coherence (toy example below)
• Remove dropout!
• Adding more data increases performance
Three More Tricks!
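For illustration, a toy sketch of how Sentence Order Prediction examples might be constructed: positives are consecutive segments in order, negatives are the same segments swapped. The names and the 50/50 split are assumptions, not ALBERT's data pipeline.

```python
# Toy Sentence Order Prediction (SOP) example construction.
import random

def make_sop_pairs(segments, rng=random.Random(0)):
    pairs = []
    for a, b in zip(segments, segments[1:]):
        if rng.random() < 0.5:
            pairs.append(((a, b), 1))   # correct order
        else:
            pairs.append(((b, a), 0))   # swapped order
    return pairs

doc = ["He opened the fridge.", "It was empty.", "So he ordered pizza."]
for pair, label in make_sop_pairs(doc):
    print(label, pair)
```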
32
Once-for-All: Train One Network and Specialize It for Efficient Deployment
- Han Cai et al
33
• Efficient deployment of DL models across devices
• Conventional approach: train specialised models (think SqueezeNet, MobileNet, etc.)
• Training costs $$$, engineering costs $$$
Introduction
34
• Train once, specialise for deployment
• Key idea: decouple model training from architecture search (toy sketch below)
• Algorithm proposed: progressive shrinking
Proposed Approach
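The following toy PyTorch sketch illustrates the train-once / sample-sub-networks idea with a single layer of elastic width. It is not the paper's progressive-shrinking algorithm, just the general shape of it: one shared set of weights is trained while progressively allowing smaller sub-networks to be sampled.

```python
# Toy "supernet" with elastic width; sub-networks share slices of one weight matrix.
import random
import torch
import torch.nn as nn

class ElasticLinear(nn.Module):
    def __init__(self, in_dim=64, max_out=256):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x, out_dim):
        # A sub-network simply uses the first `out_dim` output units.
        return x @ self.weight[:out_dim].t() + self.bias[:out_dim]

layer = ElasticLinear()
widths = [256]                      # start with the largest network only
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
for step in range(30):
    if step == 10:
        widths += [192, 128]        # progressively allow smaller widths
    if step == 20:
        widths += [64]
    w = random.choice(widths)       # sample one sub-network per step
    x = torch.randn(8, 64)
    loss = layer(x, w).pow(2).mean()  # placeholder loss, just to show the loop
    opt.zero_grad()
    loss.backward()
    opt.step()
```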
35
• Train one once-for-all network, then specialise sub-networks for efficient deployment across devices
• Progressive shrinking decouples training cost from the number of deployment scenarios
To Summarise
36
37
Thieves on Sesame Street! Model Extraction of BERT-based APIs
- Kalpesh Krishna et al
38
• Query the model with random sentences to probe its behaviour
• After a large number of queries, you have labels and a dataset (toy sketch below)
• Note: these attacks are economically practical (cheaper than trying to train a model yourself)
• Note 2: this is not model distillation, it's IP theft
Attacks
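A toy sketch of the extraction recipe described above: query a black-box classifier with random word-salad inputs and keep its predictions as a training set for your own copy. The victim API here is a fake stand-in function; in the paper the attacker then fine-tunes their own BERT on the harvested query/label pairs.

```python
# Toy model-extraction data collection against a fake sentiment "API".
import random

VOCAB = ["movie", "terrible", "great", "plot", "acting", "boring", "loved"]

def victim_api(text):
    # Stand-in for the victim model (a BERT-based API in the paper).
    return "pos" if ("great" in text or "loved" in text) else "neg"

def extract_dataset(n_queries=1000, rng=random.Random(0)):
    dataset = []
    for _ in range(n_queries):
        query = " ".join(rng.choices(VOCAB, k=rng.randint(5, 12)))
        dataset.append((query, victim_api(query)))   # label comes from the API
    return dataset

stolen = extract_dataset()
print(stolen[:3])
# The attacker then trains their own model on `stolen`.
```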
39
40
• Membership classification: flagging suspicious queries
• API watermarking: for some % of queries, return a wrong output; the "watermarked" queries and their outputs are stored on the API side (toy sketch below)
• Note: both of these would fail against smart attacks
Suggested Solutions
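A toy sketch of the API-watermarking defence described above; the function names, labels, and 1% rate are illustrative assumptions. A model that later reproduces these deliberately wrong answers was likely trained on the API's outputs.

```python
# Toy API watermarking: flip the answer for a small fraction of queries and log them.
import random

WATERMARK_LOG = []   # (query, watermarked_label) pairs kept server-side

def watermarked_api(query, true_label_fn, labels=("pos", "neg"),
                    rate=0.01, rng=random.Random(0)):
    label = true_label_fn(query)
    if rng.random() < rate:
        wrong = rng.choice([l for l in labels if l != label])
        WATERMARK_LOG.append((query, wrong))
        return wrong            # deliberately incorrect, "watermarked" output
    return label

def toy_sentiment(text):
    return "pos" if "great" in text else "neg"

print(watermarked_api("a great movie", toy_sentiment))
```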
41
Controlling Text Generation with Plug and Play Language Models
- Rosanne Liu et al
42
43
• LMs can generate coherent, relatable text, either from scratch or by completing a passage started by the user
• BUT they are hard to steer or control
• They can also be triggered by certain adversarial attacks
Introduction
44
• Controlled generation: adding knobs with conditional probability
• Consists of 3 steps:
Controlling the Mammoth
45
Controlling the Mammoth
46
• Controlled generation: adding knobs with conditional probability (toy sketch below)
• Consists of 3 steps
• Also allows a reduction in toxicity: from 63% to ~5%!
Controlling the Mammoth
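A toy PyTorch sketch of the PPLM-style control loop: gradients from a small attribute model p(a|x) perturb a latent so that the frozen LM head drifts toward the desired attribute, while a KL term keeps the steered distribution close to the original. The tiny LM head and attribute classifier here are random stand-ins, not GPT-2 or the paper's attribute models.

```python
# Toy gradient-based steering of a latent toward an attribute classifier.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
hidden_dim, vocab_size = 32, 100
lm_head = torch.nn.Linear(hidden_dim, vocab_size)   # stand-in for the frozen LM head p(x)
attr_clf = torch.nn.Linear(hidden_dim, 2)           # stand-in for the attribute model p(a|x)

hidden = torch.randn(1, hidden_dim)                   # latent from the LM's forward pass
delta = torch.zeros_like(hidden, requires_grad=True)  # perturbation applied to the latent
target_attr = torch.tensor([1])                       # desired attribute class
base_probs = F.softmax(lm_head(hidden), dim=-1).detach()

for _ in range(10):                                   # a few gradient steps per token
    h = hidden + delta
    attr_loss = F.cross_entropy(attr_clf(h), target_attr)
    # KL term keeps the steered next-token distribution close to the original LM's.
    kl = F.kl_div(F.log_softmax(lm_head(h), dim=-1), base_probs, reduction="batchmean")
    loss = attr_loss + 0.1 * kl
    loss.backward()
    with torch.no_grad():
        delta -= 0.05 * delta.grad
        delta.grad.zero_()

steered_probs = F.softmax(lm_head(hidden + delta), dim=-1)  # sample the next token from this
print(steered_probs.argmax().item())
```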
47
GENERATIVE MODELS FOR EFFECTIVE ML ON PRIVATE, DECENTRALIZED DATASETS
- Sean Augenstein et al
48
• Modelling is important: looking at data is a large part of the pipeline
• Manual data inspection is problematic for privacy-sensitive datasets
• Problem: your model resides on your server, the data on end devices
Introduction
49
• Modelling is important: looking at data is a large part of the pipeline
• Manual data inspection is problematic for privacy-sensitive datasets
• Problem: your model resides on your server, the data on end devices
Suggested Solutions
50
• DP (differentially private) federated GANs (toy sketch below):
- Train on user devices
- Inspect generated data instead of raw user data
• Repository showcases:
- Language modelling with a DP RNN
- Image modelling with DP GANs
Suggested Solutions
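As a rough illustration of the training-loop shape, here is a toy numpy sketch of one federated round with per-client clipping and server-side Gaussian noise. It abstracts the GAN itself into a flat parameter vector and is not the paper's DP-FedAvg implementation; all names are assumptions.

```python
# Toy DP federated averaging round: local updates, clipping, noise, averaging.
import numpy as np

rng = np.random.default_rng(0)

def local_update(global_params, client_data):
    # Stand-in for a few local GAN training steps on the client's device.
    return global_params + 0.01 * rng.standard_normal(global_params.shape)

def dp_federated_round(global_params, clients, clip_norm=1.0, noise_std=0.1):
    updates = []
    for data in clients:
        delta = local_update(global_params, data) - global_params
        # Clip each client's contribution to bound its influence.
        delta = delta * min(1.0, clip_norm / (np.linalg.norm(delta) + 1e-12))
        updates.append(delta)
    avg = np.mean(updates, axis=0)
    # The server adds Gaussian noise for differential privacy.
    avg += rng.normal(0.0, noise_std * clip_norm / len(clients), size=avg.shape)
    return global_params + avg

params = np.zeros(10)
for _ in range(5):
    params = dp_federated_round(params, clients=[None] * 8)
print(params)
```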
Thank You! 🍵
bhutanisanyam1
🎙: ctdsshow
Questions?


Editor's Notes

• #4 Who is H2O.ai? H2O.ai was founded in 2012 and closed a Series D round in August 2019, with Goldman Sachs leading the round and Ping An (insurance and finance out of China) contributing as well; customers such as Wells Fargo and strategic partner NVIDIA also invested. H2O.ai is the creator and inventor of H2O open source: nearly 20,000 organizations, businesses, governments and universities use H2O. H2O.ai also brought H2O Driverless AI to market in late 2017; it is the premier product for automatic machine learning, and this presentation covers what it is, its value and who is using it. The team is over 200 people, including some of the world's best AI experts and Kaggle Grandmasters. Kaggle is an online tournament for data scientists, who compete for fame and money by delivering the best data science results: companies offer a challenge and some prize money, and data scientists spend time fine-tuning their models to get results. When they win a number of competitions, they can claim a Grandmaster title/status, similar to a chess Grandmaster. H2O.ai has 13 of the roughly 140 Grandmasters on the planet today. H2O.ai's talent also extends to distributed computing experts and visualization experts (Leland Wilkinson). Finally, H2O.ai is global: headquartered in Mountain View, CA, with offices in Prague (an AI Center of Excellence), London, NYC, and India.