MINR: Implicit Neural Representations with Masked Image Modelling
Sua Lee, Joonhun Lee, and Myungjoo Kang
Seoul National University
OOD-CV
Task. Masked Image Modelling (MIM) is a notable self-supervised learning method that deliberately masks images and trains models to reconstruct the hidden information, yielding robust feature representations.
Introduction
Motivation. A limitation of the Masked Autoencoder (MAE), one of the most powerful MIM methods, is its dependency on masking strategies.
Our Masked Implicit Neural Representations (MINR) framework effectively combines Implicit Neural Representations (INRs) with MIM to address this limitation of MAE.
Contributions
• Leverages INRs to learn a continuous function that is less affected by variations in the visible patch information.
• Alleviates the reliance on heavy pretrained models, with considerably fewer parameters.
• The learned continuous function provides greater flexibility in creating embeddings for various downstream tasks.
Methodology
Integrating INRs with MIM
• Apply random masking to a dataset $O = \{o^n\}_{n=1}^{N}$ containing $N$ observations $o^n = \{x_i^n, y_i^n\}$:
$$M = \{\, m^n \mid m^n \coloneqq \mathrm{Mask}^n(o^n) \,\}_{n=1}^{N}$$
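The masking step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dict layout, `random_mask` name, and mask ratio are assumptions, with each observation stored as coordinate/value arrays.

```python
import numpy as np

def random_mask(observation, mask_ratio=0.75, rng=None):
    """Randomly hide a fraction of (coordinate, value) pairs.

    `observation` holds coordinates `x` of shape (H*W, 2) and pixel
    values `y` of shape (H*W, 3); the masked copy keeps only the
    visible pairs, mirroring Mask^n(o^n) in the text.
    """
    rng = np.random.default_rng(rng)
    n = observation["x"].shape[0]
    n_visible = int(n * (1.0 - mask_ratio))
    visible = rng.choice(n, size=n_visible, replace=False)
    return {"x": observation["x"][visible], "y": observation["y"][visible]}

# Build one observation o^n from a toy 4x4 RGB image.
H, W = 4, 4
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
o = {
    "x": np.stack([ys.ravel(), xs.ravel()], axis=1) / np.array([H - 1, W - 1]),
    "y": np.random.default_rng(0).random((H * W, 3)),
}
m = random_mask(o, mask_ratio=0.75, rng=0)  # 4 of 16 pixels stay visible
```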
• Estimate a continuous function $f_\theta : \mathbb{R}^2 \to \mathbb{R}^3$, which maps input coordinates to their corresponding properties:
$$\mathcal{L}^n(\theta^n; m^n) = \frac{1}{H \times W} \sum_{i=1}^{H \times W} \bigl\| y_i^n - f_{\theta^n}(x_i^n) \bigr\|_2^2$$
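A small sketch of this per-instance reconstruction loss, under the assumption that the INR is any callable mapping coordinates to RGB values (the toy linear `f` below is purely illustrative):

```python
import numpy as np

def inr_loss(f_theta, coords, targets):
    """Mean squared reconstruction error of an INR over all pixels:
    L^n = (1 / (H*W)) * sum_i || y_i - f_theta(x_i) ||_2^2.
    """
    preds = f_theta(coords)                 # (H*W, 3) predicted RGB
    return float(np.mean(np.sum((targets - preds) ** 2, axis=1)))

# Toy INR: a fixed linear map from 2-D coordinates to RGB.
W_lin = np.array([[0.5, 0.0, 0.25],
                  [0.0, 0.5, 0.25]])
f = lambda x: x @ W_lin

coords = np.array([[0.0, 0.0], [1.0, 1.0]])
targets = f(coords)                         # perfect targets -> zero loss
loss = inr_loss(f, coords, targets)
```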
TransINR-based approach
• To extend independent training to large-scale out-of-distribution (OOD) datasets, we adopt INR approaches that use a transformer-based hypernetwork.
• The hypernetwork predicts the entire set of INR weights $\theta^n = \{W_l\}_{l=1}^{L}$, with $\Theta = \{\theta^n\}_{n=1}^{N}$:
$$\mathcal{L}(\Theta; M) = \frac{1}{N} \sum_{n=1}^{N} \mathcal{L}^n(\theta^n; m^n)$$
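The hypernetwork idea and the averaged loss can be sketched as below. This is a heavily simplified stand-in: a single linear map plays the role of the transformer hypernetwork, and the 2-layer INR shapes, embedding size, and function names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in hypernetwork: maps a pooled embedding of the visible patches
# to the full flat weight vector of a 2-layer INR theta^n = {W_1, W_2}.
# (A linear map here for brevity, not an actual transformer.)
SHAPES = [(2, 16), (16, 3)]
N_WEIGHTS = sum(a * b for a, b in SHAPES)
HYPER = rng.normal(scale=0.1, size=(8, N_WEIGHTS))  # embedding dim 8

def predict_weights(embedding):
    """Predict and unflatten the entire set of INR weights theta^n."""
    flat = embedding @ HYPER
    weights, i = [], 0
    for a, b in SHAPES:
        weights.append(flat[i:i + a * b].reshape(a, b))
        i += a * b
    return weights

def inr_forward(weights, x):
    h = np.tanh(x @ weights[0])
    return h @ weights[1]

def batch_loss(embeddings, coords, targets):
    """L(Theta; M) = (1/N) * sum_n L^n(theta^n; m^n)."""
    losses = []
    for e, y in zip(embeddings, targets):
        theta = predict_weights(e)          # per-instance weights
        pred = inr_forward(theta, coords)
        losses.append(np.mean(np.sum((y - pred) ** 2, axis=1)))
    return float(np.mean(losses))

N, P = 5, 16                                # 5 observations, 16 pixels each
embeddings = rng.normal(size=(N, 8))
coords = rng.random((P, 2))
targets = rng.random((N, P, 3))
loss = batch_loss(embeddings, coords, targets)
```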
GINR-based approach
• Generalized INR (GINR) partitions the MLP hidden layers into instance-specific layers $\theta$ and instance-agnostic layers $\phi$. It is empirically found to be effective at learning both common and unique patterns across a dataset, making it well suited to OOD settings:
$$\mathcal{L}(\Theta, \phi; M) = \frac{1}{N} \sum_{n=1}^{N} \mathcal{L}^n(\theta^n, \phi; m^n)$$
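The layer partition can be sketched as below, assuming a 3-layer MLP in which only the middle layer is instance-specific; the layer sizes and the `ginr_forward` name are illustrative, not the authors' exact architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Instance-agnostic layers phi are shared across the whole dataset;
# only the middle hidden layer theta^n is predicted per instance.
phi_in = rng.normal(scale=0.5, size=(2, 16))    # shared input layer
phi_out = rng.normal(scale=0.5, size=(16, 3))   # shared output layer

def ginr_forward(theta_n, x):
    """MLP whose middle layer theta_n is instance-specific (GINR-style split)."""
    h = np.tanh(x @ phi_in)         # shared phi
    h = np.tanh(h @ theta_n)        # instance-specific (16x16) layer
    return h @ phi_out              # shared phi

coords = rng.random((16, 2))
theta_a = rng.normal(size=(16, 16))
theta_b = rng.normal(size=(16, 16))

# Same shared phi, different theta -> different per-instance outputs.
out_a = ginr_forward(theta_a, coords)
out_b = ginr_forward(theta_b, coords)
```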
Results
• We employ three datasets for the experiments: CelebA, Imagenette, and MIT-Indoor67. These datasets were selected for their varied nature, in order to evaluate the robustness of our method across different data distributions.
In-domain performance
Out-of-domain performance
Qualitative results
For a fair comparison, we visualize the results by pasting unmasked patches onto the reconstruction results.
References
[1] He, Kaiming, et al. "Masked autoencoders are scalable vision learners." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[2] Chen, Yinbo, and Xiaolong Wang. "Transformers as meta-learners for implicit neural representations." European Conference on Computer Vision. 2022.
[3] Kim, Chiheon, et al. "Generalizable Implicit Neural Representations via Instance Pattern Composers." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
Acknowledgement
Myungjoo Kang was supported by the NRF grant [2012R1A2C3010887] and the MSIT/IITP grants ([1711117093], [No. 2021-0-00077], [No. 2021-0-01343, Artificial Intelligence Graduate School Program (SNU)]).
Our approaches clearly enhanced the reconstruction performance in all experiment settings.