Deep implicit layers allow neural networks to solve structured problems by following algorithmic rules. They include layers for convex optimization, discrete optimization, differential equations, and more. The forward pass runs an algorithm, while the backward pass computes gradients using algorithmic properties such as the KKT conditions. This lets neural networks solve problems such as structured prediction, meta-learning, and time-series modeling reliably, by respecting the problems' underlying structure.
2. Can deep learning solve structured problems?
• Deep learning has shown remarkable success on perception (system 1) tasks
[Figure: a handwritten digit image is mapped to the prediction “3”.]
3. Can deep learning solve structured problems?
• Deep learning has shown remarkable success on perception (system 1) tasks
• Can deep learning also solve complex reasoning (system 2) problems?
[Figure: recognizing a digit (“3”) vs. solving a Sudoku puzzle.]
4. Can deep learning solve structured problems?
• Structured reasoning problems require algorithmic thinking
5. Can deep learning solve structured problems?
• Deep implicit layers: design a layer to follow an algorithmic rule
• The output of the layer is the solution of an algorithm, not a simple computation 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable (see the sketch below)
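To make this concrete, here is a minimal sketch (not from the slides) of an implicit layer, assuming a toy fixed-point rule 𝑧∗ = tanh(𝑊𝑧∗ + 𝑥): the forward pass iterates to convergence, and the backward pass obtains ∂𝑧∗/∂𝑥 from the implicit function theorem instead of backpropagating through the iterations.

```python
# Minimal implicit layer: z* solves z = tanh(W z + x).
# IFT: with D = diag(1 - tanh^2(W z* + x)),  dz*/dx = (I - D W)^{-1} D.
import numpy as np

def forward(W, x, n_iter=100):
    z = np.zeros_like(x)
    for _ in range(n_iter):                  # Forward: iterate to the fixed point z*
        z = np.tanh(W @ z + x)
    return z

def grad_x(W, x, z_star):
    D = np.diag(1.0 - np.tanh(W @ z_star + x) ** 2)    # Jacobian of tanh at z*
    return np.linalg.solve(np.eye(len(x)) - D @ W, D)  # (I - D W)^{-1} D

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
W = 0.5 * A / np.linalg.norm(A, 2)           # spectral norm 0.5, so iteration converges
x = rng.standard_normal(3)
z_star = forward(W, x)
print(grad_x(W, x, z_star))                  # dz*/dx without storing the iterations
```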
6. Can deep learning solve structured problems?
• Deep implicit layers: design a layer to follow an algorithmic rule
• The output of the layer is the solution of an algorithm, not a simple computation 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable
• Why do we need implicit layers?
• Reliable and generalizable prediction from interpretable rules [1,2]
• It is important to choose an architecture that follows the problem’s structure
[1] Chen et al. Understanding Deep Architectures with Reasoning Layer. NeurIPS 2020.
[2] Xu et al. How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks. ICLR 2021.
7. Can deep learning solve structured problems?
• Deep implicit layers: design a layer to follow an algorithmic rule
• The output of the layer is the solution of an algorithm, not a simple computation 𝜎(𝑊𝑧 + 𝑏)
• Forward: 𝑧∗ = Algorithm_𝜃(𝑧)
• Backward: both ∂𝑧∗/∂𝑧 and ∂𝑧∗/∂𝜃 should be computable
• Examples of implicit layers
• (Convex) optimization (application: meta-learning, structured prediction)
• Discrete optimization (application: abstract reasoning)
• Differential equation (application: sequential modeling, density estimation)
• Fixed-point iteration (application: memory-efficient architectures)
• Planning & control (application: model-based RL)
• …and so on (e.g., ranking & sorting)
8. Implicit layer – (Convex) optimization
• How does it work?
• Forward: run a (convex) optimization solver
• Backward: use properties of the optimum (e.g., the KKT conditions)
• Technical detail: OptNet considers a quadratic program
min_𝑧 ½ 𝑧ᵀ𝑄𝑧 + 𝑞ᵀ𝑧 s.t. 𝐴𝑧 = 𝑏, 𝐺𝑧 ≤ ℎ,
and the gradients with respect to the parameters 𝑄, 𝑞, 𝐴, 𝑏, 𝐺, ℎ are obtained by differentiating the KKT conditions at the optimum (see the sketch below)
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Agrawal et al. Differentiable Convex Optimization Layers. NeurIPS 2019.
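A hedged sketch of this idea, restricted to equality constraints for brevity (the helper names qp_forward / qp_backward_q are illustrative, not OptNet's API): the forward pass solves the QP via its KKT linear system, and the backward pass differentiates that same system to turn the upstream gradient dL/d𝑧∗ into dL/d𝑞.

```python
# Equality-constrained QP: min_z 1/2 z^T Q z + q^T z  s.t.  A z = b.
# KKT system: [[Q, A^T], [A, 0]] [z; nu] = [-q; b].
import numpy as np

def qp_forward(Q, q, A, b):
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])   # KKT matrix
    sol = np.linalg.solve(K, np.concatenate([-q, b]))
    return sol[:n], K                                  # z*, cached KKT matrix

def qp_backward_q(K, grad_z, n):
    # Differentiating K [z; nu] = [-q; b] w.r.t. q gives dz/dq = -(K^{-1})_{zz},
    # so the vector-Jacobian product is dL/dq = -(K^{-1} [dL/dz; 0])_z.
    rhs = np.concatenate([grad_z, np.zeros(K.shape[0] - n)])
    return -np.linalg.solve(K, rhs)[:n]

Q = np.diag([2.0, 1.0]); q = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])        # constraint: z1 + z2 = 1
z_star, K = qp_forward(Q, q, A, b)
print(z_star, qp_backward_q(K, np.ones(2), n=2))       # solution and dL/dq for L = sum(z*)
```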
9. Implicit layer – (Convex) optimization
• How does it work?
• Forward: run a (convex) optimization solver
• Backward: use properties of the optimum (e.g., the KKT conditions)
• Application: a ridge/SVM classifier on top of deep features
• Train a classifier on the 𝑘-shot features (cf. ProtoNet uses a nearest class-mean classifier); see the sketch below
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Lee et al. Meta-Learning with Differentiable Convex Optimization. CVPR 2019.
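A minimal sketch of the "solver as a classifier head" idea, assuming a ridge-regression head as a simpler stand-in (MetaOptNet itself solves an SVM QP); ridge_head is a hypothetical helper. Its closed-form solution is differentiable, so the meta-learner can backpropagate through the per-task classifier fit into the feature extractor.

```python
import torch

def ridge_head(feats, onehot, lam=1.0):
    # W = (X^T X + lam I)^{-1} X^T Y, differentiable via torch.linalg.solve
    d = feats.shape[1]
    gram = feats.T @ feats + lam * torch.eye(d)
    return torch.linalg.solve(gram, feats.T @ onehot)

feats = torch.randn(20, 16, requires_grad=True)    # 20 support examples, 16-dim features
onehot = torch.eye(5)[torch.randint(5, (20,))]     # 5-way labels
W = ridge_head(feats, onehot)                      # per-task classifier, fit in closed form
query = torch.randn(4, 16)
loss = torch.nn.functional.cross_entropy(query @ W, torch.randint(5, (4,)))
loss.backward()                                    # gradients flow into the feature extractor
print(feats.grad.shape)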
10. Implicit layer – (Convex) optimization
• How does it work?
• Forward: run a (convex) optimization solver
• Backward: use properties of the optimum (e.g., the KKT conditions)
• Application: the inner loop of MAML as the solution of a regularized optimization
• No early-stopping heuristic as in the original MAML (the number of inner-loop steps can vary)
• Does not keep the intermediate trajectory (uses properties of the optimum)
→ Can apply an arbitrary number of inner-loop steps
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
11. Implicit layer – (Convex) optimization
• How does it work?
• Forward: run a (convex) optimization solver
• Backward: use properties of the optimum (e.g., the KKT conditions)
• Application: the inner loop of MAML as the solution of a regularized optimization
• Does not keep the intermediate trajectory (uses properties of the optimum)
• Efficient computation when the number of inner-loop steps is large
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
12. Implicit layer – (Convex) optimization
• How does it work?
• Forward: run a (convex) optimization solver
• Backward: use properties of the optimum (e.g., the KKT conditions)
• Application: the inner loop of MAML as the solution of a regularized optimization
• Does not keep the intermediate trajectory (uses properties of the optimum)
• Technical detail: the meta-gradient is ∇_𝜃 𝐿_test = (d𝜙∗/d𝜃)ᵀ ∇_𝜙 𝐿_test(𝜙∗),
where the Jacobian is d𝜙∗/d𝜃 = (𝐼 + (1/𝜆) ∇²_𝜙 𝐿_train(𝜙∗))⁻¹ (see the sketch below)
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Rajeswaran et al. Meta-Learning with Implicit Gradients. NeurIPS 2019.
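A hedged sketch of this implicit meta-gradient (function names are illustrative): instead of forming the inverse Jacobian, solve (𝐼 + (1/𝜆)𝐻)𝑥 = ∇𝐿_test with conjugate gradient, where 𝐻 is the Hessian of the inner-loop training loss at 𝜙∗, accessed only through Hessian-vector products.

```python
import torch

def hvp(loss, params, vec):
    # Hessian-vector product H v via double backward
    g = torch.autograd.grad(loss, params, create_graph=True)[0]
    return torch.autograd.grad(g, params, grad_outputs=vec, retain_graph=True)[0]

def implicit_meta_grad(train_loss, phi, grad_test, lam=1.0, cg_steps=10):
    x = torch.zeros_like(grad_test); r = grad_test.clone(); p = r.clone()
    for _ in range(cg_steps):                        # CG on A x = grad_test
        Ap = p + hvp(train_loss, phi, p) / lam       # A = I + (1/lam) H
        alpha = (r @ r) / (p @ Ap)
        x, r_new = x + alpha * p, r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x                                         # approx. (I + H/lam)^{-1} grad_test

phi = torch.randn(8, requires_grad=True)             # adapted parameters phi* (toy)
train_loss = ((phi - 1.0) ** 2).sum()                # stand-in inner-loop loss (H = 2I)
grad_test = torch.randn(8)
print(implicit_meta_grad(train_loss, phi, grad_test))
```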
13. Implicit layer – (Convex) optimization
• How does it work?
• Forward: run a (convex) optimization solver
• Backward: use properties of the optimum (e.g., the KKT conditions)
• Application: structured prediction by minimizing an energy function
• Solve an energy-based model (EBM) on top of deep features 𝑓(𝑥)
• The output is 𝑦∗ = argmin_𝑦 𝐸_𝜃(𝑦; 𝑓(𝑥)) (see the sketch below)
[1] Amos & Kolter. OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
[2] Belanger et al. End-to-End Learning for Structured Prediction Energy Networks. ICML 2017.
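A minimal sketch of energy minimization as a layer, in the spirit of [2] (which backpropagates through the unrolled minimization); the toy quadratic energy is an assumption standing in for a learned energy network.

```python
import torch

def energy(y, feat, theta):
    return ((y - theta @ feat) ** 2).sum()           # toy energy; real E_theta is a network

def energy_argmin(feat, theta, steps=20, lr=0.1):
    y = torch.zeros(theta.shape[0], requires_grad=True)
    for _ in range(steps):
        g = torch.autograd.grad(energy(y, feat, theta), y, create_graph=True)[0]
        y = y - lr * g                               # differentiable inner updates
    return y                                         # approx. y* = argmin_y E_theta(y; f(x))

theta = torch.randn(3, 5, requires_grad=True)
feat = torch.randn(5)                                # deep feature f(x)
y_star = energy_argmin(feat, theta)
loss = ((y_star - torch.ones(3)) ** 2).sum()         # supervised loss on the minimizer
loss.backward()                                      # gradients reach the energy parameters
print(theta.grad.shape)
```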
14. Implicit layer – Discrete optimization
• How does it work?
• Forward: continuous relaxation (e.g., an SDP solver for the MAXSAT problem)
• Backward: use properties of the optimum
[1] Wang et al. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. ICML 2019.
15. Implicit layer – Discrete optimization
• How does it work?
• Forward: continuous relaxation (e.g., an SDP solver for the MAXSAT problem)
• Backward: use properties of the optimum
• Application: solving abstract reasoning problems
• Extract discrete latent codes with a VQ-VAE and apply SATNet
[1] Wang et al. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. ICML 2019.
[2] Yu et al. Abstract Reasoning via Logic-guided Generation. ICML Workshop 2021.
16. Implicit layer – Differential equation
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE for the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method); see the sketch below
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
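A minimal sketch of an ODE layer, assuming the torchdiffeq package (released by the authors of [1]); odeint_adjoint implements the constant-memory adjoint backward pass described above.

```python
import torch
from torchdiffeq import odeint_adjoint as odeint

class ODEFunc(torch.nn.Module):
    def __init__(self, dim=4):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(dim, 32), torch.nn.Tanh(),
                                       torch.nn.Linear(32, dim))
    def forward(self, t, z):                 # dz/dt = f_theta(z(t), t)
        return self.net(z)

func = ODEFunc()
z0 = torch.randn(8, 4)                       # batch of initial states z(t0)
t = torch.tensor([0.0, 1.0])                 # integrate from t0 = 0 to t1 = 1
z1 = odeint(func, z0, t)[-1]                 # z(t1); gradients come from the adjoint ODE
z1.sum().backward()
print(func.net[0].weight.grad.norm())
```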
17. Implicit layer – Differential equation
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE for the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: irregular time series
• Handle arbitrary time inputs via continuous modeling (an RNN needs discrete time steps); see the snippet below
• Hidden state over time: [figure]
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Rubanova et al. Latent ODEs for Irregularly-Sampled Time Series. NeurIPS 2019.
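A short sketch of this point, continuing the torchdiffeq assumption from the previous slide: the same ODE model can be queried at arbitrary, irregularly spaced observation times simply by passing them to the solver.

```python
import torch
from torchdiffeq import odeint

t_obs = torch.tensor([0.0, 0.13, 0.95, 2.4, 2.41])  # irregular observation times
z0 = torch.randn(4)
z_t = odeint(lambda t, z: -z, z0, t_obs)             # hidden state at every t in t_obs
print(z_t.shape)                                     # (5, 4): one state per timestamp
```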
18. Implicit layer – Differential equation
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE for the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: irregular time series
• Handle arbitrary time inputs via continuous modeling (an RNN needs discrete time steps)
• Better extrapolation: [figure]
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Rubanova et al. Latent ODEs for Irregularly-Sampled Time Series. NeurIPS 2019.
19. Implicit layer – Differential equation
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE for the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: irregular time series
• Can also be applied to continuous video modeling
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Toth et al. Hamiltonian Generative Networks. ICLR 2020.
20. Implicit layer – Differential equation
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE for the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: density estimation (normalizing flows)
• Normalizing flows model an explicit density via the change-of-variables formula
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
21. Implicit layer – Differential equation
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE for the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: density estimation (normalizing flows)
• Normalizing flows model an explicit density via the change-of-variables formula
• They need specialized architectures to efficiently compute the Jacobian term det(∂𝑓/∂𝑧)
• Example: planar flow (see the sketch below)
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
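A minimal sketch of the planar-flow example: 𝑓(𝑧) = 𝑧 + 𝑢 · tanh(𝑤ᵀ𝑧 + 𝑏) is a rank-one update, so det(∂𝑓/∂𝑧) = 1 + tanh′(𝑤ᵀ𝑧 + 𝑏) 𝑢ᵀ𝑤 costs O(d) instead of the O(d³) a general determinant would need (the invertibility constraint on 𝑢ᵀ𝑤 is omitted here for brevity).

```python
import torch

def planar_flow(z, u, w, b):
    pre = z @ w + b                                   # w^T z + b, shape (batch,)
    f = z + torch.tanh(pre).unsqueeze(-1) * u         # f(z), shape (batch, d)
    logdet = torch.log(torch.abs(1 + (1 - torch.tanh(pre) ** 2) * (u @ w)))
    return f, logdet                                  # per-example log|det df/dz|

z = torch.randn(16, 4)
u, w, b = torch.randn(4), torch.randn(4), torch.tensor(0.5)
f, logdet = planar_flow(z, u, w, b)
print(f.shape, logdet.shape)                          # (16, 4) and (16,)
```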
22. Implicit layer – Differential equation
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE for the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: density estimation (normalizing flows)
• Normalizing flows model an explicit density via the change-of-variables formula
• A neural ODE can compute the Jacobian term efficiently
• Only a trace is computed, instead of a determinant (see the sketch below)
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR 2019.
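A hedged sketch of the trace trick used by FFJORD-style models [2]: the instantaneous change of variables needs Tr(∂𝑓/∂𝑧), which Hutchinson's estimator obtains from vector-Jacobian products, Tr(𝐽) ≈ E[𝜀ᵀ𝐽𝜀], without ever building the full Jacobian.

```python
import torch

def hutchinson_trace(f, z, n_samples=10):
    z = z.requires_grad_(True)
    out = f(z)
    est = 0.0
    for _ in range(n_samples):
        eps = torch.randn_like(z)                     # Rademacher noise also works
        (vjp,) = torch.autograd.grad(out, z, grad_outputs=eps, retain_graph=True)
        est = est + (eps * vjp).sum() / n_samples     # eps^T (df/dz) eps
    return est

f = torch.nn.Linear(5, 5)                             # toy f; here Jacobian = f.weight
z = torch.randn(5)
print(hutchinson_trace(f, z), torch.trace(f.weight))  # estimate vs. exact trace
```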
23. Implicit layer – Differential equation
• How does it work?
• Forward: solve the ODE d𝑧(𝑡)/d𝑡 = 𝑓_𝜃(𝑧(𝑡), 𝑡) over [𝑡₀, 𝑡₁], i.e., 𝑧(𝑡₁) = 𝑧(𝑡₀) + ∫_{𝑡₀}^{𝑡₁} 𝑓_𝜃(𝑧(𝑡), 𝑡) d𝑡
• Backward: solve another ODE for the adjoint 𝑎(𝑡) = ∂𝐿/∂𝑧(𝑡) (adjoint method)
• Application: density estimation (normalizing flows)
• Furthermore, neural SDEs are the current state of the art for image generation
• Caveat: they differ from the prior continuous flows and are more closely related to diffusion models
[1] Chen et al. Neural Ordinary Differential Equations. NeurIPS 2018.
[2] Song et al. Score-Based Generative Modeling through Stochastic Differential Equations. ICLR 2021.
24. Implicit layer – Fixed-point iteration
• How does it work?
• Forward: apply the layer 𝑧ₖ₊₁ = 𝑓_𝜃(𝑧ₖ; 𝑧_input) until convergence (output = 𝑧∗)
• Backward: use the fixed-point property 𝑓_𝜃(𝑧∗) = 𝑧∗
• Technical detail: the gradient involves an inverse Jacobian (𝐼 − ∂𝑓_𝜃/∂𝑧∗)⁻¹,
which can be approximated by solving a linear system (see the sketch below)
[1] Bai et al. Deep Equilibrium Models. NeurIPS 2019.
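A hedged sketch of this backward pass, in the spirit of [1]: the needed row vector 𝑢ᵀ = (∂𝐿/∂𝑧∗)(𝐼 − 𝐽)⁻¹ solves the linear system 𝑢 = ∂𝐿/∂𝑧∗ + 𝐽ᵀ𝑢, which can itself be run as a fixed-point iteration using only vector-Jacobian products of 𝑓_𝜃 at 𝑧∗.

```python
import torch

def deq_backward_vec(f, z_star, grad_out, n_iter=50):
    z_star = z_star.detach().requires_grad_(True)
    f_val = f(z_star)                                  # rebuild the graph at the fixed point
    u = grad_out.clone()
    for _ in range(n_iter):                            # u <- dL/dz* + J^T u
        (Jt_u,) = torch.autograd.grad(f_val, z_star, grad_outputs=u, retain_graph=True)
        u = grad_out + Jt_u
    return u                                           # = dL/dz* (I - J)^{-1}

lin = torch.nn.Linear(4, 4)
with torch.no_grad():
    lin.weight *= 0.3 / lin.weight.norm()              # make f a contraction (toy)
f = lambda z: torch.tanh(lin(z))
with torch.no_grad():
    z = torch.zeros(4)
    for _ in range(100):                               # forward: iterate to the fixed point
        z = f(z)
print(deq_backward_vec(f, z, grad_out=torch.ones(4)))
```

The parameter gradient then follows as 𝑢ᵀ ∂𝑓_𝜃/∂𝜃, one more vector-Jacobian product.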
25. Implicit layer – Fixed-point iteration
• How does it work?
• Forward: apply the layer 𝑧ₖ₊₁ = 𝑓_𝜃(𝑧ₖ; 𝑧_input) until convergence (output = 𝑧∗)
• Backward: use the fixed-point property 𝑓_𝜃(𝑧∗) = 𝑧∗
• Application: an infinite-depth network with a single layer
• Does not keep the intermediate activations → memory efficient
[1] Bai et al. Deep Equilibrium Models. NeurIPS 2019.
26. Implicit layer – Planning & control
• How does it work?
• Forward: choose actions via differentiable planning (e.g., value iteration, MCTS, MPC)
• Backward: propagate gradients through the planning rollout
• Application: implicit planning on an MDP (better action prediction)
• Evaluate actions by running simulations (instead of directly using a Q-function); see the sketch below
• Needs a transition model (𝑠, 𝑎) → 𝑠′, i.e., model-based RL
[1] Tamar et al. Value Iteration Networks. NeurIPS 2016.
[2] Amos et al. Differentiable MPC for End-to-end Planning and Control. NeurIPS 2018.
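A minimal sketch of planning as a layer, in the spirit of Value Iteration Networks [1] (the toy MDP and imitation loss are assumptions): value iteration 𝑉 ← max_𝑎 (𝑅_𝑎 + 𝛾 𝑃_𝑎 𝑉) is built entirely from differentiable operations, so gradients flow from the chosen action back into the learned reward/transition model.

```python
import torch

def value_iteration(R, P, gamma=0.9, n_iter=50):
    # R: (A, S) rewards; P: (A, S, S) transition probabilities per action
    V = torch.zeros(R.shape[1])
    for _ in range(n_iter):
        Q = R + gamma * torch.einsum('ast,t->as', P, V)   # Q(a, s) Bellman backup
        V = Q.max(dim=0).values                           # greedy max (subdifferentiable)
    return Q

A, S = 2, 6
R = torch.randn(A, S, requires_grad=True)                 # learned reward (toy)
P = torch.softmax(torch.randn(A, S, S), dim=-1)           # row-stochastic transitions
Q = value_iteration(R, P)
loss = torch.nn.functional.cross_entropy(Q.T, torch.randint(A, (S,)))  # imitate actions
loss.backward()                                           # gradient reaches the MDP model
print(R.grad.shape)
```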
27. Take-home message
• Deep implicit layers are an interesting combination of algorithms and deep learning
• Lots of attention from the ML community
• Value Iteration Networks: NeurIPS 2016 best paper
• Neural ODE: NeurIPS 2018 best paper
• Score-based SDE: ICLR 2021 best paper
• SATNet: ICML 2019 honorable mention
• …and many orals & spotlights
• Many opportunities to utilize these ideas
• MetaOptNet: applies OptNet to few-shot learning
• Logic-guided generation (LoGe): applies SATNet to abstract reasoning
Thank you for listening! 😀