COMPASS is a framework that trains a latent space of diverse reinforcement learning policies to solve combinatorial optimization problems. It operates in two phases: (1) a training phase, which samples latent vectors and trains the corresponding policies, and (2) an inference phase, which searches the latent space within a fixed budget to find high-performing policies for a given instance. COMPASS achieves state-of-the-art results on 29 tasks, generalizes better than baselines on out-of-distribution instances, and its search strategy reliably reaches high-performing regions of the latent space.
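The inference phase can be illustrated with a minimal sketch: treat each latent vector as selecting a policy, evaluate it on the instance at hand, and iteratively move the search distribution toward the best candidates until the evaluation budget is spent. This is a simplified evolutionary search, not COMPASS's actual implementation; `evaluate_policy` is a hypothetical stand-in for rolling out the latent-conditioned policy on a problem instance.

```python
import random

def evaluate_policy(latent):
    # Hypothetical stand-in for decoding `latent` into a policy and
    # rolling it out on an instance; here, the score is just negative
    # squared distance to a hidden "good" region of the latent space.
    target = [0.7, -0.3, 0.5]
    return -sum((z - t) ** 2 for z, t in zip(latent, target))

def latent_search(budget=200, dim=3, pop=10, sigma=0.3, seed=0):
    """Simple evolutionary search over the latent space within a budget.

    Each generation samples `pop` latent vectors around the current mean,
    scores them, and re-centers the mean on the top half (the elites).
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    best_latent, best_score = None, float("-inf")
    evals = 0
    while evals < budget:
        candidates = [[m + rng.gauss(0, sigma) for m in mean]
                      for _ in range(pop)]
        scored = sorted(((evaluate_policy(c), c) for c in candidates),
                        reverse=True)
        evals += pop
        if scored[0][0] > best_score:
            best_score, best_latent = scored[0]
        # Move the search distribution toward the high-performing region.
        elites = [c for _, c in scored[: pop // 2]]
        mean = [sum(vals) / len(elites) for vals in zip(*elites)]
    return best_latent, best_score

best, score = latent_search()
```

In practice the search would re-run per instance, since different instances may favor different regions of the latent space; more sophisticated searchers (e.g. CMA-ES, which also adapts the sampling covariance) can replace the fixed-`sigma` update here.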