Paper Study: A Learning-based Iterative Method for Solving Vehicle Routing Problem
1. A Learning-based Iterative
Method for Solving Vehicle
Routing Problem
Hao Lu, Xingwen Zhang and Shuang Yang
Princeton University and Ant Financial Services Group
ICLR 2020
2. Abstract
• Present “Learn to Improve (L2I)” to solve the capacitated vehicle routing
problem (CVRP).
• Start with an initial solution and refine it iteratively.
• Outperform the classical operations research (OR) approach (e.g.,
LKH3).
3. Introduction
• In recent years, following the Pointer Network, researchers have started to
develop new deep learning and reinforcement learning frameworks to
solve combinatorial optimization problems.
• For the vehicle routing problem, prior learning-based results could not beat
the OR algorithm LKH3.
• Propose a learning-based algorithm for solving CVRP that outperforms
classical solvers.
4. Introduction (cont’d)
• Propose a hierarchical framework.
• Separate heuristic operators into two classes, improvement operators and
perturbation operators.
• Choose the class first and then choose operators within the class.
• Propose an ensemble method training several RL policies at the same
time.
6. Capacitated Vehicle Routing Problem (CVRP)
• There is a depot and a set of 𝑁 customers in the CVRP. Each customer
𝑖 has a demand 𝑑𝑖 to be satisfied.
• A vehicle, which starts and ends at the depot, can serve a set of
customers as long as the total customer demand does not exceed the
vehicle capacity 𝐶.
• Find a minimal-cost set of routes that fulfills the demands of all
customers without violating the vehicle capacity constraint (a small
evaluation sketch follows below).
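As a concrete reading of the objective above, here is a minimal Python sketch of how a CVRP solution (a list of routes over customer indices) could be evaluated; the helper names route_cost, is_feasible, and solution_cost are illustrative and not from the paper.

```python
import math

def route_cost(route, coords, depot=0):
    """Euclidean length of one route: depot -> customers -> depot."""
    path = [depot] + route + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

def is_feasible(routes, demands, capacity):
    """Each route must respect the vehicle capacity C."""
    return all(sum(demands[i] for i in route) <= capacity for route in routes)

def solution_cost(routes, coords):
    """Total cost of a CVRP solution is the sum of its route lengths."""
    return sum(route_cost(route, coords) for route in routes)
```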
7. Local search and 2-opt
• Start with a feasible solution and look for an improved solution.
• Two TSP tours are called 2-adjacent if one can be obtained from the
other by deleting two edges and adding two edges.
• A TSP tour T is called 2-optimal if there is no 2-adjacent tour to T with
lower cost than T.
• 2-opt heuristic: repeatedly replace the current tour with a 2-adjacent tour
of lower cost until the tour is 2-optimal (see the sketch below).
Source: MIT 15.053/8 The Traveling Salesman Problem and Heuristics
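A minimal Python sketch of the 2-opt heuristic described above (illustrative only, not taken from the paper or the MIT notes): reversing a tour segment deletes two edges and adds two, and any such move that lowers the cost is kept until none remains.

```python
import math

def tour_length(tour, coords):
    """Length of a closed tour visiting the nodes in order."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, coords):
    """Replace the tour with cheaper 2-adjacent tours until it is 2-optimal."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                # Reversing tour[i:j+1] removes two edges and adds two new ones.
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(candidate, coords) < tour_length(tour, coords):
                    tour, improved = candidate, True
    return tour
```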
12. • Improvement operators try to improve the current solution.
• The maximal consecutive sequence of improvement operators applied
before a perturbation is called an improvement iteration.
• Perturbation operators destroy and reconstruct part of the solution to
generate a new starting solution.
• If no cost reduction has been made for 𝐿 improvement steps, perturb
the solution.
• After 𝑇 steps, the algorithm stops and the minimum-cost solution is
chosen (the loop is sketched below).
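The improvement/perturbation loop above might be sketched as follows; improve_ops, perturb_ops, and the two choose_* policies are placeholders standing in for the paper's operator pools and learned controller.

```python
def learn_to_improve(initial_solution, cost, improve_ops, perturb_ops,
                     choose_improve, choose_perturb, T=40000, L=6):
    """Improve until L consecutive steps bring no cost reduction, then perturb;
    after T steps return the minimum-cost solution seen."""
    current = initial_solution
    best, best_cost = current, cost(current)
    no_improve = 0
    for _ in range(T):
        if no_improve < L:
            op = choose_improve(current, improve_ops)  # policy picks an improvement operator
        else:
            op = choose_perturb(current, perturb_ops)  # destroy-and-reconstruct operator
            no_improve = 0
        candidate = op(current)
        no_improve = 0 if cost(candidate) < cost(current) else no_improve + 1
        current = candidate
        if cost(current) < best_cost:
            best, best_cost = current, cost(current)
    return best
```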
16. States for each node
• Combine problem-specific and solution-specific information with the effect of
recent actions (+1 if an action led to a cost reduction, -1 otherwise).
17. Reward and Policy network
• Reward
• Intermediate impact
• +1 if the operator improves the current solution, -1 otherwise
• Advantage-based
• Take the distance achieved during the first improvement iteration as the baseline.
• For each subsequent iteration, the reward is the difference between the current
distance and the baseline.
• Policy network
• REINFORCE algorithm
\nabla_\theta J(\theta \mid s) = \mathbb{E}_{\pi \sim p_\theta(\cdot \mid s)}\left[ (L(\pi \mid s) - b(s)) \, \nabla_\theta \log p_\theta(\pi \mid s) \right]
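A minimal PyTorch sketch of a REINFORCE update matching the gradient above, where L(π|s) is treated as a cost (e.g. total distance) and b(s) is the baseline; the policy network and optimizer are assumed to exist elsewhere and the names are illustrative.

```python
import torch

def reinforce_loss(log_probs, returns, baseline):
    """Surrogate loss whose gradient is E[(L(pi|s) - b(s)) * grad log p_theta(pi|s)].

    log_probs : log p_theta(pi | s) of each sampled rollout (requires grad)
    returns   : L(pi | s), e.g. the tour length obtained by each rollout
    baseline  : b(s), e.g. the distance of the first improvement iteration
    """
    advantage = (returns - baseline).detach()  # no gradient through the baseline
    # Gradient descent on this loss follows the gradient above and lowers expected cost.
    return (advantage * log_probs).mean()

# usage sketch:
# loss = reinforce_loss(log_probs, returns, baseline)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```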
18. • Attention network:
• Transformer
• 8 heads
• 64 output units
• Ensemble method: train 6 different policies.
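A minimal PyTorch sketch of an attention block with the quoted sizes (8 heads, 64-dimensional output); the full L2I architecture is not reproduced here, so this module is purely illustrative.

```python
import torch
import torch.nn as nn

class NodeAttention(nn.Module):
    """Self-attention over node embeddings: 8 heads, 64-dimensional output."""
    def __init__(self, dim=64, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):            # x: (batch, num_nodes, 64)
        out, _ = self.attn(x, x, x)  # queries, keys, values are all node embeddings
        return self.norm(x + out)    # residual connection + layer norm

# usage: NodeAttention()(torch.randn(2, 21, 64))  # e.g. depot + 20 customers
```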
19. Experiments and Analyses
• Three sub-problems with number of customers 𝑁 = 20, 50, 100.
• Location of each customer and the depot sampled uniformly in [0, 1]².
• Demand of each customer in {1, 2, …, 9}.
• The capacity of a vehicle is 20, 30, 40 for 𝑁 = 20, 50, 100, respectively.
• ADAM optimizer
• 𝑇 = 40000, perturb solution after 𝐿 = 6 consecutive non-
improvements
• 2000 random samples
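The setup above corresponds to instance generation along these lines (a sketch; the paper's exact sampling code is not given in the slides):

```python
import random

def generate_cvrp_instance(n):
    """Random instance: uniform locations in the unit square, demands in
    {1,...,9}, and capacity 20/30/40 for N = 20/50/100 (node 0 is the depot)."""
    capacity = {20: 20, 50: 30, 100: 40}[n]
    coords = [(random.random(), random.random()) for _ in range(n + 1)]
    demands = [0] + [random.randint(1, 9) for _ in range(n)]
    return coords, demands, capacity
```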
23. Apply on TSP
Use the first node as the depot and set the demand of each customer to zero
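In the CVRP representation used earlier, this reduction could look like the following hypothetical helper:

```python
def tsp_to_cvrp(coords):
    """Treat the first node as the depot and give every customer zero demand,
    so the capacity constraint never binds and one route covers all nodes."""
    demands = [0] * len(coords)
    capacity = 1  # any positive value works since all demands are zero
    return coords, demands, capacity
```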
24. Conclusion
• Propose “Learn to Improve (L2I)” for solving CVRP, together with an ensemble
method that trains several RL policies and chooses the best solution produced
by the policies.
• Combine the strengths of OR with the learning capabilities of RL.
• Achieve new state-of-the-art results on CVRP instances.