10. ASSUMPTIONS
◉ There exists a path between any two parts of the city.
◉ Data for each point is present in the system.
◉ The maximum number of buses available is bounded by N.
◉ There are N imaginary passengers at the DEPOT.
11. Notation
◉ x^r_{b,i,j} is a Boolean variable: it is 1 if and only if bus b travels from point i to point j in the r-th step, and 0 otherwise.
◉ d_{i,j} denotes the (i, j) entry of the distance matrix.
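As a small illustration (the names `routing_cost`, `x`, and `dist` are ours, not part of the formulation), the objective simply sums the distance-matrix entries of every (bus, step) move whose indicator variable is 1:

```python
# Sketch: cost of a routing under the Boolean decision variables.
# x[b] lists the (i, j) moves of bus b in step order, i.e. the tuples
# for which the indicator variable equals 1.
# dist[i][j] is the (i, j) entry of the distance matrix.

def routing_cost(x, dist):
    """Total distance travelled by all buses over all steps."""
    total = 0
    for steps in x.values():          # one list of (i, j) moves per bus
        for i, j in steps:
            total += dist[i][j]
    return total

dist = [
    [0, 4, 7],
    [4, 0, 2],
    [7, 2, 0],
]
# Bus 0: depot(0) -> 1 -> 2 -> depot(0)
x = {0: [(0, 1), (1, 2), (2, 0)]}
print(routing_cost(x, dist))  # 4 + 2 + 7 = 13
```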
32. The Fittest Survives!
The fitness function determines how fit an individual is (the ability
of an individual to compete with other individuals). It gives a fitness
score to each individual. The probability that an individual will be
selected for reproduction is based on its fitness score.
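Fitness-proportional ("roulette wheel") selection is one common way to realize this; a sketch with toy fitness values (`roulette_select` is an illustrative name):

```python
import random

def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, score in zip(population, fitness):
        running += score
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point rounding

random.seed(0)
population = ["A", "B", "C"]
fitness = [1.0, 3.0, 6.0]   # "C" should be chosen ~60% of the time
counts = {p: 0 for p in population}
for _ in range(10000):
    counts[roulette_select(population, fitness)] += 1
print(counts)
```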
36. Overview of Reinforcement Learning
◉ Based on the Markov Decision Process (MDP) mathematical framework.
◉ Consists of an agent that takes actions in an environment according to a policy and receives a reward or punishment for each action.
◉ The policy function is optimized by repeated training using policy gradient methods.
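The agent-environment loop described above can be sketched on a toy 1-D world (the environment, rewards, and the fixed greedy policy are all illustrative stand-ins; a real RL agent would also update its policy from the rewards it receives):

```python
# Minimal agent-environment loop on a toy 1-D world.
# The agent earns +1 for moving closer to position 0 and -1 otherwise.
# The "policy" here is a fixed greedy rule, just to show the loop;
# learning would adjust the policy from the accumulated rewards.

def policy(state):
    return -1 if state > 0 else 1

def step(state, action):
    next_state = state + action
    reward = 1 if abs(next_state) < abs(state) else -1
    return next_state, reward

state, total_reward = 5, 0
for _ in range(20):
    action = policy(state)
    state, reward = step(state, action)
    total_reward += reward
print(state, total_reward)  # 1 4
```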
37. Heuristic Solution – Reinforcement Learning
◉ NEURAL COMBINATORIAL OPTIMIZATION WITH REINFORCEMENT
LEARNING –
Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio
38. Overview of Neural Combinatorial Optimization with
Reinforcement Learning
◉ This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning.
◉ Solves the TSP: given a set of city coordinates, the model predicts a distribution over city permutations.
◉ Uses the negative tour length as the reward signal and optimizes the network with a policy gradient method.
39. Overview of Neural Combinatorial Optimization with
Reinforcement Learning
◉ Supervised learning is not applicable to combinatorial optimization problems because one does not have access to optimal labels.
◉ RL is based on exploration and learning in an environment.
40. Neural Network Architecture for TSP
◉ Given an input graph represented as a sequence of n cities in two-dimensional space, s = {x_i}, i = 1, …, n, the total tour length, which is also the quantity the neural network is trained to minimize, is
L(π | s) = ‖x_{π(n)} − x_{π(1)}‖ + Σ_{i=1}^{n−1} ‖x_{π(i)} − x_{π(i+1)}‖
where π is a permutation of the cities.
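A straightforward sketch of this tour-length computation (Euclidean distances, closed tour; the function name is ours):

```python
import math

def tour_length(cities, perm):
    """Length of the closed tour visiting `cities` in the order `perm`."""
    n = len(perm)
    total = 0.0
    for k in range(n):
        x1, y1 = cities[perm[k]]
        x2, y2 = cities[perm[(k + 1) % n]]  # wrap around to close the tour
        total += math.hypot(x2 - x1, y2 - y1)
    return total

cities = [(0, 0), (0, 1), (1, 1), (1, 0)]  # unit square
print(tour_length(cities, [0, 1, 2, 3]))   # perimeter: 4.0
print(tour_length(cities, [0, 2, 1, 3]))   # crossing tour: 2 + 2*sqrt(2)
```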
41. Network Architecture – Pointer Network
◉ Using a pointer network allows the model to effectively point to a specific position in the input sequence.
◉ A pointer network consists of an encoder that maps the input sequence to latent memory states and a decoder that, at each output step, points to one position of the input.
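One decoding step of the pointing mechanism can be sketched with plain dot-product attention (the tiny hand-made vectors stand in for learned encoder and decoder states):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def point(query, encoded_inputs):
    """Distribution over input positions via dot-product attention."""
    scores = [sum(q * e for q, e in zip(query, enc)) for enc in encoded_inputs]
    return softmax(scores)

encoded = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # encoder states for 3 cities
query = [0.9, 0.1]                               # current decoder state
probs = point(query, encoded)
print(probs.index(max(probs)))  # points at position 0
```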
43. Network Optimization using Policy Gradient
◉ The network is trained with a loss that measures the difference between its predicted path and the path found by the policy function.
◉ In effect, the tours produced by the RL policy serve as the training set, and the model learns to solve the TSP for a given set of inputs.
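The policy gradient idea can be sketched on a toy two-armed bandit: the log-probability of an action is nudged in proportion to the reward it earns, the same REINFORCE-style update used with (negative) tour length as the reward (all names and numbers here are illustrative):

```python
import math, random

def softmax(prefs):
    """Softmax policy over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    s = sum(exps)
    return [e / s for e in exps]

rng = random.Random(0)
prefs = [0.0, 0.0]           # action preferences (the "policy parameters")
rewards = [0.2, 1.0]         # arm 1 pays better
lr = 0.1

for _ in range(500):
    probs = softmax(prefs)
    a = 0 if rng.random() < probs[0] else 1
    r = rewards[a]
    # gradient of log pi(a) w.r.t. prefs[k] is (1[k == a] - probs[k])
    for k in range(2):
        prefs[k] += lr * r * ((1 if k == a else 0) - probs[k])

# after training, the policy strongly prefers the better arm
print(round(softmax(prefs)[1], 3))
```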
44. Reinforcement Learning + Neural Network for VRP Problem
◉ We modify the Bello et al. approach to solve the VRP.
◉ We reuse the pointer network model proposed by Bello et al. in our VRP model.
45. Our solution for solving Vehicle Routing Problem
◉ We represent each state by a sequence of tuples x_i(t) = (s_i, d_i(t)), where s_i are the coordinates of point i and d_i(t) is its demand at time t. The set of all inputs is denoted by X(t).
◉ We start from an arbitrary input X(0), and at every step our pointer model predicts the next customer to be picked.
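A minimal sketch of this state representation (the names `make_state` and `visit` are ours; serving a customer simply zeroes its demand):

```python
# Each point i carries static coordinates s_i and a time-varying
# demand d_i(t). Visiting a customer sets its demand to 0; the
# coordinates never change.

def make_state(coords, demands):
    """X(t): one (s_i, d_i) tuple per point."""
    return [(s, d) for s, d in zip(coords, demands)]

def visit(state, i):
    """Serve customer i: demand at i drops to 0, coordinates unchanged."""
    s, _ = state[i]
    return state[:i] + [(s, 0)] + state[i + 1:]

coords = [(0, 0), (2, 1), (5, 4)]
demands = [0, 3, 7]            # point 0 is the depot
state = make_state(coords, demands)
state = visit(state, 1)        # pointer model picked customer 1
print([d for _, d in state])   # [0, 0, 7]
```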
46. Our solution for solving Vehicle Routing Problem
◉ We are interested in finding a stochastic policy π that generates a sequence Y in a way that minimizes the loss while satisfying the problem constraints.
◉ After training the bus in an environment, we obtain the policy function π and then use it to train our network model to predict the optimal path.
◉ Using reinforcement learning for the VRP involves multiple agents acting on an environment, jointly minimizing the total distance covered.
48. Algorithm Comparison
                         Time on     Time on     Time on           Training   Accuracy
                         vrp10       vrp50       vrp100            time
Integer Programming      5 minutes   54 hours    Approx. 300 days  Nil        100%
Genetic Programming      1 minute    1.5 minutes 2 minutes         Nil        20%
Reinforcement Learning   0.1 min     0.2 min     0.5 min           6 hours    61%
49. Conclusion
◉ We propose three solutions for solving the required problem.
◉ The algorithm can be chosen depending on the given dataset, test time, training time, hardware, etc.
◉ The Integer Programming problem is NP-hard, so exact solvers take exponential time in the worst case.
◉ Hence, for real-life applications it is important to develop a heuristic solution.
◉ Genetic algorithms search for the best possible solution through the principle of evolution.
◉ Reinforcement learning produces a policy function that is used to train a neural network to predict the optimal route.