Optimal Vacation Itinerary Modeling
MATH 381: OPTIMAL VACATION ROUTING
GERARD TRIMBERGER, YIZHOU YAO, SIMENG ZHU, AND JOHN CHAFFEE
Abstract. This project aims to find optimal vacation routes for a traveler. A complete graph is constructed with vertices representing cities and edges weighted by the distance between each pair of cities. Two problems are solved: the classic traveling salesperson problem, and a maximal traveling salesperson problem in which the route is adjusted to maximize total cost. In addition, related problems that include ratings for individual cities are posed and solved. Finally, the results are discussed in terms of cost and travel distance.
Contents
1. Introduction
2. Background
3. Model
3.1. Traveling Salesperson Problem
3.2. Best Trip
3.3. Implementation Challenges
4. Solution
4.1. Sage Solution
4.2. Matlab MTZ Solution
4.3. Visualization
5. Discussion
6. Model Variations
6.1. TSP Maximum Mileage
6.2. TSP Happiness Factors/Mileage Requirements
7. Conclusions
References
A. Problem Data
B. Visualization
1. Introduction
We started with the concept of world travel, with the goal of visiting 20 predetermined cities. More specifically, we were interested in finding optimal routes that take the traveler through each city exactly once. To obtain concrete routing results, we first constructed a complete graph whose vertices are the cities and whose edges represent the distance[5] between each pair of cities. We later extended the data to include factors for each city such as GDP[2], cost of living[4], number of museums[6], and number of malls[1]. These problems are closely related to the well-known traveling salesperson problem (TSP), which can be solved with integer linear programming.
In addition to using linear programming to find optimal solutions, we also made
use of multidimensional scaling in order to visualize our results.
2. Background
In our first meeting, we brainstormed a list of general topics that interested us so that each member felt connected to the project. John suggested that we focus on something related to graphs or graph theory, specifically the TSP. Gerard was interested in a problem similar to the Netflix algorithm, in which a library of user ratings generates a network of suggested movies. Simeng mentioned that he was interested in currency conversion, which led to the idea of an international problem.
We were eventually drawn to TSP-related problems because of the rich and far-reaching theory behind them. Problems of this type were first posed in the 1800s by W. R. Hamilton, although serious study of the TSP did not take off until the 1930s. Since then, traveling salesperson problems have become ubiquitous and have driven advances both in graph theory and, after 1970, in algorithms for solving linear programs. A TSP-type problem also allowed each of us to explore our individual interests.
We chose 20 cities that we felt are globally recognized travel destinations and have a fairly diverse geographical distribution: Seattle, Miami, New York, Stockholm, Reykjavik, Moscow, Saint Petersburg, New Delhi, Shanghai, Chengdu, Tokyo, Rio de Janeiro, Cairo, Johannesburg, London, Sydney, Paris, Oslo, Honolulu, and Bangkok. For the costs of the edges between the cities, we originally wanted to consider airfare and hotel prices in each city. However, real-time airfare and hotel prices vary too much to be reliable, so we settled on geographical distance as a more consistent, yet relevant, measure of the cost between two cities. One can argue that geographical distance is correlated with airfare, since longer flights require more fuel than shorter ones.
Next, we generated a list of possible happiness factors that we could define for each city. The list included the number of restaurants in the city, carbon footprint, whether or not the city has an amusement park, number of museums or libraries, number of art galleries, nightlife (number of clubs and bars), number of sports arenas or teams, average cost of living, GDP, number of shopping malls, and so on. We then searched the internet for quantitative data on these factors for each city. In the end, we were able to find a cost of living index[4], restaurant index[4], purchasing power index[4], GDP[2], and number of shopping malls for each city.
3. Model
Consider a graph $G$ made up of a set of vertices $V$ and a set of edges $E$. The entry $v_i$ is the label of vertex $i$, while an element $(i, j)$ of $E$ indicates that vertex $i$ is adjacent (connected) to vertex $j$. More specifically, let $G$ be a weighted graph such that $c_{i,j} = c_{j,i}$ is the weight of the edge connecting vertex $i$ to vertex $j$ if $(i, j) \in E$, and zero otherwise. Then we can construct the adjacency matrix

$$A = (c_{i,j}). \qquad (1)$$
For our problem, which centers on travel between $n = 20$ cities, $c_{i,j}$ is the “cost” of travel between city $i$ and city $j$; we chose this cost to be the physical distance between them. Refer to Table 6 in Appendix A for the correspondence between city names and vertices.
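The distances in this project come from the Geobytes distance tool[5]. As a hedged illustration of how a symmetric cost matrix $A = (c_{i,j})$ with zero diagonal could instead be built from coordinates, the sketch below assumes great-circle (haversine) distances on a spherical Earth of radius 3958.8 miles; the function names and the radius are our assumptions, not part of the original workflow.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def cost_matrix(coords):
    """Symmetric matrix A = (c_ij) with zero diagonal from a list of (lat, lon) pairs."""
    n = len(coords)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = great_circle_miles(*coords[i], *coords[j])
            A[i][j] = A[j][i] = d  # c_ij = c_ji
    return A
```

With real city coordinates the entries would only approximate the Geobytes distances, since that tool may use a different Earth model.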
Figure 1. Complete Graph with 20 Nodes
The graph $G$ and its corresponding adjacency matrix $A$ can be used to determine several different types of optimal routing schemes.
3.1. Traveling Salesperson Problem. Suppose we wish to find the route of least cost that passes through each city exactly once and ends at the starting city. This problem is known as the Traveling Salesperson Problem, or TSP. The TSP can be solved by integer linear programming using the Miller-Tucker-Zemlin (MTZ) formulation[3], described below.
3.1.1. MTZ Formulation. Suppose we have $n$ cities that we wish to visit on our itinerary. We number the cities $1, \ldots, n$ and define

$$x_{i,j} = \begin{cases} 1 & \text{if the route travels directly from city } i \text{ to city } j, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

Define $c_{i,j}$ to be the “cost,” or geographical distance, of traveling from city $i$ to city $j$. Introduce the dummy variables $u_i$, $i = 1, \ldots, n$, which represent the itinerary position of each location. For example, $u_1$ represents when Seattle is visited, which is always first ($u_1 = 1$), and $u_2$ represents when Miami is visited, which depends on the solution of the TSP. The integer linear program can then be written as
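$$\text{Minimize} \quad \sum_{i,j} c_{i,j}\, x_{i,j}$$

subject to

$$\sum_{i} x_{i,j} = 1 \quad \text{for } j = 1, 2, \ldots, n \qquad (1)$$
$$\sum_{j} x_{i,j} = 1 \quad \text{for } i = 1, 2, \ldots, n \qquad (2)$$
$$u_1 = 1, \quad 2 \le u_i \le n \quad \text{for } i = 2, \ldots, n \qquad (3)$$
$$u_i - u_j + 1 \le (n - 1)(1 - x_{i,j}) \quad \text{for } i, j \ge 2 \qquad (4)$$
$$0 \le x_{i,j} \le 1 \quad \text{for all } i, j \qquad (5)$$
$$x_{i,j} \in \mathbb{Z} \qquad (6)$$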
(1) ensures that there is exactly one trip into each city;
(2) complements (1) by ensuring that there is exactly one trip out of each city;
(3) fixes Seattle as the first stop and bounds the position variables $u_i$ used to make the path Hamiltonian;
(4) ensures that the solution forms a single loop rather than several disjoint subtours;
(5) limits $x_{i,j}$ to the interval $[0, 1]$; and
(6) together with (5) ensures that $x_{i,j}$ is binary.
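A short derivation may clarify why constraint (4) rules out subtours without excluding any genuine tour. This is the standard argument for MTZ-type constraints, written in the notation above as a sketch rather than quoted from [3].

```latex
\textbf{Subtours are excluded.} By (1) and (2), the arcs with $x_{i,j} = 1$ split into
disjoint cycles. Suppose one cycle $s_1 \to s_2 \to \cdots \to s_k \to s_1$ ($k \ge 1$)
avoids city 1. Each of its arcs has $x = 1$, so (4) gives
$u_{s_t} - u_{s_{t+1}} + 1 \le 0$. Summing around the cycle, with $s_{k+1} = s_1$,
the $u$ terms cancel:
\[
  \sum_{t=1}^{k} \bigl( u_{s_t} - u_{s_{t+1}} + 1 \bigr) = k \le 0,
\]
a contradiction. Hence the only cycle is the one through city 1, and it visits all $n$ cities.

\textbf{Tours are not excluded.} Let $u_i$ be the position of city $i$ in a tour.
If $x_{i,j} = 1$ with $j \ge 2$, then $u_j = u_i + 1$ and (4) reads $0 \le 0$.
If $x_{i,j} = 0$, then by (3)
\[
  u_i - u_j + 1 \le n - 2 + 1 = (n - 1)(1 - x_{i,j}).
\]
```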
3.2. Best Trip. As mentioned previously, our ultimate goal in building the MTZ formulation of the TSP was to use it as a springboard to more complex models. We next wanted to modify the TSP to find the best possible trip in terms of city sequence. Specifically, we agreed that a trip is better if the good cities are visited in sequence, when possible, with the added constraint that the total trip must cover a certain minimum distance. To do this, we gathered quantitative data on several “factors of happiness” for each city on our map and used that data to formulate a rating for each city.
3.2.1. City Rating Scheme. Consider the $n \times m$ matrix $F$ that contains the raw rating data for the $n$ cities:

$$F = \bigl[\, f_1 \mid f_2 \mid \cdots \mid f_m \,\bigr]. \qquad (3)$$
Here, column $f_k$ corresponds to the $k$-th city rating factor, and each column has its own units and distribution. To standardize the factor matrix $F$, we subtract from each column its minimum value and then divide by the difference between the maximum and minimum values in that column. That is, if $F^*$ is the standardized factor matrix, then for $k = 1, \ldots, m$

$$f^*_k = \frac{f_k - \min(f_k)}{\max(f_k) - \min(f_k)}. \qquad (4)$$
Next, consider a vector $w = [w_1, w_2, \ldots, w_m]^T$ that contains the relative weights of the rating factors. The weighted city rating vector can then be written as

$$r = F^* w = w_1 f^*_1 + w_2 f^*_2 + \cdots + w_m f^*_m. \qquad (5)$$

The total rating of city $i$ is then $r_i$.
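Equations (4) and (5) translate directly into code. The pure-Python sketch below standardizes each factor column and forms the weighted ratings; the factor values and weights in the usage line are made-up numbers, and mapping a constant column to zero is our own assumption, since (4) is undefined when a column's maximum equals its minimum.

```python
def min_max_columns(F):
    """Equation (4): rescale each column of F to [0, 1].

    F is a list of rows, one row per city and one column per rating factor.
    """
    n, m = len(F), len(F[0])
    F_star = [[0.0] * m for _ in range(n)]
    for k in range(m):
        column = [F[i][k] for i in range(n)]
        lo, hi = min(column), max(column)
        for i in range(n):
            # Assumption: a constant column carries no information, so map it to 0.
            F_star[i][k] = (F[i][k] - lo) / (hi - lo) if hi > lo else 0.0
    return F_star


def city_ratings(F, w):
    """Equation (5): r = F* w, a weighted sum of the standardized factors."""
    return [sum(wk * fk for wk, fk in zip(w, row)) for row in min_max_columns(F)]


# Hypothetical data: 3 cities, 2 factors, equal weights.
ratings = city_ratings([[10, 200], [20, 100], [30, 300]], [0.5, 0.5])
```

Here the third city is best on both factors and receives the top rating of 1.0, while the other two tie at 0.25.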
3.2.2. New LP Constraints. With these happiness ratings in hand, a new linear program can be written to determine the best route subject to a minimum trip length. Consider the scaled cost

$$c^*_{i,j} = k\,\frac{c_{i,j} - \min(A)}{\max(A) - \min(A)},$$

where $k$ is a positive scaling constant, so that $0 \le c^*_{i,j} \le k$. We also add a constraint that the total cost of the trip be at least a certain level $M$. Keeping all of the original constraints, the new LP can be written as
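$$\text{Maximize} \quad \sum_{i,j} (k - c^*_{i,j})(r_i + r_j)\, x_{i,j}$$

subject to

$$\sum_{i,j} c_{i,j}\, x_{i,j} \ge M \qquad (1)$$
$$\sum_{i} x_{i,j} = 1 \quad \text{for } j = 1, 2, \ldots, n \qquad (2)$$
$$\sum_{j} x_{i,j} = 1 \quad \text{for } i = 1, 2, \ldots, n \qquad (3)$$
$$u_1 = 1, \quad 2 \le u_i \le n \quad \text{for } i = 2, \ldots, n \qquad (4)$$
$$u_i - u_j + 1 \le (n - 1)(1 - x_{i,j}) \quad \text{for } i, j \ge 2 \qquad (5)$$
$$0 \le x_{i,j} \le 1 \quad \text{for all } i, j \qquad (6)$$
$$x_{i,j} \in \mathbb{Z} \qquad (7)$$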
This formulation is very similar to the traveling salesperson problem. The main change is that the objective function now maximizes a weighted sum of the scaled ratings of the cities that our traveler passes through. The “cost” of traveling from city $i$ to city $j$ is now $(k - c^*_{i,j})(r_i + r_j)$, which is large when $r_i$ and $r_j$ are both large and $c^*_{i,j}$ is small. This causes the LP to look for routes that group the best cities together; that is, it looks for Hamiltonian cycles that keep the distances between the best cities short.
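The objective coefficients of this LP can be tabulated before they are handed to a solver. The sketch below computes $(k - c^*_{i,j})(r_i + r_j)$ for every ordered pair of distinct cities, taking $\min(A)$ and $\max(A)$ over all entries of $A$ (including its zero diagonal), as the definition of $c^*_{i,j}$ reads; the function name and the default $k = 1$ are assumptions.

```python
def happiness_weights(A, r, k=1.0):
    """Objective coefficients (k - c*_ij)(r_i + r_j) for all ordered pairs i != j.

    c*_ij = k (c_ij - min(A)) / (max(A) - min(A)), with min and max taken over
    every entry of the distance matrix A, diagonal included.
    """
    n = len(A)
    entries = [A[i][j] for i in range(n) for j in range(n)]
    lo, hi = min(entries), max(entries)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # self-loops never appear in a tour
                c_star = k * (A[i][j] - lo) / (hi - lo)
                W[i][j] = (k - c_star) * (r[i] + r[j])
    return W


# Hypothetical 3-city example: cities 0 and 1 are close together and highly rated.
W = happiness_weights([[0, 1, 2], [1, 0, 2], [2, 2, 0]], [1.0, 1.0, 0.0])
```

The short flight between the two top-rated cities gets the largest coefficient, while the longest flights get a coefficient of zero, which is the grouping effect described above.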
3.3. Implementation Challenges. One challenge we encountered while implementing the TSP in Sage was that its built-in TSP routine did not let us add constraints beyond those of the standard TSP. We therefore implemented our problems with the Matlab interface mxlpsolve, which generates an lpsolve model file that can then be solved.
We first attempted to solve the TSP for the 20 cities listed above, but the inefficiency of the MTZ formulation led to extremely long computation times. We therefore reduced the number of vacation destinations to the 13 locations with the most city data: Seattle, Miami, New York, Shanghai, Chengdu, Tokyo, Rio de Janeiro, Cairo, Johannesburg, London, Sydney, Paris, and Bangkok. This allowed us to perform the optimization in a much more reasonable amount of time while preserving most of the model’s complexity.
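One way to see why the 20-city runs were slow is to compare the size of the MTZ model with the number of possible tours. The sketch below counts variables and constraints as we read formulation 3.1.1 (with self-loops $x_{i,i}$ left out, an assumption) together with the $(n-1)!/2$ distinct tours of a symmetric instance. The model itself grows only quadratically, but the search space grows factorially, and MTZ constraints are known to give a weak LP relaxation, so a branch-and-bound solver may have to explore far more nodes as $n$ grows.

```python
import math


def mtz_model_size(n):
    """Counts for the MTZ model of Section 3.1.1 with n cities (no self-loops)."""
    return {
        "x variables": n * (n - 1),                    # x_ij for i != j
        "u variables": n,                              # u_1, ..., u_n
        "assignment rows": 2 * n,                      # constraints (1) and (2)
        "MTZ rows": (n - 1) * (n - 2),                 # constraint (4), 2 <= i != j <= n
        "distinct tours": math.factorial(n - 1) // 2,  # symmetric costs: direction ignored
    }


for n in (13, 20):
    print(n, mtz_model_size(n))
```

Going from 13 to 20 cities roughly doubles the number of variables but multiplies the number of distinct tours by about 2.5 × 10^8.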
4. Solution
A folder containing all coded inputs and outputs is available at https://drive.google.com/drive/folders/0B0qQ-1OAi1JkRXZzMHJFMENPU0E?usp=sharing.
4.1. Sage Solution. Using Sage, we obtained the optimal solution to the Traveling Salesperson Problem. The code for this problem was very straightforward and took only a few lines to implement: we used Sage’s built-in commands for creating graphs and for solving their TSPs.
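Sage exposes this as a graph method (traveling_salesman_problem). For readers without Sage, a different exact method, the Held-Karp dynamic program, is sketched below in plain Python; it is not necessarily what Sage runs internally, and its O(n^2 2^n) running time makes it practical only for instances of roughly the 13-city size used here. City 0 plays the role of Seattle, and the unit-square coordinates are hypothetical.

```python
import math
from itertools import combinations


def held_karp(dist):
    """Exact TSP by Held-Karp dynamic programming; city 0 is the start and end.

    Returns (tour_length, tour) with the tour written as a closed loop.
    """
    n = len(dist)
    # best[(mask, j)] = (cost, prev): cheapest path from 0 through the cities in mask, ending at j
    best = {(1 << j, j): (dist[0][j], 0) for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, i)][0] + dist[i][j], i) for i in subset if i != j
                )
    full = (1 << n) - 2  # every city except 0
    length, last = min((best[(full, j)][0] + dist[j][0], j) for j in range(1, n))
    # Follow the predecessor links back to city 0.
    tour, mask, j = [0], full, last
    while j != 0:
        tour.append(j)
        _, prev = best[(mask, j)]
        mask &= ~(1 << j)
        j = prev
    tour.append(0)
    return length, tour[::-1]


points = [(0, 0), (0, 1), (1, 1), (1, 0)]  # hypothetical unit square
dist = [[math.dist(p, q) for q in points] for p in points]
length, tour = held_karp(dist)
```

On the unit square the optimal tour is the perimeter, of length 4.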
4.2. Matlab MTZ Solution. For the Matlab implementation of the MTZ formulation, we began by importing the distance matrix from a locally stored Excel file. We then used the length() command to obtain the number of cities in our data set, $n$. We created a binary $n \times n$ matrix $X$, where the entry $x_{ij}$ indicates whether or not the route travels directly from city $i$ to city $j$. The distance data was used to create an $n \times n$ cost matrix $C$, and a vector of dummy variables $u$ stored the itinerary position of each city. The mxlpsolve package for Matlab provides a library of commands that ultimately write the LP to an .lp file, which can then be opened, run, and solved in lpsolve.¹
We began by setting the variable names $x_{ij}$ for $i, j = 1, \ldots, n$ and $u_i$ for $i = 1, \ldots, n$. We then defined the objective function as a row vector $f$ whose entries are the coefficients of the corresponding variables in the objective function. For constraints (5) and (6), we used the set_binary command, which ensures that $x_{ij} \in \{0, 1\}$ for $i, j = 1, \ldots, n$. For constraint (3), we used the add_constraint command to set $u_1 = 1$ and $2 \le u_i \le n$ for $i = 2, \ldots, n$, which ensures that each city falls into an available itinerary position and that Seattle, location 1, is the starting and ending point. Constraints (1), (2), and (4) were added in a similar manner with add_constraint, following the logic outlined above. Finally, we called the write_lp command to write an .lp file containing the whole model, which was then opened and executed in lpsolve to produce a solution.
The optimal TSP tour, listed in Table 1, has a total length of 50,109 miles.
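The .lp file written by mxlpsolve is plain text in lpsolve's LP format, so the same model can also be generated without Matlab. The hand-rolled Python sketch below writes constraints (1) to (6) in that format; we have not compared it line by line with write_lp output, the row names (in_j, out_i, start, mtz_i_j) are our own, and constraint (4) is rearranged to $u_i - u_j + (n-1)x_{i,j} \le n - 2$ with $i \ne j$.

```python
def mtz_lp_text(C):
    """Return the MTZ model of Section 3.1.1 as lpsolve LP-format text.

    C is an n x n cost matrix; cities are numbered 1..n as in the text and
    self-loops x_i_i are left out.
    """
    n = len(C)
    cities = range(1, n + 1)

    def x(i, j):
        return f"x_{i}_{j}"

    arcs = [(i, j) for i in cities for j in cities if i != j]
    lines = ["min: " + " + ".join(f"{C[i - 1][j - 1]} {x(i, j)}" for i, j in arcs) + ";"]
    for j in cities:  # (1) exactly one arrival in city j
        lines.append(f"in_{j}: " + " + ".join(x(i, j) for i in cities if i != j) + " = 1;")
    for i in cities:  # (2) exactly one departure from city i
        lines.append(f"out_{i}: " + " + ".join(x(i, j) for j in cities if j != i) + " = 1;")
    lines.append("start: u_1 = 1;")  # (3) city 1 (Seattle) comes first
    for i in cities[1:]:
        lines.append(f"u_{i} >= 2;")
        lines.append(f"u_{i} <= {n};")
    for i in cities[1:]:  # (4) u_i - u_j + (n-1) x_ij <= n - 2
        for j in cities[1:]:
            if i != j:
                lines.append(f"mtz_{i}_{j}: u_{i} - u_{j} + {n - 1} {x(i, j)} <= {n - 2};")
    lines += [f"{x(i, j)} <= 1;" for i, j in arcs]  # (5) upper bounds on x
    lines.append("int " + ", ".join(x(i, j) for i, j in arcs) + ";")  # (6) integrality
    return "\n".join(lines)


print(mtz_lp_text([[0, 5, 9], [5, 0, 4], [9, 4, 0]]))
```

Writing the model as text keeps the formulation readable and lets the same file be solved from the lpsolve command line.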
4.3. Visualization. Using multidimensional scaling, the cities can be projected onto two-dimensional Euclidean space in a way that approximately preserves the distances between them. We used the MATLAB function mdscale to compute this projection from the inter-city distances in Appendix A, and then drew the optimized routes onto the projection. These figures can be found in Appendix B.
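The projection step can be sketched with classical (Torgerson) MDS in NumPy. This is a swapped-in variant: MATLAB's mdscale performs nonmetric scaling by default, whereas classical scaling corresponds more closely to MATLAB's cmdscale. Because great-circle distances are not Euclidean, some eigenvalues of the centered matrix can be negative; the sketch clips them to zero, so the 2-D picture is only an approximation. The unit-square check is hypothetical.

```python
import numpy as np


def classical_mds(D, dims=2):
    """Classical MDS: points in `dims` dimensions whose distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)           # B is symmetric; eigenvalues ascending
    top = np.argsort(vals)[::-1][:dims]      # indices of the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# A unit square is recovered up to rotation and reflection.
square = np.array([[0, 0], [0, 1], [1, 1], [1, 0]], dtype=float)
D = np.linalg.norm(square[:, None, :] - square[None, :, :], axis=-1)
Y = classical_mds(D)
```

The rows of Y are plotting coordinates; the optimized routes can then be drawn by connecting consecutive cities of a tour.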
¹ http://web.mit.edu/lpsolve/doc/MATLAB.htm
Table 1. MTZ solution of the TSP for 13 cities
Nonzero variable    Departure city (i)    Arrival city (j)
$x_{1,12}$          Seattle               Paris
$x_{12,10}$         Paris                 London
$x_{10,7}$          London                Rio de Janeiro
$x_{7,8}$           Rio de Janeiro        Cairo
$x_{8,9}$           Cairo                 Johannesburg
$x_{9,11}$          Johannesburg          Sydney
$x_{11,6}$          Sydney                Tokyo
$x_{6,5}$           Tokyo                 Chengdu
$x_{5,4}$           Chengdu               Shanghai
$x_{4,13}$          Shanghai              Bangkok
$x_{13,3}$          Bangkok               New York
$x_{3,2}$           New York              Miami
$x_{2,1}$           Miami                 Seattle
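The tables list the nonzero $x_{i,j}$ in tour order. If the solver reports them in arbitrary order, the itinerary can be recovered by following successors from city 1, which also checks that the arcs form a single loop. The function name is ours; the usage applies it to the pairs of Table 1 in the 13-city numbering.

```python
def itinerary(pairs, start=1):
    """Order the cities of a tour from its nonzero x_ij, given as (i, j) pairs.

    Raises ValueError if a city is departed from twice or the arcs do not
    form one loop through every city.
    """
    successor = dict(pairs)
    if len(successor) != len(pairs):
        raise ValueError("some city is departed from more than once")
    order = [start]
    while len(order) <= len(pairs):  # bounded, so a stray subtour cannot loop forever
        nxt = successor.get(order[-1])
        if nxt is None:
            raise ValueError(f"no departure from city {order[-1]}")
        if nxt == start:
            break
        order.append(nxt)
    if len(order) != len(pairs):
        raise ValueError("the nonzero x_ij do not form a single loop through every city")
    return order


# Nonzero variables of Table 1, listed in arbitrary order.
table1 = [(5, 4), (1, 12), (12, 10), (10, 7), (7, 8), (8, 9), (9, 11),
          (11, 6), (6, 5), (4, 13), (13, 3), (3, 2), (2, 1)]
print(itinerary(table1))
```

With the 13-city numbering (1 Seattle, 12 Paris, 10 London, and so on), the result reproduces the row order of Table 1.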
5. Discussion
From the result of the previous section, the least total mileage needed to travel through all of our chosen cities, we can go one step further and gain some insight into the minimum cost of such a trip in real life.
Below is a graph of economy-class airfares displayed to Rome2rio users over the past 4 months, totaling some 1,780,832 price points. The airfares are grouped by distance, and the 20th-percentile fare is selected for each distance (20% of fares are lower and 80% are higher) to produce the following graph:
The graph shows an almost linear relationship between airfare and flight distance. As a simplification, we can estimate this relationship with the linear model

$$\text{Fare} = \$50 + \$0.11/\text{mile} \times \text{Distance},$$
where Fare is the cost in US dollars of flying Distance miles. On average, a fare costs $50 before any flight distance is taken into account, plus 11 cents per mile traveled.
With this linear airfare model, the average amount a person will spend on the TSP itinerary through all 13 cities is

$$\text{avg. TSP fare} = \$50 + (50{,}109 \times \$0.11) = \$5{,}561.99.$$
This is the cheapest average fare because our cost model is based entirely on mileage traveled, and the TSP solution is the route with the fewest miles.
We can also look at the cost-per-mile data for each airline and calculate the least and the most that traveling to all of our chosen cities could cost. The data can be found on the same website, organized in a table shown below (only part of the table is shown because of its size).
Airlines with the lowest cost per mile:
From this, the minimum cost is

$$\text{min. TSP fare} = \$50 + (50{,}109 \times \$0.03) = \$1{,}553.27,$$

assuming that all flights can be flown on SEAir at a rate equal to its average cost per mile.
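The fare arithmetic of this section is easy to reproduce. The sketch applies the $50 base fare once per itinerary, as the calculations above do; if it were instead charged on each of the 13 flights, both totals would rise by 12 × $50 = $600. The function and constant names are ours.

```python
def linear_fare(miles, per_mile, base=50.0):
    """Fare = base + per_mile * miles, charging the base fare once per itinerary."""
    return base + per_mile * miles


TSP_MILES_13 = 50_109                        # optimal 13-city tour (Table 1)
avg_fare = linear_fare(TSP_MILES_13, 0.11)   # average rate from the Rome2rio data
min_fare = linear_fare(TSP_MILES_13, 0.03)   # SEAir average cost per mile
print(f"avg ${avg_fare:,.2f}, min ${min_fare:,.2f}")  # prints: avg $5,561.99, min $1,553.27
```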
6. Model Variations
6.1. TSP Maximum Mileage. Up to this point we have assumed that the vacationer wants to spend as few miles as possible on an aircraft traveling between destinations. However, it is not unreasonable to assume that in some situations it may be beneficial to maximize the distance traveled, e.g., if someone wants to rack up frequent flier miles or is somehow paid for every mile traveled. At the very least, we felt that it would be interesting to find the upper bound on the distance traveled in one trip.
In this particular situation, the adjustment to the .lp file is a simple change of the objective from min to max. The results, however, change drastically. Moreover, the maximization is not restricted to 13 cities, as the minimization was, and can be solved for the original set of 20 cities that we started with. The results for both cases are presented in the tables below.
Table 2. Most costly MTZ solution of the TSP for 20 cities
Nonzero variable    Departure city (i)    Arrival city (j)
$x_{1,14}$          Seattle               Johannesburg
$x_{14,7}$          Johannesburg          St. Petersburg
$x_{7,2}$           St. Petersburg        Miami
$x_{2,6}$           Miami                 Moscow
$x_{6,3}$           Moscow                New York
$x_{3,8}$           New York              New Delhi
$x_{8,5}$           New Delhi             Reykjavik
$x_{5,20}$          Reykjavik             Bangkok
$x_{20,18}$         Bangkok               Oslo
$x_{18,16}$         Oslo                  Sydney
$x_{16,17}$         Sydney                Paris
$x_{17,9}$          Paris                 Shanghai
$x_{9,12}$          Shanghai              Rio de Janeiro
$x_{12,11}$         Rio de Janeiro        Tokyo
$x_{11,15}$         Tokyo                 London
$x_{15,10}$         London                Chengdu
$x_{10,4}$          Chengdu               Stockholm
$x_{4,19}$          Stockholm             Honolulu
$x_{19,13}$         Honolulu              Cairo
$x_{13,1}$          Cairo                 Seattle
Table 3. Most costly MTZ solution of the TSP for 13 cities
Nonzero variable    Departure city (i)    Arrival city (j)
$x_{1,9}$           Seattle               Johannesburg
$x_{9,13}$          Johannesburg          Bangkok
$x_{13,12}$         Bangkok               Paris
$x_{12,11}$         Paris                 Sydney
$x_{11,3}$          Sydney                New York
$x_{3,4}$           New York              Shanghai
$x_{4,7}$           Shanghai              Rio de Janeiro
$x_{7,6}$           Rio de Janeiro        Tokyo
$x_{6,10}$          Tokyo                 London
$x_{10,5}$          London                Chengdu
$x_{5,2}$           Chengdu               Miami
$x_{2,8}$           Miami                 Cairo
$x_{8,1}$           Cairo                 Seattle
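The switch from min to max can be illustrated on a tiny instance by brute force, which enumerates every tour rather than solving the MTZ program and is only practical for a handful of cities. The unit-square coordinates are hypothetical: the shortest tour is the perimeter (length 4) and the longest crosses both diagonals (length $2 + 2\sqrt{2}$), mirroring how the longest tours in Tables 2 and 3 jump between distant cities.

```python
import math
from itertools import permutations


def tour_length(dist, tour):
    """Length of the closed tour that returns to its first city."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def extreme_tours(dist):
    """Shortest and longest closed tours from city 0, by full enumeration.

    There are (n - 1)! orderings, so this is only for very small n.
    """
    n = len(dist)
    tours = [[0, *rest] for rest in permutations(range(1, n))]
    shortest = min(tours, key=lambda t: tour_length(dist, t))
    longest = max(tours, key=lambda t: tour_length(dist, t))
    return shortest, longest


points = [(0, 0), (0, 1), (1, 1), (1, 0)]  # hypothetical unit square
dist = [[math.dist(p, q) for q in points] for p in points]
shortest, longest = extreme_tours(dist)
```

Only the comparison changes between the two searches, just as only the objective keyword changes in the .lp file.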
The results in Tables 2 and 3 make sense because they include as many far-apart pairs of cities as possible. Almost all flights are intercontinental, and many cross the equator to the other half of the globe (e.g., from Oslo to Sydney). This is probably not how people usually travel, since the route goes back and forth between continents and ignores nearby destinations. For the purpose of maximizing total mileage, however, it is a solid solution. With 20 destinations the maximum possible mileage is 142,601 miles, and with 13 destinations it is 105,361 miles. These values are upper bounds on the length of any tour through the respective sets of cities. Now let us reconsider the cost-per-mile model discussed in Section 5, which gives the maximum average cost a traveler could spend visiting all of the chosen cities:
$$\text{avg. TSP fare}_{20} = \$50 + (142{,}601 \times \$0.11) = \$15{,}736.11$$
$$\text{avg. TSP fare}_{13} = \$50 + (105{,}361 \times \$0.11) = \$11{,}639.71$$
Therefore, the average cost of the maximum-mileage tour is $15,736.11 for 20 destinations and $11,639.71 for 13 destinations. For 13 destinations, this is roughly twice the $5,561.99 average cost of the minimal-mileage route.
Taking it one step further, we can consider the airlines with the highest cost per mile, shown below:
From this, the maximum cost of a 20-destination trip would be

$$\text{max. TSP fare}_{20} = 142{,}601 \times \$0.278 = \$39{,}643.08,$$

and the maximum cost of a 13-destination trip would be

$$\text{max. TSP fare}_{13} = 105{,}361 \times \$0.278 = \$29{,}290.36,$$

assuming that all flights could be booked through Darwin Airlines at its average fare of $0.278 per mile (these figures use the per-mile rate alone, without the $50 base fare).
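The maximum-mileage fares, and the comparison with the minimal route, can be checked the same way. Following the text, the average-rate figures include the $50 base fare while the Darwin Airlines figures use the per-mile rate alone; the names are ours.

```python
def fare(miles, per_mile, base=0.0):
    """Linear fare model: base + per_mile * miles."""
    return base + per_mile * miles


MAX_MILES = {20: 142_601, 13: 105_361}        # longest tours, Tables 2 and 3
min_avg_13 = fare(50_109, 0.11, base=50.0)    # shortest 13-city tour, Section 5

for n, miles in MAX_MILES.items():
    avg = fare(miles, 0.11, base=50.0)        # average rate with the $50 base fare
    darwin = fare(miles, 0.278)               # Darwin Airlines rate, no base fare
    print(f"{n} cities: avg ${avg:,.2f}, Darwin ${darwin:,.2f}")

ratio_13 = fare(MAX_MILES[13], 0.11, base=50.0) / min_avg_13
print(f"13-city max/min average fare ratio: {ratio_13:.2f}")
```

For 13 cities the ratio comes out at about 2.09, which is where the "roughly twice" comparison comes from.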
6.2. TSP Happiness Factors/Mileage Requirements. Another variation of the TSP that we investigated was changing the objective function to include our happiness factors, discussed in Section 3.2, and adding a constraint that imposes a minimum mileage requirement. In essence, we are still solving the TSP for the least “costly” solution, but we are altering the definition of cost. The happiness factors now influence the cost, and we are instead maximizing happiness while meeting a specific mileage goal. The happiness value of each flight decreases as the flight gets longer, so the model still has an inherent tendency to favor short paths. This is where the mileage requirement comes in. For example, if there is no mileage requirement, then we are solving the TSP with the new, happiness-based objective function, i.e., finding the trip with the maximum happiness as defined in Section 3.2. The corresponding solution is presented below:
Table 4. Happiness-factor TSP for 13 cities, no minimum trip length
Nonzero variable    Departure city (i)    Arrival city (j)
$x_{1,2}$           Seattle               Miami
$x_{2,12}$          Miami                 Paris
$x_{12,10}$         Paris                 London
$x_{10,7}$          London                Rio de Janeiro
$x_{7,8}$           Rio de Janeiro        Cairo
$x_{8,9}$           Cairo                 Johannesburg
$x_{9,11}$          Johannesburg          Sydney
$x_{11,6}$          Sydney                Tokyo
$x_{6,5}$           Tokyo                 Chengdu
$x_{5,4}$           Chengdu               Shanghai
$x_{4,13}$          Shanghai              Bangkok
$x_{13,3}$          Bangkok               New York
$x_{3,1}$           New York              Seattle
Summing the distances of the flights above gives a total of 52,414 miles for the happiest trip. However, if we raise the minimum mileage threshold above this value, new solutions start to emerge. For example, if we set the minimum mileage threshold to 55,000 miles, the optimal solution changes:
Table 5. Happiness-factor TSP for 13 cities, 55,000-mile minimum trip length
Nonzero variable    Departure city (i)    Arrival city (j)
$x_{1,2}$           Seattle               Miami
$x_{2,9}$           Miami                 Johannesburg
$x_{9,7}$           Johannesburg          Rio de Janeiro
$x_{7,5}$           Rio de Janeiro        Chengdu
$x_{5,8}$           Chengdu               Cairo
$x_{8,4}$           Cairo                 Shanghai
$x_{4,6}$           Shanghai              Tokyo
$x_{6,11}$          Tokyo                 Sydney
$x_{11,13}$         Sydney                Bangkok
$x_{13,12}$         Bangkok               Paris
$x_{12,10}$         Paris                 London
$x_{10,3}$          London                New York
$x_{3,1}$           New York              Seattle
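The effect of the mileage requirement can be reproduced on a tiny instance by brute force instead of the MTZ-based LP: enumerate every tour, discard those shorter than $M$, and keep the one with the largest objective $\sum (k - c^*_{i,j})(r_i + r_j)$. In the hypothetical unit-square example with equal ratings, the perimeter tour (4 miles) is happiest, but a 4.5-mile minimum forces the longer tour that crosses both diagonals, in the same way that the 55,000-mile minimum changes the solution in Table 5. The function name and example data are ours.

```python
import math
from itertools import permutations


def best_happy_tour(dist, r, M, k=1.0):
    """Maximize sum of (k - c*_ij)(r_i + r_j) over tour arcs subject to mileage >= M.

    Brute force over all tours from city 0; returns (happiness, miles, tour),
    or None when no tour is at least M miles long.
    """
    n = len(dist)
    entries = [d for row in dist for d in row]
    lo, hi = min(entries), max(entries)  # min(A) and max(A) over all entries
    best = None
    for rest in permutations(range(1, n)):
        tour = [0, *rest]
        arcs = list(zip(tour, tour[1:] + tour[:1]))
        miles = sum(dist[i][j] for i, j in arcs)
        if miles < M:
            continue  # violates the minimum-mileage constraint
        happy = sum((k - k * (dist[i][j] - lo) / (hi - lo)) * (r[i] + r[j]) for i, j in arcs)
        if best is None or happy > best[0]:
            best = (happy, miles, tour)
    return best


points = [(0, 0), (0, 1), (1, 1), (1, 0)]  # hypothetical unit square
dist = [[math.dist(p, q) for q in points] for p in points]
ratings = [1.0, 1.0, 1.0, 1.0]
for M in (0.0, 4.5, 5.0):
    print(M, best_happy_tour(dist, ratings, M))
```

A threshold above the longest possible tour (here 5 miles) leaves no feasible itinerary, which is why the threshold must stay below the maximum mileage of Section 6.1.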
7. Conclusions
In this project, we first solved the basic TSP model minimizing total mileage. We then solved a few variations of the original problem (maximizing mileage, and maximizing happiness subject to a mileage requirement). For all problems, our graph-based model yields results that not only satisfy all constraints but also look realistic. In other words, the travel plans our model provides are close to what a reasonable planner might actually use in real life. For this reason, we think that using a graph model for the Traveling Salesperson Problem and its variations is a sensible approach.
The results of the TSP and its variations provide interesting insights into real-life travel planning. For different purposes, our model provides the following observations:
• The lowest average flight fare to travel through all 13 cities at least once is $5,561.99.
• On average, the most that one will spend to visit each of the 13 destinations once is $11,639.71.
• If we pair the minimum distance with the cheapest airfare, and the maximum distance with the most expensive airfare, we can estimate a range within which the prices of all possible trips through these locations should fall: $1,553.27 to $29,290.36. This range may seem quite wide to the typical traveler, but from a mathematician’s perspective these numbers seem to be on the correct order of magnitude for standard international travel.
References
[1] Google Maps. maps.google.com.
[2] A. Berube, J. L. Trujillo, T. Ran, and J. Parilla. Global Metro Monitor (report). Brookings Institution.
[3] C. E. Miller, A. W. Tucker, and R. A. Zemlin. Integer programming formulation of traveling salesman problems. J. ACM, 7(4):326–329, October 1960.
[4] Numbeo. Cost of living. https://www.numbeo.com/cost-of-living/.
[5] Geobytes. City distance tool. http://geobytes.com/citydistancetool/.
[6] Wikipedia. Museums by country and city. https://en.wikipedia.org/wiki/Category:Museums_by_country_and_city.
A. Problem Data
Table 6. City Indices
City               Index
Seattle            1
Miami              2
New York           3
Stockholm          4
Reykjavik          5
Moscow             6
St. Petersburg     7
New Delhi          8
Shanghai           9
Chengdu            10
Tokyo              11
Rio de Janeiro     12
Cairo              13
Johannesburg       14
London             15
Sydney             16
Paris              17
Oslo               18
Honolulu           19
Bangkok            20

B. Visualization