Flow Capacity of the London Underground: Mind the Gap
By
John Joseph Dougherty X
Abstract:
The London Underground, often referred to as ‘the Tube’, is one of the primary transportation
systems for the Greater London Area. As one might expect, the demand for ridership jumps
during the morning and evening rush hours which, in turn, congests the system. The goal of this
paper is to construct an appropriate model for the London Underground to better analyze the
system’s transportation capacity. Specifically, we construct a graph of the rail system and
assign a carrying capacity to each edge of the network. Once our directed graph is set up, we
use a maximum flow algorithm to calculate the maximum number of people the system can move
from the residential to the business districts in a given hour. We then compare this result to
actual ridership figures to determine whether the system is operating efficiently.
Introduction:
The London Underground is a primary transportation artery of the Greater London area.
Locally it is known as the Tube and is an integral part of daily life, particularly in the city proper.
The system itself is the oldest underground rail system in the world, having opened its first line,
the Metropolitan Railway, in 1863 [3]. It is also the 12th largest transit system in the world,
with the rail network covering over 402 km and serving 270 stations. This extensive rail system,
which lies both above and below ground, is responsible for about 4 million passenger journeys
a day and accounts for about 17% of the total public transportation ridership [1].
As far as transportation demand goes, hourly ridership peaks at 8 am, with about
400 thousand passenger journeys starting in that hour and about 1 million journeys in total
starting between 7 and 10 am (2007 figures). There is a similar rush hour period from 4 to 7 pm,
but this demand is more spread out [2].
The system itself is composed of 11 individual lines, all of which have their own unique
trains and carrying capacities. The lines are: Bakerloo, Central, Circle, District, Hammersmith &
City, Jubilee, Metropolitan, Northern, Piccadilly, Victoria, and Waterloo & City. Although
each line tends to follow a distinct route, some lines run in parallel with others for part of
their length. For example, from Baker Street station to Liverpool Street station, the Circle,
Hammersmith & City, and Metropolitan lines travel along the same track. That is, these three
lines make the same stops and at times even share a station platform. We will have to keep this
in mind when assigning carrying capacities to the edges of our graph model.
As we can see, this is a fairly extensive network, and it raises many interesting
questions. In particular, we want to know the maximum number of people the London
Underground can move from the residential to the business districts in an hour’s time. With a
solution to this problem, we can then ask whether the Tube is running efficiently. That is,
does the London Underground meet the transportation demands of the city? Is it supplying less
or more capacity than necessary? In the case of an emergency, how long will it take to
empty or fill the city proper? Does London need to spend more or less money on meeting the
demand? These are the kinds of questions we can begin to answer upon understanding the
maximum flow through the network. Furthermore, enhancing the efficiency of such systems
leads to more profit for the city as well as an increase in quality of life for the consumer, in
this case the traveler. Thus this question is of interest not only to millions of commuters but
also to the city.
Background:
There has been ample work done to study traffic systems in general. As mentioned
earlier, it is a problem that both the city and the consumer care greatly about. In fact, systems like
the London Underground are a common topic of study for companies interested in
transportation networks. Such systems of interest include: rail, road, foot, and even flight
networks. Moreover, while these systems may seem distinct, there are fundamental similarities
between them. Essentially, understanding and modeling these transportation networks falls
into the category of logistics and hence extensive work has been done in studying them.
It should be noted that there are two ‘views’ commonly used to analyze traffic systems.
On the one hand we have the view of the consumer, and on the other we have the perspective
of the supplier. In the first, the view of the consumer, average wait times and trip durations are
of paramount importance to ensure rider satisfaction. In the second, the view of the supplier,
which is a more macro view of the system, overall utility and net performance take precedence.
Although these perspectives are intimately connected, and an ideal model of the system would
take all such factors into account, the difficulty of doing so makes such an inclusion
impractical.
For our model we will take the latter view, that of macro performance. In this case,
general variables of interest include: approach, dwell, depart, deadhead, transit, and wait.
Some of these variables are rather self-explanatory: approach, dwell, depart, and transit
correspond respectively to the vehicle’s braking, loading, accelerating, and traveling times.
Other variables, like wait and deadhead, are less obvious. Wait may seem clear, but it is more
subtle than one might expect, because a range of situations contribute to it. For instance, if a
vehicle arrives early at a destination but is on an explicit schedule, it has to wait before
departing for its next stop. Another example of wait time is when a vehicle cannot progress due
to some ‘traffic jam’ further down the line. So while it is clear what the variable measures, its
subtlety makes it difficult to quantify. Finally, the other non-intuitive variable is deadhead.
This factor corresponds to the ‘dead’ travel that may occur for logistical reasons. That is,
deadhead occurs when vehicles have to be moved to a new location out of necessity while not
transporting any cargo or, in the case of the Tube, passengers.
Now while all six variables are major contenders used to model the overall performance
of transportation networks, our model will not include all of these factors. Due to the size of the
system we are considering and the lack of available data, we simply cannot use these variables
to the appropriate level of detail. Hence for our model, while we would love to include all
possible factors and generate a model of relevant sophistication, we will instead have to make
some simplifying assumptions that will allow us to generate a rough model of the network.
Data:
Due to its importance, there is a lot of valuable data on the Tube transportation
network; however, while there is ample data collected, we do not necessarily have access to all
the information of interest. That said we do have access to some of the more fundamental
data.
Firstly, we have the official map of the London Underground provided by Transport for
London [1]. This seemingly trivial piece of data is actually one of the more useful sources of
information we have. It allows us to determine not only which stations are connected to each
other, but also which lines connect them. This information is immensely important in
generating a flow graph of the Tube that will later be used to run various calculations.
Next we have, again from Transport for London [1], information about the train stock of
the individual lines. For each of the 11 lines we know the carrying capacity of its trains. They
are, in people per train, as follows:
Bakerloo: 730
Central: 892
Circle: 865
District: 827
Hammersmith & City: 865
Jubilee: 817
Metropolitan: 865
Northern: 665
Piccadilly: 684
Victoria: 864
Waterloo & City: 892
We will later use this data to calculate the carrying capacity of the network.
Along with the above information, we also need information about the speed of trains and
the distance between stations. Unfortunately, information about the transit, approach, dwell,
and depart variables was difficult to come by, and as a result they are all folded into the
average speed of a train over the entire network, which we know to be 33 km per hour [1]. As we
will later find out, this assumption hurts the accuracy of the model, but it is a necessary one.
The edge length, that is, the distance between any two adjacent stations, is a derived
value based on the latitude and longitude of the stations. This data, along with a list of the
stations contained in zones 1 and 2, was provided by Doogal Co.UK [4]. For the distances, we
simply assume that the edges follow geodesics, and so we compute each edge length from the
latitude and longitude of its two stations.
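As an illustration of the geodesic assumption, a minimal sketch of such a distance calculation is given below. The paper does not say which formula was used; the haversine formula, the spherical Earth radius of 6371 km, and the function name are assumptions made here for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not a value taken from the paper


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates, chosen only to exercise the function:
# one degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 3))
```

In the model, a function like this would be applied to the coordinates of each pair of adjacent stations to produce the edge length used in the capacity calculations.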
The next piece of pertinent data, the acceptable wait time for a train in the London
Underground, comes from Samuel Hickey [2] and Transport for London (TFL) [1]. As mentioned
earlier, with a macro view of the system this data may seem unnecessary. However, while the
actual wait time is not used in our model, it is used to determine the minimum train frequency
for each line, which in turn determines the number of trains in the system on a line-by-line
basis. According to Samuel Hickey [2], the maximum ‘acceptable’ wait time for a train is less
than 10 minutes. This, along with TFL’s claim that the average wait time for a train is between
2 and 7 minutes, allows us to choose realistic train frequencies.
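The step from a wait time to a train frequency can be sketched as follows. The sketch assumes that the time between consecutive trains (the headway) is used directly as the wait-time bound and that frequency is expressed in trains per hour; the function name is hypothetical.

```python
def trains_per_hour(headway_minutes):
    """Convert a headway (minutes between consecutive trains) into trains per hour."""
    if headway_minutes <= 0:
        raise ValueError("headway must be positive")
    return 60.0 / headway_minutes


# Headways of one hour, 10, 7 and 5 minutes between trains.
for headway in (60, 10, 7, 5):
    print(f"{headway:>2} minutes per train -> {trains_per_hour(headway):.2f} trains per hour")
```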
Finally, we require an understanding of the two districts in question. That is, which
stations are in the residential district, and which stations lie in the business district? To
gather this data, we make some educated assumptions about zoning, which we then use to find
the stations contained in the respective districts. Information about the two districts comes
from Business 2 Community [5]. Here it is stated that the 5 primary business districts are:
Square Mile, Canary Wharf, Southwark and London Bridge, West End, and Shoreditch. These 5 areas
account for over half a million jobs, and as a result we treat them together as London’s
business district. For the residential district, we note that many commuters live outside of
Central London. Because of this, we assume that the commuters using the London Underground come
from zone 3 and beyond. With this notion of residential and business districts, we can now use
our map to find the stations that lie in the respective districts. Via the map we find that the
stations in the business district are: Bank, Barbican, Blackfriars, Cannon Street, Covent
Garden, Elephant and Castle, Lambeth North, Leicester Square, London Bridge, Mansion House,
Moorgate, Oxford Circus, Piccadilly Circus, Southwark, St Paul’s, Waterloo, and Canary Wharf;
while the stations in the residential district are: Archway, Brixton, Bromley-by-Bow, Clapham
South, East Putney, Hampstead, Kensal Green, Manor House, Mile End, North Acton, North
Greenwich, Turnham Green, and Willesden Green. Thus, we now have all the pertinent data to
construct our model.
Model:
Using the above information, we now construct a model of London’s subway network in
the form of a directed graph with edge capacities, $G = \{V, E, C\}$, where
$V = \{v_n : 1 \le n \le 123\}$ is the set of vertices, in our case the 123 distinct stations
contained in zones 1 and 2; $E = \{e_i : 1 \le i \le 163\}$ is the set of 163 directed edges
connecting the stations; and $C = \{c_k : 1 \le k \le 163\}$ is the set of respective carrying
capacities of the edges. As mentioned above, $V$ and $E$ are found from our map of the
Underground. The edge capacities $C$, on the other hand, take further work to calculate and are
discussed in the next section. Much of the above data goes into the calculation of the $c_k$’s.
Now, while the graph $G$ is a realistic model, it is incomplete for the calculations we
wish to run. As it is, $G$ represents zones 1 and 2 of the London Underground, but it contains
no information about the residential and business districts. To remedy this we consider our
source $S \subset V$ and target $T \subset V$, where $S = \{\text{Residential nodes}\}$ and
$T = \{\text{Business nodes}\}$. We want to embed our source and target into the graph, since
the idea is to calculate the maximum flow from $S$ to $T$. However, the maximum flow algorithm
finds the maximum flow through the network from a single source node to a single target node,
and we currently have a multi-source, multi-target system. To remedy this, we create two
‘dummy’ vertices $s$ and $t$, thought of as source and target respectively. Next, we connect
each of the 13 stations in $S$ to our source node $s$, and each of the 17 stations in $T$ to our
target node $t$. Let these edges be denoted by $D$. Now we can use $D$, $s$, and $t$ to generate
the necessary directed graph
$$G^* = \{V \cup \{s, t\},\ E \cup D,\ C^*\}.$$
In doing so we add a total of 30 new edges to $G$, and it should be noted that
$$C^* = C \cup \{s_m, t_p : 1 \le m \le 13,\ 1 \le p \le 17\},$$
where the $s_m$’s and $t_p$’s correspond to the new edges connecting the residential and
business nodes to the source and target nodes. Next, we note that while every edge in $E$ is
bidirectional, the constructed edges are not. That is, the residential edges go strictly from
$s$ to the residential stations, and the business edges go strictly from the business stations
to $t$. Furthermore, these edges are assumed to have infinite capacity. All of this is done so
that our synthetic nodes do not restrict the overall flow of the system.
Now that we have our directed graph $G^*$, we can use a maximum flow algorithm for
directed graphs to find the maximum number of people our system can move from $S$ to $T$ in an
hour, subject to the various constraints considered in the next section. Due to the size of the
network, we use Mathematica to calculate the max flow, and it should be noted that its algorithm
differs from the one we know, which is given in the proof of the Max-Flow Min-Cut Theorem as
presented by Jacques Verstraete [6]. With this, we are now ready to run some calculations.
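The construction of the dummy source and target can be illustrated on a toy network. The sketch below is not the Mathematica computation used for the results in this paper: it swaps in the Edmonds-Karp algorithm (augmenting along shortest residual paths), written in Python, and the station names and capacities are hypothetical values chosen only to keep the example small.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    # residual[u][v] is the remaining capacity on the arc u -> v.
    residual = defaultdict(lambda: defaultdict(float))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0.0  # make sure the reverse arc exists
    total = 0.0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path is left
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = INF, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def multi_source_max_flow(edges, residential, business):
    """Build G* from bidirectional track edges plus dummy nodes 's' and 't'."""
    capacity = {}
    for u, v, c in edges:
        # Every track edge carries flow in both directions.
        capacity[(u, v)] = capacity.get((u, v), 0.0) + c
        capacity[(v, u)] = capacity.get((v, u), 0.0) + c
    for station in residential:
        capacity[("s", station)] = INF  # one-way, unlimited: s -> residential
    for station in business:
        capacity[(station, "t")] = INF  # one-way, unlimited: business -> t
    return max_flow(capacity, "s", "t")


# Hypothetical toy network: two residential stations feed a hub that
# serves two business stations. Capacities are in people per hour.
toy_edges = [("R1", "Hub", 300), ("R2", "Hub", 200), ("Hub", "B1", 250), ("Hub", "B2", 100)]
print(multi_source_max_flow(toy_edges, ["R1", "R2"], ["B1", "B2"]))  # 350.0
```

In the toy network the two edges leaving the hub form the minimum cut, so the flow is limited to 250 + 100 = 350 people per hour even though 500 can reach the hub. The infinite capacities on the dummy edges never bind, which is exactly why they are chosen that way.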
Calculations and Derivations:
First and foremost, we have to calculate the edge capacities of our graph $G^*$. Thus we
must calculate the set $C$, which first requires a sub-calculation. Our first order of business
is to find the line capacities of our system. The 11 distinct line capacities of our network
are given by the equation
$$l_j = \frac{\left(\sum_i x_i\right) F_j T_j}{v_j}, \qquad 1 \le j \le 11.$$
Here the $x_i$’s are the lengths of the edges $e_i$, determined by the geographical distance
between the stations that make up each edge, and we sum over all edges that line $j$ lies on in
order to calculate the overall length of line $j$ within our system. Next, we multiply by
$F_j$, the desired train frequency of line $j$, and by $T_j$, the carrying capacity of the type
of train used on line $j$. Finally, we divide by $v_j$, the average velocity of line $j$, to
obtain the line capacities as above. The units here are people: length divided by velocity is
the time a train takes to traverse the line, multiplying by the frequency gives the number of
trains on the line, and multiplying by the train capacity gives people. Thus, this value is the
maximum number of people each line can hold at any instant of time.
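A small worked example of the line capacity formula may help. The sketch below assumes that $F_j$ is expressed in trains per hour and $v_j$ in km per hour, so that the result is in people; the edge lengths are hypothetical, while 865 people per train and 33 km per hour are the Circle line capacity and network average speed quoted in the Data section.

```python
def line_capacity(edge_lengths_km, trains_per_hour, train_capacity, avg_speed_kmh):
    """l_j = (sum of x_i) * F_j * T_j / v_j, the number of people a line can hold."""
    line_length_km = sum(edge_lengths_km)
    # Traversal time (hours) times frequency (trains per hour) = trains on the line.
    trains_on_line = line_length_km * trains_per_hour / avg_speed_kmh
    return trains_on_line * train_capacity


# Hypothetical line made of three edges totalling 3.3 km, one train every 10 minutes.
example = line_capacity([1.0, 1.5, 0.8], trains_per_hour=6, train_capacity=865, avg_speed_kmh=33.0)
print(round(example, 1))  # about 519 people
```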
Two additional things should be noted here. The first is that while we know that the
average velocity of the network is 33 km per hour, the average velocity of each line is not
necessarily 33 km per hour. To represent some of this variation, we allow edges of length 5 km
or greater to be traveled at an average speed of 66 km per hour, while edges of length less
than 300 meters are restricted to only 22 km per hour. In doing so, we are able to retain our
33 km per hour average speed while yielding a more accurate model. The second note is that we
take the ceiling of the $l_j$’s in our final calculations in order to have an integer number of
trains for each line.
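The two notes above can be sketched in code. The speed rule follows the thresholds stated in the text. The rounding helper applies the ceiling to the train count (line length times frequency divided by speed), which is one possible reading of the rounding step and is an assumption of this sketch, as are the function names.

```python
import math


def edge_speed_kmh(length_km):
    """Average speed assigned to an edge, based on its length."""
    if length_km >= 5.0:
        return 66.0  # long edges: trains spend more time at full speed
    if length_km < 0.3:
        return 22.0  # very short edges: trains barely accelerate
    return 33.0      # network-wide average


def trains_on_line(line_length_km, trains_per_hour, avg_speed_kmh):
    """Integer number of trains needed to sustain the frequency (rounded up)."""
    return math.ceil(line_length_km * trains_per_hour / avg_speed_kmh)


# Hypothetical 20 km line with a train every 10 minutes at 33 km per hour:
# 20 * 6 / 33 = 3.64 trains, rounded up to 4.
print(trains_on_line(20.0, 6, 33.0))
```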
Now that we have the line capacities $l_j$ for $1 \le j \le 11$, we can calculate the edge
capacities of $G^*$. The capacity of edge $e_k$ is given by
$$c_k = \frac{r_k \sum_j l_j}{x_k}.$$
We sum over all lines $j$ that connect the pair of stations defining edge $e_k$, and then
multiply by the average speed on that edge, denoted $r_k$. We then divide by the length $x_k$ of
that edge to obtain the carrying capacity of the edge in people per hour.
So now we have the derivation of the carrying capacities for the individual edges of our
graph $G^*$, which we will use to run a few different max-flow calculations.
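The edge capacity formula can be checked on a small example. The sketch below assumes speeds in km per hour and lengths in km, so that the result is in people per hour; the two line capacities and the edge length are hypothetical values, not figures from the model.

```python
def edge_capacity(edge_speed_kmh, line_capacities, edge_length_km):
    """c_k = r_k * (sum of l_j over lines sharing the edge) / x_k, in people per hour."""
    if edge_length_km <= 0:
        raise ValueError("edge length must be positive")
    return edge_speed_kmh * sum(line_capacities) / edge_length_km


# Hypothetical edge of 1.5 km shared by two lines holding 519 and 438 people.
print(edge_capacity(33.0, [519.0, 438.0], 1.5))  # 33 * 957 / 1.5
```

Note that edges shared by several lines receive the sum of those lines' capacities, which is how the parallel running described in the Introduction enters the model.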
Calculation 1 ($F_j$ = 1 hour per train):
$T_j$ = 80% capacity: Max Flow = 281,834
$T_j$ = 100% capacity: Max Flow = 353,707
Calculation 2 ($F_j$ = 10 minutes per train):
$T_j$ = 80% capacity: Max Flow = 1,221,300
$T_j$ = 100% capacity: Max Flow = 1,532,980
Calculation 3 ($F_j$ = 7 minutes per train):
$T_j$ = 80% capacity: Max Flow = 1,747,770
$T_j$ = 100% capacity: Max Flow = 2,193,500
Calculation 4 ($F_j$ = 5 minutes per train):
$T_j$ = 80% capacity: Max Flow = 2,285,740
$T_j$ = 100% capacity: Max Flow = 2,868,850
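To put these figures next to the demand numbers quoted in the Introduction, the following sketch computes how many hours each scenario would need to carry a full day's journeys, and whether it covers the 8 am peak. It is a rough comparison under a strong assumption: the daily and peak figures count all journeys on the network, not only those from the residential to the business districts. The variable and function names are hypothetical.

```python
DAILY_JOURNEYS = 4_000_000    # about 4 million passenger journeys a day [1]
PEAK_HOUR_JOURNEYS = 400_000  # about 400 thousand journeys starting at 8 am [2]

# (minutes per train, load factor) -> modelled max flow in people per hour
modelled_flow = {
    (60, 0.8): 281_834, (60, 1.0): 353_707,
    (10, 0.8): 1_221_300, (10, 1.0): 1_532_980,
    (7, 0.8): 1_747_770, (7, 1.0): 2_193_500,
    (5, 0.8): 2_285_740, (5, 1.0): 2_868_850,
}


def hours_to_move(people, flow_per_hour):
    """Hours needed to move `people` at a constant hourly flow."""
    return people / flow_per_hour


for (headway, load), flow in modelled_flow.items():
    hours = hours_to_move(DAILY_JOURNEYS, flow)
    covers_peak = flow >= PEAK_HOUR_JOURNEYS
    print(f"{headway:>2} min, {load:.0%} load: {hours:5.2f} h for a day's journeys, "
          f"covers 8 am peak: {covers_peak}")
```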
Conclusions:
We can immediately see that, given our assumptions, the London Underground can move an
extraordinary number of people from outside zone 2 to the business district in an hour. In
fact, our calculations show that in a matter of two hours, the Tube can exceed the daily demand
for ridership. So what went wrong?
Well, that is a difficult question to answer. We did make some averaging assumptions
that may have shifted our results. We also used rounding in our calculations to yield integer
values. Yet these factors combined should not account for the gap we see between our model and
reality. It is likely that the assumption most responsible for this discrepancy comes from the
maximum flow algorithm itself. The algorithm assumes that all aspects of the system are focused
on moving the maximal number of units from A to B, and completely ignores interactions within
the system outside of transportation from source to sink. That is, if we were dealing with
freight and wanted to find the maximal flow of freight we could move from source to target, the
maximum flow algorithm would be much more accurate. Yet because we are looking at a system
where the ‘freight’ interacts within the system and the macro demand isn’t strictly getting
from A to B, the algorithm is less powerful than initially anticipated. Finally, the maximum
flow assumes rational actors who have foreknowledge of the system and will take considerably
longer routes despite the inconvenience. This of course is not true, but it was a necessary
assumption for our model. Yet, while these assumptions have been made, our results are not
rendered utterly useless.
While it is true that the question of efficiency becomes difficult to answer given our
result, we can say a few interesting things about the network. For example, we have found that,
if required, the London Underground can evacuate the inhabitants of the city proper, via the
business district stations, to the Greater London Area in as little as 4 hours. Furthermore, if
London restricted transportation on the Underground during rush hour to only transport
commuters from the residential district to the business district, and if all commuters were to
leave in the same hour, the Tube could transport all of its commuters in an hour’s time. So
while we are unable to answer our initial question, we still obtained an interesting result
that tells us valuable information about the system.
To further improve the model, a few things should be done. First and foremost is gaining
access to more data. Instead of assuming an average speed for the system, it would be ideal to
know the travel, approach, dwell, and depart variables for each edge of the network. Even
without that information, the model would be more viable with more accurate knowledge of the
average speed of each line. Furthermore, more data on the number of trains and the spacing
between them for each individual line would yield a more precise model.
Secondly, this model would improve by looking at different sources and targets. That is,
find the origins and destinations with the most demand and find the network’s maximum flow
between those stations, given some constraints. This would allow us to make localized
improvements to the system, which in the long run would improve the system as a whole.
Finally, this model could be further improved by taking a different approach to the
question of efficiency. As stated earlier, a single max-flow algorithm is not ideal for such
systems. Thus, in addition to considering more sources and targets, it may be more effective to
abandon the idea of maximum flow entirely in favor of a more commuter-specific model.
Thus, while we were unable to answer our initial question and our model could stand
improvement, we were able to build a network model of the London Underground that yielded
interesting and pertinent information about the system.
References
[1] Transport for London, http://www.tfl.gov.uk/corporate/about-tfl/what-we-do/london-underground
[2] Hickey, Samuel Warren (2011). “Improving the Estimation of Platform Wait Times of
the London Underground”, Massachusetts Institute of Technology Library.
[3] Wolmar, Christian (2004). The Subterranean Railway: How the London Underground
Was Built and How It Changed the City Forever. Atlantic.
[4] Doogal Co.UK, http://www.doogal.co.uk/london_stations.php
[5] Business 2 Community, http://www.business2community.com/travel-leisure/5-business-districts-to-work-in-london-0261885
[6] Verstraete, Jacques. http://www.math.ucsd.edu/~jverstra/154-part5-2014.pdf