Research Report - Merging Interstate Transportation Networks for routing Hazardous Materials
Andrew Emerson (Furman University, Greenville, SC 29613)
Ingrid Busch, PhD (Oak Ridge National Laboratory, Oak Ridge, TN 37830)
I. ABSTRACT
The Oak Ridge National Laboratory’s Center for Transportation Analysis (CTA) uses intermodal
transportation network models to analyze and organize methods for the routing of materials and
people. For example, one of the CTA’s ongoing projects is to develop plans and maps for safe
transportation of hazardous materials. Current transportation networks contain rich, detailed
information such as the designation of roads for hazardous materials. However, the intermodal
network that the CTA is currently using is outdated and inaccurate. To improve the network, we
planned to merge the current CTA highway network with a newer, open-source network called
OpenStreetMap (OSM). OSM, like other crowd-sourced networks, contains more precise and
accurate geospatial data than the CTA network and is updated daily. However, it lacks many of
the attributes that CTA relies on for its projects.
First, we compared the OSM links and nodes to the links and nodes currently in the CTA database.
The nomenclature OSM uses to classify road types is similar, but not identical, to the
classifications that CTA uses. Thus, finding the appropriate highways to update was the first
task. Second, we grouped the OSM data into individual datasets which we refer to as
“SuperLinks.” Each SuperLink contains links and nodes for various sections of each interstate.
By using these larger datasets, we could more easily find the corresponding CTA link containing
the attributes to be captured. We identified specific starting nodes within the CTA network and
employed geographical techniques to merge the two networks. By doing this, we have built an
improved CTA network and developed a reusable merging method leading to a much more
accurate and updated transportation model.
II. INTRODUCTION
The CTA utilizes transportation data to complete various projects ranging from fuel
efficiency to the routing of hazardous materials (HM-164). These projects produce research that
goes towards the “efficient, safe and free movement of people and goods in our Nation's
transportation systems.”1 One of the major transportation networks that the CTA uses contains
multimodal routes: highway, railway, waterway, and airway. In projects relating to HM-164
routing, researchers rely on road designations contained in the CTA database. However, the data
in the CTA network is inaccurate and outdated. For instance, several of the interstates do not
closely follow satellite imagery. Additionally, since the CTA network was created, many roads
and interstates have been extended or rerouted. An improved network would allow more precise
and efficient routing, and a repeatable update method would make it straightforward to
incorporate new data later. To improve the network and provide up-to-date data, our
project aimed to merge the current CTA network with a newer, open-source network called
OpenStreetMap (OSM). OSM is “built by a community of mappers that contribute and maintain
data about roads, trails, cafés, railway stations, and much more, all over the world.”2 Our project
was primarily concerned with interstates. To visualize results, we mapped the link/node data on
Google Earth.
In the process of investigating methods to merge the networks, we considered a couple of
different strategies. One immediate issue was that OSM classifies its data differently from the
current CTA network. To accommodate this, we had to manipulate the downloaded OSM data
before the two networks could be compared. In addition to data manipulation, we dealt with
other concepts, such as conflation. In this paper, we discuss conflation, data manipulation,
and network merging, along with the methods we chose to merge the OSM and CTA
transportation networks.
III. METHODS
A. Conflation
When investigating methods to combine the two networks, one existing concept that we
found is called “conflation.” Conflation is broadly defined as “a set of procedures that aligns the
features of two geographic data layers and then transfers the attributes of one to the other.”3 This
essentially describes the general idea of our project, but no existing set of procedures
immediately fit our needs. Many current conflation software tools are complex and would have
been more time-consuming to use than a procedure tailored to our case. While we did not use
conflation software in our project, we followed the same general principles that these software
packages follow, and we developed a conflation technique specific to our networks.
B. Converting OSM Interstate Names
As stated previously, OSM organizes its data somewhat differently than CTA does. The
first step that we took when approaching this problem was to scan through OSM data to
determine what modifications we would need to make in order to compare it to CTA data. One
area where the data did not compare well was the interstate name. In the CTA network,
all interstates are named in the same format: “I” followed by the identifying number. We wanted
to keep this uniform naming convention. Because OSM is open-source, the users who enter
information into the network are not always consistent in how they name roads. For instance,
the same interstate could be named “I-10,” “I 10,” or “I10.” We therefore had to convert all of
the interstate names in OSM to the format that CTA uses, which in this example is “I10.”
Using Java and the Eclipse integrated development environment, we wrote a program
that accesses the database containing all of the OSM interstates. To connect from
Eclipse to the CTA and OSM databases, we used Java Database Connectivity (JDBC), the
standard Java API for connecting to and querying relational databases. The first program
retrieved all of the interstate links and then parsed each individual link name. To parse correctly,
we treated every character that was not an “I” or a digit as a delimiter. When done
parsing, the program wrote the new name back to the database as that
link’s name. While this was an efficient solution, a few links with unusual names in the OSM
network remained as exceptions. For example, some interstate segments carry two
names, such as I40/I75. In the CTA database, there is only one name associated with a given
interstate. We had to go into the database and fix these names manually.
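The parsing rule described above can be sketched in a few lines of Java. This is a minimal illustration and not the program we ran: the class and method names are hypothetical, the database read and write through JDBC are omitted, and returning null to flag a name for manual review is an assumption made for the example.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class InterstateNameNormalizer {
    // Canonical CTA-style name: "I" followed by the route number, e.g. "I10".
    private static final Pattern CANONICAL = Pattern.compile("I\\d+");

    /**
     * Returns the CTA-style name, or null when the cleaned name is not a single
     * designation (for example "I40/I75") and should be reviewed by hand.
     */
    static String normalize(String osmName) {
        // Treat every character other than 'I' or a digit as a delimiter and drop it.
        String cleaned = osmName.toUpperCase(Locale.ROOT).replaceAll("[^I0-9]", "");
        return CANONICAL.matcher(cleaned).matches() ? cleaned : null;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"I-10", "I 10", "I10", "I40/I75"}) {
            System.out.println(name + " -> " + normalize(name));
        }
    }
}
```

Names that reduce to a single designation are rewritten automatically, while double names such as I40/I75 are left for the manual pass described above.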
C. Creating “SuperLink” Datasets
The second step was to organize the OSM data in a way that would make it easier to compare
to the CTA data. Because the networks themselves are very large, we chose to compare larger
units than individual links. To do this, we created what we refer to as “SuperLinks,”
which are large segments of an individual interstate. A SuperLink starts at a link whose starting
node meets one of a few guidelines: the node is not attached to another interstate link, the node
is attached to an interstate with a different name, the node lies at a merge between two links,
or the node is where an interstate splits into two or more links (each of those links is the
start of a SuperLink). A SuperLink ends at the end of an interstate,
where two or more links merge together, or where two or more links split apart. With these
guidelines, several SuperLinks make up a single interstate, and the SuperLinks
more closely resemble the links in the current CTA datasets.
Again using Java within Eclipse, we wrote a program that generates SuperLinks for
each existing interstate. The program loops through each interstate and
stores all of its links. For each link we stored the wayID (link identification),
partID (links can have multiple parts), aNode (start node), bNode (end node), distance (in
miles), and name. Next, the program finds every start point on the
interstate, as defined by the guidelines in the previous paragraph. The program creates a
SuperLink for each of these start points and inserts it into the
SuperLink table in the OSM database. Then, for each SuperLink, the program follows the
starting link in the database and finds each succeeding link in order, updating the
endpoint and distance as it goes. Once the program finds the end of one SuperLink, it proceeds
to the next one. When the program finishes one interstate, it moves to the next until every
interstate link in the OSM network is contained in a SuperLink, making the OSM data easily
comparable to the CTA network.
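The start and end guidelines can be read as a rule on node degree: within one interstate name, a node with exactly one incoming and one outgoing link is interior to a SuperLink, and any other node begins or ends one. The sketch below follows that reading. It is an illustration under stated assumptions, not our production code: the class names are hypothetical, the links are held in memory rather than read from the database, partID is ignored, and a closed loop with no merge or split would need a start link chosen separately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SuperLinkBuilder {
    /** One directed OSM link of a single interstate (fields mirror the stored attributes). */
    static final class Link {
        final long wayId, aNode, bNode;
        final double miles;
        Link(long wayId, long aNode, long bNode, double miles) {
            this.wayId = wayId; this.aNode = aNode; this.bNode = bNode; this.miles = miles;
        }
    }

    /** A chain of consecutive links with its start node, end node, and total length. */
    static final class SuperLink {
        final long startNode, endNode;
        final double miles;
        final List<Long> wayIds;
        SuperLink(long startNode, long endNode, double miles, List<Long> wayIds) {
            this.startNode = startNode; this.endNode = endNode;
            this.miles = miles; this.wayIds = wayIds;
        }
    }

    /** Builds SuperLinks from the links of ONE interstate name. */
    static List<SuperLink> build(List<Link> links) {
        Map<Long, List<Link>> outgoing = new HashMap<>();
        Map<Long, Integer> incoming = new HashMap<>();
        for (Link l : links) {
            outgoing.computeIfAbsent(l.aNode, k -> new ArrayList<>()).add(l);
            incoming.merge(l.bNode, 1, Integer::sum);
        }
        List<SuperLink> result = new ArrayList<>();
        for (Link first : links) {
            if (isPassThrough(first.aNode, outgoing, incoming)) {
                continue; // interior link: it is absorbed while walking another SuperLink
            }
            List<Long> wayIds = new ArrayList<>();
            double miles = 0.0;
            Link current = first;
            while (true) {
                wayIds.add(current.wayId);
                miles += current.miles;
                if (!isPassThrough(current.bNode, outgoing, incoming)) {
                    break; // end of the interstate, a merge, or a split
                }
                current = outgoing.get(current.bNode).get(0); // the single succeeding link
            }
            result.add(new SuperLink(first.aNode, current.bNode, miles, wayIds));
        }
        return result;
    }

    /** True when exactly one link enters the node and exactly one link leaves it. */
    static boolean isPassThrough(long node, Map<Long, List<Link>> outgoing,
                                 Map<Long, Integer> incoming) {
        return incoming.getOrDefault(node, 0) == 1
                && outgoing.getOrDefault(node, Collections.<Link>emptyList()).size() == 1;
    }
}
```

Because only links of one interstate name are passed in, a node that connects to a differently named interstate has no same-name predecessor and is treated as a start point, which matches the second guideline.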
D. Generating SuperLink Geometry Unions
The biggest problem that we faced at this point was that none of the SuperLinks had a
geometry of its own, meaning they could not be mapped. The SuperLinks were
just collections of links and not geometric entities themselves. Each link in the OSM network
has an attribute labeled “coordinates,” which is a Line String4 that represents the
geometry of the link. In the CTA network, this geometry is stored in the “geometryItem”
attribute column. To give each SuperLink its own geometry, we decided to use the SQL
server’s built-in “STUnion” function, which combines the Line Strings of geometric objects.
However, this function only works on two objects, or in our case, links, at a time.
To account for this, we wrote another Java program to handle each SuperLink and combine
its links in the correct order. The program selects each SuperLink and stores each individual
link’s information: wayID, partID, aNode, and bNode. Then, the program finds the first link
within the SuperLink and the link that follows it. Next, the program executes the STUnion function
within the OSM database for the two links, and their “coordinates” are combined. The program
repeats this with each following link until no links remain in the SuperLink, and then does the
same for every other SuperLink. This gave us a geometric representation of each SuperLink as
an individual entity that we could map on Google Earth and compare more easily to
existing CTA links.
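Because the union works on two objects at a time, combining a whole SuperLink amounts to folding that two-argument operation across the links in travel order. The sketch below shows only this accumulation pattern. It does not call the database: joinLineStrings is a hypothetical in-memory stand-in for the union, and the class name and the representation of a Line String as a list of coordinate pairs are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class PairwiseUnion {
    /** Folds a two-argument union across links that are already in travel order. */
    static <G> G unionAll(List<G> orderedLinks, BinaryOperator<G> union) {
        if (orderedLinks.isEmpty()) {
            throw new IllegalArgumentException("a SuperLink needs at least one link");
        }
        G combined = orderedLinks.get(0);
        for (G next : orderedLinks.subList(1, orderedLinks.size())) {
            combined = union.apply(combined, next); // running result joined with the next link
        }
        return combined;
    }

    /** Hypothetical stand-in for the database union: appends b to a, skipping a shared node. */
    static List<double[]> joinLineStrings(List<double[]> a, List<double[]> b) {
        List<double[]> joined = new ArrayList<>(a);
        double[] last = a.get(a.size() - 1);
        int from = Arrays.equals(last, b.get(0)) ? 1 : 0;
        joined.addAll(b.subList(from, b.size()));
        return joined;
    }
}
```

In the database version, the running result would be held in the SuperLink row and the union applied once per following link.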
E. Setting HM-164 Designations
With the SuperLinks in the right format to find the corresponding CTA links, our next
goal was to find the right places to begin comparing the links. Since our project handles
hazardous material (HM-164) designated routes, we decided that it would be best to find places in
the CTA network where HM-164 routes intersect non-HM-164 routes. At each of these places, we
took the non-HM-164 CTA route and found the corresponding OSM SuperLink. Once we had
the corresponding SuperLink, we set its HM-164 attribute to “false” in the database. After we had
designated all of the corresponding OSM routes as non-HM-164, we set the remaining OSM
SuperLinks as HM-164.
To handle each of these OSM routes, we used a Java program to loop through the
intersection points. To produce the intersection points in the CTA network, we ran a query in the
database to find every node where an HM-164 link meets a non-HM-164 link. After finding these
points, the program stored the latitude and longitude of each point. Next, we created a stored
procedure to find the OSM SuperLinks closest to each CTA node within a specific radius. For each
returned SuperLink, the program computed the distance from its aNode and its bNode
to the CTA node. If neither the aNode nor the bNode of the SuperLink fell within the
specified radius, the intersection lay in the interior of the SuperLink, so the SuperLink had to be
split before it could be assigned an HM-164 designation. The program split such a SuperLink into
two, starting the new SuperLink at the individual link closest to the CTA intersection point.
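The split test above can be sketched as follows. This is an illustration only: the class and method names are hypothetical, the haversine great-circle formula is an assumed choice of distance measure, and the radius is passed in as a parameter because no particular value is fixed here.

```java
class SplitCheck {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

    /** Great-circle distance in miles between two latitude/longitude points (haversine). */
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    /**
     * A SuperLink needs splitting when neither of its end nodes lies within the
     * radius of the CTA intersection point. Each point is {latitude, longitude}.
     */
    static boolean needsSplit(double[] cta, double[] aNode, double[] bNode, double radiusMiles) {
        boolean aNear = haversineMiles(cta[0], cta[1], aNode[0], aNode[1]) <= radiusMiles;
        boolean bNear = haversineMiles(cta[0], cta[1], bNode[0], bNode[1]) <= radiusMiles;
        return !aNear && !bNear;
    }
}
```

A SuperLink whose aNode or bNode already sits at the intersection passes the test unchanged, so only SuperLinks that run through the intersection are split.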
4 A Line String is a sequence of points representing a linear object.
Once each appropriate SuperLink was split, we could assign HM-164 designations. To
do this, we first ran a Java program that took the non-HM-164 CTA link at each intersection
and found its geospatial bearing. The bearing allowed us to pick out the corresponding
OSM SuperLinks and assign them attributes. Once we had the CTA link bearing, we
found the bearing of each of the closest OSM SuperLinks. Because the
SuperLinks and CTA links do not match up exactly, we allowed a tolerance of 15 degrees
for the difference in bearings. We gave every SuperLink that matched a
non-HM-164 designation (“false” in the database), matching the CTA links. Then,
every remaining SuperLink received an HM-164 designation (“true” in the database).
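The bearing comparison can be sketched as below. The class and method names are hypothetical, and using the standard initial great-circle bearing between two points of a link is an assumption, since the exact bearing formula is not spelled out here. The one detail worth showing is that the difference must wrap around north, so that bearings of 350 and 5 degrees are 15 degrees apart rather than 345.

```java
class BearingMatch {
    /** Initial great-circle bearing from point 1 to point 2, in degrees from 0 up to 360. */
    static double initialBearing(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dLambda = Math.toRadians(lon2 - lon1);
        double y = Math.sin(dLambda) * Math.cos(phi2);
        double x = Math.cos(phi1) * Math.sin(phi2)
                - Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLambda);
        return (Math.toDegrees(Math.atan2(y, x)) + 360.0) % 360.0;
    }

    /** Smallest angle between two bearings, from 0 to 180 degrees. */
    static double difference(double bearing1, double bearing2) {
        double d = Math.abs(bearing1 - bearing2) % 360.0;
        return d > 180.0 ? 360.0 - d : d;
    }

    /** True when the two bearings agree to within the given tolerance in degrees. */
    static boolean sameDirection(double bearing1, double bearing2, double toleranceDegrees) {
        return difference(bearing1, bearing2) <= toleranceDegrees;
    }
}
```

With the 15 degree tolerance, a call such as sameDirection(ctaBearing, superLinkBearing, 15.0) selects the SuperLinks to mark as non-HM-164. The opposite direction of travel differs by roughly 180 degrees, so it would need its own comparison against the reversed bearing.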
IV. CONCLUSION
While we took many steps in order to merge the two networks, the process led to an
accurate, updated version of the CTA network. We manipulated the OSM data to be able to
compare it to corresponding CTA data, starting with OSM nomenclature. Then, we created larger
datasets of links and nodes called “SuperLinks,” with which we created geometrical attributes.
After finding HM-164/non-HM-164 intersections in the CTA database, we split the SuperLinks
appropriately. At each split, we determined the SuperLinks that corresponded to the non-HM-164
CTA link by comparing bearings. Last, once we assigned the correct HM-164 route
designations, the merge was complete. Both directions of travel were accounted for, and we
achieved the same result that commercial conflation software would have provided. However,
even with the more accurate OSM geospatial data and preserved HM-164 designations, there is
room for further research. One possible path would be to reduce the running time of the steps
we took: while the steps worked, they could be redesigned to require less time. Another path for
future research would be to test the method on networks other than OSM and on attributes other
than HM-164.
V. REFERENCES
1. "Welcome Page." Center for Transportation Analysis. UT Battelle. Web.
<http://cta.ornl.gov/cta/>.
2. "About." OpenStreetMap. Web. <http://www.openstreetmap.org/about>.
3. "GIS Dictionary." Esri: Support. Web.
<http://support.esri.com/en/knowledgebase/GISDictionary/term/conflation>.