321 - A Linear Referencing System for Synchronizing Independent Data Sets (shortened speaker notes)
1. Central Transportation Planning Staff
Distinct Road Layer Correlation
Using Linear Referencing
July 2015
David Knudsen
Esri User Conference
2. The Situation in Formal Terms
Emergency dispatch
Public geocoding
HPMS
Planning
Police reports
Operator reports
Data entry pick lists
3. The Situation in Formal Terms (Cont’d.)
Emergency dispatch
Public geocoding
HPMS
Planning
Police reports
Operator reports
Centerlines
Ramp, service road names
Road names
Addresses
Data entry pick lists
4. The Situation in a Nutshell
• Two line layers
represent the same
real world objects
• Attributes from one
could enrich or correct
attributes in the other
• The layers will be
separately maintained
and updated
Asset management roads
vs.
Address geocoding roads
Street names,
address ranges
Different purposes,
agencies, providers
5. The Situation in a Nutshell
• Two line layers
represent the same
real world objects
• Attributes from one
could enrich or correct
attributes in the other
• The layers will be
separately maintained
and updated
Different
geographic accuracy,
network topology
Street names,
address ranges
Edits will break or
invalidate established
relationships
7. Simple Relation
Add the key value to each row in one data set
uniquely identifying the corresponding row in
the other data set
Optimal Conditions:
• Key values are reasonably stable
• Relationship is one-to-one
• Attribute values independent of shape
8. Simple Relation Illustration
Road Record
Road_ID 23391
Road_Class Arterial
Speed_Limit 30
Crash Record
Crash_ID 102234
Fatalities 2
Road_ID_FK 23391
13. Simple Relation Shortfall (Cont’d.)
Entire feature in one data set corresponds to
entire feature in other data set
[Figure: address features A and B; Road Inventory features 1 and 2]
14. Complex Relation
Section of feature in one data set corresponds
to section of feature in other data set
[Figure: address features A and B; Road Inventory features 1 and 2]
Linear referencing events can specify sections
of features
15. Complex Relation (Cont’d.)
Linear referencing events
usually store simple
attributes like surface
width, speed limit, etc. for
sections of features in
one layer
Event Record
Event
fields
Route ID
From measure
To measure
Attribute
fields
24 feet wide,
16. Complex Relation (Cont’d.)
Instead of simple
attributes, we can store
events for another layer
Now there is a
relationship between a
section of one feature and
a section of another
Event Record
Event
fields
Route ID
From measure
To measure
Attribute
fields
Other-layer event
Route ID
From measure
To measure
Other-layer event
17. Complex Relation (Cont’d.)
To ensure correct
relationship of attributes
with a lateral component
(left-hand, right-hand), we
can store two events—
one for each side.
Event Record
Event
fields
Route ID
From measure
To measure
Attribute
fields
Other-layer event for
left-hand attributes
Other-layer event for
right-hand attributes
18. Complex Relation Illustration
Road Event Record
Road_ID 23391
From_Meas
To_Meas
R_Addr_ID
R_Addr_From
R_Addr_To
Road Event Record
Road_ID 23391
From_Meas 0
To_Meas 0.16
R_Addr_ID
R_Addr_From
R_Addr_To
Address Record
Addr_ID 102234
24. Creating Relations Automatically
• Use spatial join
(“is identical to”)
• Creates only relations
between full lengths
• Adjust tolerance to suit
• Fix overly tolerant
matches when
reviewing adjacent,
unmatched features
25. Creating Relations Manually
• Use utility tool (add-in
created using VB.NET)
• Click to pick features
being related
• Click to define start
and end of events
being related on those
features
[Figure: map with clicks numbered 1 to 6]
27. Maintaining the Relations
• Identify stale relations by comparing edit
dates (when available)
• Identify missing, obsolete, and stale
relations by finding differences (added,
deleted, or changed features) between
subsequent versions of one data set
28. Statistics
Road features (each data set) ~half million
Features in one-to-one relations 58%
Features in part-to-part relations 4%
Features in single-centerline-pair relations 3%
Features with no corresponding feature 15%
Automatable (prior simple relation) ~66%
Staff time (approximate person-years) 2
29. Esri’s Roads and Highways (R&H)
• Relations transferred directly to new R&H
advanced linear referencing system
• Utility tool for manual matching recreated for
Roadway Characteristics Editor
• Upon acceptance of correlation, R&H could
support joint maintenance of formerly
separate data
30. Questions and Comments
David Knudsen, GIS Analyst
Central Transportation Planning Staff
Boston Region Metropolitan Planning Organization
dknudsen@ctps.org
+1 857 702 3669
Good morning.
I am going to illustrate a method for correlating two distinct road data layers.
This method was developed for a Massachusetts Department of Transportation (MassDOT) project to improve the geocoding hit rate on Massachusetts crash reports.
Many crash reports are entered from paper copy, using an interface with pick lists to validate road names, address ranges, and so on. The pick lists are generated from MassDOT’s Road Inventory, and its road names needed to be updated for the project so that the pick lists would match those found in standard address geocoding databases and in crash reports.
The best source for road names is a true address geocoding database maintained by the Massachusetts geographic information agency, MassGIS, using commercial data, the state telecom board address list, municipal tax parcels, and other sources.
The Central Transportation Planning Staff (CTPS) was engaged by MassDOT to update the Road Inventory road names from MassGIS’s database.
MassGIS, for its part, wanted to add ramp and service road names and more precise geometry from the Road Inventory to its database. Neither the agencies nor the infrastructure was yet ready to merge the two databases and jointly maintain a centralized one.
Instead of just updating the road names in a one-off way, CTPS wanted to support future updates and an eventual merge, by building some kind of durable relationship that would allow attributes to be exchanged back and forth as each data set was updated independently.
Here, I boil down the situation that CTPS faced at the start of the project in 2012.
The first point was a challenge because the relationship between the pieces of the road data sets was not simple.
The last point underscores that any bridge we constructed between the data sets had to be easy to fix without reconstructing it from scratch.
This image shows the block where I live in each data set, and illustrates some of the difficulties of relating them.
When the data sets are overlaid, differences in coordinate precision are obvious.
Also, the address database (in orange) has an intersection at the lower right that is not present in the Road Inventory.
The Road Inventory, for its part, faithfully renders a traffic island at the end of my street in the upper left, which the address database simplifies to a T intersection.
The simplest way to relate two data sets is to add a column to one that stores IDs from the other.
This method didn’t meet our requirements. However, it is quite easy to implement, and was used in a previous project to create a simple relationship between our two road databases.
This relationship served as a starting point for our work.
Here’s a case where a simple relation could work reasonably well.
The Road Inventory has a unique ID column called Road_ID and some number of other attributes.
If I have a crash point data set I want to enrich with Road Inventory attributes, I can add a column to it to store the ID of each point’s associated road.
The column can be filled manually, or often by an automated method, such as geographic overlay.
The relation is fairly resilient.
If a crash location is moved because of revised information about its distance from the intersection, the coded Road_ID is still valid, and the relation is still good.
And, even if the road layer is edited to improve the geographic accuracy of the road feature, the relation is still valid because the crash occurred on that road, regardless of its precise representation in GIS.
In fact, the coded relation supports continued association of Road Inventory attributes with the crash point even though a geographic overlay would no longer work.
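The simple relation described here amounts to a foreign-key lookup. A minimal sketch in Python, using the field names and values from the illustration slide; the dictionary layout and the function name `enrich_crash` are assumptions made for illustration and are not part of the project:

```python
# Road Inventory rows keyed by their unique Road_ID (values from the slide).
roads = {
    23391: {"Road_Class": "Arterial", "Speed_Limit": 30},
}

# Crash rows carry the related road's ID as a foreign key.
crashes = [
    {"Crash_ID": 102234, "Fatalities": 2, "Road_ID_FK": 23391},
]


def enrich_crash(crash, roads):
    """Return a copy of the crash row with the related road's attributes added."""
    road = roads.get(crash["Road_ID_FK"], {})  # empty if the key is unknown
    return {**crash, **road}


enriched = [enrich_crash(c, roads) for c in crashes]
```

The lookup depends only on the stored key, not on geometry, which is why moving the crash point or reshaping the road leaves the relation intact.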
But a simple relation isn’t adequate for features in two different road databases.
I’ve added a column for the Road Inventory ID to the address database, but this address feature relates to several Road Inventory features.
And, when transferring address ranges, we don’t want to transfer the address feature’s full address range to each Road Inventory feature.
We could switch the source and destination tables, adding a column to the Road Inventory to reference address features’ unique IDs instead.
Now each of those three Road Inventory features can be related to that one address feature, but they’ll each still pick up its full address range.
And there are Road Inventory features that need to be related to more than one address feature, as in the circled area.
Between our two data sets, an entire feature in one frequently does not correspond to an entire feature in the other.
We needed a way to relate sections of features to each other.
For example, the first half of Road Inventory feature 1 to the second half of address feature A.
Linear referencing is often used to specify attributes for sections of road, and Massachusetts’ Road Inventory was already using it, even before its move to Esri’s Roads and Highways extension.
The Road Inventory’s attributes—such as surface width, posted speed limit, and so forth—were stored in tables together with a trio of fields that defined the sections of the roads they applied to.
What if, instead of adding a single column to this event table to store the other data set’s ID values, we added three columns to store events on the other data set’s features?
In fact, while we’re at it, why not add a second triplet of columns to store a second event reference?
Because of differences in the representation of divided roads, there are places where a Road Inventory feature’s right- and left-hand addresses come from two different address features.
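One way to picture the resulting record is as three event references: one on the Road Inventory feature and one each for the right- and left-hand address attributes. The sketch below is only an illustration; the class names `Event` and `ComplexRelation` are hypothetical, and allowing `None` for a missing address event is an assumption rather than the project's documented encoding. The sample values come from the illustration that follows in these notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """A section of one feature: route ID plus from- and to-measures."""
    route_id: int
    from_meas: float
    to_meas: float


@dataclass
class ComplexRelation:
    """A Road Inventory section related to sections of address features."""
    road: Event
    right_addr: Optional[Event]  # assumption: None if nothing corresponds
    left_addr: Optional[Event]


# Road Inventory feature 23391 (0 to 0.16 miles) related to part of address
# feature 102234. The address measures run high to low because the two
# features are drawn in opposite directions.
relation = ComplexRelation(
    road=Event(23391, 0.0, 0.16),
    right_addr=Event(102234, 0.17, 0.02),
    left_addr=Event(102234, 0.17, 0.02),
)
```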
From our previous illustration we have a Road Inventory feature that is 0.16 miles long.
Its entire length corresponds to part of the nearby address feature.
We construct the event reference to the Road Inventory feature first.
The nearby address feature is 0.17 miles long, and is drawn in the opposite direction.
The part of the feature that corresponds to the Road Inventory feature starts from measure 0.02 and runs to measure 0.17.
The address feature’s left range is 1 to 13.
Our relation identifies a section along the length of the address feature; we can use that section to derive a correspondingly adjusted range from the feature’s full address range.
You may have noticed that the address event from- and to-measure values are flipped, with the higher number first.
This reflects the opposing directions of the two features. The address feature’s first left-hand address must be transposed to the Road Inventory feature’s last right-hand address, and so on.
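The derivation of an adjusted range can be sketched as follows. The notes do not say how the adjustment was computed, so straight linear interpolation along the feature is an assumption here, as are the function names. Snapping the results to valid house numbers and preserving odd/even parity are deliberately left out.

```python
def interpolate_address(addr_from, addr_to, feature_length, measure):
    """Linearly interpolate a house number at a measure along an address feature."""
    fraction = measure / feature_length
    return addr_from + (addr_to - addr_from) * fraction


def adjusted_range(addr_from, addr_to, feature_length, event_from, event_to):
    """House numbers at the event's from- and to-measures, in event order.

    When event_from > event_to (features drawn in opposite directions), the
    result runs from the high end of the range to the low end, following the
    direction of the other layer's feature.
    """
    return (
        interpolate_address(addr_from, addr_to, feature_length, event_from),
        interpolate_address(addr_from, addr_to, feature_length, event_to),
    )


# Address feature: 0.17 miles long, left range 1 to 13.
# Event on it: from measure 0.17 to measure 0.02 (flipped).
start, end = adjusted_range(1, 13, 0.17, 0.17, 0.02)
```

Because the event measures are flipped, `start` is the high end of the range and `end` is the low end. These values would then be assigned to the right-hand side of the Road Inventory feature, as described above.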
This shows in full the basic case where an entire feature can be related to an entire feature from the other data set.
The Road Inventory event is in the first three columns.
The address event for right-hand attributes is in the next three columns; and the event for left-hand attributes is in the last three.
As is usually the case, the right- and left-hand events are on the same feature.
Here, multiple adjoining Road Inventory features are related to sections of one address feature.
Road Inventory feature 1 is associated with address feature A from its measure 0 to its measure 0.1.
Road Inventory feature 2 gets an event associating its full length with the same address feature A, but from its measure 0.1 to its measure 0.2.
Other associations found between features in two different road data sets are listed here.
Complex relations can account for all of them; the cases are documented in an appendix to this presentation.
We created as many relations automatically as possible, ignoring partial feature relations and finding only the more common whole-feature to whole-feature associations.
The spatial join tool can be configured to join features when their geometry is identical, within a given tolerance.
With higher tolerances, some inappropriate relations are created between features. In such cases, however, adjacent features are not matched, and the area as a whole is corrected during review of unmatched features.
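To show what "identical within a tolerance" means for whole features, here is a simplified stand-in written in plain Python. It is not the Spatial Join tool's algorithm: it only checks that every vertex of each line lies within the tolerance of the other line, and all names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_line_distance(p, line):
    """Shortest distance from point p to a polyline (list of 2+ vertices)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def matches_within(line_a, line_b, tolerance):
    """True if every vertex of each line is within tolerance of the other."""
    return (
        all(point_line_distance(p, line_b) <= tolerance for p in line_a)
        and all(point_line_distance(p, line_a) <= tolerance for p in line_b)
    )


road = [(0.0, 0.0), (100.0, 0.0)]
address = [(100.0, 2.0), (50.0, 1.0), (0.0, 1.5)]  # drawn in reverse
half_road = [(0.0, 0.0), (50.0, 0.0)]
```

With these sample lines, `matches_within(road, address, 3.0)` is true and `matches_within(road, half_road, 3.0)` is false, which mirrors the point that automatic matching finds only whole-feature relations.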
Creating a complex relation requires constructing three event references, which is laborious with ArcMap’s built-in tools.
So, we built a custom tool to specify a typical complex relation in six clicks, taking advantage of the fact that the left- and right-hand event references were usually the same.
Fixing errors is critical so that emergency dispatchers can rely on the accuracy of transferred address ranges.
Batch methods found errors, which we reviewed in ArcMap. Here, magenta highlights an address feature section referenced by multiple Road Inventory features.
Yellow highlights an address feature section that has not been referenced at all, which could result in some house addresses not being transferred.
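A batch check of this kind can be sketched as an interval comparison on each address feature: sections referenced more than once (the magenta case) and sections never referenced (the yellow case). The notes do not describe the actual batch methods, so the function below is an assumed approach with hypothetical names.

```python
def find_overlaps_and_gaps(feature_length, events, eps=1e-9):
    """Find doubly referenced and unreferenced sections of one address feature.

    events: (from_meas, to_meas) pairs on the feature, in either order.
    Returns (overlaps, gaps) as lists of (start, end) measure pairs.
    """
    # Normalize flipped events so each interval runs low to high, then sort.
    intervals = sorted((min(f, t), max(f, t)) for f, t in events)
    overlaps, gaps = [], []
    covered_to = 0.0
    for start, end in intervals:
        if start > covered_to + eps:
            gaps.append((covered_to, start))  # nothing references this part
        elif start < covered_to - eps:
            overlaps.append((start, min(end, covered_to)))  # referenced twice
        covered_to = max(covered_to, end)
    if covered_to < feature_length - eps:
        gaps.append((covered_to, feature_length))
    return overlaps, gaps


# Three events on a 0.3-mile address feature; the second one is flipped.
overlaps, gaps = find_overlaps_and_gaps(0.3, [(0.0, 0.1), (0.2, 0.08), (0.2, 0.25)])
```

Here `overlaps` reports the section from 0.08 to 0.1 and `gaps` reports the section from 0.25 to 0.3.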
As we related feature sections, we automatically created arrows between their midpoints.
If the arrows would be too short to see easily, we created circles around the midpoints instead.
The longest arrow here stands out clearly, drawing attention to an incorrect relation.
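The same visual cue can be approximated in a batch check by measuring each arrow and flagging the unusually long ones. This is a sketch only: the threshold of three times the median length is an arbitrary assumption, and the function names and sample coordinates are hypothetical.

```python
import math
from statistics import median


def arrow_length(road_midpoint, addr_midpoint):
    """Straight-line length of the arrow between two section midpoints."""
    return math.hypot(
        addr_midpoint[0] - road_midpoint[0],
        addr_midpoint[1] - road_midpoint[1],
    )


def flag_long_arrows(midpoints, factor=3.0):
    """Return relation IDs whose arrow is longer than factor times the median."""
    lengths = {rel_id: arrow_length(a, b) for rel_id, (a, b) in midpoints.items()}
    threshold = factor * median(lengths.values())
    return sorted(rel_id for rel_id, length in lengths.items() if length > threshold)


# Relation ID -> (Road Inventory section midpoint, address section midpoint).
midpoints = {
    1: ((0.0, 0.0), (0.0, 2.0)),
    2: ((50.0, 0.0), (53.0, 0.0)),
    3: ((90.0, 10.0), (90.0, 12.5)),
    4: ((120.0, 0.0), (120.0, 40.0)),
}
flagged = flag_long_arrows(midpoints)
```

In this sample only relation 4 is flagged, since its arrow is far longer than the others.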
Whenever the data sets are updated, relations must be efficiently updated to support continued transfer of attributes between the data sets.
Relations are revisited when their date stamps are older than the edit date stamps of their associated features.
The address database lacks an edit date field, so we use a comparison tool to identify features that have changed.
We create new relations for new features in either data set.
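Both maintenance checks can be sketched briefly: comparing date stamps where an edit date exists, and comparing two versions of a data set where it does not. The record layout, the function names, and the idea of comparing a per-feature "signature" (some value that captures geometry and attributes) are assumptions for illustration, not a description of the comparison tool we used.

```python
from datetime import date


def stale_relations(relations, feature_edit_dates):
    """Return IDs of relations older than the last edit of any related feature."""
    stale = []
    for rel in relations:
        for feature_id in rel["feature_ids"]:
            edited = feature_edit_dates.get(feature_id)  # None if no edit date
            if edited is not None and edited > rel["stamp"]:
                stale.append(rel["id"])
                break
    return stale


def diff_versions(old, new):
    """Compare two versions of a data set given as {feature_id: signature}."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, deleted, changed


relations = [
    {"id": 1, "feature_ids": [23391], "stamp": date(2014, 1, 10)},
    {"id": 2, "feature_ids": [23392], "stamp": date(2014, 6, 1)},
]
edit_dates = {23391: date(2014, 3, 1), 23392: date(2014, 2, 1)}
stale = stale_relations(relations, edit_dates)

old_version = {"A": "sig1", "B": "sig2", "C": "sig3"}
new_version = {"A": "sig1", "B": "sig2-edited", "D": "sig4"}
added, deleted, changed = diff_versions(old_version, new_version)
```

Added features need new relations, deleted features leave obsolete relations to remove, and changed features mark their relations as stale.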
This was a sizeable effort, and automation benefits were limited.
However, considering the completeness and accuracy of the relations we built, the staff time was quite low.
The staff-time figure is a rough estimate, since the workers were doing other tasks while creating the relations, and I have tried to adjust their time accordingly.
Because the relations were built using the same event system used for the other Road Inventory attributes, it was easy to migrate them to Roads and Highways, which MassDOT adopted near the end of the project.
Roads and Highways moves event editing to the browser, so we had to redo our tool for creating complex relations.
The relations must be maintained, at least until the address data agency reviews and accepts them.
They can then be used to transfer address attributes to the Road Inventory, and the address agency could take over maintenance responsibility.
Roads and Highways makes it easy to expose applicable parts of the Road Inventory to other groups so they can perform editing and maintenance on their own data.
This slide shows what would change in a one-to-one relation if the two features are drawn in opposing directions.
As I described in the previous illustration, the from- and to-measure values must be swapped so that when direction-sensitive attributes are pulled across the relation they can be transformed appropriately.
Here, sections of one Road Inventory feature are associated with multiple address features.
And, as I hinted earlier, a big stumbling block in relating two road data sets may be in their representation of divided roads.
A data set may use a separate GIS line for each carriageway or barrel, or only one.
A data set may not be consistent about it. Even two data sets that use the same standard can differ if a road reconstruction project that adds or removes a median has been captured in one but has gone unnoticed in the other.
The complex relation scheme must be able to encode the association between a single line in one data set and dual lines in the other…
…and vice versa.
Finally, the complex relation scheme should handle even cases where there is no associated feature in the other data set. This is simply a workflow consideration. If references to such features are not recorded with a complex relation, they cannot easily be distinguished from features that have been newly added to either data set and whose association with the other data set has not yet been determined. In other words, they could end up being needlessly reviewed repeatedly as the complex relation is updated for edits in the two data sets.