Why Geolocating Written Routes is Harder than it Looks
Ian Turton
Department of Geography
Penn State University, University Park, PA
ijt1@psu.edu
Acknowledgements
Research for this paper was funded by the National Geospatial-Intelligence Agency (NGA) through the NGA University Research Initiative (NURI) program. The views, opinions, and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the National Geospatial-Intelligence Agency or the U.S. Government.
Summary
  • Route mapping
  • Why?
  • How?
  • Problems
  • Fixes?
Aims
The GeoCAM project aims to take written route descriptions and convert them to maps.
Extracting and Mapping Geographic Entities
A simple natural language processing task. All you need to do is:
  • Find the named entities
  • Determine if they are geographic features
  • Join them up to form a route
  • Then draw a map with them on
BUT
Multiple Places
  • Berlin: any reasonable system chooses Germany (40 worldwide, 28 in US)
  • Springfield occurs in many states and many times per state
Photo: Maitri (Flickr), CC-NC-SA
Strange Place Names
  • Look out for North East (both MD and PA).
  • Or North (VA, SC)
  • Or South (AL, KY)
  • Not to be confused with NE Irving St or Nebraska.
But…
If you thought towns were hard, wait until you try streets.
213,142 Towns, 13,543,533 Streets in the USA
Springfield: 66, Berlin: 28
Common Nouns as Street Names
These are hard to distinguish from ordinary nouns capitalized at the start of a sentence.
Ambiguous Road Names
  • Turn from Independence into Washington.
  • Washington fought for independence.
  • Plus there are 246 Washington Twp
Interesting Road Names
  • Colorado has many “interesting” road names.
  • Consider also Street Rd, which occurs in 9 different states.
Really Interesting Names!
Photo: Slworking2 (Flickr), CC-NC-SA
Roads with Multiple Names
  • Many highways and interstates have two or more names.
  • I-99/US 220/Bud Shuster Hwy
  • Streets are no better
Photograph by Joe Mabel, licensed under GFDL
The Main Street issue
Photo: Awesome Joolie (Flickr), CC-SA-NC
Numbered Streets
  • Is this 39th St?
  • Or Thirty Ninth St?
  • Or 39 St?
  • All appear to be equally acceptable when writing directions.
Photo: Bitchcakesny (Flickr), CC-NC-SA
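One way to treat these three spellings as the same street is to reduce each to its number before any lookup. The following is a minimal sketch, not the GeoCAM code: the function name `street_number` and the word tables are assumptions made for illustration, and the tables only cover 1 to 99.

```python
import re

# Hypothetical word tables, written by hand for this sketch (1-99 only).
ORDINALS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
    "eleventh": 11, "twelfth": 12, "thirteenth": 13, "fourteenth": 14,
    "fifteenth": 15, "sixteenth": 16, "seventeenth": 17,
    "eighteenth": 18, "nineteenth": 19, "twentieth": 20,
    "thirtieth": 30, "fortieth": 40, "fiftieth": 50, "sixtieth": 60,
    "seventieth": 70, "eightieth": 80, "ninetieth": 90,
}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def street_number(name):
    """Return the leading street number in `name`, or None if there is none."""
    tokens = [t for t in re.split(r"[\s-]+", name.lower()) if t]
    if not tokens:
        return None
    # Digits with an optional ordinal suffix: "39", "39th", "1st".
    m = re.fullmatch(r"(\d+)(?:st|nd|rd|th)?", tokens[0])
    if m:
        return int(m.group(1))
    first = tokens[0]
    second = tokens[1] if len(tokens) > 1 else None
    # Two-word ordinals such as "thirty ninth".
    if first in TENS and ORDINALS.get(second, 10) < 10:
        return TENS[first] + ORDINALS[second]
    # Single-word ordinals such as "first"; None for anything else.
    return ORDINALS.get(first)
```

With this, "39th St", "Thirty Ninth St" and "39 St" all reduce to 39, while a name such as "Main St" yields None and has to be handled some other way.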
Directions as seen in the wild
Directions to the State College Farmers’ Market
FROM EAST or WEST: Exit Rt 322 at College Ave. (also named RT 26; Benner Pike). Turn west onto College Ave and proceed 2 mi. to Locust Lane (2nd street to left after campus intersection of College Ave. & Shortlidge Rd(Garner St).
FROM NORTH or SOUTH (within town): Follow Atherton Street (business Rt. 322) to Beaver Ave. light. Turn east on Beaver Ave. and go thru 4 lights (6 blocks) to Locust lane.
Downtown State College, PA
  • N Atherton St
  • S Atherton St
  • E College Av
  • W College Av
  • PA26
Issues
  • College Av (Rt. 26 or Benner Pike)
    • In the database as West (or East) College Av
    • Alternate name is PA 26
    • Benner Pike is actually PA 150.
  • Atherton Street (business Rt. 322)
    • In the database as North (or South) Atherton St
    • Mostly referred to as Atherton by locals
Directions to The Callan Theatre
From west (Adams Morgan, Georgetown)
At the intersection of 16th Street and Irving Street, two blocks east of the heart of Adams Morgan, go east on Irving (right if you are coming from downtown). Cross Georgia Avenue, pass the Washington Hospital Center, go under North Capitol Street. Irving dead-ends on Michigan Avenue; turn left onto Michigan. The next light, which comes up quickly, is Harewood Road; turn left onto Harewood. The theater is half a mile up Harewood on the right.
Catholic University Neighbourhood
  • Irving St NW
  • Irving St NE
  • Michigan Av NW
  • Michigan Av NE
Issues
  • All references are missing the vital NE/NW
  • Need to distinguish between Michigan (Avenue) and the State of Michigan.
How to handle Street names?
  • We need to be able to look up shortened (and possibly partly wrong) street names in our database.
  • Could use SQL LIKE query
    • Slow, difficult
  • Programmatic adjustment of names encountered (add N/S/E/W etc.)
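The two options can be compared on a toy table. The sketch below uses an in-memory SQLite database. The table name `streets`, its contents, and the helper names are all assumptions for illustration. A `LIKE` pattern with a leading wildcard generally cannot use an ordinary index, which is one reason it tends to be slow on millions of rows. Generating directional variants and matching them exactly avoids the wildcard, at the cost of guessing the variants up front.

```python
import sqlite3

# Toy database; table name and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streets (name TEXT)")
conn.executemany(
    "INSERT INTO streets VALUES (?)",
    [("N Atherton St",), ("S Atherton St",), ("Irving St NW",),
     ("Irving St NE",), ("W College Ave",)],
)


def like_lookup(conn, fragment):
    """Substring match; the leading % typically forces a full table scan."""
    rows = conn.execute(
        "SELECT name FROM streets WHERE name LIKE ? ORDER BY name",
        (f"%{fragment}%",),
    )
    return [r[0] for r in rows]


DIRECTIONS = ("N", "S", "E", "W", "NE", "NW", "SE", "SW")


def directional_variants(name):
    """Add each direction as a prefix and as a suffix to the written name."""
    variants = [name]
    for d in DIRECTIONS:
        variants.append(f"{d} {name}")
        variants.append(f"{name} {d}")
    return variants


def exact_lookup(conn, name):
    """Exact match against the generated variants (no wildcard needed)."""
    variants = directional_variants(name)
    placeholders = ",".join("?" * len(variants))
    rows = conn.execute(
        f"SELECT name FROM streets WHERE name IN ({placeholders}) ORDER BY name",
        variants,
    )
    return [r[0] for r in rows]
```

Here `like_lookup(conn, "Atherton")` finds both Atherton rows, and `exact_lookup(conn, "Irving St")` finds both the NE and NW rows. Neither tells you which one the writer meant.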
Stemming
In linguistic morphology, stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.
Stemming Examples
So:
  • Dog and dogs (and doggedly) stem to dog.
  • Cat, cats and cattily stem to cat.
  • Fishing, fished, fish, and fisher stem to fish.
This allows a search engine to return pages about dogs when you search for dog.
So Stemming for Street Names…
  • Washington Pl, Washington Ave and Washington St all become Washington.
  • North Atherton St, South Atherton St and Atherton Av become Atherton.
  • 1St St, First Ave and 1 St N become 1.
  • Interstate 80, I-80 and I 80w become 80.
  • State Rd 26, PA 26 and county road 26 become 26.
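A street-name stemmer along these lines can be sketched in a few lines. This is an illustrative reconstruction that reproduces the examples above, not the project's actual stemmer. The token lists and the name `stem_street` are assumptions. The rule "if the name contains a number, the number is the stem" is a simplification chosen so that route names such as "PA 26" work without a full list of state abbreviations.

```python
import re

# Illustrative, incomplete token lists (assumptions for this sketch).
STREET_TYPES = {"st", "street", "ave", "av", "avenue", "pl", "place",
                "rd", "road", "dr", "drive", "ln", "lane", "ct", "court",
                "blvd", "boulevard", "hwy", "highway", "pike"}
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
              "north", "south", "east", "west"}
ORDINAL_WORDS = {"first": "1", "second": "2", "third": "3", "fourth": "4",
                 "fifth": "5", "sixth": "6", "seventh": "7", "eighth": "8",
                 "ninth": "9", "tenth": "10"}

# A number with an optional ordinal suffix ("1st") or direction letter ("80w").
NUMBER = re.compile(r"(\d+)(?:st|nd|rd|th|[nsew])?")


def stem_street(name):
    """Reduce a written street name to a short lookup key."""
    tokens = [t for t in re.split(r"[\s.,-]+", name.lower()) if t]
    # Numbered streets and routes: the number alone is the stem.
    for tok in tokens:
        m = NUMBER.fullmatch(tok)
        if m:
            return m.group(1)
        if tok in ORDINAL_WORDS:
            return ORDINAL_WORDS[tok]
    # Otherwise drop street types and directions, keep what is left.
    kept = [t for t in tokens if t not in STREET_TYPES and t not in DIRECTIONS]
    # Names made only of such words ("Street Rd") are kept whole.
    return " ".join(kept) if kept else " ".join(tokens)
```

The price of such aggressive stemming is more collisions: every Washington Pl, Ave and St in the country now shares one key, so the lookup returns candidates that still need to be disambiguated by location.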
Does this help?
  • This allows us to take (possibly wrong) and often shortened street names and look them up in the database.
  • (After we have worked through the 7.3 million named street segments in the USA.)
  • (Caching is obviously our friend here.)
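One plausible reading of "caching" here is to stem every database name once, store the result as an index from stem to full names, and memoize the stemming of incoming names. The sketch below assumes that reading. `simple_stem` is a deliberately tiny stand-in stemmer, and all names in the code are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

# Tiny stand-in list of droppable tokens (an assumption for this sketch).
DROP = {"n", "s", "e", "w", "north", "south", "east", "west",
        "st", "street", "ave", "av", "avenue", "pl", "rd", "road"}


@lru_cache(maxsize=None)
def simple_stem(name):
    """Stand-in stemmer; results are memoized so repeats cost nothing."""
    tokens = name.lower().split()
    kept = [t for t in tokens if t not in DROP]
    return " ".join(kept or tokens)


def build_stem_index(names):
    """One pass over the database names: stem -> list of full names."""
    index = defaultdict(list)
    for name in names:
        index[simple_stem(name)].append(name)
    return dict(index)


def lookup(index, written_name):
    """Find all database names sharing a stem with the written name."""
    return index.get(simple_stem(written_name), [])
```

For example, an index built from "N Atherton St", "S Atherton St" and "W College Ave" returns both Atherton entries for the written name "Atherton Street" with a single dictionary lookup.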
Route Detection Algorithm
  • Find numeric strings in text
    • Look up zip codes and telephone numbers
  • Select Noun Phrases (NP)
    • JTextPro
  • For each NP:
    • Is it a state?
    • Is it a town (populated place)?
    • Is it a point of interest (POI)?
    • Is it a street name (after stemming)?
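The first and last steps can be sketched without a parser. The code below is an assumption-laden illustration: the regular expressions only cover common US zip and phone formats, the gazetteer is a four-entry toy, and noun-phrase extraction (done with JTextPro in the slides) is taken as given. Phrases passed to `classify` are assumed to be already stemmed where the street check needs it.

```python
import re

# US zip codes (5 digits, optional +4) and 10-digit phone numbers.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}\b")

# Toy gazetteer; a real one would be backed by a database.
# The "street" set holds stemmed street names.
GAZETTEER = {
    "state": {"pennsylvania", "michigan"},
    "town": {"state college", "washington"},
    "poi": {"washington hospital center"},
    "street": {"washington", "michigan", "atherton"},
}


def find_numeric_clues(text):
    """Return zip codes and phone numbers found in the text."""
    return {"zips": ZIP_RE.findall(text), "phones": PHONE_RE.findall(text)}


def classify(noun_phrase):
    """Return every category the phrase could belong to, in check order."""
    key = noun_phrase.lower()
    return [kind for kind, names in GAZETTEER.items() if key in names]
```

Note that `classify("Michigan")` returns both "state" and "street": the checks find candidates, they do not resolve the ambiguity.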
Algorithm (continued)
  • Select any NP that is unambiguous (Zzyzx Rd, Autumn Crocus Ct, Abell City)
  • Define a minimum bounding box (or polygon) based on the unambiguous points.
  • Sort the ambiguous NPs by number of matches (so prefer Gibsonia (3) over Midway (214)), then see if only one match falls in the polygon; if so, select that one.
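The bounding-box step can be sketched as follows, using a rectangle rather than a general polygon. The function names, the `margin` parameter, and the coordinates in the usage note are made up for illustration. The sort only fixes the order in which names are tried, since this sketch does not feed resolved points back into the box.

```python
def bounding_box(points):
    """Minimum bounding box of (lon, lat) points as (west, south, east, north)."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


def in_box(point, box, margin=0.0):
    """True if the point lies inside the box grown by `margin` degrees."""
    west, south, east, north = box
    lon, lat = point
    return (west - margin <= lon <= east + margin
            and south - margin <= lat <= north + margin)


def disambiguate(unambiguous, ambiguous, margin=0.1):
    """Pick a location for each ambiguous name with exactly one match in the box.

    unambiguous: list of (lon, lat) for names with a single match.
    ambiguous:   dict of name -> list of candidate (lon, lat) matches.
    """
    box = bounding_box(unambiguous)
    resolved = {}
    # Try names with the fewest candidate matches first.
    for name, candidates in sorted(ambiguous.items(), key=lambda kv: len(kv[1])):
        inside = [c for c in candidates if in_box(c, box, margin)]
        if len(inside) == 1:
            resolved[name] = inside[0]
    return resolved
```

With anchors at (-78.0, 40.5) and (-77.5, 41.0), a name with candidates at (-77.7, 40.8) and (-82.0, 28.0) resolves to the first, while a name with two candidates inside the box stays unresolved.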
Route Formation
  • Once you have a beginning and an end for the route, attempt to determine the road segments that form the route.
    • Note: not all road segments are named, nor do they all join correctly
  • Where possible, determine turns from one road to another and use this to truncate highlighted sections of road
  • Pass the details to the mapping server and display
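If the road segments are treated as edges between intersections, finding the segments that join a start and an end becomes a graph search. The sketch below uses a plain breadth-first search over a toy network. The segment format `(name, node_a, node_b)`, the node labels, and the function names are assumptions, and unnamed segments are represented by `None`.

```python
from collections import defaultdict, deque


def find_route(segments, start, end):
    """Breadth-first search; returns segment names along a fewest-hops path.

    segments: iterable of (name, node_a, node_b); name may be None.
    Returns None when the network does not connect start and end.
    """
    graph = defaultdict(list)
    for name, a, b in segments:
        graph[a].append((b, name))
        graph[b].append((a, name))
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt, name in graph[node]:
            if nxt not in came_from:
                came_from[nxt] = (node, name)
                queue.append(nxt)
    if end not in came_from:
        return None
    # Walk back from the end to the start, collecting segment names.
    path = []
    node = end
    while came_from[node] is not None:
        node, name = came_from[node]
        path.append(name)
    path.reverse()
    return path


def turns(path):
    """Pairs of consecutive segment names where the road name changes."""
    return [(a, b) for a, b in zip(path, path[1:]) if a != b]
```

A `None` result is the "segments do not join correctly" case from the slide: the search cannot cross a gap in the data, so such gaps have to be repaired or bridged before routing.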
Further Work
  • Improve detection of streets, POI and places (linguistics).
    • Take the X exit (X is probably a place)
    • Turn left at X (X is probably a POI)
    • Turn left on X (X is probably a street)
  • Improve route determination
    • Fix up streets database: naming and joins
  • Imprecise routing
    • Given a set of POI, which roads pass nearby?
    • Probably a graph problem?
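The three linguistic cues can be prototyped with regular expressions before any real parsing is attempted. This is a rough sketch of the idea only: the patterns, the extra "right" and "onto" alternatives, and the name `guess_types` are assumptions, and the patterns stop at the first period, so an abbreviation such as "St." would cut the name short.

```python
import re

# (pattern, guessed type) pairs for the cues listed above.
CUES = [
    (re.compile(r"\btake the (?P<x>.+?) exit\b", re.I), "place"),
    (re.compile(r"\bturn (?:left|right) at (?P<x>[^.,;]+)", re.I), "poi"),
    (re.compile(r"\bturn (?:left|right) (?:on|onto) (?P<x>[^.,;]+)", re.I), "street"),
]


def guess_types(sentence):
    """Return (phrase, guessed type) pairs for each cue found in the sentence."""
    guesses = []
    for pattern, kind in CUES:
        for m in pattern.finditer(sentence):
            guesses.append((m.group("x").strip(), kind))
    return guesses
```

Applied to "turn left onto Harewood." this yields a single guess, Harewood as a street, which could then be used to restrict which gazetteer is consulted.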
Foreign Languages
Photos: Secret Pilgrim (Flickr), CC-NC-SA; Gullevek (Flickr), CC-NC-SA
