This document discusses the challenges of geolocating written route descriptions by extracting geographic entities and mapping them. It notes that determining locations from text is difficult due to issues like ambiguous place names, uncommon street names, and lack of address standardization. The document outlines an algorithm to detect places, streets and points of interest in text, geocode them, and connect locations to form a route. It acknowledges further work is needed to improve location detection and route determination, especially for foreign languages.
Why Geolocating Written Routes Is Harder Than It Looks
1. Why Geolocating Written Routes Is Harder Than It Looks: Ian Turton, Department of Geography, Penn State University, University Park, PA, ijt1@psu.edu
2. Acknowledgements: Research for this paper was funded by the National Geospatial-Intelligence Agency/NGA through the NGA University Research Initiative Program/NURI program. The views, opinions, and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the National Geospatial-Intelligence Agency or the U.S. Government.
4. Aims: The GeoCAM project aims to take written route descriptions and convert them to maps.
5. Extracting and Mapping Geographic Entities: A simple natural language processing task. All you need to do is find the named entities, determine if they are geographic features, join them up to form a route, then draw a map with them on.
7. Multiple Places: Berlin: any reasonable system chooses Germany (40 worldwide, 28 in the US). Springfield occurs in many states and many times per state. Photo credit: Maitri (flickr), CC-NC-SA.
8. Strange Place Names: Look out for North East (both MD and PA), or North (VA, SC), or South (AL, KY). Not to be confused with NE Irving St or Nebraska.
15. Roads with Multiple Names: Many highways and interstates have two or more names, for example I-99/US 220/Bud Shuster Hwy. Streets are no better. Photograph by Joe Mabel, licensed under GFDL.
19. Directions to the State College Farmers’ Market: FROM EAST or WEST: Exit Rt 322 at College Ave. (also named RT 26; Benner Pike). Turn west onto College Ave and proceed 2 mi. to Locust Lane (2nd street to left after campus intersection of College Ave. & Shortlidge Rd (Garner St)). FROM NORTH or SOUTH (within town): Follow Atherton Street (business Rt. 322) to Beaver Ave. light. Turn east on Beaver Ave. and go thru 4 lights (6 blocks) to Locust lane.
21. Issues: College Av (Rt. 26 or Benner Pike): in the database as West (or East) College Av; alternate name is PA 26; Benner Pike is actually PA 150. Atherton Street (business Rt. 322): in the database as North (or South) Atherton St; mostly referred to as Atherton by locals.
22. Directions to The Callan Theatre: From west (Adams Morgan, Georgetown): At the intersection of 16th Street and Irving Street, two blocks east of the heart of Adams Morgan, go east on Irving (right if you are coming from downtown). Cross Georgia Avenue, pass the Washington Hospital Center, go under North Capitol Street. Irving dead-ends on Michigan Avenue; turn left onto Michigan. The next light, which comes up quickly, is Harewood Road; turn left onto Harewood. The theater is half a mile up Harewood on the right.
24. Issues: All references are missing the vital NE/NW. Need to distinguish between Michigan (Avenue) and the State of Michigan.
25. How to handle Street names? We need to be able to look up shortened (and possibly partly wrong) street names in our database. Could use a SQL LIKE query: slow, difficult. Or programmatic adjustment of the names encountered (add N/S/E/W etc.).
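The "programmatic adjustment" option can be sketched as follows. This is only an illustration, not the GeoCAM code: `candidate_names`, `lookup` and the tiny `known` set are hypothetical stand-ins for whatever street table the project actually queried.

```python
DIRECTIONS = ("North", "South", "East", "West")

def candidate_names(name):
    """Return the name as written plus variants with a directional prefix."""
    name = " ".join(name.split())  # collapse stray whitespace
    first = name.split(" ", 1)[0].capitalize()
    if first in DIRECTIONS:
        # The writer already supplied a direction; do not add another.
        return [name]
    return [name] + [f"{d} {name}" for d in DIRECTIONS]

def lookup(name, known_streets):
    """Return every candidate variant present in the (hypothetical) street table."""
    return [c for c in candidate_names(name) if c in known_streets]

# Toy stand-in for the street database.
known = {"North Atherton St", "South Atherton St",
         "West College Ave", "East College Ave"}
matches = lookup("Atherton St", known)
print(matches)
```

Each variant is an exact-match lookup, which avoids the wildcard scan that makes LIKE slow, but it still fails when the street type is abbreviated differently from the database entry.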
26. Stemming: In linguistic morphology, stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.
27. Stemming Examples: So dog and dogs (and doggedly) stem to dog. Cat, cats and cattily stem to cat. Fishing, fished, fish, and fisher stem to fish. This allows a search engine to return pages about dogs when you search for dog.
28. So Stemming for Street Names… Washington Pl, Washington Ave and Washington St all become Washington. North Atherton St, South Atherton St and Atherton Av become Atherton. 1St St, First Ave and 1 St N become 1. Interstate 80, I-80 and I 80w become 80. State Rd 26, PA 26 and county road 26 become 26.
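A minimal sketch of a street-name stemmer that reproduces the examples on this slide. The word lists and the `stem_street` function are assumptions made for illustration (the slides do not show the actual rules), the lists are far from complete, and stems are returned in lower case.

```python
import re

# Illustrative, incomplete word lists (assumptions, not the project's tables).
STREET_TYPES = {"st", "street", "ave", "av", "avenue", "pl", "place", "rd", "road",
                "ln", "lane", "ct", "court", "dr", "drive", "blvd", "pike",
                "hwy", "highway"}
DIRECTIONS = {"n", "s", "e", "w", "north", "south", "east", "west",
              "ne", "nw", "se", "sw"}
ROUTE_WORDS = {"i", "interstate", "us", "pa", "state", "county", "rt", "route",
               "business"}
ORDINAL_WORDS = {"first": "1", "second": "2", "third": "3", "fourth": "4",
                 "fifth": "5"}
DROP = STREET_TYPES | DIRECTIONS | ROUTE_WORDS

# A number with an optional ordinal suffix or direction letter: 1st, 80w, 26.
NUMBERED = re.compile(r"^(\d+)(?:st|nd|rd|th|[nsew])?$")

def stem_street(name):
    """Reduce a street name to a lower-case stem usable as a lookup key."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())  # also splits "I-80"
    kept = []
    for tok in tokens:
        match = NUMBERED.match(tok)
        if match:
            kept.append(match.group(1))        # "1st" -> "1", "80w" -> "80"
        elif tok in ORDINAL_WORDS:
            kept.append(ORDINAL_WORDS[tok])    # "first" -> "1"
        elif tok not in DROP:
            kept.append(tok)
    # If everything was dropped (e.g. "North St"), fall back to all tokens.
    return " ".join(kept) if kept else " ".join(tokens)

for example in ["Washington Pl", "North Atherton St", "1St St", "I 80w", "PA 26"]:
    print(example, "->", stem_street(example))
```

Stemming this aggressively trades precision for recall: "Washington" now matches every Washington street in the country, so the result still has to be narrowed geographically.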
29. Does this help? This allows us to take (possibly wrong and often shortened) street names and look them up in the database (after we have worked through the 7.3 million named street segments in the USA). Caching is obviously our friend here.
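One way to read "caching" here is to stem every segment name once and keep an index from stem to full names. The sketch below assumes that reading; `crude_stem` is a deliberately simplified stand-in for a real stemmer, and `build_stem_index` and the sample segment list are hypothetical.

```python
from collections import defaultdict

# Deliberately tiny drop list; a real stemmer would handle far more cases.
DROP = {"north", "south", "east", "west", "st", "ave", "av"}

def crude_stem(name):
    """Stand-in stemmer: lower-case the name and drop directions and street types."""
    return " ".join(t for t in name.lower().split() if t not in DROP)

def build_stem_index(segment_names):
    """Stem each name once and map stem -> set of full database names."""
    index = defaultdict(set)
    for name in segment_names:
        index[crude_stem(name)].add(name)
    return index

segments = ["North Atherton St", "South Atherton St",
            "West College Ave", "East College Ave"]
index = build_stem_index(segments)

# A shortened, slightly wrong name from a route description.
hits = sorted(index.get(crude_stem("Atherton Av"), set()))
print(hits)
```

The one-off pass over the segments is the expensive part; afterwards each lookup is a single dictionary access.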
31. Route Detection Algorithm: Find numeric strings in text (look up zip codes and telephone numbers). Select noun phrases (NP) using JTextPro. For each NP: Is it a state? Is it a town (populated place)? Is it a point of interest (POI)? Is it a street name (after stemming)?
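The first steps above might look like the sketch below. It assumes the noun phrases have already been extracted (the slides use JTextPro for that; no NP chunker is included here), and the regular expressions, the `gazetteer` layout, `classify` and `toy_stem` are all illustrative assumptions rather than the project's code.

```python
import re

# US-style patterns only; both are simplifications.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}")

def find_numeric_clues(text):
    """Pull out strings that look like zip codes or telephone numbers."""
    return {"zips": ZIP_RE.findall(text), "phones": PHONE_RE.findall(text)}

def classify(noun_phrase, gazetteer, stem):
    """Ask each question in turn; a phrase may match more than one kind."""
    key = noun_phrase.lower()
    kinds = [k for k in ("state", "town", "poi") if key in gazetteer[k]]
    if stem(noun_phrase) in gazetteer["street_stems"]:
        kinds.append("street")
    return kinds

def toy_stem(name):
    """Stand-in stemmer: keep only the first word, lower-cased."""
    return name.lower().split()[0]

# Hypothetical gazetteer with a handful of entries taken from the slides.
gazetteer = {
    "state": {"michigan", "pennsylvania"},
    "town": {"state college"},
    "poi": {"washington hospital center"},
    "street_stems": {"michigan", "harewood", "atherton"},
}

clues = find_numeric_clues("Call (814) 555-1234 or write to State College, PA 16801")
print(clues)
print(classify("Michigan", gazetteer, toy_stem))
print(classify("Harewood Road", gazetteer, toy_stem))
```

Returning every matching kind, rather than the first, keeps the Michigan (Avenue) versus State of Michigan ambiguity visible for a later step to resolve.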
32. Algorithm (continued): Select any NP that is unambiguous (Zzyzx Rd, Autumn Crocus Ct, Abell City). Define a minimum bounding box (or polygon) based on the unambiguous points. Sort the ambiguous NPs by number of matches (so prefer Gibsonia (3) over Midway (214)), then see whether only one match falls in the polygon; if so, select that one.
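A rough sketch of this disambiguation step, using a rectangular box rather than a polygon. The `margin` padding is an assumption added so that a single unambiguous point still yields a usable box, and every coordinate below is an invented placeholder, not the real location of any place named on the slide.

```python
def bounding_box(points, margin=0.0):
    """Return (south, west, north, east) around (lat, lon) points, padded by margin."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats) - margin, min(lons) - margin,
            max(lats) + margin, max(lons) + margin)

def in_box(point, box):
    lat, lon = point
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east

def resolve(candidates, margin=0.5):
    """candidates maps each name to its list of possible (lat, lon) matches."""
    # Names with exactly one match are taken as unambiguous anchors.
    resolved = {n: pts[0] for n, pts in candidates.items() if len(pts) == 1}
    if not resolved:
        return resolved
    box = bounding_box(list(resolved.values()), margin)
    # Try the least ambiguous names first.
    ambiguous = sorted((n for n, pts in candidates.items() if len(pts) > 1),
                       key=lambda n: len(candidates[n]))
    for name in ambiguous:
        inside = [p for p in candidates[name] if in_box(p, box)]
        if len(inside) == 1:  # accept only when exactly one match is in the box
            resolved[name] = inside[0]
    return resolved

# Placeholder coordinates for illustration only.
candidates = {
    "Abell City": [(40.0, -80.0)],
    "Zzyzx Rd": [(41.0, -79.0)],
    "Gibsonia": [(40.6, -79.9), (27.0, -82.0), (35.0, -90.0)],
    "Midway": [(10.0, 10.0), (50.0, -100.0)],
}
result = resolve(candidates)
print(sorted(result))
```

The box is computed once from the unambiguous points and is not grown as further names are resolved; the slide does not say which behaviour was intended.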
33. Route Formation: Once you have a beginning and an end for the route, attempt to determine the road segments that form the route. Note: not all road segments are named, nor do they all join correctly. Where possible, determine turns from one road to another and use these to truncate the highlighted sections of road. Pass the details to the mapping server and display.
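The slides do not say how the segments are chosen. One simple possibility, offered purely as an illustration, is a breadth-first search over a graph whose nodes are intersections and whose edges are road segments. The `find_route` function and the toy segment list (loosely based on the Callan Theatre directions) are assumptions.

```python
from collections import deque

def find_route(segments, start, end):
    """Breadth-first search over (node_a, node_b, street_name) segments.

    Returns the street names along a path with the fewest segments,
    or None if the network does not connect start to end.
    """
    graph = {}
    for a, b, name in segments:
        graph.setdefault(a, []).append((b, name))
        graph.setdefault(b, []).append((a, name))

    came_from = {start: None}  # node -> (previous node, street used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt, name in graph.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, name)
                queue.append(nxt)

    if end not in came_from:
        return None  # segments that "do not join correctly" end up here
    path = []
    node = end
    while came_from[node] is not None:
        node, name = came_from[node]
        path.append(name)
    path.reverse()
    return path

# Toy network; None marks an unnamed segment.
segments = [
    ("16th & Irving", "Irving & Michigan", "Irving St"),
    ("Irving & Michigan", "Michigan & Harewood", "Michigan Ave"),
    ("Michigan & Harewood", "Theatre", "Harewood Rd"),
    ("16th & Irving", "Alley end", None),
]
route = find_route(segments, "16th & Irving", "Theatre")
print(route)
```

A point where the street name changes between consecutive segments corresponds to a turn, which is where the highlighted section of each road could be truncated.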
34. Further Work: Improve detection of streets, POI and places (linguistics): Take the X exit (X is probably a place); Turn left at X (X is probably a POI); Turn left on X (X is probably a street). Improve route determination: fix up the streets database (naming and joins). Imprecise routing: given a set of POI, which roads pass nearby? Probably a graph problem?
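The three linguistic cues listed above could be prototyped with regular expressions, as in this sketch. The patterns, the `guess_types` function and the sample sentence are illustrative assumptions; real directions would need many more phrasings.

```python
import re

# One pattern per cue from the slide; X is captured in the group named "x".
PATTERNS = [
    (re.compile(r"\btake the (?P<x>.+?) exit\b", re.I), "place"),
    (re.compile(r"\bturn (?:left|right) at (?P<x>[^.,;]+)", re.I), "poi"),
    (re.compile(r"\bturn (?:left|right) (?:on|onto) (?P<x>[^.,;]+)", re.I), "street"),
]

def guess_types(text):
    """Return (phrase, probable kind) pairs found by the cue patterns."""
    guesses = []
    for pattern, kind in PATTERNS:
        for match in pattern.finditer(text):
            guesses.append((match.group("x").strip(), kind))
    return guesses

text = ("Take the State College exit. Turn left at the Washington Hospital "
        "Center, then turn left on Harewood Road.")
for phrase, kind in guess_types(text):
    print(phrase, "->", kind)
```

These guesses are only priors: "probably a place" still has to be confirmed against the gazetteer before it is used.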