Presented by David Smiley, Software Systems Engineer, Lead, MITRE
OpenSextant is an unstructured-text geotagger. A core component of OpenSextant is a general-purpose text tagger that scans a text document for matching multi-word based substrings from a large dictionary. Harnessing the power of Lucene’s state-of-the-art finite state transducer (FST) technology, the text tagger was able to save over 40x the amount of memory estimated for a leading in-memory alternative. Lucene’s FSTs are elusive due to their technical complexity but overcoming the learning curve can pay off handsomely.