Unstructured Text in Big Data: the Elephant in the Room
David Milward (Linguamatics, UK)
A traditional approach has been to extract structured data from the unstructured text. When it comes to big data, manual or semi-automated methods just don't scale. Even with fully automated methods, it is infeasible to extract all the facts and relationships buried in the text to produce a 'knowledge base of everything' that answers every possible end-user question. In recent years, there has therefore been a trend towards using text mining over unstructured data to directly answer questions, effectively creating novel databases on the fly. However, this has widened the gap between information professionals and end users.
In this talk I will explain how multiple approaches are being adopted to exploit the skills of information professionals to improve decision support for end users. These include specialised real-time querying, alerting based on semantic queries, semantic enrichment of documents, and population of semantic stores or linked data.