Leveraging enterprise information is no easy task, especially when unstructured information represents more than 80% of enterprise content. Meaningfully structuring content is critical for companies, and Natural Language Processing and Semantic Enrichment are becoming increasingly important for improving the quality of information retrieval tasks.
With the Semantic Web moving towards full realisation thanks to the Linked Data initiative, and with major search engines taking an interest in structured data, the enterprise search world is finding it more attractive to make its information machine-readable and to exploit that information to improve search over its content.
In this scenario, three trends are transforming the face of search:
Entity-oriented search. Searching not by keyword, but by entities that represent specific concepts in a certain domain.
Knowledge graphs. Leveraging relationships amongst entities, drawn from Linked Data datasets (Freebase, DBpedia, …) or from companies' custom knowledge bases.
Search assistance. Autocomplete and spellchecking are now common features, but semantic data makes it possible to offer smarter features that guide users to what they want in a natural way.
The resources needed to build such features are not always easy to obtain. To generate them, our approach applies a number of unstructured data processing mechanisms whose goal is to extract semantic information automatically:
Extract content from heterogeneous data sources
Extract domain information and enrich the content through different NLP processes like Named Entity Recognition, Coreference Resolution, Entity Linking and Disambiguation, and Topic Annotation
Create specialised indexes to store the semantic information extracted
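The three steps above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the gazetteer, the entity identifiers and the sample documents are hypothetical, and a plain dictionary lookup stands in for the real NLP processes (Named Entity Recognition, Entity Linking and so on), which the talk does not detail.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: surface form -> (entity id, entity type).
# A real system would use NER and entity linking instead of this lookup.
GAZETTEER = {
    "london": ("ent:London", "City"),
    "bbc": ("ent:BBC", "Organisation"),
    "alan turing": ("ent:Alan_Turing", "Person"),
}


def annotate(text):
    """Return the sorted entity ids whose surface form occurs in the text."""
    lowered = text.lower()
    found = set()
    for surface, (entity_id, _entity_type) in GAZETTEER.items():
        # Match whole words only, so "bbc" does not match inside another word.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(entity_id)
    return sorted(found)


def build_entity_index(docs):
    """Build a specialised index: entity id -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for entity_id in annotate(text):
            index[entity_id].add(doc_id)
    return index


# Hypothetical documents, already extracted from their data sources.
DOCS = {
    "d1": "Alan Turing worked in London.",
    "d2": "The BBC is based in London.",
    "d3": "A report about turing machines.",
}
ENTITY_INDEX = build_entity_index(DOCS)
```

Looking up `ENTITY_INDEX["ent:London"]` returns the documents about the entity itself, whereas a keyword search for "turing" would also return `d3`, which mentions the word but not the person.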
There are already a number of well-developed uses of semantically extracted information, such as faceting and concept indexing. However, further methods of exploiting this information are emerging in the industry:
The goal of entity-driven autocomplete is to complete the user's phrase automatically with entity names and properties, helping them find the desired documents by exploring the domain Knowledge Graph. As the user types the phrase, the system proposes a set of named entities and/or a set of entity types. When the user accepts a suggestion, the system dynamically adapts subsequent suggestions to the chosen context.
The accuracy delivered by entity-driven search increases user satisfaction. Users see documents that are about a specific semantic concept, with concrete properties, and not about a keyword that can be interpreted ambiguously.
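The context-sensitive suggestions described above could look roughly like the following sketch. The graph contents and the functions `suggest` and `suggest_types` are hypothetical, and ranking related entities first is just one simple way of adapting to the chosen context, not necessarily the approach taken in the talk.

```python
# Hypothetical knowledge graph: entity name -> (entity type, related entity names).
GRAPH = {
    "London": ("City", {"BBC", "Alan Turing"}),
    "Lisbon": ("City", set()),
    "BBC": ("Organisation", {"London"}),
    "Alan Turing": ("Person", {"London"}),
    "Bletchley Park": ("Organisation", {"Alan Turing"}),
}


def suggest(prefix, accepted=()):
    """Suggest entity names for a prefix, ranking entities related to
    already accepted suggestions ahead of unrelated ones."""
    prefix = prefix.lower()
    candidates = [
        name for name in GRAPH
        if name.lower().startswith(prefix) and name not in accepted
    ]

    def connected(name):
        # True if the candidate is linked to any accepted entity, in either direction.
        return any(
            name in GRAPH[chosen][1] or chosen in GRAPH[name][1]
            for chosen in accepted
        )

    # Connected candidates sort first (False < True), then alphabetically.
    return sorted(candidates, key=lambda name: (not connected(name), name))


def suggest_types(prefix):
    """Suggest entity types whose name starts with the prefix."""
    prefix = prefix.lower()
    types = {entity_type for entity_type, _related in GRAPH.values()}
    return sorted(t for t in types if t.lower().startswith(prefix))
```

With no context, `suggest("b")` lists the matches alphabetically. Once the user has accepted "Alan Turing", `suggest("b", accepted=("Alan Turing",))` moves "Bletchley Park" ahead of "BBC", because only the former is linked to the accepted entity in this toy graph.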
Semantic More Like This
This feature finds documents similar to a given input document, based on the underlying knowledge in the documents rather than on their tokens.
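As a rough sketch of the idea, documents can be compared by the entities they mention instead of by their tokens. The example below assumes each document has already been annotated with a set of entity ids; the data, the function names and the choice of Jaccard similarity are illustrative assumptions, not the scoring used in the talk.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical output of semantic enrichment: document id -> entity ids.
DOC_ENTITIES = {
    "d1": {"ent:Alan_Turing", "ent:London"},
    "d2": {"ent:BBC", "ent:London"},
    "d3": {"ent:Alan_Turing", "ent:Bletchley_Park", "ent:London"},
    "d4": {"ent:Lisbon"},
}


def more_like_this(doc_id, top_n=3):
    """Return up to top_n other documents sharing entities with doc_id,
    most similar first; documents with no shared entity are dropped."""
    source = DOC_ENTITIES[doc_id]
    scored = [
        (jaccard(source, entities), other)
        for other, entities in DOC_ENTITIES.items()
        if other != doc_id
    ]
    scored = [(score, other) for score, other in scored if score > 0]
    # Highest score first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _score, other in scored[:top_n]]
```

Here `more_like_this("d1")` ranks `d3` ahead of `d2`, since `d3` shares two entities with `d1` and `d2` shares only one, regardless of how many words the texts have in common.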