Chemical databases grow larger every year. Without investment in additional hardware or improved software, the time needed to search them grows with them. As the number of pharmaceutical patents keeps increasing, the volume of associated chemical data is growing faster than hardware advances alone can keep up with.
Using automated mining of U.S. and European patents, we have extracted large collections of structural data in the form of reactions, mixtures, and exemplified compounds. Additional information, such as protein targets and diseases, is also extracted from each patent and associated with the structural data. We will describe how this data can be queried with natural language phrases and how these phrases are interpreted as structural queries.
Through innovations in substructure and similarity search algorithms, it is possible to search hundreds of millions of chemical records and retrieve the matches in a fraction of a second. We will demonstrate how this is achieved on a regular desktop machine using just-in-time and ahead-of-time compilation techniques.
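
To make the idea of a similarity search concrete, here is a minimal sketch in Python. It is not the method presented in the talk, which the abstract does not detail. It assumes a common textbook approach: each compound is represented by a binary fingerprint (here a plain integer used as a bit set), compounds are compared with the Tanimoto coefficient, and a simple bit-count bound skips candidates that cannot reach the threshold. The function names and the toy fingerprints are hypothetical.

```python
from typing import Dict, List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: |a AND b| / |a OR b| (0.0 if both are empty)."""
    union = popcount(a | b)
    if union == 0:
        return 0.0
    return popcount(a & b) / union


def similarity_search(
    query: int, database: Dict[str, int], threshold: float
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs with score >= threshold, best first."""
    q_bits = popcount(query)
    hits = []
    for name, fp in database.items():
        n_bits = popcount(fp)
        larger = max(q_bits, n_bits)
        # Upper bound: Tanimoto can never exceed min(bits) / max(bits),
        # so candidates below the threshold are skipped without a full
        # comparison.
        if larger == 0 or min(q_bits, n_bits) / larger < threshold:
            continue
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((name, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy 8-bit fingerprints (hypothetical data, for illustration only).
toy_db = {
    "cmpd_a": 0b11110000,
    "cmpd_b": 0b11100000,
    "cmpd_c": 0b00001111,
    "cmpd_d": 0b11111111,
}
results = similarity_search(0b11110000, toy_db, threshold=0.7)
```

In this toy run, `cmpd_d` is rejected by the bit-count bound alone, while `cmpd_c` passes the bound but fails the full comparison. A production system would use much longer fingerprints and compiled, hardware-level bit counting rather than Python integers.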