All US patents from 1976 onwards are freely available in computer-readable formats providing a large corpus for chemical text mining. Other patent offices are increasingly also offering their back-catalogues as XML, allowing chemical text mining to be performed in the same way as for recent US patents.
We investigate how much chemistry is found in non-US patents (compared to US patents) and, where the chemistry is present in publications from multiple patent offices, how long were the delays between these publications.
We show that non-US patents can be text mined for a large number of chemical reactions and analyse the overlap with reactions from US patents. Finally we use all the extracted chemical reactions to explore whether models for predicting reaction yield may be built from features such as the reaction type and its reaction conditions (as text mined from the patent text).