When searching across full-text patent data, the numerical data given in a document is often at least as important as the keywords present in determining the document’s relevance. Chemists or metallurgists may be interested in compositions or alloys citing specific amounts or ranges of a certain metal or compound, and engineers may be interested in specific dimensions, current measurements or temperature ranges. Unfortunately, searching comprehensively for physical quantities using conventional text search is practically impossible, because a single quantity criterion can be matched by many lexically distinct expressions. The fact that measurements of the same type may use different units complicates matters further.
Minesoft have been working on the problem of supporting searches that include physical quantity criteria, and here we report on our success in using text mining to automate the identification and interpretation of quantities in patents in our new PatDocs tool.
All indexed terms and user queries are converted into ranges in standardized units; for example, “>5 to ≤10 miles per hour” is interpreted as the range (2.2352, 4.4704] m/s. Rather than forcing the user to learn a search-engine-specific syntax, queries are written in the same formats that appear in actual documents. The search then finds all indexed ranges, expressed in the same standardized units, that intersect the user’s query range.
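The normalize-then-intersect idea can be illustrated with a minimal sketch. It assumes a tiny hypothetical unit table (`TO_SI`) and understands only two input formats, “>a to ≤b unit” and a single value such as “3 m/s”; it is not PatDocs’ actual parser. Each range keeps track of whether its bounds are open or closed, so that a document stating exactly “5 mph” does not match the query “>5 to ≤10 miles per hour”.

```python
import re
from dataclasses import dataclass

# Hypothetical conversion factors to SI (m/s); a real system needs a full unit ontology.
TO_SI = {
    "miles per hour": 0.44704,  # exact by definition: 1 mph = 0.44704 m/s
    "mph": 0.44704,
    "km/h": 1000 / 3600,
    "m/s": 1.0,
}

@dataclass(frozen=True)
class Range:
    lo: float
    hi: float
    lo_closed: bool
    hi_closed: bool

    def intersects(self, other: "Range") -> bool:
        # The overlap starts at the larger lower bound; on a tie it is closed only if both are.
        if self.lo != other.lo:
            lo, lo_closed = max((self.lo, self.lo_closed), (other.lo, other.lo_closed))
        else:
            lo, lo_closed = self.lo, self.lo_closed and other.lo_closed
        # The overlap ends at the smaller upper bound, with the same tie rule.
        if self.hi != other.hi:
            hi, hi_closed = min((self.hi, self.hi_closed), (other.hi, other.hi_closed))
        else:
            hi, hi_closed = self.hi, self.hi_closed and other.hi_closed
        return lo < hi or (lo == hi and lo_closed and hi_closed)

# Hypothetical grammar: ">5 to ≤10 miles per hour" style ranges only.
RANGE_RE = re.compile(r"^([>≥])\s*([\d.]+)\s+to\s+([<≤])\s*([\d.]+)\s+(.+)$")

def parse_range(text: str) -> Range:
    m = RANGE_RE.match(text.strip())
    if m:
        lo_op, lo, hi_op, hi, unit = m.groups()
        f = TO_SI[unit.strip()]  # KeyError for units outside the toy table
        return Range(float(lo) * f, float(hi) * f, lo_op == "≥", hi_op == "≤")
    # Fallback: a single value such as "3 m/s" becomes the point range [v, v].
    value, unit = text.strip().split(maxsplit=1)
    v = float(value) * TO_SI[unit]
    return Range(v, v, True, True)
```

With this sketch, `parse_range(">5 to ≤10 miles per hour")` yields an open-closed range of roughly (2.2352, 4.4704] m/s. It intersects an indexed “9 km/h” (2.5 m/s) or “≥4 to <6 m/s”, but not “5 mph”, which sits exactly on the open lower bound. Because every range is reduced to the same units, the comparison never needs to know which unit a document originally used.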
It is also important to know what a quantity refers to. For specific cases, such as alloy compositions, this is captured during text mining: “2 wt% Fe”, for example, refers specifically to a weight percentage of iron. For the general case, we now allow searching for quantities in close proximity to arbitrary phrases, or even to other quantities. The tool also assists the user by showing the context in which their query matched, and allows quantity queries to be combined with metadata queries.
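The post does not say how proximity is measured, so the following is only a minimal sketch of one common approach: a word-distance window. The function name `quantities_near`, the `window` parameter and the deliberately crude `QUANTITY` pattern are all hypothetical; a production tagger would recognize far more quantity forms and units.

```python
import re

# Hypothetical "number + unit" pattern covering only a handful of units.
QUANTITY = re.compile(r"\d+(?:\.\d+)?\s*(?:wt%|°C|mm|m/s|mph)")

def quantities_near(text: str, phrase: str, window: int = 5) -> list[str]:
    """Return quantity mentions starting within `window` words of any occurrence of `phrase`."""

    def word_index(offset: int) -> int:
        # Number of whitespace-separated words that precede this character offset.
        return len(text[:offset].split())

    # Word positions of every case-insensitive occurrence of the phrase.
    phrase_positions = [
        word_index(m.start())
        for m in re.finditer(re.escape(phrase), text, re.IGNORECASE)
    ]
    hits = []
    for q in QUANTITY.finditer(text):
        qi = word_index(q.start())
        if any(abs(qi - p) <= window for p in phrase_positions):
            hits.append(q.group())
    return hits
```

For the sentence “The coating is annealed at 450 °C for 2 hours, while the substrate thickness is 3 mm.”, searching near “annealed” with a window of 3 words returns only “450 °C”, while searching near “thickness” returns only “3 mm”. In a full system, the matched quantities would also be normalized into ranges before being compared with the query.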