This document discusses running the InChI chemical structure identifier program with WebAssembly. It describes how the 165,000 lines of C code that make up InChI were compiled to WebAssembly using LLVM, which allows InChI to run in any modern web browser or serverless environment. Benchmark results show the WebAssembly build running within about 2x of native speed. It generates identifiers for over 100,000 chemical structures from SureChEMBL in under 30 seconds, with the same outputs as the native version. The project provides a template for deploying other legacy scientific programs written in C, C++, or Fortran to browsers and serverless environments using WebAssembly.
2. The InChI Value Proposition
• Unique Key: Does this database have this molecule?
• Foreign Key: What other databases have this molecule?
• Permissionless (cf. CAS)
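The two questions on this slide amount to using the InChIKey as a primary key inside one database and as a join key across databases. Because anyone can compute the key from a structure, no central registry is needed to do the join. The TypeScript sketch below is illustrative only. The two databases, their names, and their record IDs are hypothetical. The InChIKeys are the standard keys for benzene, ethanol, and water.

```typescript
// Records from two hypothetical databases. Only the InChIKeys are real
// (benzene, ethanol, water); the database names and IDs are invented.
interface CompoundRecord {
  id: string;       // database-local identifier
  inchiKey: string; // standard InChIKey of the structure
}

const databaseA: CompoundRecord[] = [
  { id: "A-001", inchiKey: "UHOVQNZJYSORNB-UHFFFAOYSA-N" }, // benzene
  { id: "A-002", inchiKey: "LFQSCWFLJHTTHZ-UHFFFAOYSA-N" }, // ethanol
];
const databaseB: CompoundRecord[] = [
  { id: "B-917", inchiKey: "LFQSCWFLJHTTHZ-UHFFFAOYSA-N" }, // ethanol
  { id: "B-042", inchiKey: "XLYOFNOQVPJJNP-UHFFFAOYSA-N" }, // water
];

// Unique key: index one database by InChIKey for constant-time membership checks.
function indexByKey(records: CompoundRecord[]): Map<string, CompoundRecord> {
  return new Map(records.map((r): [string, CompoundRecord] => [r.inchiKey, r]));
}

// Foreign key: name every database whose index contains the given InChIKey.
function databasesContaining(
  inchiKey: string,
  indexes: Record<string, Map<string, CompoundRecord>>,
): string[] {
  return Object.entries(indexes)
    .filter(([, index]) => index.has(inchiKey))
    .map(([name]) => name);
}

const indexes = { A: indexByKey(databaseA), B: indexByKey(databaseB) };
console.log(indexes.A.has("UHOVQNZJYSORNB-UHFFFAOYSA-N")); // true
console.log(databasesContaining("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", indexes)); // [ 'A', 'B' ]
```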
6. The Most Important App
https://www.wsj.com/articles/your-browser-is-the-most-important-app-you-havemake-sure-you-use-the-right-one-1537707600
7. JavaScript
• Browser debut in 1995
• Universal browser support
• Pretty fast
8. InChI to JavaScript
Two paths from the InChI C source:
• Rewrite 160,000 lines of C to JS.
• Transpile 160,000 lines of C to JS.
https://github.com/metamolecular/inchi-js
9. JavaScript as Assembly Language
• Ad hoc solution
• Limited tooling
• Limited scope
• Gobs of glue code
• Not built for speed
[Diagram: C source compiled to JavaScript]
10. WebAssembly (Wasm)
• Binary instruction format
• Fast, portable compile target
• Sandboxed for security
• Runs in all major browsers, 2017
• W3C Standard, 2019
• Runs InChI?
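To make "binary instruction format" and "sandboxed" concrete, the sketch below hand-assembles about the smallest useful module, one exported `add` function. It then runs the module through the standard JavaScript `WebAssembly` API, which works the same way in browsers and in Node.js. This is a toy module for illustration, not InChI. A real build would come from compiling C source with LLVM rather than writing bytes by hand.

```typescript
// A complete WebAssembly module, assembled byte by byte, that exports
// one function: add(i32, i32) -> i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // Type section (id 1): one signature, (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section (id 3): one function using type 0
  0x03, 0x02, 0x01, 0x00,
  // Export section (id 7): export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section (id 10): local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

// Compile and instantiate synchronously. The import object is empty, so the
// module has no way to reach anything outside its own sandbox.
const wasmModule = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(wasmModule, {});
const add = instance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3)); // 5
```

The module can only affect the outside world through what it exports and what it is explicitly given as imports. This is the sandboxing property the slide refers to.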
11. Goals
• Minimal tooling
• No auto-generated glue code
• Use verbatim InChI source
• In other words, a build system
[Diagram: InChI source compiled to a Wasm binary]
18. Data Schlepping in JS
Inputs: molfile and options
• Byte wranglers: translate input
• Call InChI wrapper
• Byte wranglers: write output
• Return result
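The byte wranglers exist because a Wasm function can only exchange numbers with JavaScript. Strings have to be copied into and out of the module's linear memory. The sketch below covers both directions. It assumes, hypothetically, that the wrapper passes strings as NUL-terminated UTF-8, the usual C convention. A plain `Uint8Array` stands in for the module's memory so the helpers can be tried in isolation.

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Translate input: copy a JS string into memory at `ptr`, followed by a NUL byte.
// Returns the number of bytes written, including the terminator.
function writeCString(memory: Uint8Array, ptr: number, text: string): number {
  const bytes = encoder.encode(text);
  if (ptr + bytes.length + 1 > memory.length) {
    throw new RangeError("string does not fit in linear memory");
  }
  memory.set(bytes, ptr);
  memory[ptr + bytes.length] = 0;
  return bytes.length + 1;
}

// Read output: decode bytes starting at `ptr` up to (not including) the first NUL.
function readCString(memory: Uint8Array, ptr: number): string {
  let end = ptr;
  while (end < memory.length && memory[end] !== 0) end++;
  return decoder.decode(memory.subarray(ptr, end));
}

// Stand-in for new Uint8Array(exportedMemory.buffer) on a real module.
const memory = new Uint8Array(1024);
console.log(writeCString(memory, 16, "InChI=1S/CH4/h1H4")); // 18
console.log(readCString(memory, 16)); // InChI=1S/CH4/h1H4
```

With a real module, the view would be `new Uint8Array(memory.buffer)` on the exported `WebAssembly.Memory`. It should be re-created after any call that might grow memory, because growing detaches the previous buffer.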
19. Performance
“WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.”
- webassembly.org
25. A Template for Future Work
• Legacy code written in C, C++, or FORTRAN
• Incentive for in-browser and/or serverless deployment
• Compile to Wasm with LLVM
• User Interface in HTML5
https://metamolecular.com
Editor's Notes
Increasing the scope and reach of InChI
What’s InChI’s irreducible value proposition?
Answers two questions
Permissions interfere
One challenge: the code itself
C has a long history, and for most of that history, software deployment hasn’t changed much.
This model excludes some very important platforms.
…
Often the solution is to run the native code on a server, but that creates a permission relationship.
Thought experiment.
As of midnight tonight I’m going to delete every app from all your devices except one.
Which one will you keep?
Major factor contributing to the success of the Web browser
Items
Re-cast InChI source as JavaScript?
Two conceptual paths
Rewrite
Transpile
Emscripten works, but it leaves some important things to be desired
Recognizing these limitations, browser vendors worked together to deliver a new way to run software in Web browsers.
To answer this question, I started with some goals.
After several months of part-time work, success
Items
LLVM - suite of compiler and toolchain technologies
Widely-supported and used
Wasm support built in
Zooming out, this is how an InChI-Wasm application might be structured.
At the top…
InChI-Wasm project contains an HTML test page, shown here
A more user-friendly test page is available at chemwriter.com slash inch
Important component in the project is the wrapper.
Small bit of code written in C
Purpose: expose a valid Wasm interface. …
Orchestrating the compilation of InChI to Wasm is the build script, most of which is shown here.
Ordinary shell script that calls clang, the LLVM compiler.
Running on the browser side is a small amount of code that talks to the InChI Wasm instance.
Vanilla JavaScript
Main functionality is expressed in the molfileToInChI function. …
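The body of `molfileToInChI` is elided here, so what follows is only a hypothetical sketch of how such a function could sequence the steps on the "Data Schlepping in JS" slide. Several details are assumptions for illustration rather than the InChI-Wasm interface:

- the export names `malloc`, `free`, and `molfile_to_inchi`;
- the convention that the wrapper returns a caller-owned, NUL-terminated string, with 0 on failure.

```typescript
// Hypothetical export names and calling convention, assumed for illustration;
// they are not taken from the InChI-Wasm project.
interface InchiExports {
  memory: WebAssembly.Memory;
  malloc(size: number): number;
  free(ptr: number): void;
  // Assumed: reads two NUL-terminated strings and returns a pointer to a
  // caller-owned, NUL-terminated result string (0 on failure).
  molfile_to_inchi(molfilePtr: number, optionsPtr: number): number;
}

const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Translate input: allocate space in linear memory and copy in `text` + NUL.
function allocCString(wasm: InchiExports, text: string): number {
  const bytes = encoder.encode(text);
  const ptr = wasm.malloc(bytes.length + 1);
  if (ptr === 0) throw new Error("malloc failed");
  // Take the view after malloc, since growing memory detaches old buffers.
  const heap = new Uint8Array(wasm.memory.buffer);
  heap.set(bytes, ptr);
  heap[ptr + bytes.length] = 0;
  return ptr;
}

// Read output: decode bytes from `ptr` up to the first NUL.
function readCString(wasm: InchiExports, ptr: number): string {
  const heap = new Uint8Array(wasm.memory.buffer);
  let end = ptr;
  while (end < heap.length && heap[end] !== 0) end++;
  return decoder.decode(heap.subarray(ptr, end));
}

// Translate input, call the wrapper, read the output, and free every buffer,
// even if a step throws.
function molfileToInChI(wasm: InchiExports, molfile: string, options = ""): string {
  const molfilePtr = allocCString(wasm, molfile);
  try {
    const optionsPtr = allocCString(wasm, options);
    try {
      const resultPtr = wasm.molfile_to_inchi(molfilePtr, optionsPtr);
      if (resultPtr === 0) throw new Error("InChI conversion failed");
      try {
        return readCString(wasm, resultPtr);
      } finally {
        wasm.free(resultPtr);
      }
    } finally {
      wasm.free(optionsPtr);
    }
  } finally {
    wasm.free(molfilePtr);
  }
}
```

In a browser, the exports would come from something like `WebAssembly.instantiateStreaming(fetch(url))`, cast to this interface. The nested `try`/`finally` blocks are one way to make sure no buffer leaks when the conversion fails partway through.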
Performance is always an important topic
Indeed, the WebAssembly documentation itself makes a very bold claim.
To judge the performance of InChI-Wasm, built a benchmark
Within 2x of native
Pretty good!
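A benchmark like this needs little more than a timer around a loop. The sketch below times one pass over a batch and reports the slowdown against a separately measured native run. The `convert` callback is a hypothetical stand-in for the Wasm call. For scale, the summary's figure of 100,000 structures in under 30 seconds works out to under 0.3 ms per structure, or more than 3,333 structures per second.

```typescript
// Result of timing one pass over a batch of inputs.
interface BenchResult {
  count: number;    // number of inputs processed
  seconds: number;  // wall-clock time for the whole batch
  checksum: number; // total output length, so results are actually consumed
}

// Time a single pass of `convert` (hypothetical stand-in for the Wasm call).
function benchmark(inputs: string[], convert: (input: string) => string): BenchResult {
  let checksum = 0;
  const start = performance.now();
  for (const input of inputs) {
    checksum += convert(input).length;
  }
  const seconds = (performance.now() - start) / 1000;
  return { count: inputs.length, seconds, checksum };
}

// Slowdown relative to native: 1.0 means native speed, 2.0 means twice as slow.
function slowdown(wasmSeconds: number, nativeSeconds: number): number {
  if (nativeSeconds <= 0) throw new RangeError("native time must be positive");
  return wasmSeconds / nativeSeconds;
}

// Example with a trivial converter; a real run would call the Wasm build.
const result = benchmark(["CCO", "c1ccccc1", "O"], (s) => s.toUpperCase());
console.log(`${result.count} inputs in ${result.seconds.toFixed(4)} s`);
```

Summing the output lengths keeps the results live, so the engine cannot skip the work being measured. The native time has to come from running the native InChI build over the same inputs.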
Take a few steps back and talk about where I think projects like InChI-Wasm fit into chemistry.
There’s this tendency to view the Web browser as a dumb data terminal.
I think this sells the platform short.
For a good example of this approach, consider the Wikipedia Chemical Structure Explorer.
Structure search works offline.
Many have used Jupyter
Pyodide
Python Notebook, including NumPy, Pandas, Matplotlib and parts of SciPy
All running in a browser w/ Wasm
Wasm steadily working its way into popular programming languages.
All of the ones here have Wasm runtimes, can be compiled to Wasm, or both.
This solves a thorny integration problem
Native binaries create platform-specific problems
To conclude, I’d like to propose a template for further work in this area.