Headstart takes reproducibility very seriously.
Our system needs to be fully auditable: our “match score” is a crucial element of candidate selection. At any point in time, we need to be able to:
- Access the models that were being used in production when the match score was computed;
- Examine their code (including all upstream ETL/preprocessing pipelines);
- Examine the data they were trained on;
- Deserialize the models and run diagnostics/tests on them.
To meet these requirements, we built our own internal model versioning system on top of Git, Docker, CircleCI, AWS S3, and Pipenv.
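As a rough illustration of how a system like this can tie together code, environment, data, and artifact, here is a minimal sketch of a versioning manifest. All names and fields here are hypothetical, chosen to mirror the requirements above; they are not Headstart's actual schema or implementation.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class ModelManifest:
    """Hypothetical record linking a serialized model to its provenance."""
    model_name: str            # e.g. "skill-recommender" (illustrative)
    git_commit: str            # commit of training code + ETL pipelines
    docker_image: str          # image the training job ran in
    pipfile_lock_sha256: str   # pins the exact Python dependency set
    training_data_sha256: str  # fingerprint of the training dataset
    s3_model_key: str          # where the serialized model artifact lives

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def fingerprint(payload: bytes) -> str:
    """Content hash used to verify data/artifacts on retrieval."""
    return hashlib.sha256(payload).hexdigest()


# Example manifest; the commit SHA, image tag, and S3 key are placeholders.
manifest = ModelManifest(
    model_name="skill-recommender",
    git_commit="0" * 40,
    docker_image="registry.example.com/train:latest",
    pipfile_lock_sha256=fingerprint(b"Pipfile.lock contents"),
    training_data_sha256=fingerprint(b"training data snapshot"),
    s3_model_key="models/skill-recommender/v1/model.pkl",
)
print(manifest.to_json())
```

With a manifest like this stored alongside each production model, answering "which model scored this candidate, with which code and which data?" reduces to looking up one record and verifying the hashes.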
This presentation will cover the design, implementation, and functionality of our versioning system, with a detailed walkthrough that uses our skill recommendation engine as a streamlined running example.