ODSC West 2022 – Kitbashing in ML

Kit-bashing in ML:
The age of toolbox Machine Learning
ODSC West - Nov 3, 2022
Dr. Bryan Bischof
– Head of Data Science @ Weights and Biases –
In collaboration with Dr. Eric Bunch & Ashraf Shaik
Email: bryan.bischof@gmail.com

Definition
Kit-bashing, or model-bashing, is taking parts of kits to create a
new kit. It can be done to increase the complexity of the model (greebling),
or to build the model into a new form expressively and quickly.
Two key aspects of kit-bashing:
- Rapidity
- Nuance
c.f. Kitbashing Experience, 2021, Kitbashing in the digital age, 2020

Mere aggregation
These sculptural bricolages often encode meaning and context in the
relationships between components, a striking example of Aristotle’s
“something besides its parts”.
c.f. Nathalie Miebach

In Machine Learning
Let’s return to our friend, compositionality: the meaning of a whole is
determined by the meanings of its constituent parts and the rules for how
those parts are combined.
Is Kit-bashing just composition?
No.
c.f. Composition in ML, ODSC West, 2021, Fong, Spivak, 2018

Operationalizing Kit-bashing

What can we do with this analogy?
“The art of the possible”
🤝
“How far can we go on our current gas tank”
We are starting to move towards a paradigm of Machine
Learning products where understanding the existing
resources, and clever ways to combine them, outstrips the
ability to build from scratch.

Universal Foundations
The Center for Research on Foundation Models (CRFM) group
argues in this book for the power of Foundation Models via
homogenization: in other words, the trending of disparate
domains towards the same information-learning structures.
This is a risk, because it restricts the breadth of approaches and
the diversity of progress. It is an opportunity because of the speed
of iteration and the combinatorial explosion of ways to
combine things.

So how does one kit-bash?
1. Track and monitor your experiments
2. Build reproducible pipelines
3. Establish hard contracts
4. Automate integrations
5. Experiment like hell
6. Make the tent bigger

1. Track and monitor your experiments
You can and should track more of your experiments.
Are you tracking your feature selection? How about your
regularization efforts? Are you tracking your ensembling? Did you try
fine-tuning? How did you compare to the base model?...
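
As a minimal sketch of what "tracking more" can look like, the snippet below records each run's configuration (features, regularization, ensembling, fine-tuning) next to its metrics and compares every run to a base model. `RunLog`, the field names, and the metric values are hypothetical placeholders; a hosted tracker such as W&B would take the place of the in-memory list.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RunLog:
    """Hypothetical in-memory experiment log (a stand-in for a real tracker)."""

    runs: list[dict[str, Any]] = field(default_factory=list)

    def log(self, name: str, config: dict[str, Any], metrics: dict[str, float]) -> None:
        # Keep the full configuration next to the metrics so results stay explainable.
        self.runs.append({"name": name, "config": config, "metrics": metrics})

    def compare_to(self, baseline: str, metric: str) -> dict[str, float]:
        # Difference between each run and the named baseline on one metric.
        base = next(r for r in self.runs if r["name"] == baseline)["metrics"][metric]
        return {
            r["name"]: round(r["metrics"][metric] - base, 6)
            for r in self.runs
            if r["name"] != baseline
        }


log = RunLog()
# Illustrative values only; they are not results from the talk.
log.log(
    "base",
    {"features": ["title"], "l2": 0.0, "ensemble": False, "fine_tuned": False},
    {"f1": 0.70},
)
log.log(
    "fine_tuned",
    {"features": ["title", "body"], "l2": 0.01, "ensemble": False, "fine_tuned": True},
    {"f1": 0.74},
)
deltas = log.compare_to("base", "f1")
print(deltas)
```

The point of the design is that the comparison to the base model falls out of the log itself, rather than living in someone's notebook.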

2. Build reproducible pipelines
If you can’t tie back models and results to
training and evaluation data, what are you doing?
When you build a pipeline, each run should version the assets it
produces along the way; in many Data Science domains this isn’t optional.
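
One way to picture "each run is a version of the assets" is content hashing: every input and output of a run gets a version derived from its content, and the run record keeps them together. The sketch below assumes JSON-serializable assets and uses a toy "model"; `version_of` and `run_pipeline` are hypothetical names, not part of any particular tool.

```python
import hashlib
import json


def version_of(obj) -> str:
    """Content hash of a JSON-serializable asset; identical content gives an identical version."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_pipeline(train_rows, eval_rows, params):
    # Toy "training": the model is just the mean label of the training rows.
    model = {"mean_label": sum(r["y"] for r in train_rows) / len(train_rows)}
    # Toy evaluation: mean absolute error against the evaluation labels.
    mae = sum(abs(r["y"] - model["mean_label"]) for r in eval_rows) / len(eval_rows)
    # The lineage record ties the model and its result back to exact data versions.
    return {
        "train_data": version_of(train_rows),
        "eval_data": version_of(eval_rows),
        "params": version_of(params),
        "model": version_of(model),
        "metrics": {"mae": mae},
    }


train = [{"x": 1, "y": 1.0}, {"x": 2, "y": 3.0}]
evaluation = [{"x": 3, "y": 2.0}]

first = run_pipeline(train, evaluation, {"seed": 0})
second = run_pipeline(train, evaluation, {"seed": 0})
changed = run_pipeline(train + [{"x": 4, "y": 5.0}], evaluation, {"seed": 0})

print(first == second)                             # same inputs, same lineage record
print(first["train_data"] != changed["train_data"])  # new data, new version
```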

3. Establish hard contracts
Hard contracts and strong composition
You don’t have to buy the farm and go fully functional, but you
should expect the components in your system to have input
and output types. You should be able to pass parameter updates
between them, and ideally train them jointly.
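
A hard contract can be as small as declared input and output types that are checked when components are wired together. The following is a sketch under that assumption; `Component` and `compose` are hypothetical helpers, and the example covers only the typed-interface half of the slide (it does not pass parameter updates or train anything jointly).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Component:
    """A pipeline stage with a declared input and output type (hypothetical helper)."""

    name: str
    in_type: type
    out_type: type
    fn: Callable

    def __call__(self, value):
        # Enforce the contract at run time on both sides of the call.
        if not isinstance(value, self.in_type):
            raise TypeError(f"{self.name} expected {self.in_type.__name__}")
        result = self.fn(value)
        if not isinstance(result, self.out_type):
            raise TypeError(f"{self.name} must return {self.out_type.__name__}")
        return result


def compose(first: Component, second: Component) -> Component:
    # Refuse to wire components whose declared types do not line up.
    if first.out_type is not second.in_type:
        raise TypeError(f"cannot feed {first.name} into {second.name}")
    return Component(
        f"{first.name}>>{second.name}",
        first.in_type,
        second.out_type,
        lambda value: second(first(value)),
    )


tokenize = Component("tokenize", str, list, str.split)
count = Component("count", list, int, len)

pipeline = compose(tokenize, count)
n_tokens = pipeline("kit bashing in ml")

try:
    compose(count, tokenize)  # int output cannot feed a str input
    mismatch_caught = False
except TypeError:
    mismatch_caught = True
```

Failing at wiring time, rather than deep inside a training run, is what makes swapping parts cheap.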

4. Automate integrations
Trying a new model architecture should be plug-and-play.
Iteration speed is going to be extremely limited if you’re not
automating integration; validation on blessed hold-outs and
worst-case tests should ✨ just happen ✨.
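
To make "should just happen" concrete, here is a small sketch in which registering a model is all it takes for it to be validated against a fixed hold-out and a set of worst-case examples. The registry, the threshold, the data, and the two toy models are all illustrative assumptions.

```python
from typing import Callable

Model = Callable[[float], int]  # hypothetical interface: one feature in, one label out

# "Blessed" hold-out and worst-case examples; the values are illustrative only.
HOLDOUT = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
WORST_CASE = [(0.5, 1), (0.0, 0)]

REGISTRY: dict[str, Model] = {}


def register(name: str):
    """Decorator that adds a candidate model to the registry."""
    def decorator(model: Model) -> Model:
        REGISTRY[name] = model
        return model
    return decorator


def accuracy(model: Model, examples) -> float:
    return sum(model(x) == y for x, y in examples) / len(examples)


def validate(model: Model, min_holdout_accuracy: float = 0.75) -> dict:
    # Every candidate gets the same checks; nobody has to remember to run them.
    report = {
        "holdout_accuracy": accuracy(model, HOLDOUT),
        "worst_case_passed": all(model(x) == y for x, y in WORST_CASE),
    }
    report["promote"] = (
        report["holdout_accuracy"] >= min_holdout_accuracy
        and report["worst_case_passed"]
    )
    return report


@register("threshold_at_half")
def threshold_at_half(x: float) -> int:
    return int(x >= 0.5)


@register("always_one")
def always_one(x: float) -> int:
    return 1


reports = {name: validate(model) for name, model in REGISTRY.items()}
```

In a real setup the final line would be run by CI or a job trigger whenever a new candidate is registered.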

5. Experiment like hell
Seriously, just try shit.
Thatʼs the beauty of the previous steps! New model types, new
data sources, maybe a side task? Label some additional data out of
distribution? Wanna include a new feature pipeline? These are the
salad days.

6. Make the tent bigger
Internal task challenges
The more people who can try their hand at solving these problems,
the higher the likelihood someone will. Make it easy to get started,
even for people with less ML experience. Some people will come
with ideas on other aspects of the problem; welcome them too!

Ok, letʼs get a bit more explicit

FC RecSys
Kit-bashing a recommender
system and personalized search
for the blogging platform Fully
Connected.
wandb.ai/fully-connected

FC RecSys
[Architecture diagram. Components shown: User-Document Interaction data; Document Dataset; Document Embeddings; Model Training; Recommender System; Offline evaluation: Tag classification task (cold start); User feedback; W&B Artifacts; W&B Models.]
FC RecSys Deep Dive: Trigger based validation
When a trigger signals that a new data artifact is available:
● Use hand-labeled tags for ground truth.
● Fit a multiclass, multilabel KNN classifier to choose the initial document embedding for the RecSys.
○ This mimics the behavior of the vector similarity-search architecture chosen for the RecSys in prod.
● Rank by multi-objective loss on these classifiers; automatically promote.
● Generate UMAP projections of vectors.
● Send the promoted latent space over for user-feedback training.
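
The validation loop above can be sketched in plain Python: score each candidate embedding by how well a nearest-neighbor tag classifier does against hand-labeled tags, then promote the best one. This is a simplification under stated assumptions: it uses a single micro-F1 score instead of a multi-objective loss, cosine similarity as the neighbor metric, toy two-dimensional vectors, and it omits the UMAP step. All names and data are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(query, train_vecs, train_tags, k=1):
    """Multilabel KNN: predict every tag carried by at least half of the k nearest docs."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query, train_vecs[i]),
        reverse=True,
    )
    votes = {}
    for i in ranked[:k]:
        for tag in train_tags[i]:
            votes[tag] = votes.get(tag, 0) + 1
    return {tag for tag, n in votes.items() if n * 2 >= k}


def micro_f1(predicted, actual):
    tp = sum(len(p & a) for p, a in zip(predicted, actual))
    fp = sum(len(p - a) for p, a in zip(predicted, actual))
    fn = sum(len(a - p) for p, a in zip(predicted, actual))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def score_embedding(vectors, tags, n_train, k=1):
    # The first n_train documents act as labeled neighbors; the rest are held out.
    train_vecs, test_vecs = vectors[:n_train], vectors[n_train:]
    train_tags, test_tags = tags[:n_train], tags[n_train:]
    preds = [knn_predict(v, train_vecs, train_tags, k) for v in test_vecs]
    return micro_f1(preds, test_tags)


# Hand-labeled tags for four documents (toy data).
TAGS = [{"nlp"}, {"cv"}, {"nlp"}, {"cv"}]
# Two hypothetical candidate embeddings of the same four documents.
CANDIDATES = {
    "embedding_a": [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]],
    "embedding_b": [[1.0, 0.0], [0.0, 1.0], [0.2, 0.8], [0.8, 0.2]],
}

scores = {name: score_embedding(vecs, TAGS, n_train=2) for name, vecs in CANDIDATES.items()}
promoted = max(scores, key=scores.get)
```

Using a nearest-neighbor classifier for the offline check keeps the evaluation close to how a vector similarity search would behave at serving time.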

Secure inference architecture

Key components of deployment
1. GitHub Actions trigger a deploy to the designed infrastructure.
2. Secure perimeter to restrict API access from outside and
mitigate the risk of data exfiltration.
3. Prediction service on Cloud Run.
4. Streamlit UI app for debugging; internal service integration for
the frontend.

Production Inference (GCP + W&B)
● FastAPI: HTTP inference endpoint route.
● Build the serving image as a Docker container, pushed to GCR (Google Container Registry).
● Cloud Run: attach compute by selecting an instance and a traffic-based autoscaling configuration.
● Optionally, an internal service that talks to the inference endpoint, e.g. a Streamlit app on Cloud Run or
your main app service.
● Secure with VPC Service Controls:
○ Allows configuring secure perimeter rules that restrict API access from outside and mitigate the risk of
data exfiltration.
○ Set up a Serverless VPC Access connector.
○ Set up an ingress policy.
○ Allow internal and Cloud Load Balancing requests into the service.
○ Set up the egress policy as "Allow all", allowing all traffic to go through the VPC firewall.
● Set up API Gateway.
● Streamlit service requests to the inference service have to authorize via an ID token in the request.
● GitHub Actions: CI/CD for both model training on Dagster and the FastAPI service on Cloud Run.
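
For the ID-token step, the calling service typically attaches the token as a bearer credential on each request. The sketch below only builds such a request with the standard library and does not send it; the URL, payload, and token are placeholders, and how the token is obtained (for example from the platform's metadata server or an auth library) is deliberately left out as an assumption.

```python
import json
import urllib.request


def build_inference_request(url: str, payload: dict, id_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the ID token as a bearer token."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {id_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder URL, payload, and token; none of these come from the talk.
request = build_inference_request(
    "https://inference.example.com/predict",
    {"user_id": "u-123", "k": 5},
    id_token="PLACEHOLDER_ID_TOKEN",
)
print(request.get_method(), request.full_url)
```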

Steal this look
This isn’t my first time kit-bashing…
If you want to map arbitrary UGC to personalized recommendations,
grab some kits and some super glue!

Whatʼs available?
We already had a match-score model that we used for core
recommendations.
And we had a computer vision model for featurization.
So we used the CV model to do item similarity, and fine-tuned the
match-score model to the new recs.

Things can go wrong
1. Make sure that your experimentation practice is strong: global holdouts,
published validation artifacts, random re-testing, and no peeking can decrease
the risk of bridges to nowhere (long-term type 1 error).
2. There should be a team who ultimately signs off on all experimental design.
3. Weak composition should be avoided or strengthened to strong composition;
make the contracts hard, and always try joint learning.
4. Experiment documentation for ML experiments and development is crucial to
avoid duplication or misreporting of results.
5. Make comparison to SOTA (internal) easy, otherwise people won’t bother.
6. Partnership should be upheld as a core value, and preferred to 🍠YAM (yet
another model) developed independently.
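
As one illustration of the first point, a global holdout can be assigned deterministically by hashing a stable identifier, so the same users stay out of every experiment no matter which team runs it. This is a sketch under assumptions: the salt, the 10% fraction, and the user IDs are made up for the example.

```python
import hashlib


def in_global_holdout(
    user_id: str, salt: str = "global-holdout-v1", fraction: float = 0.1
) -> bool:
    """Deterministically assign a fixed fraction of users to the global holdout."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return bucket < fraction


users = [f"user-{i}" for i in range(10_000)]
holdout = [u for u in users if in_global_holdout(u)]
holdout_share = len(holdout) / len(users)
print(f"{holdout_share:.1%} of users are in the global holdout")
```

Because the assignment depends only on the salt and the ID, there is no table to keep in sync and no opportunity to peek and reshuffle.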

Ya, ya, I work at a tools company
We’re building the tools to make all of this easy:
- The tools for reproducible pipelines and experiment tracking
- Sharing your results with teammates
- Logging tables, artifacts, and models to the registry
- Automated job triggering with retraining and parameter exploration
But most excitingly, an ecosystem of arbitrary Machine Learning functions is coming
next year.
Hope you’ve still got room in your garage for a new toolbox.

Thanks!
I’m Bryan Bischof, find me on
Twitter @bebischof
Look out for my forthcoming
book from O’Reilly next year
about recommendation systems.

Thanks!
Check out W&B’s composable tools at:
wandb.ai
Totally free for individuals & academics.
Come chat with us at our booth, or email contact@wandb.ai.
