The Critical Missing Component in the Production ML Stack

Alessya Visnjic

CEO, WhyLabs.ai

Agenda:

▪ ML Stack: what is missing?

▪ How to design data logging

▪ whylogs: open standard for data logging

▪ Use cases

▪ Q&A

ML models are full of surprises…
… every surprise will launch a debugging expedition!

ML models routinely struggle in the wild…

ML Stack: moving massive volumes of data

How do you test, monitor, debug, and document data?

The ML stack is missing data & metadata logging
Log metadata & statistical properties of data

A good data log should capture:

▪ Metadata

▪ Counts

▪ Statistics

▪ Distributions

▪ Stratified samples
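To make the list above concrete, here is a minimal sketch of what one log entry for a single numeric column could hold. It is an illustration only and not the whylogs format: the function name `summarize_column`, the dictionary keys, and the equal-width histogram are assumptions made for this example, and stratified samples are left out.

```python
import math


def summarize_column(name, values, bins=4):
    """Build a small log entry for one numeric column (hypothetical layout).

    None marks a missing value. The entry holds metadata, counts,
    summary statistics, and an equal-width histogram.
    """
    present = [v for v in values if v is not None]
    entry = {
        "metadata": {"column": name},
        "count": len(values),
        "null_count": len(values) - len(present),
    }
    if not present:
        return entry

    lo, hi = min(present), max(present)
    mean = sum(present) / len(present)
    # Population variance over the non-missing values.
    variance = sum((v - mean) ** 2 for v in present) / len(present)

    # Equal-width bins between min and max; fall back to width 1.0
    # when every value is identical.
    width = (hi - lo) / bins or 1.0
    histogram = [0] * bins
    for v in present:
        index = min(int((v - lo) / width), bins - 1)
        histogram[index] += 1

    entry.update(
        {"min": lo, "max": hi, "mean": mean,
         "stddev": math.sqrt(variance), "histogram": histogram}
    )
    return entry


if __name__ == "__main__":
    print(summarize_column("alcohol", [9.4, 9.8, None, 10.5, 12.1]))
```

The entry stores only aggregates, so its size does not depend on how many rows were summarized.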

Key properties of a data log:

▪ Lightweight

▪ Portable

▪ Mergeable

▪ Configurable

▪ Close to code
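The "mergeable" property can be sketched as follows: a summary built from one batch can be combined with a summary from another batch and give the same result as a single pass over all the data. The class below uses Welford's online update and the standard pairwise combination formula for mean and variance. It is an assumption-laden illustration, not how whylogs stores its statistics, and the names `RunningStats` and `profile` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size summary of a numeric stream (hypothetical helper)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, x):
        # Welford's single-pass update: no raw values are retained.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def merge(self, other):
        # Combine two summaries without revisiting the underlying data.
        total = self.count + other.count
        if total == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta ** 2 * self.count * other.count / total
        return RunningStats(total, mean, m2,
                            min(self.minimum, other.minimum),
                            max(self.maximum, other.maximum))

    @property
    def variance(self):
        # Population variance; 0.0 for an empty summary.
        return self.m2 / self.count if self.count else 0.0


def profile(values):
    stats = RunningStats()
    for v in values:
        stats.update(v)
    return stats


if __name__ == "__main__":
    batch_a = profile([1.0, 2.0, 3.0])
    batch_b = profile([4.0, 5.0, 6.0, 7.0])
    print(batch_a.merge(batch_b))
```

Because each summary has a fixed size, logs from different machines or time windows can be merged later without shipping the raw data.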

whylogs: logging for the ML stack
bit.ly/whylogs

whylogs: a standard format for representing a snapshot of data

Profile and log a dataframe

Profile, log, and track with

Log rich statistics for each feature
Each data log captures summary statistics, counters, distributions, metadata, and custom metrics.

Sample of a flattened data log captured by whylogs on the Wine Quality dataset:

Feature name  count   max     min    stddev  nunique  null_count  quantile_0.0000  …  quantile_1.0000
chlorides     1199.0  0.611   0.012  0.044   134.0    0.0         0.012            …  0.611
quality       1199    8.000   3.000  0.785   6.0      0.0         3.000            …  8.000
alcohol       1199    14.900  8.400  1.060   65.0     0.0         8.400            …  14.900
density       1199    1.004   0.997  0.001   390.0    0.0         0.990            …  1.004
pH            1199    4.010   2.890  0.153   82.0     0.0         2.890            …  4.010

Track data statistics across batches
Distribution plot for one of the columns in the model input, collected at inference time

Distribution of “free sulfur dioxide” feature over 20 inference batches of the Wine Quality model

Run data logging without overhead
By using streaming algorithms to capture data statistics, whylogs keeps a constant memory footprint as the data grows, scales with the number of features in the dataframe, and outputs lightweight log files (JSON, protobuf, etc.).

Sample of whylogs benchmarks on public datasets:

Dataset       Size  # of entries  # of features  Memory consumption  Output size
Lending Club  1.6G  2.2M          151            14MB                7.4MB
NYC Tickets   1.9G  10.8          43             14MB                2.3MB
Pain pills    75GB  178M          42             15MB                2MB

Capture accurate data distributions
whylogs profiles 100% of the data to accurately capture distributions. Calculating distributions from randomly sampled data is significantly less accurate. The chart presents median error for distributions estimated with whylogs vs. random sampling techniques.

[Chart: median error (axis from 0 to 0.4), profiling vs. sampling, for six distributions: Normal, Normal discrete, Normal outlier, Uniform discrete, Uniform, Pareto]
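The gap between profiling and sampling is easiest to see in the tails of a heavy-tailed distribution. The sketch below is a toy comparison under stated assumptions, not the benchmark behind the chart: it draws synthetic Pareto data with a fixed seed, computes exact quantiles by sorting all values (whylogs itself uses streaming sketches rather than a full sort), and compares them with quantiles estimated from a 1% random sample. The helper `quantile` and all sizes are hypothetical choices for illustration.

```python
import random


def quantile(sorted_values, q):
    """Nearest-rank quantile of an already sorted list."""
    index = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic heavy-tailed data, generated only for this illustration.
data = [rng.paretovariate(1.5) for _ in range(100_000)]
sample = rng.sample(data, 1_000)  # 1% random sample

full_sorted = sorted(data)
sample_sorted = sorted(sample)

# Relative error of the sample-based estimate at each quantile.
errors = {}
for q in (0.5, 0.99, 1.0):
    exact = quantile(full_sorted, q)
    estimate = quantile(sample_sorted, q)
    errors[q] = abs(estimate - exact) / exact

if __name__ == "__main__":
    for q, err in errors.items():
        print(f"quantile {q}: relative error of 1% sample = {err:.3f}")
```

A sample is a subset of the data, so it can only match or underestimate the true maximum; rare extreme values are exactly what a small sample tends to miss.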

You can test, monitor, debug, and document data with whylogs!

Use case: training-serving distribution drift
whylogs captures mergeable histograms for each feature. To catch distribution drift, continuously compare the training distribution of a feature to the serving distribution.

[Chart: histograms of one feature's training vs. serving distribution (count axis from 0 to 1000)]
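One simple way to compare a training histogram with a serving histogram is sketched below. It assumes both histograms share the same bin edges and uses total variation distance as the drift score; the talk does not say which metric whylogs or WhyLabs use, so the metric, the bin counts, and the `DRIFT_THRESHOLD` value are all assumptions chosen for illustration.

```python
def normalize(histogram):
    """Turn bin counts into proportions that sum to 1."""
    total = sum(histogram)
    if total == 0:
        raise ValueError("histogram has no observations")
    return [count / total for count in histogram]


def total_variation(train_hist, serve_hist):
    """Total variation distance between two histograms on shared bins.

    0.0 means identical shapes; 1.0 means no overlap at all.
    """
    if len(train_hist) != len(serve_hist):
        raise ValueError("histograms must use the same bins")
    p, q = normalize(train_hist), normalize(serve_hist)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical bin counts, made up for this example.
training = [100, 400, 900, 400, 100]
serving_same_shape = [10, 40, 90, 40, 10]  # fewer rows, same shape
serving_shifted = [0, 10, 40, 90, 60]      # mass moved to higher bins

DRIFT_THRESHOLD = 0.2  # hypothetical alert level, tune per feature

if __name__ == "__main__":
    for label, serving in [("same shape", serving_same_shape),
                           ("shifted", serving_shifted)]:
        score = total_variation(training, serving)
        print(label, round(score, 3), "drift" if score > DRIFT_THRESHOLD else "ok")
```

Normalizing the counts first means a small serving batch can be compared against a much larger training set.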

Logging enables all key MLOps activities
Once data is logged systematically, whylogs outputs can be used to test, monitor, and debug data.
Use whylogs at any point of the ML stack and through the lifecycle of the ML application.

alessya@whylabs.ai

@zalessya

bit.ly/whylogs
Help build the open standard for data logging!
