The critical missing component in the production ML stack
Alessya Visnjic
CEO, WhyLabs.ai
Agenda:
▪ ML Stack: what is missing?
▪ How to design data logging
▪ whylogs: open standard for data logging
▪ Use cases
▪ Q&A
ML models are full of surprises…
… every surprise will launch a debugging expedition!
ML models routinely struggle in the wild…
ML Stack: moving massive volumes of data
Test
Monitor
Debug
Document
How do you test, monitor, debug, and document data?
ML Stack is missing data & metadata logging
Log metadata & statistical properties of data
A good data log should capture:
▪ Metadata
▪ Counts
▪ Statistics
▪ Distributions
▪ Stratified samples
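To make the checklist above concrete, here is a minimal Python sketch of a per-column data log, assuming rows arrive as dicts. `ColumnLog` and `log_rows` are hypothetical names, not the whylogs API; exact value counts stand in for a real distribution sketch, and a uniform reservoir sample stands in for stratified samples.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ColumnLog:
    """Hypothetical per-column log: counts, statistics, a distribution, and a sample."""
    sample_size: int = 5
    count: int = 0
    null_count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    total: float = 0.0
    # Exact value counts stand in for a distribution sketch (only sensible for low cardinality).
    histogram: Counter = field(default_factory=Counter)
    # Uniform reservoir sample; a stratified version would keep one reservoir per stratum.
    sample: list = field(default_factory=list)
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def track(self, value):
        self.count += 1
        if value is None:
            self.null_count += 1
            return
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.total += value
        self.histogram[value] += 1
        seen = self.count - self.null_count  # non-null values seen so far, including this one
        if len(self.sample) < self.sample_size:
            self.sample.append(value)
        else:
            slot = self.rng.randrange(seen)  # reservoir sampling (Algorithm R)
            if slot < self.sample_size:
                self.sample[slot] = value

    @property
    def mean(self):
        non_null = self.count - self.null_count
        return self.total / non_null if non_null else math.nan


def log_rows(rows, dataset_name):
    """Build a hypothetical data log (metadata plus one ColumnLog per column) from dict rows."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if name not in columns:
                columns[name] = ColumnLog()
            columns[name].track(value)
    metadata = {
        "dataset": dataset_name,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "rows": len(rows),
    }
    return {"metadata": metadata, "columns": columns}
```

Because each column only keeps counters, a bounded sample, and (here) a value histogram, the log stays small relative to the raw data for low-cardinality columns.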
Key properties of a data log:
▪ Lightweight
▪ Portable
▪ Mergeable
▪ Configurable
▪ Close to code
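Mergeability is what lets logs from different machines or time windows be combined after the fact. A minimal sketch, assuming numeric columns: `NumericSummary` (a hypothetical name, not a whylogs class) keeps count, min, max, mean and the sum of squared deviations, and merges two summaries with Chan et al.'s parallel variance update, so merging per-shard summaries gives the same result as summarizing all of the data at once.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericSummary:
    """Hypothetical mergeable summary: count, min, max, mean, and M2 (sum of squared deviations)."""
    count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf
    mean: float = 0.0
    m2: float = 0.0

    @classmethod
    def from_values(cls, values):
        summary = cls()
        for v in values:
            # A single value is itself a summary; folding with merge gives a streaming update.
            summary = summary.merge(cls(1, v, v, v, 0.0))
        return summary

    def merge(self, other):
        """Combine two summaries with Chan et al.'s parallel update for mean and variance."""
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return NumericSummary(
            n, min(self.minimum, other.minimum), max(self.maximum, other.maximum), mean, m2
        )

    @property
    def stddev(self):
        # Sample standard deviation (n - 1 in the denominator).
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


monday = NumericSummary.from_values([8.4, 9.5, 10.0])
tuesday = NumericSummary.from_values([11.2, 14.9])
print(monday.merge(tuesday))
```

Two daily summaries merged this way should match a single summary computed over both days, up to floating-point rounding.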
whylogs: logging for the ML stack
bit.ly/whylogs
whylogs: a standard format for representing a snapshot of data
Profile and log a dataframe
Profile, log, and track with
bit.ly/whylogs
Feature    count  max     min    stddev  nunique  null_count  quantile_0.0000  …  quantile_1.0000
chlorides  1199   0.611   0.012  0.044   134      0           0.012            …  0.611
quality    1199   8.000   3.000  0.785   6        0           3.000            …  8.000
alcohol    1199   14.900  8.400  1.060   65       0           8.400            …  14.900
density    1199   1.004   0.997  0.001   390      0           0.990            …  1.004
pH         1199   4.010   2.890  0.153   82       0           2.890            …  4.010
Log rich statistics for each feature
Each data log captures summary statistics, counters, distributions, metadata and custom metrics
Sample of a flattened data log captured by whylogs on the Wine Quality dataset
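One way to read the flattened table above is as one row per feature. The sketch below builds such a row from a list of values; `flatten_feature` is a hypothetical helper, it assumes `count` includes nulls and `stddev` is the sample standard deviation (the slide does not say which), and it computes exact quantiles by sorting, which a streaming profiler would replace with a sketch.

```python
import statistics


def flatten_feature(name, values, quantiles=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Hypothetical flattening of one feature's statistics into a single table row.

    Needs at least two non-null values (for the standard deviation).
    """
    present = sorted(v for v in values if v is not None)
    row = {
        "feature": name,
        "count": len(values),  # assumption: nulls are included in count
        "max": present[-1],
        "min": present[0],
        "stddev": statistics.stdev(present),  # sample standard deviation
        "nunique": len(set(present)),
        "null_count": len(values) - len(present),
    }
    for q in quantiles:
        # Exact quantile by rounded position in the sorted values.
        row[f"quantile_{q:.4f}"] = present[round(q * (len(present) - 1))]
    return row


print(flatten_feature("pH", [3.51, 3.20, None, 3.16, 3.51, 3.26]))
```

Keeping column names such as `quantile_0.0000` stable across logs is what lets rows from many batches be stacked into one table.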
Track data statistics across batches
Distribution plot for one of the columns in the model input, collected at inference time
Distribution of the “free sulfur dioxide” feature over 20 inference batches of the Wine Quality model
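Tracking across batches amounts to computing the same statistic for every batch and watching the resulting series. A minimal sketch with hypothetical helper names and synthetic batches; comparing each batch against the first one with a fixed tolerance is an illustrative choice, not how whylogs flags changes.

```python
import statistics


def batch_series(batches, metric=statistics.median):
    """Summarize each inference batch with one statistic, preserving arrival order."""
    return [metric(batch) for batch in batches]


def flag_shifted_batches(series, tolerance):
    """Return indices of batches whose statistic moved more than `tolerance` from batch 0."""
    baseline = series[0]
    return [i for i, value in enumerate(series) if abs(value - baseline) > tolerance]


# Synthetic values for one feature across three inference batches.
batches = [[10, 12, 14], [11, 13, 15], [30, 32, 34]]
series = batch_series(batches)
print(series, flag_shifted_batches(series, tolerance=5))
```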
Dataset       Size    # of entries  # of features  Memory consumption  Output size
Lending Club  1.6 GB  2.2M          151            14 MB               7.4 MB
NYC Tickets   1.9 GB  10.8M         43             14 MB               2.3 MB
Pain pills    75 GB   178M          42             15 MB               2 MB
Run data logging without overhead
Using streaming algorithms to capture data statistics, whylogs keeps a memory footprint that stays constant as the number of rows grows and scales only with the number of features in the dataframe, and it outputs lightweight log files (JSON, protobuf, etc.).
Sample of whylogs benchmarks on public datasets
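Constant memory comes from streaming sketches whose size is fixed up front. As one well-known example (not necessarily the algorithm or parameters whylogs uses), here is a minimal HyperLogLog distinct-count sketch in Python: it always keeps 2**p registers however many rows it sees, and two sketches merge by register-wise maximum. The class and its defaults are illustrative assumptions.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch with 2**p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash from the first 8 bytes of SHA-1 (convenient, not fast).
        h = int.from_bytes(hashlib.sha1(repr(value).encode()).digest()[:8], "big")
        index = h >> (64 - self.p)  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        """Sketches built on separate shards merge by taking the register-wise maximum."""
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small cardinalities
        return raw
```

With p = 12 the sketch holds 4,096 small integers whether it sees a thousand rows or a billion, and its relative error is roughly 1.04 / sqrt(4096), about 1.6%.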
whylogs profiles 100% of the data to capture distributions accurately. Distributions calculated from randomly sampled data are significantly less accurate. The chart presents the median error for distributions estimated with whylogs vs. random sampling techniques.
[Chart: median error (0 to 0.4) of distributions estimated by profiling vs. sampling, for Normal, Normal discrete, Normal outlier, Uniform discrete, Uniform, and Pareto data]
Capture accurate data distributions
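The gap between profiling and sampling is easy to reproduce on synthetic heavy-tailed data. This sketch, assuming plain Python lists and exact quantiles, compares quantiles computed from a 1% random sample with those computed from all of the data; the helper names, the Pareto shape parameter and the seeds are arbitrary choices, and the numbers it prints are not the benchmark from the chart.

```python
import random


def quantile(values, q):
    """Quantile by rounded position in the sorted data (q in [0, 1])."""
    ordered = sorted(values)
    return ordered[round(q * (len(ordered) - 1))]


def sampling_errors(population, fraction, qs, seed=0):
    """Absolute error of quantiles computed from a random sample vs. from all of the data."""
    rng = random.Random(seed)
    sample = rng.sample(population, max(1, int(len(population) * fraction)))
    return {q: abs(quantile(sample, q) - quantile(population, q)) for q in qs}


rng = random.Random(42)
pareto = [rng.paretovariate(1.5) for _ in range(100_000)]  # heavy-tailed synthetic data
print(sampling_errors(pareto, fraction=0.01, qs=[0.5, 0.9, 0.99]))
```

Tail quantiles such as the 99th percentile usually suffer most, because a small sample contains few tail values.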
Test
Monitor
Debug
Document
You can test, monitor, debug, and document data with whylogs!
whylogs captures mergeable histograms for each feature. To catch distribution drift, continuously compare each feature's training distribution to its serving distribution.
[Chart: histogram of one feature's training vs. serving distribution, counts from 0 to 1000]
Use case: training-serving distribution drift
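A minimal sketch of the drift check, assuming both training and serving histograms use the same fixed bin edges, which is what makes them mergeable by simply adding counts. `FixedHistogram` is hypothetical, and Hellinger distance is our choice of comparison metric; the slide does not name one.

```python
import math
from bisect import bisect_right


class FixedHistogram:
    """Hypothetical mergeable histogram over shared, fixed bin edges."""

    def __init__(self, edges):
        self.edges = list(edges)
        self.counts = [0] * (len(self.edges) + 1)  # first and last bins catch under/overflow

    def track(self, value):
        self.counts[bisect_right(self.edges, value)] += 1

    def merge(self, other):
        if self.edges != other.edges:
            raise ValueError("histograms must share bin edges to merge")
        merged = FixedHistogram(self.edges)
        merged.counts = [a + b for a, b in zip(self.counts, other.counts)]
        return merged

    def probabilities(self):
        total = sum(self.counts)
        return [c / total for c in self.counts] if total else [0.0] * len(self.counts)


def hellinger(p_hist, q_hist):
    """Hellinger distance between two histograms: 0 for identical, 1 for disjoint."""
    p, q = p_hist.probabilities(), q_hist.probabilities()
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


edges = [0, 10, 20, 30, 40]
training, serving = FixedHistogram(edges), FixedHistogram(edges)
for v in [5, 12, 18, 25, 33]:
    training.track(v)
for v in [25, 31, 36, 38, 45]:
    serving.track(v)
print(f"Hellinger distance: {hellinger(training, serving):.3f}")
```

Daily serving histograms can be merged before the comparison, so the same check works for one batch or a whole week of traffic.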
Logging enables all key MLOps activities
Once data is logged systematically, whylogs outputs can be used to test, monitor, and debug data.
Use whylogs at any point of the ML stack and throughout the lifecycle of the ML application.
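Testing data from logs can be as simple as checking profile statistics against expected ranges. A minimal sketch, assuming a flattened profile shaped like the Wine Quality table (a dict per feature); `check_profile` and the constraint format are hypothetical, not a whylogs feature.

```python
def check_profile(profile, constraints):
    """Return human-readable violations of (low, high) range constraints on profile metrics."""
    failures = []
    for feature, rules in constraints.items():
        stats = profile.get(feature)
        if stats is None:
            failures.append(f"{feature}: missing from profile")
            continue
        for metric, (low, high) in rules.items():
            value = stats[metric]
            if not low <= value <= high:
                failures.append(f"{feature}.{metric}={value} outside [{low}, {high}]")
    return failures


profile = {
    "pH": {"min": 2.89, "max": 4.01, "null_count": 0},
    "quality": {"min": 3.0, "max": 8.0, "null_count": 0},
}
constraints = {
    "pH": {"min": (0, 14), "max": (0, 14)},
    "quality": {"max": (3, 7)},
    "alcohol": {"null_count": (0, 0)},
}
for failure in check_profile(profile, constraints):
    print(failure)
```

The same checks can run in a unit test before training, or on every logged serving batch as a monitor.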
alessya@whylabs.ai
@zalessya
bit.ly/whylogs
Help build the open standard for data logging!