calculation | consulting
weightwatcher - a diagnostic tool
for deep neural networks
charles@calculationconsulting.com
calculation | consulting why deep learning works
Who Are We?
Dr. Charles H. Martin, PhD
University of Chicago, Chemical Physics
NSF Fellow in Theoretical Chemistry, UIUC
Over 15 years experience in applied Machine Learning and AI
ML algos for: Aardvark, acquired by Google (2010)
Demand Media (eHow); first $1B IPO since Google
Wall Street: BlackRock
Fortune 500: Roche, France Telecom
BigTech: eBay, Aardvark (Google), GoDaddy
Private Equity: Griffin Advisors
Alt. Energy: Anthropocene Institute (Page Family)
www.calculationconsulting.com
charles@calculationconsulting.com
Michael W. Mahoney
ICSI, RISELab, Dept. of Statistics UC Berkeley
Algorithmic and statistical aspects of modern large-scale data analysis.
large-scale machine learning | randomized linear algebra
geometric network analysis | scalable implicit regularization
PhD, Yale University, computational chemical physics
SAMSI National Advisory Committee
NRC Committee on the Analysis of Massive Data
Simons Institute Fall 2013 and 2018 program on the Foundations of Data
Biennial MMDS Workshops on Algorithms for Modern Massive Data Sets
NSF/TRIPODS-funded Foundations of Data Analysis Institute at UC Berkeley
https://www.stat.berkeley.edu/~mmahoney/
mmahoney@stat.berkeley.edu
Understanding deep learning requires rethinking generalization
Motivations: WeightWatcher Theory
The weightwatcher theory is a Semi-Empirical theory based on:

the Statistical Mechanics of Generalization,
Random Matrix Theory, and
the theory of Strongly Correlated Systems
Great nerdy stuff; however, I will be discussing the weightwatcher tool
and what it can do for you
Open source tool: weightwatcher
pip install weightwatcher
…
import weightwatcher as ww
watcher = ww.WeightWatcher(model=model)
results = watcher.analyze()
watcher.get_summary()
watcher.print_results()
https://github.com/CalculatedContent/WeightWatcher
WeightWatcher: A diagnostic tool
Analyze trained or pretrained PyTorch, TF/Keras, and ONNX models
Inspect models that are difficult to train
Gauge improvements in model performance
Predict test accuracies across different models
Detect problems when compressing or fine-tuning pretrained models
pip install weightwatcher
Research: Implicit Self-Regularization in Deep Learning
Selected publications
• Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning (JMLR 2021)
• Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks (ICML 2019)
• Workshop: Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks (KDD 2020)
• Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data (Nature Communications 2021)
• (more submitted and in progress today)
WeightWatcher: analyzes the ESD
(eigenvalues) of the layer weight matrices
The tail of the ESD contains the information
ESD of DNNs: detailed insight into W
Empirical Spectral Density (ESD: eigenvalues of X)
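import keras
import numpy as np
import matplotlib.pyplot as plt
…
# weight matrix of layer i (the first entry returned by get_weights())
W = model.layers[i].get_weights()[0]
N, M = W.shape  # shape is an attribute, not a method
…
# correlation matrix X; its eigenvalues form the ESD
X = np.dot(W.T, W) / N
evals = np.linalg.eigvalsh(X)  # X is symmetric, so eigvalsh returns real eigenvalues
plt.hist(evals, bins=100, density=True)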
details_df = watcher.analyze(model=your_model)
The tool provides various plots, quality metrics, and transforms
watcher.analyze(…, plot=True, …)
Well trained layers are heavy-tailed and well shaped
GPT-2 Fits a Power Law
(or Truncated Power Law)
alpha in [2, 6]
watcher.analyze(plot=True)
Good quality of fit (D is small)
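A minimal sketch of what such a fit involves, assuming a fixed, hand-picked lower cutoff xmin (a real fit would also have to choose xmin). The helper name fit_power_law_tail is hypothetical and this is not the tool's implementation. It estimates alpha with the continuous maximum-likelihood formula and reports the Kolmogorov-Smirnov distance D between the empirical and fitted tail CDFs:

```python
import numpy as np

def fit_power_law_tail(evals, xmin):
    """Fit p(x) ~ x^(-alpha) to the eigenvalues >= xmin.

    Returns (alpha, D): the maximum-likelihood exponent and the
    Kolmogorov-Smirnov distance between empirical and fitted tail CDFs.
    """
    tail = np.sort(evals[evals >= xmin])
    n = tail.size
    # Continuous power-law MLE (Hill-type estimator)
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))
    # Fitted CDF of the tail: F(x) = 1 - (x / xmin)^(1 - alpha)
    fitted_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    # Empirical CDF just after and just before each sorted sample
    ecdf_hi = np.arange(1, n + 1) / n
    ecdf_lo = np.arange(0, n) / n
    D = max(np.max(np.abs(ecdf_hi - fitted_cdf)),
            np.max(np.abs(ecdf_lo - fitted_cdf)))
    return alpha, D

# Synthetic check: draw samples from a known power law with alpha = 3
rng = np.random.default_rng(0)
true_alpha, xmin = 3.0, 1.0
samples = xmin * (1.0 - rng.random(20000)) ** (-1.0 / (true_alpha - 1.0))
alpha, D = fit_power_law_tail(samples, xmin)
print(f"alpha = {alpha:.2f}, D = {D:.3f}")
```

A small D means the tail is well described by the fitted power law; a large D means the alpha value should not be trusted.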
Better trained layers are more heavy-tailed and better shaped
GPT GPT-2
Heavy Tailed Metrics: GPT vs GPT-2
The original GPT is poorly trained on purpose; GPT-2 is well trained
alpha for every layer

smaller alpha is better
large alphas are bad fits
Power Law Universality: ImageNet
All ImageNet models display remarkable Heavy Tailed Universality
500 matrices
~50 architectures
Linear layers &
Conv2D feature maps
80-90% of alphas < 4
Random Matrix Theory: detailed insight into W_L
DNN training induces breakdown of Gaussian random structure
and the onset of a new kind of heavy tailed self-regularization
Gaussian random matrix → Bulk + Spikes → Heavy Tailed
Bulk + Spikes: small, older NNs
Heavy Tailed: large, modern DNNs and/or small batch sizes
Random Matrix Theory: Marchenko-Pastur
plus Tracy-Widom fluctuations
RMT says that if W is a simple random Gaussian matrix,
then the ESD will have a very simple, known form
Shape depends on Q = N/M
(and variance ~ 1)
Eigenvalues are tightly bounded, with very crisp edges
A few spikes may appear
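To make the "simple, known form" concrete, here is a sketch of the Marchenko-Pastur density for X = W^T W / N with Q = N/M >= 1, compared against the ESD of a random Gaussian W. The function names are hypothetical, and the parametrization (element variance sigma2, bulk edges sigma2 * (1 ± 1/sqrt(Q))^2) is the standard textbook one, assumed here rather than taken from the slides:

```python
import numpy as np

def marchenko_pastur_edges(Q, sigma2=1.0):
    """Bulk edges (lambda_minus, lambda_plus) for X = W^T W / N, with Q = N/M >= 1."""
    lam_minus = sigma2 * (1.0 - 1.0 / np.sqrt(Q)) ** 2
    lam_plus = sigma2 * (1.0 + 1.0 / np.sqrt(Q)) ** 2
    return lam_minus, lam_plus

def marchenko_pastur_pdf(lam, Q, sigma2=1.0):
    """Marchenko-Pastur density on an array of lam values (zero outside the bulk)."""
    lam = np.asarray(lam, dtype=float)
    lam_minus, lam_plus = marchenko_pastur_edges(Q, sigma2)
    pdf = np.zeros_like(lam)
    inside = (lam > lam_minus) & (lam < lam_plus)
    pdf[inside] = (Q / (2.0 * np.pi * sigma2)
                   * np.sqrt((lam_plus - lam[inside]) * (lam[inside] - lam_minus))
                   / lam[inside])
    return pdf

# Compare the theoretical edges with the ESD of a random Gaussian W
rng = np.random.default_rng(0)
N, M = 2000, 500
W = rng.normal(size=(N, M))
evals = np.linalg.eigvalsh(W.T @ W / N)
lam_minus, lam_plus = marchenko_pastur_edges(Q=N / M)
print(f"MP edges:        [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"empirical range: [{evals.min():.3f}, {evals.max():.3f}]")
```

Overlaying marchenko_pastur_pdf on plt.hist(evals, bins=100, density=True) shows how closely a purely random layer follows the theory; trained layers are the ones that deviate.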
RMT: AlexNet
Marchenko-Pastur Bulk-decay | Heavy Tailed
FC1 (zoomed in)
FC2 (zoomed in)
Random Matrix Theory: Heavy Tailed
But if W is heavy tailed, the ESD will also have heavy tails
(i.e., it's all spikes; the bulk vanishes)
If W is strongly correlated, then the ESD can be modeled as if W is drawn
from a heavy tailed distribution
Nearly all pre-trained DNNs display heavy tails…as we shall soon see
Heavy Tailed RMT: Scale Free ESD
All large, well trained, modern DNNs exhibit heavy tailed self-regularization
AlexNet, VGG, ResNet, Inception, DenseNet, …
scale free
HT-SR Theory: 5+1 Phases of Training
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin, Michael W. Mahoney; JMLR 22(165):1−73, 2021.
Heavy Tailed RMT: Universality Classes
The familiar Wigner/MP Gaussian class is not the only Universality class in RMT
WeightWatcher: predict trends in generalization
Predict test accuracies across variations in hyper-parameters
The average Power Law exponent alpha
predicts generalization—at fixed depth


Smaller average-alpha is better
Better models are easier to treat
WeightWatcher: Shape vs Scale metrics
Purely norm-based (scale) metrics (from SLT) can be correlated with depth
but anti-correlated with hyper-parameter changes
WeightWatcher: treat architecture changes
Predict test accuracies across variations in hyper-parameters and depth
The alpha-hat metric combines shape and scale metrics
and corrects for different depths (grey line)
can be derived from theory…
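As a rough illustration of how shape and scale can be combined, the sketch below averages alpha times the log of the largest eigenvalue over layers, which is the form of the weighted alpha given in the published papers as I understand them. The base-10 logarithm, the function name, and the per-layer numbers are assumptions for illustration only:

```python
import numpy as np

def alpha_hat(alphas, lambda_maxes):
    """Average over layers of alpha_l * log10(lambda_max_l).

    alpha_l is the shape metric (power-law exponent) of layer l and
    lambda_max_l is the scale metric (largest eigenvalue of its ESD).
    """
    alphas = np.asarray(alphas, dtype=float)
    lambda_maxes = np.asarray(lambda_maxes, dtype=float)
    return float(np.mean(alphas * np.log10(lambda_maxes)))

# Hypothetical per-layer values, for illustration only
layer_alphas = [2.5, 3.0, 3.5]
layer_lambda_maxes = [10.0, 100.0, 1000.0]
print(alpha_hat(layer_alphas, layer_lambda_maxes))
```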
WeightWatcher: predict test accuracies
alpha-hat works for 100s of different CV and NLP models
(Nature Communications 2021)
We do not have access to the training or test data,
but we can still predict trends in the generalization
Predicting test accuracies: Heavy-tailed shape metrics
The heavy tailed metrics perform best
Weighted Alpha | Alpha (Schatten) Norm
ResNet, DenseNet, etc.
(Nature Communications 2021)
Experiments: just apply to pre-trained Models
LeNet5 (1998)
AlexNet (2012)
InceptionV3 (2014)
ResNet (2015)
…
DenseNet201 (2018)
https://medium.com/@siddharthdas_32104/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5
Conv2D → MaxPool → Conv2D → MaxPool → FC → FC
Predicting test accuracies: 100 pretrained models
The heavy tailed metrics perform best
https://github.com/osmr/imgclsmob
From an open source sandbox of
nearly 500 pretrained CV models
(picked >=5 models / regression)
Correlation Flow: CV Models
We can study correlation flow by looking at alpha vs. depth
VGG ResNet DenseNet
WeightWatcher: detect overfitting
Provide a data-independent criterion for early stopping
WeightWatcher: global and local convexity metrics
Smaller alpha corresponds to more convex energy landscapes
Transformers (alpha ~ 3-4 or more)
alpha 2-3 (or less)
"Rational Decisions, Random Matrices and Spin Glasses" (1998)
by Galluccio, Bouchaud, and Potters
When the layer alpha < 2, we think this means the layer is overfit
We suspect that the early layers of some Convolutional Nets
may be slightly overtrained (some alpha < 2)
This is predicted from our HTSR theory
WeightWatcher: scale and shape anomalies
We can detect problems in layers not detectable otherwise
Detect potential signatures of over-fitting
WeightWatcher: Correlation Traps
watcher.analyze(plot=True, randomize=True)
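One way to picture what randomizing a layer can reveal, as an illustrative sketch only (this is not the tool's implementation, and the function names and the 1.1 tolerance are hypothetical): shuffle the elements of W, which destroys learned correlations, and check whether any eigenvalue of the shuffled matrix still sits above the Marchenko-Pastur bulk edge. An anomalously large element survives the shuffle and shows up as such a spike.

```python
import numpy as np

def esd(W):
    """Eigenvalues of X = W^T W / N for an N x M matrix W."""
    N = W.shape[0]
    return np.linalg.eigvalsh(W.T @ W / N)

def count_randomized_spikes(W, rng, tol=1.1):
    """Shuffle the elements of W and count eigenvalues above the MP bulk edge.

    The edge uses the element variance of W and Q = N/M; tol leaves room
    for finite-size fluctuations at the edge.
    """
    N, M = W.shape
    W_rand = rng.permutation(W.ravel()).reshape(N, M)
    bulk_edge = W.var() * (1.0 + np.sqrt(M / N)) ** 2
    return int(np.sum(esd(W_rand) > tol * bulk_edge))

rng = np.random.default_rng(0)
W_clean = rng.normal(size=(1000, 500))
W_trap = W_clean.copy()
W_trap[0, 0] = 100.0  # plant one anomalously large element
print(count_randomized_spikes(W_clean, rng), count_randomized_spikes(W_trap, rng))
```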
WeightWatcher: SVDSharpness transform
Remove potential signatures of over-fitting
Like PAC-bounds Sharpness transform
Clips bad elements in W using RMT
smoothed_model = watcher.SVDSharpness(model=your_model)
WeightWatcher: RMT-based shape metrics
ww also includes predictive, non-parametric shape metrics
rand_distance = jensen_shannon_distance(original_esd, random_esd)
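A sketch of how such a distance could be computed from two sets of eigenvalues, reusing the function name from the slide. The binning scheme (a shared grid of 100 bins) and the base-2 logarithm are assumptions, and this is not necessarily how the tool computes it. It relies on scipy.spatial.distance.jensenshannon, which normalizes both histograms before comparing them:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def esd_histogram(evals, bins):
    """Histogram of eigenvalues on the given bin edges, as float counts."""
    counts, _ = np.histogram(evals, bins=bins)
    return counts.astype(float)

def jensen_shannon_distance(original_evals, random_evals, n_bins=100):
    """Jensen-Shannon distance between two ESDs binned on a common grid."""
    lo = min(original_evals.min(), random_evals.min())
    hi = max(original_evals.max(), random_evals.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p = esd_histogram(original_evals, bins)
    q = esd_histogram(random_evals, bins)
    # With base=2 the distance lies between 0 (identical) and 1 (disjoint)
    return float(jensenshannon(p, q, base=2))

# Compare a Gaussian layer with its element-shuffled version
rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 500))
W_rand = rng.permutation(W.ravel()).reshape(W.shape)
evals = np.linalg.eigvalsh(W.T @ W / W.shape[0])
evals_rand = np.linalg.eigvalsh(W_rand.T @ W_rand / W.shape[0])
print(jensen_shannon_distance(evals, evals_rand))
```

For a purely random layer the two ESDs look alike, so the distance stays small; the further a trained layer's ESD departs from its randomized counterpart, the larger the distance.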
the layer rand_distance and alpha metrics are correlated
WeightWatcher: interpreting shapes
very high accuracy requires advanced methods
WeightWatcher: more Power Law shape metrics
watcher.analyze(…, fit='TPL')
Truncated Power Law fits
watcher.analyze(…, fit='E_TPL')
weightwatcher provides several shape (and scale) metrics
plus several more unpublished experimental options
WeightWatcher: E_TPL shape metric
the E_TPL (and rand_distance) shape metrics track the learning curve epoch-by-epoch
Training MT transformers from scratch to SOTA
Extended Truncated Power Law
highly accurate results leverage the advanced shape metrics
Here, Lambda is the shape metric
WeightWatcher: why Power Law fits?
Spiking (i.e., real) neurons exhibit power law behavior
weightwatcher supports several PL fits from experimental neuroscience
plus totally new shape metrics we have invented (and published)
Spiking (i.e., real) neurons exhibit (truncated) power law behavior
The Critical Brain Hypothesis
Evidence of Self-Organized Criticality (SOC)
Per Bak (How Nature Works)
As neural systems become more complex
they exhibit power law behavior
and then truncated power law behavior
We see exactly this behavior in DNNs
and it is predictive of learning capacity
WeightWatcher: open-source, open-science
We are looking for early adopters and collaborators
github.com/CalculatedContent/WeightWatcher
We have a Slack channel to support the tool
Please file issues
Ping me to join
Statistical Mechanics
derivation of the alpha-hat metric
Classic Set Up: Student-Teacher model
"Statistical Mechanics of Learning" Engel & Van den Broeck (2001)
Generalization error ~ phase space volume
Average error ~ overlap between T and J
Standard approach:
• Teacher (T) and Student (J) random Perceptron vectors
• Treat data as an external random Gaussian field
• Apply Hubbard–Stratonovich to get mean-field result
• Assume continuous or discrete J
• Solve for the generalization error as a function of load (# data points / # parameters)
New Set Up: Matrix-generalized Student-Teacher
Continuous perceptron: uninteresting
Ising perceptron: Replica theory shows phase behavior, entropy collapse, etc.
“Towards a new theory…” Martin, Milletari, & Mahoney (in preparation)
real DNN matrices:
NxM
Strongly correlated
Heavy tailed
correlation matrices
Solve for total integrated phase-space volume
New approach: HCIZ Matrix Integrals
Write the overlap as
Fix a Teacher. The integral is now over all random students J that overlap w/T
Use the following result in RMT
“Asymptotics of HCIZ integrals …” Tanaka (2008)
RMT: Annealed vs Quenched averages
“A First Course in Random Matrix Theory” Potters and Bouchaud (2020)
good outside spin-glass phases
where system is trained well
We imagine averaging over all (random) student DNNs
with (correlations that) look like the teacher DNN
New interpretation: HCIZ Matrix Integrals
Generating functional
R-Transform (inverse Green’s function, via Contour Integral)
in terms of the Teacher's eigenvalues and the Student's cumulants
Some basic RMT: Green's functions
The Green's function is the Stieltjes transform of the eigenvalue distribution
Given the empirical spectral density (average eigenvalue density)
and using:
Some basic RMT: Moment generating functions
The Green's function has poles at the actual eigenvalues,
but is analytic in the complex plane up and away from the real axis
Expand in a series around z = ∞ to obtain the moment generating function
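For reference, in one common textbook convention (an assumption here, since signs and normalizations vary across references), the Green's function of the eigenvalue density, the inversion formula that recovers the density, and the large-z moment expansion read:

```latex
G(z) = \int \frac{\rho(\lambda)}{z - \lambda}\, d\lambda ,
\qquad
\rho(\lambda) = -\frac{1}{\pi} \lim_{\epsilon \to 0^{+}} \operatorname{Im}\, G(\lambda + i\epsilon),
\qquad
G(z) = \sum_{k=0}^{\infty} \frac{m_k}{z^{k+1}},
\quad
m_k = \int \lambda^{k} \rho(\lambda)\, d\lambda .
```

The expansion follows from writing 1/(z - λ) as a geometric series in λ/z, so the coefficients m_k are the moments of the ESD.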
Some basic RMT: R-Transforms
The free-cumulant-generating function (R-transform)
is related to the Green's function,
which gives a (similar) moment generating function
but which takes a simple form for both
Gaussian random and very Heavy Tailed (Levy) random matrices
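In the standard free-probability convention (assumed here; the symbols g and \kappa_k are introduced only for illustration), the R-transform is defined through the functional inverse of the Green's function, and its series coefficients are the free cumulants:

```latex
G\!\left( R(g) + \frac{1}{g} \right) = g ,
\qquad
R(g) = \sum_{k=1}^{\infty} \kappa_k \, g^{k-1} .
```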
Results: Gaussian Random Weight Matrices
“Random Matrix Theory (book)” Bouchaud and Potters (2020)
Recover the Frobenius Norm (squared) as the metric
Results: (very) Heavy Tailed Weight Matrices
“Heavy-tailed random matrices” Burda and Jurkiewicz (2009)
Recover a Schatten Norm, in terms of the Heavy Tailed exponent
Application to: Heavy Tailed Weight Matrices
Some reasonable approximations give the weighted alpha metric
Q.E.D.
Applied machine learning for search engine relevance 3
Charles Martin
 
Cc hass b school talk 2105
Cc hass b school talk  2105Cc hass b school talk  2105
Cc hass b school talk 2105
Charles Martin
 
CC Talk at Berekely
CC Talk at BerekelyCC Talk at Berekely
CC Talk at Berekely
Charles Martin
 

More from Charles Martin (11)

LLM avalanche June 2023.pdf
LLM avalanche June 2023.pdfLLM avalanche June 2023.pdf
LLM avalanche June 2023.pdf
 
Georgetown B-school Talk 2021
Georgetown B-school Talk  2021Georgetown B-school Talk  2021
Georgetown B-school Talk 2021
 
WeightWatcher Introduction
WeightWatcher IntroductionWeightWatcher Introduction
WeightWatcher Introduction
 
Building AI Products: Delivery Vs Discovery
Building AI Products: Delivery Vs Discovery Building AI Products: Delivery Vs Discovery
Building AI Products: Delivery Vs Discovery
 
Statistical Mechanics Methods for Discovering Knowledge from Production-Scale...
Statistical Mechanics Methods for Discovering Knowledge from Production-Scale...Statistical Mechanics Methods for Discovering Knowledge from Production-Scale...
Statistical Mechanics Methods for Discovering Knowledge from Production-Scale...
 
AI and Machine Learning for the Lean Start Up
AI and Machine Learning for the Lean Start UpAI and Machine Learning for the Lean Start Up
AI and Machine Learning for the Lean Start Up
 
Capsule Networks
Capsule NetworksCapsule Networks
Capsule Networks
 
Palo alto university rotary club talk Sep 29, 2107
Palo alto university rotary club talk Sep 29, 2107Palo alto university rotary club talk Sep 29, 2107
Palo alto university rotary club talk Sep 29, 2107
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3
 
Cc hass b school talk 2105
Cc hass b school talk  2105Cc hass b school talk  2105
Cc hass b school talk 2105
 
CC Talk at Berekely
CC Talk at BerekelyCC Talk at Berekely
CC Talk at Berekely
 

Recently uploaded

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

Weight watcher Bay Area ACM Feb 28, 2022

  • 1. calculation | consulting weightwatcher - a diagnostic tool for deep neural networks (TM) c|c (TM) charles@calculationconsulting.com
  • 2. calculation|consulting weightwatcher - a diagnostic tool for deep neural networks (TM) charles@calculationconsulting.com
  • 3. Who Are We? calculation | consulting: why deep learning works. Dr. Charles H. Martin, PhD. University of Chicago, Chemical Physics; NSF Fellow in Theoretical Chemistry, UIUC. Over 15 years experience in applied Machine Learning and AI. ML algos for: Aardvark, acquired by Google (2010); Demand Media (eHow), first $1B IPO since Google. Wall Street: BlackRock. Fortune 500: Roche, France Telecom. BigTech: eBay, Aardvark (Google), GoDaddy. Private Equity: Griffin Advisors. Alt. Energy: Anthropocene Institute (Page Family). www.calculationconsulting.com, charles@calculationconsulting.com
  • 4. Who Are We? Michael W. Mahoney. ICSI, RISELab, Dept. of Statistics, UC Berkeley. Algorithmic and statistical aspects of modern large-scale data analysis: large-scale machine learning, randomized linear algebra, geometric network analysis, scalable implicit regularization. PhD, Yale University, computational chemical physics. SAMSI National Advisory Committee. NRC Committee on the Analysis of Massive Data. Simons Institute Fall 2013 and 2018 program on the Foundations of Data. Biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. NSF/TRIPODS-funded Foundations of Data Analysis Institute at UC Berkeley. https://www.stat.berkeley.edu/~mmahoney/, mmahoney@stat.berkeley.edu
  • 5. Motivations: WeightWatcher Theory. Understanding deep learning requires rethinking generalization. The weightwatcher theory is a semi-empirical theory based on the Statistical Mechanics of Generalization, Random Matrix Theory, and the theory of Strongly Correlated Systems. Great nerdy stuff; however, I will be discussing the weightwatcher tool and what it can do for you.
  • 6. Open source tool: weightwatcher
  • 7. Open source tool: weightwatcher. https://github.com/CalculatedContent/WeightWatcher
      pip install weightwatcher
      …
      import weightwatcher as ww
      watcher = ww.WeightWatcher(model=model)
      results = watcher.analyze()
      watcher.get_summary()
      watcher.print_results()
  • 8. WeightWatcher: A diagnostic tool. Analyze trained or pre-trained PyTorch, TF/Keras, and ONNX models. Inspect models that are difficult to train. Gauge improvements in model performance. Predict test accuracies across different models. Detect problems when compressing or fine-tuning pretrained models. Install with: pip install weightwatcher
  • 9. Research: Implicit Self-Regularization in Deep Learning. Selected publications: (1) Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning; (2) Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks; (3) Workshop: Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks; (4) Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data; (more submitted and in progress today). Venues listed on the slide: (JMLR 2021) (ICML 2019) (KDD 2020) (Nature Communications 2021).
  • 10. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices. The tail of the ESD contains the information.
  • 11. ESD of DNNs: detailed insight into W. Empirical Spectral Density (ESD: eigenvalues of X):
      import keras
      import numpy as np
      import matplotlib.pyplot as plt
      …
      W = model.layers[i].get_weights()[0]
      N, M = W.shape  # shape is an attribute, not a method
      …
      X = np.dot(W.T, W) / N
      evals = np.linalg.eigvalsh(X)  # X is symmetric, so its eigenvalues are real
      plt.hist(evals, bins=100, density=True)  # the keyword is bins, not bin
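The ESD computation on slide 11 can be tried without a trained model. The sketch below is a minimal illustration, assuming a Gaussian random matrix as a stand-in for a layer weight matrix; the helper name `layer_esd` and the matrix size are hypothetical, not part of the weightwatcher API.

```python
import numpy as np

def layer_esd(W):
    """Eigenvalues of X = W^T W / N for a 2-D weight matrix W (oriented as N x M, N >= M)."""
    W = np.asarray(W, dtype=float)
    if W.shape[0] < W.shape[1]:
        W = W.T                      # orient so that N >= M
    N, M = W.shape
    X = W.T @ W / N                  # M x M correlation matrix
    return np.linalg.eigvalsh(X)     # real eigenvalues, ascending order

# Stand-in for a layer: a 300 x 100 Gaussian random matrix (hypothetical data)
rng = np.random.default_rng(0)
W = rng.normal(size=(300, 100))
evals = layer_esd(W)

# Histogram estimate of the ESD (the slide draws the same thing with plt.hist)
density, bin_edges = np.histogram(evals, bins=100, density=True)
```

Using `eigvalsh` rather than `eigvals` is a design choice for symmetric matrices: it returns real, sorted eigenvalues.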
  • 12. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices. The tool provides various plots, quality metrics, and transforms: details_df = watcher.analyze(model=your_model)
  • 13. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices: watcher.analyze(…, plot=True, …)
  • 14. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices. Well-trained layers are heavy-tailed and well shaped. Example (GPT-2): watcher.analyze(plot=True). The ESD fits a Power Law (or Truncated Power Law) with alpha in [2, 6] and a good quality of fit (D is small).
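To make alpha and D on slide 14 concrete, here is a minimal sketch of a power-law tail fit. It assumes the tail start `xmin` is given and uses the standard continuous maximum-likelihood estimator with a Kolmogorov-Smirnov distance; the function name `fit_power_law_tail` is hypothetical, and the tool's own fitting procedure (including how it selects `xmin`) may differ.

```python
import numpy as np

def fit_power_law_tail(evals, xmin):
    """Fit p(x) ~ x^(-alpha) for x >= xmin; return (alpha, D)."""
    evals = np.asarray(evals, dtype=float)
    tail = np.sort(evals[evals >= xmin])
    n = tail.size
    # Continuous maximum-likelihood estimate of the exponent
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))
    # Kolmogorov-Smirnov distance between the empirical and fitted CDFs
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    ecdf_hi = np.arange(1, n + 1) / n
    ecdf_lo = np.arange(0, n) / n
    D = max(np.max(np.abs(ecdf_hi - model_cdf)),
            np.max(np.abs(ecdf_lo - model_cdf)))
    return alpha, D

# Synthetic check: draw samples from a pure power law with alpha = 3
rng = np.random.default_rng(1)
true_alpha, xmin = 3.0, 1.0
u = rng.random(20000)
samples = xmin * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))  # inverse-CDF sampling
alpha_hat, D = fit_power_law_tail(samples, xmin)
```

On data that really follows a power law, the estimate lands near the true exponent and D is small, which is the sense of "good quality of fit" on the slide.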
  • 15. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices. Better-trained layers are more heavy-tailed and better shaped (GPT vs. GPT-2).
  • 16. Heavy Tailed Metrics: GPT vs GPT2. The original GPT is poorly trained on purpose; GPT2 is well trained. Alpha for every layer: smaller alpha is better; large alphas are bad fits.
  • 17. Power Law Universality: ImageNet. All ImageNet models display remarkable Heavy Tailed Universality: 500 matrices, ~50 architectures, Linear layers & Conv2D feature maps; 80-90% have alpha < 4.
  • 18. Random Matrix Theory: detailed insight into W_L. DNN training induces a breakdown of Gaussian random structure and the onset of a new kind of heavy tailed self-regularization. Panels: Gaussian random matrix; Bulk + Spikes (small, older NNs); Heavy Tailed (large, modern DNNs and/or small batch sizes).
  • 19. Random Matrix Theory: Marchenko-Pastur. RMT says that if W is a simple random Gaussian matrix, then the ESD will have a very simple, known form. The shape depends on Q = N/M (and variance ~ 1). Eigenvalues are tightly bounded, with very crisp edges plus Tracy-Widom fluctuations; a few spikes may appear.
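The Marchenko-Pastur form mentioned on slide 19 can be checked numerically. The sketch below assumes the standard result for X = WᵀW/N with W an N x M matrix of i.i.d. entries of variance sigma2 and Q = N/M >= 1; the function names and matrix sizes are hypothetical.

```python
import numpy as np

def mp_edges(Q, sigma2=1.0):
    """Bulk edges of the Marchenko-Pastur law for Q = N/M >= 1."""
    lam_minus = sigma2 * (1.0 - 1.0 / np.sqrt(Q)) ** 2
    lam_plus = sigma2 * (1.0 + 1.0 / np.sqrt(Q)) ** 2
    return lam_minus, lam_plus

def mp_density(lam, Q, sigma2=1.0):
    """Marchenko-Pastur density evaluated on a 1-D array of eigenvalues."""
    lam = np.asarray(lam, dtype=float)
    lam_minus, lam_plus = mp_edges(Q, sigma2)
    inside = (lam > lam_minus) & (lam < lam_plus)
    rho = np.zeros_like(lam)
    rho[inside] = (Q / (2.0 * np.pi * sigma2)
                   * np.sqrt((lam_plus - lam[inside]) * (lam[inside] - lam_minus))
                   / lam[inside])
    return rho

# Gaussian random matrix with Q = 4: eigenvalues should sit inside the bulk edges
rng = np.random.default_rng(2)
N, M = 1000, 250
W = rng.normal(size=(N, M))
evals = np.linalg.eigvalsh(W.T @ W / N)
lam_minus, lam_plus = mp_edges(N / M)

# The density should integrate to 1 over the bulk (trapezoid rule)
grid = np.linspace(lam_minus, lam_plus, 20001)
rho = mp_density(grid, N / M)
integral = np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(grid))
```

For a finite matrix the extreme eigenvalues fall slightly off the theoretical edges, which is the Tracy-Widom fluctuation the slide refers to.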
  • 20. RMT: AlexNet. Marchenko-Pastur Bulk-decay | Heavy Tailed. Panels: FC1 zoomed in, FC2 zoomed in.
  • 21. Random Matrix Theory: Heavy Tailed. But if W is heavy tailed, the ESD will also have heavy tails (i.e., it's all spikes; the bulk vanishes). If W is strongly correlated, then the ESD can be modeled as if W is drawn from a heavy tailed distribution. Nearly all pre-trained DNNs display heavy tails, as we shall soon see.
  • 22. Heavy Tailed RMT: Scale Free ESD. All large, well trained, modern DNNs (AlexNet, VGG, ResNet, Inception, DenseNet, …) exhibit scale-free heavy tailed self-regularization.
  • 23. HT-SR Theory: 5+1 Phases of Training. Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning. Charles H. Martin, Michael W. Mahoney; JMLR 22(165):1-73, 2021.
  • 24. Heavy Tailed RMT: Universality Classes. The familiar Wigner/MP Gaussian class is not the only Universality class in RMT.
  • 25. WeightWatcher: predict trends in generalization. Predict test accuracies across variations in hyper-parameters. The average Power Law exponent alpha predicts generalization at fixed depth. Smaller average-alpha is better. Better models are easier to treat.
  • 26. WeightWatcher: Shape vs Scale metrics. Purely norm-based (scale) metrics (from SLT) can be correlated with depth but anti-correlated with hyper-parameter changes.
  • 27. WeightWatcher: treat architecture changes. Predict test accuracies across variations in hyper-parameters and depth. The alpha-hat metric combines shape and scale metrics and corrects for different depths (grey line); it can be derived from theory…
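As an illustration of how a shape metric (alpha) and a scale metric (the largest eigenvalue) can be combined as on slide 27, the sketch below averages alpha times log10 of the largest eigenvalue over layers. This specific form is an assumption based on the weighted alpha described in the cited papers; the function name `alpha_hat` and the example numbers are hypothetical, and the tool's own computation may differ.

```python
import numpy as np

def alpha_hat(alphas, lambda_maxes):
    """Layer-averaged weighted alpha: mean over layers of alpha_l * log10(lambda_max_l)."""
    alphas = np.asarray(alphas, dtype=float)
    lambda_maxes = np.asarray(lambda_maxes, dtype=float)
    return float(np.mean(alphas * np.log10(lambda_maxes)))

# Two hypothetical layers: (alpha=2, lambda_max=10) and (alpha=4, lambda_max=100)
value = alpha_hat([2.0, 4.0], [10.0, 100.0])   # (2*1 + 4*2) / 2
```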
  • 28. WeightWatcher: predict test accuracies. alpha-hat works for 100s of different CV and NLP models (Nature Communications 2021). We do not have access to the training or test data, but we can still predict trends in the generalization.
  • 29. Predicting test accuracies: Heavy-tailed shape metrics. The heavy tailed metrics perform best: Weighted Alpha and Alpha (Schatten) Norm.
  • 30. WeightWatcher: predict test accuracies. ResNet, DenseNet, etc. (Nature Communications 2021).
  • 32. Experiments: just apply to pre-trained Models. LeNet5 (1998), AlexNet (2012), InceptionV3 (2014), ResNet (2015), …, DenseNet201 (2018). Figure: Conv2D, MaxPool, Conv2D, MaxPool, FC, FC. Source: https://medium.com/@siddharthdas_32104/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5
  • 33. Predicting test accuracies: 100 pretrained models. The heavy tailed metrics perform best. From an open source sandbox of nearly 500 pretrained CV models (picked >=5 models / regression): https://github.com/osmr/imgclsmob
  • 34. Correlation Flow: CV Models. We can study correlation flow by looking at alpha vs. depth (VGG, ResNet, DenseNet).
  • 35. WeightWatcher: detect overfitting. Provides a data-independent criterion for early stopping.
  • 36. WeightWatcher: global and local convexity metrics. Smaller alpha corresponds to more convex energy landscapes. Slide labels: Transformers (alpha ~ 3-4 or more); alpha 2-3 (or less). Reference: “Rational Decisions, Random Matrices and Spin Glasses” (1998) by Galluccio, Bouchaud, and Potters.
  • 37. WeightWatcher: global and local convexity metrics. When the layer alpha < 2, we think this means the layer is overfit. We suspect that the early layers of some Convolutional Nets may be slightly overtrained (some alpha < 2). This is predicted from our HTSR theory.
  • 38. WeightWatcher: scale and shape anomalies. We can detect problems in layers not detectable otherwise.
  • 39. WeightWatcher: Correlation Traps. Detect potential signatures of over-fitting: watcher.analyze(plot=True, randomize=True)
  • 40. WeightWatcher: SVDSharpness transform. Remove potential signatures of over-fitting. Like the Sharpness transform from PAC bounds, it clips bad elements in W using RMT: smoothed_model = watcher.SVDSharpness(model=your_model)
  • 41. WeightWatcher: RMT-based shape metrics. ww also includes predictive, non-parametric shape metrics: rand_distance = jensen_shannon_distance(original_esd, random_esd)
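The rand_distance idea on slide 41 (compare a layer's ESD with the ESD of a randomized copy) can be sketched as follows. Everything concrete here is an assumption for illustration: the randomization is an element-wise shuffle of W, the two ESDs are histogrammed on shared bins of log10 eigenvalues, and the Jensen-Shannon distance is computed in base 2. The helper names are hypothetical and the tool's internals may differ.

```python
import numpy as np

def esd(W):
    """Eigenvalues of W^T W / N, with W oriented so that N >= M."""
    if W.shape[0] < W.shape[1]:
        W = W.T
    return np.linalg.eigvalsh(W.T @ W / W.shape[0])

def js_distance(p, q):
    """Jensen-Shannon distance (base 2, in [0, 1]) between two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                 # 0 * log(0) is treated as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)))

def rand_distance(W, rng, bins=20):
    """Distance between the ESD of W and the ESD of an element-wise shuffled copy."""
    shuffled = rng.permutation(W.ravel()).reshape(W.shape)
    original_esd = np.log10(np.clip(esd(W), 1e-12, None))
    random_esd = np.log10(np.clip(esd(shuffled), 1e-12, None))
    edges = np.histogram_bin_edges(np.concatenate([original_esd, random_esd]), bins=bins)
    p, _ = np.histogram(original_esd, bins=edges)
    q, _ = np.histogram(random_esd, bins=edges)
    return js_distance(p.astype(float), q.astype(float))

rng = np.random.default_rng(3)
W_random = rng.normal(size=(400, 200))                      # no learned structure
W_correlated = (rng.normal(size=(400, 5)) @ rng.normal(size=(5, 200))
                + 0.1 * rng.normal(size=(400, 200)))        # low rank plus noise
d_random = rand_distance(W_random, rng)
d_correlated = rand_distance(W_correlated, rng)
```

A matrix with no structure looks much like its shuffled copy, so its distance is small; a strongly correlated matrix does not, so its distance is larger.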
  • 42. WeightWatcher: RMT-based shape metrics. The layer rand_distance and alpha metrics are correlated.
  • 43. WeightWatcher: interpreting shapes. Very high accuracy requires advanced methods. Figure labels: hard, easy.
  • 44. WeightWatcher: more Power Law shape metrics. weightwatcher provides several shape (and scale) metrics plus several more unpublished experimental options. Truncated Power Law fits: watcher.analyze(…, fit='TPL') or fit='E_TPL'.
  • 45. WeightWatcher: E_TPL shape metric. The E_TPL (Extended Truncated Power Law) and rand_distance shape metrics track the learning curve epoch-by-epoch. Example: training MT transformers from scratch to SOTA. Highly accurate results leverage the advanced shape metrics. Here, Lambda is the shape metric.
  • 46. WeightWatcher: why Power Law fits? Spiking (i.e., real) neurons exhibit power law behavior. weightwatcher supports several PL fits from experimental neuroscience plus totally new shape metrics we have invented (and published).
  • 47. WeightWatcher: why Power Law fits? Spiking (i.e., real) neurons exhibit (truncated) power law behavior. The Critical Brain Hypothesis: evidence of Self-Organized Criticality (SOC); Per Bak (How Nature Works). As neural systems become more complex, they exhibit power law behavior and then truncated power law behavior. We see exactly this behavior in DNNs, and it is predictive of learning capacity.
  • 48. WeightWatcher: open-source, open-science. We are looking for early adopters and collaborators: github.com/CalculatedContent/WeightWatcher. Please file issues. We have a Slack channel to support the tool; ping me to join.
  • 49. Statistical Mechanics derivation of the alpha-hat metric
  • 50. Classic Set Up: Student-Teacher model. Statistical Mechanics of Learning, Engel & Van den Broeck (2001). Generalization error ~ phase space volume. Average error ~ overlap between T and J.
  • 51. Classic Set Up: Student-Teacher model. Statistical Mechanics of Learning, Engel & Van den Broeck (2001). Standard approach: Teacher (T) and Student (J) are random Perceptron vectors; treat the data as an external random Gaussian field; apply Hubbard-Stratonovich to get a mean-field result; assume continuous or discrete J; solve for the generalization error as a function of load (# data points / # parameters).
  • 52. New Set Up: Matrix-generalized Student-Teacher. Continuous perceptron: uninteresting. Ising Perceptron: replica theory shows phase behavior, entropy collapse, etc.
  • 53. New Set Up: Matrix-generalized Student-Teacher. “Towards a new theory…” Martin, Milletari, & Mahoney (in preparation). Real DNN matrices: NxM, strongly correlated, heavy tailed correlation matrices. Solve for the total integrated phase-space volume.
  • 54. New approach: HCIZ Matrix Integrals. Write the overlap as a matrix expression (equation not captured in the transcript). Fix a Teacher. The integral is now over all random students J that overlap w/T. Use the following result in RMT: “Asymptotics of HCIZ integrals …” Tanaka (2008).
  • 55. RMT: Annealed vs Quenched averages. “A First Course in Random Matrix Theory” Potters and Bouchaud (2020). We imagine averaging over all (random) student DNNs with (correlations that) look like the teacher DNN. Good outside spin-glass phases, where the system is trained well.
  • 56. New interpretation: HCIZ Matrix Integrals. Generating functional; R-Transform (inverse Green’s function, via Contour Integral), in terms of the Teacher's eigenvalues and the Student’s cumulants.
  • 57. Some basic RMT: Green’s functions. The Green’s function is the Stieltjes transform of the eigenvalue distribution, given the empirical spectral density (average eigenvalue density).
  • 58. Some basic RMT: Moment generating functions. The Green’s function has poles at the actual eigenvalues, but is analytic in the complex plane up and away from the real axis. Expand in a series around z = ∞: moment generating function.
  • 59. Some basic RMT: R-Transforms. The free-cumulant-generating function (R-transform) is related to the Green’s function, which gives a (similar) moment generating function, but which takes a simple form for both Gaussian random and very Heavy Tailed (Levy) random matrices.
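The equations for slides 57 to 59 did not survive in the transcript. For orientation, the standard textbook relations are written out below; the notation (rho for the spectral density, m_k for moments, kappa_k for free cumulants) and the sign convention are assumptions and may not match the slides.

```latex
\begin{align}
G(z) &= \int \frac{\rho(\lambda)}{z-\lambda}\,d\lambda, &
\rho(\lambda) &= \frac{1}{\pi}\lim_{\epsilon\to 0^{+}} \operatorname{Im}\, G(\lambda - i\epsilon), \\
G(z) &= \sum_{k=0}^{\infty} \frac{m_k}{z^{k+1}}, &
m_k &= \int \lambda^{k}\,\rho(\lambda)\,d\lambda, \\
R\bigl(G(z)\bigr) + \frac{1}{G(z)} &= z, &
R(g) &= \sum_{k=1}^{\infty} \kappa_k\, g^{k-1}.
\end{align}
```

As one example of the "simple form" for the Gaussian case, a Wigner matrix with variance \sigma^2 has R(g) = \sigma^2 g: only the second free cumulant is non-zero.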
  • 60. Results: Gaussian Random Weight Matrices. “Random Matrix Theory (book)” Bouchaud and Potters (2020). Recover the Frobenius Norm (squared) as the metric.
  • 61. Results: (very) Heavy Tailed Weight Matrices. “Heavy-tailed random matrices” Burda and Jurkiewicz (2009). Recover a Schatten Norm, in terms of the Heavy Tailed exponent.
  • 62. Application to: Heavy Tailed Weight Matrices. Some reasonable approximations give the weighted alpha metric. Q.E.D.