Description: WeightWatcher (WW) is an open-source diagnostic tool for analyzing Deep Neural Networks (DNNs) without needing access to training or even test data. It can be used to: analyze pre-trained PyTorch and Keras DNN models (Conv2D and Dense layers); monitor models, and their layers, to see if they are over-trained or over-parameterized; predict test accuracies across different models, with or without training data; and detect potential problems when compressing or fine-tuning pre-trained models. See https://weightwatcher.ai
by Harald Steck (Netflix Inc., US), Roelof van Zwol (Netflix Inc., US) and Chris Johnson (Spotify Inc., US)
Slides of the tutorial on interactive recommender systems at the 2015 conference on Recommender Systems (RecSys).
Interactive recommender systems enable the user to steer the received recommendations in the desired direction through explicit interaction with the system. In the larger ecosystem of recommender systems used on a website, it is positioned between a lean-back recommendation experience and an active search for a specific piece of content. Besides this aspect, we will discuss several parts that are especially important for interactive recommender systems, including the following: design of the user interface and its tight integration with the algorithm in the back-end; computational efficiency of the recommender algorithm; as well as choosing the right balance between exploiting the feedback from the user as to provide relevant recommendations, and enabling the user to explore the catalog and steer the recommendations in the desired direction.
In particular, we will explore the field of interactive video and music recommendations and their application at Netflix and Spotify. We outline some of the user-experiences built, and discuss the approaches followed to tackle the various aspects of interactive recommendations. We present our insights from user studies and A/B tests.
The tutorial targets researchers and practitioners in the field of recommender systems, and will give the participants a unique opportunity to learn about the various aspects of interactive recommender systems in the video and music domain. The tutorial assumes familiarity with the common methods of recommender systems.
DATE: Wednesday, Sept 16, 2015, 11:00-12:30
Talk with Yves Raimond at the GPU Tech Conference on March 28, 2018 in San Jose, CA.
Abstract:
In this talk, we will survey how Deep Learning methods can be applied to personalization and recommendations. We will cover why standard Deep Learning approaches don't perform better than typical collaborative filtering techniques. Then we will go over recently published research at the intersection of Deep Learning and recommender systems, looking at how it integrates new types of data, explores new models, or changes the recommendation problem statement. We will also highlight some of the ways that neural networks are used at Netflix and how we can use GPUs to train recommender systems. Finally, we will highlight promising new directions in this space.
A talk on Transformers at GDG DevParty
27.06.2020
Link to Google Slides version: https://docs.google.com/presentation/d/1N7ayCRqgsFO7TqSjN4OWW-dMOQPT5DZcHXsZvw8-6FU/edit?usp=sharing
This document outlines key concepts in recommendation systems. It begins by defining the traditional recommender problem as predicting user ratings for items based on past behavior and relationships. It then discusses lessons learned from the Netflix Prize competition, including the effectiveness of singular value decomposition and the limitations of models designed only for rating prediction. The document outlines approaches beyond rating prediction, including ranking, similarity, social recommendations, and explore/exploit tradeoffs. It discusses optimizing recommendation pages and using higher-order models like tensor factorization. In summary, it provides an overview of traditional and modern approaches in recommendation systems.
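The factorization idea behind models like the SVD variants popularized by the Netflix Prize can be illustrated with a minimal sketch. This is a pure-Python toy (hypothetical ratings, rank-2 factors), not any system's actual implementation: latent user and item factors are fit with stochastic gradient descent on the observed ratings only.

```python
import random

# Tiny user x item rating matrix (0 = unobserved); hypothetical data.
R = [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [0, 1, 5, 4]]

K, LR, REG, EPOCHS = 2, 0.01, 0.02, 2000
random.seed(0)
P = [[random.random() for _ in range(K)] for _ in range(len(R))]     # user factors
Q = [[random.random() for _ in range(K)] for _ in range(len(R[0]))]  # item factors

for _ in range(EPOCHS):
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            if r == 0:
                continue  # train only on observed ratings
            err = r - sum(P[u][k] * Q[i][k] for k in range(K))
            for k in range(K):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += LR * (err * qi - REG * pu)
                Q[i][k] += LR * (err * pu - REG * qi)

# Squared error on the observed ratings should now be small.
sse = sum((r - sum(P[u][k] * Q[i][k] for k in range(K))) ** 2
          for u, row in enumerate(R) for i, r in enumerate(row) if r)
print(round(sse, 4))
```

Predicting an unobserved cell is then just the dot product of the corresponding user and item factor vectors, which is exactly the "rating prediction" framing the lessons above caution against relying on exclusively.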
Deep Natural Language Processing for Search and Recommender Systems, by Huiji Gao
Tutorial for KDD 2019:
Search and recommender systems process rich natural language text data such as user queries and documents. Achieving high-quality search and recommendation results requires processing and understanding such information effectively and efficiently, which is where natural language processing (NLP) technologies are widely deployed. In recent years, rapidly developing deep learning models have proven successful at improving various NLP tasks, indicating their great potential for improving search and recommender systems.
In this tutorial, we summarize the current effort of deep learning for NLP in search/recommender systems. We first give an overview of search/recommender systems with NLP, then introduce basic concepts of deep learning for NLP, covering state-of-the-art technologies in both language understanding and language generation. After that, we share our hands-on experience with LinkedIn applications. In the end, we highlight several important future trends.
Modern machine learning (ML) workloads, such as deep learning and large-scale model training, are compute-intensive and require distributed execution. Ray is an open-source, distributed framework from U.C. Berkeley’s RISELab that easily scales Python applications and ML workloads from a laptop to a cluster, with an emphasis on the unique performance challenges of ML/AI systems. It is now used in many production deployments.
This talk will give an overview of Ray, covering its architecture, core concepts, and primitives such as remote Tasks and Actors; briefly discuss Ray's native libraries (Ray Tune, Ray Train, Ray Serve, Ray Datasets, RLlib); and survey Ray's growing ecosystem.
Through a demo using XGBoost for classification, we will demonstrate how you can scale training, hyperparameter tuning, and inference from a single node to a cluster, with tangible performance differences when using Ray.
The takeaways from this talk are:
Learn Ray architecture, core concepts, and Ray primitives and patterns
Why distributed computing will be the norm, not an exception
How to scale your ML workloads with Ray libraries:
Training on a single node vs. Ray cluster, using XGBoost with/without Ray
Hyperparameter search and tuning, using XGBoost with Ray Tune
Inference at scale, using XGBoost with/without Ray
The Netflix experience is driven by a number of Machine Learning algorithms: personalized ranking, page generation, search, similarity, ratings, etc. On the 6th of January, we simultaneously launched Netflix in 130 new countries around the world, which brings the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this post, we highlight the four most interesting challenges we’ve encountered in making our algorithms operate globally and, most importantly, how this improved our ability to connect members worldwide with stories they'll love.
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING (MLReview)
This document proposes using meta-learning and an LSTM model to learn an optimization algorithm for few-shot learning. The model, called a meta-learner, is trained on multiple datasets to learn how to efficiently train a learner network on new small datasets. The meta-learner LSTM models the parameter updates of the learner network during training, learning an initialization and update rule. The inputs to the meta-learner are the loss, parameters, and gradient, and it outputs updated parameters. This learned update rule can then be used to train the learner network on new small datasets, enabling few-shot learning using only a small amount of labeled data.
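The learned update rule can be sketched in miniature. The following pure-Python toy replaces the paper's LSTM with fixed gate constants on a scalar parameter and a quadratic loss (all hypothetical choices), just to show the shape of the update the meta-learner produces:

```python
# The meta-learner replaces plain gradient descent
#   theta_t = theta_{t-1} - lr * grad
# with an LSTM-cell-style update with learned gates:
#   theta_t = f_t * theta_{t-1} + i_t * (-lr * grad)
# Here the gates are fixed constants for illustration; in the paper they
# are produced by the meta-learner LSTM from the loss, gradient, and
# current parameters.

def toy_loss_grad(theta):
    # Toy quadratic loss L(theta) = (theta - 3)^2 and its gradient.
    return (theta - 3.0) ** 2, 2.0 * (theta - 3.0)

def meta_update(theta, grad, forget_gate=0.95, input_gate=1.0, lr=0.1):
    # forget_gate < 1 shrinks the old parameters (weight-decay-like);
    # input_gate scales the gradient step (learning-rate-like).
    return forget_gate * theta + input_gate * (-lr * grad)

theta = 0.0
for _ in range(100):
    loss, grad = toy_loss_grad(theta)
    theta = meta_update(theta, grad)

final_loss, _ = toy_loss_grad(theta)
print(round(final_loss, 4))  # the gated update still drives the loss down
```

In the actual method, the gates are learned across many small training tasks, so both the effective learning rate and the effective weight decay are tuned to make a few update steps on a new small dataset as productive as possible.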
This document summarizes an approach for scaling implicit matrix factorization to large datasets using Apache Spark. It discusses three attempts at implementing alternating least squares for collaborative filtering in Spark. The first two attempts shuffle data across nodes on each iteration. The third attempt partitions and caches the user/item vectors, then builds mappings to join local blocks of data and update vectors within each partition, avoiding shuffles between iterations for more efficient distributed computation.
These slides explain how Convolutional Neural Networks can be coded using Google TensorFlow.
Video available at : https://www.youtube.com/watch?v=EoysuTMmmMc
What really are recommendations engines nowadays?
This presentation introduces the foundations of recommendation algorithms and covers common approaches as well as some of the most advanced techniques. Although it focuses more on efficiency than on theoretical properties, it uses basics of matrix algebra and optimization-based machine learning throughout.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Square (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Problematics
4.2 Solutions
4.3 Tools
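As a concrete taste of the item-item approach listed above, here is a minimal sketch (pure Python, hypothetical ratings) of cosine similarity between item rating vectors, the core computation behind "people who liked X also liked Y":

```python
import math

# Rows are users, columns are items (0 = unrated); hypothetical data.
ratings = [[5, 3, 0],
           [4, 0, 4],
           [1, 1, 5],
           [0, 5, 4]]

def item_vector(j):
    # Column j of the rating matrix: every user's rating of item j.
    return [row[j] for row in ratings]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Similarity of item 0 to every other item; in an item-item recommender
# the top entries of this list are what get recommended.
sims = [(j, cosine(item_vector(0), item_vector(j))) for j in range(1, 3)]
print(sims)
```

In production this all-pairs computation is the expensive part, which is why the "In Production" section above matters: approximate nearest-neighbor indexes and precomputed similarity tables typically replace the brute-force loop.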
This document provides an overview of Scala data pipelines at Spotify. It discusses:
- The speaker's background and Spotify's scale with over 75 million active users.
- Spotify's music recommendation systems including Discover Weekly and personalized radio.
- How Scala and frameworks like Scalding, Spark, and Crunch are used to build data pipelines for tasks like joins, aggregations, and machine learning algorithms.
- Techniques for optimizing pipelines including distributed caching, bloom filters, and Parquet for efficient storage and querying of large datasets.
- The speaker's success in migrating over 300 jobs from Python to Scala and growing the team of engineers building Scala pipelines at Spotify.
These are the slides of a talk about some of our research at Spotify, as part of the celebration kickoff of Chalmers AI Research Centre in Gothenburg. I always like to make a story in my talk, and this time I wanted to reflect on the "push" (think recommender system) and "pull" (think search) paradigms. I am using this quote from Nicholas Belkin and Bruce Croft from their Communications of the ACM article published in 1992 to frame my story: "We conclude that information retrieval and information filtering are indeed two sides of the same coin. They work together to help people get the information needed to perform their tasks."
This document discusses fine-tuning the BERT model with PyTorch and the Transformers library. It provides an overview of BERT, how it was trained, its special tokens, the Transformers library, preprocessing text for BERT, using the BertModel class, the approach to fine-tuning BERT for a task, creating a dataset and data loaders, and training and validating the model.
Haystack 2018 - Algorithmic Extraction of Keywords, Concepts, and Vocabularies (Max Irwin)
Presentation as given to the Haystack Conference, which outlines research and techniques for automatic extraction of keywords, concepts, and vocabularies from text corpora.
This document proposes a calibrated recommendations approach that aims to provide recommendations that reflect all of a user's interests in correct proportions. Standard recommender systems trained for accuracy can lead to unbalanced recommendations that amplify a user's main interests and crowd out lesser interests. The calibrated recommendations approach uses a post-processing re-ranking step to optimize a submodular calibration metric, balancing accuracy and fairness by recommending items from all a user's interests in their correct proportions. Experiments on MovieLens data show that calibration can be improved significantly without degrading accuracy much.
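The re-ranking step described above can be sketched as a greedy maximization of a calibration-regularized score. The following pure-Python toy (hypothetical genre distributions, scores, and trade-off weight) follows the spirit of the objective: relevance minus a KL-divergence penalty between the user's interest distribution and the recommended list's genre mix:

```python
import math

# User's historical interest distribution over genres (hypothetical).
p = {"drama": 0.7, "comedy": 0.3}

# Candidate items as (relevance score, genre); hypothetical data.
candidates = [(0.9, "drama"), (0.85, "drama"), (0.8, "drama"),
              (0.5, "comedy"), (0.45, "comedy")]

LAMBDA = 0.5  # trade-off between accuracy and calibration

def genre_dist(items):
    q = {}
    for _, g in items:
        q[g] = q.get(g, 0) + 1 / len(items)
    return q

def kl(p, q, eps=1e-9):
    # KL(p || q) between the user's interests and the list's genre mix.
    return sum(pv * math.log(pv / (q.get(g, 0.0) + eps)) for g, pv in p.items())

def objective(items):
    rel = sum(s for s, _ in items)
    return (1 - LAMBDA) * rel - LAMBDA * kl(p, genre_dist(items))

# Greedy selection of a 3-item list; submodularity of the objective is
# what gives the greedy algorithm its approximation guarantee.
selected, pool = [], list(candidates)
for _ in range(3):
    best = max(pool, key=lambda it: objective(selected + [it]))
    selected.append(best)
    pool.remove(best)

print([g for _, g in selected])  # a comedy item makes the cut despite lower scores
```

Ranking purely by score would return three drama items; the calibration term forces a comedy item into the list so the 70/30 interest split is roughly reflected, which is exactly the crowding-out effect the approach is designed to correct.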
How to Win Machine Learning Competitions? (HackerEarth)
This presentation was given by Marios Michailidis (a.k.a. Kazanova), then Kaggle Rank #3, to help the community learn machine learning better. It comprises useful ML tips and techniques for performing better in machine learning competitions. Read the full blog: http://blog.hackerearth.com/winning-tips-machine-learning-competitions-kazanova-current-kaggle-3
Overview of the recommender (recommendation) system. RFM concepts in brief. Collaborative filtering, both item-based and user-based. Content-based recommendation is also described, as are product-association recommender systems. Stereotype recommendation is covered with its advantages and limitations, along with customer lifetime value and the recommender-system analysis and solving cycle.
Challenges and Solutions in Group Recommender Systems (Ludovico Boratto)
The document discusses group recommender systems. It begins with an overview of recommender systems principles and introduces the concept of group recommendation. It then outlines several key tasks in group recommendation systems, including defining different types of groups, acquiring preferences, modeling groups, predicting ratings, helping groups reach consensus, and explaining recommendations to groups. The document provides examples of approaches used in existing systems for each of these tasks. It also surveys common techniques for modeling groups, such as additive utilitarian, multiplicative utilitarian, Borda count, and Copeland rule strategies.
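The aggregation strategies named above differ in how they trade one member's preferences against another's, which a small sketch (pure Python, hypothetical group ratings) makes visible; the rank-based Borda and Copeland strategies are omitted here for brevity:

```python
# Each row: one group member's ratings of candidate items A, B, C (hypothetical).
ratings = {
    "alice": {"A": 10, "B": 6, "C": 4},
    "bob":   {"A": 1,  "B": 7, "C": 8},
    "carol": {"A": 10, "B": 6, "C": 5},
}
items = ["A", "B", "C"]

# Additive utilitarian: rank items by the sum of member ratings.
additive = {i: sum(r[i] for r in ratings.values()) for i in items}

# Multiplicative utilitarian: rank items by the product of member ratings.
mult = {i: 1 for i in items}
for r in ratings.values():
    for i in items:
        mult[i] *= r[i]

# Least misery: rank items by the unhappiest member's rating.
misery = {i: min(r[i] for r in ratings.values()) for i in items}

picks = {name: max(d, key=d.get)
         for name, d in [("additive", additive),
                         ("multiplicative", mult),
                         ("least_misery", misery)]}
print(picks)  # additive picks A (high average masks bob's dislike); the others pick B
```

The divergence is the point: additive aggregation lets two enthusiastic members outvote one miserable one, while multiplicative and least-misery strategies treat a single very low rating as close to a veto, which is why consensus support matters in group recommenders.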
This document provides an overview of learning to rank search results. It discusses how search involves understanding queries and systems to retrieve relevant documents. Ranking search results is framed as a learning problem where machine learning models are trained on human-labeled data. The document compares three approaches to learning to rank - pointwise, pairwise, and listwise - and notes that listwise is preferred as it directly optimizes ranked lists while avoiding issues of the other methods. It also addresses challenges in collecting unbiased training data from click logs to train ranking models.
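The listwise idea can be made concrete with a small sketch (pure Python, hypothetical scores and graded labels) of a ListNet-style top-one softmax cross-entropy, which scores the whole ranked list at once rather than individual documents or pairs:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def listwise_loss(scores, labels):
    # Cross-entropy between the label distribution and the model's score
    # distribution over the whole list (ListNet's top-one model).
    p = softmax(labels)   # "ideal" distribution from human relevance labels
    q = softmax(scores)   # model's distribution from predicted scores
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

labels = [3.0, 1.0, 0.0]  # graded relevance for three documents

good = listwise_loss([2.5, 0.8, -0.2], labels)  # ranking agrees with labels
bad = listwise_loss([-0.2, 0.8, 2.5], labels)   # ranking is reversed
print(good < bad)  # the loss prefers the list whose order matches the labels
```

Because the loss is computed over the full list, swapping any two documents out of label order increases it, which is the property pointwise losses lack and pairwise losses only capture locally.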
A fast-paced introduction to Deep Learning concepts, such as activation functions, cost functions, back propagation, and then a quick dive into CNNs. Basic knowledge of vectors, matrices, and derivatives is helpful in order to derive the maximum benefit from this session.
These are the slides of my talk at the 2019 Netflix Workshop on Personalization, Recommendation and Search (PRS). This talk is based on previous talks on research we are doing at Spotify, but here I focus on the work we do on personalizing Spotify Home, with respect to success, intent & diversity. The link to the workshop is https://prs2019.splashthat.com/. This is research from various people at Spotify, and has been published at RecSys 2018, CIKM 2018 and WWW (The Web Conference) 2019.
Tutorial on Deep Learning in Recommender Systems, Lars summer school 2019 (Anoop Deoras)
This document provides an outline for a tutorial on deep learning in recommender systems. The tutorial covers various models from linear families such as matrix factorization and topic models, as well as non-linear models using deep learning techniques. It discusses modeling context, interpreting neural network recommender models, and using reinforcement learning in recommender systems. The outline also includes background on Netflix's recommender system and an evolution of recommender models from explicit to implicit feedback and linear to non-linear approaches.
This document introduces WeightWatcher, an open-source tool for analyzing the eigenvalue spectrum distributions (ESD) of deep neural network weight matrices. WeightWatcher finds that well-trained networks exhibit heavy-tailed ESDs, in line with predictions from random matrix theory and the theory of strongly correlated systems. The tool can predict trends in test accuracy based on the shape of ESDs, without access to training or test data. The document provides an overview of the theoretical foundations and capabilities of WeightWatcher.
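WeightWatcher summarizes an ESD by fitting a power law to its tail; the key quantity is the tail exponent alpha. A minimal sketch of the continuous maximum-likelihood (Hill-type) estimator, run on synthetic Pareto-distributed eigenvalues rather than a real network's weight matrices, looks like this:

```python
import math
import random

def powerlaw_alpha(eigs, x_min):
    # Continuous MLE for the tail exponent of a power law:
    #   alpha = 1 + n / sum(log(lambda_i / x_min)),
    # taken over the eigenvalues at or above x_min.
    tail = [x for x in eigs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# Synthetic "ESD": draw eigenvalues from a Pareto law with known alpha = 3
# via inverse-transform sampling, then check the estimator recovers it.
random.seed(42)
true_alpha, x_min = 3.0, 1.0
eigs = [x_min * (1.0 - random.random()) ** (-1.0 / (true_alpha - 1.0))
        for _ in range(100_000)]

alpha_hat = powerlaw_alpha(eigs, x_min)
print(round(alpha_hat, 2))  # close to the true value of 3.0
```

On a real layer, the eigenvalues would come from the correlation matrix of the weight matrix, and the fitted alpha is the shape metric the document describes: heavy-tailed (small-alpha) layers are the signature of well-trained networks.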
This document introduces WeightWatcher, a tool for analyzing the eigenvalues of weight matrices in deep neural networks. It was created by Dr. Charles H. Martin and Calculation Consulting to provide "data free diagnostics" for deep learning models using insights from random matrix theory and statistical mechanics. WeightWatcher can analyze pre-trained models to evaluate layer quality, predict generalization performance, and compare different network architectures, without access to the training data. The document provides an overview of the theoretical foundations and empirical evidence supporting WeightWatcher's methods.
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNINGMLReview
This document proposes using meta-learning and an LSTM model to learn an optimization algorithm for few-shot learning. The model, called a meta-learner, is trained on multiple datasets to learn how to efficiently train a learner network on new small datasets. The meta-learner LSTM models the parameter updates of the learner network during training, learning an initialization and update rule. The inputs to the meta-learner are the loss, parameters, and gradient, and it outputs updated parameters. This learned update rule can then be used to train the learner network on new small datasets, enabling few-shot learning using only a small amount of labeled data.
This document summarizes an approach for scaling implicit matrix factorization to large datasets using Apache Spark. It discusses three attempts at implementing alternating least squares for collaborative filtering in Spark. The first two attempts shuffle data across nodes on each iteration. The third attempt partitions and caches the user/item vectors, then builds mappings to join local blocks of data and update vectors within each partition, avoiding shuffles between iterations for more efficient distributed computation.
This slides explains how Convolution Neural Networks can be coded using Google TensorFlow.
Video available at : https://www.youtube.com/watch?v=EoysuTMmmMc
What really are recommendations engines nowadays?
This presentation introduces the foundations of recommendation algorithms, and covers common approaches as well as some of the most advanced techniques. Although more focused on efficiency than theoretical properties, basics of matrix algebra and optimization-based machine learning are used through the presentation.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Square (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Problematics
4.2 Solutions
4.3 Tools
This document provides an overview of Scala data pipelines at Spotify. It discusses:
- The speaker's background and Spotify's scale with over 75 million active users.
- Spotify's music recommendation systems including Discover Weekly and personalized radio.
- How Scala and frameworks like Scalding, Spark, and Crunch are used to build data pipelines for tasks like joins, aggregations, and machine learning algorithms.
- Techniques for optimizing pipelines including distributed caching, bloom filters, and Parquet for efficient storage and querying of large datasets.
- The speaker's success in migrating over 300 jobs from Python to Scala and growing the team of engineers building Scala pipelines at Spotify.
These are the slides of a talk about some of our research at Spotify, as part of the celebration kickoff of Chalmers AI Research Centre in Gothenburg. I always like to make a story in my talk, and this time I wanted to reflect on the "push" (think recommender system) and "pull" (think search) paradigms. I am using this quote from Nicholas Belkin and Bruce Croft from their Communications of the ACM article published in 1992 to frame my story: "We conclude that information retrieval and information filtering are indeed two sides of the same coin. They work together to help people get the information needed to perform their tasks."
This document discusses fine-tuning the BERT model with PyTorch and the Transformers library. It provides an overview of BERT, how it was trained, its special tokens, the Transformers library, preprocessing text for BERT, using the BertModel class, the approach to fine-tuning BERT for a task, creating a dataset and data loaders, and training and validating the model.
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
Presentation as given to the Haystack Conference, which outlines research and techniques for automatic extraction of keywords, concepts, and vocabularies from text corpora.
This document proposes a calibrated recommendations approach that aims to provide recommendations that reflect all of a user's interests in correct proportions. Standard recommender systems trained for accuracy can lead to unbalanced recommendations that amplify a user's main interests and crowd out lesser interests. The calibrated recommendations approach uses a post-processing re-ranking step to optimize a submodular calibration metric, balancing accuracy and fairness by recommending items from all a user's interests in their correct proportions. Experiments on MovieLens data show that calibration can be improved significantly without degrading accuracy much.
How to Win Machine Learning Competitions ? HackerEarth
This presentation was given by Marios Michailidis (a.k.a Kazanova), Current Kaggle Rank #3 to help community learn machine learning better. It comprises of useful ML tips and techniques to perform better in machine learning competitions. Read the full blog: http://blog.hackerearth.com/winning-tips-machine-learning-competitions-kazanova-current-kaggle-3
Overview of the Recommender system or recommendation system. RFM Concepts in brief. Collaborative Filtering in Item and User based. Content-based Recommendation also described.Product Association Recommender System. Stereotype Recommendation described with advantage and limitations.Customer Lifetime. Recommender System Analysis and Solving Cycle.
Challenges and Solutions in Group Recommender SystemsLudovico Boratto
The document discusses group recommender systems. It begins with an overview of recommender systems principles and introduces the concept of group recommendation. It then outlines several key tasks in group recommendation systems, including defining different types of groups, acquiring preferences, modeling groups, predicting ratings, helping groups reach consensus, and explaining recommendations to groups. The document provides examples of approaches used in existing systems for each of these tasks. It also surveys common techniques for modeling groups, such as additive utilitarian, multiplicative utilitarian, Borda count, and Copeland rule strategies.
This document provides an overview of learning to rank search results. It discusses how search involves understanding queries and systems to retrieve relevant documents. Ranking search results is framed as a learning problem where machine learning models are trained on human-labeled data. The document compares three approaches to learning to rank - pointwise, pairwise, and listwise - and notes that listwise is preferred as it directly optimizes ranked lists while avoiding issues of the other methods. It also addresses challenges in collecting unbiased training data from click logs to train ranking models.
A fast-paced introduction to Deep Learning concepts, such as activation functions, cost functions, back propagation, and then a quick dive into CNNs. Basic knowledge of vectors, matrices, and derivatives is helpful in order to derive the maximum benefit from this session.
These are the slides of my talk at the 2019 Netflix Workshop on Personalization, Recommendation and Search (PRS). This talk is based on previous talks on research we are doing at Spotify, but here I focus on the work we do on personalizing Spotify Home, with respect to success, intent & diversity. The link to the workshop is https://prs2019.splashthat.com/. This is research from various people at Spotify, and has been published at RecSys 2018, CIKM 2018 and WWW (The Web Conference) 2019.
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Anoop Deoras
This document provides an outline for a tutorial on deep learning in recommender systems. The tutorial covers various models from linear families such as matrix factorization and topic models, as well as non-linear models using deep learning techniques. It discusses modeling context, interpreting neural network recommender models, and using reinforcement learning in recommender systems. The outline also includes background on Netflix's recommender system and an evolution of recommender models from explicit to implicit feedback and linear to non-linear approaches.
This document introduces WeightWatcher, an open-source tool for analyzing the eigenvalue spectrum distributions (ESD) of deep neural network weight matrices. WeightWatcher finds that well-trained networks exhibit heavy-tailed ESDs, in line with predictions from random matrix theory and the theory of strongly correlated systems. The tool can predict trends in test accuracy based on the shape of ESDs, without access to training or test data. The document provides an overview of the theoretical foundations and capabilities of WeightWatcher.
This document introduces WeightWatcher, a tool for analyzing the eigenvalues of weight matrices in deep neural networks. It was created by Dr. Charles H. Martin and Calculation Consulting to provide "data free diagnostics" for deep learning models using insights from random matrix theory and statistical mechanics. WeightWatcher can analyze pre-trained models to evaluate layer quality, predict generalization performance, and compare different network architectures, without access to the training data. The document provides an overview of the theoretical foundations and empirical evidence supporting WeightWatcher's methods.
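The core computation behind this kind of analysis, forming the empirical spectral density (ESD) of a layer's correlation matrix X = WᵀW/N and fitting a power-law exponent to its tail, can be sketched with NumPy alone. This is a simplified stand-in, not WeightWatcher's actual implementation (which uses a maximum-likelihood power-law fit rather than the Hill estimator used here):

```python
import numpy as np

def esd(W):
    """Empirical spectral density: eigenvalues of X = W^T W / N."""
    N = W.shape[0]
    return np.linalg.eigvalsh(W.T @ W / N)

def hill_alpha(evals, k=50):
    """Hill estimator of the power-law exponent of the ESD tail.
    (A simple stand-in for a maximum-likelihood power-law fit.)"""
    tail = np.sort(evals)[-k:]          # the k largest eigenvalues
    return 1.0 + k / np.sum(np.log(tail / tail[0]))

rng = np.random.default_rng(0)
W = rng.standard_normal((500, 300))     # stands in for one layer's weight matrix
evals = esd(W)
alpha = hill_alpha(evals)
# well-trained layers typically show a heavy tail with alpha roughly in 2-6;
# a random matrix like this one has a thin (Marchenko-Pastur) tail instead
```

For a real model one would loop this over every Conv2D/Dense layer and compare the per-layer exponents.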
This Week in Machine Learning and AI, Feb 2019 - Charles Martin
This document summarizes research into implicit self-regularization in deep neural networks. It discusses how analyzing the eigenvalue spectrum of weight matrices can provide insights into the learning dynamics. Large, well-trained modern networks exhibit heavy-tailed eigenvalue distributions rather than Gaussian distributions. This heavy-tailed behavior acts as a form of self-regularization and may explain why large networks generalize well despite having many parameters. The document presents analysis of various networks showing this heavy-tailed behavior is universal across different architectures and datasets. It proposes that metrics based on the heavy-tailed behavior could predict a network's generalization performance without access to test data.
Stanford ICME Lecture on Why Deep Learning Works - Charles Martin
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including production quality, pre-trained models, and smaller models trained from scratch. Empirical and theoretical results indicate that the DNN training process itself implements a form of self-regularization, evident in the empirical spectral density (ESD) of DNN layer matrices. To understand this, we provide a phenomenology to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, with a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. To that end, building on the statistical mechanics of generalization, and applying recent results from RMT, we derive a new VC-like complexity metric that resembles the familiar product norms, but is suitable for studying average-case generalization behavior in real systems. We then demonstrate its effectiveness by testing how well this new metric correlates with trends in the reported test accuracies across models for over 450 pretrained DNNs covering a range of data sets and architectures.
Why Deep Learning Works: Dec 13, 2018 at ICSI, UC Berkeley - Charles Martin
Talk given on Dec 13, 2018 at ICSI, UC Berkeley
http://www.icsi.berkeley.edu/icsi/events/2018/12/regularization-neural-networks
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production-quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization, implicitly sculpting a more regularized energy or penalty landscape. In particular, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, and applying them to these empirical results, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, in that there appears to be a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. Moreover, we can use these heavy-tailed results to form a VC-like average-case complexity metric that resembles the product norm used in analyzing toy NNs, and we can use this to predict the test accuracy of pretrained DNNs without peeking at the test data.
Why Deep Learning Works: Self Regularization in Deep Neural Networks - Charles Martin
Talk (to be given) June 8, 2018 at UC Berkeley / NERSC
Empirical results, using the machinery of Random Matrix Theory (RMT), are presented that are aimed at clarifying and resolving some of the puzzling and seemingly-contradictory aspects of deep neural networks (DNNs). We apply RMT to several well-known pre-trained models: LeNet5, AlexNet, and Inception V3, as well as 2 small, toy models. We show that the DNN training process itself implicitly implements a form of self-regularization associated with the entropy collapse / information bottleneck. We find that the self-regularization in small models like LeNet5 resembles the familiar Tikhonov regularization, whereas large, modern deep networks display a new kind of heavy-tailed self-regularization. We characterize self-regularization using RMT by identifying a taxonomy of the 5+1 phases of training. Then, with our toy models, we show that even in the absence of any explicit regularization mechanism, the DNN training process itself leads to more and more capacity-controlled models. Importantly, this phenomenon is strongly affected by the many knobs that are used to optimize DNN training. In particular, we can induce heavy-tailed self-regularization by adjusting the batch size in training, thereby exploiting the generalization gap phenomenon unique to DNNs. We argue that this heavy-tailed self-regularization has practical implications for designing better DNNs and deep theoretical implications for understanding the complex DNN energy landscape / optimization problem.
Why Deep Learning Works: Self Regularization in Deep Neural Networks - Charles Martin
Talk given on June 8, 2018 at UC Berkeley / NERSC
In Collaboration with Michael Mahoney, UC Berkeley
National Energy Research Scientific Computing Center
1) Calculation Consulting is led by Dr. Charles H. Martin, who has over 10 years of experience in applied machine learning.
2) They developed a tool called WeightWatcher that predicted trends in test accuracy across more than 100 neural network models, without needing the training or test data.
3) WeightWatcher was able to predict generalization by addressing Simpson's Paradox and accounting for how network depth and solver hyperparameters interact in complex ways.
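Simpson's Paradox, where a trend that holds inside each subgroup reverses in the pooled data, is easy to reproduce numerically; the toy data below is invented for illustration:

```python
import numpy as np

# two subgroups: within each, y clearly DECREASES with x
rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, 1.0, 200)
y1 = 1.0 - x1 + 0.05 * rng.standard_normal(200)
x2 = rng.uniform(2.0, 3.0, 200)
y2 = 4.0 - x2 + 0.05 * rng.standard_normal(200)

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

within1, within2 = corr(x1, y1), corr(x2, y2)   # both strongly negative
pooled = corr(np.concatenate([x1, x2]),          # yet the pooled trend
              np.concatenate([y1, y2]))          # is positive
```

Averaging a metric across models without conditioning on depth or solver hyperparameters risks exactly this kind of reversal, which is presumably why the analysis had to account for it.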
This document introduces Calculation Consulting, a firm that provides expertise in applied machine learning and artificial intelligence. The firm was founded by Dr. Charles H. Martin and Michael W. Mahoney, who have extensive academic and industry experience in machine learning algorithms. The document then discusses Calculation Consulting's work on WeightWatcher, a new semi-empirical theory developed to better understand why deep learning is effective, including a tool that can predict trends in test accuracies for common deep learning models without training or test data.
CARI-2020, Application of LSTM architectures for next frame forecasting in Se... - Mokhtar SELLAMI
This document presents a study comparing Long Short-Term Memory (LSTM) architectures for next frame forecasting in satellite image time series data. Three models - ConvLSTM, Stack-LSTM and CNN-LSTM - were implemented and evaluated based on training loss, time and structural similarity between predicted and actual images. The CNN-LSTM architecture was found to provide the best performance, achieving accurate predictions while requiring less processing time than ConvLSTM for higher resolution images. Overall, the study demonstrates the suitability of deep learning models like CNN-LSTM for predictive tasks using earth observation satellite imagery time series data.
Integrated Model Discovery and Self-Adaptation of Robots - Pooyan Jamshidi
The goal is to machine-learn models efficiently under budget constraints, so that robots can adapt to perturbations such as environmental changes or changes in internal resources.
Modern software-intensive systems are composed of components that are likely to change their behaviour over time (e.g., adding/removing components).
For software to continue to operate under such changes, the assumptions about parts of the system made at design time may not hold at runtime due to uncertainty.
Mechanisms must be put in place that can dynamically learn new models of these assumptions and use them to make decisions about missions, configurations, etc.
Flavours of Physics Challenge: Transfer Learning approach - Alexander Rakhlin
Presentation for the "Heavy Flavour Data Mining" workshop, February 18-19, University of Zurich. I discuss the solution that won the Physics Prize of the Flavours of Physics challenge organized by CERN, Yandex, and Intel on Kaggle.
Calculation Consulting provides machine learning and AI consulting services, specializing in search relevance and personalized recommendations. Dr. Charles Martin has over 20 years of experience developing machine learning algorithms and models for companies including eBay, Walmart Labs, and Fortune 500 companies. Calculation Consulting's services include developing learning to rank models, text feature engineering, and using neural embeddings and transfer learning to improve search and recommendation systems.
The document presents an algorithm for cooperative particle filtering for sensor network localization. It describes a distributed cooperative particle filter (CoopPF) that allows nodes to estimate their unknown locations by exploiting inter-node ranging measurements and communicating location probability distributions. The algorithm factorizes weight calculations to allow an iterative distributed implementation. It also proposes parametric distribution approximations to further reduce communication costs. Simulation results show the CoopPF and variants achieve accurate localization and perform better than existing methods in terms of mean square error over time and ranging noise levels.
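The building block underneath such cooperative schemes is the standard bootstrap particle filter; below is a minimal, non-cooperative 1D sketch with a single anchor node, where the prior, noise level, and jitter are all invented for illustration (the paper's distributed, factorized weight calculation is not reproduced here):

```python
import numpy as np

def particle_filter(observations, anchor=0.0, n=2000, noise=0.5, seed=0):
    """Bootstrap particle filter estimating a static 1D position from noisy
    range measurements to a single anchor node."""
    rng = np.random.default_rng(seed)
    particles = rng.uniform(0.0, 10.0, n)   # positive prior avoids the +/- range ambiguity
    for z in observations:
        ranges = np.abs(particles - anchor)
        w = np.exp(-0.5 * ((z - ranges) / noise) ** 2)   # Gaussian measurement likelihood
        w /= w.sum()
        particles = rng.choice(particles, size=n, p=w)   # importance resampling
        particles += 0.05 * rng.standard_normal(n)       # jitter against degeneracy
    return particles.mean()

true_pos = 3.0
rng = np.random.default_rng(42)
obs = true_pos + 0.5 * rng.standard_normal(20)   # 20 noisy range readings
estimate = particle_filter(obs)
```

The cooperative version extends this by exchanging (approximations of) these particle distributions between neighboring nodes and weighting on inter-node ranges as well.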
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S... - Pooyan Jamshidi
https://arxiv.org/abs/1606.06543
Finding optimal configurations for Stream Processing Systems (SPS) is a challenging problem due to the large number of parameters that can influence their performance and the lack of analytical models to anticipate the effect of a change. To tackle this issue, we consider tuning methods where an experimenter is given a limited budget of experiments and needs to carefully allocate this budget to find optimal configurations. We propose in this setting Bayesian Optimization for Configuration Optimization (BO4CO), an auto-tuning algorithm that leverages Gaussian Processes (GPs) to iteratively capture posterior distributions of the configuration spaces and sequentially drive the experimentation. Validation based on Apache Storm demonstrates that our approach locates optimal configurations within a limited experimental budget, with an improvement of SPS performance typically of at least an order of magnitude compared to existing configuration algorithms.
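The core Bayesian-optimization loop (a GP posterior plus an acquisition function over the configuration space) can be sketched in a few lines. This is a generic 1D illustration with an invented objective, not the BO4CO implementation:

```python
import numpy as np
from math import erf, sqrt

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and std at test points Xs, given observations (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.sum(Ks * (Kinv @ Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization."""
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (best - mu) * Phi + sigma * phi

def objective(x):
    # stand-in for one costly benchmark run of an SPS configuration
    return np.sin(3.0 * x) + 0.5 * x

grid = np.linspace(0.0, 3.0, 300)   # the discretized "configuration space"
X = np.array([0.3, 2.7])            # two initial experiments
y = objective(X)
for _ in range(8):                  # remaining experimental budget
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))
best_config = X[np.argmin(y)]
```

The GP posterior captures the uncertainty over unexplored configurations, and EI spends the limited budget where improvement over the current best is most probable.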
This document discusses theoretical perspectives from chemistry to explain why deep learning works. It outlines analogies between deep learning models and concepts from statistical physics such as spin glasses, the random energy model (REM), and energy landscapes. Temperature is described as a proxy for constraints on network weights. The glass transition and dynamics on energy landscapes are also discussed, as well as minimizing frustration in spin glasses and the idea of a "funneled" energy landscape with few local minima.
The document discusses research into analyzing the eigenvalue spectrum distribution (ESD) of deep neural network layer weight matrices. It is proposed that well-trained networks exhibit "heavy-tailed self-regularization" where the ESD follows a heavy-tailed distribution like a power law. A tool called WeightWatcher is introduced that analyzes layer quality by fitting the ESD to theoretical heavy-tailed distributions inspired by random matrix theory and neuroscience. WeightWatcher can detect overfitting and help accelerate training by adjusting layer learning rates.
Continuous Architecting of Stream-Based Systems - CHOOSE
Pooyan Jamshidi CHOOSE Talk 2016-11-01
Big data architectures have been gaining momentum in recent years. For instance, Twitter uses stream processing frameworks like Storm to analyse billions of tweets per minute and learn the trending topics. However, architectures that process big data involve many different components interconnected via semantically different connectors making it a difficult task for software architects to refactor the initial designs. As an aid to designers and developers, we developed OSTIA (On-the-fly Static Topology Inference Analysis) that allows: (a) visualizing big data architectures for the purpose of design-time refactoring while maintaining constraints that would only be evaluated at later stages such as deployment and run-time; (b) detecting the occurrence of common anti-patterns across big data architectures; (c) exploiting software verification techniques on the elicited architectural models. In the lecture, OSTIA will be shown on three industrial-scale case studies.
See: http://www.choose.s-i.ch/events/jamshidi-2016/
Similar to Weight watcher Bay Area ACM Feb 28, 2022
This document appears to be a presentation about WeightWatcher, an open-source tool for data-free model monitoring of deep learning models. It discusses how WeightWatcher analyzes the internal structure of models using metrics like "alpha" and power law fits, which can evaluate properties like learning capacity. WeightWatcher is presented as a way to compare different large language models without access to their training data, and its code is available on GitHub for early adopters and collaborators to use.
The document is a presentation by Dr. Charles H. Martin about data science leadership. It discusses who Dr. Martin is and his experience in data science. It then covers several topics related to building a successful data science team and practice, including understanding the maturity of an organization's data, the types of tools and infrastructure needed, and ensuring algorithmic accountability. The overall message is that effective data science requires strong leadership to develop strategies, acquire the right talent, and provide the proper resources to generate business value from data and machine learning.
Building AI Products: Delivery vs. Discovery - Charles Martin
This document discusses the differences between data science and other technical roles like IT and software engineering. It notes that data scientists are focused on discovering unknown patterns in data through experimentation and hypothesis testing, rather than software deployment or coding. The document outlines challenges data scientists face related to new technologies, processes, and testing models, and provides examples of how to take a lean startup approach to data science through rapid prototyping and getting models into production quickly.
AI and Machine Learning for the Lean Start Up - Charles Martin
This document discusses machine learning and artificial intelligence for startups. It compares lean startups like Aardvark, which was acquired by Google, to larger startups like eHow that had a $1 billion IPO. It discusses how funding environments shape different startup models and how machine learning was implemented differently. It also covers lessons from consulting on rapid prototyping and gaining improvements incrementally over time.
This document provides an overview of capsule networks as proposed by Geoff Hinton. It summarizes Hinton's criticisms of convolutional neural networks, including their lack of spatial equivariance and inability to distinguish pose. Hinton proposes capsule networks as an alternative, where capsules encode visual features through vector outputs and can represent the same entity at different poses through affine transformations. Capsule networks use a routing-by-agreement algorithm to determine relationships between capsules, implementing explaining away to aid in segmentation. They have shown improved performance over convolutional networks on tasks requiring pose discrimination and segmentation.
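The routing-by-agreement step can be sketched directly from the published algorithm (Sabour et al., 2017); the toy predictions below are invented to show agreement winning over disagreement:

```python
import numpy as np

def squash(v, axis=-1):
    """Capsule non-linearity: shrinks short vectors toward 0, long ones toward length 1."""
    n2 = np.sum(v ** 2, axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + 1e-9)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, iterations=3):
    """Dynamic routing-by-agreement, minimal form.
    u_hat: each input capsule's prediction for each output, shape (n_in, n_out, dim)."""
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))                 # routing logits
    for _ in range(iterations):
        c = softmax(b, axis=1)                  # each input capsule spreads its vote
        s = np.einsum('io,iod->od', c, u_hat)   # weighted sum of predictions
        v = squash(s)                           # output capsule activations
        b += np.einsum('iod,od->io', u_hat, v)  # agreement strengthens the coupling
    return v, c

# two input capsules agree on output 0 but disagree on output 1
u_hat = np.zeros((2, 2, 4))
u_hat[0, 0] = u_hat[1, 0] = [1.0, 0.0, 0.0, 0.0]
u_hat[0, 1] = [1.0, 0.0, 0.0, 0.0]
u_hat[1, 1] = [-1.0, 0.0, 0.0, 0.0]
v, c = route(u_hat)
# the agreed-upon output capsule ends up with the longer activation vector
```

The "explaining away" behavior mentioned above comes from the softmax: as one output capsule claims an input's vote, the coupling to competing outputs shrinks.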
Palo Alto University Rotary Club talk, Sep 29, 2017 - Charles Martin
- Dr. Charles H. Martin has over 15 years of experience applying machine learning and artificial intelligence, developing algorithms for companies like Demand Media, eBay, and BlackRock.
- He discusses the history and academic roots of neural networks and deep learning, including pioneering work by researchers in the 1960s-1980s.
- The document outlines several problems and applications that deep learning is well-suited for, such as image classification, speech recognition, self-driving cars, and improving medical diagnosis. It also discusses implications for jobs, education, and data-driven decision making.
Applied machine learning for search engine relevance 3 - Charles Martin
The document discusses using support vector machines (SVMs) for ranking web search results, where SVMs learn weight vectors to maximize the relevance score of correct results based on training data while minimizing a multivariate loss function between item pairs. It mentions that a ranking SVM consistently improved the click rank performance on Shopping.com by a certain percentage, indicating SVMs are effective for learning document relevance in web search ranking. Large-scale linear SVMs for ranking can be solved using conjugate gradient or a cutting plane algorithm.
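A linear ranking SVM reduces to a hinge loss on pairwise difference vectors; here is a minimal subgradient-descent sketch on invented toy data (not the Shopping.com system, which would use the large-scale solvers mentioned above):

```python
import numpy as np

def rank_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Linear ranking SVM via subgradient descent on the pairwise hinge loss:
    for every pair with y[i] > y[j], we want w.(x_i - x_j) >= 1."""
    w = np.zeros(X.shape[1])
    pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
    for _ in range(epochs):
        for i, j in pairs:
            d = X[i] - X[j]
            # subgradient of lam/2*||w||^2 + max(0, 1 - w.d)
            grad = lam * w - (d if w @ d < 1.0 else 0.0)
            w -= lr * grad
    return w

# toy data: relevance is driven by feature 0; feature 1 is noise
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
y = X[:, 0] + 0.1 * rng.standard_normal(30)
w = rank_svm(X, y)
# ranking documents by w @ x then largely reproduces the relevance order
```

Each difference vector acts as one classification example, which is why ranking SVMs can reuse ordinary linear-SVM machinery such as cutting-plane or conjugate-gradient solvers.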
Calculation Consulting provides data science leadership and machine learning consulting services, with a focus on developing algorithms that can generate sustainable revenue. The company is led by Dr. Charles Martin, who has over 10 years of experience in applied machine learning and developing algorithms for companies like Demand Media. Calculation Consulting helps clients address challenges like measuring the impact of data science work, managing the data science process, and ensuring algorithmic accountability and transparency.
Calculation Consulting provides data science leadership and expertise. Led by Dr. Charles Martin, who has over 10 years of experience developing machine learning algorithms for companies like Demand Media, BlackRock, and eBay. Demand Media's machine learning algorithms created a $1 billion company but later collapsed due to overdependence on search traffic and lack of adaptation to Google's algorithm updates. Effective data science requires senior leadership, cross-functional collaboration, experimental methodology, and accountability to generate sustainable long-term revenue rather than just cost savings.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf - Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
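Stripped of the database machinery, a vector search is an embedding step plus a cosine-similarity lookup. The toy index below uses bag-of-words vectors purely for illustration; it is not the MongoDB Atlas API, and a real deployment would use learned embeddings and approximate nearest-neighbor indexing:

```python
import numpy as np

class TinyVectorIndex:
    """Toy stand-in for a vector index: bag-of-words embeddings + cosine search."""
    def __init__(self, docs):
        self.docs = docs
        self.vocab = sorted({t for d in docs for t in d.lower().split()})
        self.matrix = np.array([self._embed(d) for d in docs])

    def _embed(self, text):
        toks = text.lower().split()
        v = np.array([float(toks.count(w)) for w in self.vocab])
        return v / (np.linalg.norm(v) + 1e-9)   # unit-normalize

    def search(self, query, k=2):
        # rows are unit vectors, so the dot product IS the cosine similarity
        sims = self.matrix @ self._embed(query)
        return [self.docs[i] for i in np.argsort(-sims)[:k]]

index = TinyVectorIndex(["mongodb atlas vector search",
                         "cooking pasta recipes",
                         "semantic search with embeddings"])
hits = index.search("vector search", k=1)
```

Swapping the embedding function for a trained model is what turns this lexical lookup into the semantic search described above.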
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
What do a Lego brick and the XZ backdoor have in common? - Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might appear to have in common only that both are building blocks, i.e. dependencies of creative and software projects. In reality, a Lego brick and the case of the XZ backdoor share much more than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open-source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training activities. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (hence her nickname, deneb_alpha).
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help mitigate climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Threats to mobile devices are more prevalent and are increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features that provide convenience and capability sacrifice security. This best-practices guide outlines steps users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test automation with generative AI and OpenAI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
A tale of scale & speed: How the US Navy is enabling software delivery from l... - sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
TrustArc Webinar: 2024 Global Privacy Survey - TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
3. calculation | consulting why deep learning works
Who Are We?
Dr. Charles H. Martin, PhD
University of Chicago, Chemical Physics
NSF Fellow in Theoretical Chemistry, UIUC
Over 15 years of experience in applied Machine Learning and AI
ML algos for: Aardvark, acquired by Google (2010)
Demand Media (eHow); first $1B IPO since Google
Wall Street: BlackRock
Fortune 500: Roche, France Telecom
BigTech: eBay, Aardvark (Google), GoDaddy
Private Equity: Griffin Advisors
Alt. Energy: Anthropocene Institute (Page Family)
www.calculationconsulting.com
charles@calculationconsulting.com
4. Who Are We?
Michael W. Mahoney
ICSI, RISELab, Dept. of Statistics, UC Berkeley
Algorithmic and statistical aspects of modern large-scale data analysis:
large-scale machine learning | randomized linear algebra
geometric network analysis | scalable implicit regularization
PhD, Yale University, computational chemical physics
SAMSI National Advisory Committee
NRC Committee on the Analysis of Massive Data
Simons Institute Fall 2013 and 2018 programs on the Foundations of Data
Biennial MMDS Workshops on Algorithms for Modern Massive Data Sets
NSF/TRIPODS-funded Foundations of Data Analysis Institute at UC Berkeley
https://www.stat.berkeley.edu/~mmahoney/
mmahoney@stat.berkeley.edu
5. Motivations: WeightWatcher Theory
Understanding deep learning requires rethinking generalization
The weightwatcher theory is a semi-empirical theory based on:
the Statistical Mechanics of Generalization,
Random Matrix Theory, and
the theory of Strongly Correlated Systems.
Great nerdy stuff; however, I will be discussing the weightwatcher tool
and what it can do for you.
7. Open source tool: weightwatcher
pip install weightwatcher
…
import weightwatcher as ww
watcher = ww.WeightWatcher(model=model)
results = watcher.analyze()
watcher.get_summary()
watcher.print_results()
https://github.com/CalculatedContent/WeightWatcher
8. WeightWatcher: A diagnostic tool
Analyze pre-trained (and trained) PyTorch, TF/Keras, and ONNX models
Inspect models that are difficult to train
Gauge improvements in model performance
Predict test accuracies across different models
Detect problems when compressing or fine-tuning pretrained models
pip install weightwatcher
9. Research: Implicit Self-Regularization in Deep Learning
Selected publications:
• Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning (JMLR 2021)
• Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks (ICML 2019)
• Workshop: Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks (KDD 2020)
• Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data (Nature Communications 2021)
• (more submitted and in progress today)
10. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices
The tail of the ESD contains the information
11. ESD of DNNs: detailed insight into W
Empirical Spectral Density (ESD: the eigenvalues of X)
import keras
import numpy as np
import matplotlib.pyplot as plt
…
W = model.layers[i].get_weights()[0]
N, M = W.shape
…
X = np.dot(W.T, W) / N
evals = np.linalg.eigvals(X)
plt.hist(evals, bins=100, density=True)
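To make the "tail" idea concrete, here is a minimal sketch of estimating a tail exponent alpha from an ESD with a simple Hill estimator. This is an illustration only: the actual tool uses maximum-likelihood power-law fits, and the synthetic Pareto eigenvalues and the choice of k are assumptions.

```python
import numpy as np

def hill_alpha(evals, k=500):
    """Hill estimator of the tail exponent alpha from the k largest eigenvalues."""
    tail = np.sort(evals)[-k:]      # the k largest eigenvalues
    x_min = tail[0]
    # For a density ~ lambda^(-alpha), the Hill estimate is 1 + k / sum(log(lambda_i / x_min))
    return 1.0 + k / np.sum(np.log(tail / x_min))

# Synthetic heavy-tailed "ESD": Pareto samples whose density has tail exponent alpha = 3
rng = np.random.default_rng(0)
evals = rng.pareto(a=2.0, size=100_000) + 1.0   # density ~ x^-(a+1) = x^-3
print(hill_alpha(evals))                        # close to 3
```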
12. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices
details_df = watcher.analyze(model=your_model)
The tool provides various plots, quality metrics, and transforms
13. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices
watcher.analyze(…, plot=True, …)
14. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices
Well-trained layers are heavy-tailed and well shaped.
GPT-2 fits a Power Law (or Truncated Power Law), with alpha in [2, 6].
watcher.analyze(plot=True)
Good quality of fit (D is small)
15. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices
Better-trained layers are more heavy-tailed and better shaped (GPT vs GPT-2).
16. Heavy Tailed Metrics: GPT vs GPT-2
The original GPT is poorly trained on purpose; GPT-2 is well trained.
Plotting alpha for every layer: smaller alpha is better, and large alphas indicate bad fits.
17. Power Law Universality: ImageNet
All ImageNet models display remarkable Heavy-Tailed Universality:
500 matrices, across ~50 architectures,
Linear layers & Conv2D feature maps,
with 80-90% of the alphas < 4.
18. Random Matrix Theory: detailed insight into W_L
DNN training induces a breakdown of Gaussian random structure
and the onset of a new kind of heavy-tailed self-regularization:
Gaussian random matrix → Bulk+Spikes → Heavy Tailed
(small, older NNs → large, modern DNNs, and/or small batch sizes)
19. Random Matrix Theory: Marchenko-Pastur plus Tracy-Widom fluctuations
RMT says that if W is a simple random Gaussian matrix, then the ESD has a very simple, known form:
the shape depends only on the aspect ratio Q = N/M (and the variance ~ 1);
the eigenvalues are tightly bounded, with very crisp edges;
a few spikes may appear.
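The Marchenko-Pastur prediction is easy to check numerically; a minimal sketch (the sizes N and M are arbitrary choices for illustration):

```python
import numpy as np

# For an N x M matrix W with i.i.d. N(0, 1) entries, the ESD of X = W^T W / N
# is supported on [(1 - sqrt(1/Q))^2, (1 + sqrt(1/Q))^2] with Q = N/M,
# up to small Tracy-Widom fluctuations at the (very crisp) edges.
rng = np.random.default_rng(0)
N, M = 2000, 500                       # Q = 4
W = rng.standard_normal((N, M))
evals = np.linalg.eigvalsh(W.T @ W / N)

Q = N / M
lam_minus = (1 - np.sqrt(1 / Q)) ** 2  # 0.25
lam_plus = (1 + np.sqrt(1 / Q)) ** 2   # 2.25
print(evals.min(), evals.max())        # tightly bounded near the MP edges
```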
21. Random Matrix Theory: Heavy Tailed
But if W is heavy tailed, then the ESD will also have heavy tails
(i.e., it is all spikes; the bulk vanishes).
If W is strongly correlated, then the ESD can be modeled as if W were drawn
from a heavy-tailed distribution.
Nearly all pre-trained DNNs display heavy tails, as we shall soon see.
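A minimal numerical sketch of the contrast (the Pareto entries are an illustrative assumption, not a model of real DNN weights):

```python
import numpy as np

# If the entries of W are heavy tailed, the ESD of X = W^T W / N is all
# spikes: the largest eigenvalues separate far above the bulk, unlike the
# tightly bounded Gaussian (Marchenko-Pastur) case.
rng = np.random.default_rng(0)
N, M = 2000, 500
signs = rng.choice([-1.0, 1.0], size=(N, M))
W_ht = signs * (rng.pareto(a=1.5, size=(N, M)) + 1.0)   # heavy-tailed entries
W_g = rng.standard_normal((N, M))                       # Gaussian baseline

ratios = {}
for name, W in [("gaussian", W_g), ("heavy-tailed", W_ht)]:
    evals = np.linalg.eigvalsh(W.T @ W / N)
    ratios[name] = evals.max() / np.median(evals)       # spike-to-bulk ratio
print(ratios)
```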
22. Heavy Tailed RMT: Scale-Free ESD
All large, well-trained, modern DNNs exhibit heavy-tailed self-regularization; the ESD is scale free.
(AlexNet, VGG, ResNet, Inception, DenseNet, …)
23. HT-SR Theory: 5+1 Phases of Training
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin, Michael W. Mahoney; JMLR 22(165):1−73, 2021.
24. Heavy Tailed RMT: Universality Classes
The familiar Wigner/MP Gaussian class is not the only Universality class in RMT
25. WeightWatcher: predict trends in generalization
Predict test accuracies across variations in hyper-parameters:
the average Power Law exponent alpha predicts generalization at fixed depth.
Smaller average alpha is better, and better models are easier to treat.
26. WeightWatcher: Shape vs Scale metrics
Purely norm-based (scale) metrics (from SLT) can be correlated with depth, but anti-correlated with hyper-parameter changes.
27. WeightWatcher: treat architecture changes
Predict test accuracies across variations in hyper-parameters and depth:
the alpha-hat metric combines shape and scale metrics
and corrects for different depths (grey line);
it can be derived from theory.
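A reconstruction of the alpha-hat formula from the papers (the exact normalization is my reading of them; treat it as an assumption):

```latex
\hat{\alpha} \;=\; \frac{1}{L} \sum_{l=1}^{L} \alpha_l \,\log \lambda^{\max}_l
```

where, for each layer l, the fitted power-law exponent \(\alpha_l\) is the shape metric and the largest eigenvalue \(\lambda^{\max}_l\) of the layer correlation matrix is the scale metric.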
28. WeightWatcher: predict test accuracies
alpha-hat works for 100s of different CV and NLP models (Nature Communications 2021).
We do not have access to the training or test data,
but we can still predict trends in the generalization.
29. Predicting test accuracies: Heavy-Tailed shape metrics
The heavy-tailed metrics perform best: Weighted Alpha and the Alpha (Schatten) Norm.
30. WeightWatcher: predict test accuracies
ResNet, DenseNet, etc.
(Nature Communications 2021)
32. Experiments: just apply to pre-trained models
LeNet5 (1998)
AlexNet (2012)
InceptionV3 (2014)
ResNet (2015)
…
DenseNet201 (2018)
https://medium.com/@siddharthdas_32104/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5
(LeNet5: Conv2D → MaxPool → Conv2D → MaxPool → FC → FC)
33. Predicting test accuracies: 100 pretrained models
The heavy-tailed metrics perform best.
From an open-source sandbox of nearly 500 pretrained CV models
(we picked >= 5 models per regression):
https://github.com/osmr/imgclsmob
34. Correlation Flow: CV Models
We can study correlation flow by looking at alpha vs. depth (VGG, ResNet, DenseNet).
36. WeightWatcher: global and local convexity metrics
Smaller alpha corresponds to a more convex energy landscape:
alpha ~ 2-3 (or less), vs Transformers (alpha ~ 3-4 or more).
See "Rational Decisions, Random Matrices and Spin Glasses" (1998)
by Galluccio, Bouchaud, and Potters.
37. WeightWatcher: global and local convexity metrics
When the layer alpha < 2, we think this means the layer is over-fit.
We suspect that the early layers of some Convolutional Nets
may be slightly overtrained (some alphas < 2).
This is predicted by our HT-SR theory.
38. WeightWatcher: scale and shape anomalies
We can detect problems in layers that are not detectable otherwise.
39. WeightWatcher: Correlation Traps
Detect potential signatures of over-fitting
watcher.analyze(plot=True, randomize=True)
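A minimal sketch of the idea behind randomize=True (a hypothetical illustration, not the tool's implementation): an element-wise shuffle of W destroys correlations while preserving the marginal distribution of the entries, so eigenvalues of the original ESD that sit far above the randomized ESD reflect real correlations, while spikes that survive randomization point to "correlation traps" from unusually large individual weights.

```python
import numpy as np

def max_esd_eval(W):
    """Largest eigenvalue of the ESD of X = W^T W / N."""
    N = W.shape[0]
    return np.linalg.eigvalsh(W.T @ W / N).max()

rng = np.random.default_rng(0)
N, M = 1000, 300
W = rng.standard_normal((N, M))
# add a real (rank-1) correlation on top of the noise
W += 0.2 * np.outer(rng.standard_normal(N), rng.standard_normal(M))

# element-wise shuffle: same entries, correlations destroyed
W_rand = rng.permutation(W.flatten()).reshape(N, M)
print(max_esd_eval(W), max_esd_eval(W_rand))   # original sits far above randomized
```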
40. WeightWatcher: SVDSharpness transform
Remove potential signatures of over-fitting.
Like the PAC-bounds Sharpness transform, it clips bad elements in W using RMT:
smoothed_model = watcher.SVDSharpness(model=your_model)
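A minimal sketch of RMT-based clipping (my reading of the idea, not WeightWatcher's actual implementation): clip singular values that exceed the Marchenko-Pastur bulk edge, treating them as potential over-fitting artifacts.

```python
import numpy as np

def svd_clip(W):
    """Clip singular values of W above the Marchenko-Pastur bulk edge."""
    N, M = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    Q = N / M
    sigma2 = np.var(W)
    # MP edge for eigenvalues of W^T W / N, converted to a singular-value bound
    s_max = np.sqrt(N * sigma2) * (1 + np.sqrt(1 / Q))
    return U @ np.diag(np.minimum(s, s_max)) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((500, 200))
W[0, 0] = 50.0                      # one anomalously large weight (a "trap")
W_smooth = svd_clip(W)
print(np.linalg.svd(W, compute_uv=False).max(),
      np.linalg.svd(W_smooth, compute_uv=False).max())  # spike clipped to the bulk edge
```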
41. WeightWatcher: RMT-based shape metrics
ww also includes predictive, non-parametric shape metrics:
rand_distance = jensen_shannon_distance(original_esd, random_esd)
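A hypothetical helper illustrating the rand_distance idea (the histogramming scheme and distributions are assumptions for illustration): histogram the original and randomized ESDs on a common grid and take the Jensen-Shannon distance between the empirical densities.

```python
import numpy as np

def jensen_shannon_distance(esd_a, esd_b, bins=100):
    """JS distance between two ESDs: 0 = identical shapes, larger = more different."""
    lo = min(esd_a.min(), esd_b.min())
    hi = max(esd_a.max(), esd_b.max())
    p, _ = np.histogram(esd_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(esd_b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = (p + q) / 2

    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return np.sqrt((kl(p, m) + kl(q, m)) / 2)

rng = np.random.default_rng(0)
same = jensen_shannon_distance(rng.chisquare(3, 5000), rng.chisquare(3, 5000))
diff = jensen_shannon_distance(rng.chisquare(3, 5000), rng.pareto(1.5, 5000) + 1)
print(same < diff)   # similar shapes are closer than dissimilar ones
```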
42. WeightWatcher: RMT-based shape metrics
The layer rand_distance and alpha metrics are correlated.
44. WeightWatcher: more Power Law shape metrics
watcher.analyze(…, fit='TPL')    # Truncated Power Law fits
watcher.analyze(…, fit='E_TPL')  # Extended Truncated Power Law fits
weightwatcher provides several shape (and scale) metrics,
plus several more unpublished, experimental options.
45. WeightWatcher: E_TPL shape metric
The E_TPL (Extended Truncated Power Law) and rand_distance shape metrics
track the learning curve epoch-by-epoch
when training MT transformers from scratch to SOTA.
Highly accurate results leverage the advanced shape metrics;
here, Lambda is the shape metric.
46. WeightWatcher: why Power Law fits?
calculation | consulting why deep learning works
Spiking (i.e real) neurons exhibit power law behavior
weightwatcher supports several PL fits
from experimental neuroscience
plus totally new shape metrics
we have invented (and published)
47. WeightWatcher: why Power Law fits?
Spiking (i.e., real) neurons exhibit (truncated) power-law behavior:
the Critical Brain Hypothesis, and evidence of Self-Organized Criticality (SOC),
per Per Bak (How Nature Works).
As neural systems become more complex,
they exhibit power-law behavior, and then truncated power-law behavior.
We see exactly this behavior in DNNs, and it is predictive of learning capacity.
48. WeightWatcher: open-source, open-science
We are looking for early adopters and collaborators:
github.com/CalculatedContent/WeightWatcher
We have a Slack channel to support the tool;
please file issues, and ping me to join.
50. Classic Set Up: Student-Teacher model
Statistical Mechanics of Learning, Engle & Van den Broeck (2001):
Generalization error ~ phase-space volume;
average error ~ overlap between the Teacher (T) and Student (J).
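In the classic perceptron version of this setup, the standard result (reconstructed from the statistical mechanics literature; notation assumed) makes the overlap statement precise:

```latex
R = \frac{1}{N}\,\mathbf{T}\cdot\mathbf{J},
\qquad
\epsilon_{\mathrm{gen}} = \frac{1}{\pi}\arccos(R)
```

so perfect overlap (R = 1) gives zero generalization error, and zero overlap gives chance-level error of 1/2.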
51. Classic Set Up: Student-Teacher model
Statistical Mechanics of Learning, Engle & Van den Broeck (2001)
Standard approach:
• Teacher (T) and Student (J) are random Perceptron vectors
• Treat the data as an external random Gaussian field
• Apply Hubbard-Stratonovich to get the mean-field result
• Assume continuous or discrete J
• Solve for the overlap as a function of the load (# data points / # parameters)
52. New Set Up: Matrix-generalized Student-Teacher
Continuous perceptron: uninteresting.
Ising perceptron: replica theory shows phase behavior, entropy collapse, etc.
53. New Set Up: Matrix-generalized Student-Teacher
“Towards a new theory…” Martin, Milletari, & Mahoney (in preparation)
Real DNN weight matrices are N x M, strongly correlated, with heavy-tailed correlation matrices.
Solve for the total integrated phase-space volume.
54. New approach: HCIZ Matrix Integrals
Write the overlap as an HCIZ matrix integral:
fix a Teacher; the integral is then over all random Students J that overlap with T.
Use the following result in RMT:
“Asymptotics of HCIZ integrals …”, Tanaka (2008)
55. RMT: Annealed vs Quenched averages
“A First Course in Random Matrix Theory”, Potters and Bouchaud (2020)
We imagine averaging over all (random) Student DNNs
with (correlations that) look like the Teacher DNN.
The annealed average is good outside spin-glass phases, where the system is trained well.
56. New interpretation: HCIZ Matrix Integrals
The generating functional is expressed via the R-Transform (the inverse Green's function, via a contour integral),
in terms of the Teacher's eigenvalues and the Student's cumulants.
57. Some basic RMT: Green's functions
The Green's function is the Stieltjes transform of the eigenvalue distribution,
given the empirical spectral density (the average eigenvalue density).
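In standard RMT notation (reconstructed), the definitions used here are:

```latex
\rho(\lambda) = \frac{1}{N} \sum_{i=1}^{N} \delta(\lambda - \lambda_i),
\qquad
G(z) = \frac{1}{N}\,\mathrm{Tr}\,\bigl(z\,\mathbf{I} - \mathbf{X}\bigr)^{-1}
     = \int \frac{\rho(\lambda)}{z - \lambda}\, d\lambda
```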
58. Some basic RMT: Moment generating functions
The Green's function has poles at the actual eigenvalues,
but it is analytic in the complex plane, up and away from the real axis.
Expanding it in a series around z = ∞ gives a moment generating function.
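In standard notation (reconstructed), the expansion is:

```latex
G(z) = \sum_{k \ge 0} \frac{m_k}{z^{k+1}},
\qquad
m_k = \int \lambda^{k} \rho(\lambda)\, d\lambda = \frac{1}{N}\,\mathrm{Tr}\,\mathbf{X}^{k}
```

so the coefficients of the expansion around \(z = \infty\) are exactly the moments of the ESD.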
59. Some basic RMT: R-Transforms
The free-cumulant-generating function (R-transform) is related to the Green's function,
and gives a (similar) moment generating function,
one which takes a simple form for both
Gaussian random and very Heavy-Tailed (Levy) random matrices.
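The defining relation, in standard notation (reconstructed): the R-transform is the functional inverse of the Green's function, minus a pole,

```latex
R(g) = G^{-1}(g) - \frac{1}{g}
```

and its Taylor coefficients are the free cumulants; for example, for a Wigner (Gaussian) matrix it takes the simple form \(R(g) = \sigma^2 g\).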
60. Results: Gaussian Random Weight Matrices
“Random Matrix Theory” (book), Bouchaud and Potters (2020)
We recover the (squared) Frobenius norm as the metric.
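The identity behind this is standard, stated here for completeness (with \(\lambda_i\) the eigenvalues of the correlation matrix \(\mathbf{X} = \mathbf{W}^{\mathsf T}\mathbf{W}/N\)):

```latex
\sum_i \lambda_i \;=\; \mathrm{Tr}\,\mathbf{X}
\;=\; \frac{1}{N}\,\mathrm{Tr}\,\mathbf{W}^{\mathsf T}\mathbf{W}
\;=\; \frac{1}{N}\,\|\mathbf{W}\|_F^2
```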
61. Results: (very) Heavy Tailed Weight Matrices
“Heavy-tailed random matrices”, Burda and Jurkiewicz (2009)
We recover a Schatten norm, in terms of the Heavy-Tailed exponent alpha.
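The Schatten norm in question has the standard form (reconstructed; treat the exact normalization as an assumption):

```latex
\|\mathbf{X}\|_{\alpha}^{\alpha} \;=\; \sum_i \lambda_i^{\alpha}
```

i.e., for heavy-tailed layers the plain sum of eigenvalues is replaced by a sum of eigenvalues raised to the tail exponent \(\alpha\).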
62. Application to: Heavy Tailed Weight Matrices
Some reasonable approximations give the weighted alpha metric
Q.E.D.