This document introduces WeightWatcher, an open-source tool for analyzing the empirical spectral densities (ESDs) of deep neural network weight matrices. WeightWatcher finds that well-trained networks exhibit heavy-tailed ESDs, in line with predictions from random matrix theory and the theory of strongly correlated systems. The tool can predict trends in test accuracy based on the shape of the ESDs, without access to training or test data. The document provides an overview of the theoretical foundations and capabilities of WeightWatcher.
Description: WeightWatcher (WW) is an open-source diagnostic tool for analyzing Deep Neural Networks (DNNs) without needing access to training or even test data. It can be used to: analyze pre-trained PyTorch and Keras DNN models (Conv2D and Dense layers); monitor models, and model layers, to see if they are over-trained or over-parameterized; predict test accuracies across different models, with or without training data; and detect potential problems when compressing or fine-tuning pre-trained models. See https://weightwatcher.ai
This Week in Machine Learning and AI, Feb 2019 - Charles Martin
This document summarizes research into implicit self-regularization in deep neural networks. It discusses how analyzing the eigenvalue spectrum of weight matrices can provide insights into the learning dynamics. Large, well-trained modern networks exhibit heavy-tailed eigenvalue distributions rather than Gaussian distributions. This heavy-tailed behavior acts as a form of self-regularization and may explain why large networks generalize well despite having many parameters. The document presents analysis of various networks showing this heavy-tailed behavior is universal across different architectures and datasets. It proposes that metrics based on the heavy-tailed behavior could predict a network's generalization performance without access to test data.
Stanford ICME Lecture on Why Deep Learning Works - Charles Martin
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including production-quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results indicate that the DNN training process itself implements a form of self-regularization, evident in the empirical spectral density (ESD) of DNN layer matrices. To understand this, we provide a phenomenology to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, with a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. To that end, building on the statistical mechanics of generalization, and applying recent results from RMT, we derive a new VC-like complexity metric that resembles the familiar product norms, but is suitable for studying average-case generalization behavior in real systems. We then demonstrate its effectiveness by testing how well this new metric correlates with trends in the reported test accuracies across models for over 450 pretrained DNNs covering a range of data sets and architectures.
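The data-free, ESD-based analysis described above can be sketched in a few lines of NumPy: compute the eigenvalues of a layer's correlation matrix and estimate the tail exponent of the spectrum. This is an illustrative sketch only; a crude Hill estimator stands in for the careful maximum-likelihood power-law fit the authors use, and it is shown here on a random, untrained matrix rather than a real layer.

```python
import numpy as np

def esd(W):
    """Eigenvalues of the layer correlation matrix X = W^T W / N."""
    N = W.shape[0]
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)

def hill_alpha(eigs, k=50):
    """Hill estimator of the tail exponent of the ESD over the k
    largest eigenvalues. A crude stand-in for the power-law fit used
    in the heavy-tailed analysis; exponents roughly in 2-4 are read
    as signatures of heavy-tailed self-regularization."""
    tail = np.sort(eigs)[-k:]
    xmin = tail[0]
    return 1.0 + k / np.sum(np.log(tail / xmin))

rng = np.random.default_rng(0)
W = rng.standard_normal((500, 300))   # a random (untrained) "layer"
eigs = esd(W)
alpha = hill_alpha(eigs)
```

For a random Gaussian matrix like this, the spectrum follows the Marchenko-Pastur law and the fitted exponent is large; the claim above is that well-trained layers instead show small, heavy-tailed exponents.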
The document is a presentation by Dr. Charles H. Martin about data science leadership. It discusses who Dr. Martin is and his experience in data science. It then covers several topics related to building a successful data science team and practice, including understanding the maturity of an organization's data, the types of tools and infrastructure needed, and ensuring algorithmic accountability. The overall message is that effective data science requires strong leadership to develop strategies, acquire the right talent, and provide the proper resources to generate business value from data and machine learning.
Calculation Consulting provides machine learning and AI consulting services, specializing in search relevance and personalized recommendations. Dr. Charles Martin has over 20 years of experience developing machine learning algorithms and models for companies including eBay, Walmart Labs, and Fortune 500 companies. Calculation Consulting's services include developing learning to rank models, text feature engineering, and using neural embeddings and transfer learning to improve search and recommendation systems.
Why Deep Learning Works: Self Regularization in Deep Neural Networks Charles Martin
Talk given on June 8, 2018 at UC Berkeley / NERSC
In Collaboration with Michael Mahoney, UC Berkeley
National Energy Research Scientific Computing Center
Empirical results, using the machinery of Random Matrix Theory (RMT), are presented that are aimed at clarifying and resolving some of the puzzling and seemingly-contradictory aspects of deep neural networks (DNNs). We apply RMT to several well-known pre-trained models: LeNet5, AlexNet, and Inception V3, as well as 2 small, toy models. We show that the DNN training process itself implicitly implements a form of self-regularization associated with the entropy collapse / information bottleneck. We find that the self-regularization in small models like LeNet5 resembles the familiar Tikhonov regularization, whereas large, modern deep networks display a new kind of heavy-tailed self-regularization. We characterize self-regularization using RMT by identifying a taxonomy of the 5+1 phases of training. Then, with our toy models, we show that even in the absence of any explicit regularization mechanism, the DNN training process itself leads to more and more capacity-controlled models. Importantly, this phenomenon is strongly affected by the many knobs that are used to optimize DNN training. In particular, we can induce heavy-tailed self-regularization by adjusting the batch size in training, thereby exploiting the generalization gap phenomenon unique to DNNs. We argue that this heavy-tailed self-regularization has practical implications for designing better DNNs and deep theoretical implications for understanding the complex DNN energy landscape / optimization problem.
Why Deep Learning Works: Dec 13, 2018 at ICSI, UC Berkeley - Charles Martin
Talk given on Dec 13, 2018 at ICSI, UC Berkeley
http://www.icsi.berkeley.edu/icsi/events/2018/12/regularization-neural-networks
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production-quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization, implicitly sculpting a more regularized energy or penalty landscape. In particular, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, and applying them to these empirical results, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, in that there appears to be a "size scale" separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. Moreover, we can use these heavy-tailed results to form a VC-like average-case complexity metric that resembles the product norm used in analyzing toy NNs, and we can use this to predict the test accuracy of pretrained DNNs without peeking at the test data.
This document provides an overview of capsule networks as proposed by Geoff Hinton. It summarizes Hinton's criticisms of convolutional neural networks, including their lack of spatial equivariance and inability to distinguish pose. Hinton proposes capsule networks as an alternative, where capsules encode visual features through vector outputs and can represent the same entity at different poses through affine transformations. Capsule networks use a routing-by-agreement algorithm to determine relationships between capsules, implementing explaining away to aid in segmentation. They have shown improved performance over convolutional networks on tasks requiring pose discrimination and segmentation.
This document discusses theoretical perspectives from chemistry to explain why deep learning works. It outlines analogies between deep learning models and concepts from statistical physics such as spin glasses, the random energy model (REM), and energy landscapes. Temperature is described as a proxy for constraints on network weights. The glass transition and dynamics on energy landscapes are also discussed, as well as minimizing frustration in spin glasses and the idea of a "funneled" energy landscape with few local minima.
Integrating the TDBU-ETSAP models in MCP format - IEA-ETSAP
The document discusses integrating energy system optimization models like TIMES with macroeconomic models like MSA using the Mathematical Programming with Complementarity Constraints (MCP) format. Key points:
1) An operational version of MSA has been developed in MCP format and linked/tested with TIMES, producing the same results as the original optimization.
2) Formulating both TIMES and MSA in MCP allows them to be fully integrated while maintaining consistency. This has advantages like introducing multiple objective sectors.
3) Next steps include testing the integrated TIMES-MSA MCP model with other TIAM models and the decomposition algorithm, as well as exploring other integration approaches like linking CGE and T
Prof. Alba shared parallel biological sequence alignment with the Smith-Waterman algorithm and presented CUDAlign, a fine-grained multi-GPU strategy. This project is part of a research project at the University of Brasilia.
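CUDAlign itself is a fine-grained multi-GPU CUDA implementation; as a reference point, the Smith-Waterman recurrence it parallelizes can be sketched sequentially as follows (the match/mismatch/gap scores are illustrative assumptions, not CUDAlign's defaults):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score via the Smith-Waterman recurrence:
    H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j),
                  H[i-1][j] + gap, H[i][j-1] + gap)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # diagonal: (mis)match
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best

score = smith_waterman("ACGT", "ACGT")  # four matches -> 8
```

The dynamic-programming matrix has anti-diagonal independence, which is what a GPU strategy like CUDAlign exploits for parallelism.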
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM - sipij
In this paper, the delay computation method for the Common Subexpression Elimination (CSE) algorithm is implemented on the Cyclotomic Fast Fourier Transform (CFFT). The CSE algorithm combined with the delay computing method is known as the Gate-Level Delay Computation with Common Subexpression Elimination (GLDC-CSE) algorithm. Common subexpression elimination is an effective optimization method used to reduce adders in the cyclotomic Fourier transform. The delay computing method is based on a delay matrix and is suitable for computer implementation. The gate-level delay computation method is used to find the critical-path delay, and it is analyzed on various finite-field elements. The presented algorithm is demonstrated through a case study of the CFFT over a finite field. If the CFFT is implemented directly, the system has high additive complexity; by using the GLDC-CSE algorithm, the additive complexity is reduced, and the area and area-delay product are reduced as well.
Practical tips for handling noisy data and annotation - RyuichiKanoh
The document summarizes a KaggleDays workshop on techniques for handling noisy data and annotation. It includes an agenda covering an introduction, experiment setup, and techniques for learning with noisy datasets. The techniques discussed are mixup, using large batch sizes, and distillation. For mixup, virtual training samples are constructed by linearly interpolating real samples and labels. Large batch sizes help because noise from random labels cancels out within a batch. Distillation trains a student network using predictions from a pre-trained teacher network to ease training. Code links and examples of applying the techniques in competitions are also provided.
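The mixup construction summarized above (virtual samples from linear interpolation of a batch with a shuffled copy of itself) can be sketched as follows; the Beta-distributed interpolation weight and random pairing are the usual choices, assumed here:

```python
import numpy as np

def mixup(x, y, alpha=0.4, rng=None):
    """Build virtual training samples by linearly interpolating
    real samples and labels (mixup).
    x: (batch, features), y: (batch, classes) one-hot labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)      # interpolation weight in (0, 1)
    idx = rng.permutation(len(x))     # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[idx]
    y_mix = lam * y + (1 - lam) * y[idx]
    return x_mix, y_mix

x = np.eye(4)   # 4 toy samples
y = np.eye(4)   # one-hot labels
x_mix, y_mix = mixup(x, y, rng=np.random.default_rng(0))
```

Because each mixed label is a convex combination of one-hot vectors, every row of `y_mix` still sums to 1, so the usual cross-entropy loss applies unchanged.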
Tong is a data scientist at Supstat Inc. and also a master's student in data mining. He has been an active R programmer and developer for 5 years. He is the author of the R package for XGBoost, one of the most popular and contest-winning tools on kaggle.com nowadays.
Agenda:
Introduction to XGBoost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ... - Yuko Kuroki (黒木祐子)
The document describes a new model called combinatorial pure exploration with partial linear feedback (CPE-PL) for decision making problems with combinatorial actions and limited feedback. CPE-PL generalizes previous models by allowing for nonlinear rewards and more limited feedback through a transformation matrix. The document proposes the first static algorithm for CPE-PL that provides sample complexity guarantees and runs faster than existing approaches. It also introduces a two-phased adaptive algorithm for the special case of CPE-BL with full-bandit linear feedback and proves its sample complexity is optimal up to logarithmic factors.
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence - LEE HOSEONG
This document summarizes the FixMatch paper, which proposes a simple semi-supervised learning method that achieves state-of-the-art results. FixMatch combines pseudo-labeling and consistency regularization by generating pseudo-labels for unlabeled data using a model's prediction on a weakly augmented version and enforcing consistency on a strongly augmented version. Extensive ablation studies show that FixMatch outperforms previous methods on standard benchmarks even with limited labeled data and identifies consistency regularization and pseudo-labeling as the most important factors for its success.
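The pseudo-labeling half of the FixMatch step summarized above can be sketched as follows; the 0.95 confidence threshold is an assumption for illustration, and the consistency loss on the strongly augmented view is omitted:

```python
import numpy as np

def fixmatch_targets(probs_weak, threshold=0.95):
    """FixMatch-style pseudo-labels: take the model's class
    probabilities on the weakly augmented unlabeled batch, keep only
    predictions above a confidence threshold, and return hard labels
    plus a mask selecting which samples contribute to the consistency
    loss (which is computed on the strongly augmented view)."""
    conf = probs_weak.max(axis=1)        # confidence per sample
    pseudo = probs_weak.argmax(axis=1)   # hard pseudo-label
    mask = conf >= threshold             # only confident samples train
    return pseudo, mask

probs = np.array([[0.97, 0.02, 0.01],   # confident -> kept
                  [0.50, 0.30, 0.20]])  # uncertain -> masked out
pseudo, mask = fixmatch_targets(probs)
```

Masking out low-confidence predictions is what keeps noisy pseudo-labels from dominating early training, which the ablations in the paper identify as a key factor.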
Cari 2020: A minimalistic model of spatial structuration of humid savanna veg... - Mokhtar SELLAMI
The document describes a minimalistic spatial model of vegetation structuring in humid savanna environments. It aims to illustrate the emergence of spatial patterns and highlight fire resistance strategies of trees. The model builds on previous work, incorporating a spatial component to represent how tree and grass biomass change over space and time based on logistic growth, nonlocal competition/cooperation, and fire mortality probabilities dependent on tree density. Mathematical analysis and numerical illustration of the model will provide insights into savanna vegetation structuring.
Transfer Learning for Improving Model Predictions in Highly Configurable Soft... - Pooyan Jamshidi
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
This document introduces WeightWatcher, a tool for analyzing the eigenvalues of weight matrices in deep neural networks. It was created by Dr. Charles H. Martin and Calculation Consulting to provide "data free diagnostics" for deep learning models using insights from random matrix theory and statistical mechanics. WeightWatcher can analyze pre-trained models to evaluate layer quality, predict generalization performance, and compare different network architectures, without access to the training data. The document provides an overview of the theoretical foundations and empirical evidence supporting WeightWatcher's methods.
This document introduces Calculation Consulting, a firm that provides expertise in applied machine learning and artificial intelligence. The firm was founded by Dr. Charles H. Martin and Michael W. Mahoney, who have extensive academic and industry experience in machine learning algorithms. The document then discusses Calculation Consulting's work on developing a new semi-empirical theory called WeightWatcher to better understand why deep learning is effective, such as their tool that can predict trends in test accuracies for common deep learning models without training or test data.
Why Deep Learning Works: Dec 13, 2018 at ICSI, UC BerkeleyCharles Martin
Talk given on Dec 13, 2018 at ICSI, UC Berkeley
http://www.icsi.berkeley.edu/icsi/events/2018/12/regularization-neural-networks
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models and smaller models trained from scratch. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of self-regularization, implicitly sculpting a more regularized energy or penalty landscape. In particular, the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, and applying them to these empirical results, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization is like traditional Tikhonov regularization, in that there appears to be a ``size scale'' separating signal from noise. For state-of-the-art DNNs, however, we identify a novel form of heavy-tailed self-regularization, similar to the self-organization seen in the statistical physics of disordered systems. Moreover, we can use these heavy tailed results to form a VC-like average case complexity metric that resembles the product norm used in analyzing toy NNs, and we can use this to predict the test accuracy of pretrained DNNs without peeking at the test data.
Why Deep Learning Works: Self Regularization in Deep Neural NetworksCharles Martin
Talk (to be given) June 8, 2018 at UC Berkeley / NERSC
Empirical results, using the machinery of Random Matrix Theory (RMT), are presented that are aimed at clarifying and resolving some of the puzzling and seemingly-contradictory aspects of deep neural networks (DNNs). We apply RMT to several well known pre-trained models: LeNet5, AlexNet, and Inception V3, as well as 2 small, toy models. We show that the DNN training process itself implicitly implements a form of self-regularization associated with the entropy collapse / information bottleneck. We find that the self-regularization in small models like LeNet5, resembles the familar Tikhonov regularization, whereas large, modern deep networks display a new kind of heavy tailed self-regularization. We characterize self-regularization using RMT by identifying a taxonomy of the 5+1 phases of training. Then, with our toy models, we show that even in the absence of any explicit regularization mechanism, the DNN training process itself leads to more and more capacity-controlled models. Importantly, this phenomenon is strongly affected by the many knobs that are used to optimize DNN training. In particular, we can induce heavy tailed self-regularization by adjusting the batch size in training, thereby exploiting the generalization gap phenomena unique to DNNs. We argue that this heavy tailed self-regularization has practical implications both designing better DNNs and deep theoretical implications for understanding the complex DNN Energy landscape / optimization problem.
Why Deep Learning Works: Self Regularization in Deep Neural NetworksCharles Martin
Talk (to be given) June 8, 2018 at UC Berkeley / NERSC
In Collaboration with Michael Mahoney, UC Berkeley
Empirical results, using the machinery of Random Matrix Theory (RMT), are presented that are aimed at clarifying and resolving some of the puzzling and seemingly-contradictory aspects of deep neural networks (DNNs). We apply RMT to several well known pre-trained models: LeNet5, AlexNet, and Inception V3, as well as 2 small, toy models. We show that the DNN training process itself implicitly implements a form of self-regularization associated with the entropy collapse / information bottleneck. We find that the self-regularization in small models like LeNet5, resembles the familar Tikhonov regularization, whereas large, modern deep networks display a new kind of heavy tailed self-regularization. We characterize self-regularization using RMT by identifying a taxonomy of the 5+1 phases of training. Then, with our toy models, we show that even in the absence of any explicit regularization mechanism, the DNN training process itself leads to more and more capacity-controlled models. Importantly, this phenomenon is strongly affected by the many knobs that are used to optimize DNN training. In particular, we can induce heavy tailed self-regularization by adjusting the batch size in training, thereby exploiting the generalization gap phenomena unique to DNNs. We argue that this heavy tailed self-regularization has practical implications both designing better DNNs and deep theoretical implications for understanding the complex DNN Energy landscape / optimization problem.
This document provides an overview of capsule networks as proposed by Geoff Hinton. It summarizes Hinton's criticisms of convolutional neural networks, including their lack of spatial equivariance and inability to distinguish pose. Hinton proposes capsule networks as an alternative, where capsules encode visual features through vector outputs and can represent the same entity at different poses through affine transformations. Capsule networks use a routing-by-agreement algorithm to determine relationships between capsules, implementing explaining away to aid in segmentation. They have shown improved performance over convolutional networks on tasks requiring pose discrimination and segmentation.
This document discusses theoretical perspectives from chemistry to explain why deep learning works. It outlines analogies between deep learning models and concepts from statistical physics such as spin glasses, the random energy model (REM), and energy landscapes. Temperature is described as a proxy for constraints on network weights. The glass transition and dynamics on energy landscapes are also discussed, as well as minimizing frustration in spin glasses and the idea of a "funneled" energy landscape with few local minima.
Integrating the TDBU-ETSAP models in MCP format (IEA-ETSAP)
The document discusses integrating energy system optimization models like TIMES with macroeconomic models like MSA using the Mathematical Programming with Complementarity Constraints (MCP) format. Key points:
1) An operational version of MSA has been developed in MCP format and linked/tested with TIMES, producing the same results as the original optimization.
2) Formulating both TIMES and MSA in MCP allows them to be fully integrated while maintaining consistency. This has advantages like introducing multiple objective sectors.
3) Next steps include testing the integrated TIMES-MSA MCP model with other TIAM models and the decomposition algorithm, as well as exploring other integration approaches like linking CGE and T
Prof. Alba shared parallel biological sequence alignment with the Smith-Waterman algorithm and presented CUDAlign, our fine-grained multi-GPU strategy. This work is part of a research project at the University of Brasilia.
DESIGN OF DELAY COMPUTATION METHOD FOR CYCLOTOMIC FAST FOURIER TRANSFORM (sipij)
In this paper, the delay computation method for the Common Subexpression Elimination (CSE) algorithm is implemented on the Cyclotomic Fast Fourier Transform (CFFT). The CSE algorithm combined with the delay computing method is known as the Gate-Level Delay Computation with Common Subexpression Elimination (GLDC-CSE) algorithm. Common subexpression elimination is an effective optimization method for reducing the number of adders in the cyclotomic Fourier transform. The delay computing method is based on a delay matrix and is suitable for computer implementation. The gate-level delay computation method is used to find the critical-path delay and is analyzed over various finite-field elements. The presented algorithm is established through a case study of the Cyclotomic Fast Fourier Transform over a finite field. If the CFFT is implemented directly, the system has high additive complexity; applying the GLDC-CSE algorithm reduces the additive complexity as well as the area and the area-delay product.
Practical tips for handling noisy data and annotation (RyuichiKanoh)
The document summarizes a KaggleDays workshop on techniques for handling noisy data and annotation. It includes an agenda covering an introduction, experiment setup, and techniques for learning with noisy datasets. The techniques discussed are mixup, using large batch sizes, and distillation. For mixup, virtual training samples are constructed by linearly interpolating real samples and labels. Large batch sizes help because noise from random labels cancels out within a batch. Distillation trains a student network using predictions from a pre-trained teacher network to ease training. Code links and examples of applying the techniques in competitions are also provided.
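As a rough illustration of the mixup technique described above, here is a hedged numpy sketch: a virtual sample is a convex combination of two real samples and their one-hot labels, with a Beta(alpha, alpha) mixing coefficient as in the usual mixup recipe. The sample data below are invented.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Construct a virtual training sample by linearly interpolating
    two real samples and their labels (mixup-style)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient in (0, 1)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

rng = np.random.default_rng(0)
x1, x2 = rng.random(4), rng.random(4)                 # two toy feature vectors
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # one-hot labels
xm, ym = mixup(x1, y1, x2, y2, rng=rng)               # soft label ym sums to 1
```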
Tong is a data scientist at Supstat Inc and a master's student in data mining. He has been an active R programmer and developer for five years. He is the author of the XGBoost R package, one of the most popular and contest-winning tools on kaggle.com nowadays.
Agenda:
Introduction to XGBoost
Real World Application
Model Specification
Parameter Introduction
Advanced Features
Kaggle Winning Solution
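XGBoost itself is a large optimized library, but the principle it implements, gradient boosting (fitting successive weak learners to the current residuals), can be sketched in a few lines. The toy below is illustrative only: real XGBoost adds regularization, second-order gradients, and full tree learning.

```python
import numpy as np

def fit_stump(x, r):
    """Exhaustively pick the single threshold minimizing squared error on residuals r."""
    best_sse, best = np.inf, None
    for t in x:
        left, right = r[x <= t], r[x > t]
        lv = left.mean()
        rv = right.mean() if right.size else 0.0
        sse = ((r - np.where(x <= t, lv, rv)) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, lv, rv)
    return best

def boost(x, y, n_rounds=20, lr=0.3):
    """Gradient boosting for squared loss: each round fits a stump to the residuals."""
    pred = np.full_like(y, y.mean())
    for _ in range(n_rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= t, lv, rv)
    return pred

x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)   # toy regression target
pred = boost(x, y)          # error shrinks round by round
```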
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ... (Yuko Kuroki, 黒木祐子)
The document describes a new model called combinatorial pure exploration with partial linear feedback (CPE-PL) for decision making problems with combinatorial actions and limited feedback. CPE-PL generalizes previous models by allowing for nonlinear rewards and more limited feedback through a transformation matrix. The document proposes the first static algorithm for CPE-PL that provides sample complexity guarantees and runs faster than existing approaches. It also introduces a two-phased adaptive algorithm for the special case of CPE-BL with full-bandit linear feedback and proves its sample complexity is optimal up to logarithmic factors.
FixMatch: simplifying semi-supervised learning with consistency and confidence (LEE HOSEONG)
This document summarizes the FixMatch paper, which proposes a simple semi-supervised learning method that achieves state-of-the-art results. FixMatch combines pseudo-labeling and consistency regularization by generating pseudo-labels for unlabeled data using a model's prediction on a weakly augmented version and enforcing consistency on a strongly augmented version. Extensive ablation studies show that FixMatch outperforms previous methods on standard benchmarks even with limited labeled data and identifies consistency regularization and pseudo-labeling as the most important factors for its success.
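The FixMatch unlabeled-data objective described above (take a hard pseudo-label from the confident weakly augmented prediction, then apply cross-entropy against the strongly augmented prediction) can be sketched in numpy. The predicted probabilities below are invented; a real implementation computes them with the network on the two augmented views.

```python
import numpy as np

def fixmatch_unlabeled_loss(p_weak, p_strong, tau=0.95):
    """FixMatch-style unlabeled loss: hard pseudo-labels from confident
    weak-augmentation predictions, cross-entropy against strong-augmentation
    predictions; unconfident samples are masked out."""
    conf = p_weak.max(axis=1)
    pseudo = p_weak.argmax(axis=1)
    mask = conf >= tau  # only confident predictions contribute
    ce = -np.log(p_strong[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float((mask * ce).mean())

p_weak = np.array([[0.97, 0.02, 0.01],    # confident -> kept as pseudo-label
                   [0.50, 0.30, 0.20]])   # below threshold -> masked out
p_strong = np.array([[0.90, 0.05, 0.05],
                     [0.40, 0.40, 0.20]])
loss = fixmatch_unlabeled_loss(p_weak, p_strong)
```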
Cari 2020: A minimalistic model of spatial structuration of humid savanna veg... (Mokhtar SELLAMI)
The document describes a minimalistic spatial model of vegetation structuring in humid savanna environments. It aims to illustrate the emergence of spatial patterns and highlight fire resistance strategies of trees. The model builds on previous work, incorporating a spatial component to represent how tree and grass biomass change over space and time based on logistic growth, nonlocal competition/cooperation, and fire mortality probabilities dependent on tree density. Mathematical analysis and numerical illustration of the model will provide insights into savanna vegetation structuring.
Transfer Learning for Improving Model Predictions in Highly Configurable Soft... (Pooyan Jamshidi)
Modern software systems are now being built to be used in dynamic environments utilizing configuration capabilities to adapt to changes and external uncertainties. In a self-adaptation context, we are often interested in reasoning about the performance of the systems under different configurations. Usually, we learn a black-box model based on real measurements to predict the performance of the system given a specific configuration. However, as modern systems become more complex, there are many configuration parameters that may interact and, therefore, we end up learning an exponentially large configuration space. Naturally, this does not scale when relying on real measurements in the actual changing environment. We propose a different solution: Instead of taking the measurements from the real system, we learn the model using samples from other sources, such as simulators that approximate performance of the real system at low cost.
This document introduces WeightWatcher, a tool for analyzing the eigenvalues of weight matrices in deep neural networks. It was created by Dr. Charles H. Martin and Calculation Consulting to provide "data free diagnostics" for deep learning models using insights from random matrix theory and statistical mechanics. WeightWatcher can analyze pre-trained models to evaluate layer quality, predict generalization performance, and compare different network architectures, without access to the training data. The document provides an overview of the theoretical foundations and empirical evidence supporting WeightWatcher's methods.
This document introduces Calculation Consulting, a firm that provides expertise in applied machine learning and artificial intelligence. The firm was founded by Dr. Charles H. Martin and Michael W. Mahoney, who have extensive academic and industry experience in machine learning algorithms. The document then discusses Calculation Consulting's work on developing a new semi-empirical theory called WeightWatcher to better understand why deep learning is effective, such as their tool that can predict trends in test accuracies for common deep learning models without training or test data.
1) Calculation Consulting is led by Dr. Charles H. Martin, who has over 10 years of experience in applied machine learning.
2) They developed a technique called "weightwatcher" to predict test accuracy on over 100 neural network models without access to training or test data.
3) Weightwatcher was able to predict generalization by addressing Simpson's Paradox and accounting for how network depth and solver hyperparameters interact in complex ways.
CARI-2020, Application of LSTM architectures for next frame forecasting in Se... (Mokhtar SELLAMI)
This document presents a study comparing Long Short-Term Memory (LSTM) architectures for next frame forecasting in satellite image time series data. Three models - ConvLSTM, Stack-LSTM and CNN-LSTM - were implemented and evaluated based on training loss, time and structural similarity between predicted and actual images. The CNN-LSTM architecture was found to provide the best performance, achieving accurate predictions while requiring less processing time than ConvLSTM for higher resolution images. Overall, the study demonstrates the suitability of deep learning models like CNN-LSTM for predictive tasks using earth observation satellite imagery time series data.
Flavours of Physics Challenge: Transfer Learning approach (Alexander Rakhlin)
Presentation for the "Heavy Flavour Data Mining" workshop, February 18-19, University of Zurich. I discuss the solution that won the Physics Prize of the Flavours of Physics challenge organized by CERN, Yandex, and Intel on Kaggle.
The document discusses research into analyzing the eigenvalue spectrum distribution (ESD) of deep neural network layer weight matrices. It is proposed that well-trained networks exhibit "heavy-tailed self-regularization" where the ESD follows a heavy-tailed distribution like a power law. A tool called WeightWatcher is introduced that analyzes layer quality by fitting the ESD to theoretical heavy-tailed distributions inspired by random matrix theory and neuroscience. WeightWatcher can detect overfitting and help accelerate training by adjusting layer learning rates.
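One concrete way to see what "heavy-tailed" means here is to compare a layer's ESD against the Marchenko-Pastur bulk that purely random weights would produce: eigenvalues escaping well past the bulk edge indicate learned correlations. The numpy sketch below is illustrative only; the rank-one "trained" perturbation is an artificial stand-in for real learned structure.

```python
import numpy as np

def mp_upper_edge(N, M, sigma=1.0):
    """Upper bulk edge lambda_plus = sigma^2 (1 + sqrt(M/N))^2 of the Marchenko-Pastur law."""
    return sigma**2 * (1.0 + np.sqrt(M / N)) ** 2

rng = np.random.default_rng(0)
N, M = 1000, 250
W = rng.standard_normal((N, M)) / np.sqrt(N)   # pure-noise "layer": ESD follows MP law
u, v = np.ones(N) / np.sqrt(N), np.ones(M) / np.sqrt(M)
W_trained = W + 3.0 * np.outer(u, v)           # noise plus one strong "learned" correlation

edge = mp_upper_edge(N, M)
spikes_noise = int((np.linalg.eigvalsh(W.T @ W) > 1.05 * edge).sum())
spikes_trained = int((np.linalg.eigvalsh(W_trained.T @ W_trained) > 1.05 * edge).sum())
```

The noise-only spectrum stays inside the bulk, while the perturbed matrix throws at least one eigenvalue far past the edge; heavy-tailed layers take this much further, with the whole tail escaping the bulk.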
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING (ijdkp)
An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increase. It is therefore important to have efficient computational methods and algorithms that can be applied to large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this context, we present in this paper a simple, more accurate process to speed up ML methods. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to develop efficient Hidden Markov Model (HMM) training. The proposed process consists of two steps. In the first step, training instances with similar inputs are clustered, and a weight factor representing the frequency of these instances is assigned to each representative cluster; the Dynamic Time Warping technique is used as the dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) associated with the number of training instances are modified to include the weight factor in the appropriate terms. This process significantly accelerates HMM training while yielding the same initial, transition, and emission probability matrices as the classical HMM training algorithm, so classification accuracy is preserved. Depending on the size of the training set, speedups of up to 2,200 times are possible when the size is about 100,000 instances. The proposed approach is not limited to training HMMs; it can be employed for a large variety of ML methods.
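The Dynamic Time Warping dissimilarity used for clustering in the abstract above has a standard O(nm) dynamic-programming form, sketched here in plain Python (a textbook version, not the paper's implementation):

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two sequences via the
    classic O(len(a) * len(b)) dynamic program."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Identical shapes shifted in time stay close under DTW,
# which is what makes it a good dissimilarity for clustering sequences.
s1 = [0, 0, 1, 2, 1, 0, 0]
s2 = [0, 1, 2, 1, 0, 0, 0]
```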
This document provides an overview of machine learning concepts related to overfitting and model selection. It discusses overfitting in k-nearest neighbors and regression models. It introduces bias-variance decomposition and structural risk minimization. Methods for controlling overfitting like cross-validation, regularization, feature selection and model selection are covered. The concepts of consistency, model convergence speed, and strategies for controlling generalization capacity are explained.
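The k-fold cross-validation procedure mentioned above can be sketched generically: split the data into k disjoint folds, train on k-1 of them, score on the held-out fold, and average. The mean-predictor "model" below is a deliberately trivial stand-in.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint validation folds."""
    idx = np.arange(n)
    np.random.default_rng(seed).shuffle(idx)
    return np.array_split(idx, k)

def cross_val_score(fit, score, X, y, k=5):
    """Average held-out score over k train/validation splits."""
    folds = kfold_indices(len(X), k)
    scores = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(score(fit(X[train], y[train]), X[val], y[val]))
    return float(np.mean(scores))

# Toy "model": predict the training mean; score is negative mean squared error,
# so a less negative value means better generalization.
fit = lambda X, y: float(y.mean())
score = lambda m, X, y: -float(((y - m) ** 2).mean())
rng = np.random.default_rng(1)
X, y = rng.random(100), rng.random(100)
cv = cross_val_score(fit, score, X, y)
```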
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS (csandit)
The ability to mine and extract useful information automatically from large datasets has been a common concern for organizations over the last few decades. Data on the internet is increasing rapidly, and consequently the capacity to collect and store very large data is increasing significantly. Existing clustering algorithms are not always efficient and accurate in solving clustering problems for large datasets, and the development of accurate and fast data classification algorithms for very large scale datasets remains a challenge. In this paper, various algorithms and techniques, especially an approach using a non-smooth optimization formulation of the clustering problem, are proposed for solving the minimum sum-of-squares clustering problem in very large datasets. This research also develops an accurate and real-time L2-DC algorithm, based on the incremental approach, to solve the minimum
Combinatorial optimization and deep reinforcement learning (민재 정)
The document discusses using deep learning approaches for solving combinatorial optimization problems like task allocation. It reviews different reinforcement learning methods that have been applied to problems like the vehicle routing problem using pointer networks, transformers, and graph neural networks. Future work opportunities are identified in applying these deep learning techniques to multi-vehicle routing problems and using them to solve specific task allocation scenarios.
Accelerated life testing plans are designed under multiple-objective considerations, with the resulting Pareto optimal solutions classified and reduced using neural networks and data envelopment analysis, respectively.
This document presents a framework for verifying the safety of classification decisions made by deep neural networks. It defines safety as the network producing the same output classification for an input and any perturbations of that input within a bounded region. The framework uses satisfiability modulo theories (SMT) to formally verify safety by attempting to find an adversarial perturbation that causes misclassification. It has been tested on several image classification networks and datasets. The framework provides a method to automatically verify safety properties of deep neural networks.
How might machine learning help advance solar PV research? (Anubhav Jain)
Machine learning techniques can help optimize solar PV systems in several ways:
1) Clear sky detection algorithms using ML were developed to more accurately classify sky conditions from irradiance data, improving degradation rate calculations.
2) Site-specific modeling of module voltages over time, validated with field data, allows more optimal string sizing compared to traditional worst-case assumptions.
3) ML and data-driven approaches may help optimize other aspects of solar plant design like climate zone definitions and extracting module parameters from production data.
The document presents an algorithm for cooperative particle filtering for sensor network localization. It describes a distributed cooperative particle filter (CoopPF) that allows nodes to estimate their unknown locations by exploiting inter-node ranging measurements and communicating location probability distributions. The algorithm factorizes weight calculations to allow an iterative distributed implementation. It also proposes parametric distribution approximations to further reduce communication costs. Simulation results show the CoopPF and variants achieve accurate localization and perform better than existing methods in terms of mean square error over time and ranging noise levels.
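A bootstrap particle filter of the general kind described above (not the paper's CoopPF algorithm) can be sketched for a one-dimensional node with two known anchors: diffuse the particles, reweight each by the likelihood of the observed ranges, then resample. The ranging model and all parameters below are invented for illustration.

```python
import numpy as np

def pf_step(particles, z, anchors, motion_std=0.05, meas_std=0.2, rng=None):
    """One bootstrap particle-filter step: diffuse particles, reweight each by
    the Gaussian likelihood of the observed ranges to known anchors, resample."""
    rng = rng or np.random.default_rng()
    particles = particles + rng.normal(0.0, motion_std, size=len(particles))
    w = np.ones(len(particles))
    for a, zi in zip(anchors, z):
        w *= np.exp(-0.5 * ((np.abs(particles - a) - zi) / meas_std) ** 2)
    w /= w.sum()
    return particles[rng.choice(len(particles), size=len(particles), p=w)]

rng = np.random.default_rng(0)
anchors = np.array([0.0, 3.0])              # known node positions on a line
true_pos = 2.0                              # unknown node to localize
particles = rng.uniform(-5.0, 5.0, size=2000)
for _ in range(10):
    z = np.abs(true_pos - anchors) + rng.normal(0.0, 0.2, size=2)  # noisy ranges
    particles = pf_step(particles, z, anchors, rng=rng)
estimate = float(particles.mean())          # posterior mean concentrates near true_pos
```

Two anchors are used so that the range-only posterior has a single mode; the cooperative setting in the paper additionally exchanges such distributions between neighboring nodes.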
2021 itu challenge_reinforcement_learning (LASSEMedia)
This document discusses using reinforcement learning for beam selection in wireless communication networks. It proposes a simulation environment called "RadioStrike" built in Unreal Engine to generate data and train reinforcement learning agents. The document provides background on machine learning for communications, beam selection techniques, and introduces some basic reinforcement learning concepts. It also outlines strategies for participants in the ITU ML5G challenge to approach the beam selection reinforcement learning problem, including providing sample code and simpler baseline problems to get started.
A simple framework for contrastive learning of visual representations (Devansh16)
Link: https://machine-learning-made-simple.medium.com/learnings-from-simclr-a-framework-contrastive-learning-for-visual-representations-6c145a5d8e99
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
Comments: ICML'2020. Code and pretrained models at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as: arXiv:2002.05709 [cs.LG]
(or arXiv:2002.05709v3 [cs.LG] for this version)
Submission history
From: Ting Chen [view email]
[v1] Thu, 13 Feb 2020 18:50:45 UTC (5,093 KB)
[v2] Mon, 30 Mar 2020 15:32:51 UTC (5,047 KB)
[v3] Wed, 1 Jul 2020 00:09:08 UTC (5,829 KB)
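The contrastive objective at the heart of SimCLR, the NT-Xent (normalized temperature-scaled cross entropy) loss, can be written down compactly. The numpy sketch below uses random embeddings in place of real encoder outputs, and the temperature value is just an illustrative choice.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over a batch of positive pairs (z1[i], z2[i]);
    all other samples in the batch serve as negatives."""
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity space
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own pair
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each positive
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-logprob[np.arange(2 * n), pos].mean())

rng = np.random.default_rng(0)
z1 = rng.standard_normal((8, 16))
z2 = z1 + 0.01 * rng.standard_normal((8, 16))        # near-identical "views" -> low loss
loss_aligned = nt_xent(z1, z2)
loss_random = nt_xent(z1, rng.standard_normal((8, 16)))  # unrelated pairs -> high loss
```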
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track (Bhaskar Mitra)
We benchmark Conformer-Kernel models under the strict blind evaluation setting of the TREC 2020 Deep Learning track. In particular, we study the impact of incorporating: (i) Explicit term matching to complement matching based on learned representations (i.e., the “Duet principle”), (ii) query term independence (i.e., the “QTI assumption”) to scale the model to the full retrieval setting, and (iii) the ORCAS click data as an additional document description field. We find evidence which supports that all three aforementioned strategies can lead to improved retrieval quality.
This document provides a practical guide for using support vector machines (SVMs) for classification tasks. It recommends beginners follow a simple procedure: 1) preprocess data by converting categorical features to numeric and scaling attributes, 2) use a radial basis function kernel, 3) perform cross-validation to select optimal values for hyperparameters C and γ, and 4) train the full model on the training set using the best hyperparameters. The guide explains why this procedure often provides reasonable results for novices and illustrates it using examples of real-world classification problems.
This document proposes a simple procedure for beginners to obtain reasonable results when using support vector machines (SVMs) for classification tasks. The procedure involves preprocessing data through scaling, using a radial basis function kernel, selecting model parameters through cross-validation grid search, and training the full model on the preprocessed data. The document provides examples applying this procedure to real-world datasets, demonstrating improved accuracy over approaches without careful preprocessing and parameter selection.
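The first steps of the recommended procedure (scale attributes, use an RBF kernel, search a (C, γ) grid by cross-validation) can be sketched as follows. The grid uses the usual coarse log2 spacing; an actual SVM solver such as LIBSVM would then consume the scaled data, kernel, and grid, which this sketch does not include.

```python
import numpy as np

def scale_01(X):
    """Step 1 of the guide: linearly scale each attribute to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def rbf_kernel(A, B, gamma):
    """Step 2: the RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# Step 3: a coarse (C, gamma) grid to search with cross-validation.
grid = [(2.0**c, 2.0**g) for c in range(-5, 16, 2) for g in range(-15, 4, 2)]

rng = np.random.default_rng(0)
X = rng.random((20, 3)) * np.array([1.0, 100.0, 0.01])  # attributes on wildly different scales
Xs = scale_01(X)                  # without this, the large attribute dominates the kernel
K = rbf_kernel(Xs, Xs, gamma=0.5)
```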
This document appears to be a presentation about WeightWatcher, an open-source tool for data-free model monitoring of deep learning models. It discusses how WeightWatcher analyzes the internal structure of models using metrics like "alpha" and power law fits, which can evaluate properties like learning capacity. WeightWatcher is presented as a way to compare different large language models without access to their training data, and its code is available on GitHub for early adopters and collaborators to use.
Building AI Products: Delivery Vs Discovery (Charles Martin)
This document discusses the differences between data science and other technical roles like IT and software engineering. It notes that data scientists are focused on discovering unknown patterns in data through experimentation and hypothesis testing, rather than software deployment or coding. The document outlines challenges data scientists face related to new technologies, processes, and testing models, and provides examples of how to take a lean startup approach to data science through rapid prototyping and getting models into production quickly.
AI and Machine Learning for the Lean Start Up (Charles Martin)
This document discusses machine learning and artificial intelligence for startups. It compares lean startups like Aardvark, which was acquired by Google, to larger startups like eHow that had a $1 billion IPO. It discusses how funding environments shape different startup models and how machine learning was implemented differently. It also covers lessons from consulting on rapid prototyping and gaining improvements incrementally over time.
Palo Alto University Rotary Club talk, Sep 29, 2017 (Charles Martin)
- Dr. Charles H. Martin has over 15 years of experience applying machine learning and artificial intelligence, developing algorithms for companies like Demand Media, eBay, and BlackRock.
- He discusses the history and academic roots of neural networks and deep learning, including pioneering work by researchers in the 1960s-1980s.
- The document outlines several problems and applications that deep learning is well-suited for, such as image classification, speech recognition, self-driving cars, and improving medical diagnosis. It also discusses implications for jobs, education, and data-driven decision making.
Applied machine learning for search engine relevance 3 (Charles Martin)
The document discusses using support vector machines (SVMs) for ranking web search results, where SVMs learn weight vectors to maximize the relevance score of correct results based on training data while minimizing a multivariate loss function between item pairs. It mentions that a ranking SVM consistently improved the click rank performance on Shopping.com by a certain percentage, indicating SVMs are effective for learning document relevance in web search ranking. Large-scale linear SVMs for ranking can be solved using conjugate gradient or a cutting plane algorithm.
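The pairwise objective behind a ranking SVM, scoring every relevant document above every non-relevant one by a margin, can be illustrated with a small numpy sketch; the features, weights, and margin below are invented, and a real ranking SVM would minimize this loss plus a regularizer over training data.

```python
import numpy as np

def ranking_hinge_loss(w, X_pos, X_neg, margin=1.0):
    """Pairwise hinge loss of a linear ranking model: every relevant document
    should score at least `margin` above every non-relevant one."""
    s_pos = X_pos @ w
    s_neg = X_neg @ w
    diffs = s_pos[:, None] - s_neg[None, :]        # all (relevant, non-relevant) pairs
    return float(np.maximum(0.0, margin - diffs).mean())

w = np.array([1.0, -0.5])                          # learned weight vector (toy)
X_pos = np.array([[3.0, 0.0], [2.5, 0.5]])         # features of relevant results
X_neg = np.array([[0.5, 1.0], [0.0, 0.0]])         # features of non-relevant results
loss = ranking_hinge_loss(w, X_pos, X_neg)         # zero when every pair is well separated
```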
Calculation Consulting provides data science leadership and machine learning consulting services, with a focus on developing algorithms that can generate sustainable revenue. The company is led by Dr. Charles Martin, who has over 10 years of experience in applied machine learning and developing algorithms for companies like Demand Media. Calculation Consulting helps clients address challenges like measuring the impact of data science work, managing the data science process, and ensuring algorithmic accountability and transparency.
Calculation Consulting provides data science leadership and expertise. Led by Dr. Charles Martin, who has over 10 years of experience developing machine learning algorithms for companies like Demand Media, BlackRock, and eBay. Demand Media's machine learning algorithms created a $1 billion company but later collapsed due to overdependence on search traffic and lack of adaptation to Google's algorithm updates. Effective data science requires senior leadership, cross-functional collaboration, experimental methodology, and accountability to generate sustainable long-term revenue rather than just cost savings.
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
3. calculation | consulting why deep learning works
Who Are We?
Dr. Charles H. Martin, PhD
University of Chicago, Chemical Physics
NSF Fellow in Theoretical Chemistry, UIUC
Over 15 years experience in applied Machine Learning and AI
ML algos for: Aardvark, acquired by Google (2010)
Demand Media (eHow); first $1B IPO since Google
Wall Street: Barclays, BlackRock
Fortune 500: Roche, France Telecom, Walmart
BigTech: eBay, Aardvark (Google), GoDaddy
Private Equity: Griffin Advisors
Alt. Energy: Anthropocene Institute (Page Family)
www.calculationconsulting.com
charles@calculationconsulting.com
4. calculation | consulting why deep learning works
Michael W. Mahoney
ICSI, RISELab, Dept. of Statistics UC Berkeley
Algorithmic and statistical aspects of modern large-scale data analysis.
large-scale machine learning | randomized linear algebra
geometric network analysis | scalable implicit regularization
PhD, Yale University, computational chemical physics
SAMSI National Advisory Committee
NRC Committee on the Analysis of Massive Data
Simons Institute Fall 2013 and 2018 program on the Foundations of Data
Biennial MMDS Workshops on Algorithms for Modern Massive Data Sets
NSF/TRIPODS-funded Foundations of Data Analysis Institute at UC Berkeley
https://www.stat.berkeley.edu/~mmahoney/
mmahoney@stat.berkeley.edu
Who Are We?
6. Motivations: WeightWatcher Theory
Understanding deep learning requires rethinking generalization
The WeightWatcher theory is a Semi-Empirical theory based on:
the Statistical Mechanics of Generalization,
Random Matrix Theory, and
the theory of Strongly Correlated Systems
7. Research: Implicit Self-Regularization in Deep Learning
Selected publications:
• Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning (JMLR 2021)
• Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks (ICML 2019, SDM 2020)
• Workshop: Statistical Mechanics Methods for Discovering Knowledge from Production-Scale Neural Networks (KDD 2020)
• Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data (Nature Communications 2021)
• More in press today: [Contest post-mortem] [Training transformers]
• Some unpublished, experimental results also discussed
8. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices
The tail of the ESD contains the information
9. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices
Well trained layers are heavy-tailed and well shaped
GPT-2 Fits a Power Law
(or Truncated Power Law)
alpha in [2, 6]
watcher.analyze(plot=True)
Good quality of fit (D is small)
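The fit that `watcher.analyze(plot=True)` reports (the exponent alpha and the quality-of-fit D) can be illustrated with a minimal numpy sketch. This is a toy MLE/KS fit in the style of Clauset et al., not weightwatcher's actual implementation, and the synthetic eigenvalues stand in for a real layer's ESD:

```python
import numpy as np

def fit_power_law(evals, xmin=1.0):
    """Toy MLE fit of a power law p(x) ~ x^(-alpha) to the ESD tail above xmin.

    Returns (alpha, D): the continuous-MLE exponent (Clauset et al. style)
    and the Kolmogorov-Smirnov distance D (smaller D = better fit)."""
    tail = np.sort(evals[evals >= xmin])
    n = len(tail)
    alpha = 1.0 + n / np.sum(np.log(tail / xmin))
    # KS distance between the empirical tail CDF and the fitted power-law CDF
    emp_cdf = np.arange(1, n + 1) / n
    fit_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    D = np.max(np.abs(emp_cdf - fit_cdf))
    return alpha, D

# Synthetic "ESD": Pareto-distributed eigenvalues with true alpha = 3
rng = np.random.default_rng(0)
evals = 1.0 + rng.pareto(2.0, size=100_000)   # pdf ~ x^(-3) for x >= 1
alpha, D = fit_power_law(evals)
print(alpha, D)   # alpha close to 3, D small
```

In practice weightwatcher also selects xmin automatically; here it is fixed at 1 for simplicity.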
10. WeightWatcher: analyzes the ESD (eigenvalues) of the layer weight matrices
Better trained layers are more heavy-tailed and better shaped
GPT vs. GPT-2
11. Random Matrix Theory: Marchenko-Pastur plus Tracy-Widom fluctuations
very crisp edges
RMT says if W is a simple random Gaussian matrix,
then the ESD will have a very simple, known form
Shape depends on Q=N/M
(and variance ~ 1)
Eigenvalues tightly bounded
a few spikes may appear
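This Marchenko-Pastur prediction is easy to check numerically; a minimal numpy sketch, where the matrix sizes and variance are arbitrary choices for illustration:

```python
import numpy as np

# For an N x M Gaussian random W, the ESD of X = W^T W / N fills the
# Marchenko-Pastur bulk [lambda_-, lambda_+]; the edges are very crisp,
# up to small Tracy-Widom fluctuations.
rng = np.random.default_rng(0)
N, M, sigma = 2000, 1000, 1.0
Q = N / M                                   # aspect ratio controls the shape
W = rng.normal(0.0, sigma, size=(N, M))
evals = np.linalg.eigvalsh(W.T @ W / N)

lam_plus = sigma**2 * (1 + np.sqrt(1 / Q)) ** 2
lam_minus = sigma**2 * (1 - np.sqrt(1 / Q)) ** 2
print(evals.min(), evals.max())   # tightly bounded by [lambda_-, lambda_+]
```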
13. Random Matrix Theory: Heavy Tailed
But if W is heavy tailed, the ESD will also have heavy tails
(i.e. it's all spikes, the bulk vanishes)
If W is strongly correlated, then the ESD can be modeled as if W is drawn
from a heavy tailed distribution
Nearly all pre-trained DNNs display heavy tails… as we shall soon see
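The contrast with the Gaussian case can be illustrated by drawing W from a very heavy tailed distribution; a numpy sketch, with the tail exponent chosen arbitrarily for illustration:

```python
import numpy as np

# When the entries of W are heavy tailed, the ESD is no longer confined to
# the Marchenko-Pastur bulk: the largest eigenvalues blow out far past the
# bulk edge a Gaussian W would have, and the spectrum is "all spikes".
rng = np.random.default_rng(0)
N, M = 2000, 1000
Q = N / M
mu = 1.5                                       # very heavy tail (infinite variance)
signs = rng.choice([-1.0, 1.0], size=(N, M))
W = signs * (1.0 + rng.pareto(mu, size=(N, M)))
W = W / W.std()                                # same empirical scale as the Gaussian case
evals = np.linalg.eigvalsh(W.T @ W / N)

mp_edge = (1 + np.sqrt(1 / Q)) ** 2            # Gaussian (MP) bulk edge for comparison
print(evals.max() / mp_edge)                   # far greater than 1
```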
14. Experiments: just apply to pre-trained Models
LeNet5 (1998)
AlexNet (2012)
InceptionV3 (2014)
ResNet (2015)
…
DenseNet201 (2018)
Conv2D MaxPool Conv2D MaxPool FC FC
15. Heavy-Tailed: Self-Regularization
AlexNet,
VGG11, VGG13, …
ResNet, …
Inception,
DenseNet,
BERT, RoBERTa, …
GPT, GPT2, …
…
All large, well trained, modern DNNs exhibit heavy tailed self-regularization
scale free
HTSR
16. Heavy Tailed Metrics: GPT vs GPT2
The original GPT is poorly trained on purpose; GPT2 is well trained
alpha for every layer
smaller alpha is better
large alphas indicate bad fits
17. Power Law Universality: ImageNet
All ImageNet models display remarkable Heavy Tailed Universality
500 matrices, ~50 architectures
Linear layers & Conv2D feature maps
80-90% of the fitted alphas < 4
18. Random Matrix Theory: detailed insight into W_L
DNN training induces breakdown of Gaussian random structure
and the onset of a new kind of heavy tailed self-regularization
Gaussian random matrix → Bulk + Spikes → Heavy Tailed
Small, older NNs sit at the Gaussian end; large, modern DNNs (and/or small batch sizes) sit at the Heavy Tailed end
19. HT-SR Theory: 5+1 Phases of Training
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin, Michael W. Mahoney; JMLR 22(165):1−73, 2021.
20. Heavy Tailed RMT: Universality Classes
The familiar Wigner/MP Gaussian class is not the only Universality class in RMT
Charles H. Martin, Michael W. Mahoney; JMLR 22(165):1−73, 2021.
21. WeightWatcher: predict trends in generalization
Predict test accuracies across variations in hyper-parameters
The average Power Law exponent alpha
predicts generalization—at fixed depth
Smaller average-alpha is better
Better models are easier to treat
Charles H. Martin, Michael W. Mahoney [Contest post-mortem paper]
22. WeightWatcher: Shape vs Scale metrics
Purely norm-based (scale) metrics (from SLT) can be correlated with depth
but anti-correlated with hyper-parameter changes
23. WeightWatcher: treat architecture changes
Predict test accuracies across variations in hyper-parameters and depth
The alpha-hat metric combines shape and scale metrics
and corrects for different depths (grey line);
it can be derived from theory…
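A minimal sketch of what an alpha-hat style combined metric looks like, assuming the per-layer power-law fits (alpha, lambda_max) are already in hand; the layer values below are hypothetical, and the exact weighting in the papers may differ in detail:

```python
import math

def alpha_hat(layer_fits):
    """Combine shape (alpha) and scale (log lambda_max) per layer,
    then average over layers; smaller values predict better models."""
    return sum(a * math.log10(lmax) for a, lmax in layer_fits) / len(layer_fits)

# Hypothetical fitted (alpha, lambda_max) pairs for three layers
fits = [(2.5, 10.0), (3.0, 100.0), (4.5, 1000.0)]
print(alpha_hat(fits))
```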
24. WeightWatcher: predict test accuracies
alpha-hat works for 100s of different CV and NLP models
(Nature Communications 2021)
We do not have access to the training or test data,
but we can still predict trends in the generalization
25. WeightWatcher: predict test accuracies
ResNet, DenseNet, etc.
(Nature Communications 2021)
26. Predicting test accuracies: 100 pretrained models
The heavy tailed (shape) metrics perform best
https://github.com/osmr/imgclsmob
From an open source sandbox of
nearly 500 pretrained CV models
(picked >= 5 models per regression)
(Nature Communications 2021)
27. Correlation Flow: CV Models
We can study correlation flow by looking at alpha vs. depth
VGG ResNet DenseNet
(Nature Communications 2021)
29. WeightWatcher: more Power Law shape metrics
watcher.analyze(…, fit='TPL')    Truncated Power Law fits
watcher.analyze(…, fit='E_TPL')  Extended Truncated Power Law fits
weightwatcher provides several shape (and scale) metrics
plus several more unpublished experimental options
30. WeightWatcher: E_TPL shape metric
the E_TPL (and rand_distance) shape metrics
track the learning curve epoch-by-epoch
Training MT transformers from scratch to SOTA
Extended Truncated Power Law: here, (Lambda) is the shape metric
highly accurate results leverage the advanced shape metrics
[Training transformers paper]
31. WeightWatcher: why Power Law fits?
Spiking (i.e. real) neurons exhibit power law behavior
weightwatcher supports several PL fits
from experimental neuroscience
plus totally new shape metrics
we have invented (and published)
32. WeightWatcher: why Power Law fits?
Spiking (i.e. real) neurons exhibit (truncated) power law behavior
The Critical Brain Hypothesis
Evidence of Self-Organized Criticality (SOC)
Per Bak (How Nature Works)
As neural systems become more complex
they exhibit power law behavior
and then truncated power law behavior
We see exactly this behavior in DNNs
and it is predictive of learning capacity
33. WeightWatcher: open-source, open-science
We are looking for early adopters and collaborators
github.com/CalculatedContent/WeightWatcher
We have a Slack channel to support the tool
Please file issues
Ping me to join
39. Classic Set Up: Student-Teacher model
Average version space volume over Gaussian data and uniform random Teachers
The final expression has 2 parts, parameterized by the error and the size of the data set (p)
40. Classic Set Up: Student-Teacher model
Average over random teachers
introduces overlap R
Key idea in matrix generalization
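For reference, in the classic (vector) student-teacher setup the teacher average enters only through the student-teacher overlap R, and for the simple perceptron the generalization error is a function of R alone. These are standard results from the statistical mechanics of learning; the notation here is generic, not taken from the slides:

```latex
R = \frac{1}{N}\,\mathbf{J}_S \cdot \mathbf{J}_T ,
\qquad
\epsilon_g = \frac{1}{\pi}\,\arccos R .
```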
41. New Set Up: Matrix-generalized Student-Teacher
“ .. Matrix Generalization of S-T …” Martin, Milletari, & Mahoney (in preparation)
real DNN matrices: N x M, strongly correlated, Heavy-Tailed correlation matrices
Solve for the total integrated version space
42. New Set Up: Matrix-generalized Student-Teacher
Gibbs Learning / Canonical Ensemble
Consider T-S Mean Squared Error (MSE)
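Schematically (our notation, hedged, since the paper is still in preparation), Gibbs learning weights each student J by a Boltzmann factor of the teacher-student MSE:

```latex
E_{TS}(\mathbf{J}) \propto \operatorname{Tr}\!\left[(\mathbf{J}-\mathbf{J}_T)^{\top}(\mathbf{J}-\mathbf{J}_T)\right],
\qquad
P(\mathbf{J}) \propto e^{-\beta\, E_{TS}(\mathbf{J})} .
```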
43. New Set Up: Matrix-generalized Student-Teacher
Integrate the canonical measure over Gaussian data
Matrix-generalized Student-Teacher overlap
44. New Set Up: Matrix-generalized Student-Teacher
Integrate the version space volume over the Students J
Expand delta function
Again, break into 2 parts
45. New approach: HCIZ Matrix Integrals
Fix the Teacher: average over Student Correlation Matrices
Wick rotation
Represent as an HCIZ integral
Note:
which resemble the Teacher
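For reference, the standard Harish-Chandra-Itzykson-Zuber (HCIZ) integral over the unitary group, for N x N matrices A and B, is

```latex
I_N(A, B) = \int_{U(N)} \! dU \;
\exp\!\left( N \,\operatorname{Tr}\!\left( A\, U B\, U^{\dagger} \right) \right) .
```

Constant factors and the precise scaling with N vary by convention.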
46. New approach: SemiEmpirical Theory
“Generalized Norm”: simple functional form, can infer from empirical fit
Eigenvalues of Teacher: empirical fit to the WeightWatcher PowerLaw metric
“Asymptotics of HCIZ integrals …” Tanaka (2008)
47. WeightWatcher: global and local convexity metrics
Smaller alpha corresponds to more convex energy landscapes
Transformers (alpha ~ 3-4 or more)
alpha 2-3 (or less)
“Rational Decisions, Random Matrices and Spin Glasses” (1998) by Galluccio, Bouchaud, and Potters
48. WeightWatcher: global and local convexity metrics
When the layer alpha < 2, we think this means the layer is overfit
We suspect that the early layers
of some Convolutional Nets
may be slightly overtrained
Some alpha < 2
This is predicted from our HTSR theory
51. New interpretation: HCIZ Matrix Integrals
Generating functional
R-Transform (inverse Green's function, via contour integral),
in terms of the Teacher's eigenvalues and the Student's cumulants
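For reference, the Green's function (Stieltjes transform) and R-transform of a random matrix X are the standard free-probability objects; schematically, in Tanaka-style asymptotics, the log of the HCIZ integral then reduces to integrals of the Student's R-transform up to the Teacher's eigenvalues:

```latex
G_X(z) = \frac{1}{N}\,\mathbb{E}\,\operatorname{Tr}\,(zI - X)^{-1},
\qquad
R_X(z) = G_X^{-1}(z) - \frac{1}{z},
\qquad
\frac{1}{N}\ln I_N \;\approx\; \sum_i \int_0^{\lambda_i} R_X(z)\, dz .
```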
52. Results: Gaussian Random Weight Matrices
“Random Matrix Theory (book)” Bouchaud and Potters (2020)
Recover the Frobenius Norm (squared) as the metric
53. Results: (very) Heavy Tailed Weight Matrices
“Heavy-tailed random matrices” Burda and Jukiewicz (2009)
Recover a Schatten Norm, in terms of the Heavy Tailed exponent
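For reference, the Schatten p-norm of W in terms of its singular values sigma_i (equivalently, the eigenvalues lambda_i = sigma_i^2 of the correlation matrix):

```latex
\lVert W \rVert_p^p \;=\; \sum_i \sigma_i^{\,p} \;=\; \sum_i \lambda_i^{\,p/2} .
```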
54. Application to: Heavy Tailed Weight Matrices
Some reasonable approximations give the weighted alpha metric
Q.E.D.