This document provides an introduction to support vector machines (SVMs). It discusses how SVMs can be used for binary classification, regression, and multi-class problems. SVMs find the optimal separating hyperplane that maximizes the margin between classes. Soft margins allow for misclassified points by introducing slack variables. Kernels are discussed for mapping data into higher dimensional feature spaces to perform linear separation. The document outlines the formulation of SVMs for classification and regression and discusses model selection and different kernel functions.
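As a rough illustration of the soft-margin and kernel ideas summarized above, here is a minimal scikit-learn sketch; the toy dataset and the values of C and gamma are placeholders chosen for the example, not taken from the original slides.

    # Soft-margin SVM with an RBF kernel on a toy two-class dataset.
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # C penalizes the slack variables (soft margin); the RBF kernel implicitly
    # maps the data into a higher-dimensional space where a maximum-margin
    # linear separator is sought.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_tr, y_tr)
    print("test accuracy:", clf.score(X_te, y_te))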
An introductory-to-mid-level presentation on complex network analysis: network metrics, analysis of online social networks, approximation algorithms, memory issues, and storage.
The document summarizes key concepts in social network analysis including metrics like degree distribution, path lengths, transitivity, and clustering coefficients. It also discusses models of network growth and structure like random graphs, small-world networks, and preferential attachment. Computational aspects of analyzing large networks like calculating shortest paths and the diameter are also covered.
https://telecombcn-dl.github.io/2017-dlsl/
Winter School on Deep Learning for Speech and Language. UPC BarcelonaTech ETSETB TelecomBCN.
The aim of this course is to train students in methods of deep learning for speech and language. Recurrent Neural Networks (RNN) will be presented and analyzed in detail to understand the potential of these state of the art tools for time series processing. Engineering tips and scalability issues will be addressed to solve tasks such as machine translation, speech recognition, speech synthesis or question answering. Hands-on sessions will provide development skills so that attendees can become competent in contemporary data analytics tools.
(DL hacks reading group) How to Train Deep Variational Autoencoders and Probabilistic Lad... – Masahiro Suzuki
This document discusses techniques for training deep variational autoencoders and probabilistic ladder networks. It proposes three advances: 1) Using an inference model similar to ladder networks with multiple stochastic layers, 2) Adding a warm-up period to keep units active early in training, and 3) Using batch normalization. These advances allow training models with up to five stochastic layers and achieve state-of-the-art log-likelihood results on benchmark datasets. The document explains variational autoencoders, probabilistic ladder networks, and how the proposed techniques parameterize the generative and inference models.
This document summarizes a seminar on kernels and support vector machines. It begins by explaining why kernels are useful for increasing flexibility and speed compared to direct inner product calculations. It then covers definitions of positive definite kernels and how to prove a function is a kernel. Several kernel families are discussed, including translation invariant, polynomial, and non-Mercer kernels. Finally, the document derives the primal and dual problems for support vector machines and explains how the kernel trick allows non-linear classification.
This document discusses multi-class support vector machines (SVMs). It outlines three main strategies for multi-class SVMs: decomposition approaches like one-vs-all and one-vs-one, a global approach, and an approach using pairwise coupling of convex hulls. It also discusses using SVMs to estimate class probabilities and describes two variants of multi-class SVMs that incorporate slack variables to allow misclassified examples.
This document provides an introduction to deep neural networks (DNNs) by Dr. Liwei Ren. It defines DNNs from both technical and mathematical perspectives. DNNs are composed of three main elements - architecture, activity rule, and learning rule. The architecture determines the network's capability and is typically a directed graph with weights, biases, and activation functions. Gradient descent and backpropagation are commonly used as the learning rule to update weights and minimize error. Universal approximation theorems show that both shallow and deep neural networks can approximate functions, with deep networks potentially being more efficient. Examples of DNN applications include image recognition. Security issues are also briefly mentioned.
The document discusses the perceptron, which is a binary classifier that outputs either 1 or -1. It introduces key elements of the perceptron like inputs, weights, bias, and activation functions. It also covers the perceptron convergence theorem, which states that perceptrons can learn to correctly classify linearly separable patterns through iterative weight updates. Additionally, it discusses types of activation functions and provides an example of finding a decision boundary. Machine learning plays a role in finding accurate weights through multiple iterations of training perceptrons on data.
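To make the iterative weight-update idea concrete, here is a small NumPy sketch of the classic perceptron rule for +1/-1 labels; the toy data and learning rate are illustrative assumptions, not taken from the summarized slides.

    import numpy as np

    def perceptron_train(X, y, epochs=20, lr=1.0):
        """Classic perceptron: labels y must be +1/-1; returns weights and bias."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (np.dot(w, xi) + b) <= 0:   # misclassified point
                    w += lr * yi * xi               # nudge the boundary toward it
                    b += lr * yi
        return w, b

    # Tiny linearly separable example (logical AND, labels mapped to +/-1).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, -1, -1, 1])
    w, b = perceptron_train(X, y)
    print(np.sign(X @ w + b))   # expected: [-1. -1. -1.  1.]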
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...MLconf
Tensor Methods: A New Paradigm for Training Probabilistic Models and Feature Learning: Tensors are rich structures for modeling complex higher order relationships in data rich domains such as social networks, computer vision, internet of things, and so on. Tensor decomposition methods are embarrassingly parallel and scalable to enormous datasets. They are guaranteed to converge to the global optimum and yield consistent estimates of parameters for many probabilistic models such as topic models, community models, hidden Markov models, and so on. I will show the results of these methods for learning topics from text data, communities in social networks, disease hierarchies from healthcare records, cell types from mouse brain data, etc. I will also demonstrate how tensor methods can yield rich discriminative features for classification tasks and can serve as an alternative method for training neural networks.
Paper Summary of Disentangling by Factorising (Factor-VAE)준식 최
The paper proposes Factor-VAE, which aims to learn disentangled representations in an unsupervised manner. Factor-VAE enhances disentanglement over the β-VAE by encouraging the latent distribution to be factorial (independent across dimensions) using a total correlation penalty. This penalty is optimized using a discriminator network. Experiments on various datasets show that Factor-VAE achieves better disentanglement than β-VAE, as measured by a proposed disentanglement metric, while maintaining good reconstruction quality. Latent traversals qualitatively demonstrate disentangled factors of variation.
This document discusses important issues in machine learning for data mining, including the bias-variance dilemma. It explains that the difference between the optimal regression and a learned model can be measured by looking at bias and variance. Bias measures the error between the expected output of the learned model and the optimal regression, while variance measures the error between the learned model's output and its expected output. There is a tradeoff between bias and variance - increasing one decreases the other. This is known as the bias-variance dilemma. Cross-validation and confusion matrices are also introduced as evaluation techniques.
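Writing f* for the optimal regression and f_D for the model learned from a training set D (notation chosen here, not taken from the summarized slides), the tradeoff described above is the standard decomposition of the expected squared error at a point x:

    \mathbb{E}_D\!\left[\big(\hat{f}_D(x) - f^*(x)\big)^2\right]
      = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f^*(x)\big)^2}_{\text{bias}^2}
      + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}

Making the model more flexible typically shrinks the first term while inflating the second, which is why cross-validation is used to find the balance.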
(DL hacks reading group) Variational Inference with Rényi Divergence – Masahiro Suzuki
This document discusses variational inference with Rényi divergence. It summarizes variational autoencoders (VAEs), which are deep generative models that parametrize a variational approximation with a recognition network. VAEs define a generative model as a hierarchical latent variable model and approximate the intractable true posterior using variational inference. The document explores using Rényi divergence as an alternative to the evidence lower bound objective of VAEs, as it may provide tighter variational bounds.
This document contains lecture notes on sparse autoencoders. It begins with an introduction describing the limitations of supervised learning and the need for algorithms that can automatically learn feature representations from unlabeled data. The notes then state that sparse autoencoders are one approach to learn features from unlabeled data, and describe the organization of the rest of the notes. The notes will cover feedforward neural networks, backpropagation for supervised learning, autoencoders for unsupervised learning, and how sparse autoencoders are derived from these concepts.
The document discusses Boolean equi-propagation, an approach for optimizing SAT encodings of constraint satisfaction problems (CSPs). It involves inferring new equalities from constraints and simplifying the model. This allows representing problem structures directly and compactly in conjunctive normal form (CNF). Examples show how equi-propagation simplifies models by removing variables for all-different and bit-sum constraints. Experiments demonstrate the approach generates small CNFs for balanced incomplete block designs and Nonogram puzzles.
The document compares the performance of extreme learning machine (ELM) and support vector machines (SVM) in classification tasks. It finds that ELM achieves comparable or better accuracy than SVM and LS-SVM on cancer classification data, while being significantly faster. ELM requires less human intervention than SVMs since its hidden node parameters are randomly assigned rather than optimized. While ELM often has lower computational complexity than SVMs for large datasets, more analysis is needed to understand its behavior with different network sizes and datasets.
This document discusses functions, limits, and continuity. It begins by defining functions, domains, ranges, and some standard real functions like constant, identity, modulus, and greatest integer functions. It then covers limits of functions including one-sided limits and properties of limits. Examples are provided to illustrate evaluating limits using substitution and factorization methods. The overall objectives are to understand functions, domains, ranges, limits of functions and methods to evaluate limits.
The document summarizes the paper "Matching Networks for One Shot Learning". It discusses one-shot learning, where a classifier can learn new concepts from only one or a few examples. It introduces matching networks, a new approach that trains an end-to-end nearest neighbor classifier for one-shot learning tasks. The matching networks architecture uses an attention mechanism to compare a test example to a small support set and achieve state-of-the-art one-shot accuracy on Omniglot and other datasets. The document provides background on one-shot learning challenges and related work on siamese networks, memory augmented neural networks, and attention mechanisms.
Predicting organic reaction outcomes with Weisfeiler-Lehman network – Kazuki Fujikawa
This document discusses neural message passing networks for modeling quantum chemistry. It defines message passing networks as having message functions that update node states based on neighboring node states, vertex update functions that update node states based on accumulated messages, and a readout function that produces an output for the full graph. It provides examples of specific message, update, and readout functions used in existing message passing models like interaction networks and molecular graph convolutions.
(Research group reading) Weight Uncertainty in Neural Networks – Masahiro Suzuki
Bayes by Backprop is a method for introducing weight uncertainty into neural networks using variational Bayesian learning. It represents each weight as a probability distribution rather than a fixed value. This allows the model to better assess uncertainty. The paper proposes Bayes by Backprop, which uses a simple approximate learning algorithm similar to backpropagation to learn the distributions over weights. Experiments show it achieves good results on classification, regression, and contextual bandit problems, outperforming standard regularization methods by capturing weight uncertainty.
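A minimal sketch of the weight-sampling step at the heart of the method described above, assuming the commonly used Gaussian variational posterior with a softplus-parameterized standard deviation; the shapes and initial values are illustrative, not taken from the paper.

    import numpy as np

    def sample_weight(mu, rho, rng):
        """Draw one weight matrix from q(w) = N(mu, sigma^2), sigma = softplus(rho).

        Each weight is a distribution (mu, rho) rather than a point value;
        gradients with respect to mu and rho can be taken through this sample,
        which is what lets a backpropagation-like algorithm learn them.
        """
        sigma = np.log1p(np.exp(rho))           # softplus keeps sigma positive
        eps = rng.standard_normal(mu.shape)     # noise independent of the parameters
        return mu + sigma * eps

    rng = np.random.default_rng(0)
    mu = np.zeros((3, 2))                       # variational means
    rho = -3.0 * np.ones((3, 2))                # small initial sigma = softplus(-3)
    w = sample_weight(mu, rho, rng)
    print(w.shape)                              # (3, 2): one sampled weight matrix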
18 Machine Learning Radial Basis Function Networks Forward HeuristicsAndres Mendez-Vazquez
This document discusses radial basis function networks and forward selection heuristics for neural networks. It begins by outlining topics to be covered, including predicting the variance of weights and outputs, selecting the regularization parameter, and forward selection algorithms. It then derives an expression for the variance of the weight vector w when noise is assumed to be normally distributed. Next, it discusses how to calculate the variance matrix and selects the regularization parameter λ. Finally, it introduces how to determine the number of dimensions and provides an overview of forward selection algorithms.
This document summarizes a lecture on multi-kernel support vector machines (SVM). It introduces multiple kernel learning (MKL), which allows using a combination of multiple kernel functions instead of a single kernel for SVM classification and regression. MKL learns the optimal combination of kernels by solving a convex optimization problem to find the kernel weights while training the SVM. The SimpleMKL algorithm is presented for efficiently solving the MKL problem using a reduced gradient approach. Experimental results on regression datasets demonstrate that MKL can improve performance over single kernel SVMs.
The document summarizes deep learning concepts including deep neural network (DNN) structure, gradient descent, and backpropagation. It explains that DNNs use multiple hidden layers to construct mathematical models that transform input values to output values. Gradient descent is used to minimize error by adjusting weights, and backpropagation efficiently calculates gradients by propagating error backwards from the output layer.
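As a toy illustration of the gradient-descent update mentioned above, the following sketch fits a single linear unit by repeatedly stepping against the gradient of the mean squared error; backpropagation extends the same chain-rule gradient computation to every layer of a deep network. The data values and learning rate are arbitrary.

    import numpy as np

    X = np.array([[1.0, 2.0], [3.0, 4.0]])   # two training examples
    y = np.array([1.0, 0.0])                 # targets
    w = np.zeros(2)                          # weights to learn
    lr = 0.01                                # learning rate

    for step in range(200):
        y_hat = X @ w                              # forward pass: predictions
        grad = (2.0 / len(y)) * X.T @ (y_hat - y)  # gradient of the MSE w.r.t. w
        w -= lr * grad                             # gradient-descent update

    print("learned weights:", w)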
The document discusses various math and string classes in Java. It covers:
- Constructing objects using the new operator and passing parameters.
- Using the Random class to generate random numbers.
- Declaring constants using final and static final.
- Basic arithmetic, increment/decrement, and math methods.
- Creating and manipulating strings using methods like length(), substring(), and concatenation.
- Drawing shapes on a frame using Graphics2D methods in a JComponent's paintComponent method.
Deep learning and neural networks are inspired by biological neurons. Artificial neural networks (ANN) can have multiple layers and learn through backpropagation. Deep neural networks with multiple hidden layers did not work well until recent developments in unsupervised pre-training of layers. Experiments on MNIST digit recognition and NORB object recognition datasets showed deep belief networks and deep Boltzmann machines outperform other models. Deep learning is now widely used for applications like computer vision, natural language processing, and information retrieval.
Predicting Thyroid Disorder with Deep Neural NetworksAnaelia Ovalle
This document discusses using demographic data and deep neural networks to predict thyroid disorders. It introduces neural networks and TensorFlow, describing how deep neural networks can extract patterns from complex data. The document details methods used to handle imbalanced data, including downsampling, upsampling, and SMOTE sampling. Sample code is shown for SMOTE sampling and creating a dynamic neural network. Results found that sampling methods and activation functions impacted 28 models, and precision had to be balanced with sensitivity. Lift was introduced to measure model effectiveness versus random selection, with results showing improved lift over a random forest baseline model. The document concludes that addressing imbalanced data, parameter tuning, and evaluating multiple models can provide insight into detecting thyroid disorders in a population using demographic data
P03 neural networks cvpr2012 deep learning methods for visionzukun
This document provides an overview of neural networks for computer vision tasks. It discusses using neural networks to build an object recognition system from raw pixels to labels in an end-to-end manner with no distinction between feature extraction and classification. The key ideas are to learn features from data, use differentiable functions to efficiently compute and train features, and use a "deep" architecture of simpler non-linear modules. Building complex functions from simple building blocks like logistic regression allows constructing highly non-linear systems for tasks like vision.
Neural Networks, Spark MLlib, Deep LearningAsim Jalis
What are neural networks? How to use the neural network algorithm in Apache Spark MLlib? What is Deep Learning? Presented at Data Science Meetup at Galvanize on 2/17/2016.
For code see IPython/Jupyter/Toree notebook at http://nbviewer.jupyter.org/gist/asimjalis/4f911882a1ab963859ce
Using deep neural networks for fashion applicationsAhmad Qamar
Talk abstract:
Deep learning has been a popular and powerful approach for solving computer vision problems in recent years. As web and social media content shifts towards rich-media, deep learning can be used to tackle the problem of understanding images to better capture user's fashion preferences. In this talk we take a closer look at convolutional neural networks used for detecting, tagging, and indexing fashion images. We'll also cover related work in the area, illustrate a wide range of applications, discuss challenges and merits of domain-specific deep learning models, and touch upon future work.
Thread Genius is a NYC-based Techstars-backed visual search and recommendation platform for fashion content. Use the full suite of Thread Genius APIs to index and identify clothing within UGC photos, find visually similar alternatives, or recommendations on how to complete the look. Find out more at threadgenius.co
Transfer Learning and Fine-tuning Deep Neural NetworksPyData
This document outlines Anusua Trivedi's talk on transfer learning and fine-tuning deep neural networks. The talk covers traditional machine learning versus deep learning, using deep convolutional neural networks (DCNNs) for image analysis, transfer learning and fine-tuning DCNNs, recurrent neural networks (RNNs), and case studies applying these techniques to diabetic retinopathy prediction and fashion image caption generation.
Image classification with Deep Neural NetworksYogendra Tamang
This document discusses image classification using deep neural networks. It provides background on image classification and convolutional neural networks. The document outlines techniques like activation functions, pooling, dropout and data augmentation to prevent overfitting. It summarizes a paper on ImageNet classification using CNNs with multiple convolutional and fully connected layers. The paper achieved state-of-the-art results on ImageNet in 2010 and 2012 by training CNNs on a large dataset using multiple GPUs.
This document provides a summary of topics covered in a deep neural networks tutorial, including:
- A brief introduction to artificial intelligence, machine learning, and artificial neural networks.
- An overview of common deep neural network architectures like convolutional neural networks, recurrent neural networks, autoencoders, and their applications in areas like computer vision and natural language processing.
- Advanced techniques for training deep neural networks like greedy layer-wise training, regularization methods like dropout, and unsupervised pre-training.
- Applications of deep learning beyond traditional discriminative models, including image synthesis, style transfer, and generative adversarial networks.
Tutorial on Deep learning and ApplicationsNhatHai Phan
In this presentation, I would like to review basic techniques, models, and applications in deep learning. Hope you find the slides interesting. Further information about my research can be found at "https://sites.google.com/site/ihaiphan/."
NhatHai Phan
CIS Department,
University of Oregon, Eugene, OR
This talk is about how we applied deep learning techniques to achieve state-of-the-art results in various NLP tasks like sentiment analysis and aspect identification, and how we deployed these models at Flipkart.
This document provides an agenda for a presentation on deep learning, neural networks, convolutional neural networks, and interesting applications. The presentation will include introductions to deep learning and how it differs from traditional machine learning by learning feature representations from data. It will cover the history of neural networks and breakthroughs that enabled training of deeper models. Convolutional neural network architectures will be overviewed, including convolutional, pooling, and dense layers. Applications like recommendation systems, natural language processing, and computer vision will also be discussed. There will be a question and answer section.
Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLcon...MLconf
Anima Anandkumar is a faculty at the EECS Dept. at U.C.Irvine since August 2010. Her research interests are in the area of large-scale machine learning and high-dimensional statistics. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She has been a visiting faculty at Microsoft Research New England in 2012 and a postdoctoral researcher at the Stochastic Systems Group at MIT between 2009-2010. She is the recipient of the Microsoft Faculty Fellowship, ARO Young Investigator Award, NSF CAREER Award, and IBM Fran Allen PhD fellowship.
This document discusses network representation and analysis. It defines networks as consisting of nodes (vertices) and edges, and describes different ways to represent networks mathematically using adjacency matrices, incidence matrices, and Laplacian matrices. It also discusses visualizing networks using multidimensional scaling and plotting them in R. Special types of networks like complete graphs and random graphs are briefly introduced.
Data science involves extracting insights from large volumes of data. It is an interdisciplinary field that uses techniques from statistics, machine learning, and other domains. The document provides examples of classification algorithms like k-nearest neighbors, naive Bayes, and perceptrons that are commonly used in data science to build models for tasks like spam filtering or sentiment analysis. It also discusses clustering, frequent pattern mining, and other machine learning concepts.
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Pooyan Jamshidi
Despite achieving state-of-the-art performance across many domains, machine learning systems are highly vulnerable to subtle adversarial perturbations. Although defense approaches have been proposed in recent years, many have been bypassed by even weak adversarial attacks. Previous studies showed that ensembles created by combining multiple weak defenses (i.e., input data transformations) are still weak. In this talk, I will show that it is indeed possible to construct effective ensembles using weak defenses to block adversarial attacks. However, to do so requires a diverse set of such weak defenses. Based on this motivation, I will present Athena, an extensible framework for building effective defenses to adversarial attacks against machine learning systems. I will talk about the effectiveness of ensemble strategies with a diverse set of many weak defenses that comprise transforming the inputs (e.g., rotation, shifting, noising, denoising, and many more) before feeding them to target deep neural network classifiers. I will also discuss the effectiveness of the ensembles with adversarial examples generated by various adversaries in different threat models. In the second half of the talk, I will explain why building defenses based on the idea of many diverse weak defenses works, when it is most effective, and what its inherent limitations and overhead are.
Kakuro: Solving the Constraint Satisfaction ProblemVarad Meru
This work was done as a part of the project for the course CS 271: Introduction to Artificial Intelligence (http://www.ics.uci.edu/~kkask/Fall-2014%20CS271/index.html), taught in Fall 2014.
This document provides an overview of machine learning concepts. It discusses big data and the need for machine learning to extract structure from data. It explains that machine learning involves programming computers to optimize performance using examples or past experience. Learning is useful when human expertise is limited or changes over time. The document also summarizes applications of machine learning like classification, regression, clustering, and reinforcement learning. It provides examples of each type of learning and discusses concepts like bias-variance tradeoff, overfitting, underfitting and more.
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
20101017 program analysis_for_security_livshits_lecture02_compilersComputer Science Club
This document provides an introduction and overview of compiler optimization techniques, including:
1) Flow graphs, constant folding, global common subexpressions, induction variables, and reduction in strength.
2) Data-flow analysis basics like reaching definitions, gen/kill frameworks, and solving data-flow equations iteratively.
3) Pointer analysis using Andersen's formulation to model references between local variables and heap objects. Rules are provided to represent points-to relationships.
ECE 2103_L6 Boolean Algebra Canonical Forms [Autosaved].pptxMdJubayerFaisalEmon
This document discusses digital system design and Boolean algebra concepts. It covers canonical and standard forms, minterms and maxterms, conversions between forms, sum of minterms, product of maxterms, and other logic operations. Examples are provided to demonstrate minimizing Boolean functions using K-maps and converting between standard forms. DeMorgan's laws and other Boolean algebra properties are also explained. Tutorial problems are given at the end to practice simplifying Boolean expressions and converting between standard forms.
Here are the portions of the state space tree generated by LCBB and FIFOBB for the given knapsack problems:
a) n=5, (p1,p2,p3,p4,p5)=(10,15,6,8,4), (w1,w2,w3,w4,w5)=(4,6,3,4,2) and m=12
LCBB: 1 | 2 3 4 | 7 8 9 10 | 5 6
FIFOBB: 1 | 2 | 3 | 7 | 4 | 5 6 | 8 9
b) n=5, (p1,p2,p3,
This presentation begins by explaining the basic algorithms of machine learning and, using the same concepts, discusses in detail two supervised/deep learning algorithms: artificial neural nets and convolutional neural nets. The relationship between artificial neural nets and basic machine learning algorithms such as logistic regression and softmax is also explored. For hands-on practice, the implementation of ANNs and CNNs on the MNIST dataset is also explained.
The document discusses using unusual data sources in insurance. It provides examples of using pictures, text, social media data, telematics, and satellite imagery in insurance. It also discusses challenges in analyzing complex and high-dimensional data from these sources and introduces machine learning tools like PCA, generalized linear models, and evaluating models using loss, risk, and cross-validation.
The document discusses shortest path problems in graphs. It introduces the shortest path problem and describes algorithms for finding shortest paths from a single source (Dijkstra's and Bellman-Ford algorithms) and for all vertex pairs (Bellman-Ford algorithm). It also discusses dynamic programming, linear programming formulations, and properties like Bellman's principle of optimality and the existence of a shortest path tree from any starting vertex. Examples are provided to illustrate the algorithms.
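For the single-source case mentioned above, here is a compact Dijkstra sketch over an adjacency-list graph with non-negative edge weights; the example graph and node names are made up for illustration.

    import heapq

    def dijkstra(graph, source):
        """graph: {node: [(neighbor, weight), ...]}, all weights non-negative."""
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                      # stale heap entry, already improved
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    g = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
    print(dijkstra(g, "a"))   # {'a': 0, 'b': 2, 'c': 3}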
H2O World - Consensus Optimization and Machine Learning - Stephen BoydSri Ambati
This document discusses consensus optimization and its applications to machine learning model fitting. Convex optimization problems can be solved effectively using interior point methods or customized algorithms. Model fitting is commonly formulated as regularized loss minimization, which is convex for many useful cases like linear regression. Consensus optimization allows distributed model fitting by splitting the data across nodes and coordinating local model parameters with consensus constraints. The alternating direction method of multipliers (ADMM) solves the consensus problem iteratively. Applications demonstrate distributed training of support vector machines and logistic regression models using ADMM consensus optimization.
A simple framework for contrastive learning of visual representationsDevansh16
Link: https://machine-learning-made-simple.medium.com/learnings-from-simclr-a-framework-contrastive-learning-for-visual-representations-6c145a5d8e99
If you'd like to discuss something, text me on LinkedIn, IG, or Twitter. To support me, please use my referral link to Robinhood. It's completely free, and we both get a free stock. Not using it is literally losing out on free money.
Check out my other articles on Medium. : https://rb.gy/zn1aiu
My YouTube: https://rb.gy/88iwdd
Reach out to me on LinkedIn. Let's connect: https://rb.gy/m5ok2y
My Instagram: https://rb.gy/gmvuy9
My Twitter: https://twitter.com/Machine01776819
My Substack: https://devanshacc.substack.com/
Live conversations at twitch here: https://rb.gy/zlhk9y
Get a free stock on Robinhood: https://join.robinhood.com/fnud75
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
Comments: ICML'2020. Code and pretrained models at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as: arXiv:2002.05709 [cs.LG]
(or arXiv:2002.05709v3 [cs.LG] for this version)
Submission history
From: Ting Chen [view email]
[v1] Thu, 13 Feb 2020 18:50:45 UTC (5,093 KB)
[v2] Mon, 30 Mar 2020 15:32:51 UTC (5,047 KB)
[v3] Wed, 1 Jul 2020 00:09:08 UTC (5,829 KB)
This document provides an overview and introduction to deep learning concepts including linear regression, activation functions, gradient descent, backpropagation, hyperparameters, convolutional neural networks (CNNs), recurrent neural networks (RNNs), and TensorFlow. It discusses clustering examples to illustrate neural networks, explores different activation functions and cost functions, and provides code examples of TensorFlow operations, constants, placeholders, and saving graphs.
A fast-paced introduction to Deep Learning that starts with a simple yet complete neural network (no frameworks), followed by an overview of activation functions, cost functions, backpropagation, and then a quick dive into CNNs. Next we'll create a neural network using Keras, followed by an introduction to TensorFlow and TensorBoard. For best results, familiarity with basic vectors and matrices, inner (aka "dot") products of vectors, and rudimentary Python is definitely helpful.
Abstract : For many years, Machine Learning has focused on a key issue: the design of input features to solve prediction tasks. In this presentation, we show that many learning tasks from structured output prediction to zero-shot learning can benefit from an appropriate design of output features, broadening the scope of regression. As an illustration, I will briefly review different examples and recent results obtained in my team.
Camp IT: Making the World More Efficient Using AI & Machine LearningKrzysztof Kowalczyk
Slides from the introductory lecture I gave for students at Camp IT 2019. I tried to cover artificial intelligence, machine learning, most popular algorithms and their applications to business as broadly as possible - for in-depth materials on the given topics, see links and references in the presentation.
An introduction to Deep Learning (DL) concepts, starting with a simple yet complete neural network (no frameworks), followed by aspects of deep neural networks, such as back propagation, activation functions, CNNs, and the AUT theorem. Next, a quick introduction to TensorFlow and Tensorboard, and then some code samples with Scala and TensorFlow.
Similar to Output Units and Cost Function in FNN (20)
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
1. Introduction
Output Units and Cost Functions
Deterministic and Generic Model
Conclusions and Discussions
Deep Neural Network
Cost Functions and Output Units
Jiaming Lin
jmlin@arbor.ee.ntu.edu.tw
DATALab@III
NetDBLab@NTU
January 9, 2017
1 / 28 Jiaming Lin jmlin@arbor.ee.ntu.edu.tw Deep Neural Network
2. Introduction
Output Units and Cost Functions
Deterministic and Generic Model
Conclusions and Discussions
Outline
1 Introduction
2 Output Units and Cost Functions
Binary
Multinoulli
3 Deterministic and Generic Model
4 Conclusions and Discussions
2 / 28 Jiaming Lin jmlin@arbor.ee.ntu.edu.tw Deep Neural Network
3-5. Introduction
In neural network learning...
The selection of the output unit depends on the learning problem.
  - Classification: sigmoid, softmax, or linear.
  - Linear regression: linear.
Determine and analyse the cost function.
  - Is the cost function analytic?†
  - Can learning progress well (first-order derivative)?
Deterministic and generic models.
  - Data is more complicated in many cases.
Note: †For simplicity, by "analytic" we mean that the function is infinitely differentiable on its domain.
8. Binary
An example binary classification dataset with features x1, ..., xn and m examples:

  index | x1  | ... | xn  | target
  1     | 0   | ... | 1   | Class A
  2     | 1   | ... | 0   | Class B
  3     | 1   | ... | 1   | Class A
  ...   | ... | ... | ... | ...
  m     | 0   | ... | 0   | Class B
9. Binary
The output unit is the sigmoid of an affine transformation, \hat{y} = S(z), where
  S is the sigmoid function,
  z is the input of the output layer,

    z = w^\top h + b,    (1)

with w the weights, h the output of the hidden layer, and b the bias.
10-12. Cost Function
The cost function can be derived in many ways; we discuss two of the most common: Mean Square Error and Cross Entropy.

Mean Square Error. Let y^{(i)} denote the data label and \hat{y}^{(i)} = S(z^{(i)}) the prediction. We may define the cost function

    C_{mse} = \frac{1}{m} \sum_{i=1}^{m} \big( \hat{y}^{(i)} - y^{(i)} \big)^2    (2)

Cross Entropy. Adopting the same symbols, the cost function defined by cross entropy is

    C_{ce} = \frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \ln \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \ln\big(1 - \hat{y}^{(i)}\big) \Big]

where m is the data size, and z^{(i)}, \hat{y}^{(i)} and y^{(i)} are real numbers.
13-14. Comparison between MSE and Cross Entropy
Problem: which one is better? We compare them on two criteria:
  Analyticity (infinitely differentiable)
  Learning ability (first-order derivatives)

Analyticity:

    C_{mse} = \frac{1}{m} \sum_{i=1}^{m} \big( \hat{y}^{(i)} - y^{(i)} \big)^2
    C_{ce} = \frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \ln \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \ln\big(1 - \hat{y}^{(i)}\big) \Big]

Computationally, the value of \hat{y}^{(i)} = S(z^{(i)}) can overflow to 1 or underflow to 0 when z^{(i)} is very positive or very negative. Therefore, given a fixed y^{(i)} \in \{0, 1\}:
  C_{ce} is undefined where \hat{y}^{(i)} is 0 or 1;
  C_{mse} is polynomial in \hat{y}^{(i)} and thus analytic everywhere.
15. Comparison between MSE and Cross Entropy
Learning ability: compare the gradients

    \frac{\partial C_{mse}}{\partial w} = [S(z) - y]\,[1 - S(z)]\,S(z)\,h,    (3)
    \frac{\partial C_{ce}}{\partial w} = [y - S(z)]\,h,    (4)

respectively, where S is the sigmoid and z = w^\top h + b.
16-17. Comparison between MSE and Cross Entropy

  Case              | MSE: [S(z) - y][1 - S(z)]S(z)h | Cross Entropy: [y - S(z)]h
  y = 1 and ŷ → 1   | steps → 0                      | steps → 0
  y = 1 and ŷ → 0   | steps → 0                      | steps → 1
  y = 0 and ŷ → 1   | steps → 0                      | steps → -1
  y = 0 and ŷ → 0   | steps → 0                      | steps → 0

In the case of Mean Square Error, progress gets stuck when z is very positive or very negative: the saturating factors S(z)[1 - S(z)] drive the gradient to 0 even when the prediction is wrong, whereas the cross-entropy gradient stays near ±1 on wrong answers.
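To make the saturation concrete, here is a minimal NumPy sketch (ours, not from the slides) that evaluates the two per-example gradients (3) and (4) at a badly misclassified, saturated point; the toy values of h, w, b and y are invented for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One hidden activation h and one weight w; the true label is y = 1
    # but the logit z = w*h + b is very negative, so S(z) is close to 0.
    h, w, b, y = 1.0, -8.0, 0.0, 1.0
    z = w * h + b

    s = sigmoid(z)
    grad_mse = (s - y) * (1.0 - s) * s * h   # equation (3): vanishes because S(z)[1 - S(z)] -> 0
    grad_ce  = (y - s) * h                   # equation (4): stays close to 1

    print(f"S(z) = {s:.6f}")                 # ~ 0.000335
    print(f"MSE gradient = {grad_mse:.6f}")  # ~ -0.000335, almost no learning signal
    print(f"CE  gradient = {grad_ce:.6f}")   # ~  0.999665, a strong learning signal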
18-19. The Unstable Issue in Cross Entropy
We have mentioned the numerical instability of cross entropy. Precisely,
  \hat{y} = S(z) underflows to 0 when z is very negative,
  \hat{y} = S(z) overflows to 1 when z is very positive.
Therefore, given a fixed y \in \{0, 1\}, the function

    C = y \ln \hat{y} + (1 - y) \ln(1 - \hat{y})

can be undefined when z is very positive or very negative.
20-22. The Unstable Issue in Cross Entropy
Alternatively, regard z as the variable of the cross entropy:

    C = y \ln S(z) + (1 - y) \ln(1 - S(z))    (5)
      = -\zeta(-z) + z(y - 1),    (6)

where \zeta(x) = \ln(1 + e^{x}) is the softplus and z is a real number.
We may obtain the analyticity of C by noting that dC/dz = y - S(z) is built from analytic functions.
In the cases of the right answer:
  y = 1 and ŷ = S(z) → 1 ⇒ z → ∞, C → 0,
  y = 0 and ŷ = S(z) → 0 ⇒ z → −∞, C → 0.
In the cases of the wrong answer:
  y = 1 and ŷ = S(z) → 0 ⇒ z → −∞, C → −∞,
  y = 0 and ŷ = S(z) → 1 ⇒ z → ∞, C → −∞,
and in both wrong-answer cases dC/dz = y − S(z) tends to ±1, so the learning signal does not vanish.
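As a sketch of this reformulation (our code, not the deck's), the cost (6) can be evaluated directly from the logit z using a standard numerically stable form of softplus, so S(z) never has to be formed and the logarithm never sees an exact 0 or 1.

    import numpy as np

    def softplus(x):
        # Stable softplus: log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|)).
        return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

    def cross_entropy_from_logits(y, z):
        # Equation (6): C = -softplus(-z) + z*(y - 1), evaluated from the logit z.
        return -softplus(-z) + z * (y - 1.0)

    # Saturated logits: S(z) underflows/overflows in float64 around |z| ~ 710.
    z = np.array([-710.0, 710.0])
    y = np.array([1.0, 0.0])                    # both are wrong answers

    s = 1.0 / (1.0 + np.exp(-z))                # naive sigmoid: exactly 0.0 and 1.0 here
    naive = y * np.log(s) + (1.0 - y) * np.log(1.0 - s)   # log(0) -> -inf (with warnings)
    stable = cross_entropy_from_logits(y, z)

    print(naive)    # [-inf -inf]
    print(stable)   # [-710. -710.]  finite, as predicted by the analysis above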
24-25. Multinoulli: Output Unit and Cost Function
Generalize the binary case to multiple classes:
  Linear output units, with #(output units) = #(classes).
  Cost function evaluated by cross entropy.

Cost Function in Multinoulli Problems
Suppose the size of the dataset is m and there are K classes; then we can obtain the cost function from cross entropy:

    C(w) = -\sum_{i=1}^{m} \sum_{k=1}^{K} 1\{y^{(i)} = k\} \ln \frac{\exp\big(z^{(i)}_k\big)}{\sum_{j=1}^{K} \exp\big(z^{(i)}_j\big)}    (7)

where z^{(i)}_k = w_k^\top h^{(i)} + b_k and h^{(i)} is the output of the hidden layer corresponding to example x^{(i)}.
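A direct NumPy transcription of cost (7) might look as follows (our sketch; the toy logits and labels are invented). Note that forming exp(z) naively can overflow for large logits, which is exactly what the lemma below addresses.

    import numpy as np

    def multinoulli_cost(Z, y):
        # Equation (7): Z has shape (m, K), row i holding the logits z^(i); y holds integer labels.
        m = Z.shape[0]
        softmax = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)
        # The indicator 1{y^(i) = k} simply picks out the predicted probability of the true class.
        return -np.sum(np.log(softmax[np.arange(m), y]))

    Z = np.array([[ 2.0, 0.5, -1.0],   # m = 3 examples, K = 3 classes
                  [ 0.1, 0.2,  0.3],
                  [-2.0, 3.0,  0.0]])
    y = np.array([0, 2, 1])            # true classes for the three examples
    print(multinoulli_cost(Z, y))      # total cross-entropy cost over the batch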
26-28. A Lemma for Simplifying the Cost Function
We again check the two properties:
  Analyticity (infinitely differentiable)
  Learning ability (first-order derivatives)
To establish these properties, we should first show a lemma.

Lemma 1
For the output z = w^\top h + b with z = [z_1, \ldots, z_K], and for any \varepsilon > 0,

    \max_j \{z_j\} \le \ln \sum_{j=1}^{K} \exp(z_j) \le \max_j \{z_j\} + \varepsilon    (8)

whenever the largest component dominates the others.

Proof.
Without loss of generality, we may assume z_1 > \ldots > z_K; then the remaining work is to show that, for all \varepsilon > 0,

    \ln \Big[ e^{z_1} \Big( 1 + \sum_{j=2}^{K} e^{z_j - z_1} \Big) \Big]
      = z_1 + \ln \Big( 1 + \sum_{j=2}^{K} e^{z_j - z_1} \Big)
      \le z_1 + \varepsilon

once the gaps z_1 - z_j are large enough.
Intuitively, \ln \sum_{j=1}^{K} \exp(z_j) is well approximated by \max_j \{z_j\}.
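This lemma is exactly what makes the usual log-sum-exp "max trick" work: subtracting max_j{z_j} before exponentiating leaves the value unchanged while keeping every exponent non-positive. A small sketch (our code, assuming NumPy):

    import numpy as np

    def logsumexp(z):
        # ln sum_j exp(z_j), computed by shifting with max_j{z_j} so no exponent exceeds 0.
        zmax = np.max(z)
        return zmax + np.log(np.sum(np.exp(z - zmax)))

    z = np.array([1000.0, 999.0, -5.0])
    print(np.log(np.sum(np.exp(z))))   # naive: exp(1000) overflows, result is inf
    print(logsumexp(z))                # ~ 1000.313, dominated by max_j{z_j} as the lemma suggests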
29. Analyticity
We may rewrite the cost function as

    C(w) = -\sum_{i=1}^{m} \sum_{k=1}^{K} 1\{y^{(i)} = k\} \Big[ z^{(i)}_k - \ln \sum_{j=1}^{K} \exp\big(z^{(i)}_j\big) \Big].

Each summand is a difference of analytic functions and thus analytic, and the term 1\{y^{(i)} = k\} is actually a constant. The total cost is a sum of analytic functions and thus analytic.
30. Learning Ability
Property 2
By the rule of sums in differentiation, we may simplify (7) to the cost contributed by a single example x^{(i)} to the total cost C:

    C^{(i)} = \sum_{k=1}^{K} 1\{y = k\} \Big[ z_k - \ln \sum_{j=1}^{K} \exp(z_j) \Big].

1. If the model gives the right answer, the error is close to 0.
2. If the model gives the wrong answer, learning can progress well.
31. Learning Ability
Proof (The Right Answer).
Suppose the true label is class n. By the assumption, we know z_n is maximal. Then

    -\varepsilon \le \sum_{k=1}^{K} 1\{y = k\} \Big[ z_k - \ln \sum_{j=1}^{K} \exp(z_j) \Big]
      = z_n - \ln \sum_{j=1}^{K} \exp(z_j)
      < z_n - \max_j \{z_j\} = 0.

This shows that -\varepsilon \le C^{(i)} < 0 for an arbitrarily small \varepsilon.
32. Learning Ability
Proof (The Wrong Answer).
Suppose the true label is class n. By assumption, the prediction z_n given by the model is not maximal. On the other hand, using the fact that

    z_n \ne \max_j \{z_j\} \;\Rightarrow\; \mathrm{softmax}(z_n) \ll 1,

this implies there exists a sufficiently large \delta > 0 such that

    | \mathrm{softmax}(z_n) - 1 | > \delta.
33. Learning Ability
Proof (The Wrong Answer, Cont.)
Then

    \frac{\partial C^{(i)}}{\partial z_n} = \frac{\partial}{\partial z_n} \Big[ z_n - \ln \sum_{j=1}^{K} e^{z_j} \Big] = 1 - \mathrm{softmax}(z_n) > \delta.

This shows the gradient is sufficiently large and also predictable (bounded by 1), so learning can progress well.
35-36. Learning Processes Overview

          | Deterministic                             | Generic
  Step 1  | Model function (linear, sigmoid)          | Probability distribution (Gaussian, Bernoulli)
  Step 2  | Design error evals (MSE, cross entropy)   | Maximum likelihood estimate
  Step 3  | Learn one statistic (mean, median)        | Learn the full distribution

To describe complicated data, it is often easier to build the model with the generic method.
37. Generic Modeling for Binary Classification
Step 1: Use the Bernoulli distribution as the likelihood function:

    p(y \mid x) = p^{y} (1 - p)^{1 - y} = S(z)^{y} \big(1 - S(z)\big)^{1 - y}

Step 2: Minimize the negative log-likelihood, where each summand is

    \ln p\big(y \mid x^{(i)}\big) = y \ln S(z) + (1 - y) \ln\big(1 - S(z)\big)

Step 3: We can learn the full distribution

    p(y' \mid x') = S(z')^{y'} \big(1 - S(z')\big)^{1 - y'},

where we denote z' = w^\top x' + b for a new input x', and S is the sigmoid.
38-39. Generic Modeling for Linear Regression: Step 1
Given a training feature x, use a Gaussian distribution as the likelihood function:

    p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( \frac{-(\mu - y)^2}{2\sigma^2} \Big),

where, denoting the output of the hidden layer by h_x, the weights w = [w_1, w_2] and the biases b = [b_1, b_2],

    \mu = w_1^\top h_x + b_1,
    \sigma = w_2^\top h_x + b_2.

Intuitively, \mu and \sigma are two linear output units; they are functions of x.
40-41. Generic Modeling for Linear Regression: Step 2
Recall that the maximum likelihood estimate is equivalent to minimizing the negative log-likelihood, that is,

    (\hat{\mu}, \hat{\sigma}) = \arg\min_{(\mu, \sigma)} \; -\sum_{x} \ln p(y \mid x).

However, for each summand,

    C_x = \ln p(y \mid x) = -\frac{1}{2} \Big[ \ln(2\pi\sigma^2) + \frac{(\mu - y)^2}{\sigma^2} \Big],
    \frac{\partial C_x}{\partial \sigma} = -\sigma^{-1} + (\mu - y)^2 \sigma^{-3},

so the gradients and errors become unstable when \sigma is close to 0.
42. Generic Modeling for Linear Regression: Step 2
To prevent the gradients and errors from becoming unstable, we may substitute v = \frac{1}{2\sigma^2}; then for each summand in the log-likelihood,

    C_x = \frac{1}{2} \ln v - \frac{1}{2} \ln \pi - (\mu - y)^2 v,
    \frac{\partial C_x}{\partial \mu} = -2v(\mu - y),
    \frac{\partial C_x}{\partial v} = \frac{1}{2v} - (\mu - y)^2.

Note that this substitution is stable only when the variance is not too large; otherwise v approaches 0 and the 1/v term blows up.
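A minimal NumPy sketch of this parameterization (ours, not the slides'): treating the outputs \mu and v = 1/(2\sigma^2) as the variables, the per-example log-likelihood and its gradients follow the formulas above; the toy numbers are invented.

    import numpy as np

    def gaussian_loglik_and_grads(mu, v, y):
        # Per-example log-likelihood C_x and its gradients, with v = 1/(2*sigma^2).
        C = 0.5 * np.log(v) - 0.5 * np.log(np.pi) - (mu - y) ** 2 * v
        dC_dmu = -2.0 * v * (mu - y)
        dC_dv = 1.0 / (2.0 * v) - (mu - y) ** 2
        return C, dC_dmu, dC_dv

    # A prediction that is slightly off, with moderate confidence:
    mu, v, y = 1.5, 1.0, 1.0            # v = 1.0 corresponds to sigma ~ 0.707
    C, dC_dmu, dC_dv = gaussian_loglik_and_grads(mu, v, y)
    print(C, dC_dmu, dC_dv)             # ~ -0.822, -1.0, 0.25
    # Maximizing C_x by gradient ascent (equivalently, minimizing -C_x) nudges mu
    # toward y and adjusts v; the gradient in v stays finite as long as v is
    # bounded away from 0, i.e. the variance is not too large.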
43. Generic Modeling for Linear Regression: Step 2
If the variance \sigma is fixed and chosen by the user, then comparing the negative log-likelihood with MSE shows that minimizing the NLL is equivalent to minimizing the MSE:

    C_{mse} = \frac{1}{m} \sum_{i=1}^{m} \big( \hat{y}^{(i)} - y^{(i)} \big)^2,
    C_{nll} = -\sum_{i=1}^{m} C_{x^{(i)}} = \frac{1}{2} \Big[ m \ln(2\pi\sigma^2) + \sum_{i=1}^{m} \frac{\big( \mu_{x^{(i)}} - y^{(i)} \big)^2}{\sigma^2} \Big];

with \sigma fixed, the only parameter-dependent term is the sum of squared errors.
44-46. Generic Modeling for Linear Regression: Step 3
Full distribution from the generic model: \mu and \sigma in this case.
Single statistic from the deterministic model: \mu in this case.

Experiment (ref): generate random data based on the formula

    y = x + 7.0 \sin(0.75 x) + \epsilon,

where \epsilon is Gaussian noise with \mu = 0, \sigma = 1.
FNN config: #(hidden layers) = 1, width = 20, and the hidden unit is tanh.
[Figure: fitted curves, generic vs. deterministic.]
47. More Complicated Cases
Complicated data distributions:
  In some cases, it is almost impossible to describe the data via deterministic methods.
  Generic methods may perform better in complicated cases.
48-50. Mixture Density Network
Generate random data based on the formula

    x = y + 7.0 \sin(0.75 y) + \epsilon,

where \epsilon is Gaussian noise with \mu = 0, \sigma = 1 (the earlier relation with x and y swapped, so a single x can now correspond to several plausible values of y).
First, just try using MSE to define the cost function, with one hidden layer of width 20 and tanh hidden units. The reason this falls short is that minimizing MSE is equivalent to minimizing the negative log-likelihood of a simple Gaussian; the deterministic model can therefore only learn a single statistic of y given x, which is inadequate for this multi-valued data.
51. Mixture Density Network
The mixture density network: a Gaussian mixture with n components is defined by the conditional probability distribution

    p(y \mid x) = \sum_{i=1}^{n} p(c = i \mid x)\, \mathcal{N}\big(y;\, \mu^{(i)}(x),\, \Sigma^{(i)}(x)\big).    (9)

Network configuration:
  1. The number of components n needs to be tuned (trial and error).
  2. 3 × n output units (per component: a mixing weight, a mean, and a variance).
52. Mixture Density Network
Experiment (ref):
  #(components) = 24,
  two hidden layers with width = 24 and tanh activation,
  #(output units) = 3 × 24, all linear.
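To make the 3 × n output layout concrete, here is a hedged NumPy sketch of the mixture negative log-likelihood (9) for a one-dimensional target. The split of the 3n linear outputs into mixing weights (via softmax), means, and standard deviations (via exp) is the standard MDN recipe and is our assumption, not something stated on the slide; the random inputs are placeholders.

    import numpy as np

    def mdn_nll(outputs, y):
        # outputs: (m, 3*n) linear output units; y: (m,) scalar targets.
        logits, mu, log_sigma = np.split(outputs, 3, axis=1)      # (m, n) each
        # Mixing weights via a (shifted) softmax, positive sigmas via exp.
        logits = logits - logits.max(axis=1, keepdims=True)
        pi = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        sigma = np.exp(log_sigma)
        # Gaussian density of y under each component, then the mixture of equation (9).
        dens = np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        p_y_given_x = np.sum(pi * dens, axis=1)
        return -np.sum(np.log(p_y_given_x))

    rng = np.random.default_rng(0)
    outputs = rng.normal(size=(5, 3 * 24))   # 5 examples, 24 components, 3 * 24 output units
    y = rng.normal(size=5)
    print(mdn_nll(outputs, y))               # scalar negative log-likelihood of the batch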
54-57. Conclusions and Discussions
In classification problems, cross entropy is naturally better at evaluating errors than other methods.
A reformulation of cross entropy avoids numerical instability.
  - See the MNIST example from TensorFlow.
To determine whether a cost function is good, ask:
  - Is the cost function analytic?
  - Can learning progress well?
Deterministic vs. Generic
  - Deterministic learns a single statistic, while generic learns the full distribution.
  - When the data distribution is not normal (high kurtosis or fat tails), generic might be better.
  - Generic methods are easier to apply to complicated cases.
58. Thank you.