Advanced regression and model selection - For Upgrad. This deck discusses various model selection techniques and a general methodology for choosing a machine learning algorithm.
An overview of gradient descent optimization algorithms - Hakky St
This document provides an overview of various gradient descent optimization algorithms that are commonly used for training deep learning models. It begins with an introduction to gradient descent and its variants, including batch gradient descent, stochastic gradient descent (SGD), and mini-batch gradient descent. It then discusses challenges with these algorithms, such as choosing the learning rate. The document proceeds to explain popular optimization algorithms used to address these challenges, including momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSprop, and Adam. It provides visualizations and intuitive explanations of how these algorithms work. Finally, it discusses strategies for parallelizing and optimizing SGD and concludes with a comparison of optimization algorithms.
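As a minimal illustration of the update rules these decks cover, here is a sketch of SGD with momentum on a one-dimensional quadratic. The function, learning rate, and momentum coefficient are illustrative choices, not taken from the slides:

```python
import numpy as np

def sgd_momentum(grad, w0, lr=0.1, beta=0.9, steps=300):
    """Minimize a function given its gradient, using momentum:
    the velocity v accumulates an exponentially decaying average of
    past gradients, damping oscillations along steep directions."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)   # update the velocity
        w = w + v                     # step along the velocity
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2(w - 3).
w_star = sgd_momentum(lambda w: 2 * (w - 3.0), w0=[0.0])
```

Nesterov's variant instead evaluates the gradient at the look-ahead point `w + beta * v`; Adagrad, RMSprop, and Adam additionally rescale the step per coordinate from a running history of squared gradients.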
Transfer learning aims to improve learning in a target domain by leveraging knowledge from a related source domain. It is useful when the target domain has limited labeled data. There are several approaches, including instance-based approaches that reweight or resample source instances, and feature-based approaches that learn a transformation to align features across domains. Spectral feature alignment is one technique that builds a graph of correlations between pivot features shared across domains and domain-specific features, then applies spectral clustering to derive new shared features.
Part 2 of the Deep Learning Fundamentals Series, this session discusses Tuning Training (including hyperparameters, overfitting/underfitting), Training Algorithms (including different learning rates, backpropagation), Optimization (including stochastic gradient descent, momentum, Nesterov Accelerated Gradient, RMSprop, Adaptive algorithms - Adam, Adadelta, etc.), and a primer on Convolutional Neural Networks. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
Stochastic gradient descent and its tuning - Arsalan Qadri
This paper discusses optimization algorithms used for big-data applications. We start by explaining the gradient descent algorithm and its limitations, then delve into stochastic gradient descent and explore methods to improve it by adjusting learning rates.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
The document discusses transfer learning and building complex models using Keras and TensorFlow. It provides examples of using the functional API to build models with multiple inputs and outputs. It also discusses reusing pretrained layers from models like ResNet, Xception, and VGG to perform transfer learning for new tasks with limited labeled data. Freezing pretrained layers initially and then training the entire model is recommended for transfer learning.
This document discusses various regularization techniques for deep learning models. It defines regularization as any modification to a learning algorithm intended to reduce generalization error without affecting training error. It then describes several specific regularization methods, including weight decay, norm penalties, dataset augmentation, early stopping, dropout, adversarial training, and tangent propagation. The goal of regularization is to reduce overfitting and improve generalizability of deep learning models.
Overview on Optimization algorithms in Deep Learning - Khang Pham
An overview of function optimization in general and in deep learning. The slides cover everything from basic algorithms such as batch gradient descent and stochastic gradient descent to state-of-the-art algorithms such as Momentum, Adagrad, RMSprop, and Adam.
A brief presentation on the basics of Ensemble Methods, given as a 'Lightning Talk' during the 7th Cohort of General Assembly's Data Science Immersive Course.
Lecture 18: Gaussian Mixture Models and Expectation Maximization - butest
This document discusses Gaussian mixture models (GMMs) and the expectation-maximization (EM) algorithm. GMMs model data as coming from a mixture of Gaussian distributions, with each data point assigned soft responsibilities to the different components. EM is used to estimate the parameters of GMMs and other latent variable models. It iterates between an E-step, where responsibilities are computed based on current parameters, and an M-step, where new parameters are estimated to maximize the expected complete-data log-likelihood given the responsibilities. EM converges to a local optimum for fitting GMMs to data.
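The E-step/M-step loop described above can be sketched for a two-component, one-dimensional mixture. The crude min/max initialization and the synthetic two-cluster data are illustrative assumptions, not from the lecture:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by EM."""
    mu = np.array([x.min(), x.max()])      # crude but deterministic init
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: soft responsibilities from the current parameters
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: parameters maximizing the expected log-likelihood
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-4, 1, 500), rng.normal(4, 1, 500)])
mu, sigma, pi = em_gmm_1d(x)   # means converge near -4 and 4
```

As the summary notes, EM only reaches a local optimum; in practice one runs it from several initializations and keeps the best log-likelihood.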
Gradient descent optimization with simple examples, covering SGD, mini-batch, momentum, Adagrad, RMSprop, and Adam. Made for people with little knowledge of neural networks.
Chromatic Number of a Graph (Graph Colouring) - Adwait Hegde
A graph coloring is an assignment of labels, called colors, to the vertices of a graph such that no two adjacent vertices share the same color. The chromatic number χ(G) of a graph G is the minimal number of colors for which such an assignment is possible.
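A simple upper bound on χ(G) comes from greedy coloring, sketched below on a 5-cycle. The graph and vertex order are illustrative; greedy coloring is not guaranteed to achieve the chromatic number in general, though it does here (χ(C5) = 3, since C5 is an odd cycle):

```python
def greedy_coloring(adj):
    """Greedy vertex coloring: each vertex gets the smallest color not
    used by an already-colored neighbor. Uses at most Δ+1 colors, which
    upper-bounds the chromatic number χ(G)."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:     # smallest color absent from the neighborhood
            c += 1
        color[v] = c
    return color

# A 5-cycle as an adjacency list.
c5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
colors = greedy_coloring(c5)
```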
Self-supervised learning uses unlabeled data to learn visual representations through pretext tasks like predicting relative patch location, solving jigsaw puzzles, or image rotation. These tasks require semantic understanding to solve but only use unlabeled data. The features learned through pretraining on pretext tasks can then be transferred to downstream tasks like image classification and object detection, often outperforming supervised pretraining. Several papers introduce different pretext tasks and evaluate feature transfer on datasets like ImageNet and PASCAL VOC. Recent work combines multiple pretext tasks and shows improved generalization across tasks and datasets.
This document provides an overview of classification in machine learning. It discusses supervised learning and the classification process. It describes several common classification algorithms including k-nearest neighbors, Naive Bayes, decision trees, and support vector machines. It also covers performance evaluation metrics like accuracy, precision and recall. The document uses examples to illustrate classification tasks and the training and testing process in supervised learning.
Transfer Learning and Fine-tuning Deep Neural Networks - PyData
This document outlines Anusua Trivedi's talk on transfer learning and fine-tuning deep neural networks. The talk covers traditional machine learning versus deep learning, using deep convolutional neural networks (DCNNs) for image analysis, transfer learning and fine-tuning DCNNs, recurrent neural networks (RNNs), and case studies applying these techniques to diabetic retinopathy prediction and fashion image caption generation.
PR-217: EfficientDet: Scalable and Efficient Object Detection - Jinwon Lee
Review #217 from the TensorFlow Korea PR12 paper-reading group.
This paper is EfficientDet from Google Brain. A follow-up to EfficientNet, it proposes an object detection method that pursues both accuracy and efficiency. To that end it introduces a weighted bidirectional feature pyramid network (BiFPN) and a compound scaling method for detection similar to EfficientNet's; see the video for details.
Paper: https://arxiv.org/abs/1911.09070
Video: https://youtu.be/11jDC8uZL0E
Gradient-Based Learning Applied to Document Recognition. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner. Proceedings of the IEEE, 86(11):2278-2324, November 1998.
Minmax Algorithm In Artificial Intelligence slides - SamiaAziz4
The minimax algorithm is a recursive, backtracking algorithm used in decision-making and game theory; it uses recursion to search through the game tree.
Minimax is mostly used for game playing in AI, such as chess, checkers, tic-tac-toe, Go, and various other two-player games. The algorithm computes the minimax decision for the current state.
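The recursion can be sketched on a hand-built game tree; the depth-2 tree and its payoffs below are illustrative, not from the slides:

```python
def minimax(node, maximizing):
    """Return the minimax value of a game-tree node. Leaves are numbers
    (payoffs to the maximizer); internal nodes are lists of children.
    Levels alternate between maximizing and minimizing players."""
    if not isinstance(node, list):   # leaf: static evaluation
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Depth-2 tree: the maximizer moves first, the minimizer replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
best = minimax(tree, maximizing=True)
# Minimizer values per branch: 3, 2, 2 -> maximizer picks 3.
```

Real game-playing programs add a depth cutoff with a heuristic evaluation at the frontier, and alpha-beta pruning to skip branches that cannot change the result.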
This document provides an introduction to image segmentation. It discusses how image segmentation partitions an image into meaningful regions based on measurements like greyscale, color, texture, depth, or motion. Segmentation is often an initial step in image understanding and has applications in identifying objects, guiding robots, and video compression. The document describes thresholding and clustering as two common segmentation techniques and provides examples of segmentation based on greyscale, texture, motion, depth, and optical flow. It also discusses region-growing, edge-based, and active contour model approaches to segmentation.
HML: Historical View and Trends of Deep Learning - Yan Xu
The document provides a historical view and trends of deep learning. It discusses that deep learning models have evolved in several waves since the 1940s, with key developments including the backpropagation algorithm in 1986 and deep belief networks with pretraining in 2006. Current trends include growing datasets, increasing numbers of neurons and connections per neuron, and higher accuracy on tasks involving vision, NLP and games. Research trends focus on generative models, domain alignment, meta-learning, using graphs as inputs, and program induction.
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is... - Preferred Networks
This presentation explains basic ideas of graph neural networks (GNNs) and their common applications. Primary target audiences are students, engineers and researchers who are new to GNNs but interested in using GNNs for their projects. This is a modified version of the course material for a special lecture on Data Science at Nara Institute of Science and Technology (NAIST), given by Preferred Networks researcher Katsuhiko Ishiguro, PhD.
Methods of Optimization in Machine Learning - Knoldus Inc.
In this session we discuss various methods for optimizing a machine learning model and how to adjust its hyperparameters to minimize the cost function.
Overfitting and underfitting are modeling errors related to how well a model fits training data. Overfitting occurs when a model is too complex and fits the training data too closely, resulting in poor performance on new data. Underfitting occurs when a model is too simple and does not fit the training data well. The bias-variance tradeoff aims to balance these issues by finding a model complexity that minimizes total error.
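The tradeoff can be demonstrated with polynomial fits of increasing degree to noisy samples of a sine; the degrees and noise level below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)   # noisy training data
x_test = np.linspace(0, 1, 101)
y_test = np.sin(2 * np.pi * x_test)                      # noise-free ground truth

def errors(degree):
    coeffs = np.polyfit(x, y, degree)                    # least-squares fit
    train = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

train_lin, test_lin = errors(1)    # underfit: a line cannot follow the sine
train_cub, test_cub = errors(3)    # about right for one sine period
train_hi, test_hi = errors(15)     # high degree: fits the training noise
```

The high-degree model always achieves the lowest training error (its hypothesis space contains the cubic's), which is exactly why training error alone cannot diagnose overfitting; a held-out test error is needed.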
This document provides an introduction to machine learning and inductive inference. It discusses what machine learning is, common learning tasks like concept learning and function learning, different data representations, and example applications such as knowledge discovery and building adaptive systems. The course will cover generalizing from specific examples to broader concepts through inductive inference and different learning approaches.
Asymptotic analysis of parallel programs - Sumita Das
The document compares four algorithms for sorting a list of numbers in parallel. It presents a table showing the number of processing elements, parallel runtime, speedup, efficiency, and processing element-time product for each algorithm. It analyzes that algorithm A1 has the lowest parallel runtime and is the best if the metric is speed, while algorithms A2 and A4 have the highest efficiency and are the best if the metric is efficiency or cost. The document emphasizes the importance of identifying the objectives of the analysis and using the appropriate metrics.
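The metrics in that comparison can be written down directly; the runtimes and processor count below are hypothetical, not taken from the deck's table:

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    """Efficiency E = S / p: the fraction of ideal linear speedup achieved."""
    return speedup(t_serial, t_parallel) / p

def cost(t_parallel, p):
    """Cost, the processing element-time product p * T_parallel."""
    return p * t_parallel

# Hypothetical sort: 100 s serially, 25 s on 8 processing elements.
s = speedup(100, 25)
e = efficiency(100, 25, 8)
c = cost(25, 8)
```

This makes the deck's point concrete: an algorithm can win on speedup while losing on efficiency, so the right metric depends on the objective of the analysis.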
The document discusses data augmentation techniques for improving machine learning models. It begins with definitions of data augmentation and reasons for using it, such as enlarging datasets and preventing overfitting. Examples of data augmentation for images, text, and audio are provided. The document then demonstrates how to perform data augmentation for natural language processing tasks like text classification. It shows an example of augmenting a movie review dataset and evaluating a text classifier. Pros and cons of data augmentation are discussed, along with key takeaways about using it to boost performance of models with small datasets.
- Linear regression estimates the relationship between continuous dependent and independent variables using a best fit line. Multiple linear regression uses multiple independent variables while simple linear regression uses one.
- Logistic regression applies a sigmoid function to linear regression when the dependent variable is binary. It handles non-linear relationships between variables.
- Polynomial regression uses higher powers of independent variables which may lead to overfitting so model fit must be checked.
- Stepwise regression automatically selects independent variables using forward selection or backward elimination. Ridge and lasso regression address multicollinearity through regularization. Elastic net is a hybrid of ridge and lasso.
- Classification algorithms include k-nearest neighbors, decision trees, support vector machines, and naive Bayes, which is probability-based.
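To make the multicollinearity point concrete, here is a closed-form ridge fit on two nearly identical features; the synthetic data and the λ value are illustrative assumptions:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y.
    The lam*I term keeps the normal equations well-conditioned even
    when columns of X are nearly collinear."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-6 * rng.normal(size=200)   # near-duplicate column: multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(0, 0.1, 200)

w = ridge(X, y, lam=1.0)
# Ridge splits the weight stably across the twin columns, summing to ~3;
# ordinary least squares would produce huge offsetting coefficients here.
```

Lasso behaves differently on the same data: its L1 penalty tends to zero out one of the duplicate columns entirely, which is why elastic net blends the two penalties.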
https://github.com/telecombcn-dl/dlmm-2017-dcu
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
A presentation about NGBoost (Natural Gradient Boosting) that I presented in the Information Theory and Probabilistic Programming course at the University of Oklahoma.
How to Win Machine Learning Competitions? - HackerEarth
This presentation was given by Marios Michailidis (a.k.a. Kazanova), current Kaggle Rank #3, to help the community learn machine learning better. It comprises useful ML tips and techniques for performing better in machine learning competitions. Read the full blog: http://blog.hackerearth.com/winning-tips-machine-learning-competitions-kazanova-current-kaggle-3
This presentation discusses the following topics:
Types of Problems Solved Using Artificial Intelligence Algorithms
Problem categories
Classification Algorithms
Naive Bayes
Example: A person playing golf
Decision Tree
Random Forest
Logistic Regression
Support Vector Machine
K Nearest Neighbors
SEARN is an algorithm for structured prediction that casts it as a sequence of cost-sensitive classification problems. It works by learning a policy to make incremental decisions that build up the full structured output. The policy is trained through an iterative process of generating cost-sensitive examples from sample outputs produced by the current policy, training a classifier on those examples, and interpolating the new policy with the previous one. This allows SEARN to learn the structured prediction task without requiring assumptions about the output structure, unlike approaches that make independence assumptions or rely on global prediction models.
RegBoost is a novel multivariate regression ensemble algorithm that uses linear regression as weak predictors. It divides training data into branches based on predictions and recursively executes linear regression in each branch to achieve nonlinearity. Testing data is distributed to branches to continue with weak predictors, and the final result sums all weak predictors. Experiments show RegBoost achieves similar performance to gradient boosted decision trees and outperforms linear regression.
NYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell Rebo - Maryam Farooq
at Capital One Labs on Tues, 10/23/18
Join us for what's sure to be an awesome night in AI! This month's event is focused on Evolution Strategies and will touch on many themes discussed here (https://blog.openai.com/evolution-strategies/).
Maxwell Rebo is a machine learning founder working on a stealth project: an ML-powered simulation engine.
A class of heuristic search algorithms has been shown to be a viable alternative to reinforcement learning as well as to other ML approaches. These methods can be parallelized across arbitrary numbers of CPUs and do not require GPUs to be effective. To increase explainability, attribution mechanisms can be built into these methods.
Maxwell is the former founder of Machine Colony, an enterprise AI platform company, and a founding member of NYAI. A machine learning developer and three-time founder, he has been doing ML at massive scale since 2010. He has previously spoken at venues such as the Ethereal conference in NYC and the joint Asian Leadership/HelloTomorrow conference in Seoul.
Generalized linear models (GLMs) and gradient boosting machines (GBMs) are two of the most widely used supervised learning approaches in all of commercial data science. GLMs have been the go-to predictive and inferential modeling tool for decades, but important mathematical and computational advances have been made in training GLMs in recent years. This talk will contrast H2O’s implementation of penalized GLM techniques with ordinary least squares and give specific hints for building regularized and accurate GLMs for both predictive and inferential purposes. As more organizations begin experimenting with and embracing algorithms from the machine learning tradition, GBMs have come to prominence due to their predictive accuracy, the ability to train on real-world data, and resistance to overfitting training data. This talk will give some background on the GBM approach, some insight into the H2O implementation, and some tips for tuning and interpreting GBMs in H2O.
Patrick's Bio:
Patrick Hall is a senior data scientist and product engineer at H2O.ai. Patrick works with H2O.ai customers to derive substantive business value from machine learning technologies. His product work at H2O.ai focuses on two important aspects of applied machine learning, model interpretability and model deployment. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning.
Prior to joining H2O.ai, Patrick held global customer facing roles and R & D research roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick is the 11th person worldwide to become a Cloudera certified data scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.
Adapting neural networks for the estimation of treatment effectsViswanath Gangavaram
This document summarizes an academic paper that proposes new methods for adapting neural networks to more accurately estimate average treatment effects.
It describes several existing approaches for estimating treatment effects using neural networks, including single-model, two-model, and joint learning methods. The paper then proposes a new "DragonNet" architecture that jointly learns prediction models for the expected outcome and treatment probability from shared representations. It also introduces a targeted regularization technique to induce bias toward models with optimal asymptotic properties. Experimental results showed the DragonNet approach improved treatment effect estimation compared to existing two-stage methods.
Dimensionality reduction techniques are used to reduce the number of features or variables in a dataset. This helps simplify models and improve performance. Principal component analysis (PCA) is a common technique that transforms correlated variables into linearly uncorrelated principal components. Other techniques include backward elimination, forward selection, filtering out low variance or highly correlated features. Dimensionality reduction benefits include reducing storage space, faster training times, and better visualization of data.
This document provides an overview of machine learning and linear regression. It defines machine learning as a segment of artificial intelligence that allows computers to learn from data without being explicitly programmed. The document then discusses linear regression as an algorithm that finds a linear relationship between variables to predict future outcomes. It provides the linear regression equation and describes simple, multiple, and non-linear regression. Examples of using linear regression in various industries are also given along with best practices.
GDG Cloud Community Day 2022 - Managing data quality in Machine LearningSARADINDU SENGUPTA
In the current scenario where every ML system requires a ton of data to train, changes in the data during model refreshment or even during production will cause a performance drop, sometimes quite significantly. It has become a tremendously important task in the ML system lifecycle to periodically check quality issues in the data stream itself. There are existing libraries, open-source tools or full-fledged SaaS platforms to monitor those data quality metrics but the metric used oftentimes becomes too generic and might not be useful at all.
There are simple data quality metrics, which can be developed individually and can be integrated with data quality tools/SaaS platforms to monitor them in production. In this talk, I will go through a couple of metrics for different types of data and use cases and how to use clustering and other unsupervised learning algorithms to build those metrics at the end will also try to show a demo with integrations and how it can be run in production.
ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.
An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. The trained ensemble, therefore, represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built.
This document summarizes the NGBoost method for probabilistic regression. NGBoost uses gradient boosting to fit the parameters of an assumed probabilistic distribution for the target variable. It improves on existing probabilistic regression methods by using the natural gradient, which performs gradient descent in the space of distributions rather than the parameter space. This addresses issues with prior approaches and allows NGBoost to achieve state-of-the-art performance while remaining fast, flexible, and scalable. Future work may apply NGBoost to other problems like survival analysis or joint outcome regression.
Valencian Summer School in Machine Learning 2017 - Day 1
Lectures Review: Summary Day 1 Sessions. By Mercè Martín (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017
Machine learning workshop, session 3.
- Data sets
- Machine Learning Algorithms
- Algorithms by Learning Style
- Algorithms by Similarity
- People to follow
Similar to Advanced regression and model selection (20)
Data analytics in fraud detection and customer feedbackAnkit Jain
This document discusses how data analytics can be used for fraud detection and analyzing customer feedback in ecommerce. It outlines common types of ecommerce frauds committed by buyers and sellers. It then describes how machine learning can be used to identify fraud buyers based on labeled transaction data and generated features. Customer feedback is also discussed, highlighting metrics like net promoter score and how natural language processing and bag of words models can analyze sentiment and pain points from reviews.
This presentation covers all the major problems that can be solved using data in E- commerce industry. It also touches on how data science teams are organized in those companies
Structure Approach to Analytics InterviewsAnkit Jain
This document provides an overview of how to prepare for and approach analytics interviews. It begins by introducing the author and their background. It then discusses the growing field of data analytics and common employer requirements like technical skills, business understanding, and communication skills. The document outlines different types of interviews including technical, guesstimate case studies, and real life case studies. It provides tips for each type and emphasizes the importance of structure, assumptions, and communicating your thought process. The document concludes with an example case study analyzing a drop in user engagement for a social network and the steps taken to investigate potential causes.
This document summarizes a data science talk that covered the following topics:
1) An introduction to data science, including what it is, where data comes from, and who uses it. Data scientists use skills in math, statistics, programming and domains to model data, experiment, validate models and deploy solutions.
2) Examples of data science applications at Roadrunnr to minimize delivery times and costs while improving reliability, such as optimizing multi-drop grouping logic for deliveries and predicting demand patterns.
3) Future data science projects at Roadrunnr, including surge pricing, wait time prediction, carrier selection, driver fraud detection, and product size estimation with images.
Ankit Jain gave a guest lecture sharing lessons he has learned. He discussed creating your own story rather than following others, having the right attitude being more important than talent alone, taking appropriate risks in life, setting goals, surrounding yourself with smart people, and using showmanship. He invited questions at the end.
This document discusses data analytics and the differences between data analysts and data scientists. It provides examples of where data science is used in everyday products from Google, Facebook, Uber, Amazon and others. A case study is presented on how Yammer investigated a drop in user engagement through data analysis. They found the major engagement drop was for mobile users, and further exploration showed the issue was likely related to long-time user retention on the mobile app.
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
2. Model Selection Techniques
● If you are looking for a good place to start when choosing a machine learning algorithm for your dataset, here are some general guidelines.
● How large is your training set?
○ Small -- prefer high-bias/low-variance classifiers (e.g. Naive Bayes) over low-bias/high-variance classifiers (e.g. KNN) to avoid overfitting.
○ Large -- low-bias/high-variance classifiers tend to produce more accurate models.
3. Adv/Disadv of Various Algorithms
● Naive Bayes:
○ Very simple to implement, as it’s just a bunch of counts.
○ If conditional independence holds, it converges faster than, say, Logistic Regression and thus requires less training data.
○ If you want something fast and easy that performs well, NB is a good choice.
○ Biggest disadvantage is that it can’t learn interactions between features in the dataset.
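The "just a bunch of counts" point can be made concrete with a minimal sketch (the toy weather data and the add-alpha smoothing constant are made up for illustration):

```python
import math
from collections import Counter, defaultdict

# Toy training data (made up): each row is (feature tuple, class label).
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
]

# "Training" is literally counting: class frequencies and
# per-class, per-position feature-value frequencies.
class_counts = Counter(label for _, label in data)
feat_counts = defaultdict(Counter)  # (class, position) -> value counts
for feats, label in data:
    for i, value in enumerate(feats):
        feat_counts[(label, i)][value] += 1

def predict(feats, alpha=1.0):
    """Pick the class maximizing log P(c) + sum_i log P(x_i | c),
    with add-alpha smoothing (assumes conditional independence)."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, cc in class_counts.items():
        score = math.log(cc / total)
        for i, value in enumerate(feats):
            counts = feat_counts[(c, i)]
            score += math.log((counts[value] + alpha)
                              / (cc + alpha * len(counts)))
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(("rainy", "mild")))  # -> yes
print(predict(("sunny", "hot")))   # -> no
```

Note that the model multiplies per-feature probabilities independently, which is exactly why it cannot represent interactions between features.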
4. Adv/Disadv of Various Algorithms
● Logistic Regression:
○ Lots of ways to regularize the model, and no need to worry about correlated features, unlike in Naive Bayes.
○ Nice probabilistic interpretation, helpful in problems like churn prediction.
○ Online algorithm: easy to update the model with new data (using an online gradient-descent method).
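The online-update property can be sketched as follows (a toy single-feature stream with made-up data); each new example triggers one gradient step on the log-loss rather than a full retrain:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_update(w, x, y, lr=0.1):
    """One online gradient-descent step on the log-loss for a
    single example (x, y), with y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]

# Stream examples one at a time -- no retraining from scratch.
random.seed(0)
w = [0.0, 0.0]
for _ in range(2000):
    x = [1.0, random.uniform(-2, 2)]  # bias term + one feature
    y = 1 if x[1] > 0 else 0          # true rule: sign of the feature
    w = sgd_update(w, x, y)

print(sigmoid(w[0] + w[1] * 1.5))   # confidently positive
print(sigmoid(w[0] + w[1] * -1.5))  # confidently negative
```

Because each update touches only one example, the same loop can keep running in production as fresh data arrives.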
5. Adv/Disadv of Various Algorithms
● Decision Trees:
○ Easy to explain and interpret (at least for some people).
○ Easily handle feature interactions.
○ No need to worry about outliers or whether the data is linearly separable.
○ Don’t support online learning; rebuilding the model with new data every time can be painful.
○ Tend to overfit easily. Solution: ensemble methods (e.g. Random Forests).
6. Adv/Disadv of Various Algorithms
● SVM:
○ High accuracy on many datasets.
○ With an appropriate kernel, can work well even if the data isn’t linearly separable in the base feature space.
○ Popular in text-processing applications, where high dimensionality is the norm.
○ Memory intensive, hard to interpret, and somewhat annoying to run and tune.
8. Linear Regression Issues
● Sensitivity to outliers
● Multicollinearity leads to high variance of the estimator.
● Prone to overfitting when there are a lot of variables.
● Hard to interpret when the number of predictors is large. Need a smaller subset that exhibits the strongest effects.
9. Regularization Techniques
● Regularization techniques typically work by penalizing the magnitude of feature coefficients while also minimizing the error between predicted and actual observations.
● Different types of penalization:
○ Ridge Regression: penalizes the squared value of coefficients
○ Lasso Regression: penalizes the absolute value of coefficients
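The two methods differ only in the term added to the squared error, which a minimal sketch makes explicit (the weights, data, and lambda below are made-up toy numbers):

```python
def squared_error(w, X, y):
    """Sum of squared residuals for a linear model with weights w."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - yi) ** 2
               for x, yi in zip(X, y))

def ridge_loss(w, X, y, lam):
    # L2 penalty: sum of squared coefficients
    return squared_error(w, X, y) + lam * sum(wi ** 2 for wi in w)

def lasso_loss(w, X, y, lam):
    # L1 penalty: sum of absolute coefficients
    return squared_error(w, X, y) + lam * sum(abs(wi) for wi in w)

w, X, y = [2.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
print(squared_error(w, X, y))      # -> 5.0
print(ridge_loss(w, X, y, 1.0))    # -> 10.0  (5 + 2^2 + 1^2)
print(lasso_loss(w, X, y, 1.0))    # -> 8.0   (5 + |2| + |-1|)
```

Larger lambda trades more training error for smaller coefficients; at lambda = 0 both losses reduce to ordinary least squares.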
11. Ridge Regression
● L2 penalty
● Pros
○ Handles variables >> rows
○ Handles multicollinearity
○ Higher bias but lower variance than linear regression
● Cons
○ Doesn’t produce a parsimonious model
Let’s see a collinearity example in R.
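The slides refer to a live R demo; an analogous sketch in Python (with made-up, nearly collinear data) shows the same effect: OLS coefficients explode and flip sign, while a small L2 penalty stabilizes them.

```python
def fit2(X, y, lam=0.0):
    """Solve the two-predictor normal equations
    (X'X + lam*I) w = X'y via the explicit 2x2 inverse."""
    a = sum(x[0] * x[0] for x in X) + lam
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X) + lam
    g0 = sum(x[0] * yi for x, yi in zip(X, y))
    g1 = sum(x[1] * yi for x, yi in zip(X, y))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

# Second predictor is the first plus tiny noise -> near collinearity.
X = [[1.0, 1.001], [2.0, 1.999], [3.0, 3.001], [4.0, 3.998]]
y = [2.0, 4.1, 5.9, 8.0]  # roughly y = 2 * x1

ols = fit2(X, y)             # huge, opposite-signed coefficients
ridge = fit2(X, y, lam=0.1)  # both near 1.0, summing to about 2
print(ols, ridge)
```

The OLS fit splits the single real effect into two wild coefficients that nearly cancel; ridge shares the effect roughly equally between the correlated predictors.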
12. Example: Leukemia Prediction
● Leukemia data, Golub et al., Science, 1999
● There are 38 training samples and 34 test samples, with ~7,000 genes in total (p >> n)
● Xij is the gene expression value for sample i and gene j
● Sample i has tumor type AML or ALL
● We want to select genes relevant to tumor type
○ eliminate the trivial genes
○ grouped selection, as many genes are highly correlated
● Ridge Regression can help with this modeling
13. Grouped Selection
● If two predictors are highly correlated with each other, their estimated coefficients will be similar.
● If some variables are exactly identical, they will have the same coefficients.
Ridge is good for grouped selection but not good for eliminating trivial genes.
14. LASSO
● Pros
○ Allows p >> n
○ Enforces sparsity in parameters
● Cons
○ If a group of predictors is highly correlated, LASSO tends to pick only one of them and shrink the others to zero
○ Cannot do grouped selection; tends to select one variable per group
LASSO is good for eliminating trivial genes but not good for grouped selection.
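The contrast between ridge shrinkage and lasso sparsity shows up already in the orthonormal-design solutions: ridge rescales every coefficient, while the lasso's soft-thresholding zeroes small ones exactly (a sketch; the coefficient values and lambda are made up):

```python
def ridge_shrink(beta, lam):
    """Orthonormal-design ridge solution: uniform rescaling."""
    return beta / (1.0 + lam)

def lasso_shrink(beta, lam):
    """Orthonormal-design lasso solution (soft-thresholding):
    coefficients with |beta| <= lam become exactly zero."""
    mag = abs(beta) - lam
    if mag <= 0:
        return 0.0
    return mag if beta > 0 else -mag

ols_betas = [3.0, 0.4, -2.0, 0.1]  # made-up OLS estimates
print([ridge_shrink(b, 1.0) for b in ols_betas])  # -> [1.5, 0.2, -1.0, 0.05]
print([lasso_shrink(b, 1.0) for b in ols_betas])  # -> [2.0, 0.0, -1.0, 0.0]
```

This is exactly why the lasso eliminates trivial genes (small coefficients hit zero) while ridge merely makes them small.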
15. Elastic Net
● Weighted combination of the L1 and L2 penalties
● Helps enforce sparsity
● Encourages a grouping effect among highly correlated predictors
In the gene selection problem, it can achieve both purposes: removing trivial genes and doing grouped selection.
16. Other Advanced Regression Methods
Poisson Regression
○ Typically used when the Y variable follows a Poisson distribution (typically counts of events within a time window t)
○ e.g. the number of times a customer will visit an ecommerce website next month
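A Poisson regression with a log link can be sketched as gradient descent on the negative log-likelihood, whose per-example gradient is (mu - y) * x (the toy visit counts below are made up; in practice one would use a GLM library):

```python
import math

def fit_poisson(X, y, lr=0.01, steps=5000):
    """Poisson regression with a log link, fit by per-example
    gradient descent on the negative log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        for x, yi in zip(X, y):
            mu = math.exp(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi - lr * (mu - yi) * xi for wi, xi in zip(w, x)]
    return w

# Made-up counts: the expected visit count doubles with each unit of x,
# i.e. y ~ exp(0 + x * log 2).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 4.0, 8.0]
w = fit_poisson(X, y)
print(w)                            # close to [0, log 2]
print(math.exp(w[0] + 4 * w[1]))    # predicted count at x = 4, about 16
```

The exponential link keeps predicted counts positive, which is the main reason plain linear regression is a poor fit for count targets.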
17. Piecewise Linear Regression
● Polynomial regression won’t work perfectly, as it has a high tendency to overfit or underfit.
● Instead, splitting the curve into separate linear pieces and building a linear model for each piece leads to better results.
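The split-and-fit idea can be sketched as follows, assuming the breakpoint (knot) is known in advance (the data below are made up, with a knot at x = 5):

```python
def linfit(pts):
    """Ordinary least-squares fit y = a + b*x for a list of (x, y)."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def piecewise_fit(pts, knot):
    """Split the data at a known breakpoint and fit one line per piece."""
    left = [p for p in pts if p[0] <= knot]
    right = [p for p in pts if p[0] > knot]
    return linfit(left), linfit(right)

# Made-up data: slope 1 up to x = 5, slope 3 afterwards.
pts = [(x, x if x <= 5 else 5 + 3 * (x - 5)) for x in range(11)]
(l_a, l_b), (r_a, r_b) = piecewise_fit(pts, 5)
print(l_b, r_b)  # -> 1.0 3.0
```

When the knot location is unknown, it can itself be chosen by searching over candidate breakpoints and minimizing the combined residual error.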