Distributed algorithms in machine learning follow two main paradigms: data parallelism, where the data is distributed across multiple workers, and model parallelism, where the model parameters are partitioned across multiple workers. The main limitation of the first approach is that the model parameters must be replicated on every machine, which is problematic when there are too many parameters to fit on a single machine. The drawback of the latter approach is that the data must be replicated on each machine. Such replication limits the scalability of machine learning algorithms, since in many real-world tasks the data and model sizes grow hand in hand. In this talk, I will present Hybrid-Parallelism, a new paradigm that partitions both the data and the model parameters simultaneously, in a completely decentralized manner. As a result, each worker only needs access to a subset of the data and a subset of the parameters while performing parameter updates. Next, I will present a case study showing how to apply these ideas to reformulate Multinomial Logistic Regression to achieve Hybrid-Parallelism (DS-MLR: Doubly-Separable Multinomial Logistic Regression). Finally, I will demonstrate the versatility of DS-MLR under various regimes of data and model parallelism through an empirical study on real-world datasets.
Scaling Multinomial Logistic Regression via Hybrid Parallelism
1. Scaling Multinomial Logistic Regression via Hybrid Parallelism
Parameswaran Raman
Ph.D. Candidate, University of California, Santa Cruz
Tech Talk: Amazon
March 17, 2020
9-13. Challenges in Parameter Estimation
1 Storage limitations of Data and Model
2 Interdependence in parameter updates
3 Bulk-Synchronization is expensive
4 Synchronous communication is inefficient
Traditional distributed machine learning approaches fall short.
Hybrid-Parallel algorithms for parameter estimation!
17-21. Regularized Risk Minimization
Goals in machine learning:
We want to build a model using observed (training) data
Our model must generalize to unseen (test) data

$$\min_{\theta} \; L(\theta) \;=\; \underbrace{\lambda\, R(\theta)}_{\text{regularizer}} \;+\; \underbrace{\frac{1}{N} \sum_{i=1}^{N} \operatorname{loss}(x_i, y_i, \theta)}_{\text{empirical risk}}$$

X = {x_1, ..., x_N}, y = {y_1, ..., y_N} is the observed training data
θ are the model parameters
loss(·) quantifies the model's performance
The regularizer R(θ) avoids over-fitting (penalizes complex models)
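To make the template concrete, here is a minimal Python sketch of the objective above; the squared-L2 regularizer, the squared loss, and the toy data are illustrative choices, not from the talk:

```python
import numpy as np

def regularized_risk(theta, X, y, loss, lam):
    """L(theta) = lam * R(theta) + (1/N) * sum_i loss(x_i, y_i, theta).

    Here R(theta) is taken to be the squared L2 norm; `loss` is any
    per-example loss function.
    """
    reg = lam * np.sum(theta ** 2)                               # regularizer R(theta)
    emp = np.mean([loss(x, yi, theta) for x, yi in zip(X, y)])   # empirical risk
    return reg + emp

# Toy usage with squared loss (all sizes hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)
sq_loss = lambda x, yi, th: 0.5 * (x @ th - yi) ** 2
print(regularized_risk(np.zeros(5), X, y, sq_loss, lam=0.1))
```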
26. Bayesian Models
Gaussian Mixture Models (GMM), Latent Dirichlet Allocation (LDA)

$$\underbrace{p(\theta \mid X)}_{\text{posterior}} \;=\; \frac{\overbrace{p(X \mid \theta)}^{\text{likelihood}} \cdot \overbrace{p(\theta)}^{\text{prior}}}{\underbrace{\int p(X, \theta)\, d\theta}_{\text{marginal likelihood (model evidence)}}}$$

The prior plays the role of the regularizer R(θ); the likelihood plays the role of the empirical risk.
27-28. Focus on Matrix Parameterized Models
[Figure: the data X is an N × D matrix; the model θ is a D × K matrix.]
What if these matrices do not fit in memory?
32. Distributed Parameter Estimation
Data parallel (e.g., L-BFGS): [Figure: the data X is partitioned across workers; the model θ is replicated.]
Model parallel (e.g., LC [Gopal et al., 2013]): [Figure: the model θ is partitioned across workers; the data X is replicated.]
34-36. Distributed Parameter Estimation
Good:
Easy to implement using map-reduce
Scales as long as Data or Model fits in memory
Bad:
Either the Data or the Model is replicated on each worker.
Data Parallel: each worker requires O(N×D/P) + O(K×D) storage; the replicated O(K×D) model is the bottleneck.
Model Parallel: each worker requires O(N×D) + O(K×D/P) storage; the replicated O(N×D) data is the bottleneck.
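A back-of-the-envelope calculation makes the bottleneck visible; the problem sizes below are hypothetical (not from the talk), chosen only to show that the replicated term does not shrink as workers are added:

```python
# Hypothetical dense sizes: 1M examples, 100K features, 100K classes, 64 workers.
N, D, K, P = 10**6, 10**5, 10**5, 64
GiB = 8 / 2**30  # bytes per float64 entry, expressed in GiB

data_parallel  = (N * D / P) * GiB + (K * D) * GiB      # data shard + full model replica
model_parallel = (N * D) * GiB + (K * D / P) * GiB      # full data replica + model shard
hybrid         = (N * D / P) * GiB + (K * D / P) * GiB  # shard both

# The replicated term (K*D or N*D) stays constant no matter how large P gets.
print(f"data-parallel : {data_parallel:8.1f} GiB per worker")
print(f"model-parallel: {model_parallel:8.1f} GiB per worker")
print(f"hybrid        : {hybrid:8.1f} GiB per worker")
```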
45-46. Hybrid-Parallelism
1 One versatile method for all regimes of data and model parallelism
2 Independent parameter updates on each worker
3 Fully de-centralized and asynchronous optimization algorithms
47. How do we achieve Hybrid Parallelism in machine learning models?
49. Double-Separability
Definition: A function f in two sets of parameters θ and θ′ is doubly separable if it can be decomposed into sub-functions f_ij such that:

$$f(\theta_1, \theta_2, \dots, \theta_m, \; \theta'_1, \theta'_2, \dots, \theta'_{m'}) \;=\; \sum_{i=1}^{m} \sum_{j=1}^{m'} f_{ij}(\theta_i, \theta'_j)$$

51-52. Double-Separability
[Figure: an m × m′ grid of sub-functions f_ij.] The f_ij corresponding to the highlighted diagonal blocks can be computed independently and in parallel.
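A minimal sketch of why the diagonal blocks parallelize: give worker p row block p and, in round t, column block (p + t) mod P; the P active blocks in any round touch disjoint θ_i and θ′_j, so no coordination is needed. All names and sizes below are illustrative:

```python
import numpy as np

m = m_prime = 6
P = 3
theta  = np.arange(m, dtype=float)         # row parameters theta_i
thetap = np.arange(m_prime, dtype=float)   # column parameters theta'_j
f_ij = lambda a, b: (a - b) ** 2           # any sub-function f_ij(theta_i, theta'_j)

row_blocks = np.array_split(np.arange(m), P)
col_blocks = np.array_split(np.arange(m_prime), P)

total = 0.0
for t in range(P):        # P rounds cover all P*P blocks of the grid
    for p in range(P):    # within a round, these P blocks could run in parallel
        rows, cols = row_blocks[p], col_blocks[(p + t) % P]
        total += sum(f_ij(theta[i], thetap[j]) for i in rows for j in cols)

# Sanity check against the direct double sum over all (i, j).
direct = sum(f_ij(theta[i], thetap[j]) for i in range(m) for j in range(m_prime))
assert np.isclose(total, direct)
print(total)
```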
55. Direct Double-Separability
e.g. Matrix Factorization:

$$L(w_1, w_2, \dots, w_N, h_1, h_2, \dots, h_M) \;=\; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( X_{ij} - \langle w_i, h_j \rangle \right)^2$$

The objective function is trivially doubly-separable! [Yun et al., 2014]
57. Doubly-Separable Multinomial Logistic Regression (DS-MLR)

$$\min_{W} \; \frac{\lambda}{2} \sum_{k=1}^{K} \|w_k\|^2 \;-\; \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik}\, w_k^T x_i \;+\; \underbrace{\frac{1}{N} \sum_{i=1}^{N} \log \sum_{k=1}^{K} \exp(w_k^T x_i)}_{\text{makes model parallelism hard}}$$

Doubly-Separable form:

$$\min_{W, A} \; \sum_{i=1}^{N} \sum_{k=1}^{K} \left[ \frac{\lambda \|w_k\|^2}{2N} \;-\; \frac{y_{ik}\, w_k^T x_i}{N} \;-\; \frac{\log a_i}{NK} \;+\; \frac{\exp(w_k^T x_i + \log a_i)}{N} \;-\; \frac{1}{NK} \right]$$
59-61. Multinomial Logistic Regression (MLR)
Given:
Training data (x_i, y_i), i = 1, ..., N, with x_i ∈ R^D
Labels y_i ∈ {1, 2, ..., K}
[Figure: data matrix X (N × D) with label vector y.]
Goal:
Learn a model W
Predict labels for the test data points using W
[Figure: model matrix W (D × K).]
Assume: N, D and K are large (N >>> D >> K)
62. Multinomial Logistic Regression (MLR)
The probability that x_i belongs to class k is given by:

$$p(y_i = k \mid x_i, W) \;=\; \frac{\exp(w_k^T x_i)}{\sum_{j=1}^{K} \exp(w_j^T x_i)}$$

where W = {w_1, w_2, ..., w_K} denotes the parameters of the model.
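In practice this softmax is computed with a max-shift for numerical stability; a minimal sketch (variable names and toy sizes are mine):

```python
import numpy as np

def class_probabilities(W, x):
    """p(y = k | x, W) = exp(w_k^T x) / sum_j exp(w_j^T x), with W of shape K x D."""
    scores = W @ x            # w_k^T x for every class k
    scores -= scores.max()    # shift before exp to avoid overflow
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # K = 4 classes, D = 3 features (toy sizes)
x = rng.normal(size=3)
p = class_probabilities(W, x)
print(p, p.sum())             # probabilities, summing to 1
```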
63-64. Multinomial Logistic Regression (MLR)
The corresponding l2-regularized negative log-likelihood loss:

$$\min_{W} \; \frac{\lambda}{2} \sum_{k=1}^{K} \|w_k\|^2 \;-\; \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik}\, w_k^T x_i \;+\; \underbrace{\frac{1}{N} \sum_{i=1}^{N} \log \sum_{k=1}^{K} \exp(w_k^T x_i)}_{\text{makes model parallelism hard}}$$

where λ is the regularization hyper-parameter.
65. Reformulation into Doubly-Separable form
Log-concavity bound [Bouchard07]:

$$\log(\gamma) \;\le\; a \cdot \gamma - \log(a) - 1, \qquad \forall\, \gamma, a > 0,$$

where a is a variational parameter. This bound is tight when a = 1/γ.
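Spelling out the step the deck applies next: substituting γ = Σ_k exp(w_k^T x_i), with one variational parameter a_i per data point, upper-bounds the troublesome log-sum-exp term,

$$\log \sum_{k=1}^{K} \exp(w_k^T x_i) \;\le\; a_i \sum_{k=1}^{K} \exp(w_k^T x_i) \;-\; \log(a_i) \;-\; 1, \qquad a_i > 0,$$

and the bound is tight at a_i = 1 / Σ_k exp(w_k^T x_i). Replacing the log-sum-exp term in the MLR objective with this bound yields the relaxed objective on the next slide.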
66. Reformulating the objective of MLR

$$\min_{W, A} \; \frac{\lambda}{2} \sum_{k=1}^{K} \|w_k\|^2 \;+\; \frac{1}{N} \sum_{i=1}^{N} \left[ -\sum_{k=1}^{K} y_{ik}\, w_k^T x_i \;+\; a_i \sum_{k=1}^{K} \exp(w_k^T x_i) \;-\; \log(a_i) \;-\; 1 \right]$$

where a_i can be computed in closed form as:

$$a_i \;=\; \frac{1}{\sum_{k=1}^{K} \exp(w_k^T x_i)}$$
67. Doubly-Separable Multinomial Logistic Regression (DS-MLR)
Doubly-Separable form:

$$\min_{W, A} \; \sum_{i=1}^{N} \sum_{k=1}^{K} \left[ \frac{\lambda \|w_k\|^2}{2N} \;-\; \frac{y_{ik}\, w_k^T x_i}{N} \;-\; \frac{\log a_i}{NK} \;+\; \frac{\exp(w_k^T x_i + \log a_i)}{N} \;-\; \frac{1}{NK} \right]$$
68-70. Doubly-Separable Multinomial Logistic Regression (DS-MLR)
Stochastic Gradient Updates
Each term in the stochastic update depends only on data point i and class k:

$$w_k^{t+1} \;\leftarrow\; w_k^t - \eta_t K \left( \lambda x_i - y_{ik} x_i + \exp(w_k^T x_i + \log a_i)\, x_i \right)$$

$$\log a_i^{t+1} \;\leftarrow\; \log a_i^t - \eta_t K \left( \exp(w_k^T x_i + \log a_i) - \frac{1}{K} \right)$$
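A minimal Python sketch of one such stochastic step for a sampled pair (i, k), transcribing the two update equations above; the sampling scheme, step size, and toy sizes are illustrative:

```python
import numpy as np

def dsmlr_update(w_k, log_a_i, x_i, y_ik, lam, eta, K):
    """One stochastic step of DS-MLR for a sampled (data point i, class k) pair."""
    e = np.exp(w_k @ x_i + log_a_i)
    grad_w = lam * x_i - y_ik * x_i + e * x_i   # stochastic gradient w.r.t. w_k
    grad_a = e - 1.0 / K                        # stochastic gradient w.r.t. log a_i
    return w_k - eta * K * grad_w, log_a_i - eta * K * grad_a

# Toy usage (hypothetical sizes).
rng = np.random.default_rng(0)
D, K = 5, 3
w_k, log_a_i = np.zeros(D), 0.0
x_i = rng.normal(size=D)
y_ik = 1.0                                      # label indicator: 1 iff y_i = k
w_k, log_a_i = dsmlr_update(w_k, log_a_i, x_i, y_ik, lam=0.01, eta=0.1, K=K)
print(w_k, log_a_i)
```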
71. Access Pattern of updates: Stoch w_k, Stoch a_i
[Figure: partitioned X, W, A. (a) Updating w_k only requires computing a_i. (b) Updating a_i only requires accessing w_k and x_i.]
72. Updating a_i: Closed form instead of Stoch update
Closed-form update for a_i:

$$a_i \;=\; \frac{1}{\sum_{k=1}^{K} \exp(w_k^T x_i)}$$
73. Access Pattern of updates: Stoch w_k, Exact a_i
[Figure: partitioned X, W, A. (a) Updating w_k only requires computing a_i. (b) Updating a_i requires accessing the entire W. Synchronization bottleneck!]
74-76. Updating a_i: Avoiding bulk-synchronization
Closed-form update for a_i:

$$a_i \;=\; \frac{1}{\sum_{k=1}^{K} \exp(w_k^T x_i)}$$

Each worker computes a partial sum using the w_k it owns.
With P workers, the global sum is available after P rounds.
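One way to picture this is a ring of P workers, each owning a block of classes: a worker adds its local partial sum over its own w_k and passes the accumulator on, so after P hops the full denominator (hence a_i) is known without a global barrier. A toy simulation of that interpretation (sequential here; in the real system the hops would be asynchronous messages):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, P = 12, 4, 3
W = rng.normal(size=(K, D))
x_i = rng.normal(size=D)
class_blocks = np.array_split(np.arange(K), P)   # worker p owns the classes in block p

acc = 0.0
for p in range(P):                               # P rounds: pass the partial sum around
    acc += np.exp(W[class_blocks[p]] @ x_i).sum()
a_i = 1.0 / acc

assert np.isclose(a_i, 1.0 / np.exp(W @ x_i).sum())   # matches the closed form
print(a_i)
```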
77-79. Parallelization: Synchronous DSGD [Gemulla et al., 2011]
X and the local parameters A are partitioned horizontally (1, ..., N)
The global model parameters W are partitioned vertically (1, ..., K)
P = 4 workers work on mutually-exclusive blocks of A and W
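A sketch of a DSGD-style schedule over the P × P block grid: in sub-epoch t, worker p processes block (p, (p + t) mod P), so the P active blocks never share a row block of A or a column block of W. Block processing is stubbed out; the scheduling invariant is the point:

```python
P = 4
for t in range(P):   # one epoch = P synchronous sub-epochs
    active = [(p, (p + t) % P) for p in range(P)]
    # Within a sub-epoch the P blocks share no A-rows and no W-columns,
    # so the workers can update their blocks in parallel without conflicts.
    assert len({r for r, _ in active}) == P and len({c for _, c in active}) == P
    print(f"sub-epoch {t}: worker p -> (A-block, W-block) pairs {active}")
```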
80-83. Parallelization: Asynchronous NOMAD [Yun et al., 2014]
[Figure series (four slides): snapshots of the asynchronous NOMAD scheme over the partitioned matrices A and W.]
93-97. Conclusion and Key Takeaways
Data and Model grow hand in hand.
Challenges in Parameter Estimation.
I have developed:
Hybrid-Parallel formulations
Distributed, Asynchronous Algorithms
Applied them to several machine learning tasks:
Classification (e.g. Multinomial Logistic Regression)
Clustering (e.g. Mixture Models)
Ranking