The document discusses distributed linear classification on Apache Spark. It describes using Spark to train logistic regression and linear support vector machine models on large datasets. Spark improves on MapReduce by conducting communications in-memory and supporting fault tolerance. The paper proposes using a trust region Newton method to optimize the objective functions for logistic regression and linear SVM. Conjugate gradient is used to approximate the Hessian matrix and solve the Newton system without explicitly storing the large Hessian.
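The reason the Hessian never needs to be stored is that conjugate gradient only touches it through Hessian-vector products, which for logistic regression cost one pass over the data. A single-machine sketch of that idea (plain NumPy; illustrative, not the paper's implementation):

```python
import numpy as np

def hessian_vector_product(X, w, v, reg=1.0):
    """Compute H v for the L2-regularized logistic loss without forming H.
    H = X^T D X + reg*I, where D is diagonal with entries sigma*(1-sigma)."""
    sigma = 1.0 / (1.0 + np.exp(-X @ w))
    d = sigma * (1.0 - sigma)          # per-example curvature
    return X.T @ (d * (X @ v)) + reg * v

def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    """Solve H s = b using only Hessian-vector products."""
    s = np.zeros_like(b)
    r = b.copy()                       # residual b - H s (s starts at 0)
    p = r.copy()
    rr = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        alpha = rr / (p @ Hp)
        s += alpha * p
        r -= alpha * Hp
        rr_new = r @ r
        if rr_new < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return s
```

In a distributed setting, each Hessian-vector product is itself a sum over data partitions, which is what makes the approach fit Spark's aggregation model.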
2014-06-20 Multinomial Logistic Regression with Apache SparkDB Tsai
Logistic regression can be used to model not only binary outcomes but, with some extension, multinomial outcomes as well. In this talk, DB will walk through the basic idea of binary logistic regression step by step and then extend it to the multinomial case. He will show how easy it is with Spark to parallelize this iterative algorithm by utilizing the in-memory RDD cache to scale horizontally (in the number of training examples). However, there is a mathematical limitation on scaling vertically (in the number of training features), while many recent applications in document classification and computational linguistics are of this type. He will talk about how to address this problem with an L-BFGS optimizer instead of a Newton optimizer.
Bio:
DB Tsai is a machine learning engineer at Alpine Data Labs. He has recently been working with the Spark MLlib team to add support for the L-BFGS optimizer and multinomial logistic regression upstream. He also led Apache Spark development at Alpine Data Labs. Before joining Alpine Data Labs, he worked on large-scale optimization of optical quantum circuits at Stanford as a PhD student.
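The horizontal scaling described in the talk comes from the fact that the logistic-loss gradient is a sum over examples, so each partition can compute its share independently and the driver only sums small vectors. A minimal single-machine sketch of that pattern (plain NumPy, with a list of partitions standing in for RDD partitions; the names are illustrative, not Spark's API):

```python
import numpy as np

def partial_gradient(X_part, y_part, w):
    """Gradient of the binary logistic loss over one data partition."""
    p = 1.0 / (1.0 + np.exp(-X_part @ w))
    return X_part.T @ (p - y_part)

def distributed_gradient(partitions, w):
    """Sum per-partition gradients, as a treeAggregate over partitions would."""
    return sum(partial_gradient(X, y, w) for X, y in partitions)

def gradient_descent(partitions, dim, lr=0.5, iters=200):
    """Full-batch gradient descent driving the per-partition computation."""
    n = sum(len(y) for _, y in partitions)
    w = np.zeros(dim)
    for _ in range(iters):
        w -= lr * distributed_gradient(partitions, w) / n
    return w
```

The vertical limitation the talk mentions shows up here too: the per-partition result is a dense vector of the feature dimension, which is what L-BFGS keeps small relative to Newton's Hessian.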
Support Vector Machines in MapReduce presented an overview of support vector machines (SVMs) and how to implement them in a MapReduce framework to handle large datasets. The document discussed the theory behind basic linear SVMs and generalized multi-classification SVMs. It explained how to parallelize SVM training using stochastic gradient descent and randomly distributing samples across mappers and reducers. The document also addressed handling non-linear SVMs using kernel methods and approximations that allow SVMs to be treated as a linear problem in MapReduce. Finally, examples were given of large companies using SVMs trained on MapReduce to perform customer segmentation and improve inventory value.
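The SGD parallelization the document describes rests on a cheap per-sample update of the hinge loss, which is what lets samples be distributed randomly across workers. A Pegasos-style sketch of that update (plain NumPy, single machine; a stand-in for the mapper-side computation, not the document's actual code):

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on the hinge loss; labels y must be in {-1, +1}.
    Each sample's update is independent, so samples could equally be
    streamed through different mappers."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (X[i] @ w)
            w *= (1.0 - eta * lam)         # shrink from the L2 regularizer
            if margin < 1:                 # hinge active: add subgradient
                w += eta * y[i] * X[i]
    return w
```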
Scaling out logistic regression with SparkBarak Gitsis
This document discusses scaling out logistic regression with Apache Spark. It describes the need to classify a large number of websites using machine learning. Several approaches to logistic regression were tried, from a single-machine Java implementation to Spark for better scalability. Spark's L-BFGS algorithm was chosen as an out-of-the-box distributed logistic regression solution. Challenges of implementing logistic regression at large scale are discussed, such as overfitting and regularization. Methods used to address these challenges include L2 regularization, cross-validation to select the regularization parameter, and extensions made to Spark's L-BFGS implementation.
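The selection loop described above, fitting with L2 regularization and picking the strength by held-out performance, can be illustrated in a few lines. A minimal sketch (plain NumPy with hypothetical helper names; Spark's actual API differs):

```python
import numpy as np

def fit_logreg_l2(X, y, lam, lr=0.5, iters=300):
    """Gradient descent on the L2-regularized logistic loss (toy solver)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / len(y) + lam * w
        w -= lr * grad
    return w

def log_loss(X, y, w):
    """Average negative log-likelihood, clipped for numerical safety."""
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def select_lambda(X_tr, y_tr, X_val, y_val, grid=(0.0, 0.01, 0.1, 1.0)):
    """Pick the regularization strength with the lowest held-out log-loss."""
    return min(grid, key=lambda lam: log_loss(X_val, y_val,
                                              fit_logreg_l2(X_tr, y_tr, lam)))
```

Full k-fold cross-validation repeats this over several splits and averages the validation losses; the single split here keeps the sketch short.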
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16MLconf
The document discusses new techniques for improving the k-means clustering algorithm. It begins by describing the standard k-means algorithm and Lloyd's method. It then discusses issues with random initialization for k-means. It proposes using furthest point initialization (k-means++) as an improvement. The document also discusses parallelizing k-means initialization (k-means||) and using nearest neighbor data structures to speed up assigning points to clusters, which allows k-means to scale to many clusters. Experimental results show these techniques provide faster and higher quality clustering compared to standard k-means.
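The k-means++ seeding mentioned above is short enough to sketch: each new center is sampled with probability proportional to its squared distance from the nearest already-chosen center, which spreads the initial centers out (plain NumPy sketch, not the talk's code; k-means|| parallelizes this by oversampling in rounds):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding: sample each new center with probability
    proportional to squared distance from the nearest chosen center."""
    if rng is None:
        rng = np.random.default_rng(0)
    centers = [X[rng.integers(len(X))]]          # first center: uniform
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```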
Vowpal Wabbit is a machine learning system that has four main goals: scalable and efficient machine learning, supporting new algorithm research, simplicity with few dependencies, and usability with minimal setup requirements. It uses several "tricks" like feature hashing and caching, online learning, and importance weighting to achieve scalability. It also supports newer algorithms like adaptive learning rates and dimensional correction. Vowpal Wabbit can be run in parallel on large clusters to handle terascale problems with billions of examples.
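Feature hashing, the first of the "tricks" listed, replaces the feature dictionary with a hash function, so the weight vector has a fixed size and no vocabulary needs to be built or shipped. A generic sketch of the idea (not Vowpal Wabbit's actual hash function or code):

```python
import hashlib

def hash_features(tokens, dim=2 ** 10):
    """The hashing trick: map raw tokens straight to indices in a
    fixed-size feature vector via a hash, with a hash-derived sign
    to reduce the bias from collisions."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        idx = h % dim
        sign = 1.0 if (h >> 64) % 2 == 0 else -1.0
        vec[idx] += sign
    return vec
```

Collisions merge features, but with a large enough `dim` the effect on a linear model is small, which is what makes terascale feature spaces tractable.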
study Streaming Multigrid For Gradient Domain Operations On Large ImagesChiamin Hsu
The document describes a streaming multigrid solver for solving Poisson's equation on large images. It develops a multigrid method using a B-spline finite element basis that can efficiently process images in a streaming fashion using only a small window of image rows in memory at a time. The method achieves accurate solutions to Poisson's equation on gigapixel images in only 2 V-cycles by leveraging the temporal locality of the multigrid algorithm.
Gradient descent optimization with simple examples. Covers SGD, mini-batch, momentum, AdaGrad, RMSProp, and Adam.
Made for people with little knowledge of neural networks.
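Of the optimizers covered, Adam combines two of the others: momentum's running average of the gradient and RMSProp's per-coordinate scaling by the gradient's running second moment. A self-contained sketch on a toy quadratic (illustrative only):

```python
import numpy as np

def adam(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=400):
    """Adam: momentum on the gradient plus per-coordinate RMS scaling."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)   # first-moment (mean) estimate
    v = np.zeros_like(x)   # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Minimize f(x, y) = x^2 + 10*y^2 from a far-off start.
result = adam(lambda p: np.array([2 * p[0], 20 * p[1]]), [5.0, 5.0])
```

The per-coordinate scaling is why Adam handles the badly conditioned `y` direction here without a hand-tuned learning rate per axis.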
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16MLconf
Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as it requires training and cross-validating multiple base learning algorithms.
We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called “H2O Ensemble”, built on top of the open source, distributed machine learning platform, H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forest, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time — Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.
Graphs are the natural data structure for representing relations. Graph algorithms show irregular memory access patterns, which causes distributed-memory parallel graph algorithms to do more communication than computation. The more work an algorithm generates, the more communication it needs to do. The amount of work can be reduced with frequent synchronization; however, the overhead of frequent synchronization reduces the performance of distributed-memory parallel graph algorithms. The Abstract Graph Machine (AGM) is a model that can control both the amount of synchronization and the amount of work generated by an algorithm.
This document discusses GPU computing and CUDA programming. It begins with an introduction to GPU computing and CUDA. CUDA (Compute Unified Device Architecture) allows programming of Nvidia GPUs for parallel computing. The document then provides examples of optimizing matrix multiplication and closest pair problems using CUDA. It also discusses implementing and optimizing convolutional neural networks (CNNs) and autoencoders for GPUs using CUDA. Performance results show speedups for these deep learning algorithms when using GPUs versus CPU-only implementations.
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Simplilearn
This presentation about backpropagation and gradient descent covers the basics of how backpropagation and gradient descent play a role in training neural networks, using an example of recognizing handwritten digits with a neural network. After predicting the results, you will see how to train the network using backpropagation to obtain results with high accuracy. Backpropagation is the process of updating the parameters of a network to reduce the error in prediction. You will also understand how to calculate the loss function to measure the error in the model. Finally, you will see, with the help of a graph, how to find the minimum of a function using gradient descent. Now, let's get started with learning backpropagation and gradient descent in neural networks.
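The chain-rule bookkeeping that backpropagation performs can be shown on a two-weight network, and the gradients it returns can be checked against finite differences (a toy sketch, unrelated to the presentation's digit example):

```python
import math

def forward(w1, w2, x, y):
    """Tiny network: h = tanh(w1*x), pred = w2*h, squared-error loss."""
    h = math.tanh(w1 * x)
    pred = w2 * h
    loss = (pred - y) ** 2
    return h, pred, loss

def backward(w1, w2, x, y):
    """Backpropagation: apply the chain rule from the loss back to
    each weight, reusing the forward pass's intermediate values."""
    h, pred, _ = forward(w1, w2, x, y)
    dloss_dpred = 2 * (pred - y)
    dloss_dw2 = dloss_dpred * h
    dloss_dh = dloss_dpred * w2
    dloss_dw1 = dloss_dh * (1 - h * h) * x   # tanh'(z) = 1 - tanh(z)^2
    return dloss_dw1, dloss_dw2
```

Checking these analytic gradients against `(f(w + eps) - f(w - eps)) / (2 * eps)` is the standard sanity test for any backpropagation implementation.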
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this TensorFlow course, you'll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks, and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
1. Understand the concepts of TensorFlow, its main functions, operations and the execution pipeline
2. Implement deep learning algorithms, understand neural networks and traverse the layers of data abstraction which will empower you to understand data like never before
3. Master and comprehend advanced topics such as convolutional neural networks, recurrent neural networks, training deep networks and high-level interfaces
4. Build deep learning models in TensorFlow and interpret the results
5. Understand the language and fundamental concepts of artificial neural networks
6. Troubleshoot and improve deep learning models
7. Build your own deep learning project
8. Differentiate between machine learning, deep learning, and artificial intelligence
Learn more at https://www.simplilearn.com/deep-learning-course-with-tensorflow-training
The document describes Threp, a lightweight remapping framework for use in Earth system models. Threp aims to provide a flexible, readable, and efficient framework for remapping data between different grid types, including regular, rectilinear, curvilinear, and unstructured grids. It supports operations like interpolation, masking, and extrapolation between source and destination grids. Threp uses a two-stage process of first generating interpolation weights and then applying those weights to remap data values. It is designed for parallel computation and to be easily extensible to support new interpolation methods and grid types.
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...Jumlesha Shaik
Abstract: In this paper a digital image coding technique called ML-SRHWT (Machine Learning based image coding by Superlative Rapid HAAR Wavelet Transform) is introduced. Compression of the digital image is done using the Superlative Rapid HAAR Wavelet Transform (SRHWT) model. Least Squares Support Vector Machine regression predicts the hyper coefficients obtained using the QPSO model. The mathematical models discussed briefly in this paper are SRHWT, which gives good performance and reduces complexity compared to FHAAR, and EQPSO, which replaces the least good particle with the new best obtained particle in QPSO. On comparing ML-SRHWT with the JPEG and JPEG2000 standards, the former is found to be better.
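For reference, the fast Haar transform that SRHWT builds on reduces, at each level, to pairwise averages (low-pass) and pairwise differences (high-pass). A generic one-level sketch with its inverse (not the paper's SRHWT variant):

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_1d(signal):
    """One level of the orthonormal Haar transform: pairwise scaled
    sums (approximation) and pairwise scaled differences (detail)."""
    s = np.asarray(signal, dtype=float)      # length must be even
    avg = (s[0::2] + s[1::2]) / SQRT2
    diff = (s[0::2] - s[1::2]) / SQRT2
    return avg, diff

def haar_1d_inverse(avg, diff):
    """Exact reconstruction from one level of coefficients."""
    out = np.empty(2 * len(avg))
    out[0::2] = (avg + diff) / SQRT2
    out[1::2] = (avg - diff) / SQRT2
    return out
```

Image codecs apply this recursively to the approximation band (and along both axes for 2-D), then quantize the detail coefficients, which is where the compression comes from.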
Stochastic gradient descent and its tuningArsalan Qadri
This paper talks about optimization algorithms used for big data applications. We start by explaining the gradient descent algorithm and its limitations. Later we delve into stochastic gradient descent algorithms and explore methods to improve them by adjusting learning rates.
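The learning-rate adjustment the paper studies can be illustrated with the classic 1/(1 + decay*t) schedule: large early steps for fast progress, shrinking steps so the iterates settle despite gradient noise (a generic sketch, not the paper's code):

```python
import numpy as np

def sgd(grad_i, n, dim, lr0=0.05, decay=0.01, epochs=40, seed=0):
    """Plain SGD over per-sample gradients with a decaying learning
    rate lr0 / (1 + decay * t), where t counts individual updates."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):     # shuffle each epoch
            lr = lr0 / (1.0 + decay * t)
            w -= lr * grad_i(w, i)
            t += 1
    return w
```

With a constant rate the iterates keep bouncing in a noise ball around the optimum; the decay trades later progress for a shrinking ball.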
PyTorch is an open-source machine learning library for Python. It is primarily developed by Facebook's AI research group. The document discusses setting up PyTorch, including installing necessary packages and configuring development environments. It also provides examples of core PyTorch concepts like tensors, common datasets, and constructing basic neural networks.
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...Hansol Kang
* Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
GAN, Explained Simply (What is this? Gum? It's GAN.)Hansol Kang
The document discusses generative adversarial networks (GANs). It begins with an introduction to GANs, describing their concept and training process. It then reviews a seminal GAN paper, discussing its mathematical formulation of GAN training as a minimax game and theoretical results showing global optimality can be achieved. The document concludes by outlining the configuration, implementation, and flowchart for a GAN experiment.
An improved spfa algorithm for single source shortest path problem using forw...IJMIT JOURNAL
We present an improved SPFA algorithm for the single-source shortest path problem. For a random graph, the empirical average time complexity is O(|E|), where |E| is the number of edges of the input network. SPFA maintains a queue of candidate vertices and adds a vertex to the queue only if that vertex is relaxed. In the improved SPFA, the MinPoP principle is employed to improve the quality of the queue. We theoretically analyse the advantage of this new algorithm and experimentally demonstrate that it is efficient.
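The baseline SPFA that the paper improves on follows directly from the description above: it is Bellman-Ford with a queue, where a vertex re-enters the queue only when its distance has just been relaxed (a generic implementation without the MinPoP refinement, which the abstract does not fully specify):

```python
from collections import deque

def spfa(graph, source, n):
    """SPFA (queue-based Bellman-Ford) over a graph given as
    {u: [(v, weight), ...]}; returns shortest distances from source."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    in_queue = [False] * n
    q = deque([source])
    in_queue[source] = True
    while q:
        u = q.popleft()
        in_queue[u] = False
        for v, w in graph.get(u, []):
            if dist[u] + w < dist[v]:    # relaxation succeeded
                dist[v] = dist[u] + w
                if not in_queue[v]:      # enqueue v only once at a time
                    q.append(v)
                    in_queue[v] = True
    return dist
```

Like Bellman-Ford, this handles negative edge weights (though not negative cycles); the O(|E|) behavior quoted above is empirical for random graphs, with a worst case of O(|V||E|).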
Storage, manipulation and analysis of raster data with PostGIS RasterACSG Section Montréal
One of the most important new features of the open-source spatial database PostgreSQL/PostGIS 2.0 is support for raster data. PostGIS Raster includes an import tool similar to shp2pgsql based on GDAL and a set of SQL operators for manipulating and analyzing raster data. The new RASTER type is georeferenced, multi-resolution, and multi-band, and supports a nodata value and a pixel value type per band. PostGIS Raster draws on the simplicity of the vector experience offered by PostGIS to make all raster operations as simple as possible. As with a vector coverage, a raster coverage is divided into a set of records (one row = one tile) stored in a single table (unlike Oracle Spatial, which uses two types and therefore two or more tables). A complete coverage can be imported and retiled in a single command with the import tool, and multiple resolutions of the same coverage can be imported into adjacent tables. The properties of raster objects and of each band can be queried and modified, as can pixel values. Functions exist to obtain the minimum, maximum, sum, mean, standard deviation, and histogram of a tile or of a complete coverage. The ST_Intersection() and ST_Intersects() functions work almost transparently between raster and vector data, and a set of map algebra functions (ST_MapAlgebra()) enables raster-style analysis. Bands can be reclassified and converted to any GDAL-writable format. Functions for generating rasters and bands also exist for PL/pgSQL development. A GDAL driver for converting raster coverages to image files is under development, and plugins for QGIS and gvSIG already exist for viewing them.
Chap 8. Optimization for training deep modelsYoung-Geun Choi
Internal lab seminar material. Summarizes and excerpts Chapter 8 of Goodfellow et al. (2016), Deep Learning, MIT Press. Introduces the methods commonly used to optimize objective functions when training deep neural network models.
The document summarizes a deep learning programming course for artificial intelligence. The course covers topics like machine learning, deep learning, convolutional neural networks, recurrent neural networks, and applications of deep learning in medicine. It provides an overview of each week's topics, including an introduction to AI and machine learning in week 3, deep learning in week 4, and applications of AI in medicine in week 5.
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengSpark Summit
Generalized linear models (GLMs) are a class of models that include linear regression, logistic regression, and other forms. GLMs are implemented in both MLlib and SparkR in Spark. They support various solvers like gradient descent, L-BFGS, and iteratively re-weighted least squares. Performance is optimized through techniques like sparsity, tree aggregation, and avoiding unnecessary data copies. Future work includes better handling of categoricals, more model statistics, and model parallelism.
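Of the solvers listed, iteratively re-weighted least squares is the least familiar: each Newton step for a logistic GLM is exactly a weighted least-squares solve against a "working response". A dense single-machine sketch (plain NumPy; not MLlib's or SparkR's implementation, which distributes the normal-equation sums):

```python
import numpy as np

def irls_logistic(X, y, iters=25):
    """Fit a logistic GLM by IRLS: at each step, solve the weighted
    least-squares problem that the Newton step reduces to."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        W = p * (1.0 - p)                            # per-example weights
        z = X @ w + (y - p) / np.maximum(W, 1e-10)   # working response
        A = X.T @ (X * W[:, None])                   # weighted normal matrix
        w = np.linalg.solve(A + 1e-8 * np.eye(X.shape[1]),
                            X.T @ (W * z))
    return w
```

This is why IRLS suits moderate feature counts: the d-by-d normal matrix is cheap to aggregate across partitions but must be solved on the driver, whereas L-BFGS avoids forming it.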
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...DB Tsai
Nonlinear methods are widely used to produce higher performance compared with linear methods; however, nonlinear methods are generally more expensive in model size, training time, and scoring phase. With proper feature engineering techniques like polynomial expansion, the linear methods can be as competitive as those nonlinear methods. In the process of mapping the data to higher dimensional space, the linear methods will be subject to overfitting and instability of coefficients which can be addressed by penalization methods including Lasso and Elastic-Net. Finally, we'll show how to train linear models with Elastic-Net regularization using MLlib.
Several learning algorithms such as kernel methods, decision trees, and random forests are nonlinear approaches which are widely used to obtain better performance than linear methods. However, with feature engineering techniques like polynomial expansion, which maps the data into a higher-dimensional space, the performance of linear methods can be as competitive as those nonlinear methods. As a result, linear methods remain very useful, given that their training time is significantly faster than the nonlinear ones and the model is just a small vector, which makes the prediction step very efficient and easy. However, by mapping the data into a higher-dimensional space, those linear methods become subject to overfitting and instability of coefficients, and those issues can be successfully addressed by penalization methods including Lasso and Elastic-Net. The Lasso method with an L1 penalty tends to shrink many coefficients exactly to zero and leave a few other coefficients with comparatively little shrinkage. An L2 penalty tends to result in all small but non-zero coefficients. Combining the L1 and L2 penalties is called the Elastic-Net method, which tends to give a result in between. In the first part of the talk, we'll give an overview of linear methods, including commonly used formulations and optimization techniques such as L-BFGS and OWLQN. In the second part of the talk, we will talk about how to train linear models with Elastic-Net using our recent contribution to Spark MLlib. We'll also talk about how linear models are practically applied with big datasets, and how polynomial expansion can be used to dramatically increase performance.
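The Elastic-Net penalty described above has a closed-form proximal operator: soft-thresholding for the L1 part (which produces the exact zeros) followed by a uniform shrink for the L2 part. A proximal-gradient sketch for linear regression (illustrative only; MLlib's solvers use OWLQN/L-BFGS-style methods instead):

```python
import numpy as np

def soft_threshold(x, t):
    """Lasso proximal operator: shrink toward zero by t, clipping at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def elastic_net_prox(x, alpha, l1_ratio, lr):
    """Proximal step for alpha * (l1_ratio * ||w||_1
    + 0.5 * (1 - l1_ratio) * ||w||_2^2)."""
    x = soft_threshold(x, lr * alpha * l1_ratio)        # L1: sparsity
    return x / (1.0 + lr * alpha * (1.0 - l1_ratio))    # L2: uniform shrink

def prox_gd_linreg(X, y, alpha=0.1, l1_ratio=0.5, lr=0.05, iters=1000):
    """Proximal gradient descent for Elastic-Net linear regression:
    a gradient step on the squared loss, then the penalty's prox."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / len(y)
        w = elastic_net_prox(w - lr * grad, alpha, l1_ratio, lr)
    return w
```

With `l1_ratio=1.0` this is plain Lasso and noise features are driven exactly to zero; with `l1_ratio=0.0` it is ridge and every coefficient merely shrinks, matching the behavior contrasted in the abstract.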
DB Tsai is an Apache Spark committer and a Senior Research Engineer at Netflix. He is recently working with Apache Spark community to add several new algorithms including Linear Regression and Binary Logistic Regression with ElasticNet (L1/L2) regularization, Multinomial Logistic Regression, and LBFGS optimizer. Prior to joining Netflix, DB was a Lead Machine Learning Engineer at Alpine Data Labs, where he developed innovative large-scale distributed linear algorithms, and then contributed back to open source Apache Spark project.
Gradient descent optimization with simple examples. covers sgd, mini-batch, momentum, adagrad, rmsprop and adam.
Made for people with little knowledge of neural network.
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16MLconf
Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as it requires training and cross-validating multiple base learning algorithms.
We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called “H2O Ensemble”, built on top of the open source, distributed machine learning platform, H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forest, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time — Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.
Graphs are the natural data structure to represent relations. Graph algorithms show irregular memory access pattern. This causes, distributed-memory parallel graph algorithms to do more communication than computation. When an algorithm generates more work the more communication they need to do. The amount of work can be reduced with frequent synchronization. However, the overhead of frequent synchronization reduces the performance of distributed-memory parallel graph algorithms. Abstract Graph Machine (AGM) is a model that can control the amount of synchronization and the amount of work generated by an algorithm,
This document discusses GPU computing and CUDA programming. It begins with an introduction to GPU computing and CUDA. CUDA (Compute Unified Device Architecture) allows programming of Nvidia GPUs for parallel computing. The document then provides examples of optimizing matrix multiplication and closest pair problems using CUDA. It also discusses implementing and optimizing convolutional neural networks (CNNs) and autoencoders for GPUs using CUDA. Performance results show speedups for these deep learning algorithms when using GPUs versus CPU-only implementations.
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Simplilearn
This presentation about backpropagation and gradient descent will cover the basics of how backpropagation and gradient descent plays a role in training neural networks - using an example on how to recognize the handwritten digits using a neural network. After predicting the results, you will see how to train the network using backpropagation to obtain the results with high accuracy. Backpropagation is the process of updating the parameters of a network to reduce the error in prediction. You will also understand how to calculate the loss function to measure the error in the model. Finally, you will see with the help of a graph, how to find the minimum of a function using gradient descent. Now, let’s get started with learning backpropagation and gradient descent in neural networks.
Why Deep Learning?
It is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this Tensorflow course, you’ll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
1. Understand the concepts of TensorFlow, its main functions, operations and the execution pipeline
2. Implement deep learning algorithms, understand neural networks and traverse the layers of data abstraction which will empower you to understand data like never before
3. Master and comprehend advanced topics such as convolutional neural networks, recurrent neural networks, training deep networks and high-level interfaces
4. Build deep learning models in TensorFlow and interpret the results
5. Understand the language and fundamental concepts of artificial neural networks
6. Troubleshoot and improve deep learning models
7. Build your own deep learning project
8. Differentiate between machine learning, deep learning, and artificial intelligence
Learn more at https://www.simplilearn.com/deep-learning-course-with-tensorflow-training
The document describes Threp, a lightweight remapping framework for use in Earth system models. Threp aims to provide a flexible, readable, and efficient framework for remapping data between different grid types, including regular, rectilinear, curvilinear, and unstructured grids. It supports operations like interpolation, masking, and extrapolation between source and destination grids. Threp uses a two-stage process of first generating interpolation weights and then applying those weights to remap data values. It is designed for parallel computation and to be easily extensible to support new interpolation methods and grid types.
Ml srhwt-machine-learning-based-superlative-rapid-haar-wavelet-transformation...Jumlesha Shaik
Abstract: In this paper a digital image coding technique called ML-SRHWT (Machine Learning based image coding by Superlative Rapid HAAR Wavelet Transform) has been introduced. Compression of the digital image is done using the Superlative Rapid HAAR Wavelet Transform (SRHWT) model. Least Squares Support Vector Machine regression predicts the hyper coefficients obtained by using the QPSO model. The mathematical models discussed in brief in this paper are SRHWT, which gives good performance and reduces complexity compared to FHAAR, and EQPSO, which replaces the least good particle with the newly obtained best particle in QPSO. On comparing ML-SRHWT with the JPEG and JPEG2000 standards, the former performs better.
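The fast HAAR transform in the abstract cannot be reproduced here, but the basic single-level 1-D Haar step it builds on (pairwise averages and differences) can be sketched in Python; the function names and the unnormalized convention are assumptions made for this illustration:

```python
def haar_1d(signal):
    """One level of the (unnormalized) 1-D Haar transform:
    pairwise averages (approximation) and differences (detail)."""
    assert len(signal) % 2 == 0, "length must be even"
    avg = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    det = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return avg, det

def inverse_haar_1d(avg, det):
    """Reconstruct the signal exactly from averages and differences."""
    out = []
    for s, d in zip(avg, det):
        out.extend([s + d, s - d])
    return out

avg, det = haar_1d([9, 7, 3, 5])  # avg=[8.0, 4.0], det=[1.0, -1.0]
```

Compression comes from the detail coefficients being small or zero on smooth regions, so they quantize cheaply while the inverse step still reconstructs the signal.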
Stochastic gradient descent and its tuningArsalan Qadri
This paper talks about optimization algorithms used for big data applications. We start by explaining the gradient descent algorithm and its limitations. Later we delve into stochastic gradient descent algorithms and explore methods to improve them by adjusting learning rates.
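As a hedged sketch of the learning-rate adjustment the paper discusses: stochastic gradient descent samples one point per step and decays the step size over time. The names and the 1/(1 + decay·t) schedule below are illustrative choices, not the paper's:

```python
import random

def sgd_linear_fit(points, lr0=0.1, decay=0.01, steps=200, seed=0):
    """Fit y = w*x by stochastic gradient descent with a decaying
    learning rate lr_t = lr0 / (1 + decay * t)."""
    rng = random.Random(seed)
    w = 0.0
    for t in range(steps):
        x, y = rng.choice(points)      # one random sample per step
        grad = 2 * (w * x - y) * x     # gradient of the squared error (w*x - y)^2
        w -= lr0 / (1 + decay * t) * grad
        return_lr = None               # (decay keeps late steps small and stable)
    return w

# points lying exactly on y = 2x should recover w close to 2
w = sgd_linear_fit([(1, 2), (2, 4), (3, 6)])
```

The decay term is what "tuning" buys you: a large initial rate for fast progress, shrinking later so the noisy single-sample gradients stop bouncing the iterate around the minimum.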
PyTorch is an open-source machine learning library for Python. It is primarily developed by Facebook's AI research group. The document discusses setting up PyTorch, including installing necessary packages and configuring development environments. It also provides examples of core PyTorch concepts like tensors, common datasets, and constructing basic neural networks.
Photo-realistic Single Image Super-resolution using a Generative Adversarial ...Hansol Kang
* Ledig, Christian, et al. "Photo-realistic single image super-resolution using a generative adversarial network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
GAN Explained Simply (What is this? Gum? It's GAN.)Hansol Kang
The document discusses generative adversarial networks (GANs). It begins with an introduction to GANs, describing their concept and training process. It then reviews a seminal GAN paper, discussing its mathematical formulation of GAN training as a minimax game and theoretical results showing global optimality can be achieved. The document concludes by outlining the configuration, implementation, and flowchart for a GAN experiment.
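The minimax game the review mentions is, in the standard notation of Goodfellow et al.'s original GAN paper:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

where $D$ is the discriminator, $G$ the generator, and $p_z$ the prior on the generator's input noise; the global optimum is reached when the generator's distribution equals $p_{\mathrm{data}}$.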
An improved spfa algorithm for single source shortest path problem using forw...IJMIT JOURNAL
We present an improved SPFA algorithm for the single-source shortest path problem. For a random graph, the empirical average time complexity is O(|E|), where |E| is the number of edges of the input network. SPFA maintains a queue of candidate vertices and adds a vertex to the queue only if that vertex is relaxed. In the improved SPFA, the MinPoP principle is employed to improve the quality of the queue. We theoretically analyse the advantage of this new algorithm and experimentally demonstrate that the algorithm is efficient.
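The queue-based relaxation described above can be sketched in Python. This is plain SPFA without the MinPoP refinement the paper proposes, and the adjacency-list encoding is an illustrative choice:

```python
from collections import deque

def spfa(graph, source):
    """Single-source shortest paths via SPFA: keep a queue of candidate
    vertices and enqueue a vertex only when its distance estimate
    improves (i.e. it was relaxed)."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    queue = deque([source])
    in_queue = {source}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:   # relaxation step
                dist[v] = dist[u] + w
                if v not in in_queue:   # avoid duplicate queue entries
                    queue.append(v)
                    in_queue.add(v)
    return dist

graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2)],
    "c": [],
}
# shortest a->c path goes through b with total weight 3
```

The paper's MinPoP idea changes which candidate is dequeued first, improving the queue's quality; the relaxation logic stays the same.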
Storage, Manipulation and Analysis of Raster Data with PostGIS RasterACSG Section Montréal
The most important new feature of the open source spatial database PostgreSQL/PostGIS 2.0 is support for raster data. PostGIS Raster includes an import tool similar to shp2pgsql based on GDAL and a series of SQL operators for manipulating and analyzing raster data. The new RASTER type is georeferenced, multi-resolution and multi-band, and it supports a nodata value and a pixel value type per band. PostGIS Raster draws on the simplicity of the vector experience offered by PostGIS to make all raster operations as simple as possible. As with a vector coverage, a raster coverage is divided into a set of records (one row = one tile) stored in a single table (unlike Oracle Spatial, which uses two types and therefore two or more tables). A complete coverage can be imported and retiled in a single command with the import tool, and multiple resolutions of the same coverage can be imported into adjacent tables. The properties of raster objects and of each band can be inspected and modified, as can the pixel values. Functions exist to obtain the minimum, maximum, sum, mean, standard deviation and histogram of a tile or of a complete coverage. The ST_Intersection() and ST_Intersects() functions work almost transparently between raster and vector data, and a series of map algebra functions (ST_MapAlgebra()) enables raster-style analysis. Bands can be reclassified and converted to any GDAL-writable format. Functions for generating rasters and bands also exist for PL/pgSQL development. A GDAL driver to convert raster coverages to image files is under development, and plugins already exist for QGIS and gvSIG to visualize them.
Chap 8. Optimization for training deep modelsYoung-Geun Choi
Internal lab seminar material. It summarizes and excerpts Chapter 8 of Goodfellow et al. (2016), Deep Learning, MIT Press. It introduces the methods commonly used to optimize the objective function when training deep neural network models.
The document summarizes a deep learning programming course for artificial intelligence. The course covers topics like machine learning, deep learning, convolutional neural networks, recurrent neural networks, and applications of deep learning in medicine. It provides an overview of each week's topics, including an introduction to AI and machine learning in week 3, deep learning in week 4, and applications of AI in medicine in week 5.
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengSpark Summit
Generalized linear models (GLMs) are a class of models that include linear regression, logistic regression, and other forms. GLMs are implemented in both MLlib and SparkR in Spark. They support various solvers like gradient descent, L-BFGS, and iteratively re-weighted least squares. Performance is optimized through techniques like sparsity, tree aggregation, and avoiding unnecessary data copies. Future work includes better handling of categoricals, more model statistics, and model parallelism.
2015-06-15 Large-Scale Elastic-Net Regularized Generalized Linear Models at S...DB Tsai
Nonlinear methods are widely used to produce higher performance compared with linear methods; however, nonlinear methods are generally more expensive in model size, training time, and scoring phase. With proper feature engineering techniques like polynomial expansion, the linear methods can be as competitive as those nonlinear methods. In the process of mapping the data to higher dimensional space, the linear methods will be subject to overfitting and instability of coefficients which can be addressed by penalization methods including Lasso and Elastic-Net. Finally, we'll show how to train linear models with Elastic-Net regularization using MLlib.
Several learning algorithms such as kernel methods, decision trees, and random forests are nonlinear approaches which are widely used to obtain better performance compared with linear methods. However, with feature engineering techniques like polynomial expansion, which maps the data into a higher-dimensional space, the performance of linear methods can be as competitive as those nonlinear methods. As a result, linear methods remain very useful given that their training time is significantly shorter than that of the nonlinear ones, and the model is just a simple small vector, which makes the prediction step very efficient and easy. However, by mapping the data into a higher-dimensional space, those linear methods are subject to overfitting and instability of coefficients, and those issues can be successfully addressed by penalization methods including Lasso and Elastic-Net. The Lasso method with L1 penalty tends to result in many coefficients shrunk exactly to zero and a few other coefficients with comparatively little shrinkage. The L2 penalty tends to result in all small but non-zero coefficients. The combination of L1 and L2 penalties is called the Elastic-Net method, which tends to give a result in between. In the first part of the talk, we'll give an overview of linear methods including commonly used formulations and optimization techniques such as L-BFGS and OWLQN. In the second part of the talk, we will show how to train linear models with Elastic-Net using our recent contribution to Spark MLlib. We'll also talk about how linear models are practically applied to big datasets, and how polynomial expansion can be used to dramatically increase performance.
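The shrinkage behavior of the L1, L2, and Elastic-Net penalties described above can be made concrete with the scalar proximal (soft-thresholding) update used by typical coordinate-descent solvers. This is an illustrative standalone sketch, not Spark MLlib's implementation, and the parameter names are invented:

```python
def elastic_net_prox(z, alpha, l1_ratio):
    """Proximal step for the Elastic-Net penalty
    alpha * (l1_ratio * |w| + (1 - l1_ratio) / 2 * w**2):
    soft-threshold by the L1 part, then shrink by the L2 part."""
    l1 = alpha * l1_ratio
    l2 = alpha * (1 - l1_ratio)
    if z > l1:
        w = z - l1
    elif z < -l1:
        w = z + l1
    else:
        w = 0.0           # the L1 part zeroes out small coefficients
    return w / (1 + l2)   # the L2 part shrinks every coefficient

# pure Lasso (l1_ratio=1) zeroes small values exactly;
# pure ridge (l1_ratio=0) only shrinks; Elastic-Net does both.
```

This single function reproduces the qualitative claims in the abstract: with l1_ratio=1 many coefficients land exactly at zero, with l1_ratio=0 they are all small but non-zero, and intermediate values give a result in between.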
DB Tsai is an Apache Spark committer and a Senior Research Engineer at Netflix. He has recently been working with the Apache Spark community to add several new algorithms, including Linear Regression and Binary Logistic Regression with Elastic-Net (L1/L2) regularization, Multinomial Logistic Regression, and the L-BFGS optimizer. Prior to joining Netflix, DB was a Lead Machine Learning Engineer at Alpine Data Labs, where he developed innovative large-scale distributed linear algorithms and contributed them back to the open source Apache Spark project.
Generalized Linear Models in Spark MLlib and SparkRDatabricks
Generalized linear models (GLMs) unify various statistical models such as linear regression and logistic regression through the specification of a model family and link function. They are widely used in modeling, inference, and prediction with applications in numerous fields. In this talk, we will summarize recent community efforts in supporting GLMs in Spark MLlib and SparkR. We will review supported model families, link functions, and regularization types, as well as their use cases, e.g., logistic regression for classification and log-linear model for survival analysis. Then we discuss the choices of solvers and their pros and cons given training datasets of different sizes, and implementation details in order to match R’s model output and summary statistics. We will also demonstrate the APIs in MLlib and SparkR, including R model formula support, which make building linear models a simple task in Spark. This is a joint work with Eric Liang, Yanbo Liang, and some other Spark contributors.
In Part II of the Anomaly Detection Series, we discuss the challenges in analyzing Temporal datasets and discuss methods for outlier analysis. We focus on single time series and discuss point outlier and sub-sequence methods.
Anomaly detection (or outlier analysis) is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset. It is used in applications such as intrusion detection, fraud detection, fault detection and monitoring processes in various domains including energy, healthcare and finance. In this talk, we will introduce anomaly detection and discuss the various analytical and machine learning techniques used in this field. Through a case study, we will discuss how anomaly detection techniques could be applied to energy data sets. We will also demonstrate, using R and Apache Spark, an application to help reinforce concepts in anomaly detection and best practices in analyzing and reviewing results.
Machine Learning with Big Data using Apache SparkInSemble
"Machine Learning with Big Data
using Apache Spark" was presented to Lansing Big Data and Hadoop User Group by Muk Agaram and Amit Singh on 3/31/2015. It goes over the basics of machine learning and demos a use case of predicting recession using Apache Spark through Logistic Regression, SVM and Random Forest Algorithm
Interest in Deep Learning has been growing in the past few years. With advances in software and hardware technologies, Neural Networks are making a resurgence. With interest in AI based applications growing, and companies like IBM, Google, Microsoft, NVidia investing heavily in computing and software applications, it is time to understand Deep Learning better!
In this lecture, we will get an introduction to Autoencoders and Recurrent Neural Networks and understand the state-of-the-art in hardware and software architectures. Functional Demos will be presented in Keras, a popular Python package with a backend in Theano. This will be a preview of the QuantUniversity Deep Learning Workshop that will be offered in 2017.
In this lecture, we will discuss the basics of Neural Networks and discuss how Deep Learning Neural networks are different from conventional Neural Network architectures. We will review a bit of mathematics that goes into building neural networks and understand the role of GPUs in Deep Learning. We will also get an introduction to Autoencoders, Convolutional Neural Networks, Recurrent Neural Networks and understand the state-of-the-art in hardware and software architectures. Functional Demos will be presented in Keras, a popular Python package with a backend in Theano. This will be a preview of the QuantUniversity Deep Learning Workshop that will be offered in 2017.
Interest is growing in the Apache Spark community in using Deep Learning techniques, and in the Deep Learning community in scaling algorithms with Apache Spark. A few notable efforts include:
· Databricks' efforts in scaling Deep Learning with Spark
· Intel announcing BigDL: a Deep Learning library for Spark
· Yahoo's recent efforts to open-source TensorFlowOnSpark
In this lecture we will discuss the key use cases and developments that have emerged in the last year in using Deep Learning techniques with Spark.
Big Data Analytics and Ubiquitous Computing is a document that discusses big data analytics using Apache Spark and ubiquitous computing concepts. It provides an overview of Spark, including Resilient Distributed Datasets (RDDs), and libraries for SQL, machine learning, graph processing, and streaming. It also discusses parallel FP-Growth (PFP) for recommendation and ubiquitous computing approaches like edge computing, cloudlets, fog computing, and virtualization. Virtual conferencing using tools like Google Meet, Skype and Microsoft Teams is also summarized.
Spark is a fast and general cluster computing system that improves on MapReduce by keeping data in-memory between jobs. It was developed in 2009 at UC Berkeley and open sourced in 2010. Spark core provides in-memory computing capabilities and a programming model that allows users to write programs as transformations on distributed datasets.
Presentation detailed about capabilities of In memory Analytic using Apache Spark. Apache Spark overview with programming mode, cluster mode with Mosos, supported operations and comparison with Hadoop Map Reduce. Elaborating Apache Spark Stack expansion like Shark, Streaming, MLib, GraphX
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonChristian Perone
This document provides an introduction to Apache Spark and collaborative filtering. It discusses big data and the limitations of MapReduce, then introduces Apache Spark including Resilient Distributed Datasets (RDDs), transformations, actions, and DataFrames. It also covers Spark Machine Learning (ML) libraries and algorithms such as classification, regression, clustering, and collaborative filtering.
Spark is a cluster computing framework designed to be fast, general-purpose, and able to handle a wide range of workloads including batch processing, iterative algorithms, interactive queries, and streaming. It is faster than Hadoop for interactive queries and complex applications by running computations in-memory when possible. Spark also simplifies combining different processing types through a single engine. It offers APIs in Java, Python, Scala and SQL and integrates closely with other big data tools like Hadoop. Spark is commonly used for interactive queries on large datasets, streaming data processing, and machine learning tasks.
This document provides an introduction to Apache Spark, including its history and key concepts. It discusses how Spark was developed in response to big data processing needs at Google and how it builds upon earlier systems like MapReduce. The document then covers Spark's core abstractions like RDDs and DataFrames/Datasets and common transformations and actions. It also provides an overview of Spark SQL and how to deploy Spark applications on a cluster.
The document provides an overview of Apache Spark, including what it is, its ecosystem, features, and architecture. Some key points:
- Apache Spark is an open-source cluster computing framework for large-scale data processing. It is up to 100x faster than Hadoop for iterative/interactive algorithms.
- Spark features include its RDD abstraction, lazy evaluation, and use of DAGs to optimize performance. It supports Scala, Java, Python, and R.
- The Spark ecosystem includes tools like Spark SQL, MLlib, GraphX, and Spark Streaming. It can run on Hadoop YARN, Mesos, or in standalone mode.
- Spark's architecture includes the SparkContext,
Unified Big Data Processing with Apache SparkC4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1yNuLGF.
Matei Zaharia talks about the latest developments in Spark and shows examples of how it can combine processing algorithms to build rich data pipelines in just a few lines of code. Filmed at qconsf.com.
Matei Zaharia is an assistant professor of computer science at MIT, and CTO of Databricks, the company commercializing Apache Spark.
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
This document discusses Apache Spark, a fast and general engine for big data processing. It describes how Spark generalizes the MapReduce model through its Resilient Distributed Datasets (RDDs) abstraction, which allows efficient sharing of data across parallel operations. This unified approach allows Spark to support multiple types of processing, like SQL queries, streaming, and machine learning, within a single framework. The document also outlines ongoing developments like Spark SQL and improved machine learning capabilities.
2015 01-17 Lambda Architecture with Apache Spark, NextML ConferenceDB Tsai
Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. In Lambda architecture, the system involves three layers: batch processing, speed (or real-time) processing, and a serving layer for responding to queries, and each comes with its own set of requirements.
The batch layer aims at perfect accuracy by processing all of the available data, an immutable, append-only set of raw records, with a distributed processing system. Output is typically stored in a read-only database, with results completely replacing the existing precomputed views. Apache Hadoop, Pig, and Hive are the de facto batch-processing systems.
In the speed layer, the data is processed in streaming fashion, and real-time views are provided from the most recent data. As a result, the speed layer is responsible for filling the "gap" caused by the batch layer's lag in providing views based on the most recent data. This layer's views may not be as accurate as the views the batch layer creates from the full dataset, so they will eventually be replaced by the batch layer's views. Traditionally, Apache Storm is used in this layer.
In the serving layer, the results from the batch and speed layers are stored, and it responds to queries in a low-latency, ad-hoc way.
One example of lambda architecture in a machine learning context is a fraud detection system. In the speed layer, the incoming streaming data can be used for online learning to update the model learned in the batch layer so it incorporates recent events. After a while, the model can be rebuilt using the full dataset.
Why Spark for lambda architecture? Traditionally, different technologies are used in the batch layer and the speed layer. If your batch system is implemented with Apache Pig and your speed layer with Apache Storm, you have to write and maintain the same logic in SQL and in Java/Scala. This quickly becomes a maintenance nightmare. With Spark, we have a unified development framework for the batch and speed layers at scale. In this talk, an end-to-end example implemented in Spark will be shown, and we will discuss the development, testing, maintenance, and deployment of a lambda architecture system with Apache Spark.
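The maintenance argument can be sketched without a cluster: with one framework, the same transformation logic is written once and applied both to the full historical dataset and to recent streaming data. Everything below (the function names, the toy per-key count) is an invented plain-Python illustration, not actual Spark code:

```python
def count_by_key(events):
    """Shared business logic: count occurrences per key. With a unified
    framework, one function like this can back both the batch job and
    the streaming job, instead of being rewritten in two languages."""
    counts = {}
    for key in events:
        counts[key] = counts.get(key, 0) + 1
    return counts

def merge_views(batch_view, speed_view):
    """Serving layer: combine the precomputed batch view with the
    real-time view that covers the batch layer's lag."""
    merged = dict(batch_view)
    for key, n in speed_view.items():
        merged[key] = merged.get(key, 0) + n
    return merged

batch_view = count_by_key(["a", "b", "a"])    # full historical data
speed_view = count_by_key(["a", "c"])         # recent streaming data
serving = merge_views(batch_view, speed_view) # answers queries
```

With two stacks, count_by_key would exist twice (say, once in Pig and once in Storm) and the two copies would have to be kept in sync by hand; that duplication is the "maintenance nightmare" the talk refers to.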
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...BigDataEverywhere
Paco Nathan, Director of Community Evangelism at Databricks
Apache Spark is intended as a fast and powerful general purpose engine for processing Hadoop data. Spark supports combinations of batch processing, streaming, SQL, ML, Graph, etc., for applications written in Scala, Java, Python, Clojure, and R, among others. In this talk, I'll explore how Spark fits into the Big Data landscape. In addition, I'll describe other systems with which Spark pairs nicely, and will also explain why Spark is needed for the work ahead.
A look under the hood at Apache Spark's API and engine evolutionsDatabricks
Spark has evolved its APIs and engine over the last 6 years to combine the best aspects of previous systems like databases, MapReduce, and data frames. Its latest structured APIs like DataFrames provide a declarative interface inspired by data frames in R/Python for ease of use, along with optimizations from databases for performance and future-proofing. This unified approach allows Spark to scale massively like MapReduce while retaining flexibility.
The document summarizes several open source big data analytics toolkits that support Hadoop, including RHadoop, Mahout, MADLib, HiveMall, H2O, and Spark-MLLib. It describes each toolkit's key features such as algorithms supported, performance, ease of use, and architecture. Spark-MLLib provides machine learning algorithms in a distributed in-memory framework for improved performance compared to disk-based approaches. MADLib and H2O also operate directly on data in memory for faster iterative modeling.
Spark is a unified analytics engine for large-scale data processing. It provides APIs for SQL queries, streaming data, and machine learning. Spark uses RDDs (Resilient Distributed Datasets) as its fundamental data abstraction, which allows data to be operated on in parallel. RDDs track lineage information to efficiently recover lost data. Spark offers advantages over MapReduce like being faster, using less code, and supporting iterative algorithms. It can also be used for both batch and streaming workloads using the same APIs. While still maturing, Spark is gaining popularity for its ease of use and performance.
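The lineage idea mentioned above can be illustrated in plain Python: record how a dataset was derived rather than only its values, so a lost result can be recomputed from its parent. This toy class is an invented sketch of the concept, not Spark's RDD implementation:

```python
class ToyRDD:
    """Records its parent and transformation (its lineage) so any
    result can be recomputed on demand after a failure."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self._partitions = partitions  # None means "not materialized"
        self._parent, self._fn = parent, fn

    def map(self, fn):
        # a transformation builds a new dataset lazily: no work happens
        # here, only the lineage link is recorded
        return ToyRDD(parent=self, fn=fn)

    def compute(self):
        if self._partitions is not None:
            return self._partitions  # base data already in memory
        # replay the lineage: recompute the parent, then apply this step
        return [[self._fn(x) for x in part]
                for part in self._parent.compute()]

base = ToyRDD(partitions=[[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
# even if doubled's result was never cached (or was lost),
# compute() rebuilds it from the lineage
```

Because only the lineage is stored, recovery costs recomputation rather than replication, which is why the summary above says RDDs "track lineage information to efficiently recover lost data."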
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkDatabricks
Big data and AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models. AI has always been one of the most exciting applications of big data and Apache Spark. Increasingly, Spark users want to integrate Spark with distributed deep learning and machine learning frameworks built for state-of-the-art training. On the other side, DL/AI users increasingly want to handle the large and complex data scenarios needed for their production pipelines.
This talk introduces a new project that substantially improves the performance and fault-recovery of distributed deep learning and machine learning frameworks on Spark. We will introduce the major directions and provide progress updates, including 1) barrier execution mode for distributed DL training, 2) fast data exchange between Spark and DL frameworks, and 3) accelerator-aware scheduling.
This document discusses Scala and big data technologies. It provides an overview of Scala libraries for working with Hadoop and MapReduce, including Scalding which provides a Scala DSL for Cascading. It also covers Spark, a cluster computing framework that operates on distributed datasets in memory for faster performance. Additional Scala projects for data analysis using functional programming approaches on Hadoop are also mentioned.
In this one day workshop, we will introduce Spark at a high level context. Spark is fundamentally different than writing MapReduce jobs so no prior Hadoop experience is needed. You will learn how to interact with Spark on the command line and conduct rapid in-memory data analyses. We will then work on writing Spark applications to perform large cluster-based analyses including SQL-like aggregations, machine learning applications, and graph algorithms. The course will be conducted in Python using PySpark.
This document provides an agenda and summaries for a meetup on introducing DataFrames and R on Apache Spark. The agenda includes overviews of Apache Spark 1.3, DataFrames, R on Spark, and large scale machine learning on Spark. There will also be discussions on news items, contributions so far, what's new in Spark 1.3, more data source APIs, what DataFrames are, writing DataFrames, and DataFrames with RDDs and Parquet. Presentations will cover Spark components, an introduction to SparkR, and Spark machine learning experiences.
Similar to Large scale logistic regression and linear support vector machines using spark (20)
This document provides an introduction to Tokyo Tech and Tokyo, Japan. It first gives an overview of Tokyo, including its basic information, population statistics, major industries, and must-see places like Tokyo Tower, Tokyo Skytree, and Shibuya. It then discusses Japanese traditions such as anime, public transportation, food, and clothing. The document then summarizes Tokyo Tech, providing its location, student enrollment breakdown, history, key facilities, extracurricular activities including notable club and research achievements, and examples of accomplished alumni and Nobel Prize winners.
Gunkanjima, or Battleship Island, is an abandoned island located 15 kilometers from Nagasaki, Japan that was once home to over 5,000 residents working in coal mining. The island became a UNESCO World Heritage Site in 2015 due to its role in Japan's Meiji-era industrialization through coal mining and steel production. At its peak in the 1960s, the island's population density was nine times that of Tokyo due to high-rise apartment buildings constructed to house the many coal miners and their families living and working on the island. However, the coal mine closed in 1974 as energy production shifted from coal to petroleum, leading the remaining residents to leave and resulting in Gunkanjima
Scientific realism is the view that scientific theories aim for truth and acceptance, involving the belief that theories are true. The document outlines scientific realism by discussing defining principles in natural sciences, classification essence, explanation with models, and three realms of the world. It examines how classification involves nominal and real essences, how models represent unobservable phenomena, and how different realms can be observed with and without instruments.
The document discusses improving generalization performance in large mini-batch training for distributed deep learning. It proposes using second-order optimization methods like natural gradient descent (NGD) and K-FAC to converge with fewer iterations, and smoothing the objective function with data augmentation techniques like mixup to avoid sharp minima and improve accuracy. Experimental results show that K-FAC converges faster than SGD but suffers from greater accuracy degradation with large batches. Applying mixup to K-FAC training reduces the accuracy degradation and improves generalization.
2. Selected Paper
- Published in: IEEE International Conference on Big Data, 2014
- Date of Conference: October 27-30, 2014
- http://www.csie.ntu.edu.tw/~cjlin/papers/spark-liblinear/spark-liblinear.pdf
3. Abstract
- Logistic regression and linear SVM are useful methods for large-scale classification. However, their distributed implementations have not been well studied.
- Recently, because of the inefficiency of the MapReduce framework on iterative algorithms, Spark, an in-memory cluster-computing platform, has been proposed. It has emerged as a popular framework for large-scale data processing and analytics.
- In this work, we consider a distributed Newton method for solving logistic regression as well as linear SVM and implement it on Spark.
8. Linear Classification on One Computer
- Linear classification on one machine is a mature technique: millions of instances can be trained in a few seconds.
- What if the data are even bigger than the capacity of our machine?
  Solution 1: get a machine with larger memory/disk.
  -> The data loading time would be too lengthy.
  Solution 2: distributed training.
13. Distributed Linear Classification
- In distributed training, data are loaded in parallel to reduce the I/O time (e.g., from HDFS).
- With more machines, computation is faster.
- But communication and synchronization costs become significant.
- To keep training efficient, we need to consider algorithms with less communication cost, and examine implementation details carefully.
16. Distributed Linear Classification on Apache Spark
- Train logistic regression (LR) and L2-loss linear support vector machine (SVM) models on Apache Spark (Zaharia et al., 2010).
- Why Spark?
  MPI (Snir and Otto, 1998) is efficient, but does not support fault tolerance.
  MapReduce (Dean and Ghemawat, 2008) supports fault tolerance, but is slow in communication.
  Spark combines the advantages of both frameworks:
    Communications are conducted in-memory.
    Fault tolerance is supported.
- However, Spark is new and still under development. (2014)
- Therefore it is necessary to examine important implementation issues to ensure efficiency.
23. Apache Spark
- Spark considers only the master-slave framework.
- Data fault tolerance: Hadoop Distributed File System (Borthakur, 2008).
- Computation fault tolerance: read-only Resilient Distributed Datasets (RDDs) and lineage (Zaharia et al., 2012).
  Basic idea: re-run the operations recorded in the lineage on immutable RDDs.
26. Apache Spark
- Lineage (Zaharia et al., 2012).
  Spark first creates a logical plan (namely a data dependency graph) for each application.
  Then it transforms the logical plan into a physical plan (a DAG of map/reduce stages and map/reduce tasks).
  After that, concrete map/reduce tasks are launched to process the input data.
29. Apache Spark (skip)
package org.apache.spark.examples

import java.util.Random

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

/**
 * Usage: GroupByTest [numMappers] [numKVPairs] [valSize] [numReducers]
 */
object GroupByTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("GroupBy Test")
    var numMappers = 100
    var numKVPairs = 10000
    var valSize = 1000
    var numReducers = 36
    val sc = new SparkContext(sparkConf)
    val pairs1 = sc.parallelize(0 until numMappers, numMappers).flatMap { p =>
      val ranGen = new Random
      var arr1 = new Array[(Int, Array[Byte])](numKVPairs)
      for (i <- 0 until numKVPairs) {
        val byteArr = new Array[Byte](valSize)
        ranGen.nextBytes(byteArr)
        arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
      }
      arr1
    }.cache
    // Enforce that everything has been calculated and in cache
    pairs1.count
    println(pairs1.groupByKey(numReducers).count)
    sc.stop()
  }
}
GroupByTest.scala: a Spark code example illustrating lineage (Zaharia et al., 2012).
30. Apache Spark (skip)
GroupByTest.scala, step-by-step explanation:
[1]. Initialize SparkConf.
[2]. Initialize numMappers=100, numKVPairs=10,000, valSize=1000, numReducers=36.
[3]. Initialize SparkContext, which creates the necessary objects and actors for the driver.
[4]. Each mapper creates an arr1: Array[(Int, Array[Byte])], which has numKVPairs elements. Each Int is a random integer, and each byte array's size is valSize. We can estimate Size(arr1) = numKVPairs * (4 + valSize) ≈ 10MB, so that Size(pairs1) = numMappers * Size(arr1) ≈ 1000MB.
[5]. Each mapper is instructed to cache its arr1 array into memory.
[6]. The action count() sums the number of elements in arr1 over all mappers; the result is numMappers * numKVPairs = 1,000,000. This action triggers the caching of the arr1 arrays.
[7]. The groupByKey operation is performed on the cached pairs1. The reducer number (a.k.a. partition number) is numReducers. Theoretically, if hash(key) is evenly distributed, each reducer receives numMappers * numKVPairs / numReducers ≈ 27,778 pairs of (Int, Array[Byte]), with a size of Size(pairs1) / numReducers ≈ 28MB.
[8]. Each reducer aggregates the records with the same Int key; the result is (Int, List(Array[Byte], Array[Byte], ..., Array[Byte])).
[9]. Finally, a count() action sums up the record number in each reducer; the final result is the number of distinct integers in pairs1.
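The size estimates above can be double-checked with a few lines of plain Scala arithmetic (independent of Spark; object headers and JVM overheads are ignored in this back-of-the-envelope sketch):

```scala
// Back-of-the-envelope check of the size estimates: each (Int, Array[Byte])
// pair carries roughly 4 + valSize bytes of payload.
val numMappers = 100
val numKVPairs = 10000
val valSize = 1000
val numReducers = 36

val sizeArr1 = numKVPairs.toLong * (4 + valSize)   // 10,040,000 bytes ≈ 10 MB per mapper
val sizePairs1 = numMappers * sizeArr1             // ≈ 1000 MB over all mappers
// 27,777 with integer division (≈ 27,778 exactly)
val pairsPerReducer = numMappers.toLong * numKVPairs / numReducers
```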
31. Apache Spark
- Read-only Resilient Distributed Datasets (RDDs)
  An RDD is a mechanism for holding data to be repeatedly used in memory, while retaining the fault tolerance, data locality and scalability of Hadoop MapReduce.
  1. An RDD is a read-only, partitioned collection of records.
  2. An RDD has enough information about how it was derived from other datasets (its lineage).
  3. Persistence and partitioning can be controlled by the user.
  4. Operations on RDDs are lazy.
32. Apache Spark
- Read-only Resilient Distributed Datasets (RDDs): references
  - Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
    http://www-bcf.usc.edu/~minlanyu/teach/csci599-fall12/papers/nsdi_spark.pdf
  - My Qiita article
    http://qiita.com/Hiroki11x/items/4f5129094da4c91955bc
34. Logistic Regression and Linear Support Vector Machine
Most linear classification models consider the following optimization problem:

  min_w  f(w) = (1/2) w^T w + C Σ_{i=1}^{l} ξ(w; x_i, y_i)   (1)

{(x_i, y_i)}_{i=1}^{l}, x_i ∈ R^n, y_i ∈ {−1, 1}, ∀i : a given set of training label-instance pairs
C > 0 : user-specified parameter
ξ(w; x_i, y_i) : loss function
The objective function f has two parts: the regularizer (1/2) w^T w that controls the complexity of the model, and the loss that measures the error of the model on the training data. The loss function ξ(w; x_i, y_i) is typically a convex function in w.
36. Logistic Regression and Linear Support Vector Machine
Common choices of the loss function ξ(w; x, y) are:

  ξ_L1(w; x, y) = max(0, 1 − y w^T x)       (2)
  ξ_L2(w; x, y) = max(0, 1 − y w^T x)^2     (3)
  ξ_LR(w; x, y) = log(1 + exp(−y w^T x))    (4)

Problem (1) is referred to as L1-loss SVM and L2-loss SVM if (2) and (3) are used, respectively, and as LR if (4) is used.
It is known that (3) and (4) are differentiable while (2) is not, so (2) is more difficult to optimize. Therefore, the paper focuses on solving LR and L2-loss SVM.
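As a concrete sketch, the losses (2)-(4) and objective (1) can be written in a few lines of Scala for dense vectors. This is an illustrative implementation, not the paper's code:

```scala
// Dense inner product w^T x.
def dot(w: Array[Double], x: Array[Double]): Double =
  w.zip(x).map { case (a, b) => a * b }.sum

// L1 hinge loss (2): not differentiable at the hinge point.
def lossL1(w: Array[Double], x: Array[Double], y: Double): Double =
  math.max(0.0, 1.0 - y * dot(w, x))

// L2 (squared) hinge loss (3): differentiable.
def lossL2(w: Array[Double], x: Array[Double], y: Double): Double = {
  val m = math.max(0.0, 1.0 - y * dot(w, x))
  m * m
}

// Logistic loss (4): differentiable.
def lossLR(w: Array[Double], x: Array[Double], y: Double): Double =
  math.log(1.0 + math.exp(-y * dot(w, x)))

// Regularized objective (1): (1/2) w^T w + C * sum of losses.
def objective(w: Array[Double], xs: Array[Array[Double]], ys: Array[Double],
              c: Double,
              loss: (Array[Double], Array[Double], Double) => Double): Double =
  0.5 * dot(w, w) + c * xs.zip(ys).map { case (x, y) => loss(w, x, y) }.sum
```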
38. Logistic Regression and Linear Support Vector Machine
- To minimize f(w) in (1), use a trust region Newton method (Lin and Moré, 1999).
39. Trust Region Newton Method
- The trust region Newton method is a type of truncated Newton approach.
- We consider the trust region method of Lin and Moré (1999), a truncated Newton method for general bound-constrained optimization problems (i.e., variables lie in certain intervals).
- At each iteration t of a trust region Newton method for minimizing f(w), we have an iterate w_t, a size Δ_t of the trust region, and a quadratic model of the objective.
- To discuss Newton methods, we need the Hessian of f(w):

  ∇²f(w) = I + C X^T D X,

  where X is the data matrix, I is the identity matrix, and D is a diagonal matrix with values determined by w.
40. Truncated Newton Method (skip)
- The trust region Newton method is a type of truncated Newton approach.
- To save time, one may use only an "approximate" Newton direction in the early stages of the outer iterations.
- Such a technique is called a truncated Newton method (or inexact Newton method).
41. Newton Method (skip)
- Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point.
- In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized.
43. Trust Region Newton Method
At iteration t, given iterate w_t and trust region Δ_t > 0, solve

  min_s  q_t(s)  subject to  ||s|| ≤ Δ_t,   (5)

where

  q_t(s) = ∇f(w_t)^T s + (1/2) s^T ∇²f(w_t) s   (6)

is the second-order Taylor approximation of f(w_t + s) − f(w_t).
Adjust the trust region size Δ_t by ρ_t, the ratio of the actual function reduction to the reduction predicted by q_t(s).
If n is large, ∇²f(w_t) ∈ R^{n×n} is too large to form and store. Therefore a Hessian-free approach of applying CG (conjugate gradient) iterations is used to approximately solve (5).
At each CG iteration we only need the Hessian-vector product of ∇²f(w_t) with some vector generated by the CG procedure.
45. Trust Region Newton Method
Hessian-free methods: use a conjugate gradient (CG) method.
CG is an iterative method: it only needs the product ∇²f(w_t) v for some vector v at each iteration (it is not necessary to form ∇²f(w_t) explicitly).
For LR and SVM, at each CG iteration we compute

  ∇²f(w_t) v = v + C X^T (D (X v)),   (7)

where X is the data matrix and D is a diagonal matrix with values determined by w_t.
46. Conjugate Gradient (CG) Method (skip)
- An algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite.
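As a concrete illustration (not the paper's implementation), here is a minimal CG solver for a symmetric positive-definite system A s = b, written so that A enters only through a matrix-vector product function. This is exactly the property that enables the Hessian-free approach: in that setting, `matVec` would compute v + C X^T (D (X v)) instead of materializing the Hessian.

```scala
// Minimal CG sketch: solves A s = b using only matrix-vector products with A.
// Illustrative code, not the Spark LIBLINEAR implementation.
def cg(matVec: Array[Double] => Array[Double], b: Array[Double],
       maxIter: Int = 100, tol: Double = 1e-10): Array[Double] = {
  val n = b.length
  var s = Array.fill(n)(0.0)       // current solution estimate (start at 0)
  var r = b.clone()                // residual b - A s
  var d = r.clone()                // search direction
  var rsOld = r.map(x => x * x).sum
  var iter = 0
  while (iter < maxIter && math.sqrt(rsOld) > tol) {
    val ad = matVec(d)
    val alpha = rsOld / d.zip(ad).map { case (p, q) => p * q }.sum
    s = s.zip(d).map { case (p, q) => p + alpha * q }
    r = r.zip(ad).map { case (p, q) => p - alpha * q }
    val rsNew = r.map(x => x * x).sum
    d = r.zip(d).map { case (p, q) => p + (rsNew / rsOld) * q }
    rsOld = rsNew
    iter += 1
  }
  s
}
```

For an n-dimensional SPD system, CG converges in at most n iterations in exact arithmetic; truncated Newton methods stop it much earlier once the residual is small enough.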
47. Distributed Hessian-vector Products
- The bottleneck is (8), the product between the Hessian matrix ∇²f(w_t) and the vector s_i.
- This operation can be parallelized in a distributed environment as parallel vector products.
48. Distributed Hessian-vector Products
- First partition the data matrix X and the labels Y into p disjoint parts.
- Reformulate the function, the gradient and the Hessian-vector products of (1), schematically:

  f(w) = (1/2) w^T w + C Σ_{k=1}^{p} f_k(w)           (12)
  ∇f(w) = w + C Σ_{k=1}^{p} ∇f_k(w)                   (13)
  ∇²f(w) v = v + C Σ_{k=1}^{p} X_k^T (D_k (X_k v))    (14)

  where f_k and ∇f_k involve only the instances of the k-th partition.
49. Distributed Hessian-vector Products
- The data matrix X is distributedly stored.
- The per-partition terms in (12)-(14) are the map functions operating on the k-th partition. We can observe that for computing (12)-(14), only the data partition X_k is needed.
50. Distributed Hessian-vector Products
- The data matrix X is distributedly stored.
- p ≥ (#slave nodes) for parallelization.
- Two communications per operation:
  1. The master sends w_t and the current v to the slaves.
  2. The slaves return X_k^T D_k X_k v to the master.
- The same scheme is used for computing the function value and the gradient.
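As an illustration of this scheme, the partitioned Hessian-vector product can be simulated in plain Scala, with ordinary collections standing in for RDD partitions. The names below are illustrative, not from Spark LIBLINEAR; summing the per-partition results on the "master" yields the same vector as computing with the full matrix.

```scala
// Sketch of the partitioned Hessian-vector product
//   Hv = v + C * sum_k X_k^T (D_k (X_k v)),
// where each element of `partitions` plays the role of one slave's (X_k, D_k).
def matVec(m: Array[Array[Double]], v: Array[Double]): Array[Double] =
  m.map(row => row.zip(v).map { case (a, b) => a * b }.sum)

def transposeMatVec(m: Array[Array[Double]], v: Array[Double]): Array[Double] = {
  val n = m.head.length
  val out = Array.fill(n)(0.0)
  for (i <- m.indices; j <- 0 until n) out(j) += m(i)(j) * v(i)
  out
}

// One "map" task: X_k^T (D_k (X_k v)) using only the local partition.
def localProduct(xk: Array[Array[Double]], dk: Array[Double],
                 v: Array[Double]): Array[Double] = {
  val xv = matVec(xk, v)
  val dxv = xv.zip(dk).map { case (a, d) => a * d }
  transposeMatVec(xk, dxv)
}

// "Reduce" on the master: v + C * sum over all partitions.
def hessianVectorProduct(partitions: Seq[(Array[Array[Double]], Array[Double])],
                         v: Array[Double], c: Double): Array[Double] = {
  val summed = partitions.map { case (xk, dk) => localProduct(xk, dk, v) }
    .reduce((a, b) => a.zip(b).map { case (p, q) => p + q })
  v.zip(summed).map { case (vi, si) => vi + c * si }
}
```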
53. Implementation design
- This section studies implementation issues for the software.
- The distributed TRON implementation is named Spark LIBLINEAR because algorithmically it is an extension of the TRON implementation in the software LIBLINEAR [10].
- Because Spark is implemented in Scala, the same language is used. The implementation of Spark LIBLINEAR therefore involves complicated design issues resulting from Java, Scala and Spark.
55. Implementation design
- Because Spark is implemented in Scala, the same language is used, so the implementation of Spark LIBLINEAR involves complicated design issues resulting from Java, Scala and Spark.
  For example, in contrast to traditional languages like C and C++, similar expressions in Scala may easily behave differently.
  It is hence easy to introduce overheads when developing Scala programs.
58. Implementation design
- The following implementation issues are analyzed for efficient computation, communication and memory usage:
  • Programming language (related to Java and Scala):
    ◦ Loop structure
    ◦ Data encapsulation
  • Operations on RDD (related to Spark):
    ◦ Using mapPartitions rather than map
    ◦ Caching intermediate information or not
  • Communication (related to Spark):
    ◦ Using broadcast variables
    ◦ The cost of the reduce function
60. Scala Issue: Loop structures
- From (12)-(14), clearly the computational bottleneck at each node is the products between the data matrix X_k (or X_k^T) and a vector v.
- To compute this matrix-vector product, a loop conducting inner products between all x_i ∈ X_k and v is executed many times; it is the main computation in this algorithm.
- Although a for loop is the most straightforward way to implement an inner product, it is known that in Scala a for loop may be slower than a while loop.
62. Scala Issue: Loop structures
- To study this issue, three methods to implement the inner product are discussed:
  - for-generator
  - for-range
  - while
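The three variants might look as follows in Scala; these are illustrative sketches of the loop styles named above, not the exact code used in the paper's experiments. `index`/`value` hold the non-zero entries of a sparse vector x.

```scala
// for-generator: iterates over a collection with a generator expression;
// the expression is desugared into foreach/map calls.
def dotForGenerator(index: Array[Int], value: Array[Double], w: Array[Double]): Double = {
  var s = 0.0
  for ((i, v) <- index.zip(value)) s += w(i) * v
  s
}

// for-range: iterates over an integer Range.
def dotForRange(index: Array[Int], value: Array[Double], w: Array[Double]): Double = {
  var s = 0.0
  for (k <- 0 until index.length) s += w(index(k)) * value(k)
  s
}

// while: a plain loop rather than an expression; typically the fastest,
// which is why the while form was chosen for the software.
def dotWhile(index: Array[Int], value: Array[Double], w: Array[Double]): Double = {
  var s = 0.0
  var k = 0
  while (k < index.length) {
    s += w(index(k)) * value(k)
    k += 1
  }
  s
}
```

All three compute the same sparse inner product w^T x; only the control structure differs.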
63. Experiment Data Information
- density: average ratio of non-zero features per instance.
- p_s: used for the experiments of loops and encapsulation.
- p_e: applied for the rest of the experiments.
66. Scala Issue: Loop structures
[Figure: loop-structure timings on one node and on two nodes]
- The reason (for shorter times on two nodes) is that when the data are split between two nodes, each node conducts fewer loop iterations.
67. Scala Issue: Loop structures
- The translation of a for expression comes with overheads, and the combination becomes complicated when more operations are applied.
- The optimization of for expressions has not been a focus of Scala development, because the expression is too imperative to be consistent with the functional programming principle.
- In contrast, a while loop is a loop rather than an expression.
- The while loop is therefore chosen to implement the software.
69. Scala Issue: Encapsulation
- Follow LIBLINEAR to represent data as a sparse matrix, where only non-zero entries are stored. This strategy is important for handling large-scale data.
  For example, for the 5-dimensional feature vector (2, 0, 0, 8, 0), only the two index-value pairs of non-zero entries are stored.
- Investigate how to store the index-value information such as "1:2" and "4:8" in memory. The discussion is based on two encapsulation implementations: the Class approach (CA) and the Array approach (AA).
72. Scala Issue: Encapsulation
- CA encapsulates a pair of index and feature value into a class, and maintains an array of class objects for each instance.
- In contrast, AA directly uses two arrays to store the indices and feature values of an instance.
- AA is faster because it directly accesses indices and values, while under CA the CPU cache must first follow the pointers to the class objects.
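A minimal sketch of the two layouts for the sparse vector (2, 0, 0, 8, 0), using hypothetical class and function names (0-based indices here, whereas the "1:2"/"4:8" notation above is 1-based):

```scala
// Class approach (CA): one object per non-zero entry, array of objects per instance.
final case class Entry(index: Int, value: Double)
val ca: Array[Entry] = Array(Entry(0, 2.0), Entry(3, 8.0))

// Array approach (AA): two parallel primitive arrays, no per-entry objects.
val aaIndex: Array[Int] = Array(0, 3)
val aaValue: Array[Double] = Array(2.0, 8.0)

// The same inner product against a dense w, written for both layouts.
def dotCA(entries: Array[Entry], w: Array[Double]): Double = {
  var s = 0.0; var k = 0
  // Each access follows a pointer to an Entry object.
  while (k < entries.length) { s += w(entries(k).index) * entries(k).value; k += 1 }
  s
}
def dotAA(index: Array[Int], value: Array[Double], w: Array[Double]): Double = {
  var s = 0.0; var k = 0
  // Direct access to contiguous primitive arrays: cache-friendly.
  while (k < index.length) { s += w(index(k)) * value(k); k += 1 }
  s
}
```

AA keeps the data in contiguous primitive arrays, avoiding object headers and pointer chasing; both functions return the same result.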
75. RDD: Using mapPartitions rather than map
- The second term of the Hessian-vector product (7) can be represented as

  Σ_i a(y_i, x_i, w, v) x_i,  where a(y_i, x_i, w, v) = D_{i,i} x_i^T v,

  and can be computed by either map or mapPartitions.
- With map, the map and reduce operations can be directly applied, but considerable overheads occur because for each instance x_i an intermediate vector a(y_i, x_i, w, v) x_i is created.
- In addition, the reduce function in Algorithm 3 then involves complicated additions of sparse vectors.
80. RDD: Using mapPartitions rather than map
- To avoid these overheads and the complicated additions of sparse vectors, the mapPartitions operation in Spark is considered.
- This setting ensures that computing a Hessian-vector product involves only p intermediate vectors; the overhead of using mapPartitions is less than that of using map with l intermediate vectors.
- Note that the technique of using mapPartitions can also be applied to compute the gradient.
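The contrast between the two strategies can be sketched with plain Scala collections standing in for RDD partitions (illustrative names, not the paper's code): the map style materializes one intermediate vector per instance, while the mapPartitions style keeps one accumulator per partition.

```scala
// Sketch contrasting the two evaluation strategies for
//   sum_i a_i * x_i   (the second term of the Hessian-vector product),
// with Scala sequences simulating Spark RDD partitions.

// In-place accumulate: acc += alpha * x.
def axpy(alpha: Double, x: Array[Double], acc: Array[Double]): Unit =
  for (j <- x.indices) acc(j) += alpha * x(j)

// "map" style: one intermediate vector a_i * x_i per instance
// (l vectors in total), all summed afterwards in reduce.
def mapStyle(data: Seq[(Double, Array[Double])]): Array[Double] = {
  val intermediates = data.map { case (a, x) => x.map(_ * a) } // l vectors
  intermediates.reduce((u, v) => u.zip(v).map { case (p, q) => p + q })
}

// "mapPartitions" style: one accumulator per partition (p vectors in total);
// each partition folds its instances into its accumulator locally.
def mapPartitionsStyle(partitions: Seq[Seq[(Double, Array[Double])]]): Array[Double] = {
  val perPartition = partitions.map { part =>
    val acc = Array.fill(part.head._2.length)(0.0)
    part.foreach { case (a, x) => axpy(a, x, acc) }
    acc                                                        // p vectors
  }
  perPartition.reduce((u, v) => u.zip(v).map { case (p, q) => p + q })
}
```

Both styles produce the same sum; the per-partition accumulation simply creates far fewer intermediate vectors when l >> p.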
83. RDD: Using mapPartitions rather than map
- 16 nodes are used in this experiment.
- The longer running time of map implies its higher computation cost.
84. RDD: Caching intermediate information or not
- The calculations of (12)-(14) all involve the vector Y_k X_k w.
- Intuitively, if this vector is cached and shared between different operations, the training procedure can be more efficient.
- In fact, the single-machine package LIBLINEAR uses this strategy in its implementation of TRON.
85. RDD: Caching intermediate information or not
- Spark does not allow any single operation to gather information from two different RDDs and run a user-specified function such as (12), (13) or (14).
- It would therefore be necessary to create one new RDD per iteration to store both the training data and the information to be cached.
- Unfortunately, this approach incurs severe overheads in copying the training data from the original RDD to the new one.
86. RDD: Caching intermediate information or not
- Based on the above discussion, Y_k X_k w is not cached, because recomputing it is more cost-effective.
- This example demonstrates that specific properties of a parallel programming framework may strongly affect the implementation.
87. RDD: Communication
- In Algorithm 2, communication occurs at two places: first, sending w and v from the master machine to the slave machines; second, reducing the results of (12)-(14) to the master machine.
88. RDD: Using Broadcast Variables
- In Spark, when an RDD is split into partitions, a single operation on this RDD is divided into tasks working on different partitions.
- Under this setting, many redundant communications occur, because a copy only needs to be sent to each slave machine, not to each partition.
- In such a case, where each partition shares the same information from the master, it is recommended to use broadcast variables:
  Read-only variables shared among partitions on the same node.
  Cached on the slave machines.
90. RDD: The Cost of the reduce Function
- Slaves to master: Spark by default collects results from each partition separately.
- Use the coalesce function to merge partitions on the same node before communication.
91. RDD: The Cost of the reduce Function
- 16 nodes are used in this experiment.
- Fig. 4. Broadcast variables and coalesce: running time (in seconds) versus the relative objective value difference, for LR with C = 1 on 16 nodes. Note that broadcast-cl represents the implementation with both broadcast variables and the coalesce function.
92. RDD: The Cost of the reduce Function
- For data with few features, like covtype and webspam, adopting broadcast variables slightly degrades efficiency, because the communication cost is low and broadcast variables introduce some overhead.
- Regarding the coalesce function, it is beneficial for all data sets.
94. Related Works
- MLlib is a machine learning library implemented in Apache Spark.
- It provides a stochastic gradient method for LR and SVM (but the default batch size is the whole data set).
95. Related Works
- 16 nodes are used in this experiment.
- Fig. 6. Comparison with MLlib: running time (in seconds, log scale) versus the relative objective value difference, for LR with C = 1 on 16 nodes.
96. Related Works
- The convergence of MLlib is rather slow in comparison with Spark LIBLINEAR.
- The reason is that the GD method is known to have slow convergence, while TRON enjoys fast quadratic local convergence for LR. Note that as MLlib requires more iterations to converge, its communication cost is also higher.
97. Related Works
- Zhuang et al. (2014) provide a C++/MPI implementation of the distributed trust region Newton algorithm in this paper.
- No fault tolerance.
- It should be faster than the Spark implementation:
  More computationally efficient: implemented in C++.
  More communicationally efficient: the slave-slave structure with all-reduce communicates only once per operation.
- It should be faster, but we need to know how large the difference is.
101. Related Works
- (m means multi-core.)
- Using multiple cores is not beneficial on yahoo-japan and yahoo-korea.
- Careful profiling shows that the bottleneck of the training time on these data sets is communication, and using more cores does not reduce this cost.
103. Discussions and Conclusions
- The paper considers a distributed trust region Newton algorithm on Spark for training LR and linear SVM.
- Many implementation issues are thoroughly studied with careful empirical examinations.
- The implementation on Spark is competitive with state-of-the-art packages. (2014)
- Spark LIBLINEAR is a distributed extension of LIBLINEAR, and it is available.