A talk on algorithms for parallel big data analytics in Spark. We present an algorithm to speed up ALS for collaborative filtering (think "the Netflix prize"), and show how this leads to significant speedup when implemented efficiently in parallel in Spark.
Best Paper Award at ICPADS 2015, Melbourne.
There is an increasing need for large-scale recommendation systems. Typical solutions rely on periodically retrained batch algorithms, but for massive amounts of data, training a new model can take hours. This is a problem when the model needs to be more up-to-date: for example, when recommending TV programs while they are being transmitted, the model should take into consideration the users who are watching a program at that time.
The promise of online recommendation systems is fast adaptation to changes, but methods of online machine learning from streams are commonly believed to be more restricted, and hence less accurate, than batch-trained models. Combining batch and online learning could lead to a quickly adapting recommendation system with increased accuracy. However, designing a scalable data system for uniting batch and online recommendation algorithms is a challenging task. In this talk we present our experiences in creating such a recommendation engine with Apache Flink and Apache Spark.
Spotify uses a range of machine learning models to power its music recommendation features, including the Discover page and Radio. Due to the iterative nature of training these models, they suffer from the I/O overhead of Hadoop and are a natural fit for the Spark programming paradigm. In this talk I will present both the right way and the wrong way to implement collaborative filtering models with Spark. Additionally, I will deep dive into how matrix factorization is implemented in the MLlib library.
Recommender Systems with Apache Spark's ALS Function - Will Johnson
A quick visual guide to recommender systems (user-based, item-based, and matrix factorization) and the code behind building an Apache Spark MatrixFactorizationModel with the ALS function.
Algorithmic Music Recommendations at Spotify - Chris Johnson
In this presentation I introduce various machine learning methods that we utilize for music recommendations and discovery at Spotify. Specifically, I focus on implicit matrix factorization for collaborative filtering, how to implement a small-scale version using Python, NumPy, and SciPy, and how to scale up to 20 million users and 24 million songs using Hadoop and Spark.
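The small-scale version described above can be sketched in plain NumPy. This is not Spotify's or Spark's code; it is a toy dense-matrix illustration of the implicit-feedback ALS updates in the style of Hu, Koren, and Volinsky (binary preferences, confidence weights `1 + alpha * r`), with all names and parameter values chosen for illustration:

```python
import numpy as np

def implicit_als(R, factors=2, reg=0.1, alpha=40.0, iters=10, seed=0):
    """Toy implicit-feedback ALS on a dense play-count matrix R.
    Real systems work on sparse data and exploit the structure of the
    normal equations; this just shows the alternating ridge solves."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = 0.1 * rng.standard_normal((n_users, factors))  # user factors
    Y = 0.1 * rng.standard_normal((n_items, factors))  # item factors
    P = (R > 0).astype(float)      # binary preference matrix
    C = 1.0 + alpha * R            # confidence weights
    I = reg * np.eye(factors)
    for _ in range(iters):
        # Fix item factors, solve a small ridge system per user, then
        # fix user factors and do the same per item.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

# Tiny made-up play-count matrix: 4 users, 3 songs.
R = np.array([[5., 3., 0.],
              [4., 0., 0.],
              [0., 0., 5.],
              [0., 3., 4.]])
X, Y = implicit_als(R)
scores = X @ Y.T   # predicted preference for every user-item pair
```

After training, observed user-item pairs should score close to 1 and unobserved pairs lower, which is what the confidence weighting is designed to achieve.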
Knowledge Graphs have proven to be extremely valuable to recommender systems, as they enable hybrid graph-based recommendation models encompassing both collaborative and content information. Leveraging this wealth of heterogeneous information for top-N item recommendation is a challenging task, as it requires the ability to effectively encode a diversity of semantic relations and connectivity patterns. In this work, we propose entity2rec, a novel approach to learning user-item relatedness from knowledge graphs for top-N item recommendation. We start from a knowledge graph modeling user-item and item-item relations, and we learn property-specific vector representations of users and items by applying neural language models on the network. These representations are used to create property-specific user-item relatedness features, which are in turn fed into learning-to-rank algorithms to learn a global relatedness model that optimizes top-N item recommendations. We evaluate the proposed approach in terms of ranking quality on the MovieLens 1M dataset, outperforming a number of state-of-the-art recommender systems, and we assess the importance of property-specific relatedness scores on the overall ranking quality.
Building Data Pipelines for Music Recommendations at Spotify - Vidhya Murali
In this talk, we will get into the architectural and functional details as to how we build scalable and robust data pipelines for music recommendations at Spotify. We will also discuss some of the challenges and an overview of work to address these challenges.
Deep Learning in Recommender Systems - RecSys Summer School 2017 - Balázs Hidasi
This is the presentation accompanying my tutorial about deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and an introduction to the four most prominent research directions of DL in recsys as of 2017. Presented during RecSys Summer School 2017 in Bolzano, Italy.
With the explosive growth of online information, recommender systems have become an effective tool to overcome information overload and promote sales. In recent years, deep learning's revolutionary advances in speech recognition, image analysis and natural language processing have gained significant attention. Meanwhile, recent studies also demonstrate its efficacy in coping with information retrieval and recommendation tasks. Applying deep learning techniques to recommender systems has been gaining momentum due to its state-of-the-art performance. In this talk, I will present recent developments in deep learning based recommender models and highlight some future challenges and open issues in this research field.
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16 - MLconf
Multi-algorithm Ensemble Learning at Scale: Software, Hardware and Algorithmic Approaches: Multi-algorithm ensemble machine learning methods are often used when the true prediction function is not easily approximated by a single algorithm. The Super Learner algorithm, also known as stacking, combines multiple, typically diverse, base learning algorithms into a single, powerful prediction function through a secondary learning process called metalearning. Although ensemble methods offer superior performance over their singleton counterparts, there is an implicit computational cost to ensembles, as they require training and cross-validating multiple base learning algorithms.
We will demonstrate a variety of software- and hardware-based approaches that lead to more scalable ensemble learning software, including a highly scalable implementation of stacking called “H2O Ensemble”, built on top of the open source, distributed machine learning platform, H2O. H2O Ensemble scales across multi-node clusters and allows the user to create ensembles of deep neural networks, Gradient Boosting Machines, Random Forest, and others. As for algorithm-based approaches, we will present two algorithmic modifications to the original stacking algorithm that further reduce computation time — Subsemble algorithm and the Online Super Learner algorithm. This talk will also include benchmarks of the implementations of these new stacking variants.
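The stacking procedure described above can be sketched in a few lines of NumPy. This is a toy Super-Learner-style example, not H2O Ensemble: the two base learners (a linear and a quadratic least-squares fit), the synthetic data, and the 5-fold split are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(200)   # quadratic target

# Two base learners, each expressed as a feature map fit by least squares.
maps = [lambda X: np.c_[np.ones(len(X)), X],           # linear model
        lambda X: np.c_[np.ones(len(X)), X, X ** 2]]   # quadratic model

def fit(fm, X, y):
    return np.linalg.lstsq(fm(X), y, rcond=None)[0]

# Level-one data: cross-validated predictions from each base learner.
k = 5
Z = np.zeros((len(X), len(maps)))
for f in range(k):
    test = np.arange(len(X)) % k == f
    for j, fm in enumerate(maps):
        w = fit(fm, X[~test], y[~test])
        Z[test, j] = fm(X[test]) @ w

# Metalearner: least-squares weights over the base learners' predictions.
meta = np.linalg.lstsq(Z, y, rcond=None)[0]

# Refit base learners on all data; the ensemble combines their outputs.
ws = [fit(fm, X, y) for fm in maps]
def predict(Xnew):
    preds = np.column_stack([fm(Xnew) @ w for fm, w in zip(maps, ws)])
    return preds @ meta
```

Because the metalearner is trained on held-out predictions rather than in-sample fits, it learns to down-weight base learners that merely memorize the training data.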
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017 - MLconf
Corinna Cortes is a Danish computer scientist known for her contributions to machine learning. She is currently the Head of Google Research, New York. Cortes is a recipient of the Paris Kanellakis Theory and Practice Award for her work on theoretical foundations of support vector machines.
Cortes received her M.S. degree in physics from Copenhagen University in 1989. In the same year she joined AT&T Bell Labs as a researcher and remained there for about ten years. She received her Ph.D. in computer science from the University of Rochester in 1993. Cortes currently serves as the Head of Google Research, New York. She is an Editorial Board member of the journal Machine Learning.
Cortes’ research covers a wide range of topics in machine learning, including support vector machines and data mining. In 2008, she jointly with Vladimir Vapnik received the Paris Kanellakis Theory and Practice Award for the development of a highly effective algorithm for supervised learning known as support vector machines (SVM). Today, SVM is one of the most frequently used algorithms in machine learning, applied in many practical settings including medical diagnosis and weather forecasting.
Abstract Summary:
Harnessing Neural Networks:
Deep learning has demonstrated impressive performance gain in many machine learning applications. However, unveiling and realizing these performance gains is not always straightforward. Discovering the right network architecture is critical for accuracy and often requires a human in the loop. Some network architectures occasionally produce spurious outputs, and the outputs have to be restricted to meet the needs of an application. Finally, realizing the performance gain in a production system can be difficult because of extensive inference times.
In this talk we discuss methods for making neural networks efficient in production systems. We also discuss an efficient method for automatically learning the network architecture, called AdaNet. We provide theoretical arguments for the algorithm and present experimental evidence for its effectiveness.
Melanie Warrick, Deep Learning Engineer, Skymind.io at MLconf SF - 11/13/15 - MLconf
Attention Neural Net Model Fundamentals: Neural networks have regained popularity over the last decade because they are demonstrating real-world value in different applications (e.g. targeted advertising, recommender engines, Siri, self-driving cars, facial recognition). Several model types are currently explored in the field, with recurrent neural networks (RNN) and convolutional neural networks (CNN) taking the top focus. The attention model, a recently developed RNN variant, has started to play a larger role in both natural language processing and image analysis research.
This talk will cover the fundamentals of the attention model structure and how it's applied to visual and speech analysis. I will provide an overview of the model's functionality and math, including a high-level differentiation between soft and hard attention. The goal is to give you enough of an understanding of what the model is, how it works, and where to apply it.
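The soft variant mentioned above reduces to a few lines of linear algebra. This is a minimal, generic sketch (dot-product scoring; names and toy vectors are illustrative, not from the talk):

```python
import numpy as np

def soft_attention(query, keys, values):
    """Soft attention: a differentiable, softmax-weighted average of the
    values, so every position contributes a little. (Hard attention
    instead samples a single position, making training non-differentiable.)"""
    scores = keys @ query                      # alignment score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over positions
    return weights @ values, weights

# Toy example: three positions, each with a 2-d key and a 1-d value.
keys = np.array([[1., 0.], [0., 1.], [1., 1.]])
values = np.array([[10.], [20.], [30.]])
query = np.array([1., 0.])
context, w = soft_attention(query, keys, values)
```

Positions whose keys align with the query (the first and third here) receive the largest weights, and the returned context vector is their weighted blend.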
Entity Resolution is the task of disambiguating manifestations of real world entities through linking and grouping and is often an essential part of the data wrangling process. There are three primary tasks involved in entity resolution: deduplication, record linkage, and canonicalization; each of which serve to improve data quality by reducing irrelevant or repeated data, joining information from disparate records, and providing a single source of information to perform analytics upon. However, due to data quality issues (misspellings or incorrect data), schema variations in different sources, or simply different representations, entity resolution is not a straightforward process and most ER techniques utilize machine learning and other stochastic approaches.
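The deduplication task above can be illustrated with a toy sketch using the standard library's `difflib`. The fixed similarity threshold and greedy first-match clustering are deliberate simplifications; as the text notes, real ER systems learn the match decision with machine learning rather than hard-coding it:

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.85):
    """Crude pairwise match rule based on string similarity.
    The threshold is an illustrative assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def deduplicate(records):
    """Greedy clustering: each record joins the first cluster whose
    representative it matches, otherwise it starts a new cluster."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similar(rec, cluster[0]):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

records = ["Jonh Smith", "John Smith", "J. Smith", "Jane Doe", "Jane  Doe"]
clusters = deduplicate(records)
```

Note how the misspelled "Jonh Smith" still groups with "John Smith", while "J. Smith" falls below the threshold and lands in its own cluster, showing why representation variations make ER harder than exact matching.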
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S... - Daiki Tanaka
paper at ICML 2019; "L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR STRUCTURED DATA"
OpenReview link: https://openreview.net/forum?id=S1E3Ko09F7
Artificial neural networks have been used very successfully in several machine learning applications. They are often the building blocks when building deep learning systems. We discuss the hypothesis, training with backpropagation, update methods, and regularization techniques.
This presentation introduces deep learning (DL) concepts, such as neural networks, backprop, activation functions, and convolutional neural networks, followed by an Angular application that uses TypeScript to replicate the TensorFlow playground.
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16 - MLconf
Teaching K-Means New Tricks: Over 50 years old, the k-means algorithm remains one of the most popular clustering algorithms. In this talk we’ll cover some recent developments, including better initialization, the notion of coresets, clustering at scale, and clustering with outliers.
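The "better initialization" mentioned above usually refers to k-means++ seeding, which can be sketched in a few lines of NumPy. This is an illustrative implementation on made-up blob data, not material from the talk:

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: after a random first center, each new center is
    sampled with probability proportional to its squared distance from
    the nearest already-chosen center, spreading the seeds out."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs of 50 points each.
X = np.vstack([rng.normal(loc, 0.1, size=(50, 2))
               for loc in ([0, 0], [5, 5], [0, 5])])
centers = kmeans_pp_init(X, 3, rng)
```

With well-separated clusters, the distance-weighted sampling almost always picks one seed per blob, which is exactly the failure mode of uniform random initialization that k-means++ addresses.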
Josh Patterson, Principal at Patterson Consulting: Introduction to Parallel Iterative Machine Learning Algorithms on Hadoop's Next-Generation YARN Framework
Daniel Shank, Data Scientist, Talla at MLconf SF 2016 - MLconf
Neural Turing Machines: Perils and Promise: Daniel Shank is a Senior Data Scientist at Talla, a company developing a platform for intelligent information discovery and delivery. His focus is on developing machine learning techniques to handle various business automation tasks, such as scheduling, polls, expert identification, as well as doing work on NLP. Before joining Talla as the company’s first employee in 2015, Daniel worked with TechStars Boston and did consulting work for ThriveHive, a small business focused marketing company in Boston. He studied economics at the University of Chicago.
Recommender Systems from A to Z – Model Evaluation - Crossing Minds
The third meetup will be about evaluating different models for our recommender system. We will review the strategies we have to check if a model is underfitting or overfitting. After that, we will present and analyze the losses that are typically used in recommendation systems to train models. We will compare regression, classification, and rank-based losses and when it's convenient to use each one. Finally, we are going to cover all the metrics that are typically used to evaluate the performance of different recommendation systems and how to test that the models are giving good results in production.
Swift Parallel Scripting for High-Performance Workflow - Daniel S. Katz
The Swift scripting language was created to provide a simple, compact way to write parallel scripts that run many copies of ordinary programs concurrently in various workflow patterns, reducing the need for complex parallel programming or arcane scripting to achieve this common high-level task. The result was a highly portable programming model based on implicitly parallel functional dataflow. The same Swift script runs on multi-core computers, clusters, grids, clouds, and supercomputers, and is thus a useful tool for moving workflow computations from laptop to distributed and/or high performance systems.
Swift has proven to be very general, and is in use in domains ranging from earth systems to bioinformatics to molecular modeling. It's more recently been adapted to serve as a programming model for much finer-grain in-memory workflow on extreme-scale systems, where it can perform task rates in the millions to billions per second.
In this talk, we describe the state of Swift's implementation, present several Swift applications, and discuss ideas for the future evolution of the programming model on which it's based.
1. Big Data Analytics
- Big Data
- Spark: Big Data Analytics
- Resilient Distributed Datasets (RDD)
- Spark libraries (SQL, DataFrames, MLlib for machine learning, GraphX, and Streaming)
- PFP: Parallel FP-Growth
2. Ubiquitous Computing
- Edge Computing
- Cloudlet
- Fog computing
- Internet of Things (IoT)
- Virtualization
- Virtual Conferencing
- Virtual Events (2D, 3D, and Hybrid)
Data streaming fundamentals - EUDAT Summer School (Giuseppe Fiameni, CINECA) - EUDAT
Stream processing refers to the set of techniques and tools used to analyze and act on data collected in real time, such as continuous time series generated by sensors. The analysis of data streams brings two main benefits: obtaining an understanding of the observed environment, including the forces behind and the structure of physical phenomena, and training a model to produce forecasting information. Giuseppe will introduce the main concepts underlying stream processing and time series data management.
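A basic building block of such time-series analysis is an incremental window aggregation. The sketch below (class name and window size are illustrative, standard library only) maintains a sliding-window mean over a sensor stream in O(1) per reading:

```python
from collections import deque

class SlidingWindowMean:
    """Incremental mean over the last `size` readings of a stream.
    Keeping a running total avoids re-summing the window each update."""
    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, x):
        if len(self.window) == self.window.maxlen:
            # deque will evict the oldest reading; remove it from the total
            self.total -= self.window[0]
        self.window.append(x)
        self.total += x
        return self.total / len(self.window)

w = SlidingWindowMean(3)
means = [w.update(x) for x in [1.0, 2.0, 3.0, 4.0]]
# windows over time: [1], [1,2], [1,2,3], [2,3,4]
```

The same pattern generalizes to other window aggregates (min/max, variance) and is what stream frameworks compute under the hood for tumbling and sliding windows.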
Visit: https://www.eudat.eu/eudat-summer-school
This was the deck I used for the Hadoop Meetup talk at Bangalore on 18th of July 2013. The talk was titled "Big-data Analytics: Need to Look Beyond Hadoop?"
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing - Jonathan Dursi
HTML slides and longer abstract can be found at https://github.com/ljdursi/EuroMPI2016.
For years, the academic science and engineering community was almost alone in pursuing very large-scale numerical computing, and MPI was the lingua franca for such work. But starting in the mid-2000s, we were no longer alone. First internet-scale companies like Google and Yahoo! started performing fairly basic analytics tasks at enormous scale, and since then others have begun tackling increasingly complex and data-heavy machine-learning computations, which involve very familiar scientific computing primitives such as linear algebra, unstructured mesh decomposition, and numerical optimization. These new communities have created programming environments which emphasize what we’ve learned about computer science and programmability since 1994 – with greater levels of abstraction and encapsulation, separating high-level computation from the low-level implementation details.
At about the same time, new academic research communities began using computing at scale to attack their problems - but in many cases, an ideal distributed-memory application for them begins to look more like the new concurrent distributed databases than a large CFD simulation, with data structures like dynamic hash tables and Bloom trees playing more important roles than rectangular arrays or unstructured meshes. These new academic communities are among the first to adopt emerging big-data technologies over traditional HPC options; but as big-data technologies improve their tightly-coupled number-crunching capabilities, they are unlikely to be the last.
In this talk, I sketch out the landscape of distributed technical computing frameworks and environments, and look to see where MPI and the MPI community fit into this new ecosystem.
This deck was presented at the Spark meetup at Bangalore. The key idea behind the presentation was to focus on limitations of Hadoop MapReduce and introduce both Hadoop YARN and Spark in this context. An overview of the other aspects of the Berkeley Data Analytics Stack was also provided.
This contains the agenda of the Spark Meetup I organised in Bangalore on Friday, the 23rd of Jan 2014. It carries the slides for the talk I gave on distributed deep learning over Spark
Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
The affect of service quality and online reviews on customer loyalty in the E...
Speeding up Distributed Big Data Recommendation in Spark
1. Algorithmic Acceleration of Parallel ALS for Collaborative Filtering:
“Speeding up Distributed Big Data Recommendation in Spark”
Hans De Sterck¹,², Manda Winlaw², Mike Hynes², Anthony Caterini²
¹ Monash University, School of Mathematical Sciences
² University of Waterloo, Canada, Applied Mathematics
ICPADS 2015, Melbourne, December 2015
2. hans.desterck@monash.edu
ICPADS 2015
a talk on algorithms for parallel big data analytics ...
1. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
2. recommendation – the Netflix prize problem
3. our contribution: an algorithm to speed up ALS for recommendation
4. our contribution: efficient parallel speedup of ALS recommendation in Spark
3. 1. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ my research background:
– scalable scientific computing algorithms (HPC)
– e.g., parallel algebraic multigrid (AMG) for solving linear systems Ax=b
– e.g., on Blue Gene (100,000s of cores), MPI
4. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ more recently: there is a new game of large-scale distributed computing in town!
– Google PageRank (1998) (already 17 years...)
• commodity hardware (fault-tolerant ...)
• compute where the data is (data-locality)
• scalability is essential! (just like in HPC)
• beginning of “Big Data”, “Cloud”, “Data Analytics”, ...
– new Big Data analytics applications are now appearing everywhere!
(figure: web crawl)
5. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ “Data Analytics” has grown its own “eco-system”, “culture”, “software stack” (very different from HPC!)
• MapReduce
• Hadoop
• Spark, ...
• data locality
• “implicit” communication (restricted (vs MPI), “shuffle”)
• not fast (vs HPC), but scalable
• fault-tolerant (replicate data, restart tasks)
(figure from “Spark: In-Memory Cluster Computing for Iterative and Interactive Applications”)
6. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ MapReduce/Hadoop:
– major disadvantage for iterative algorithms: writes everything to disk between iterations!, extremely slow (and: not programmer-friendly)
→ only very simple algorithms are feasible in MapReduce
■ the Spark “revolution”:
– store state between iterations in memory
– more general operations than Hadoop/MapReduce
7. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
8. distributed computing frameworks for Big Data analytics – Spark (vs HPC, MPI, Hadoop, ...)
■ the Spark “revolution”:
– store state between iterations in memory
– more general operations than Hadoop/MapReduce
→ much faster than Hadoop! (but still much slower than MPI)
• data locality
• scalable
• fault-tolerant
• “implicit” communication (restricted (vs MPI), “shuffle”)
sea change (vs Hadoop): more advanced iterative algorithms for Data Analytics/Machine Learning are feasible in Spark
9. 2. recommendation – the Netflix prize problem
■ sparse ratings matrix R (n users × m movies, known entries e.g. 1, 2, 5)
■ k latent features: user factors U, movie factors M
■ similar to SVD, but only match known ratings
■ minimize f = ‖R − UᵀM‖², and UᵀM gives predicted ratings (collaborative filtering)
(figure: n × m ratings matrix R ≈ Uᵀ (n × k) times M (k × m))
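Written out entry-wise (with Ω the set of (i, j) pairs for which a rating r_ij is known, and u_i, m_j the factor columns of U and M), the objective on this slide reads:

```latex
f(U, M) \;=\; \| R - U^T M \|^2_{\Omega}
       \;=\; \sum_{(i,j) \in \Omega} \left( r_{ij} - u_i^T m_j \right)^2
```

Practical implementations (e.g. ALS-WR, or ALS in Spark MLlib) typically add a regularization term λ(Σᵢ‖u_i‖² + Σⱼ‖m_j‖²), omitted here as on the slide.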
10. recommendation – the Netflix prize problem
minimize f = ‖R − UᵀM‖² : alternating least squares (ALS)
■ minimize ‖R − U(0)ᵀ M(0)‖² : freeze U(0), compute M(0) (LS)
■ minimize ‖R − U(1)ᵀ M(0)‖² : freeze M(0), compute U(1) (LS)
■ ... : local least squares problems (parallelizable)
(figure: R ≈ UᵀM, as before)
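The alternating steps above can be sketched in a few lines of plain Python. This is a toy rank-1 (k = 1) illustration on a hypothetical tiny ratings list, not the Spark implementation: with one latent feature per user/movie, each local least-squares problem has a closed-form scalar solution, and only the known ratings enter the objective.

```python
# Toy rank-1 ALS sketch (k = 1), pure Python; illustrates the alternating
# least-squares steps on the slide. Only known ratings (i, j, r) are matched.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0),
           (1, 2, 2.0), (2, 1, 1.0), (2, 2, 5.0)]
n_users, n_movies = 3, 3

u = [1.0] * n_users   # user factors U (one scalar per user since k = 1)
m = [1.0] * n_movies  # movie factors M

def loss():
    # f = sum over known ratings of (r_ij - u_i * m_j)^2
    return sum((r - u[i] * m[j]) ** 2 for i, j, r in ratings)

for _ in range(50):
    # freeze M, solve each user's 1-D least-squares problem in closed form
    for i in range(n_users):
        num = sum(r * m[j] for ii, j, r in ratings if ii == i)
        den = sum(m[j] ** 2 for ii, j, r in ratings if ii == i)
        u[i] = num / den
    # freeze U, solve each movie's 1-D least-squares problem
    for j in range(n_movies):
        num = sum(r * u[i] for i, jj, r in ratings if jj == j)
        den = sum(u[i] ** 2 for i, jj, r in ratings if jj == j)
        m[j] = num / den

print(round(loss(), 3))  # objective after 50 alternating sweeps
```

Each per-user (and per-movie) update is independent of the others, which is what makes the sweep parallelizable; for general k, the scalar divisions become small k × k linear solves.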
11. recommendation – the Netflix prize problem
minimize f = ‖R − UᵀM‖² : alternating least squares (ALS)
■ ALS can converge very slowly (block nonlinear Gauss-Seidel)
(g = grad f = 0)
12. 3. our contribution: an algorithm to speed up ALS for recommendation
min f(U,M) = ‖R − UᵀM‖², or g(U,M) = grad f(U,M) = 0
■ nonlinear conjugate gradient (NCG) optimization algorithm for min f(x):
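The NCG algorithm box did not survive extraction; for reference, one standard NCG iteration for min f(x) (shown here with the Polak–Ribière choice of β) is:

```latex
\begin{aligned}
 p_0 &= -g_0, \qquad g_k = \nabla f(x_k), \\
 x_{k+1} &= x_k + \alpha_k p_k \quad (\alpha_k \text{ from a line search}), \\
 \beta_{k+1} &= \frac{g_{k+1}^T \left( g_{k+1} - g_k \right)}{g_k^T g_k}, \\
 p_{k+1} &= -g_{k+1} + \beta_{k+1} p_k .
\end{aligned}
```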
13. our contribution: an algorithm to speed up ALS for recommendation
min f(x) = ‖R − UᵀM‖², or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
define a preconditioned gradient direction: (De Sterck and Winlaw, 2015)
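A sketch of the construction, following De Sterck and Winlaw (2015): one ALS sweep, viewed as a fixed-point map x ↦ P(x), replaces the gradient by the preconditioned direction

```latex
\bar{g}_k \;=\; x_k - P(x_k),
```

which is then used in place of g_k in the NCG direction update p_{k+1} = −ḡ_{k+1} + β_{k+1} p_k, with a correspondingly modified β formula; see the paper for the exact choice of β.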
14. our contribution: an algorithm to speed up ALS for recommendation
min f(x) = ‖R − UᵀM‖², or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG (NCG accelerates ALS)
15. our contribution: an algorithm to speed up ALS for recommendation
min f(x) = ‖R − UᵀM‖², or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
16. our contribution: an algorithm to speed up ALS for recommendation
min f(x) = ‖R − UᵀM‖², or g(x) = grad f(x) = 0
■ our idea: use ALS as a nonlinear preconditioner for NCG
ALS-NCG is much faster than the widely used ALS!
17. 4. our contribution: efficient parallel speedup of ALS recommendation in Spark
■ Spark “Resilient Distributed Datasets” (RDDs)
– partitioned collection of (key, value) pairs
– can be cached in memory
– built using data flow operators on other RDDs (map, join, group-by-key, reduce-by-key, ...)
– fault-tolerance: rebuild from lineage
– “implicit” communication (shuffling) (≠ MPI)
(figure: an RDD as numbered partitions 0–3 of key → (value1, value2, ...) records)
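The (key, value) data-flow style can be mimicked on a plain Python list to make the operators concrete. This is a stdlib-only toy, not Spark: real RDDs distribute the pairs across partitions, and `reduceByKey` triggers a shuffle between stages.

```python
# Toy illustration of RDD-style (key, value) data-flow operators.
from itertools import groupby

def map_pairs(pairs, f):
    # like rdd.map(f): apply f to every (key, value) record
    return [f(kv) for kv in pairs]

def reduce_by_key(pairs, f):
    # like rdd.reduceByKey(f): combine all values sharing a key
    out = []
    key = lambda kv: kv[0]
    for k, group in groupby(sorted(pairs, key=key), key=key):
        vals = [v for _, v in group]
        acc = vals[0]
        for v in vals[1:]:
            acc = f(acc, v)
        out.append((k, acc))
    return out

# ratings as (movie_id, rating) pairs; count ratings per movie
pairs = [(0, 5.0), (1, 1.0), (0, 4.0), (2, 2.0), (2, 5.0)]
ones = map_pairs(pairs, lambda kv: (kv[0], 1))
counts = reduce_by_key(ones, lambda a, b: a + b)
print(counts)  # -> [(0, 2), (1, 1), (2, 2)]
```

In Spark the same pipeline is `rdd.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)`; the grouping step here stands in for the shuffle.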
18. our contribution: efficient parallel speedup of ALS recommendation in Spark
■ efficient Spark programming: similar challenges as efficient GPU programming with CUDA!
– of course, they have different design objectives (GPU: close to metal, as fast as possible; Spark: scalable, fault-tolerant, data locality...)
– but ... similarities in how one gets good performance:
• Spark, CUDA: it is easy to write code that produces the correct result (but may be very far from achievable speed)
• Spark, CUDA: it is very hard to write efficient code!
– implementation choices that are crucial for performance are most often not explicit in the language
– programmer needs very extensive “under the hood” knowledge to write efficient code
– this is a research topic (also for Spark), moving target
19. our contribution: efficient parallel speedup of ALS recommendation in Spark
■ existing implementation of ALS in Spark (Chris Johnson, Spotify): minimize f = ‖R − UᵀM‖²
– store both R and Rᵀ
– local LS problems: to update user factor i, need all movie factors j that i has rated (shuffle!) (efficient)
(figure: partitions 0–3 of R, Rᵀ, U and M; movie factors j1, j2 shuffled to the partition holding user i)
20. our contribution: efficient parallel speedup of ALS recommendation in Spark
■ our work: efficient parallel implementation of ALS-NCG in Spark: minimize f(x) = ‖R − UᵀM‖²
– store our vectors x and g consistent with ALS RDDs, and employ similar efficient shuffling scheme for gradient
– BLAS vector operations
– line search: f(x + αp) is a polynomial of degree 4 in the step size α: compute its coefficients once in parallel
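Why the line-search objective is a quartic: along a direction p = (dU, dM), each residual r_ij − (u_i + α du_i)ᵀ(m_j + α dm_j) is quadratic in α, so its square is degree 4, and so is the sum f(x + αp). A rank-1 toy check (with made-up factors and an arbitrary direction) that accumulating the five coefficients once reproduces f exactly:

```python
# f(x + alpha*p) for f = sum (r_ij - u_i*m_j)^2 is a degree-4 polynomial in
# alpha; compute its 5 coefficients once, then evaluate the 1-D problem cheaply.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0)]
u, m = [1.0, 0.5], [2.0, 1.0]      # current factors (k = 1, toy values)
du, dm = [0.3, -0.2], [0.1, 0.4]   # an arbitrary search direction

def f(alpha):
    # direct evaluation of the objective along the search direction
    return sum((r - (u[i] + alpha * du[i]) * (m[j] + alpha * dm[j])) ** 2
               for i, j, r in ratings)

# per-rating residual is quadratic in alpha: a + b*alpha + c*alpha^2;
# squaring and summing gives quartic coefficients via convolution
coef = [0.0] * 5
for i, j, r in ratings:
    a = r - u[i] * m[j]
    b = -(u[i] * dm[j] + du[i] * m[j])
    c = -du[i] * dm[j]
    q = [a, b, c]
    for s in range(3):
        for t in range(3):
            coef[s + t] += q[s] * q[t]

def f_poly(alpha):
    return sum(coef[d] * alpha ** d for d in range(5))

print(abs(f(2.7) - f_poly(2.7)) < 1e-9)  # prints True: quartic matches f
```

In the parallel setting, the per-rating contributions to the five coefficients can be summed in one distributed pass, after which the step size is chosen without touching the data again.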
21. our contribution: efficient parallel speedup of ALS recommendation in Spark
■ performance: linear granularity scaling for ALS-NCG as for ALS (no new parallel bottlenecks for the more advanced algorithm)
22. our contribution: efficient parallel speedup of ALS recommendation in Spark
■ performance: ALS-NCG much faster than ALS (20M MovieLens data, 8 nodes/128 cores)
23. our contribution: efficient parallel speedup of ALS recommendation in Spark
■ performance: ALS-NCG speeds up ALS on 16 nodes/256 cores in Spark for 800M ratings by a factor of about 5
(great speedup, in parallel, in Spark, for large problem on 256 cores)
24. some general conclusions ...
■ Spark enables advanced algorithms for Big Data analytics (linear algebra, optimization, machine learning, ...) (lots of work: investigate algorithms, implementations, scalability, ... in Spark)
■ Spark offers a suitable environment for compute-intensive work!
■ slower than MPI/HPC, but data locality, fault-tolerance, situated within Big Data “eco-system” (HDFS data, familiar software stack, ...)
■ will HPC and Big Data hardware/software converge? (also for “exascale” ...), and if so, which aspects of the Spark (and others ...) or MPI/HPC approaches will prevail?