Title: Factorization Machines
Abstract:
Developing an accurate recommender system for a specific problem setting is typically a complicated and time-consuming task: models have to be defined, learning algorithms derived, and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to new problems through feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD, and MCMC inference with automatic hyperparameter selection. I will show on several tasks, including the Netflix Prize and KDD Cup 2012, that FMs are flexible and achieve highly competitive accuracy. With FMs, these results can be obtained with simple data preprocessing and without any tuning of regularization parameters or learning rates.
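As an illustrative sketch (function and variable names are my own, not from the talk), the second-order FM model y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j can be evaluated in O(k*n) time using Rendle's reformulation of the pairwise term:

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction:
    y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.
    V[i] is the k-dimensional factor vector of feature i."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    # O(k*n) trick:
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise
```

Because the pairwise term is linear in each single parameter, the same reformulation also yields cheap gradients for SGD, ALS/CD, and MCMC.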
Matrix Factorization in Recommender Systems (Yong Zheng)
The document discusses matrix factorization techniques for recommender systems. It begins with an overview of recommender systems and their use of matrix factorization for dimensionality reduction. Principal component analysis and singular value decomposition are described as early linear algebra techniques used for this purpose. The document then focuses on how these techniques evolved into basic and extended matrix factorization methods in recommender systems, using the Netflix Prize competition as an example.
Tutorial: Context in Recommender Systems (Yong Zheng)
This document provides an overview of a tutorial on context-aware recommender systems. The tutorial will cover traditional recommendation techniques, context-aware recommendation which incorporates additional contextual information such as time and location, and context suggestion. It includes an agenda with topics, background information on recommender systems and evaluation metrics, and descriptions of techniques for context-aware recommendation including context filtering and modeling.
What really are recommendation engines nowadays?
This presentation introduces the foundations of recommendation algorithms and covers common approaches as well as some of the most advanced techniques. Although more focused on efficiency than on theoretical properties, it uses basics of matrix algebra and optimization-based machine learning throughout.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Squares (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Challenges
4.2 Solutions
4.3 Tools
1. Deep learning techniques such as convolutional neural networks, recurrent neural networks, and autoencoders can be applied to recommender systems.
2. Convolutional neural networks are commonly used to extract features from images, audio, and video that can then be used for recommendation. Recurrent neural networks can model user sessions as sequences of clicks.
3. Autoencoders learn lower-dimensional representations of items that capture similarities and can be used to make recommendations, especially for cold start problems where little is known about new users or items.
Matrix Factorization Techniques for Recommender Systems (Lei Guo)
The document discusses matrix factorization techniques for recommender systems. It begins by describing common recommender system strategies like content-based and collaborative filtering approaches. It then introduces matrix factorization methods, which characterize both users and items by vectors of latent factors inferred from rating patterns. The basic matrix factorization model approximates user ratings as the inner product of user and item vectors in the joint latent factor space. Learning algorithms like stochastic gradient descent and alternating least squares are used to compute the user and item vectors by minimizing a regularized error function on known ratings.
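The stochastic gradient descent learner described above can be sketched in a few lines; this is a minimal illustration with made-up hyperparameters and function names (not code from the slides), looping over observed (user, item, rating) triples and minimizing the regularized squared error:

```python
import random

def mf_sgd(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    """Learn user factors P and item factors Q by SGD on observed
    (u, i, r) triples, minimizing (r - p_u . q_i)^2 + reg*(|p_u|^2 + |q_i|^2)."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step on p_u
                Q[i][f] += lr * (err * pu - reg * qi)  # gradient step on q_i
    return P, Q
```

Alternating least squares differs only in the inner step: it fixes Q and solves exactly for each p_u (and vice versa) instead of taking a gradient step.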
The document discusses attention models and their applications. Attention models allow a model to focus on specific parts of the input that are important for predicting the output. This is unlike traditional models that use the entire input equally. Three key applications are discussed: (1) Image captioning models that attend to relevant regions of an image when generating each word of the caption, (2) Speech recognition models that attend to different audio fragments when predicting text, and (3) Visual attention models for tasks like saliency detection and fixation prediction that learn to focus on important regions of an image. The document also covers techniques like soft attention, hard attention, and spatial transformer networks.
The document provides an introduction to word embeddings and two related techniques: Word2Vec and Word Movers Distance. Word2Vec is an algorithm that produces word embeddings by training a neural network on a large corpus of text, with the goal of producing dense vector representations of words that encode semantic relationships. Word Movers Distance is a method for calculating the semantic distance between documents based on the embedded word vectors, allowing comparison of documents with different words but similar meanings. The document explains these techniques and provides examples of their applications and properties.
Talk with Yves Raimond at the GPU Tech Conference on March 28, 2018 in San Jose, CA.
Abstract:
In this talk, we will survey how Deep Learning methods can be applied to personalization and recommendations. We will cover why standard Deep Learning approaches don't perform better than typical collaborative filtering techniques. Then we will go over recently published research at the intersection of Deep Learning and recommender systems, looking at how it integrates new types of data, explores new models, or changes the recommendation problem statement. We will also highlight some of the ways that neural networks are used at Netflix and how we can use GPUs to train recommender systems. Finally, we will highlight promising new directions in this space.
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning (BigDataCloud)
The document discusses deep learning for natural language processing. It provides 5 reasons why deep learning is well-suited for NLP tasks: 1) it can automatically learn representations from data rather than relying on human-designed features, 2) it uses distributed representations that address issues with symbolic representations, 3) it can perform unsupervised feature and weight learning on unlabeled data, 4) it learns multiple levels of representation that are useful for multiple tasks, and 5) recent advances in methods like unsupervised pre-training have made deep learning models more effective for NLP. The document outlines some successful applications of deep learning to tasks like language modeling and speech recognition.
GloVe is an unsupervised learning algorithm for obtaining vector representations of words. It combines the advantages of global matrix factorization and local context window models by training only on the nonzero elements of a word-word co-occurrence matrix. The GloVe model represents word meanings as vectors such that the ratio of the probabilities of any two words appearing together is approximated by the ratio of the dot product of their vector representations. Experiments show GloVe outperforms other models on word analogy, similarity and named entity recognition tasks.
Recommender systems are software agents that analyze a user's preferences through transactions and provide personalized recommendations accordingly. There are several recommendation paradigms including non-personalized rules, personalized rules based on user data, and transaction-based collaborative filtering that learns from user interactions. Context-based recommender systems also consider additional information like time, location, or device to provide adaptive recommendations. Common techniques used in recommender systems include content-based filtering that recommends similar items, collaborative filtering that finds users with similar tastes, and demographic-based recommendations.
BPR: Bayesian Personalized Ranking from Implicit Feedback (Park JunPyo)
This document discusses recommendation systems and the Bayesian Personalized Ranking (BPR) framework. It introduces the goal of recommendation systems to increase product sales through relevance, novelty, serendipity and diversity. It also discusses different recommendation approaches, including collaborative filtering, content-based filtering and knowledge-based filtering. A key part of the document is describing the BPR framework, which uses a Bayesian approach to learn a personalized ranking model from implicit feedback data. It formalizes the recommendation problem as optimizing a posterior distribution over the preferences of users through matrix factorization.
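A rough sketch of a single BPR optimization step, assuming a matrix factorization scorer x_ui = <p_u, q_i> and a sampled triple (u, i, j) where user u interacted with item i but not with item j; names and hyperparameters here are illustrative, not from the document:

```python
import math

def bpr_step(P, Q, u, i, j, lr=0.05, reg=0.01):
    """One SGD step of BPR: push the score x_ui above x_uj by gradient
    ascent on ln sigmoid(x_ui - x_uj), with L2 regularization."""
    x_uij = sum(pu * (qi - qj) for pu, qi, qj in zip(P[u], Q[i], Q[j]))
    # d ln sigmoid(x) / dx = 1 - sigmoid(x) = 1 / (1 + e^x)
    g = 1.0 / (1.0 + math.exp(x_uij))
    for f in range(len(P[u])):
        pu, qi, qj = P[u][f], Q[i][f], Q[j][f]
        P[u][f] += lr * (g * (qi - qj) - reg * pu)
        Q[i][f] += lr * (g * pu - reg * qi)   # observed item moves up
        Q[j][f] += lr * (-g * pu - reg * qj)  # unobserved item moves down
```

The key point of BPR is visible in the update: it optimizes the *relative order* of item pairs rather than absolute rating values, which matches the implicit-feedback setting.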
This Edureka Recurrent Neural Networks tutorial will help you understand why we need Recurrent Neural Networks (RNNs) and what exactly they are. It also explains a few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use case of an LSTM predicting the next word in a sample short story.
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
Deep generative models can generate synthetic images, speech, text and other data types. There are three popular types: autoregressive models which generate data step-by-step; variational autoencoders which learn the distribution of latent variables to generate data; and generative adversarial networks which train a generator and discriminator in an adversarial game to generate high quality samples. Generative models have applications in image generation, translation between domains, and simulation.
This document provides an overview of Word2Vec, a neural network model for learning word embeddings developed by researchers led by Tomas Mikolov at Google in 2013. It describes the goal of reconstructing word contexts, different word embedding techniques like one-hot vectors, and the two main Word2Vec models - Continuous Bag of Words (CBOW) and Skip-Gram. These models map words to vectors in a neural network and are trained to predict words from contexts or predict contexts from words. The document also discusses Word2Vec parameters, implementations, and other applications that build upon its approach to word embeddings.
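The Skip-Gram training data mentioned above is easy to illustrate: each word predicts the words within a fixed window around it. A minimal sketch of the pair-generation step (function name is mine):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the Skip-Gram
    model: each word is paired with every word within `window`
    positions of it."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

CBOW simply reverses the direction: the context words jointly predict the center word. In both cases the learned input weights become the word embeddings.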
This document provides an overview of representation learning techniques for natural language processing (NLP). It begins with introducing the speakers and objectives of the workshop, which is to provide a deep dive into state-of-the-art text representation techniques and how to apply them to solve NLP problems. The workshop covers four modules: 1) archaic techniques, 2) word vectors, 3) sentence/paragraph/document vectors, and 4) character vectors. It emphasizes that representation learning is key to NLP as it transforms raw text into a numeric form that machine learning models can understand.
The document discusses the BERT model for natural language processing. It begins with an introduction to BERT and how it achieved state-of-the-art results on 11 NLP tasks in 2018. The document then covers related work on language representation models including ELMo and GPT. It describes the key aspects of the BERT model, including its bidirectional Transformer architecture, pre-training using masked language modeling and next sentence prediction, and fine-tuning for downstream tasks. Experimental results are presented showing BERT outperforming previous models on the GLUE benchmark, SQuAD 1.1, SQuAD 2.0, and SWAG. Ablation studies examine the importance of the pre-training tasks and the effect of model size.
This document proposes a calibrated recommendations approach that aims to provide recommendations that reflect all of a user's interests in correct proportions. Standard recommender systems trained for accuracy can lead to unbalanced recommendations that amplify a user's main interests and crowd out lesser interests. The calibrated recommendations approach uses a post-processing re-ranking step to optimize a submodular calibration metric, balancing accuracy and fairness by recommending items from all a user's interests in their correct proportions. Experiments on MovieLens data show that calibration can be improved significantly without degrading accuracy much.
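The calibration metric can be illustrated with a KL divergence between the genre distribution p of a user's history and the distribution q of the recommended list, with q smoothed toward p so the divergence stays finite, as in Steck's formulation. This sketch assumes genres given as plain strings (names are mine):

```python
import math

def calibration_kl(history_genres, rec_genres, alpha=0.01):
    """KL(p || q~) between the genre distribution p of a user's history
    and the genre distribution q of the recommendation list, with
    q~ = (1 - alpha) * q + alpha * p to avoid division by zero.
    Zero means perfectly calibrated; larger means more skewed."""
    def dist(items):
        return {g: items.count(g) / len(items) for g in set(items)}
    p, q = dist(history_genres), dist(rec_genres)
    kl = 0.0
    for g, pg in p.items():
        qg = (1 - alpha) * q.get(g, 0.0) + alpha * pg
        kl += pg * math.log(pg / qg)
    return kl
```

The re-ranking step then greedily builds the recommendation list to trade off accuracy against this divergence, which is where the submodularity argument comes in.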
Deep Learning in Recommender Systems - RecSys Summer School 2017 (Balázs Hidasi)
This is the presentation accompanying my tutorial about deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and the introduction of the four most prominent research direction of DL in recsys as of 2017. Presented during RecSys Summer School 2017 in Bolzano, Italy.
Understanding how high powered ML models arrive at their predictions is an important aspect of Machine Learning, and SHAP is a powerful tool that enables practitioners to understand how different features combine to help a model arrive at a prediction.
This slidedeck is from a presentation given at pydata global on the theoretical foundations of SHAP as well as how to use its library. Link to the presentation can be found here: https://pydata.org/global2021/schedule/presentation/3/behind-the-black-box-how-to-understand-any-ml-model-using-shap/
This document discusses machine learning algorithms for ranking problems. It introduces supervised learning to rank methods including pointwise, pairwise and listwise approaches. Pointwise methods predict relevance scores independently but don't consider order. Pairwise approaches consider relative order but have high computational costs. Listwise methods aim to optimize entire orderings but have complexity issues. Practical challenges include defining objective metrics, generating training labels, and handling new items with limited data. Semi-supervised learning and matrix factorization can help address labeling problems.
Matrix factorization techniques can be used to address some of the limitations of traditional collaborative filtering approaches for recommender systems. Matrix factorization decomposes the user-item rating matrix into the product of two lower-dimensional matrices, one representing latent factors for users and the other for items. This reduced dimensionality addresses data sparsity and scalability issues. Specifically, singular value decomposition is often used to perform this matrix factorization, which can approximate the original rating matrix while ignoring less important singular values and factor vectors. The decomposed matrices can then be multiplied to predict unknown user ratings.
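The idea of keeping only the dominant singular factors can be sketched with plain power iteration, which extracts the leading singular triple of a small dense matrix; a truncated SVD keeps the top k such triples and drops the rest. This is an illustrative toy (function name mine), not a production SVD:

```python
def top_singular_triple(A, iters=100):
    """Power iteration for the leading singular triple of a small dense
    matrix A (list of rows), so that A is approximated by s * u v^T."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        # v <- normalize(A^T A v)
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    s = sum(x * x for x in Av) ** 0.5  # leading singular value
    u = [x / s for x in Av]            # leading left singular vector
    return u, s, v
```

In the recommender setting, the rows of the truncated U (scaled by the singular values) act as user latent vectors and the rows of V as item latent vectors, and their products give the predicted ratings.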
This document introduces Factorization Machines, a general model that can mimic many successful factorization models. Factorization Machines allow feature vectors to be easily input and enjoy benefits of factorizing interactions between variables. The model has properties like expressiveness, multi-linearity, and scalable complexity. It relates to models like matrix factorization, tensor factorization, SVD++, and nearest neighbor models. Experiments show Factorization Machines outperform other models on rating prediction, context-aware recommendation, and tag recommendation tasks.
Recommender Systems (Machine Learning Summer School 2014 @ CMU) (Xavier Amatriain)
The document summarizes a presentation on recommender systems given by Xavier Amatriain. It begins with introductions to recommender systems and collaborative filtering. Traditional collaborative filtering approaches include user-based and item-based methods. User-based CF finds similar users to a target user and recommends items they liked. Item-based CF finds similar items to those a target user liked and predicts ratings. Both approaches address sparsity and scalability challenges with dimensionality reduction techniques.
Building a Recommendation Engine - An example of a product recommendation engine (NYC Predictive Analytics)
This document provides an example of building a predictive model for product recommendations. It outlines using a k-Nearest Neighbor (kNN) algorithm and singular value decomposition (SVD) for dimensionality reduction on order history and cart event data to create a recommendation engine. It discusses selecting features from the data, normalizing the data, using kNN to find similar items, and reducing the data dimensions with SVD before applying kNN. It also introduces using a synthetic dataset to test and tune the model and compares different experimental setups like random, kNN, and SVD+kNN recommendations. The goal is to increase business metrics like revenue, conversion rate, and average order value through effective product recommendations.
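The kNN step described above amounts to ranking items by similarity to a user's profile vector. A minimal cosine-similarity sketch, assuming dense feature vectors (the names and toy data are mine, not from the document):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def knn_recommend(user_vec, item_vecs, k=2):
    """Return the ids of the k items most similar to the user's
    profile vector, by cosine similarity."""
    ranked = sorted(item_vecs, key=lambda iid: cosine(user_vec, item_vecs[iid]),
                    reverse=True)
    return ranked[:k]
```

The SVD+kNN variant in the document applies the same ranking, but after projecting both user and item vectors into the reduced latent space, which dampens noise and sparsity.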
Recommender system algorithm and architecture - Liang Xiang
1) The document discusses recommender system algorithms and architecture. It covers common recommendation techniques like collaborative filtering, content-based filtering, and graph-based recommendations.
2) It also discusses challenges like cold starts for new users and items. For new users, it recommends using demographic data or initial feedback to understand interests. For new items, it suggests using content information or initial user feedback.
3) The document proposes a feature-based recommendation framework that connects users, items, and latent features to address challenges like heterogeneous data and cold starts. This framework provides explanations but does not support user-based methods.
Ted Dunning, Chief Application Architect, MapR at MLconf SF - MLconf
The document discusses techniques for generating recommendations based on item co-occurrence analysis. It describes how to build a user-item history matrix from log files and transform it into an item-item co-occurrence matrix. It discusses using anomalous co-occurrences as indicators to make recommendations and scaling the analysis using interaction cuts and frequency limits. It also describes how to update the co-occurrence matrix incrementally in real-time to enable online recommendations.
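The core transformation - user-item history to item-item co-occurrence - is a single matrix product. A toy sketch (the matrix H and its values are hypothetical; as the talk describes, production systems additionally score co-occurrences, e.g. with a log-likelihood-ratio test, to keep only the anomalous ones):

```python
import numpy as np

# H[u, i] = 1 if user u interacted with item i (built from log files)
H = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])

# C[i, j] = number of users who interacted with both item i and item j
C = H.T @ H
np.fill_diagonal(C, 0)   # drop trivial self co-occurrence
```

Incremental updates follow directly: appending a new user row to H adds the outer product of that row to C, which is what makes the real-time variant feasible.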
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF - MLconf
Abstract: How graphs became just another big data primitive
Graph-shaped data is used in product recommendation systems, social network analysis, network threat detection, image de-noising, and many other important applications. And, a growing number of these applications will benefit from parallel distributed processing for graph feature engineering, model training, and model serving. But today’s graph tools are riddled with limitations and shortcomings, such as a lack of language bindings, streaming support, and seamless integration with other popular data services. In this talk, we’ll argue that the key to doing more with graphs is doing less with specialized systems and more with systems already good at handling data of other shapes. We’ll examine some practical data science workflows to further motivate this argument and we’ll talk about some of the things that Intel is doing with the open source community and industry to make graphs just another big data primitive.
Scott Clark, Software Engineer, Yelp at MLconf SF - MLconf
Abstract: Introducing the Metric Optimization Engine (MOE); an open source, black box, Bayesian Global Optimization engine for optimal experimental design.
In this talk we will introduce MOE, the Metric Optimization Engine. MOE is an efficient way to optimize a system’s parameters, when evaluating parameters is time-consuming or expensive. It can be used to help tackle a myriad of problems including optimizing a system’s click-through or conversion rate via A/B testing, tuning parameters of a machine learning prediction method or expensive batch job, designing an engineering system or finding the optimal parameters of a real-world experiment.
MOE is ideal for problems in which the optimization problem’s objective function is a black box, not necessarily convex or concave, derivatives are unavailable, and we seek a global optimum, rather than just a local one. This ability to handle black-box objective functions allows us to use MOE to optimize nearly any system, without requiring any internal knowledge or access. To use MOE, we simply need to specify some objective function, some set of parameters, and any historical data we may have from previous evaluations of the objective function. MOE then finds the set of parameters that maximize (or minimize) the objective function, while evaluating the objective function as few times as possible. This is done internally using Bayesian Global Optimization on a Gaussian Process model of the underlying system and finding the points of highest Expected Improvement to sample next. MOE provides easy to use Python, C++, CUDA and REST interfaces to accomplish these goals and is fully open source. We will present the motivation and background, discuss the implementation and give real-world examples.
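The sampling criterion described - Expected Improvement under a Gaussian Process posterior - can be sketched in a few lines. This is not MOE's implementation, just an illustrative 1-D version with an RBF kernel and hypothetical function names:

```python
import numpy as np
from math import erf, exp, pi, sqrt

def _phi(z):   # standard normal pdf
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def _Phi(z):   # standard normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

def expected_improvement(x_cand, x_obs, y_obs, noise=1e-6):
    """EI (for maximization) at 1-D candidates under a GP with RBF kernel."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
    K = k(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = k(x_cand, x_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)                       # posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    sd = np.sqrt(np.maximum(var, 1e-12))                      # posterior std dev
    best = y_obs.max()
    z = (mu - best) / sd
    # EI = sd * (z * Phi(z) + phi(z)); large where the model is uncertain
    # or predicts improvement over the incumbent best
    return np.array([s * (zi * _Phi(zi) + _phi(zi)) for zi, s in zip(z, sd)])
```

The optimizer then evaluates the black-box objective at the candidate with the highest EI, refits the GP, and repeats, which is how the loop keeps the number of expensive evaluations small.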
Lise Getoor, Professor, Computer Science, UC Santa Cruz at MLconf SF - MLconf
Abstract:
One of the challenges in big data analytics lies in being able to reason collectively about extremely large, heterogeneous, incomplete, noisy interlinked data. We need data science techniques which can represent and reason effectively with this form of rich and multi-relational graph data. In this presentation, I will describe some common collective inference patterns needed for graph data including: collective classification (predicting missing labels for nodes in a network), link prediction (predicting potential edges), and entity resolution (determining when two nodes refer to the same underlying entity). I will describe three key capabilities required: relational feature construction, collective inference, and scaling. Finally, I will briefly describe some of the cutting edge analytic tools being developed within the machine learning, AI, and database communities to address these challenges.
Introduction to the Factorization Machines model with an example. Motivations - why you should have it in your toolbox, the model and its expressiveness, a use case for context-aware recommendations, and Field-Aware Factorization Machines.
Quoc Le, Software Engineer, Google at MLconf SF - MLconf
Title: Deep Learning for Language Understanding
Abstract:
Many current language understanding algorithms rely on expert knowledge to engineer models and features. In this talk, I will discuss how to use Deep Learning to understand texts without much prior knowledge. In particular, our algorithms will learn the vector representations of words. These vector representations can be used to solve word analogy or translate unknown words between languages. Our algorithms also learn vector representations of sentences and documents. These vector representations preserve the semantics of sentences and documents and therefore can be used for machine translation, text classification, information retrieval and sentiment analysis.
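The word-analogy use of vector representations works by simple vector arithmetic: the vector b - a + c should land near the answer. A toy sketch with made-up embeddings (real vectors would come from a trained model such as word2vec; the values here are invented for illustration):

```python
import numpy as np

# Toy embeddings; in practice these come from a trained model.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by nearest cosine neighbor of b - a + c."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -1.0
    for w, v in emb.items():
        if w in (a, b, c):            # exclude the query words themselves
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best_sim, best = sim, w
    return best
```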
MLconf - Distributed Deep Learning for Classification and Regression Problems... - Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Ameet Talwalkar, assistant professor of Computer Science, UCLA at MLconf SF - MLconf
Abstract:
Apache Spark’s MLlib is a terrific library for fitting large-scale machine learning models. However, translating high-level problem statements like “learn a classifier” into a working model presently requires significant manual effort (via ad hoc parameter tuning) and computational resources (to fit several models). We present our work on the MLbase optimizer – a system designed on top of Spark to quickly and automatically search through a hyperparameter space and find a good model. By leveraging performance enhancements, better search algorithms, and statistical heuristics, our system offers an order of magnitude speedup over standard methods.
Evan Estola – Data Scientist, Meetup.com at MLconf ATL - MLconf
Beyond Collaborative Filtering: using Machine Learning to power recommendations at Meetup
Collaborative filtering and other common recommendation algorithms are a powerful technique for some scenarios. I will cover how to design a recommendation system from the ground up using an ensemble classifier and supervised learning to avoid some of the pitfalls of collaborative filtering. From sampling to deployment, we’ve had to invent our approach with few non-academic and non-toy examples to follow. At Meetup we’re all about sharing information and empowering communities, so I’ll present the details of our model as well as some of the new features we are still developing.
To download slides:
http://www.intelligentmining.com/category/knowledge-base/
These are my notes for a presentation I did internally at IM. It covers both the multinomial and multi-variate Bernoulli event models in Naive Bayes text classification.
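A minimal sketch of the multinomial event model with Laplace smoothing (my own illustrative code, not the slides'; the multi-variate Bernoulli variant would instead model each vocabulary word as a per-document binary indicator):

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, alpha=1.0):
    """docs: list of (label, tokens). Returns log-priors and log-likelihoods."""
    vocab = {t for _, toks in docs for t in toks}
    class_docs = Counter(lbl for lbl, _ in docs)
    token_counts = defaultdict(Counter)
    for lbl, toks in docs:
        token_counts[lbl].update(toks)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        total = sum(token_counts[c].values())
        # Laplace (add-alpha) smoothing over the vocabulary
        log_lik[c] = {t: math.log((token_counts[c][t] + alpha) /
                                  (total + alpha * len(vocab))) for t in vocab}
    return log_prior, log_lik

def classify(tokens, log_prior, log_lik):
    # Multinomial event model: every token occurrence contributes a term;
    # tokens outside the training vocabulary are simply ignored here.
    return max(log_prior, key=lambda c: log_prior[c] +
               sum(log_lik[c].get(t, 0.0) for t in tokens))
```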
Steffen Rendle, Research Scientist, Google at MLconf SF - MLconf
Abstract:
Developing accurate recommender systems for a specific problem setting seems to be a complicated and time-consuming task: models have to be defined, learning algorithms derived and implementations written. In this talk, I present the factorization machine (FM) model, a generic factorization approach that can be adapted to new problems by feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD and MCMC inference, including automatic hyperparameter selection. I will show on several tasks, including the Netflix prize and KDDCup 2012, that FMs are flexible and achieve highly competitive accuracy. With FMs these results can be achieved by simple data preprocessing and without any tuning of regularization parameters or learning rates.
This document summarizes RecSys 2016, a conference on recommender systems held in Boston from September 15-19, 2016. It describes the keynote speakers, sessions on deep learning and algorithms, workshops on topics like cold-starting recommendations, and lessons learned from building real-life recommender systems. The document highlights sessions on embedding techniques, context-driven recommender systems, and using navigation data to improve recommendations in real-time.
Ben Hamner, Co-founder and CTO, Kaggle at MLconf SF - 11/13/15 - MLconf
Lessons learned from Running Hundreds of Kaggle Competitions: At Kaggle, we've run hundreds of machine learning competitions and seen over 80,000 data scientists make submissions. One thing is clear: winning competitions isn't random. We've learned that certain tools and methodologies work consistently well on different types of problems. Many participants make common mistakes (such as overfitting) that should be actively avoided. Similarly, competition hosts have their own set of pitfalls (such as data leakage).
In this talk, I'll share what goes into a winning competition toolkit along with some war stories on what to avoid. Additionally, I’ll share what we’re seeing on the collaborative side of competitions. Our community is showing an increasing amount of collaboration in developing machine learning models and analytic solutions. I'll showcase examples of this and discuss how these types of collaboration will improve how data science is learned and applied.
To download please go to: http://www.intelligentmining.com/category/knowledge-base/
Slides as presented by Alex Lin to the NYC Predictive Analytics Meetup group: http://www.meetup.com/NYC-Predictive-Analytics/ on Dec. 10, 2009.
A product family with a common platform paradigm can increase the flexibility and responsiveness of the product-manufacturing process and help take away market share from competitors that develop one product at a time. The recently developed Comprehensive Product Platform Planning (CP3) method allows (i) the formation of sub-families of products, and (ii) the simultaneous identification and quantification of platform/scaling design variables. The CP3 model is founded on a generalized commonality matrix representation of the product-platform-plan. In this paper, a new commonality index is developed and introduced in CP3 to simultaneously account for the degree of inter-product commonalities and for the overlap between groups of products sharing different platform variables. To maximize both the performance of the product family and the new commonality measure, we develop and apply an advanced mixed-discrete Particle Swarm Optimization (MDPSO) algorithm. In the MDPSO algorithm, the discrete variables are updated using a deterministic nearest-feasible-vertex criterion after each iteration of the conventional PSO. Such an approach is expected to avoid the undesirable discrepancy in the rate of evolution of discrete and continuous variables. To prevent a premature stagnation of solutions (likely in conventional PSO), while solving the high dimensional MINLP problem presented by CP3, we introduce a new adaptive diversity-preservation technique. This technique first characterizes the population diversity and then applies a stochastic update of the discrete variables based on the estimated diversity measure. The potential of the new CP3 optimization methodology is illustrated through its application to design a family of universal electric motors. The optimized platform plans provide helpful insights into the importance of accounting for the overlap between different product platforms, when quantifying the effective commonality in the product family.
Paper presented at the 6th International Work-Conference on Ambient Assisted Living.
Abstract: Due to the increasing demand for multi-camera setups and long-term monitoring in vision applications, real-time multi-view action recognition has gained great interest in recent years. In this paper, we propose a multiple kernel learning based fusion framework that employs a motion-based person detector for finding regions of interest and local descriptors with bag-of-words quantisation for feature representation. The experimental results on a multi-view action dataset suggest that the proposed framework significantly outperforms simple fusion techniques and state-of-the-art methods.
Development of a family of products that satisfies different sectors of the market introduces significant challenges to today’s manufacturing industries – from development time to aftermarket services. A product family with a common platform paradigm offers a powerful solution to these daunting challenges. The Comprehensive Product Platform Planning (CP3) framework formulates a flexible product family model that (i) seeks to eliminate traditional boundaries between modular and scalable families, (ii) allows the formation of sub-families of products, and (iii) yields the optimal depth and number of platforms. In this paper, the CP3 framework introduces a solution strategy that obviates common assumptions, namely (i) that the identification of platform/non-platform design variables and the determination of variable values are separate processes, and (ii) that the cost reduction of creating product platforms is independent of the total number of each product manufactured. A new Cost Decay Function (CDF) is developed to approximate the reduction in cost with increasing commonalities among products, for a specified capacity of production. The Mixed Integer Non-Linear Programming (MINLP) problem, presented by the CP3 model, is solved using a novel Platform Segregating Mapping Function (PSMF). The proposed CP3 framework is implemented on a family of universal electric motors.
The document discusses applications of machine learning for robot navigation and control. It describes how surrogate models can be used for predictive modeling in engineering applications like aircraft design. Dimension reduction techniques are used to reduce high-dimensional design parameters to a lower-dimensional space for faster surrogate model evaluation. For robot navigation, regression models on image manifolds are used for visual localization by mapping images to robot positions. Manifold learning is also applied to find low-dimensional representations of valid human hand poses from images to enable easier robot control.
This document summarizes Kanika Anand's master's thesis which examines global optimization of noisy computer simulators using surrogate models. The thesis compares two improvement functions - one proposed by Picheny et al. and one by Ranjan - for choosing new points to minimize a simulator output observed with noise. Gaussian process and Bayesian additive regression tree models are used as surrogates. Four test functions acting as simulators are optimized using either a one-shot design or genetic algorithm to find new points. Results show how well the surrogate minimum matches the true minimum and distance between the two minimizers under different settings.
Importance sampling has been widely used to improve the efficiency of deterministic computer simulations where the simulation output is uniquely determined, given a fixed input. To represent complex system behavior more realistically, however, stochastic computer models are gaining popularity. Unlike deterministic computer simulations, stochastic simulations produce different outputs even at the same input. This extra degree of stochasticity presents a challenge for reliability assessment in engineering system designs. Our study tackles this challenge by providing a computationally efficient method to estimate a system's reliability. Specifically, we derive the optimal importance sampling density and allocation procedure that minimize the variance of a reliability estimator. The application of our method to a computationally intensive, aeroelastic wind turbine simulator demonstrates the benefits of the proposed approaches.
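The idea of importance sampling for reliability - sample from a density shifted toward the failure region and reweight by the likelihood ratio - can be illustrated on a 1-D toy problem (the shift, densities, and function names here are my illustrative choices, not the paper's derived optimal ones):

```python
import math
import random

def failure_prob_is(g, threshold, n=100_000, shift=3.0, seed=0):
    """Estimate P(g(X) > threshold) for X ~ N(0,1) by importance sampling:
    draw X ~ N(shift, 1) and reweight each failure by p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if g(x) > threshold:
            # likelihood ratio of N(0,1) to N(shift,1) at x
            total += math.exp(-0.5 * x * x + 0.5 * (x - shift) ** 2)
    return total / n
```

Shifting the sampling density toward the failure region means roughly half the draws hit the rare event instead of ~0.1% of them, which is why the variance of the estimator drops so sharply.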
This document discusses methods for estimating the output gap and decomposing it into observable components. It provides a unified framework by formulating most output gap estimation methods as linear filters. This allows the output gap estimate to be expressed as a weighted average of observed macroeconomic data over time. The document demonstrates how to decompose an output gap estimate into the contributions made by different data series, like output, inflation, and unemployment. It also shows how to analyze how output gap estimates are revised as new data is incorporated using this linear filter framework. The framework provides insight into which data each method uses and how it weights them to estimate an unobserved output gap.
This document discusses methods for estimating the output gap and decomposing it into observable components. It provides a unified framework by representing most estimation methods as linear filters. This allows the output gap estimate to be expressed as a weighted average of observed macroeconomic data over time. The document demonstrates how to decompose an output gap estimate into the contributions made by different data series, like output, inflation, and unemployment. It also shows how to analyze how estimates are revised as new data is incorporated. Understanding estimates as linear filters provides insight into which data drives the estimate and how sensitive it is to data revisions. The document applies these concepts to specific estimation techniques, including univariate filters, multivariate filters, VAR models, and DSGE models.
The study on mining temporal patterns and related applications in dynamic soc... - Thanh Hieu
The document provides a curriculum vitae for Yi-Cheng Chen that includes basic information, education history, and research interests. It notes that Chen received a B.S. from Yuan Ze University in 2000, an M.S. from National Taiwan University of Science and Technology in 2002, and a Ph.D. from National Chiao Tung University in 2012 under the advisement of Professors Suh-Yin Lee and Wen-Chih Peng. Chen's Ph.D. dissertation focused on time interval-based sequential pattern mining. The CV outlines Chen's current research interests as temporal pattern mining, social network analysis, smart home applications, and cloud computing.
This document discusses the need for and challenges of 3D optical proximity correction (OPC) models. It explains that traditional 2D OPC models have limitations, particularly for post-tone development resists where resist profiles vary in height. A 3D model can better account for effects like vertical acid diffusion during processing. However, calibration of such models is difficult due to limitations in metrology data, which typically only provides critical dimension measurements and not full resist profile information.
This document discusses image analysis using wavelet transformation. It provides an overview of digital image processing and compares Fourier transforms, short-term Fourier transforms, and wavelet transforms. Wavelet transforms provide better time-frequency localization than Fourier transforms. The document demonstrates Haar wavelets and how they can be used to decompose an image into different frequency subbands. It discusses applications of wavelet transforms such as image compression, denoising, and feature extraction. The document includes MATLAB code for performing wavelet decomposition on an image.
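Although the document's code is in MATLAB, the single-level 2-D Haar decomposition it demonstrates is easy to sketch in Python (an averaging-based variant for clarity; the orthonormal Haar transform would scale pairs by 1/sqrt(2) instead of 1/2):

```python
import numpy as np

def haar_2d_level(img):
    """One level of the 2-D Haar transform: returns LL, LH, HL, HH subbands.

    img: 2-D array with even height and width.
    """
    a = img.astype(float)
    # Horizontal pass: averages and differences of adjacent pixel pairs.
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # Vertical pass on both outputs yields the four subbands.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0   # coarse approximation
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0   # horizontal detail
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0   # vertical detail
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```

Recursing on the LL band gives the multi-level decomposition used for compression and denoising, where small detail coefficients can be thresholded away.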
A walk through the intersection between machine learning and mechanistic mode... - JuanPabloCarbajal3
Talk at EURECOM, France.
It overviews regression in several of its forms: regularized, constrained, and mixed. It builds the bridge between machine learning and dynamical models.
The document discusses modeling nonlinear digital integrated circuits (ICs) using system identification techniques. It explores using parametric models with different representations including local linear state-space models. Models are estimated from input-output port measurements and validated. Local linear state-space models provided the best results with good accuracy, a unique solution, and verified local stability, while also allowing efficient simulation. The models were successfully applied to simulate a mobile data link system.
This work was presented at 51st AIAA/SDM conference, Apr 14, 2010 in Orlando. The work presented in this paper was performed in collaboration with Prof. Achille Messac and Dr. Ritesh Khire.
This document discusses probabilistic error bounds for order reduction of smooth nonlinear models. It begins with motivation for using reduced order models (ROM) in computationally intensive applications and the need for error metrics. It then provides background on Dixon's theory for probabilistic error bounds, which has mostly been used for linear models. The document outlines snapshot and gradient-based reduction algorithms to reduce the response and parameter interfaces of a model. It defines different types of errors that can occur from reducing these interfaces and discusses propagating the errors across interfaces using Dixon's theory. Numerical tests and results are briefly mentioned along with conclusions.
Propagation of Error Bounds Across reduction interfaces - Mohammad
This document summarizes the motivation, background, algorithms, and theory behind developing probabilistic error bounds for order reduction of smooth nonlinear models. It discusses how reduced order models (ROM) play an important role in computationally intensive applications and the need to provide error metrics with ROM predictions. It then describes snapshot and gradient-based reduction algorithms used at the response and parameter interfaces, respectively. It introduces different types of errors that can occur from reducing the response space only, parameter space only, or both spaces simultaneously, and how Dixon's theory can be used to estimate these relative errors.
Special Plenary Lecture at the International Conference on VIBRATION ENGINEERING AND TECHNOLOGY OF MACHINERY (VETOMAC), Lisbon, Portugal, September 10 - 13, 2018
http://www.conf.pt/index.php/v-speakers
Propagation of uncertainties in complex engineering dynamical systems is receiving increasing attention. When uncertainties are taken into account, the equations of motion of discretised dynamical systems can be expressed by coupled ordinary differential equations with stochastic coefficients. The computational cost for the solution of such a system mainly depends on the number of degrees of freedom and the number of random variables. Among various numerical methods developed for such systems, the polynomial chaos based Galerkin projection approach shows significant promise because it is more accurate compared to the classical perturbation based methods and computationally more efficient compared to the Monte Carlo simulation based methods. However, the computational cost increases significantly with the number of random variables and the results tend to become less accurate over longer time horizons. In this talk, novel approaches will be discussed to address these issues. Reduced-order Galerkin projection schemes in the frequency domain will be discussed to address the problem of a large number of random variables. Practical examples will be given to illustrate the application of the proposed Galerkin projection techniques.
Applying Model Checking Approach with Floating Point Arithmetic for Verificat... - Sergey Staroletov
The document discusses applying model checking to verify a hybrid model of an air collision avoidance maneuver that involves floating point calculations. It proposes representing floating point numbers as integers in Promela to enable modeling the maneuver's dynamics and implementing trigonometric and other functions. The goal is to model check the system's safety property that the distance between aircraft remains above a safe threshold during the maneuver.
The Comprehensive Product Platform Planning (CP3) framework presents a flexible mathematical model of the platform planning process, which allows (i) the formation of sub-families of products, and (ii) the simultaneous identification and quantification of platform/scaling design variables. The CP3 model is founded on a generalized commonality matrix that represents the product platform plan, and yields a mixed binary-integer non-linear programming problem. In this paper, we develop a methodology to reduce the high dimensional binary integer problem to a more tractable integer problem, where the commonality matrix is represented by a set of integer variables. Subsequently, we determine the feasible set of values for the integer variables in the case of families with 3-7 kinds of products. The cardinality of the feasible set is found to be orders of magnitude smaller than the total number of unique combinations of the commonality variables. In addition, we also present the development of a generalized approach to Mixed-Discrete Non-Linear Optimization (MDNLO) that can be implemented through standard non-gradient based optimization algorithms. This MDNLO technique is expected to provide a robust and computationally inexpensive optimization framework for the reduced CP3 model. The generalized approach to MDNLO uses continuous optimization as the primary search strategy, however, evaluates the system model only at the feasible locations in the discrete variable space.
Similar to Steffen Rendle, Research Scientist, Google at MLconf SF
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments... - MLconf
Understanding Human Impact: Social and Equity Assessments for AI Technologies
Social and Equity Impact Assessments have broad applications but can be a useful tool to explore and mitigate for Machine Learning fairness issues and can be applied to product specific questions as a way to generate insights and learnings about users, as well as impacts on society broadly as a result of the deployment of new and emerging technologies.
In this presentation, my goal is to advocate for and highlight the need to consult community and external stakeholder engagement to develop a new knowledge base and understanding of the human and social consequences of algorithmic decision making and to introduce principles, methods and process for these types of impact assessments.
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding - MLconf
The Brain’s Guide to Dealing with Context in Language Understanding
Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes things into a spatial hierarchy, the language regions encode information into a hierarchy of timescale. This organization is key to our uniquely human ability to integrate semantic information across narratives. More and more, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has not only led to state-of-the art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity.
In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. And we’ll describe a new, more flexible way to think about these architectures and ease design space exploration. Finally, we’ll discuss some of the exciting applications made possible by these breakthroughs.
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re... - MLconf
Applying Computer Vision to Reduce Contamination in the Recycling Stream
With China’s recent refusal of most foreign recyclables, North American waste haulers are scrambling to figure out how to make on-shore recycling cost-effective in order to continue providing recycling services. Recyclables that were once being shipped to China for manual sorting are now primarily being redirected to landfills or incinerators. Without a solution, a nearly $5 billion annual recycling market could come to a halt.
Purity in the recycling stream is key to this effort as contaminants in the stream can increase the cost of operations, damage equipment and reduce the ability to create pure commodities suitable for creating recycled goods. This market disruption as a result of China’s new regulations, however, provides us the chance to re-examine and improve our current disposal & collection habits with modern monitoring & artificial intelligence technology.
Using images from our in-dumpster cameras, Compology has developed an ML-based process that helps identify, measure and alert for contaminants in recycling containers before they are picked-up, helping keep the recycling stream clean.
Our convolutional neural network flags potential instances of contamination inside a dumpster, enabling garbage haulers to know which containers have the wrong type of material inside. This allows them to provide targeted, timely education, and when appropriate, assess fines, to improve recycling compliance at the businesses and residences they serve, helping keep recycling services financially viable.
In this presentation, we will walk through our ML-based contamination measurement and scoring process, showing how Waste Management, a national waste hauler, has achieved a 57% reduction in contamination across nearly 2,000 containers over six months. This progress represents significant strides toward financially viable recycling services.
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
Quantum Computing: a Treasure Hunt, not a Gold Rush
Quantum computers promise a significant step up in computational power over conventional computers, but also suffer a number of counterintuitive limitations --- both in their computational model and in leading lab implementations. In this talk, we review how quantum computers compete with conventional computers and how conventional computers try to hold their ground. Then we outline what stands in the way of successful quantum ML applications.
Josh Wills - Data Labeling as Religious ExperienceMLconf
The document discusses obtaining labeled data and introduces weak supervision as an alternative to full manual labeling. It notes that weak supervision uses labeling functions to generate noisy training labels at scale, which can then be combined using a generative model to infer true labels. The document also briefly mentions Snorkel, a system for creating labeling functions, and Snuba, its successor which focuses on scaling to very large datasets.
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
Project GaitNet: Ushering in the ImageNet moment for human Gait kinematics
The emergence of the upright human bipedal gait can be traced back 4 to 2.8 million years, to the now-extinct hominin Australopithecus afarensis. Fine-grained analysis of gait using the modern MEMS sensors found on all smartphones not only reveals a lot about a person's orthopedic and neuromuscular health status, but also contains enough idiosyncratic clues to be harnessed as a passive biometric. While the machine learning community has made many siloed attempts to model bipedal gait sensor data, these were done with small datasets, often collected in restricted academic environs. In this talk, we will introduce the ImageNet moment for human gait analysis by presenting 'Project GaitNet', the largest planet-scale motion-sensor-based human bipedal gait dataset ever curated. We'll also present the associated state-of-the-art results in classifying humans using novel deep neural architectures, and the related success stories we have enjoyed in transfer learning to disparate domains of human kinematics analysis.
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
Machine Learning Methods in Detecting Alzheimer’s Disease from Speech and Language
Alzheimer's disease affects millions of people worldwide, and it is important to predict the disease as early and as accurately as possible. In this talk, I will discuss the development of novel ML models that help distinguish healthy people from those who develop Alzheimer's, using short samples of human speech. As input to the model, features of different modalities are extracted from speech audio samples and transcriptions: (1) syntactic measures, such as production rules extracted from syntactic parse trees; (2) lexical measures, such as features of lexical richness and complexity and lexical norms; and (3) acoustic measures, such as standard Mel-frequency cepstral coefficients. I will present an ML model that detects cognitive impairment by reaching agreement among modalities. The resulting model is able to achieve state-of-the-art performance in both supervised and semi-supervised settings, using manual transcripts of human speech. Additionally, I will discuss potential limitations of any fully automated speech-based Alzheimer's disease detection model, focusing mostly on the impact of a not-so-accurate automatic speech recognition (ASR) system on classification performance. To illustrate this, I will present experiments with controlled amounts of artificially generated ASR errors and explain why deletion errors affect Alzheimer's detection performance the most, due to their impact on features of syntactic and lexical complexity.
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
Optimized Image Classification on the Cheap
In this talk, we anchor on building an image classifier trained on the Stanford Cars dataset to evaluate two approaches to transfer learning, fine-tuning and feature extraction, and the impact of hyperparameter optimization on these techniques. Once we determine the most performant transfer learning technique for Stanford Cars, we will double the size of the dataset through image augmentation to boost the classifier's performance. We will use Bayesian optimization to learn the hyperparameters associated with image transformations, using the downstream image classifier's performance as the guide. In conjunction with model performance, we will also focus on the features of these augmented images and the downstream implications for our image classifier.
To both maximize model performance on a budget and explore the impact of optimization on these methods, we apply a particularly efficient implementation of Bayesian optimization to each of these architectures in this comparison. Our goal is to draw on a rigorous set of experimental results that can help us answer the question: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models?
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
The Importance of Modeling Data Collection
Data sets used in machine learning are often collected in a systematically biased way - certain data points are more likely to be collected than others. We call this "observation bias". For example, in health care, we are more likely to see lab tests when the patient is feeling unwell than otherwise. Failing to account for observation bias can, of course, result in poor predictions on new data. By contrast, properly accounting for this bias allows us to make better use of the data we do have.
In this presentation, we discuss practical and theoretical approaches to dealing with observation bias. When the nature of the bias is known, there are simple adjustments we can make to nonparametric function estimation techniques, such as Gaussian Process models. We also discuss the scenario where the data collection model is unknown. In this case, there are steps we can take to estimate it from observed data. Finally, we demonstrate that having a small subset of data points that are known to be collected at random - that is, in an unbiased way - can vastly improve our ability to account for observation bias in the rest of the data set.
My hope is that attendees of this presentation will be aware of the perils of observation bias in their own work, and be equipped with tools to address it.
The Uncanny Valley of ML
Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined accepted technologies. We have seen its effects in robotics, computer graphics, and page load times. The debate of how to handle the new technology detracts from its benefits. When machine learning is added to human decision systems a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we'll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls.
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
Deep Learning Architectures for Semantic Relation Detection Tasks
Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems that go beyond recognizing semantic relatedness and need to identify specific semantic relations. In this talk, I will first present novel techniques for creating the labelled datasets required for training deep learning models to classify semantic relations between phrases. I will then present neural network architectures that integrate morphological features into combined path-based and distributional relation detection algorithms, and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
This document discusses Netflix's global deep learning recommender system model. It describes how Netflix recommends content to over 150 million members across 190 countries using personalized recommendations. The system utilizes collaborative filtering techniques like soft clustering models to group users with similar tastes and generate weighted popularity votes. It also leverages topic models to model users' tastes as distributions over topics and content. The challenges of scaling these models globally to account for factors like country-specific catalogs and trends over time are discussed. The solution presented is to incrementally train the models by first censoring unavailable content and adding contextual variables, then periodically training warm start models with new embeddings and parameters to efficiently update the models at scale.
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
The Voice: New Challenges in a Zero UI World
The adoption of voice-enabled devices has seen explosive growth in the last few years, and music consumption is among the most popular use cases. Music personalization and recommendation play a major role at Pandora in providing a daily delightful listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios, defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot-filling techniques and query classification to interpret the user intent and identify the main concepts in the query.
We will also present the differences and challenges regarding evaluation of voice powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
The document discusses challenges related to building AI systems at scale using large, multi-modal datasets. It presents an approach for efficient classification of datasets with an extremely large number of classes. The key challenges are handling data scale, selecting relevant information, and ensuring safety. An objective function is designed for training tree-based classifiers that favors balanced, pure splits, leading to efficient trees with logarithmic depth and small error. This approach allows online training and can be used for classification or density estimation problems while learning representations.
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
This document discusses using AI to detect illicit online sales of opioids through social media analysis. It provides background on laws targeting online drug sales without prescriptions. While policy guidelines aim to regulate this, internet effects remain inadequately addressed. The document then presents a pipeline using natural language processing and machine learning to analyze over 1 million tweets, isolate topics related to illicit online pharmacies, and identify characteristics of relevant tweets to build models that can automatically detect emerging bad actors selling drugs online. The goal is to analyze social media content quickly to help address this important problem.
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
This document discusses using a Bayesian neural network to classify light curves from the Transiting Exoplanet Survey Satellite (TESS) mission to identify exoplanet candidates. It describes challenges in classifying large numbers of light curves, and how a Bayesian neural network approach provides probabilistic predictions and confidence levels to help identify promising exoplanet candidates while avoiding many false positives seen in other methods. The Bayesian network achieved 91% accuracy and 83% precision in tests on simulated TESS data.
Neel Sundaresan - Teaching a machine to codeMLconf
1. Recommend using the 'AdamOptimizer' class to optimize the loss since it is commonly used for training neural networks.
2. Suggest mapping the input data to floating point tensors using 'tf.cast()' for compatibility with TensorFlow operations.
3. Advise normalizing the input data to speed up training by using 'tf.keras.utils.normalize()'
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
Soumith Chintala of Facebook AI presented on PyTorch, an open source machine learning framework. PyTorch allows users to define neural networks as Python programs and supports automatic differentiation to calculate gradients. Key features include GPU acceleration for tensors, distributed training across hundreds of GPUs, and TorchScript for optimizing Python models and deploying to C++. PyTorch aims to bridge the gap between research prototyping and production use through tools like TorchScript that transition eager Python code to a static graph mode optimized for deployment.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Presentation of the OECD Artificial Intelligence Review of Germany
Steffen Rendle, Research Scientist, Google at MLconf SF
1. Factorization Models | Polynomial Regression | Factorization Machines | Applications | Summary

Factorization Machines

Steffen Rendle
Current affiliation: Google Inc.
Work was done at the University of Konstanz

MLConf, November 14, 2014

Steffen Rendle 1 / 53
3. Matrix Factorization

Example data (user-movie rating matrix):

         TI   NH   SW   ST   ...
  A       5    3    1    ?   ...
  B       ?    ?    4    5   ...
  C       1    ?    5    ?   ...
  ...

Matrix Factorization:

$\hat{Y} := W H^t$, with $W \in \mathbb{R}^{|U| \times k}$, $H \in \mathbb{R}^{|I| \times k}$

$\hat{y}(u, i) = \hat{y}_{u,i} = \sum_{f=1}^{k} w_{u,f}\, h_{i,f} = \langle w_u, h_i \rangle$

$k$ is the rank of the reconstruction.

Steffen Rendle 3 / 53
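The two formulas above (full reconstruction and single-entry prediction) can be sketched in a few lines of numpy; the sizes below are illustrative, not from the slides:

```python
import numpy as np

# Illustrative sizes: |U| = 3 users, |I| = 4 items, rank k = 2.
rng = np.random.default_rng(0)
n_users, n_items, k = 3, 4, 2
W = rng.normal(size=(n_users, k))  # user factors, W in R^{|U| x k}
H = rng.normal(size=(n_items, k))  # item factors, H in R^{|I| x k}

# Full reconstruction: Y_hat = W H^t
Y_hat = W @ H.T

def y_hat(u, i):
    # Single entry: <w_u, h_i> = sum_f w_{u,f} * h_{i,f}
    return W[u] @ H[i]
```

The single-entry inner product and the (u, i) entry of the matrix product agree by construction.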
5. Matrix Factorization Extensions

Example data: the same user-movie rating matrix as before, optionally with a time dimension (the rating matrix evolving over time).

Examples for models:

$\hat{y}^{\text{MF}}(u, i) := \sum_{f=1}^{k} v_{u,f}\, v_{i,f} = \langle v_u, v_i \rangle$

$\hat{y}^{\text{SVD++}}(u, i) := \Big\langle v_u + \sum_{j \in N(u)} v_j,\; v_i \Big\rangle$

$\hat{y}^{\text{Fact-KNN}}(u, i) := \frac{1}{|R(u)|} \sum_{j \in R(u)} r_{u,j}\, \langle v_i, v_j \rangle$

$\hat{y}^{\text{timeSVD}}(u, i, t) := \langle v_u + v_{u,t},\; v_i \rangle$

$\hat{y}^{\text{timeTF}}(u, i, t) := \sum_{f=1}^{k} v_{u,f}\, v_{i,f}\, v_{t,f}$

...

Steffen Rendle 4 / 53
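A hedged numpy sketch of two of these extension formulas. For brevity it reuses one shared set of item vectors for both the explicit and implicit parts, whereas the published models typically keep separate factor sets; sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 3, 4, 2
V_user = rng.normal(size=(n_users, k))  # v_u
V_item = rng.normal(size=(n_items, k))  # v_i (also reused for v_j here)

def y_svdpp(u, i, N_u):
    # SVD++: <v_u + sum_{j in N(u)} v_j, v_i>
    return (V_user[u] + V_item[list(N_u)].sum(axis=0)) @ V_item[i]

def y_fact_knn(u, i, R_u, r):
    # Fact-KNN: (1 / |R(u)|) * sum_{j in R(u)} r_{u,j} * <v_i, v_j>
    return sum(r[u, j] * (V_item[i] @ V_item[j]) for j in R_u) / len(R_u)
```

Here `N_u` is the set of items with implicit feedback for user u, and `r` maps (user, item) pairs to known ratings.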
8. Tensor Factorization

Example data: triples of (subject, predicate, object).

Examples for models:

$\hat{y}^{\text{PARAFAC}}(s, p, o) := \sum_{f=1}^{k} v_{s,f}\, v_{p,f}\, v_{o,f}$

$\hat{y}^{\text{PITF}}(s, p, o) := \langle v_s, v_p \rangle + \langle v_s, v_o \rangle + \langle v_p, v_o \rangle$

...

Steffen Rendle 5 / 53

[illustration from Drumond et al. 2012]
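The two tensor models can be sketched directly from their formulas. The sizes are illustrative, and one shared factor set per mode is an assumed simplification (PITF as published uses pairwise-specific factor sets):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 3
Vs = rng.normal(size=(5, k))  # subject factors
Vp = rng.normal(size=(4, k))  # predicate factors
Vo = rng.normal(size=(6, k))  # object factors

def y_parafac(s, p, o):
    # PARAFAC: sum_f v_{s,f} * v_{p,f} * v_{o,f}
    return float(np.sum(Vs[s] * Vp[p] * Vo[o]))

def y_pitf(s, p, o):
    # PITF: <v_s, v_p> + <v_s, v_o> + <v_p, v_o>
    return float(Vs[s] @ Vp[p] + Vs[s] @ Vo[o] + Vp[p] @ Vo[o])
```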
9. Sequential Factorization Models

Example data: per-user sequences of baskets $B^{t-3}, B^{t-2}, B^{t-1}, B^{t}$ of items (a, b, c, ...), where the next basket to predict is marked "?".

Examples for models:

$\hat{y}^{\text{FMC}}(u, i, t) := \sum_{l \in B^{t-1}} \langle v_i, v_l \rangle$

$\hat{y}^{\text{FPMC}}(u, i, t) := \langle v_u, v_i \rangle + \sum_{l \in B^{t-1}} \langle v_i, v_l \rangle$

...

Steffen Rendle 6 / 53
10. Factorization Models: Discussion

Advantages:
- Can estimate interactions between two (or more) variables even if the cross is not observed.
- E.g. user × movie, current product × next product, user × query × url, ...

Downsides:
- Factorization models are usually built specifically for each problem.
- Learning algorithms and implementations are tailored to individual models.

Steffen Rendle 7 / 53
14. Data and Variable Representation

Many standard ML approaches work with real-valued feature vectors as input. This makes it possible to represent, e.g.:
- any number of variables
- categorical domains, using dummy indicator variables
- numerical domains
- set-categorical domains, using dummy indicator variables

Using this representation allows applying a wide variety of standard models (e.g. linear regression, SVM, etc.).

Steffen Rendle 9 / 53
15. Linear Regression

Let $x \in \mathbb{R}^p$ be an input vector with $p$ predictor variables.

Model equation:

$\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i$

Model parameters: $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, i.e. $O(p)$ model parameters.

Steffen Rendle 10 / 53
16. Polynomial Regression

Let $x \in \mathbb{R}^p$ be an input vector with $p$ predictor variables.

Model equation (degree 2):

$\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j \geq i}^{p} w_{i,j}\, x_i x_j$

Model parameters: $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $W \in \mathbb{R}^{p \times p}$, i.e. $O(p^2)$ model parameters.

Steffen Rendle 11 / 53
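A minimal sketch of the degree-2 model equation above, with random placeholder weights (only the upper triangle of W, j >= i, is used, matching the double sum):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
w0 = rng.normal()
w = rng.normal(size=p)        # O(p) linear weights
W2 = rng.normal(size=(p, p))  # O(p^2) pairwise weights; only j >= i is used

def y_poly(x):
    # w0 + sum_i w_i x_i + sum_i sum_{j >= i} w_{i,j} x_i x_j
    pairwise = sum(W2[i, j] * x[i] * x[j]
                   for i in range(p) for j in range(i, p))
    return w0 + w @ x + pairwise
```

With one-hot encoded categorical data, most products x_i x_j are zero, which is what makes the O(p^2) pairwise weights hard to estimate from sparse data.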
18. Factorization Models Polynomial Regression Factorization Machines Applications Summary
Representation: Matrix/ Tensor vs. Feature Vectors
Matrix/ Tensor data can be represented by feature vectors:
Movie
TI NH SW ST ...
5 3 1 ? ...
? ? 4 5 ...
1 ? 5 ? ...
... ... ... ... ...
A
B
C
...
User
Steen Rendle 13 / 53
The same matrix, written as a table of observed cases:

  #  User     Movie         Rating
  1  Alice    Titanic       5
  2  Alice    Notting Hill  3
  3  Alice    Star Wars     1
  4  Bob      Star Wars     4
  5  Bob      Star Trek     5
  6  Charlie  Titanic       1
  7  Charlie  Star Wars     5
  …  …        …             …
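The step from the table to one-hot feature vectors can be sketched in a few lines. This is a minimal sketch; the variable names and the index layout (user block first, then movie block) are our own choices, not from the talk:

```python
# Encode each (user, movie, rating) row as a sparse one-hot feature vector x
# with target y, as on the slide.
rows = [
    ("Alice", "Titanic", 5), ("Alice", "Notting Hill", 3),
    ("Alice", "Star Wars", 1), ("Bob", "Star Wars", 4),
    ("Bob", "Star Trek", 5), ("Charlie", "Titanic", 1),
    ("Charlie", "Star Wars", 5),
]
users = sorted({u for u, _, _ in rows})
movies = sorted({m for _, m, _ in rows})

def encode(user, movie):
    # one-hot block over users, followed by a one-hot block over movies
    x = [0] * (len(users) + len(movies))
    x[users.index(user)] = 1
    x[len(users) + movies.index(movie)] = 1
    return x

X = [encode(u, m) for u, m, _ in rows]
y = [r for _, _, r in rows]
```

Every resulting vector has exactly two non-zero entries, which is the sparsity the later complexity results exploit.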
Each table row is then encoded as a sparse feature vector:

[Figure: feature vectors x(1)…x(7) with targets y(1)…y(7); each x has a one-hot block over users (A, B, C, …) and a one-hot block over movies (TI, NH, SW, ST, …). E.g. x(1) (Alice, Titanic) has ones at positions A and TI, with target y(1) = 5.]
Application to Sparse Feature Vectors

[Figure: the sparse feature vectors x(1)…x(7) and targets y(1)…y(7) from the previous slide.]

Applying regression models to this data leads to:
I Linear regression: $\hat{y}(x) = w_0 + w_u + w_i$
I Polynomial regression: $\hat{y}(x) = w_0 + w_u + w_i + w_{u,i}$
I Matrix factorization: $\hat{y}(u, i) = \langle w_u, h_i \rangle$
Application to Sparse Feature Vectors

For the data of the example:
I Linear regression has no user-item interaction.
  ⇒ Linear regression is not expressive enough.
I Polynomial regression includes pairwise interactions but cannot estimate them from the data:
I $n \ll p^2$: the number of cases is much smaller than the number of model parameters.
I The maximum-likelihood estimator for a pairwise effect is
  $w_{u,i} = \begin{cases} y - w_0 - w_u - w_i, & \text{if } (u, i, y) \in S \\ \text{not defined}, & \text{else} \end{cases}$
  ⇒ Polynomial regression cannot generalize to any unobserved pairwise effect.
Factorization Machine (FM)

I Let $x \in \mathbb{R}^p$ be an input vector with $p$ predictor variables.
I Model equation (degree 2):
  $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j$
I Model parameters: $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $V \in \mathbb{R}^{p \times k}$
[Rendle 2010, Rendle 2012]
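The degree-2 model equation can be transcribed directly. A sketch in NumPy (the function name is ours) that computes the pairwise term with the naive double loop:

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Degree-2 FM model equation, computed term by term in O(p^2 k).
    V has shape (p, k); row V[i] is the factor vector v_i."""
    p = len(x)
    y_hat = w0 + float(w @ x)
    for i in range(p):
        for j in range(i + 1, p):            # pairwise interactions, j > i
            y_hat += float(V[i] @ V[j]) * x[i] * x[j]
    return y_hat
```

For $x = (1, 0, 1)$ only the (0, 2) interaction survives, so the prediction is the linear part plus $\langle v_0, v_2 \rangle$.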
Compared to polynomial regression, the FM replaces the free pairwise weight $w_{i,j}$ by the factorized weight $\langle v_i, v_j \rangle$:
I Polynomial regression: $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} w_{i,j}\, x_i x_j$, with $W \in \mathbb{R}^{p \times p}$.
I FM: $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j$, with $V \in \mathbb{R}^{p \times k}$.
[Rendle 2010, Rendle 2012]
Factorization Machine (FM)

I Let $x \in \mathbb{R}^p$ be an input vector with $p$ predictor variables.
I Model equation (degree 3):
  $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \sum_{l=j+1}^{p} \sum_{f=1}^{k} v^{(3)}_{i,f}\, v^{(3)}_{j,f}\, v^{(3)}_{l,f}\, x_i x_j x_l$
I Model parameters: $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $V \in \mathbb{R}^{p \times k}$, $V^{(3)} \in \mathbb{R}^{p \times k}$
[Rendle 2010, Rendle 2012]
Factorization Machines: Discussion
I FMs work with real-valued input.
I FMs include variable interactions like polynomial regression.
I Model parameters for interactions are factorized.
I The number of model parameters is $O(k\,p)$ (instead of $O(p^2)$ for polynomial regression).
Computation Complexity

Factorization Machine model equation:
  $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j$
I Trivial computation: $O(p^2\, k)$
I Efficient computation can be done in $O(p\, k)$.
I Making use of the many zeros in $x$, even in $O(N_z(x)\, k)$, where $N_z(x)$ is the number of non-zero elements in the vector $x$.
Efficient Computation

The model equation of an FM can be computed in $O(p\, k)$.

Proof:
  $\hat{y}(x) := w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \langle v_i, v_j \rangle\, x_i x_j$
  $= w_0 + \sum_{i=1}^{p} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{p} x_i v_{i,f} \right)^2 - \sum_{i=1}^{p} (x_i v_{i,f})^2 \right]$

I In the sums over $i$, only the non-zero elements $x_i$ have to be summed up ⇒ $O(N_z(x)\, k)$.
I (The complexity of polynomial regression is $O(N_z(x)^2)$.)
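The identity in the proof evaluates the pairwise term in one pass over the features. A sketch in NumPy (names are ours); the usage below checks it against the naive double loop:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """O(p k) FM prediction via the identity
    sum_{i<j} <v_i,v_j> x_i x_j
      = 1/2 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2 ]."""
    s = V.T @ x                      # s[f] = sum_i v_{i,f} x_i
    s_sq = (V ** 2).T @ (x ** 2)     # sum_i (v_{i,f} x_i)^2 per factor f
    return w0 + float(w @ x) + 0.5 * float(np.sum(s ** 2 - s_sq))
```

Restricting the two inner sums to the non-zero entries of `x` (e.g. with a sparse vector type) gives the $O(N_z(x)\,k)$ cost stated on the slide.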
Multilinearity

FMs are multilinear:
  $\forall\, \theta \in \Theta = \{w_0, w_1, \ldots, w_p, v_{1,1}, \ldots, v_{p,k}\}: \quad \hat{y}(x; \theta) = g_{(\theta)}(x) + \theta\, h_{(\theta)}(x)$
where $g_{(\theta)}$ and $h_{(\theta)}$ do not depend on the value of $\theta$.

E.g. for second-order effects ($\theta = v_{l,f}$):
  $\hat{y}(x; v_{l,f}) := \underbrace{w_0 + \sum_{i=1}^{p} w_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \sum_{\substack{f'=1 \\ (f' \neq f)\, \lor\, (l \notin \{i,j\})}}^{k} v_{i,f'}\, v_{j,f'}\, x_i x_j}_{g_{(v_{l,f})}(x)} + v_{l,f} \underbrace{x_l \sum_{i=1,\, i \neq l}^{p} v_{i,f}\, x_i}_{h_{(v_{l,f})}(x)}$
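Multilinearity can be checked numerically: holding everything else fixed, $\hat{y}$ is an affine function of any single parameter, so equal steps in the parameter give equal steps in the prediction. A sketch (helper names and the random data are ours):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """O(p k) degree-2 FM prediction."""
    s = V.T @ x
    return w0 + float(w @ x) + 0.5 * float(np.sum(s ** 2 - (V ** 2).T @ (x ** 2)))

def y_at(theta, x, w0, w, V, l=0, f=0):
    """Prediction as a function of the single parameter theta = V[l, f]."""
    V2 = V.copy()
    V2[l, f] = theta
    return fm_predict(x, w0, w, V2)

rng = np.random.default_rng(1)
x, w, V = rng.normal(size=4), rng.normal(size=4), rng.normal(size=(4, 2))
a, b, c = (y_at(t, x, 0.0, w, V) for t in (0.0, 1.0, 2.0))
# affine in theta: the two unit steps change y_hat by the same amount,
# namely the slope h_(theta)(x)
assert abs((b - a) - (c - b)) < 1e-9
```

The diagonal-free pairwise sum is what makes this hold: no $v_{l,f}^2$ term survives the subtraction in the efficient form.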
Learning

Using these properties, learning algorithms can be developed:
I L2-regularized regression and classification:
  I Stochastic gradient descent (SGD) [Rendle, 2010]
  I Alternating least squares / coordinate descent [Rendle et al., 2011; Rendle, 2012]
  I Markov chain Monte Carlo (for Bayesian FMs) [Freudenthaler et al., 2011; Rendle, 2012]
I L2-regularized ranking:
  I Stochastic gradient descent [Rendle, 2010]

All the proposed learning algorithms have a runtime of $O(k\, N_z(X)\, i)$, where $i$ is the number of iterations and $N_z(X)$ the number of non-zero elements in the design matrix $X$.
Stochastic Gradient Descent (SGD)

I For each training case $(x, y) \in S$, SGD updates each FM model parameter $\theta$ using:
  $\theta' = \theta - \eta \left( (\hat{y}(x) - y)\, h_{(\theta)}(x) + \lambda_{(\theta)}\, \theta \right)$
I $\eta$ is the learning rate / step size.
I $\lambda_{(\theta)}$ is the regularization value of the parameter $\theta$.
I SGD can easily be applied to other loss functions.
[Rendle, 2010]
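The update above can be written out per parameter group once $h_{(\theta)}(x)$ is known: it is $1$ for $w_0$, $x_i$ for $w_i$, and $x_i(\sum_j v_{j,f} x_j - v_{i,f} x_i)$ for $v_{i,f}$. A sketch for squared loss (the step size, regularization value and function names are our choices, and $w_0$ is left unregularized here):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    s = V.T @ x
    return w0 + float(w @ x) + 0.5 * float(np.sum(s ** 2 - (V ** 2).T @ (x ** 2)))

def sgd_epoch(data, w0, w, V, eta=0.05, lam=0.01):
    """One pass of theta' = theta - eta * ((y_hat - y) * h_theta(x) + lam * theta)."""
    for x, y in data:
        err = fm_predict(x, w0, w, V) - y
        s = V.T @ x                                       # s[f] = sum_j v_{j,f} x_j
        w0 -= eta * err                                   # h_{w0}(x) = 1
        w -= eta * (err * x + lam * w)                    # h_{w_i}(x) = x_i
        grad_V = np.outer(x, s) - (x ** 2)[:, None] * V   # h_{v_{i,f}}(x)
        V -= eta * (err * grad_V + lam * V)
    return w0, w, V
```

Only the non-zero coordinates of `x` contribute, which yields the $O(k\,N_z(X))$ per-iteration cost stated on the Learning slide.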
Coordinate Descent (CD)

I CD updates each FM model parameter $\theta$ using:
  $\theta' = \frac{\sum_{(x,y) \in S} \left( y - g_{(\theta)}(x) \right) h_{(\theta)}(x)}{\sum_{(x,y) \in S} h_{(\theta)}^2(x) + \lambda_{(\theta)}}$
I Using caches of intermediate results, the runtime for updating all model parameters is $O(k\, N_z(X))$.
I CD can be extended to classification.
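The coordinate update is a one-dimensional regularized least-squares solve. A sketch (the function name is ours), taking the per-case values of $h_{(\theta)}$ and $g_{(\theta)}$ as vectors:

```python
import numpy as np

def cd_update(h, g, y, lam):
    """Closed-form coordinate-descent update for one FM parameter theta:
    theta' = sum (y - g(x)) h(x) / (sum h(x)^2 + lambda)."""
    return float(h @ (y - g)) / (float(h @ h) + lam)
```

With `lam = 0` this is exactly the least-squares fit of the residual $y - g_{(\theta)}(x)$ on $h_{(\theta)}(x)$; a positive `lam` shrinks the update toward zero.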
Gibbs Sampling (MCMC)

I Gibbs sampling with a block for each FM model parameter $\theta$:
  $\theta \mid S, \Theta \setminus \{\theta\} \sim \mathcal{N}\!\left( \frac{\sum_{(x,y) \in S} \left( y - g_{(\theta)}(x) \right) h_{(\theta)}(x)}{\sum_{(x,y) \in S} h_{(\theta)}^2(x) + \lambda_{(\theta)}},\; \frac{1}{\sum_{(x,y) \in S} h_{(\theta)}^2(x) + \lambda_{(\theta)}} \right)$
I The mean is the same as for CD ⇒ the computational complexity is also $O(k\, N_z(X))$.
I MCMC can be extended to classification using link functions.
[Freudenthaler et al. 2011, Rendle 2012]
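One Gibbs draw therefore reuses the CD computation and adds posterior noise. A minimal sketch of the conditional above (names are ours; a unit noise precision is assumed):

```python
import numpy as np

def gibbs_draw(h, g, y, lam, rng):
    """Draw one FM parameter from its conditional posterior:
    Normal(CD mean, 1 / (sum h(x)^2 + lambda))."""
    prec = float(h @ h) + lam
    mean = float(h @ (y - g)) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))
```

With many cases the variance shrinks and the draw concentrates at the CD update, which is why the two methods share the same per-iteration cost.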
Learning Regularization Values

[Figure: two graphical models. Left: standard FM with Gaussian priors on $w_0$, $w_j$ and $v_j$ governed by fixed hyperparameters. Right: two-level FM in which the prior parameters themselves receive hyperpriors, so the regularization values are learned.]
Standard FM with priors. Two-level FM with hyperpriors.
[Freudenthaler et al., 2011]
libFM Software

libFM is an implementation of FMs:
I Model: second-order FMs
I Learning/inference: SGD, ALS, MCMC
I Classification and regression
I Uses the same data format as LIBSVM, LIBLINEAR [Lin et al.], SVMlight [Joachims].
I Supports variable grouping.
I Open source: GPLv3.
[http://www.libfm.org/]
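The LIBSVM-style input is a sparse text format: one case per line, the target first, then `index:value` pairs for the non-zero features only. A small sketch that writes one of the rating cases from earlier in that format (the helper name and the index layout are ours):

```python
def to_libsvm(y, x):
    """Format one case as 'target index:value ...', listing only non-zeros,
    as consumed by LIBSVM/LIBLINEAR and hence by libFM."""
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(x) if v != 0)
    return f"{y} {feats}"

# Alice rates Titanic 5: one-hot user block (A, B, C) + movie block (TI, NH, SW, ST)
line = to_libsvm(5, [1, 0, 0, 1, 0, 0, 0])
```

For the one-hot data above, every line has exactly two `index:value` pairs, regardless of how large $p$ grows.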
Outline

Factorization Models
  Polynomial Regression
  Factorization Machines
Applications
  Recommender Systems
  Link Prediction in Social Networks
  Clickthrough Prediction
  Personalized Ranking
  Student Performance Prediction
  Kaggle Competitions
Summary
(Context-aware) Rating Prediction

I Main variables:
  I User ID (categorical)
  I Item ID (categorical)
I Additional variables:
  I time
  I mood
  I user profile
  I item meta data
  I . . .
I Examples: Netflix Prize, Movielens, KDDCup 2011

[Figure: a rating event combining Song, User, Time and Mood.]
Netflix Prize

[Figure: Netflix Prize prediction error, public-leaderboard RMSE on a 0.86-0.90 scale, for SGD matrix factorization and FMs with the predictor-variable sets (user, movie), (user, movie, day), (user, movie, impl.), (user, movie, day, impl.) and (user, movie, day, impl., freq., lin. day); richer variable sets move the RMSE toward the $1M-prize line.]

I $k = 128$ factors, 512 MCMC samples (no burn-in phase, initialization from random)
I MCMC inference (no hyperparameters (learning rate, regularization) to specify)
Netflix Prize

Method (Name)                                      | Ref.     | Learning Method   | k    | Quiz RMSE
Models using user ID and item ID:
Probabilistic Matrix Factorization                 | [14, 13] | Batch GD          | 40   | *0.9170
Probabilistic Matrix Factorization                 | [14, 13] | Batch GD          | 150  | 0.9211
Matrix Factorization                               | [6]      | Variational Bayes | 30   | *0.9141
Matchbox                                           | [15]     | Variational Bayes | 50   | *0.9100
ALS-MF                                             | [7]      | ALS               | 100  | 0.9079
ALS-MF                                             | [7]      | ALS               | 1000 | *0.9018
SVD/MF                                             | [3]      | SGD               | 100  | 0.9025
SVD/MF                                             | [3]      | SGD               | 200  | *0.9009
Bayesian Probabilistic Matrix Factorization (BPMF) | [13]     | MCMC              | 150  | 0.8965
Bayesian Probabilistic Matrix Factorization (BPMF) | [13]     | MCMC              | 300  | *0.8954
FM, pred. var.: user ID, movie ID                  | -        | MCMC              | 128  | 0.8937
Models using implicit feedback:
Probabilistic Matrix Factorization with Constraints | [14]    | Batch GD          | 30   | *0.9016
SVD++                                              | [3]      | SGD               | 100  | 0.8924
SVD++                                              | [3]      | SGD               | 200  | *0.8911
BSRM/F                                             | [18]     | MCMC              | 100  | 0.8926
BSRM/F                                             | [18]     | MCMC              | 400  | *0.8874
FM, pred. var.: user ID, movie ID, impl.           | -        | MCMC              | 128  | 0.8865
Netflix Prize

Method (Name)                                      | Ref. | Learning Method | k    | Quiz RMSE
Models using time information:
Bayesian Probabilistic Tensor Factorization (BPTF) | [17] | MCMC            | 30   | *0.9044
FM, pred. var.: user ID, movie ID, day             | -    | MCMC            | 128  | 0.8873
Models using time and implicit feedback:
timeSVD++                                          | [5]  | SGD             | 100  | 0.8805
timeSVD++                                          | [5]  | SGD             | 200  | *0.8799
FM, pred. var.: user ID, movie ID, day, impl.      | -    | MCMC            | 128  | 0.8809
FM, pred. var.: user ID, movie ID, day, impl.      | -    | MCMC            | 256  | 0.8794
Assorted models:
BRISMF/UM NB corrected                             | [16] | SGD             | 1000 | *0.8904
BMFSI plus side information                        | [8]  | MCMC            | 100  | *0.8875
timeSVD++ plus frequencies                         | [4]  | SGD             | 200  | 0.8777
timeSVD++ plus frequencies                         | [4]  | SGD             | 2000 | *0.8762
FM, pred. var.: user ID, movie ID, day, impl., freq., lin. day | - | MCMC    | 128  | 0.8779
FM, pred. var.: user ID, movie ID, day, impl., freq., lin. day | - | MCMC    | 256  | 0.8771
Link Prediction in Social Networks

I Main variables:
  I Actor A ID
  I Actor B ID
I Additional variables:
  I profiles
  I actions
  I . . .

[Figure: predicting a link between Actor A and Actor B.]
KDDCup 2012: Track 1

[Figure: mean average precision @3, 0.32-0.42, on the public and private leaderboards for FMs with different attribute sets (none; gender, age, ...; keywords; friends; all), compared against the Top 1, Top 5, Top 10 and Top 100 leaderboard entries.]

I $k = 22$ factors, 512 MCMC samples (no burn-in phase, initialization from random)
I MCMC inference (no hyperparameters (learning rate, regularization) to specify)
[Awarded 2nd place (out of 658 teams)]
Clickthrough Prediction

I Main variables:
  I User ID
  I Query ID
  I Ad/Link ID
I Additional variables:
  I query tokens
  I user profile
  I . . .

[Figure: a user issuing a keyword query and a ranked list of ad links.]
KDDCup 2012: Track 2

Model                         | Inference | wAUC (public) | wAUC (private)
ID-based model (k = 0)        | SGD       | 0.78050       | 0.78086
Attribute-based model (k = 8) | MCMC      | 0.77409       | 0.77555
Mixed model (k = 8)           | SGD       | 0.79011       | 0.79321
Final ensemble                | n/a       | 0.79857       | 0.80178

Ensemble:
I Rank positions (not predicted clickthrough rates) are used.
I The MCMC attribute-based model and different variations of the SGD models are included.
[Awarded 3rd place (out of 171 teams)]
ECML/PKDD Discovery Challenge 2013

I Problem: Recommend given names.
I Main variables:
  I User ID
  I Name ID
I Additional variables:
  I session info
  I string representation for each name
  I . . .
I The FM approach won 1st place (online track) and 2nd place (offline track).
Student Performance Prediction

I Main variables:
  I Student ID
  I Question ID
I Additional variables:
  I question hierarchy
  I sequence of questions
  I skills required
  I . . .
I Examples: KDDCup 2010, Grockit Challenge (FM placed 1st of 241), http://www.kaggle.com/c/WhatDoYouKnow

[Figure: a student answering a question.]
Kaggle Competitions

FMs have been successfully applied to several Kaggle competitions:
I Criteo Display Advertising Challenge: 1st place (team '3 idiots').
I Blue Book for Bulldozers: 1st place (team 'Leustagos Titericz').
I EMI Music Data Science Hackathon: 2nd place (team 'lns').
Summary

I Factorization machines combine linear/polynomial regression with factorization models.
I Feature interactions are learned with a low-rank representation.
I Estimation of unobserved interactions is possible.
I Factorization machines can be computed efficiently and have high prediction quality.
References

L. Drumond, S. Rendle, and L. Schmidt-Thieme. Predicting RDF triples in incomplete knowledge bases with tensor factorization. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC '12, pages 326-331, New York, NY, USA, 2012. ACM.

C. Freudenthaler, L. Schmidt-Thieme, and S. Rendle. Bayesian factorization machines. In NIPS Workshop on Sparse Representation and Low-rank Approximation, 2011.

Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426-434, New York, NY, USA, 2008. ACM.

Y. Koren. The BellKor solution to the Netflix grand prize. 2009.

Y. Koren. Collaborative filtering with temporal dynamics. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 447-456, New York, NY, USA, 2009. ACM.

Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop, 2007.

I. Pilaszy, D. Zibriczky, and D. Tikk. Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In RecSys '10: Proceedings of the Fourth ACM Conference on Recommender Systems, pages 71-78, New York, NY, USA, 2010. ACM.

I. Porteous, A. Asuncion, and M. Welling. Bayesian matrix factorization with side information and Dirichlet process mixtures. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.

S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM '10, pages 995-1000, Washington, DC, USA, 2010. IEEE Computer Society.

S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1-57:22, May 2012.

S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing personalized Markov chains for next-basket recommendation. In WWW '10: Proceedings of the 19th International Conference on World Wide Web, pages 811-820, New York, NY, USA, 2010. ACM.

S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2011.

R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning, ICML '08, pages 880-887, New York, NY, USA, 2008. ACM.

R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257-1264, Cambridge, MA, 2008. MIT Press.

D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online Bayesian recommendations. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 111-120, New York, NY, USA, 2009. ACM.

G. Takacs, I. Pilaszy, B. Nemeth, and D. Tikk. Scalable collaborative filtering approaches for large recommender systems. J. Mach. Learn. Res., 10:623-656, June 2009.

L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining, pages 211-222. SIAM, 2010.

S. Zhu, K. Yu, and Y. Gong. Stochastic relational models for large-scale dyadic data using MCMC. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1993-2000, 2009.