Introduction to the Factorization Machines model with an example. Motivations - why you should have it in your toolbox, the model and its expressiveness, use cases for context-aware recommendations, and Field-Aware Factorization Machines.
An overview of some deep learning methods for recommender systems along with an intro to the relevant deep learning methods such as convolutional neural networks (CNN's), recurrent neural networks (RNN's), autoencoders, restricted boltzmann machines (RBM's) and more.
How does Netflix recommend movies? In this presentation we go over a very common technique for recommendations called matrix factorization to predict what rating a user will give a movie. It sounds like a complicated mathematical concept, but all it consists of is finding a set of intermediate features such as action, comedy, etc., and using them to help us determine the ratings.
Deep Learning in Recommender Systems - RecSys Summer School 2017 - Balázs Hidasi
This is the presentation accompanying my tutorial about deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and an introduction to the four most prominent research directions of DL in recsys as of 2017. Presented during RecSys Summer School 2017 in Bolzano, Italy.
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial - Alexandros Karatzoglou
The slides from the Learning to Rank for Recommender Systems tutorial given at ACM RecSys 2013 in Hong Kong by Alexandros Karatzoglou, Linas Baltrunas and Yue Shi.
Talk with Yves Raimond at the GPU Tech Conference on March 28, 2018 in San Jose, CA.
Abstract:
In this talk, we will survey how Deep Learning methods can be applied to personalization and recommendations. We will cover why standard Deep Learning approaches don't perform better than typical collaborative filtering techniques. Then we will go over recently published research at the intersection of Deep Learning and recommender systems, looking at how they integrate new types of data, explore new models, or change the recommendation problem statement. We will also highlight some of the ways that neural networks are used at Netflix and how we can use GPUs to train recommender systems. Finally, we will highlight promising new directions in this space.
What really are recommendation engines nowadays?
This presentation introduces the foundations of recommendation algorithms, and covers common approaches as well as some of the most advanced techniques. Although more focused on efficiency than theoretical properties, basics of matrix algebra and optimization-based machine learning are used throughout the presentation.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Square (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Problematics
4.2 Solutions
4.3 Tools
Overview of recommender systems (recommendation systems): RFM concepts in brief; item- and user-based collaborative filtering; content-based recommendation; product association recommender systems; stereotype recommendation with its advantages and limitations; customer lifetime value; and the recommender system analysis and solving cycle.
The Factorization Machines algorithm for building recommendation system - Paw... - Evention
One of the successful examples of data science applications in the Big Data domain is recommendation systems. The goal of my talk is to present the Factorization Machines algorithm, available in the SAS Viya platform.
Factorization Machines are a good choice for making predictions and recommendations from large, sparse data, which is typical of Big Data. In the practical part of the presentation, low-granularity data from the NBA league will be used to build an application recommending optimal game strategies as well as predicting the results of league games.
Counterfactual Learning for Recommendation - Olivier Jeunen
Slides for our presentation at the REVEAL workshop for RecSys '19 in Copenhagen and a Data Science Leuven Meetup, titled "Counterfactual Learning for Recommendation".
Approximate nearest neighbor methods and vector models – NYC ML meetup - Erik Bernhardsson
Nearest neighbors refers to something that is conceptually very simple. For a set of points in some space (possibly many dimensions), we want to find the closest k neighbors quickly.
This presentation covers a library called Annoy, built by me, that helps you do (approximate) nearest neighbor queries in high dimensional spaces. We go through vector models, how to measure similarity, and why nearest neighbor queries are useful.
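For intuition, exact (non-approximate) k-nearest-neighbor search can be done by brute force: compute the distance from the query to every point and keep the k smallest. Annoy's tree-based index exists precisely because this linear scan becomes too slow for large, high-dimensional datasets. A minimal sketch (function name and toy points are my own, not from the talk):

```python
import numpy as np

def knn(points, query, k):
    """Exact k-nearest-neighbor search by brute force:
    compute all Euclidean distances, then take the k smallest."""
    dists = np.linalg.norm(points - query, axis=1)
    return np.argsort(dists)[:k]

points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.1, 0.1]])
print(knn(points, np.array([0.0, 0.0]), 2))  # indices of the 2 closest: [0 3]
```

The brute-force version is O(n) per query; approximate indexes like Annoy trade a little recall for sublinear query time.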
A Multi-Armed Bandit Framework For Recommendations at Netflix - Jaya Kawale
In this talk, we present a general multi-armed bandit framework for recommendations on the Netflix homepage. We present two example case studies using MABs at Netflix - a) Artwork Personalization to recommend personalized visuals for each of our members for the different titles and b) Billboard recommendation to recommend the right title to be watched on the Billboard.
Personalized Page Generation for Browsing Recommendations - Justin Basilico
Talk from First Workshop on Recommendation Systems for TV and Online Video at RecSys 2014 in Foster City, CA on 2014-10-10 about how we personalize the layout of the Netflix homepage to make it easier for people to browse the recommendations to quickly find something to watch and enjoy.
Past, Present & Future of Recommender Systems: An Industry Perspective - Justin Basilico
Slides from our talk at the RecSys 2016 conference in Boston, MA 2016-09-18 on our perspective for what are important areas for future work in recommender systems.
A fast-paced introduction to Deep Learning concepts, such as activation functions, cost functions, back propagation, and then a quick dive into CNNs. Basic knowledge of vectors, matrices, and derivatives is helpful in order to derive the maximum benefit from this session.
This presentation introduces Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks, followed by an Angular application that uses TypeScript in order to replicate the Tensorflow playground.
And Then There Are Algorithms - Danilo Poccia - Codemotion Rome 2018 - Codemotion
In machine learning, training large models on a massive amount of data usually improves results. Our customers report, however, that training such models and deploying them is either operationally prohibitive or outright impossible for them. We created a collection of machine learning algorithms that scale to any amount of data, including k-means clustering for data segmentation, factorization machines for recommendations, time-series forecasting, linear regression, topic modeling, and image classification. This talk will discuss those algorithms and explain where and how they can be used.
Visual diagnostics for more effective machine learningBenjamin Bengfort
The model selection process is a search for the best combination of features, algorithm, and hyperparameters that maximize F1, R2, or silhouette scores after cross-validation. This view of machine learning often leads us toward automated processes such as grid searches and random walks. Although this approach allows us to try many combinations, we are often left wondering if we have actually succeeded.
By enhancing model selection with visual diagnostics, data scientists can inject human guidance to steer the search process. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance allows us a peek into the high dimensional realm that our models operate. As we continue to tune our models, trying to minimize both bias and variance, these glimpses allow us to be more strategic in our choices. The result is more effective modeling, speedier results, and greater understanding of underlying processes.
Visualization is an integral part of the data science workflow, but visual diagnostics are directly tied to machine learning transformers and models. The Yellowbrick library extends the scikit-learn API providing a Visualizer object, an estimator that learns from data and produces a visualization as a result. In this talk, we will explore feature visualizers, visualizers for classification, clustering, and regression, as well as model analysis visualizers. We'll work through several examples and show how visual diagnostics steer model selection, making machine learning more effective.
Automatic Task-based Code Generation for High Performance DSEL - Joel Falcou
Providing high level tools for parallel programming while sustaining a high level of performance has been a challenge that techniques like Domain Specific Embedded Languages try to solve. In previous works, we investigated the design of such a DSEL – NT2 – providing a Matlab-like syntax for parallel numerical computations inside a C++ library.
The main issue addressed here is how limitations of classical DSEL generation and multithreaded code generation can be overcome.
A fast-paced introduction to Deep Learning concepts, such as activation functions, cost functions, backpropagation, and then a quick dive into CNNs. Basic knowledge of vectors, matrices, and elementary calculus (derivatives) is helpful in order to derive the maximum benefit from this session.
Next we'll see a simple neural network using Keras, followed by an introduction to TensorFlow and TensorBoard. (Bonus points if you know Zorn's Lemma, the Well-Ordering Theorem, and the Axiom of Choice.)
Steffen Rendle, Research Scientist, Google at MLconf SF - MLconf
Title: Factorization Machines
Abstract:
Developing accurate recommender systems for a specific problem setting seems to be a complicated and time-consuming task: models have to be defined, learning algorithms derived and implementations written. In this talk, I present the factorization machine (FM) model, which is a generic factorization approach that can be adapted to problems by feature engineering. Efficient FM learning algorithms are discussed, among them SGD, ALS/CD and MCMC inference, including automatic hyperparameter selection. I will show on several tasks, including the Netflix prize and KDDCup 2012, that FMs are flexible and generate highly competitive accuracy. With FMs these results can be achieved by simple data preprocessing and without any tuning of regularization parameters or learning rates.
This presentation focuses on Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks. You'll also learn how to incorporate Deep Learning in Android applications. Basic knowledge of matrices is helpful for this session, which is targeted primarily to beginners.
Sparse Support Faces - Battista Biggio - Int'l Conf. Biometrics, ICB 2015, Ph... - Pluribus One
Many modern face verification algorithms use a small set of reference templates to save memory and computational resources. However, both the reference templates and the combination of the corresponding matching scores are heuristically chosen. In this paper, we propose a well-principled approach, named sparse support faces, that can outperform state-of-the-art methods both in terms of recognition accuracy and number of required face templates, by jointly learning an optimal combination of matching scores and the corresponding subset of face templates. For each client, our method learns a support vector machine using the given matching algorithm as the kernel function, and determines a set of reference templates, that we call support faces, corresponding to its support vectors. It then drastically reduces the number of templates, without affecting recognition accuracy, by learning a set of virtual faces as well-principled transformations of the initial support faces. The use of a very small set of support face templates makes the decisions of our approach also easily interpretable for designers and end users of the face verification system.
Similar to Factorization Machines and Applications in Recommender Systems (20)
A brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... - Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R_1/2 ∼ 50-200 pc, stellar masses of M⋆ ∼ 10^7-10^8 M⊙, and star-formation rates of SFR ∼ 0.1-1 M⊙/yr. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
This PDF is about schizophrenia.
For more details, visit the SELF-EXPLANATORY channel on YouTube:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Thanks!
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Seminar of U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... - Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Richard's entangled adventures in wonderland - Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... - Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Cancer cell metabolism: special Reference to Lactate Pathway - AADYARAJPANDEY1
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cell utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules - a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvates made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the Warburg phenomenon:
WARBURG EFFECT: Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose than normal cells do.
Otto Heinrich Warburg (8 October 1883 - 1 August 1970) was awarded the Nobel Prize in Physiology or Medicine in 1931 for his "discovery of the nature and mode of action of the respiratory enzyme."
The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
2. Introduction to FM
In 2010, Steffen Rendle (currently a senior research scientist at Google) introduced a seminal paper [Rendle - FM].
Many years had passed since an algorithm this impactful was introduced in the world of ML /Recommender Systems/.
FMs are a model class that combines the advantages of Support Vector Machines (SVM)/polynomial regression and factorization models.
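The degree-2 FM model from the Rendle paper referenced above predicts ŷ(x) = w_0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. A minimal prediction sketch in Python (function name and toy values are mine, not from the slides), using Rendle's O(kn) reformulation of the pairwise term:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Degree-2 Factorization Machine prediction:
    y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.
    The pairwise sum uses Rendle's O(k*n) identity:
    0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]."""
    s = V.T @ x                  # per-factor sums: sum_i v_{i,f} x_i
    s2 = (V ** 2).T @ (x ** 2)   # per-factor sums of squares
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

# Toy check: with w0 = 0, w = 0, two active features and k = 1 factors,
# the prediction reduces to <v_0, v_1> * x_0 * x_1 = 2 * 3 = 6.
x = np.array([1.0, 1.0])
print(fm_predict(x, 0.0, np.zeros(2), np.array([[2.0], [3.0]])))  # 6.0
```

The factorized pairwise weights ⟨v_i, v_j⟩ are what let FMs estimate interactions between feature pairs that never co-occur in training data.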
5. Kaggle Competitions
Kaggle Competitions won with Factorization Machines:
Criteo's CTR on display ads contest - 2014
https://www.kaggle.com/c/criteo-display-ad-challenge
http://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf
Avazu's CTR prediction contest
https://www.kaggle.com/c/avazu-ctr-prediction
http://www.csie.ntu.edu.tw/~r01922136/slides/kaggle-avazu.pdf
In both competitions the LogLoss function was used for evaluation of the classifiers - the aim is to minimize the Logarithmic Loss for binary classification:
LogLoss = -(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ]
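The LogLoss used in these competitions is a one-liner to compute; a sketch (function name mine), assuming y_i ∈ {0, 1} and predicted probabilities p_i ∈ (0, 1):

```python
import math

def log_loss(y_true, p_pred):
    """Binary logarithmic loss:
    LogLoss = -(1/N) * sum_i [ y_i*log(p_i) + (1-y_i)*log(1-p_i) ]."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / n

print(log_loss([1, 0], [0.9, 0.1]))  # -log(0.9) ≈ 0.1054
```

Confident correct predictions cost little, while confident wrong ones are penalized heavily, which is why competition implementations usually clip p away from exactly 0 or 1.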
6. Task Description
We are given a (training and testing) dataset /images, documents, .../
D = {(x^(l), y^(l))}_{l∈L} ⊂ R^n × T
The aim is to find a function ŷ : R^n → T (the target space) which estimates the testing data /unknown to the trained model/.
In recommender systems (online advertising) we deal with sparse input vectors x^(l) ∈ R^n.
T can be {-1, 1}, {0, 1} (classification), R (regression) or some categorical space (ranks for an item).
8. Ad Classification
Classification from implicit feedback - the user does not rate explicitly.
But we can "guess" which items, with which features, he likes or does not care about.

Country | Day    | Ad Type | Clicked?
USA     | 3/3/15 | MOVIE   | 1
China   | 1/7/14 | GAME    | 0
China   | 3/3/15 | GAME    | 1
9. Standard (dummy) encoding

USA | China | 3/3/15 | 1/7/14 | MOVIE | GAME | Clicked?
 1  |   0   |   1    |   0    |   1   |  0   |    1
 0  |   1   |   0    |   1    |   0   |  1   |    0
 0  |   1   |   1    |   0    |   0   |  1   |    1

Very large feature space
Very sparse samples (feature values)
11. (De)motivation!
Often features are more important in pairs:
"Country == USA" & "Day == Thanksgiving"
What if we create a new feature (conjunction) for every pair of features?
Feature space: insanely large
If originally n features, we add \binom{n}{2} = \frac{n(n-1)}{2} more!
Samples: still sparse
13. SVD with ALS
Singular Value Decomposition (Matrix Factorization) with Alternating Least Squares.
The most basic matrix factorization model for recommender systems models the rating \hat{r}_{ui} a user u would give to an item i by:
\hat{r}_{ui} = x_u^T y_i,
where
x_u^T = (x_u^1, \dots, x_u^N) - factor vector associated with the user
y_i^T = (y_i^1, \dots, y_i^N) - factor vector associated with the item
The dimension N of the factors x_u, y_i is the rank of the model (factorization).
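The rating estimate is just an inner product; a tiny sketch with made-up rank-2 factors (the values are purely illustrative):

```python
def predict_rating(x_u, y_i):
    """MF estimate: inner product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(x_u, y_i))

# Toy rank N=2 factors; the dimensions might loosely encode latent
# tastes such as "action" vs "comedy" affinity.
x_u = [0.9, 0.2]   # user leans heavily toward the first latent factor
y_i = [0.8, 0.1]   # item loads mostly on the same factor
print(round(predict_rating(x_u, y_i), 2))   # -> 0.74
```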
18. Advantages of Factorization Models
Factorization models offer:
Factorization of higher-order interactions
Efficient parameter estimation and superior performance,
even for sparse data where just a few or no observations of those higher-order effects are available
Main applications:
Collaborative Filtering (in Recommender Systems)
Link Prediction (in Social Networks)
19. Disadvantages of Factorization Models
But factorization models are not general prediction models. There are many factorization models designed for specific tasks.
Standard models:
Matrix Factorization (MF)
Tensor Factorization (TF)
Specific tasks:
SVD++ [Koren, Bell - Advances in CF]
timeSVD++ [Koren, Bell - Advances in CF]
PITF (Pairwise Interaction Tensor Factorization)
FPMC (Factorizing Personalized Markov Chains)
And others...
20. Advantages of FMs
FMs allow parameter estimation under very sparse data where SVMs fail.
FMs have linear (and even "almost constant"!) complexity and don't rely on support vectors like SVMs do.
FMs are a general predictor that can work with any real-valued feature vector. In contrast, other state-of-the-art factorization models work only on very restricted input data.
Through suitable feature engineering, FMs can mimic state-of-the-art models like MF, SVD++, timeSVD++, PARAFAC, PITF, FPMC and others.
22. Notations
For any x \in \mathbb{R}^n let us define:
m(x) - the number of nonzero elements of the feature vector;
\bar{m}_D - the average of m(x^{(l)}) over all l \in L.
Since we deal with huge sparsity, we have \bar{m}_D \ll n, the dimensionality of the feature space.
One reason for huge sparsity is that the underlying problem deals with large categorical variable domains.
23. Definition of the Factorization Machine Model of degree 2

\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j   (1)

where the model parameters that have to be estimated are:
w_0 \in \mathbb{R} - global bias (mean rating over all items/users)
w = (w_1, \dots, w_n) \in \mathbb{R}^n - weights of the corresponding features
v_i := (v_{i,1}, \dots, v_{i,k}), the rows of V \in \mathbb{R}^{n \times k}, describing the i-th variable (feature) with k factors.
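Equation (1) translates directly into code; a minimal pure-Python sketch (the parameter values below are made up for illustration):

```python
def fm_predict_naive(w0, w, V, x):
    """Degree-2 FM, evaluated exactly as written in equation (1):
    O(k n^2) because of the explicit loop over pairs (i, j)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))  # <v_i, v_j>
            y += dot * x[i] * x[j]
    return y

# Tiny example: n = 3 features, k = 2 factors.
w0, w = 0.1, [0.2, 0.0, -0.1]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, 2.0, 3.0]
print(fm_predict_naive(w0, w, V, x))  # -> 9.0
```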
24. Complexity of the model equation
The model equation \hat{y}(x) for every x \in D can be computed in linear time O(kn), and under sparsity in O(k m(x)).
Proof:
\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j
= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \langle v_i, v_j \rangle x_i x_j - \frac{1}{2} \sum_{i=1}^{n} \langle v_i, v_i \rangle x_i x_i
= \frac{1}{2} \left[ \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{f=1}^{k} v_{i,f} v_{j,f} x_i x_j - \sum_{i=1}^{n} \sum_{f=1}^{k} v_{i,f} v_{i,f} x_i x_i \right]
= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right) \left( \sum_{j=1}^{n} v_{j,f} x_j \right) - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]
= \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]
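The identity proved above is what makes FMs fast; the sketch below computes the pairwise term both ways and checks that they agree (toy values, for illustration):

```python
def fm_pairwise_naive(V, x):
    """Pairwise FM term over explicit pairs (i, j): O(k n^2)."""
    n, k = len(x), len(V[0])
    return sum(sum(V[i][f] * V[j][f] for f in range(k)) * x[i] * x[j]
               for i in range(n) for j in range(i + 1, n))

def fm_pairwise_linear(V, x):
    """Same term via 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]: O(k n)."""
    n, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - s_sq
    return 0.5 * total

V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, 2.0, 3.0]
print(fm_pairwise_naive(V, x), fm_pairwise_linear(V, x))  # both 9.0
```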
25. Complexity of parameter updates
The computation needed for the partial derivatives of \hat{y}(x) is already done while computing \hat{y}(x) itself. Therefore, all parameter updates for a case (x, y) can be done in O(kn), and under sparsity in O(k m(x)).

\frac{\partial \hat{y}(x)}{\partial \theta} =
  1, if \theta is w_0
  x_i, if \theta is w_i
  x_i \sum_{j=1}^{n} v_{j,f} x_j - v_{i,f} x_i^2, if \theta is v_{i,f}   (2)
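A minimal SGD sketch using the gradients in (2); the squared loss and the learning rate are assumptions chosen for illustration. Note how the per-factor sums q_f = \sum_j v_{j,f} x_j from the forward pass are reused in the gradient for v_{i,f}:

```python
def fm_sgd_step(w0, w, V, x, y, lr=0.05):
    """One SGD step for squared loss. w and V are updated in place."""
    n, k = len(x), len(V[0])
    # Forward pass: q[f] = sum_j v_{j,f} x_j is reused by the gradients below.
    q = [sum(V[j][f] * x[j] for j in range(n)) for f in range(k)]
    pred = w0 + sum(w[i] * x[i] for i in range(n)) + 0.5 * sum(
        q[f] ** 2 - sum((V[i][f] * x[i]) ** 2 for i in range(n)) for f in range(k))
    err = pred - y                      # d(loss)/d(pred), up to a constant factor
    w0 -= lr * err                      # gradient wrt w0 is 1
    for i in range(n):
        if x[i] == 0:                   # sparsity: zero features contribute nothing
            continue
        w[i] -= lr * err * x[i]         # gradient wrt w_i is x_i
        for f in range(k):
            grad = x[i] * q[f] - V[i][f] * x[i] ** 2   # gradient wrt v_{i,f}
            V[i][f] -= lr * err * grad
    return w0, (pred - y) ** 2          # new bias, loss before this step

# Fit a single toy example; the loss shrinks toward 0.
w0, w, V = 0.0, [0.0, 0.0], [[0.1, -0.1], [0.05, 0.1]]
for _ in range(100):
    w0, loss = fm_sgd_step(w0, w, V, [1.0, 1.0], 2.0)
print(loss)
```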
26. General applications of FM
FMs can be applied to a variety of prediction tasks:
Regression: \hat{y}(x) can be used directly as a regression predictor.
Binary classification: the sign of \hat{y}(x) is used and the parameters are optimized for hinge loss or logit loss.
Ranking: input vectors x are ordered by the score \hat{y}(x), and optimization is done over pairs of instance vectors with a pairwise classification loss ([Joachims - Ranking CTR]).
28. Library: libFM (C++)
Author: Steffen Rendle
Web site: http://www.libfm.org/ (very sparse description of the library)
Source code (C++): https://github.com/srendle/libfm
Very professionally written source code! (Linux and Mac OS X)
libFM features:
stochastic gradient descent (SGD)
alternating least squares (ALS)
Bayesian inference using Markov Chain Monte Carlo (MCMC)
29. Library: fastFM (Python)
Author: Immanuel Bayer
Web site: http://ibayer.github.io/fastFM/
Very detailed tutorial for the usage of the library!
Source code (Python): https://github.com/ibayer/fastFM
Python (2.7 and 3.5) with the well-known scikit-learn API.
Performance-critical code is written in C and wrapped with Cython.

Task           | Solver         | Loss
Regression     | ALS, SGD, MCMC | Square Loss
Classification | ALS, SGD, MCMC | Probit (MAP), Probit, Sigmoid
Ranking        | SGD            | [Rendle - BPR]
30. Library: LIBFFM (C++)
A library for Field-aware Factorization Machines from the Machine Learning group at National Taiwan University.
Author: Yu-Chin Juan and colleagues
Web site: https://www.csie.ntu.edu.tw/~cjlin/libffm/ (very sparse description of the library)
Source code (C++): https://github.com/guestwalk/libffm
FFM is prone to overfitting and not very stable at the moment.
32. References
Steffen Rendle. Factorization Machines. 2010. http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI '09, 2009.
Yehuda Koren and Robert Bell. Advances in Collaborative Filtering.
Mathieu Blondel et al. Convex Factorization Machines. http://www.mblondel.org/publications/mblondel-ecmlpkdd2015.pdf
Thorsten Joachims. Optimizing Search Engines using Clickthrough Data. 2002.