This is a tutorial given at the International Conference on Machine Learning. The slides consist of four parts; please look for Part 1, Part 3, and Part 4 to get a complete picture of this technology.
Recommender Systems Tutorial (Part 3) -- Online Components, by Bee-Chung Chen
This is a tutorial given at the International Conference on Machine Learning. The slides consist of four parts; please look for Part 1, Part 2, and Part 4 to get a complete picture of this technology.
Recommender Systems Tutorial (Part 1) -- Introduction, by Bee-Chung Chen
This is a tutorial given at the International Conference on Machine Learning. The slides consist of four parts; please look for Part 2, Part 3, and Part 4 to get a complete picture of this technology.
Webpage Personalization and User Profiling, by yingfeng
This document discusses techniques for webpage personalization and user profiling. It describes common properties of web personalization problems, including optimizing metrics like click-through rate (CTR) using large-scale sparse data. It then covers online logistic regression (OLR) and generalized matrix factorization (GMF) frameworks for CTR prediction. Experimental results on real-world datasets show that user profiles generated from matrix factorization models can provide significant click lifts over other profile methods when used as features for OLR.
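The OLR update at the heart of that framework can be sketched in a few lines. This is an illustrative single SGD step on sparse features — the feature indices and learning rate are made up, not taken from the slides:

```python
import math

def olr_update(w, x, y, lr=0.1):
    """One online logistic-regression step on a sparse example.

    x is a dict {feature_index: value}; y is 0 or 1.  Only the weights of
    active features are touched, which is what keeps the update cheap on
    large-scale sparse data.
    """
    z = sum(w.get(i, 0.0) * v for i, v in x.items())
    p = 1.0 / (1.0 + math.exp(-z))   # predicted CTR
    g = p - y                        # gradient of log-loss w.r.t. z
    for i, v in x.items():
        w[i] = w.get(i, 0.0) - lr * g * v
    return p

# Stream a few (features, click) events through the model.
w = {}
events = [({0: 1.0, 3: 1.0}, 1), ({0: 1.0, 5: 1.0}, 0), ({0: 1.0, 3: 1.0}, 1)]
for x, y in events:
    olr_update(w, x, y)
```

After a few events the clicked feature pattern already scores higher than the unclicked one; user-profile features from matrix factorization would simply enter `x` as extra indices.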
Hands-on Tutorial of Machine Learning in Python, by Chun-Ming Chang
This document provides an overview of a hands-on tutorial on machine learning in Python. It discusses various machine learning algorithms including linear regression, logistic regression, and regularization. It explains key concepts such as model selection, cross-validation, preprocessing, and evaluation metrics. Examples are provided to illustrate linear regression, regularization techniques like Ridge and Lasso regression, and logistic regression. The document encourages participants to practice these techniques on exercises.
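As a companion to the regularization material mentioned above, here is a minimal sketch of ridge regression via its closed form, on synthetic data (all names and values are illustrative):

```python
import numpy as np

# Ridge regression closed form: w = (X^T X + alpha*I)^(-1) X^T y.
# The L2 penalty shrinks the weights, trading a little bias for lower variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

def ridge_fit(X, y, alpha):
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

w_ols = ridge_fit(X, y, alpha=0.0)     # plain least squares
w_ridge = ridge_fit(X, y, alpha=10.0)  # shrunk coefficients
```

Lasso adds an L1 penalty instead and has no closed form, which is why libraries solve it iteratively.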
Adversarial learning for neural dialogue generation, by Keon Kim
This document summarizes an adversarial learning approach for neural dialogue generation. The model uses a generator and discriminator, where the generator produces responses and the discriminator determines if they are human-like. The generator is trained to maximize rewards from the discriminator using policy gradients. Two methods are introduced to assign rewards at each generation step to address issues with the baseline approach. Teacher forcing is also used to directly expose the generator to human responses during training. The results showed this adversarial training approach generates higher quality responses than previous baselines.
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017, by StampedeCon
Words are no longer sufficient for delivering the search results users are looking for, particularly in image search. Text and language pose many challenges in describing visual details and providing the necessary context for optimal results. Machine learning technology opens a new world of search innovation that has yet to be applied by businesses.
In this session, Mike Ranzinger of Shutterstock will share a technical presentation detailing his research on composition-aware search. He will also demonstrate how the research led to the launch of AI technology allowing users to more precisely find the image they need within Shutterstock’s collection of more than 150 million images. While the company released a number of AI-enabled search tools in 2016, this new technology allows users to search for items in an image and specify where they should be located within the image. The research identifies the networks that localize and describe regions of an image as well as the relationships between things. The goal of this research was to improve the future of search using visual data, contextual search functions, and AI. A combination of multiple machine learning technologies led to this breakthrough.
This document provides an overview of generative adversarial networks (GANs). It begins by explaining the basic GAN framework of having a generator and discriminator. It then discusses several GAN variations including DCGAN, EBGAN, WGAN, and BEGAN. Applications of GANs mentioned include image synthesis, image-to-image translation, and domain adaptation. The document notes that evaluating GANs is difficult as log-likelihood does not correlate with visual quality and qualitative assessments can be misleading. It reviews metrics like Inception Score and discusses challenges training GANs such as instability. In conclusion, the document covers the key concepts of GANs, related methods, applications, evaluation challenges, and training techniques.
Generative Adversarial Networks (GANs) are a type of generative model that uses two neural networks - a generator and discriminator - competing against each other. The generator takes noise as input and generates synthetic samples, while the discriminator evaluates samples as real or generated. They are trained together until the generator fools the discriminator. GANs can generate realistic images, do image-to-image translation, and have applications in reinforcement learning. However, training GANs is challenging due to issues like non-convergence and mode collapse.
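The generator/discriminator game can be made concrete with a toy one-dimensional GAN. The parameterization below — a linear generator and a logistic discriminator — is a deliberate simplification for illustration, not any particular paper's architecture:

```python
import numpy as np

# Toy GAN on 1-D data: generator g(z) = a*z + b maps noise to samples,
# discriminator D(x) = sigmoid(w*x + c) scores samples as real.
rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, batch = 0.05, 64

for _ in range(2000):
    real = rng.normal(4.0, 1.0, batch)   # true data distribution: N(4, 1)
    z = rng.normal(size=batch)
    fake = a * z + b

    # Discriminator ascends log D(real) + log(1 - D(fake)).
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator ascends log D(fake) (the non-saturating loss).
    d_fake = sigmoid(w * fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

fake_mean = float(np.mean(a * rng.normal(size=1000) + b))
```

Even this scalar game shows the dynamics: the generated mean drifts toward the data mean as the two players alternate, and the same alternation is where non-convergence and mode collapse arise at scale.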
Embedding-based retrieval in modern search ranking system, by Marsan Ma
1. The document discusses embedding-based retrieval techniques for search ranking systems. It outlines approaches for training embeddings using triplet loss and hard negative mining, and serving embeddings using approximate nearest neighbor techniques like product quantization.
2. When serving, techniques like product quantization are used to efficiently find the top-K nearest neighbors from large embedding spaces. This improves efficiency over brute force approaches.
3. Major companies like Facebook, Alibaba, and LinkedIn are developing next-generation systems using cross-attention and cost-aware models to further improve ranking performance while maintaining query efficiency.
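The two halves above — triplet training and top-K serving — can be sketched as follows. The brute-force scan stands in for a real ANN index such as product quantization, and the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: the positive must sit closer to the anchor than
    the negative by at least `margin` (squared Euclidean distance)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

def top_k(query, index, k=3):
    """Exhaustive top-K by inner product; O(N) per query.  Production systems
    replace this scan with an ANN structure like product quantization."""
    return np.argsort(-(index @ query))[:k]

index = rng.normal(size=(1000, 64))
index /= np.linalg.norm(index, axis=1, keepdims=True)   # unit item vectors
query = index[42] + 0.01 * rng.normal(size=64)          # query near item 42
neighbours = top_k(query, index)
```

Hard negative mining simply means choosing `negative` among the highest-scoring wrong items instead of random ones, which makes the hinge in `triplet_loss` active more often.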
Learning from Computer Simulation to Tackle Real-World Problems, by NAVER Engineering
Deep learning has made great strides on problems that can be learned with direct supervision, in which a large dataset with high-quality annotation is provided (e.g., ImageNet). However, collecting such a large dataset is expensive and time-consuming. In this talk, I will discuss our recent work on learning from computer simulation to tackle real-world problems. I will first present our novel Simulated+Unsupervised (S+U) domain adaptation algorithm, which fully leverages the flexibility of data simulators and bidirectional mappings between synthetic and real data. We show that our approach achieves improved performance on the gaze estimation task, outperforming the prior approach by Shrivastava et al. Next, I will introduce our work on building data-driven vehicle collision prediction algorithms. Today’s vehicle collision prediction algorithms are rule-based and have not benefited from the recent developments in deep learning, because it is almost impossible to collect a large amount of collision data from the real world. To address this challenge, we collect a large accident dataset using a popular video game named GTA and train end-to-end classification algorithms that identify dangerous objects in a given image. Furthermore, we devise a novel domain adaptation method with which we can further improve the performance of our algorithm in the real world.
Machine learning techniques can be applied in formal verification in several ways:
1) To enhance current formal verification tools by automating tasks like debugging, specification mining, and theorem proving.
2) To enable the development of new formal verification tools by applying machine learning to problems like SAT solving, model checking, and property checking.
3) Specific applications include using machine learning for debugging and root cause identification, learning specifications from runtime traces, aiding theorem proving by selecting heuristics, and tuning SAT solver parameters and selection.
Supervised embedding techniques in search ranking system, by Marsan Ma
This document discusses using supervised embedding techniques in search ranking systems. It begins by outlining different types of embedding techniques, including graph-based, content-based, supervised, and unsupervised. It then describes how graph-based embedding models like Item2vec could be implemented in a product pipeline. Finally, it discusses ongoing experiments comparing different embedding models and techniques, showing improvements in metrics like logloss and ROC AUC.
This document provides an overview of generative adversarial networks (GANs). It explains that GANs were introduced in 2014 and involve two neural networks, a generator and discriminator, that compete against each other. The generator produces synthetic data to fool the discriminator, while the discriminator learns to distinguish real from synthetic data. As they train, the generator improves at producing more realistic outputs that match the real data distribution. Examples of GAN applications discussed include image generation, text-to-image synthesis, and face aging.
Faster and cheaper, smart A/B experiments - public ver., by Marsan Ma
Stratification, CUPED, and covariate adjustment with machine learning can help reduce the sample size needed for A/B tests.
[1] Stratification removes variance among user groups, focusing only on variance within groups.
[2] CUPED uses a user's past metrics as a control variate to reduce variance. The better a user's past aligns with their present, the less data is needed.
[3] Covariate adjustment trains a model on user data. If the model predicts outcomes well, the variance left after accounting for its predictions shrinks, speeding up tests; the better the model predicts, the more variance is removed.
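Point [2] above can be sketched in a few lines of numpy; this is a minimal CUPED illustration on synthetic data, assuming a single pre-experiment covariate:

```python
import numpy as np

# CUPED: use each user's pre-experiment metric X as a control variate for the
# in-experiment metric Y.  Y_cuped = Y - theta * (X - mean(X)) keeps the mean
# of Y but strips out the variance that X explains.
rng = np.random.default_rng(0)
n = 10_000
pre = rng.normal(10.0, 2.0, size=n)               # past metric X
post = 0.8 * pre + rng.normal(0.0, 1.0, size=n)   # present metric Y

theta = np.cov(pre, post)[0, 1] / np.var(pre)     # optimal control-variate coef
post_cuped = post - theta * (pre - pre.mean())
```

Because the adjusted metric has the same mean but much lower variance, the same confidence interval is reached with a correspondingly smaller sample.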
Lecture slides presented at Northeastern University (December, 2020).
Learning to rank (LTR) for information retrieval (IR) involves the application of machine learning models to rank artifacts, such as webpages, in response to a user's need, which may be expressed as a query. LTR models typically employ training data, such as human relevance labels and click data, to discriminatively train towards an IR objective. The focus of this lecture will be on the fundamentals of neural networks and their applications to learning to rank.
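A pairwise (RankNet-style) update with a linear scorer gives the flavor of this discriminative training. The features and pair construction below are synthetic stand-ins for relevance-labeled document pairs, not the lecture's actual models:

```python
import numpy as np

# Pairwise LTR step: for a (relevant, irrelevant) pair, ascend the log
# probability that the relevant document outscores the irrelevant one
# under a linear scoring function s(x) = w . x.
def pair_step(w, x_rel, x_irr, lr=0.1):
    diff = x_rel - x_irr
    p = 1.0 / (1.0 + np.exp(-(w @ diff)))   # P(rel ranked above irr)
    w += lr * (1.0 - p) * diff              # gradient ascent on log P
    return w

rng = np.random.default_rng(0)
w = np.zeros(4)
# Synthetic pairs in which relevant docs have a larger first feature.
for _ in range(200):
    x_rel = rng.normal(size=4) + np.array([1.0, 0.0, 0.0, 0.0])
    x_irr = rng.normal(size=4)
    w = pair_step(w, x_rel, x_irr)
```

A neural LTR model replaces the linear scorer with a network but keeps the same pairwise objective.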
Keynote: Machine Learning for Design Automation at DAC 2018, by Manish Pandey
Manish Pandey gave a keynote talk on transforming EDA with machine learning and discussed opportunities and challenges. He described how machine learning can be applied across different design abstraction levels from formal verification to silicon engineering. Pandey also discussed using machine learning techniques like reinforcement learning and word embeddings to optimize formal verification, simulation, and mask synthesis. Finally, he outlined challenges with data availability and model development for machine learning in EDA.
Exploiting variable associations to configure efficient local search in large..., by Shunji Umetani
The document discusses exploiting variable associations to improve the efficiency of local search algorithms for solving large-scale set partitioning problems. It proposes a data mining approach to identify promising neighbor solutions by extracting useful features from problem instances, rather than just the problem formulation. An overview is provided of using this approach to configure efficient local search in large-scale set partitioning problems.
Learning to rank (LTR) for information retrieval (IR) involves the application of machine learning models to rank artifacts, such as items to be recommended, in response to a user's need. LTR models typically employ training data, such as human relevance labels and click data, to discriminatively train towards an IR objective. The focus of this tutorial will be on the fundamentals of neural networks and their applications to learning to rank.
Tutorial presented at ACM SIGIR/SIGKDD Africa Summer School on Machine Learning for Data Mining and Search (AFIRM 2020) conference in Cape Town, South Africa.
Dictionary Learning for Massive Matrix Factorization, by recsysfr
The document presents a new algorithm called Subsampled Online Dictionary Learning (SODL) for solving very large matrix factorization problems with missing values efficiently. SODL adapts an existing online dictionary learning algorithm to handle missing values by only using the known ratings for each user, allowing it to process large datasets with billions of ratings in linear time with respect to the number of known ratings. Experiments on movie rating datasets show that SODL achieves similar prediction accuracy as the fastest existing solver but with a speed up of up to 6.8 times on the largest Netflix dataset tested.
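SODL itself adapts online dictionary learning, but its core trick — updating factors from the known ratings only, so cost scales with observed entries rather than the full matrix — can be illustrated with plain SGD matrix factorization. Data and hyperparameters below are synthetic and illustrative:

```python
import numpy as np

# SGD matrix factorization that touches only the observed ratings.
rng = np.random.default_rng(0)
n_users, n_items, rank = 30, 20, 4
U_true = rng.normal(size=(n_users, rank))
V_true = rng.normal(size=(n_items, rank))
# Observe roughly half the entries of the rank-4 rating matrix.
ratings = [(u, i, U_true[u] @ V_true[i])
           for u in range(n_users) for i in range(n_items) if rng.random() < 0.5]

U = 0.1 * rng.normal(size=(n_users, rank))
V = 0.1 * rng.normal(size=(n_items, rank))
lr, reg = 0.02, 0.01
for _ in range(200):                     # epochs over the known entries only
    for u, i, r in ratings:
        err = r - U[u] @ V[i]
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])

rmse = np.sqrt(np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in ratings]))
```

Each pass costs O(#known ratings × rank), which is the linear-time property the abstract highlights.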
TextCNN is a deep learning model for text classification that uses an embedding layer, 1D convolution filters, and max pooling. It was applied to moderate user generated content (UGC) like reviews and Q&A for an e-commerce company. The new TextCNN models improved over old models, achieving higher ROC scores and allowing more UGC to be auto-accepted while maintaining precision. For reviews, the new model could auto-accept 30% more content while reducing bad approvals. For Q&A, it could auto-accept 377% more content while improving precision by 7%.
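The TextCNN forward pass can be shown in miniature — embed, convolve over time, max-pool per filter. The weights here are random rather than trained, and the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, n_filters, width = 100, 8, 4, 3

E = rng.normal(size=(vocab, emb_dim))             # embedding table
F = rng.normal(size=(n_filters, width, emb_dim))  # 1-D conv filters

def textcnn_features(token_ids):
    """Embed tokens, slide each width-3 filter over the sequence, then
    max-pool each filter over time, giving one feature per filter."""
    x = E[token_ids]                              # (seq_len, emb_dim)
    seq_len = len(token_ids)
    conv = np.array([
        [np.sum(F[f] * x[t:t + width]) for t in range(seq_len - width + 1)]
        for f in range(n_filters)
    ])                                            # (n_filters, positions)
    return conv.max(axis=1)                       # max-over-time pooling

feats = textcnn_features([5, 17, 42, 8, 99])
```

Max-over-time pooling is what makes the feature vector length independent of the input length, so reviews and Q&A of any size feed the same classifier head.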
The document discusses recommender systems and sequential recommendation problems. It covers several key points:
1) Matrix factorization and collaborative filtering techniques are commonly used to build recommender systems, but they have limitations, such as cold-start problems and difficulty incorporating additional constraints.
2) Sequential recommendation problems can be framed as multi-armed bandit problems, where past recommendations influence future recommendations.
3) Various bandit algorithms like UCB, Thompson sampling, and LinUCB can be applied, but extending guarantees to models like matrix factorization is challenging. Offline evaluation on real-world datasets is important.
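As a rough sketch of the bandit framing in points 2) and 3), here is UCB1 on synthetic item reward rates (the rates, horizon, and names are illustrative, not from the document):

```python
import math
import random

# UCB1: play each arm once, then always play the arm maximising
# (mean reward estimate) + sqrt(2 ln t / n_pulls).
def ucb1(pull, n_arms, horizon):
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                  # initialisation: one pull per arm
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

random.seed(0)
rates = [0.1, 0.2, 0.8]                  # synthetic reward rate per item
counts = ucb1(lambda a: 1.0 if random.random() < rates[a] else 0.0,
              n_arms=3, horizon=2000)
```

The exploration bonus shrinks as an arm accumulates pulls, so play concentrates on the best item while suboptimal arms are pulled only logarithmically often; LinUCB replaces the per-arm mean with a linear model over context features.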
Deep learning techniques can be used to learn features from data rather than relying on hand-crafted features. This allows neural networks to be applied to problems in computer vision, natural language processing, and other domains. Transfer learning techniques take advantage of features learned from one task and apply them to another related task, even when limited data is available for the second task. Deploying machine learning models in production requires techniques for serving predictions through scalable APIs and caching layers to meet performance requirements.
We present a deep architecture for dense semantic correspondence, called pyramidal affine regression networks (PARN), that estimates locally-varying affine transformation fields across images. To deal with the intra-class appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model in which affine transformation fields are progressively estimated in a coarse-to-fine manner, so that the smoothness constraint is naturally imposed within deep networks. PARN estimates residual affine transformations at each level and composes them to estimate the final affine transformations. Furthermore, to overcome the limitations of insufficient training data for semantic correspondence, we propose a novel weakly-supervised training scheme that generates progressive supervision by leveraging a correspondence consistency across image pairs. Our method is fully learnable in an end-to-end manner and does not require quantizing the infinite continuous affine transformation fields.
[PR-325] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Tran..., by Sunghoon Joo
PR-325: Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
paper link: https://arxiv.org/abs/2004.00849
youtube link: https://youtu.be/Kgh88DLHHTo
Deep Convolutional GANs - meaning of latent space, by Hansol Kang
DCGAN does not merely apply conv nets to GANs; it also finds meaning in the latent space.
A review of the DCGAN paper and a PyTorch-based implementation.
A review of issues raised in the VAE seminar.
my github : https://github.com/messy-snail/GAN_PyTorch
[참고]
https://github.com/znxlwm/pytorch-MNIST-CelebA-GAN-DCGAN
https://github.com/taeoh-kim/Pytorch_DCGAN
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
[RSS2023] Local Object Crop Collision Network for Efficient SimulationDongwonSon1
The document proposes a Local Object Crop Collision Network (LOCC) to efficiently simulate contact between non-convex objects in GPU-based simulators. LOCC uses a neural network to detect collisions by encoding only local crops where collisions occur, leveraging the observation that locally, shapes are similar regardless of global differences. This allows for constant online computation time compared to traditional convex decomposition methods. The LOCC model combined with the Brax physics engine (LOCC-Brax) was shown to be 10x faster than Isaac Gym for simulating 30,000 environments and achieved over 96% accuracy on various test objects, demonstrating improved efficiency and generalization over traditional methods.
A novel deep-learning enabled, metric learning based recommendation model that significantly improves the state-of-the-art recommendation accuracy for a large number of recommendation tasks (news, books, photography, music, etc.)
Incremental collaborative filtering via evolutionary co clusteringAllen Wu
This document summarizes an incremental collaborative filtering approach via evolutionary co-clustering. It introduces incremental collaborative filtering and discusses existing approaches. It then proposes an incremental evolutionary co-clustering method that assigns new users and items to clusters during the online phase to make more accurate predictions. The method uses an ensemble of co-clustering solutions and an evolutionary algorithm to improve performance. Experimental results on a movie rating dataset show the proposed approach achieves better accuracy than other incremental collaborative filtering methods.
The document summarizes how transformers can be applied to different domains beyond natural language processing (NLP), including vision, graphs, and others. It discusses how vision transformers (ViT) apply the transformer model to images by defining image patches as tokens. For graphs, graph transformers (GT) aim to encode global graph structure using positional encodings based on graph Laplacian eigenvectors. While transformers have been highly successful for NLP, properly adapting them to other domains requires injecting appropriate inductive biases. Vision transformers have shown promise but graph transformers still need more justification. Overall, transformers benefit NLP the most, followed by vision and graphs.
Transformer Architectures in Vision
[2018 ICML] Image Transformer
[2019 CVPR] Video Action Transformer Network
[2020 ECCV] End-to-End Object Detection with Transformers
[2021 ICLR] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2007 03-16 modeling and static analysis of complex biological systems dsrDebora Da Rosa
This document discusses modeling complex biological systems using static analysis and the Star Ambients formal language. It begins by providing context on biological modeling challenges in the 21st century due to vast amounts of data. It then introduces static analysis as an alternative to dynamic analysis to address state space explosion issues. The document outlines the Star Ambients language, static analysis methodology using the Succinct Solver, and applications in systems biology modeling pathways and interactions. It proposes a model-driven engineering approach using Star Ambients and automatic translation from UML diagrams. The work differs from prior work by providing a new programming paradigm and mechanisms to obtain precise analysis results with low computational cost.
Brains rely on spiking neural networks for ultra-low-power information processing. Building artificial intelligence with similar efficiency requires learning algorithms to instantiate complex spiking neural networks and brain-inspired neuromorphic hardware to emulate them efficiently. Toward this end, I will briefly introduce surrogate gradients as a general framework for training spiking neural networks and showcase their robustness and self-calibration capabilities on analog neuromorphic hardware. Drawing further inspiration from biology, I will discuss the impact of homeostatic plasticity and network initialization in the excitatory-inhibitory balanced regime on deep spiking neural network training. Finally, I will show how approximations relate surrogate gradients to biologically plausible online learning rules with a minor impact on their effectiveness.
Semi-supervised concept detection by learning the structure of similarity graphsSymeon Papadopoulos
This document proposes a semi-supervised concept detection approach based on graph structure features (GSF) extracted from image similarity graphs. GSF represents images as vectors based on eigenvectors of the graph Laplacian. Two incremental learning schemes are developed to address computational issues. Experiments on synthetic and MIR-Flickr datasets show the approach achieves performance comparable or better than state-of-the-art methods, and benefits from adding unlabeled data. The approach provides an efficient and scalable solution for concept detection in large multimedia collections.
This document summarizes Deep Q-Networks (DQN), a deep reinforcement learning algorithm that was able to achieve human-level performance on many Atari 2600 games. The key ideas of DQN include using a deep neural network to approximate the Q-function, experience replay to increase data efficiency, and a separate target network to stabilize learning. DQN has inspired many follow up algorithms, including double DQN, dueling DQN, prioritized experience replay, and noisy networks for better exploration. DQN was able to learn human-level policies directly from pixels and rewards for many Atari games using the same hyperparameters and network architecture.
A Benchmark for Simulated ManipulationJack Collins
This document proposes a benchmark for validating simulations of robotic manipulation against real-world experiments. The benchmark would include protocols for setting up real and simulated experiments in a standardized way and collecting state data, metrics for comparing the two, and example manipulation tasks that could expose flaws in a simulator. The benchmark is intended to be useful for the manipulation research community and developers of simulation engines and physics models by providing a standardized way to evaluate how well simulations match reality.
Query dependent ranking using k nearest neighboriyo
The document summarizes a paper on query dependent ranking using k-nearest neighbor algorithms. It introduces the motivation for query dependent ranking rather than a single ranking model. It then outlines the k-nearest neighbor algorithm, experiments comparing it to baselines on two datasets, and conclusions on improving efficiency and exploring other features and metrics.
This document discusses using machine learning to objectively assess quality of experience (QoE). It begins with a brief introduction to machine learning and outlines the steps to set up an ML-based objective metric: defining the feature space, selecting an ML paradigm, and robust model selection and testing. It then provides an example of using features related to image structure and color to select an algorithm for image restoration. The document concludes with a SWOT analysis of using machine learning for objective QoE assessment.
do adversarially robust image net models transfer betterLEE HOSEONG
The document discusses an experiment comparing the transfer learning performance of standard ImageNet models versus adversarially robust ImageNet models. The experiment finds that robust models consistently match or outperform standard models on a variety of downstream transfer learning tasks, despite having lower accuracy on ImageNet. Further analysis shows robust models improve with increased width and that the optimal level of robustness depends on properties of the downstream task like dataset granularity. Overall, the findings suggest adversarially robust models transfer learned representations better than standard models.
The document introduces various computer vision topics including convolutional neural networks, popular CNN architectures, data augmentation, transfer learning, object detection, neural style transfer, generative adversarial networks, and variational autoencoders. It provides overviews of each topic and discusses concepts such as how convolutions work, common CNN architectures like ResNet and VGG, why data augmentation is important, how transfer learning can utilize pre-trained models, how object detection algorithms like YOLO work, the content and style losses used in neural style transfer, how GANs use generators and discriminators, and how VAEs describe images with probability distributions. The document aims to discuss these topics at a practical level and provide insights through examples.
Similar to Recommender Systems Tutorial (Part 2) -- Offline Components (20)
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
2. Problem
Algorithm selects item j with item features x_j (keywords, content categories, ...)
User i visits, with user features x_i (demographics, browse history, search history, ...)
(i, j): response y_ij (explicit rating, implicit click/no-click)
Goal: predict the unobserved entries based on features and the observed entries
Deepak Agarwal & Bee-Chung Chen @ ICML’11 2
3. Model Choices
• Feature-based (or content-based) approach
– Use features to predict response
• (regression, Bayes Net, mixture models, …)
– Limitation: need predictive features
• Bias often high, does not capture signals at granular levels
• Collaborative filtering (CF, aka memory-based)
– Make recommendations based on past user-item interactions
• User-user, item-item, matrix factorization, …
• See [Adomavicius & Tuzhilin, TKDE, 2005], [Konstan, SIGMOD’08
Tutorial], etc.
– Better performance for old users and old items
– Does not naturally handle new users and new items (cold-start)
4. Collaborative Filtering (Memory based methods)
User-user similarity; item-item similarity; methods incorporating both
Estimating similarities
• Pearson’s correlation
• Optimization based (Koren et al.)
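To make the similarity step concrete, here is a minimal sketch (illustrative only; the toy matrix and function name are not from the tutorial) of item-item Pearson correlation computed over co-rating users:

```python
import numpy as np

def pearson_item_sim(R, i, j):
    """Pearson correlation between items i and j, over users who rated both.

    R is a (num_users x num_items) matrix with np.nan marking unobserved
    entries -- a toy dense layout; real systems would use sparse storage.
    """
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])   # users who rated both items
    if mask.sum() < 2:
        return 0.0                                   # too few co-raters to estimate
    ri = R[mask, i] - R[mask, i].mean()
    rj = R[mask, j] - R[mask, j].mean()
    denom = np.sqrt((ri ** 2).sum() * (rj ** 2).sum())
    return float((ri * rj).sum() / denom) if denom > 0 else 0.0

R = np.array([[5, 4, np.nan],
              [4, 5, 1],
              [1, 2, 5],
              [2, 1, 4.0]])
print(pearson_item_sim(R, 0, 1))  # → 0.8 (items 0 and 1 are rated similarly)
```

Such similarities then feed a weighted average over neighboring items (or users) to produce a predicted rating.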
5. How to Deal with the Cold-Start Problem
• Heuristic-based approaches
– Linear combination of regression and CF models
– Filterbot
• Add user features as pseudo-users and do collaborative filtering
- Hybrid approaches
- Use content based to fill up entries, then use CF
• Matrix Factorization
– Good performance on Netflix (Koren, 2009)
• Model-based approaches
– Bilinear random-effects model (probabilistic matrix factorization)
• Good on Netflix data [Salakhutdinov & Mnih, 2008]
– Add feature-based regression to matrix factorization
• (Agarwal and Chen, 2009)
– Add topic discovery (from textual items) to matrix factorization
• (Agarwal and Chen, 2009; Chun and Blei, 2011)
6. Per-item regression models
• When tracking users by cookies, the distribution of visit patterns
can get extremely skewed
– Majority of cookies have 1-2 visits
• Per-item models (regression) based on user covariates are
attractive in such cases
7. Several per-item regressions: Multi-task learning
• Agarwal, Chen and Elango, KDD, 2010
• Low dimension (5-10); B estimated from retrospective data
• Affinity to old items
9. Motivation
• Data measuring k-way interactions pervasive
– Consider k = 2 for all our discussions
• E.g. User-Movie, User-content, User-Publisher-Ads,….
– Power law on both user and item degrees
• Classical Techniques
– Approximate matrix through a singular value decomposition (SVD)
• After adjusting for marginal effects (user pop, movie pop,..)
– Does not work
• Matrix highly incomplete, severe over-fitting
– Key issue
• Regularization of eigenvectors (factors) to avoid overfitting
10. Early work on complete matrices
• Tukey’s 1-df model (1956)
– Rank 1 approximation of small nearly complete matrix
• Criss-cross regression (Gabriel, 1978)
• Incomplete matrices: Psychometrics (1-factor model only;
small data sets; 1960s)
• Modern day recommender problems
– Highly incomplete, large, noisy.
11. Latent Factor Models
[Figure: users and items as points in a latent space with “sporty” and “newsy” dimensions;
user-item affinity = u′v; item-item affinity = s′z]
12. Factorization – Brief Overview
• Latent user factors: (αi, ui = (ui1, …, uin))
• Latent movie factors: (βj, vj = (vj1, …, vjm))
• Interaction: E(yij) = µ + αi + βj + ui′ B vj
• Key technical issue: regularization
– (Nn + Mm) parameters will overfit for moderate values of n, m
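As a sketch of how the interaction equation is evaluated (the dimensions, hyperparameters, and all names here are illustrative assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, n, m = 100, 50, 5, 5            # users, movies, factor dimensions (toy sizes)
mu = 3.0                              # global mean
alpha = rng.normal(0, 0.3, N)         # per-user offsets alpha_i
beta = rng.normal(0, 0.3, M)          # per-movie offsets beta_j
U = rng.normal(0, 0.1, (N, n))        # latent user factors u_i
V = rng.normal(0, 0.1, (M, m))        # latent movie factors v_j
B = np.eye(n, m)                      # interaction matrix; identity = plain dot product

def predict(i, j):
    """E(y_ij) = mu + alpha_i + beta_j + u_i' B v_j."""
    return mu + alpha[i] + beta[j] + U[i] @ B @ V[j]

# the parameter count the slide warns about: (Nn + Mm) latent values
print(N * n + M * m)  # → 750
```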
13. Latent Factor Models: Different Aspects
• Matrix Factorization
– Factors in Euclidean space
– Factors on the simplex
• Incorporating features and ratings simultaneously
• Online updates
14. Maximum Margin Matrix Factorization (MMMF)
• Complete matrix by minimizing loss (hinge,squared-error)
on observed entries subject to constraints on trace norm
– Srebro, Rennie, Jaakkola (NIPS 2004)
• Convex, Semi-definite programming (expensive, not scalable)
• Fast MMMF (Rennie & Srebro, ICML, 2005)
– Constrain the Frobenius norm of the left and right eigenvector
matrices; not convex, but becomes scalable
• Other variation: Ensemble MMMF (DeCoste, ICML2005)
– Ensembles of partially trained MMMF (some improvements)
15. Matrix Factorization for Netflix prize data
• Minimize the objective function
∑_{ij∈obs} (rij − ui′vj)^2 + λ ( ∑_i ||ui||^2 + ∑_j ||vj||^2 )
• Simon Funk: Stochastic Gradient Descent
• Koren et al. (KDD 2007): Alternating Least Squares
– They move to SGD later in the competition
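A minimal SGD sketch of this objective on toy data (sizes, hyperparameters, and the data itself are made up for illustration; Simon Funk's actual implementation differed in details):

```python
import numpy as np

def sgd_mf(ratings, num_users, num_items, k=8, lam=0.02, lr=0.05, epochs=300, seed=0):
    """Minimize  sum_{(i,j) in obs} (r_ij - u_i'v_j)^2
               + lam * (sum_i ||u_i||^2 + sum_j ||v_j||^2)  by SGD."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((num_users, k))
    V = 0.1 * rng.standard_normal((num_items, k))
    for _ in range(epochs):
        rng.shuffle(ratings)                     # visit observed entries in random order
        for i, j, r in ratings:
            i, j = int(i), int(j)
            err = r - U[i] @ V[j]
            ui = U[i].copy()                     # use the old u_i in v_j's update
            U[i] += lr * (err * V[j] - lam * U[i])
            V[j] += lr * (err * ui - lam * V[j])
    return U, V

# toy data: (user, item, rating) triples for 3 users and 3 items
obs = np.array([[0, 0, 5.0], [0, 1, 4.0], [1, 0, 4.0],
                [1, 2, 1.0], [2, 1, 2.0], [2, 2, 5.0]])
U, V = sgd_mf(obs.copy(), 3, 3)
mse = np.mean([(r - U[int(i)] @ V[int(j)]) ** 2 for i, j, r in obs])
print(mse)  # small training error on the observed entries
```

Each observed rating contributes one gradient step on its u_i and v_j; the λ terms shrink the factors toward zero, which is exactly the regularization the earlier slides flag as the key technical issue.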
16. Probabilistic Matrix Factorization
(Salakhutdinov & Mnih, NIPS 2008)
• Model:
rij ~ N(ui′vj, σ^2)
ui ~ MVN(0, au I)
vj ~ MVN(0, av I)
• Optimization is through gradient descent (iterated conditional modes)
• Other variations: constraining the mean through a sigmoid, using “who-rated-whom”
• Combining with Boltzmann machines also improved performance
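The generative model on this slide can be simulated directly; a short sketch (the values of a_u, a_v, σ and the sizes are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
N, M, k = 20, 15, 3                 # users, movies, factor dimension (toy sizes)
a_u, a_v, sigma = 1.0, 1.0, 0.5     # prior variances and noise level (arbitrary)
U = rng.multivariate_normal(np.zeros(k), a_u * np.eye(k), size=N)  # u_i ~ MVN(0, a_u I)
V = rng.multivariate_normal(np.zeros(k), a_v * np.eye(k), size=M)  # v_j ~ MVN(0, a_v I)
R = U @ V.T + sigma * rng.standard_normal((N, M))  # r_ij ~ N(u_i' v_j, sigma^2)
print(R.shape)  # → (20, 15)
```

Note that MAP estimation of U and V under these Gaussian priors recovers the L2-regularized least-squares objective of the previous slide, with the regularization weight determined by σ^2 and the prior variances.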
17. Bayesian Probabilistic Matrix Factorization
(Ruslan and Minh, ICML 2008)
• Fully Bayesian treatment using an MCMC approach
• Significant improvement
• Interpretation as a fully Bayesian hierarchical model
shows why that is the case
– Failing to incorporate uncertainty leads to bias in estimates
– Multi-modal posterior, MCMC helps in converging to a better one
– MCEM is also more resistant to over-fitting
18. Non-parametric Bayesian matrix completion
(Zhou et al, SAM, 2010)
• Specify rank probabilistically (automatic rank selection)
  yij ~ N( Σ_{k=1..r} zk uik vjk , σ² )
  zk ~ Ber(πk)
  πk ~ Beta(a/r, b(r−1)/r)
  Marginally, zk ~ Ber( a / (a + b(r−1)) )
  E(#Factors) = r·a / (a + b(r−1))
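Integrating out πk gives the marginal zk ~ Ber(a / (a + b(r−1))), hence the expected number of active factors above. A quick Monte Carlo check of this identity (a toy sketch; the parameter values are arbitrary):

```python
import numpy as np

def expected_num_factors(r, a, b):
    # Closed form: each z_k is marginally Ber(a / (a + b*(r-1)))
    return r * a / (a + b * (r - 1))

def mc_num_factors(r, a, b, n_draws=200_000, seed=0):
    # Sample pi_k ~ Beta(a/r, b(r-1)/r), then z_k ~ Ber(pi_k),
    # and average the number of active factors across draws
    rng = np.random.default_rng(seed)
    pi = rng.beta(a / r, b * (r - 1) / r, size=(n_draws, r))
    z = rng.random((n_draws, r)) < pi
    return float(z.sum(axis=1).mean())
```

The prior thus performs automatic rank selection: the maximum rank r is generous, but only a few zk switch on in expectation.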
19. How to incorporate features:
Deal with both warm start and cold-start
• Models to predict ratings for new pairs
– Warm-start: (user, movie) present in the training data with
large sample size
– Cold-start: At least one of (user, movie) new or has small
sample size
• These are rough definitions; warm-start/cold-start is a continuum.
• Challenges
– Highly incomplete (user, movie) matrix
– Heavy tailed degree distributions for users/movies
• Large fraction of ratings from small fraction of users/movies
– Handling both warm-start and cold-start effectively in the
presence of predictive features
20. Possible approaches
• Large scale regression based on covariates
– Does not provide good estimates for heavy users/movies
– Large number of predictors to estimate interactions
• Collaborative filtering
– Neighborhood based
– Factorization
• Good for warm-start; cold-start dealt with separately
• Single model that handles cold-start and warm-start
– Heavy users/movies → User/movie specific model
– Light users/movies → fallback on regression model
– Smooth fallback mechanism for good performance
22. Regression-based Latent Factor Model (RLFM)
• Main idea: Flexible prior, predict factors through
regressions
• Seamlessly handles cold-start and warm-start
• Modified state equation to incorporate covariates
23. RLFM: Model
• Rating (user i gives item j):
    yij ~ N(µij, σ²)         Gaussian model
    yij ~ Bernoulli(µij)     Logistic model (for binary ratings)
    yij ~ Poisson(µij Nij)   Poisson model (for counts)
  t(µij) = xij′b + αi + βj + ui′vj
• Bias of user i:        αi = g0′xi + εiα,  εiα ~ N(0, σα²)
• Popularity of item j:  βj = d0′xj + εjβ,  εjβ ~ N(0, σβ²)
• Factors of user i:     ui = G xi + εiu,   εiu ~ N(0, σu² I)
• Factors of item j:     vj = D xj + εjv,   εjv ~ N(0, σv² I)
Could use other classes of regression models
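The regression priors are what make the model seamless: for a warm-start user/item we use the posterior factors, while for a cold-start user/item the factors fall back to their regression-prior means. A minimal sketch of prediction, assuming the Gaussian model with identity link t(µ) = µ; all function and argument names are illustrative.

```python
import numpy as np

def rlfm_predict(x_ij, x_i, x_j, b, g0, d0, G, D,
                 alpha_i=None, beta_j=None, u_i=None, v_j=None):
    """mu_ij = x_ij'b + alpha_i + beta_j + u_i'v_j; any factor not
    estimated from data (cold start) falls back to its prior mean."""
    alpha = alpha_i if alpha_i is not None else g0 @ x_i  # prior mean g0'x_i
    beta  = beta_j  if beta_j  is not None else d0 @ x_j  # prior mean d0'x_j
    u     = u_i     if u_i     is not None else G @ x_i   # prior mean G x_i
    v     = v_j     if v_j     is not None else D @ x_j   # prior mean D x_j
    return x_ij @ b + alpha + beta + u @ v
```

With no ratings at all, the prediction reduces to a pure feature-based regression; as data accumulates, the posterior factors take over smoothly.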
25. Advantages of RLFM
• Better regularization of factors
– Covariates “shrink” towards a better centroid
• Cold-start: Fallback regression model (FeatureOnly)
26. RLFM: Illustration of Shrinkage
[Figure: the first factor value plotted for each user, fitted using Yahoo! FP data]
27. Induced correlations among observations
Hierarchical random-effects model
Marginal distribution obtained by
integrating out random effects
29. Model fitting: EM for our class of models
30. The parameters for RLFM
• Latent parameters
Δ = ({αi }, {β j }, {ui }, {v j })
• Hyper-parameters
Θ = (b, G, D, Au = au I, Av = av I)
33. Computing the E-step
• Often hard to compute in closed form
• Stochastic EM (Monte Carlo EM; MCEM)
  – Compute the expectation by drawing samples from the posterior
– Effective for multi-modal posteriors but more expensive
• Iterated Conditional Modes algorithm (ICM)
– Faster but biased hyper-parameter estimates
34. Monte Carlo E-step
• Through a vanilla Gibbs sampler (conditionals closed form)
• Other conditionals also Gaussian and closed form
• Conditionals of users (movies) sampled simultaneously
• Small number of samples in early iterations, large numbers
in later iterations
35. M-step (Why MCEM is better than ICM)
• Update G, optimize
• Update Au=au I
  – Ignored by ICM, which underestimates factor variability
  – Factors get over-shrunk; the posterior is not explored well
37. Experiment 2: Better handling of Cold-start
• MovieLens-1M; EachMovie
• Training-test split based on timestamp
• Same covariates as in Experiment 1.
38. Experiment 4: Predicting click-rate on articles
• Goal: Predict click-rate on articles for a user on F1 position
• Article lifetimes short, dynamic updates important
• User covariates:
– Age, Gender, Geo, Browse behavior
• Article covariates
– Content Category, keywords
• 2M ratings, 30K users, 4.5K articles
39. Results on Y! FP data
40. Some other related approaches
• Stern, Herbrich and Graepel, WWW, 2009
– Similar to RLFM, different parametrization and expectation
propagation used to fit the models
• Porteus, Asuncion and Welling, AAAI, 2011
– Non-parametric approach using a Dirichlet process
• Agarwal, Zhang and Mazumdar, Annals of Applied
Statistics, 2011
– Regression + random effects per user regularized through a
Graphical Lasso
41. Add Topic Discovery into
Matrix Factorization
fLDA: Matrix Factorization through Latent Dirichlet Allocation
42. fLDA: Introduction
• Model the rating yij that user i gives to item j as the user’s
  affinity to the topics that the item has
    yij = … + Σk sik z̄jk
  where sik = user i’s affinity to topic k, and z̄jk = Pr(item j has topic k),
  estimated by averaging the LDA topics of the words in item j
  – Old items: z̄jk’s are item latent factors learnt from data with the LDA prior
  – New items: z̄jk’s are predicted based on the bag of words in the item
  – Unlike regular unsupervised LDA topic modeling, here the LDA
    topics are learnt in a supervised manner based on past rating data
  – fLDA can be thought of as a “multi-task learning” version of the
    supervised LDA model [Blei’07] for cold-start recommendation
43. LDA Topic Modeling (1)
• LDA is effective for unsupervised topic discovery [Blei’03]
– It models the generating process of a corpus of items (articles)
– For each topic k, draw a word distribution Φk = [Φk1, …, ΦkW] ~ Dir(η)
– For each item j, draw a topic distribution θj = [θj1, …, θjK] ~ Dir(λ)
– For each word, say the nth word, in item j,
• Draw a topic zjn for that word from θj = [θj1, …, θjK]
• Draw a word wjn from Φk = [Φk1, …, ΦkW] with topic k = zjn
[Figure: item j’s topic distribution θj = [θj1, …, θjK] generates the
per-word topics zj1, …, zjn, …; each observed word wjn is drawn from the
word distribution Φk = [Φk1, …, ΦkW] of its topic k = zjn, for topics 1, …, K]
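The generative process above can be sketched directly. This is a toy sampler of a synthetic corpus, not a training algorithm; K, W and the hyper-parameter values are arbitrary.

```python
import numpy as np

def lda_generate(n_items, n_words, K, W, eta=0.1, lam=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process."""
    rng = np.random.default_rng(seed)
    # For each topic k, a word distribution Phi_k ~ Dir(eta)
    phi = rng.dirichlet(np.full(W, eta), size=K)
    corpus, topic_assignments = [], []
    for _ in range(n_items):
        # For each item j, a topic distribution theta_j ~ Dir(lam)
        theta = rng.dirichlet(np.full(K, lam))
        # For each word: draw topic z_jn from theta_j, then w_jn from Phi_{z_jn}
        z = rng.choice(K, size=n_words, p=theta)
        w = np.array([rng.choice(W, p=phi[k]) for k in z])
        corpus.append(w)
        topic_assignments.append(z)
    return phi, corpus, topic_assignments
```

Training inverts this process: given only the words, infer Φ and the per-word topic assignments.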
44. LDA Topic Modeling (2)
• Model training:
– Estimate the prior parameters and the posterior topic×word
distribution Φ based on a training corpus of items
– EM + Gibbs sampling is a popular method
• Inference for new items
– Compute the item topic distribution based on the prior parameters
and Φ estimated in the training phase
• Supervised LDA [Blei’07]
– Predict a target value for each item based on supervised LDA
topics
  Supervised LDA (a single regression):  yj = Σk sk z̄jk
  fLDA (one regression per user, with the same set of topics across
  regressions):  yij = … + Σk sik z̄jk
  where sk = regression weight for topic k, yj = target value of item j,
  and z̄jk = Pr(item j has topic k), estimated by averaging the topic of
  each word in item j
45. fLDA: Model
• Rating (user i gives item j):
    yij ~ N(µij, σ²)         Gaussian model
    yij ~ Bernoulli(µij)     Logistic model (for binary ratings)
    yij ~ Poisson(µij Nij)   Poisson model (for counts)
  t(µij) = xij′b + αi + βj + Σk sik z̄jk
• Bias of user i:        αi = g0′xi + εiα,  εiα ~ N(0, σα²)
• Popularity of item j:  βj = d0′xj + εjβ,  εjβ ~ N(0, σβ²)
• Topic affinity of user i:  si = H xi + εis,  εis ~ N(0, σs² I)
• Pr(item j has topic k):  z̄jk = Σn 1(zjn = k) / (# words in item j),
  where zjn is the LDA topic of the nth word in item j
• Observed words:  wjn ~ LDA(λ, η, zjn), where wjn is the nth word in item j
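The empirical topic fractions z̄jk above are just normalized counts of the per-word assignments; a one-liner makes the definition concrete (names illustrative):

```python
import numpy as np

def mean_topic_fractions(z_j, K):
    """z_bar_jk = (# words in item j assigned to topic k) / (# words in item j).
    z_j is the array of per-word topic assignments for item j."""
    return np.bincount(z_j, minlength=K) / len(z_j)
```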
46. Model Fitting
• Given:
– Features X = {xi, xj, xij}
– Observed ratings y = {yij} and words w = {wjn}
• Estimate:
– Parameters: Θ = [b, g0, d0, H, σ², aα, aβ, As, λ, η]
• Regression weights and prior parameters
– Latent factors: ∆ = {αi, βj, si} and z = {zjn}
• User factors, item factors and per-word topic assignment
• Empirical Bayes approach:
  – Maximum likelihood estimate of the parameters:
      Θ̂ = argmaxΘ Pr[y, w | Θ] = argmaxΘ ∫ Pr[y, w, ∆, z | Θ] d∆ dz
  – The posterior distribution of the factors:
      Pr[∆, z | y, Θ̂]
47. The EM Algorithm
• Iterate through the E and M steps until convergence
  – Let Θ̂(n) be the current estimate
  – E-step: Compute
      f(Θ) = E_{(∆, z | y, w, Θ̂(n))} [ log Pr(y, w, ∆, z | Θ) ]
    • The expectation is not in closed form
    • We draw Gibbs samples and compute the Monte Carlo mean
  – M-step: Find
      Θ̂(n+1) = argmaxΘ f(Θ)
    • It consists of solving a number of regression and optimization problems
48. Supervised Topic Assignment
Conditional for zjn, the topic of the nth word in item j:
  Pr(zjn = k | Rest) ∝ [ (Z¬jn(k, wjn) + η) / (Z¬jn(k) + Wη) ]
                       · (Z¬jn(j, k) + λ)
                       · ∏_{i rated j} f(yij | zjn = k)
where Z¬jn(k, w) = # times word w is assigned to topic k, Z¬jn(k) = # words
assigned to topic k, and Z¬jn(j, k) = # words in item j assigned to topic k,
all counts excluding word (j, n). The first two factors are the same as in
unsupervised LDA; the last factor is the likelihood of the observed ratings
by users who rated item j when zjn is set to topic k.
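This conditional is usually computed in log space for numerical stability. A hedged sketch: the count arrays and the rating log-likelihood term are assumed to be maintained by the caller (with word (j, n) already excluded from all counts), and normalization is over the K topics.

```python
import numpy as np

def supervised_topic_probs(Zkw, Zk, Zjk, eta, lam, W, rating_loglik):
    """Normalized Gibbs conditional Pr(z_jn = k | Rest) over topics k.
    Zkw[k]: count of this word assigned to topic k (excluding word (j, n))
    Zk[k] : total words assigned to topic k (excluding word (j, n))
    Zjk[k]: words in item j assigned to topic k (excluding word (j, n))
    rating_loglik[k]: sum of log f(y_ij | z_jn = k) over users i who rated j
    """
    log_p = (np.log(Zkw + eta) - np.log(Zk + W * eta)
             + np.log(Zjk + lam) + rating_loglik)
    log_p -= log_p.max()          # avoid overflow before exponentiating
    p = np.exp(log_p)
    return p / p.sum()
```

Setting `rating_loglik` to zeros recovers the unsupervised LDA conditional; the rating term tilts the assignment toward topics that explain the observed ratings.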
49. fLDA: Experimental Results (Movie)
• Task: Predict the rating that a user would give a movie
• Training/test split:
  – Sort observations by time
  – First 75% → training data; last 25% → test data
• Item warm-start scenario
  – Only 2% new items in test data

  Model          Test RMSE
  RLFM           0.9363
  fLDA           0.9381
  Factor-Only    0.9422
  FilterBot      0.9517
  unsup-LDA      0.9520
  MostPopular    0.9726
  Feature-Only   1.0906
  Constant       1.1190

fLDA is as strong as the best method; it does not reduce performance
in warm-start scenarios.
50. fLDA: Experimental Results (Yahoo! Buzz)
• Task: Predict whether a user would buzz-up an article
• Severe item cold-start
– All items are new in test data
fLDA significantly outperforms other models
Data statistics: 1.2M observations, 4K users, 10K articles
51. Experimental Results: Buzzing Topics
About 3/4 of the topics are interpretable; about 1/2 are similar to unsupervised topics
Top Terms (after stemming) Topic
bush, tortur, interrog, terror, administr, CIA, offici, CIA interrogation
suspect, releas, investig, georg, memo, al
mexico, flu, pirat, swine, drug, ship, somali, border, Swine flu
mexican, hostag, offici, somalia, captain
NFL, player, team, suleman, game, nadya, star, high, NFL games
octuplet, nadya_suleman, michael, week
court, gai, marriag, suprem, right, judg, rule, sex, pope, Gay marriage
supreme_court, appeal, ban, legal, allow
palin, republican, parti, obama, limbaugh, sarah, rush, Sarah Palin
gop, presid, sarah_palin, sai, gov, alaska
idol, american, night, star, look, michel, win, dress, American idol
susan, danc, judg, boyl, michelle_obama
economi, recess, job, percent, econom, bank, expect, Recession
rate, jobless, year, unemploy, month
north, korea, china, north_korea, launch, nuclear, North Korea issues
rocket, missil, south, said, russia
52. fLDA Summary
• fLDA is a useful model for cold-start item recommendation
• It also provides interpretable recommendations for users
– User’s preference to interpretable LDA topics
• Future directions:
– Investigate Gibbs sampling chains and the convergence properties of the
EM algorithm
– Apply fLDA to other multi-task prediction problems
• fLDA can be used as a tool to generate supervised features
(topics) from text data
53. Summary
• Regularizing factors through covariates is effective
• A regression-based factor model that regularizes better and seamlessly
  handles both cold-start and warm-start in a single framework looks
  attractive
• The fitting method is scalable; Gibbs sampling for users and
movies can be done in parallel. Regressions in M-step can
be done with any off-the-shelf scalable linear regression
routine
• Distributed computing on Hadoop: Multiple models and
average across partitions
54. Hierarchical smoothing
Advertising application, Agarwal et al. KDD 10
• Product of states for each node pair z
[Figure: two hierarchies — an advertiser-side hierarchy (Advertiser →
campaign → Ad-id) and a publisher-side hierarchy (pubclass → Pub-id) —
with conversion events (Conv-id); z denotes a node pair]
• Spike-and-slab prior (Sz, Ez, λz)
  – Known to encourage parsimonious solutions
    • Several cell states have no corrections
  – Not used before for multi-hierarchy models, only in regression
  – We choose P = .5 (and choose “a” by cross-validation)
    • a = pseudo number of successes
Editor's Notes
During time interval t: users visit the webpage, and we have some item
inventory at our disposal. Decision problem: for each visit, select an item
to display; each display generates a response. Goal: choose items to
maximize expected clicks.