This presentation explains why the initialization of matrix factorization methods matters and proposes a novel initialization method, coined SimFactor, which revolves around a similarity-preserving dimensionality reduction technique. Context-based initialization is introduced as well.
Like most of my recommender-systems research, this presentation focuses on implicit feedback (the case where user preferences are not coded explicitly in the data).
Originally presented at the 2nd workshop on Context-awareness in Retrieval and Recommendations (CaRR 2012) in Lisbon.
Slides from CARR 2012 WS - Enhancing Matrix Factorization Through Initializat... (Domonkos Tikk)
Executive summary: The paper proposes a new method that uses metadata to mitigate the cold-start problem of matrix factorization. The method works in the realistic implicit feedback scenario. With smart initialization of the feature matrices, better performance was achieved on several data sets.
Paper abstract: The implicit feedback based recommendation problem (when only the user history is available but there are no ratings) is a much harder task than the explicit feedback based recommendation problem, due to the inherent uncertainty of the interpretation of such user feedback. Still, this practically important recommendation task has received less attention, and therefore there are only a few common implicit feedback based algorithms and benchmark datasets. This paper focuses on a common matrix factorization method for the implicit problem and investigates whether recommendation performance can be improved by appropriate initialization of the feature vectors before training. We present a general initialization framework that preserves the similarity between entities (users/items) when creating the initial feature vectors, where similarity is defined using e.g. context or metadata information. We demonstrate how the proposed initialization framework can be coupled with MF algorithms. The efficiency of the initialization is evaluated using various context- and metadata-based similarity concepts on two implicit variants of the MovieLens 10M dataset and one real-life implicit database. It is shown that the performance gain can reach a 10% improvement in recall@50 and in AUC@50.
The document describes methods for improving matrix factorization approaches for implicit feedback databases through better initialization techniques. It proposes two initialization approaches: Naive, which compresses metadata about items/users into feature vectors, and SimFactor, which aims to better preserve similarity information between entities. An experiment on a grocery shopping dataset found SimFactor using user context state data improved recall by up to 6% over random initialization, and another experiment "implicitizing" a movie rating dataset found SimFactor using item context state data improved recall by up to 10%. The results suggest context data can better separate entities for initialization than other metadata.
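To make the similarity-preserving idea concrete, here is a minimal numpy sketch of how initial feature vectors can be derived from a metadata-based similarity matrix via truncated eigendecomposition. The function name and the toy data are mine; this illustrates the general idea (find F such that F F^T approximates the similarity matrix) rather than the paper's exact SimFactor algorithm.

```python
import numpy as np

def similarity_preserving_init(S, k):
    """Build k-dimensional initial feature vectors F so that F @ F.T
    approximates the entity-entity similarity matrix S.
    Illustrative sketch, not the paper's exact SimFactor method."""
    S = (S + S.T) / 2.0                      # ensure symmetry
    vals, vecs = np.linalg.eigh(S)           # full eigendecomposition
    top = np.argsort(vals)[::-1][:k]         # indices of the k largest eigenvalues
    vals_k = np.clip(vals[top], 0.0, None)   # drop any negative parts
    return vecs[:, top] * np.sqrt(vals_k)    # F with F @ F.T ≈ S

# toy metadata-based similarity between 4 items (two similar pairs)
S = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.8],
              [0.0, 0.1, 0.8, 1.0]])
F = similarity_preserving_init(S, k=2)
# similar items end up with nearby initial feature vectors
```

Training then starts from F instead of random noise, so entities that are similar according to metadata or context begin close to each other in feature space.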
Context aware factorization methods for implicit feedback based recommendatio... (Balázs Hidasi)
Slides I prepared for defending my PhD dissertation on context-aware factorization methods for implicit-feedback-based recommendations. The dissertation (in English) can be accessed here: http://hidasi.eu/content/phd.pdf The slides are in Hungarian.
Parallel Recurrent Neural Network Architectures for Feature-rich Session-base... (Balázs Hidasi)
Slides for my RecSys 2016 talk on integrating image and textual information into session-based recommendations using novel parallel RNN architectures.
Link to the paper: http://www.hidasi.eu/en/publications.html#p_rnn_recsys16
Deep learning to the rescue - solving long standing problems of recommender ... (Balázs Hidasi)
I gave this talk at the 1st Budapest RecSys and Personalization Meetup about using deep learning to solve long-standing problems of recommender systems. I also presented our approach of using RNNs for session-based recommendations in detail.
Egyedi termék kreatívok tömeges gyártása generatív AI segítségével [Mass production of unique product creatives with generative AI] (Balázs Hidasi)
UPDATE: Typo on the 8th slide; the last line should read as follows (slides can't be modified on SlideShare):
grad(log(p_gamma(x|y))) = (1-gamma)*grad(log(p(x))) + gamma*grad(log(p(x|y)))
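The corrected formula can be checked numerically. Below is a tiny numpy sketch that blends an unconditional and a conditional score estimate; the function name and the toy values are mine.

```python
import numpy as np

def guided_score(score_uncond, score_cond, gamma):
    """Blend unconditional and conditional score (gradient of log-density)
    estimates according to the corrected formula from slide 8:
    (1 - gamma) * grad(log(p(x))) + gamma * grad(log(p(x|y)))."""
    return (1.0 - gamma) * score_uncond + gamma * score_cond

# toy 3-dimensional score estimates (illustrative values)
s_uncond = np.array([0.2, -0.1, 0.5])   # grad(log(p(x)))
s_cond   = np.array([1.0,  0.3, 0.0])   # grad(log(p(x|y)))
s_guided = guided_score(s_uncond, s_cond, gamma=2.0)  # gamma > 1 extrapolates toward the condition
```

At gamma = 0 the result is the unconditional score, at gamma = 1 the conditional one; values above 1 push the sample further toward the condition, which is the usual guidance regime.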
My presentation on using generative AI for creative generation for e-commerce. Presented on 14 November 2023 at the TECH meetup series organized by Gravity R&D, a Taboola company. Slides are in Hungarian.
*****
Title/abstract in English:
Mass production of unique product creatives with generative AI
-----
The probability of a user clicking on an online advertisement is greatly influenced by the creative's look. Traditional brand-level campaigns require only a few creatives, which can be produced by humans. However, product-level recommendations require creatives for every single product. Producing these with human work is infeasible at scale, so products are often shown in front of simple (e.g. white) backgrounds. This presentation showcases a generative-AI-based solution that allows placing products in different environments, which makes the creatives more appealing. I talk about the challenges of this approach along with potential solutions, as well as the initial results of our live test.
*****
Original abstract (translated from Hungarian):
The appearance of an online advertisement strongly influences the probability of a click. For traditional brand-level targeted campaigns, producing the one or two required creatives/banners is manageable even with human labor. For product-level recommendation, however, every single product needs its own creative, possibly in several resolutions. Producing a large number of creatives manually is slow and expensive, so a common approach is to display the product in front of a simple, e.g. single-color, background. In this presentation we introduce a solution based on generative AI technology that makes it possible to show products in various environments and thus make the creatives more interesting and appealing. We also discuss the difficulties of this approach, possible solutions, and the preliminary results of our measurements of the method's effectiveness.
The Effect of Third Party Implementations on Reproducibility (Balázs Hidasi)
This document examines the reproducibility of implementations of the GRU4Rec recommender algorithm. It analyzes several reimplementations of GRU4Rec in PyTorch, TensorFlow, Keras, and benchmarking frameworks. It finds that while some reimplementations capture the overall architecture, they are missing features and hyperparameters described in the original papers, and some contain outright bugs. Offline experiments show performance degradation in the reimplementations compared to the original implementation, with median total performance losses ranging from 7% to 99% depending on the reimplementation and dataset. Training-time comparisons show that versions with missing features require less time to train than the feature-complete version.
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec... (Balázs Hidasi)
Slides of my presentation at CIKM2018 about version 2 of the GRU4Rec algorithm, a recurrent-neural-network-based algorithm for the session-based recommendation task.
We discuss sampling strategies and introduce additional sampling to the algorithm. We also redesign the loss function to cope with the additional sampling. The resulting BPR-max loss function can efficiently handle many negative samples without encountering the vanishing gradient problem. We also introduce constrained embeddings, which speed up the convergence of item representations and reduce memory usage by a factor of 4. These improvements increase offline measures by up to 52%.
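As a rough illustration, a numpy sketch of the BPR-max loss for one positive item and a set of sampled negatives might look as follows. This is a simplification (no minibatching, and the regularization weight `lam` is a placeholder), following the softmax-weighted form described in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_max_loss(r_pos, r_negs, lam=1.0):
    """BPR-max loss for one positive score and many sampled negative scores:
    softmax weights over the negatives focus the loss on the hardest ones,
    plus a softmax-weighted L2 penalty on the negative scores (sketch)."""
    e = np.exp(r_negs - np.max(r_negs))
    s = e / e.sum()                           # softmax weights over negatives
    score = np.sum(s * sigmoid(r_pos - r_negs))
    reg = lam * np.sum(s * r_negs ** 2)       # regularization on negative scores
    return -np.log(score + 1e-12) + reg
```

Because the softmax concentrates the weights on the highest-scoring negatives, the gradient does not vanish as the number of sampled negatives grows.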
In the talk we also discuss an online A/B test and the implications of long-term observations. Most of these observations are exclusive to this talk and are not in the paper.
You can access the preprint version of the paper on arXiv: https://arxiv.org/abs/1706.03847
The code is available on GitHub: https://github.com/hidasib/GRU4Rec
Deep Learning in Recommender Systems - RecSys Summer School 2017 (Balázs Hidasi)
This is the presentation accompanying my tutorial on deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and an introduction to the four most prominent research directions of DL in recsys as of 2017. Presented during the RecSys Summer School 2017 in Bolzano, Italy.
Deep learning: the future of recommendations (Balázs Hidasi)
An informative talk about deep learning and its potential uses in recommender systems. Presented at the Budapest Startup Safary, 21 April 2016.
The breakthroughs of the last decade in neural network research and the rapid increase in computational power resulted in the revival of deep neural networks and of the field focusing on their training: deep learning. Deep learning methods have succeeded in complex tasks where other machine learning methods have failed, such as computer vision and natural language processing. Recently, deep learning has begun to gain ground in recommender systems as well. This talk introduces deep learning and its applications, with emphasis on how deep learning methods can solve long-standing recommendation problems.
Context-aware preference modeling with factorization (Balázs Hidasi)
- The document outlines Balázs Hidasi's research on context-aware recommendation models using factorization techniques.
- It introduces context-aware algorithms like iTALS and iTALSx that estimate preferences using ALS learning and scale linearly with data.
- Methods for speeding up ALS through approximate solutions like ALS-CG and ALS-CD are described, providing significant speed gains.
- A General Factorization Framework (GFF) is presented that allows experimenting with novel context-aware preference models beyond traditional approaches.
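To illustrate the ALS learning these models build on, here is a simplified dense numpy sketch of weighted alternating least squares on a binary implicit feedback matrix. The function and the confidence-weighting scheme are illustrative two-dimensional stand-ins, not the exact iTALS tensor updates or the ALS-CG/ALS-CD approximations.

```python
import numpy as np

def als_step(R, W, F_fixed, reg=0.1):
    """One ALS half-step: re-solve one side's feature matrix while the other
    side (F_fixed) is held fixed, with per-cell confidence weights W.
    Dense sketch of the ALS learning used by iTALS-style models."""
    k = F_fixed.shape[1]
    F_new = np.zeros((R.shape[0], k))
    for u in range(R.shape[0]):
        Wu = np.diag(W[u])                               # confidence weights of row u
        A = F_fixed.T @ Wu @ F_fixed + reg * np.eye(k)   # regularized normal equations
        b = F_fixed.T @ Wu @ R[u]
        F_new[u] = np.linalg.solve(A, b)                 # exact row-wise solution
    return F_new

rng = np.random.default_rng(0)
R = (rng.random((5, 6)) > 0.6).astype(float)   # binary implicit feedback matrix
W = 1.0 + 10.0 * R                             # higher confidence on observed events
P = rng.normal(size=(5, 2))                    # user features
Q = rng.normal(size=(6, 2))                    # item features
err0 = np.sum(W * (R - P @ Q.T) ** 2)          # weighted error before training
for _ in range(10):                            # alternate between the two sides
    P = als_step(R, W, Q)
    Q = als_step(R.T, W.T, P)
err1 = np.sum(W * (R - P @ Q.T) ** 2)
```

Each half-step solves a small linear system per row exactly; the CG/CD variants mentioned above replace that exact solve with a cheaper approximate one.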
Approximate modeling of continuous context in factorization algorithms (CaRR1... (Balázs Hidasi)
This document proposes two approaches to approximately model continuous context in factorization-based recommendation algorithms: (1) fuzzy event modeling, which associates events near context boundaries with multiple context states, and (2) fuzzy context modeling, which uses overlapping context states and mixtures of their features. It shows that fuzzy context modeling improved recommendation accuracy on implicit feedback datasets by 3-500% when applied to the seasonality context in the iTALS algorithm. Future work could apply the approaches to other algorithms and model context as truly continuous.
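As a toy illustration of the fuzzy context-modeling idea, the sketch below assigns an event's hour of day to overlapping daily "season" states with triangular membership weights. The state centers and the width are my own choices for the example, not the paper's configuration.

```python
def fuzzy_season_weights(hour, centers=(3, 9, 15, 21), width=6.0):
    """Assign an event at a given hour to overlapping daily context states
    with triangular membership weights (illustrative sketch of the fuzzy
    context-modeling idea)."""
    weights = []
    for c in centers:
        d = min(abs(hour - c), 24 - abs(hour - c))     # circular distance on the day
        weights.append(max(0.0, 1.0 - d / width))      # triangular membership
    total = sum(weights)
    return [w / total for w in weights]                # normalize to a mixture

w = fuzzy_season_weights(6)   # halfway between the 3h and 9h states
```

An event near a boundary thus contributes to several context states, and its feature vector becomes a weighted mixture of the overlapping states' features instead of a hard assignment.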
Utilizing additional information in factorization methods (research overview,... (Balázs Hidasi)
This overview of research into context-aware recommender systems using factorization models traces the improvement of factorization methods from early context-aware tensor models like iTALS and iTALSx to a general factorization framework. The research aims to better model implicit feedback and context, and to improve scalability using techniques such as conjugate gradient learning. Future work includes estimating the utility of context dimensions, modeling continuous context variables, and optimizing models with pairwise ranking loss functions.
Az implicit ajánlási probléma és néhány megoldása (BME TMIT szeminárium előad... (Balázs Hidasi)
[Translated from Hungarian: "The implicit recommendation problem and some of its solutions (BME TMIT seminar talk)"]
This slide deck was prepared for an educational talk.
The topic of the talk is implicit-feedback-based recommendation (when user preferences cannot be read directly from the data) and some possible solutions to the problem. After introducing the problem, the presentation covers some of my research results, such as the initialization of matrix factorization and implicit tensor factorization.
The talk took place in the summer of 2012 at a seminar organized by the Department of Telecommunications and Media Informatics (TMIT) of BME.
Context-aware similarities within the factorization framework (CaRR 2013 pres... (Balázs Hidasi)
This document summarizes research on incorporating context awareness into item-to-item recommendation similarities within a factorization framework. It describes four levels of context-aware similarity calculation and reports on experiments comparing the levels using four datasets. The results showed that context awareness generally improved recommendations but the degree of improvement depended heavily on the method and quality of the contextual information. The most context-sensitive method (elementwise product level 2) showed huge improvements or decreases depending on the context, while other methods showed only minor gains. Future work could explore different contexts, similarity measures, and evaluation approaches.
iTALS: implicit tensor factorization for context-aware recommendations (ECML/... (Balázs Hidasi)
This presentation is about the context-aware recommender algorithm iTALS.
iTALS is a context-aware recommender algorithm for implicit feedback data. The user-item-context(s) setup is modelled as a binary tensor. Weights are assigned to the cells based on the certainty of their information. An ALS-based algorithm is proposed that can efficiently factorize this tensor. Additionally, a novel context dimension is introduced: sequentiality. This context allows us to incorporate association-rule-like information into the factorization framework and to differentiate between items with different repetitiveness patterns, and thus to make recommendations more accurate.
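The sequentiality context can be illustrated with a small sketch that attaches the user's previous item to each event; the event-log format and the function name are mine, for illustration only.

```python
def add_sequentiality(events):
    """Attach the user's previous item as a context dimension to each
    (user, item) event, in the spirit of the sequentiality context of
    iTALS (illustrative sketch)."""
    last = {}
    out = []
    for user, item in events:
        out.append((user, item, last.get(user)))  # None for a user's first event
        last[user] = item
    return out

log = [("u1", "milk"), ("u2", "beer"), ("u1", "bread"), ("u1", "milk")]
ctx = add_sequentiality(log)
# → [("u1","milk",None), ("u2","beer",None), ("u1","bread","milk"), ("u1","milk","bread")]
```

Each (user, item, previous-item) triple then becomes one cell of the binary tensor, letting the factorization capture association-rule-like regularities such as which items tend to follow each other.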
This presentation was originally given at ECML/PKDD 2012 in Bristol.
ShiftTree: model alapú idősor-osztályozó (VK 2009 előadás) (Balázs Hidasi)
[Translated from Hungarian: "ShiftTree: a model-based time series classifier (VK 2009 talk)"]
The topic of this presentation is ShiftTree, a unique, model-based time series classifier.
ShiftTree is a unique, model-based approach to the time series classification problem. The basic idea is to assign an eye (cursor) to each time series, pointing at a given position on the time axis. We create dynamic attributes by answering two questions: (1) Where to look on the time axis? (2) What to look at at that position? The answer to the first question tells how to move the eye along the time axis, while the answer to the second defines how to compute the value of the dynamic attribute at that position. These dynamic attributes are then used in a binary decision tree.
This slide deck presents an early (2009) version of ShiftTree.
The presentation was given at the 2009 Végzős Konferencia (Graduating Students' Conference).
Note: for some reason SlideShare does not support animations, so the animated slides were split into multiple slides.
ShiftTree: model alapú idősor-osztályozó (ML@BP előadás, 2012) (Balázs Hidasi)
[Translated from Hungarian: "ShiftTree: a model-based time series classifier (ML@BP talk, 2012)"]
The topic of this presentation is ShiftTree, a unique, model-based time series classifier.
ShiftTree is a unique, model-based approach to the time series classification problem. The basic idea is to assign an eye (cursor) to each time series, pointing at a given position on the time axis. We create dynamic attributes by answering two questions: (1) Where to look on the time axis? (2) What to look at at that position? The answer to the first question tells how to move the eye along the time axis, while the answer to the second defines how to compute the value of the dynamic attribute at that position. These dynamic attributes are then used in a binary decision tree.
This is the most complete of the presentations about ShiftTree. It contains several extensions and also describes some solutions that came up during the research but ultimately proved to be dead ends.
The presentation belongs to a talk given in February 2012 as part of the ML@BP event series.
Note: for some reason SlideShare does not support animations, so the animated slides were split into multiple slides.
ShiftTree: model based time series classifier (ECML/PKDD 2011 presentation) (Balázs Hidasi)
This slideshow is about the time series classifier algorithm, ShiftTree.
ShiftTree is a unique, model-based approach to time series classification. The basic idea is that we assign a cursor (or eye) to each series and move it to certain positions on the time axis. We generate dynamic attributes by answering two questions: (1) Where to look? (2) What to look at? The answer to the first question tells us where to move the cursor (e.g. forward 100 steps, to the previous local maximum, etc.), while the second answer defines the calculation of the dynamic attribute (e.g. the value at that point, the weighted average of the values around the position, the difference between the current and previous cursor positions, etc.). These dynamic attributes are then used in a binary decision tree.
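The two questions can be sketched in code: a cursor operator answers "where to look?" and an attribute calculator answers "what to look at?". The operators below are simplified examples of my own choosing, not ShiftTree's actual operator set.

```python
def next_local_max(series, pos):
    """Cursor operator ("where to look?"): move forward to the next
    local maximum after the current position (simplified sketch)."""
    for i in range(max(pos + 1, 1), len(series) - 1):
        if series[i - 1] < series[i] >= series[i + 1]:
            return i
    return len(series) - 1          # fall back to the end of the series

def value_at(series, pos):
    """Attribute calculator ("what to look at?"): raw value under the cursor."""
    return series[pos]

series = [0, 2, 1, 5, 3, 4, 1]
cursor = next_local_max(series, 0)  # the eye jumps to the first local maximum
attr = value_at(series, cursor)     # dynamic attribute a tree node can threshold
```

A ShiftTree node pairs one such cursor move with one such calculator and thresholds the resulting attribute to route the series into the left or right subtree.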
This slideshow was originally presented at ECML/PKDD 2011 in Athens.
Note that, for whatever reason, SlideShare doesn't support animations; therefore the animated slides were split into multiple slides.
Egyedi termék kreatívok tömeges gyártása generatív AI segítségévelBalázs Hidasi
UPDATE: Typo on the 8th slide, last line should be (slides can't be modified on slideshare):
grad(log(p_gamma(x|y))) = (1-gamma)*grad(log(p(x))) + gamma*grad(log(p(x|y)))
My presentation on using generative AI for creative generation for e-commerce. Presented on 14 November 2023 at the TECH meetup series organized by Gravity R&D, a Taboola company. Slides are in Hungarian.
*****
Title/abstract in English:
Mass production of unique product creatives with generative AI
-----
The probability of a user clicking on an online advertisement is greatly influenced by creative's look. Traditional brand level campaigns require only a few creatives that can be produced by humans. However product level recommendations require creatives for every single product. Producing these using human work is infeasible at scale, thus they are often shown in front of simple (e.g. white) backgrounds. This presentation showcases a solution based on generative AI that allows placing products in different environments, which makes the creatives more appealing. I'll talk about the challenges of this approach along with potential solutions, as well as the initial results of our live test.
*****
Eredeti absztrakt:
Az online hirdetések megjelenése nagyban befolyásolja a rákattintás valószínűségét. A tradicionális márka szinten targetált kampányokhoz szükséges egy-két kreatív/banner legyártása még emberi erőforrás igénybevételével is megoldható. Termék szintű ajánlás esetén viszont minden egyes termékhez külön kreatívra van szükség, akár több felbontásban. Nagyszámú kreatív legyártása emberi erővel lassú és drága, ezért gyakori megközelítés a terméket valamilyen egyszerű, például egyszínű, háttér előtt megjeleníteni. Az előadás során bemutatunk egy generatív AI technológián alapuló megoldást, ami lehetővé teszi, hogy a termékeket különféle környezetekben jelenítsük meg, és így érdekesebbé/vonzóbbá tegyük a kreatívokat. Szót ejtünk a megközelítés nehézségeiről, lehetséges megoldásokról, és a módszer hatékonyságát vizsgáló mérésünk előzetes eredményeiről.
The Effect of Third Party Implementations on ReproducibilityBalázs Hidasi
This document examines the reproducibility of implementations of the GRU4Rec recommender algorithm. It analyzes several reimplementations of GRU4Rec in PyTorch, TensorFlow, Keras and benchmarking frameworks. It finds that while some reimplementations capture the overall architecture, they are missing features and hyperparameters described in the original papers. Some implementations also contain errors in their implementation. Offline experiments show performance degradations in the reimplementations compared to the original implementation, with median total performance losses ranging from 7-99% depending on the reimplementation and dataset. Training time comparisons show that versions with missing features require less time to train than a feature-complete version.
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...Balázs Hidasi
Slides of my presentation at CIKM2018 about version 2 of the GRU4Rec algorithm, a recurrent neural network based algorithm for the session-based recommendation task.
We discuss sampling strategies and introduce additional sampling to the algorithm. We also redesign the loss function to cope with the additional samples. The resulting BPR-max loss function is able to efficiently handle many negative samples without running into the vanishing gradient problem. We also introduce constrained embeddings, which speed up the convergence of item representations and reduce memory usage by a factor of 4. These improvements increase offline measures by up to 52%.
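As a rough illustration of the loss described above, here is a minimal numpy sketch of the BPR-max idea: pairwise sigmoids against negative samples, weighted by a softmax over the negatives' own scores, plus a score regularization term. The function name, toy scores, and regularization weight are my own, not from the paper.

```python
import numpy as np

def bpr_max_loss(r_pos, r_neg, reg=1.0):
    """BPR-max-style loss for one positive score and a vector of negative
    sample scores; negatives are weighted by a softmax of their scores."""
    s = np.exp(r_neg - r_neg.max())
    s /= s.sum()                                   # softmax weights of negatives
    sig = 1.0 / (1.0 + np.exp(-(r_pos - r_neg)))   # pairwise sigmoids
    loss = -np.log(np.dot(s, sig) + 1e-12)         # BPR-max term
    loss += reg * np.dot(s, r_neg ** 2)            # regularization on negative scores
    return loss
```

The softmax weighting focuses the gradient on the highest-scoring (hardest) negatives, which is what lets the loss scale to many samples without vanishing gradients.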
In the talk we also discuss online A/B tests and the implications of long-term observations. Most of these observations are exclusive to this talk and are not in the paper.
You can access the preprint version of the paper on arXiv: https://arxiv.org/abs/1706.03847
The code is available on GitHub: https://github.com/hidasib/GRU4Rec
Deep Learning in Recommender Systems - RecSys Summer School 2017Balázs Hidasi
This is the presentation accompanying my tutorial about deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and an introduction to the four most prominent research directions of DL in recsys as of 2017. Presented during the RecSys Summer School 2017 in Bolzano, Italy.
Deep learning: the future of recommendationsBalázs Hidasi
An informative talk about deep learning and its potential uses in recommender systems. Presented at the Budapest Startup Safary, 21 April, 2016.
The breakthroughs of the last decade in neural network research and the rapid increase in computational power resulted in the revival of deep neural networks and of the field focusing on their training: deep learning. Deep learning methods have succeeded in complex tasks where other machine learning methods have failed, such as computer vision and natural language processing. Recently, deep learning has begun to gain ground in recommender systems as well. This talk introduces deep learning and its applications, with emphasis on how deep learning methods can solve long-standing recommendation problems.
Context-aware preference modeling with factorizationBalázs Hidasi
- The document outlines Balázs Hidasi's research on context-aware recommendation models using factorization techniques.
- It introduces context-aware algorithms like iTALS and iTALSx that estimate preferences using ALS learning and scale linearly with data.
- Methods for speeding up ALS through approximate solutions like ALS-CG and ALS-CD are described, providing significant speed gains.
- A General Factorization Framework (GFF) is presented that allows experimenting with novel context-aware preference models beyond traditional approaches.
Approximate modeling of continuous context in factorization algorithms (CaRR1...Balázs Hidasi
This document proposes two approaches to approximately model continuous context in factorization recommendation algorithms: 1) Fuzzy event modeling which associates events near context boundaries with multiple contexts and 2) Fuzzy context modeling which uses overlapping context states and mixtures of their features. It shows fuzzy context modeling improved recommendation accuracy in implicit feedback datasets by 3-500% when applied to seasonality context in an iTALS algorithm. Future work could apply the approaches to other algorithms and address modeling context as truly continuous.
Utilizing additional information in factorization methods (research overview,...Balázs Hidasi
Utilizing additional information in factorization methods is an overview of research into context-aware recommender systems using factorization models. It discusses improving factorization methods from early context-aware tensor models like iTALS and iTALSx to a general factorization framework. The research aims to better model implicit feedback, context, and improve scalability using techniques like conjugate gradient descent learning. Future work includes estimating the utility of context dimensions, modeling continuous context variables, and optimizing models with pairwise ranking loss functions.
The implicit recommendation problem and some of its solutions (BME TMIT seminar present...Balázs Hidasi
These slides were made for a popular science talk.
The topic of the talk is implicit feedback based recommendation (when user preferences cannot be read directly from the data) and some possible solutions to this problem. After describing the problem, the presentation covers some of my research results, such as the initialization of matrix factorization and implicit tensor factorization.
The talk took place in the summer of 2012, at a seminar organized by the Department of Telecommunications and Media Informatics (TMIT) of the Budapest University of Technology and Economics (BME).
Context-aware similarities within the factorization framework (CaRR 2013 pres...Balázs Hidasi
This document summarizes research on incorporating context awareness into item-to-item recommendation similarities within a factorization framework. It describes four levels of context-aware similarity calculation and reports on experiments comparing the levels using four datasets. The results showed that context awareness generally improved recommendations but the degree of improvement depended heavily on the method and quality of the contextual information. The most context-sensitive method (elementwise product level 2) showed huge improvements or decreases depending on the context, while other methods showed only minor gains. Future work could explore different contexts, similarity measures, and evaluation approaches.
iTALS: implicit tensor factorization for context-aware recommendations (ECML/...Balázs Hidasi
This presentation is about the context-aware recommender algorithm iTALS.
iTALS is a context-aware recommender algorithm for implicit feedback data. The user-item-context(s) setup is modelled as a binary tensor. Weights are also assigned to the cells based on the certainty of their information. An ALS-based algorithm is proposed that is capable of efficiently factorizing this tensor. Additionally, a novel context dimension is introduced: sequentiality. This context allows us to incorporate association-rule-like information into the factorization framework and to differentiate between items with different repetitiveness patterns, and thus to make recommendations more accurate.
This presentation was originally given at ECML/PKDD 2012 in Bristol.
ShiftTree: a model-based time series classifier (VK 2009 presentation)Balázs Hidasi
The topic of this presentation is ShiftTree, a unique, model-based time series classifier.
ShiftTree is a unique, model-based approach to the time series classification problem. The basic idea is to assign an eye (cursor) to each time series, pointing at a given position on the time axis. Dynamic attributes are created by answering two questions: (1) Where to look on the time axis? (2) What to look at at that position? The answer to the first question tells us how to move the eye along the time axis, while the answer to the second defines how to compute the value of the dynamic attribute at that position. These dynamic attributes are then used in a binary decision tree.
These slides present an early (2009) version of ShiftTree.
The presentation was given at the 2009 Graduates' Conference (Végzős Konferencia).
Note: for some reason SlideShare does not support animations, so the animated slides were split into multiple slides.
ShiftTree: a model-based time series classifier (ML@BP talk, 2012)Balázs Hidasi
The topic of this presentation is ShiftTree, a unique, model-based time series classifier.
ShiftTree is a unique, model-based approach to the time series classification problem. The basic idea is to assign an eye (cursor) to each time series, pointing at a given position on the time axis. Dynamic attributes are created by answering two questions: (1) Where to look on the time axis? (2) What to look at at that position? The answer to the first question tells us how to move the eye along the time axis, while the answer to the second defines how to compute the value of the dynamic attribute at that position. These dynamic attributes are then used in a binary decision tree.
This is the most complete of the presentations about ShiftTree. It contains several additions and also describes some solutions that came up during the research but ultimately proved to be dead ends.
The presentation belongs to a talk given in February 2012 as part of the ML@BP event series.
Note: for some reason SlideShare does not support animations, so the animated slides were split into multiple slides.
ShiftTree: model based time series classifier (ECML/PKDD 2011 presentation)Balázs Hidasi
This slideshow is about the time series classifier algorithm, ShiftTree.
ShiftTree is a unique, model-based approach for time series classification. The basic idea is that we assign a cursor (or eye) to each series and move it to certain positions on the time axis. We generate dynamic attributes by answering two questions: (1) Where to look? (2) What to look at? The answer to the first question tells us where to move the cursor (e.g. forward 100 steps, to the previous local maximum, etc.), while the second answer defines the calculation of the dynamic attribute (e.g. the value at that point, the weighted average of the values around the position, the difference between the current and previous cursor positions, etc.). These dynamic attributes are then used in a binary decision tree.
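The two-question mechanism above can be sketched in a few lines of Python; the operator names and the toy series are my own illustration, not the algorithm's actual operator set.

```python
import numpy as np

# Illustrative sketch of one ShiftTree-style dynamic attribute: a
# "where to look?" operator moves the eye, then a "what to look at?"
# operator computes the attribute value at the new position.
def move_to_global_max(series, eye):      # "where to look?"
    return int(np.argmax(series))

def value_at(series, eye):                # "what to look at?"
    return float(series[eye])

series = np.array([0.0, 2.0, 5.0, 1.0])
eye = move_to_global_max(series, 0)       # eye moves to index 2
attr = value_at(series, eye)              # dynamic attribute = 5.0
```

In the full algorithm, many such operator pairs are generated and the resulting attribute values feed the split tests of a binary decision tree.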
This slideshow was originally presented at ECML/PKDD 2011 in Athens.
Note that for whatever reason SlideShare doesn't support animations. Therefore, the animated slides were split into multiple slides.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
HCL Notes and Domino license cost reduction in the world of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to solve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, e.g. using a person document instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course, we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to stay on top of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices to implement immediately
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. We cover every productivity app included in Office 365, discuss common Office 365 migration scenarios, and explain how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of the presentation accompanying the speech I gave about the main changes brought by the CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, which was held in the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Initialization of matrix factorization (CaRR 2012 presentation)
1. ENHANCING MATRIX FACTORIZATION THROUGH INITIALIZATION FOR IMPLICIT FEEDBACK DATABASES
Balázs Hidasi
Domonkos Tikk
Gravity R&D Ltd.
Budapest University of Technology and Economics
CARR WORKSHOP, 14TH FEBRUARY 2012, LISBON
3. MATRIX FACTORIZATION
Collaborative Filtering
One of the most common approaches
Approximates the rating matrix as the product of low-rank matrices
[diagram: R (users x items) ≈ P (user features) x Q (item features)]
4. MATRIX FACTORIZATION
Initialize P and Q with small random numbers
Train P and Q
  Alternating Least Squares
  Gradient Descent
  Etc.
Transforms the data to a feature space
  Separately for users and items
  Noise reduction
  Compression
  Generalization
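The random-init-then-train loop just outlined can be sketched as a toy dense ALS in numpy. This is a minimal illustration with invented sizes and hyperparameters, not the implicit-feedback variant this talk is actually about.

```python
import numpy as np

# Toy ALS for R ≈ P @ Q.T: initialize P and Q with small random numbers,
# then alternately solve the two regularized least-squares problems.
rng = np.random.default_rng(42)
R = rng.random((20, 30))                  # toy "rating" matrix
K, lam = 5, 0.1                           # number of features, regularization
P = 0.1 * rng.standard_normal((20, K))    # user features, random init
Q = 0.1 * rng.standard_normal((30, K))    # item features, random init

I = np.eye(K)
for _ in range(20):
    # fix Q, solve for P; then fix P, solve for Q (ridge-regression steps)
    P = R @ Q @ np.linalg.inv(Q.T @ Q + lam * I)
    Q = R.T @ P @ np.linalg.inv(P.T @ P + lam * I)

err = np.linalg.norm(R - P @ Q.T)         # reconstruction error after training
```

Stochastic gradient descent would replace the two closed-form solves with per-event gradient updates; the initialization step is the same, which is what the rest of the talk targets.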
5. IMPLICIT FEEDBACK
No ratings
User-item interactions (events)
Much noisier
  Presence of an event might not be positive feedback
  Absence of an event does not mean negative feedback
  No negative feedback is available!
More common problem
MF for implicit feedback
  Less accurate results due to noise
  Mostly ALS is used
  Scalability problems (rating matrix is dense)
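One common way to feed such event data to ALS is the preference/confidence scheme of Hu, Koren & Volinsky; it is shown here only as an illustration of why the matrix becomes dense (every missing event gets a weight too), not necessarily as the exact weighting used in this work.

```python
import numpy as np

# Implicit feedback as binary preference + count-based confidence
# (Hu-Koren-Volinsky style; alpha is an invented illustration value).
counts = np.array([[3.0, 0.0, 1.0],
                   [0.0, 5.0, 0.0]])   # user x item event counts
alpha = 40.0
P_pref = (counts > 0).astype(float)    # preference: 1 if any event happened
C = 1.0 + alpha * counts               # confidence: >= 1 even with no event
```

Because every cell, including the zero-event ones, carries a confidence weight, the effective "rating" matrix is dense, which is the scalability problem the slide refers to.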
6. CONCEPT
Good MF model
  The feature vectors of similar entities are similar
  If data is too noisy, similar entities won't be similar by their features
Start MF from a „good” point
  Feature vector similarities are OK
Data is more than just events
  Metadata
    Info about items/users
  Contextual data
    In what context did the event occur
Can we incorporate those to help implicit MF?
7. NAIVE APPROACH
Describe items using any data we have (detailed later)
  Long, sparse vectors for item description
Compress these vectors to dense feature vectors
  PCA, MLP, MF, …
  Length of desired vectors = number of features in MF
Use these features as starting points
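The compression step above can be sketched with a truncated SVD, one possible choice among the PCA/MLP/MF options the slide lists; the sparse toy descriptor matrix is invented for illustration.

```python
import numpy as np

# Naive initialization sketch: compress long, sparse item descriptor
# vectors into K dense features via truncated SVD.
rng = np.random.default_rng(0)
D = (rng.random((100, 1000)) < 0.02).astype(float)  # items x descriptors, sparse
K = 10                                              # = number of MF features

U, s, Vt = np.linalg.svd(D, full_matrices=False)
item_init = U[:, :K] * s[:K]   # dense item features used as the MF starting point
# the descriptor-side factor Vt[:K] plays no further role here
```

Note that this only optimizes reconstruction of D, not pairwise similarities, which is exactly the gap SimFactor addresses.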
8. NAIVE APPROACH
Compression and also noise reduction
Does not really care about similarities
  But often feature similarities are not that bad
If MF is used
  Half of the results is thrown out
[diagram: description of items (items x descriptors) ≈ item features x descriptor features]
9. SIMFACTOR ALGORITHM
Try to preserve similarities better
Starting from an MF of the item description matrix: D (items x descriptors) ≈ (item features) x (descriptor features)
Similarities of items: S = DD'
  Some metrics require a transformation on D
10. SIMFACTOR ALGORITHM
Similarity approximation: with D ≈ XY', the item similarity matrix becomes S = DD' ≈ X(Y'Y)X'
Y'Y is a KxK symmetric matrix
  Eigendecomposition
11. SIMFACTOR ALGORITHM
Y'Y = UλU' (λ is diagonal, so λ = SQRT(λ) * SQRT(λ))
S ≈ X * U * SQRT(λ) * SQRT(λ) * U' * X'
F = X*U*SQRT(λ) = (SQRT(λ)*U'*X')'
  F is an MxK matrix
  S ≈ F * F'
  F is used for initialization
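The algebra of the two SimFactor slides can be written out directly; this is a minimal numpy sketch on invented toy data, using a simple ALS as the inner MF of the description matrix.

```python
import numpy as np

# SimFactor sketch: factorize the description matrix D ≈ X @ Y.T, then
# eigendecompose the small K x K matrix Y'Y so that the item similarity
# S = DD' ≈ X (Y'Y) X' = F F', with F = X U sqrt(lambda).
rng = np.random.default_rng(0)
D = rng.random((50, 200))            # items x descriptors (toy data)
K, reg = 8, 0.1

# inner MF of the description matrix (random init + a few ALS sweeps)
X = 0.1 * rng.standard_normal((50, K))
Y = 0.1 * rng.standard_normal((200, K))
I = np.eye(K)
for _ in range(15):
    X = D @ Y @ np.linalg.inv(Y.T @ Y + reg * I)
    Y = D.T @ X @ np.linalg.inv(X.T @ X + reg * I)

# SimFactor step: eigendecompose the K x K symmetric matrix Y'Y
G = Y.T @ Y
lam, U = np.linalg.eigh(G)                        # G = U diag(lam) U'
F = (X @ U) * np.sqrt(np.clip(lam, 0.0, None))    # F = X U sqrt(lam), M x K
# F F' = X (Y'Y) X' approximates S = D D'; F initializes the item features
```

The key point is that only a KxK eigendecomposition is needed, so the full MxM similarity matrix S never has to be materialized.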
12. CREATING THE DESCRIPTION MATRIX
„Any” data about the entity
Vector-space representation
For items:
  Metadata vector (title, category, description, etc.)
  Event vector (who bought the item)
  Context-state vector (in which context state was it bought)
  Context-event vector (in which context state who bought it)
For users:
  All of the above except metadata
Currently: choose one source for the D matrix
Context used: seasonality
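As a concrete toy example of one such source, here is how an item context-state description matrix could be assembled under a day-of-week seasonality context; the event tuples are invented for illustration.

```python
import numpy as np

# Item context-state vectors: day-of-week seasonality => 7 context states.
# Each item's row flags the context states in which it was bought.
events = [(0, 0, 5), (1, 0, 5), (2, 0, 6),   # (user, item, context state)
          (0, 1, 1), (3, 1, 2)]              # item 0: weekends, item 1: weekdays
n_items, n_states = 2, 7
D = np.zeros((n_items, n_states))
for user, item, state in events:
    D[item, state] = 1.0                     # binary context-state vector
```

The event vector and context-event vector variants are built the same way, only with users (or user-state pairs) as the descriptor columns instead of context states.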
13. EXPERIMENTS: SIMILARITY PRESERVATION
Real-life dataset: online grocery shopping events
SimFactor RMSE improvement over naive in similarity approximation:
  Item context state: 52.36%
  User context state: 48.70%
  Item context-event: 26.22%
  User context-event: 16.86%
  Item event data: 13.39%
  User event data: 12.38%
  Item metadata: 10.81%
SimFactor approximates similarities better
14. EXPERIMENTS: INITIALIZATION
Using different description matrices
And both naive and SimFactor initialization
Baseline: random init
Evaluation metric: recall@50
15. EXPERIMENTS: GROCERY DB
Up to 6% improvement
Best methods use SimFactor and user context data
Top 5 methods on the Grocery DB:
  User context state (SimFactor): 5.71%
  User context state (Naive): 4.88%
  User context event (SimFactor): 4.30%
  User event data (SimFactor): 4.12%
  User context event (Naive): 4.04%
16. EXPERIMENTS: „IMPLICITIZED” MOVIELENS
5-star ratings are kept as implicit events
Up to 10% improvement
Best methods use SimFactor and item context data
Top 5 methods on the MovieLens DB:
  Item context state (SimFactor): 10%
  User context state (SimFactor): 9.17%
  Item context event (SimFactor): 9.17%
  Item context event (Naive): 9.17%
  Item context state (Naive): 9.17%
17. DISCUSSION OF RESULTS
SimFactor yields better results than naive
Context information yields better results than other descriptions
  Context information separates well between entities
Grocery: user context
  People's routines
  Different types of shopping at different times
MovieLens: item context
  Different types of movies watched at different hours
  Context-based similarity
18. WHY CONTEXT?
Grocery example
Correlation between context states (high for items, low for users):

ITEM   Mon   Tue   Wed   Thu   Fri   Sat   Sun
Mon   1.00  0.79  0.79  0.78  0.76  0.70  0.74
Tue   0.79  1.00  0.79  0.78  0.76  0.69  0.73
Wed   0.79  0.79  1.00  0.79  0.76  0.70  0.74
Thu   0.78  0.78  0.79  1.00  0.76  0.71  0.74
Fri   0.76  0.76  0.76  0.76  1.00  0.71  0.72
Sat   0.70  0.69  0.70  0.71  0.71  1.00  0.71
Sun   0.74  0.73  0.74  0.74  0.72  0.71  1.00

USER   Mon   Tue   Wed   Thu   Fri   Sat   Sun
Mon   1.00  0.36  0.34  0.34  0.35  0.29  0.19
Tue   0.36  1.00  0.34  0.34  0.33  0.29  0.19
Wed   0.34  0.34  1.00  0.36  0.35  0.27  0.17
Thu   0.34  0.34  0.36  1.00  0.39  0.30  0.16
Fri   0.35  0.33  0.35  0.39  1.00  0.32  0.16
Sat   0.29  0.29  0.27  0.30  0.32  1.00  0.33
Sun   0.19  0.19  0.17  0.16  0.16  0.33  1.00

Why can context-aware algorithms be efficient?
  Different recommendations in different context states
  Context differentiates well between entities
  Easier subtasks
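Correlation tables of this kind can be computed from an entity-by-context-state count matrix by correlating its columns; the toy data below is invented and only illustrates the computation, not the Grocery numbers.

```python
import numpy as np

# Correlate context states: rows are entities (users or items), columns
# are context states (e.g. days of the week), values are event counts.
rng = np.random.default_rng(1)
counts = rng.poisson(3.0, size=(500, 7)).astype(float)  # toy user x day counts
C = np.corrcoef(counts, rowvar=False)                   # 7 x 7 correlation matrix
```

Low off-diagonal correlations (as in the user table above) mean the context states carry genuinely different information, which is why splitting by context creates easier subtasks.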
19. CONCLUSION & FUTURE WORK
SimFactor: similarity preserving compression
Similarity based MF initialization:
  Description matrix from any data
  Apply SimFactor
  Use output as initial features for MF
Context differentiates between entities well
Future work:
  Mixed description matrix (multiple data sources)
  Multiple description matrices
  Using different context information
  Using different similarity metrics
20. THANKS FOR YOUR ATTENTION!
For more of my recommender systems related research visit my website:
http://www.hidasi.eu