This document discusses the key considerations and challenges for productionizing recommender systems. It outlines the full lifecycle from scoping a recommender system project through deployment and continuous monitoring. Some of the main points covered include: defining requirements and key metrics; preparing data through feature engineering, cleaning and transformation; selecting and evaluating recommendation models both offline and through A/B testing; ensuring deployments are robust, scalable and address technical debt; and continuously monitoring systems for data or algorithmic drift once in production.
6. Classical recommendation model
Three types of entities: Users, Items and Contexts
1. Background knowledge:
• A set of ratings (preferences): r: Users × Items × Contexts → {1, 2, 3, 4, 5}
• A set of “features” of the Users, Items and Contexts
2. A method for predicting the rating function r where it is unknown:
• r*(u, i, c) = average of the ratings r(u’, i, c’) over users u’ similar to u and contexts c’ similar to c
3. A method for selecting the items to recommend (choice):
• In context c, recommend to u the item i* with the largest predicted rating r*(u, i, c)
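As a rough sketch of the predict-then-choose steps above (toy data, context dropped, and the crudest possible notion of "similar users" — anyone with an item in common):

```python
from statistics import mean

# Toy rating store: (user, item) -> rating on the 1-5 scale.
ratings = {
    ("ann", "book1"): 5, ("ann", "book2"): 3,
    ("bob", "book1"): 4, ("bob", "book3"): 2,
    ("eve", "book2"): 4, ("eve", "book3"): 5,
}

def similar_users(user, all_ratings):
    """Users who rated at least one item in common with `user`."""
    mine = {i for (u, i) in all_ratings if u == user}
    return {u for (u, i) in all_ratings if u != user and i in mine}

def predict(user, item, all_ratings):
    """r*(u, i): average rating of `item` by users similar to `user`."""
    peers = similar_users(user, all_ratings)
    obs = [r for (u, i), r in all_ratings.items() if i == item and u in peers]
    return mean(obs) if obs else None

def recommend(user, all_ratings):
    """Choice step: the unseen item with the largest predicted rating."""
    seen = {i for (u, i) in all_ratings if u == user}
    items = {i for (_, i) in all_ratings} - seen
    scored = {i: s for i in items
              if (s := predict(user, i, all_ratings)) is not None}
    return max(scored, key=scored.get) if scored else None
```

Real systems replace the averaging with learned models, but the three-part structure (knowledge, prediction, choice) stays the same.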
7. The goal is to find items that the user will happily choose
10. Requirements surrounding RecSys
Sculley, David, et al. "Hidden technical debt in machine learning systems." Advances in neural information processing systems 28 (2015).
11. Requirements surrounding RecSys
User Modeling
12. Scoping
1. Decide the recommendation paradigm and degree of personalization
• Resources and product
• Type of user we are catering to
2. KPI metrics:
• Online: CVR, CTR, ad-hoc metrics
• Offline: recall, nDCG
Ricci, Francesco, Lior Rokach, and Bracha Shapira. "Recommender systems: introduction and challenges." Recommender systems handbook (2015): 1-34.
13. The life cycle of a recommender
Scoping: define the project, define KPI metrics
Data: features, type of feedback, cleaning, transformation
Modelling: select model, offline test model, check fairness/explainability, UAT
Deployment: deploy in production, A/B testing, continuous monitoring
15. Data preparation
1. Features:
• Reusable
• Transformable
• Interpretable
• Reliable
2. Type of feedback
• Implicit data is (usually):
• denser, and available for all users
• more representative of actual user behavior than of user self-reflection
• more closely related to the final objective function, and better correlated with A/B test results
• e.g. watching (implicit) vs. rating (explicit)
3. Cleaning
• Dropping bots
• Missing data strategy
• Standardisation
• Bucketing
Beliakov, Gleb, Tomasa Calvo, and Simon James. "Aggregation of preferences in recommender systems." Recommender systems handbook (2011): 705-734.
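A minimal cleaning sketch for the steps above — all field names and thresholds are illustrative assumptions, not part of any particular pipeline:

```python
# Hypothetical raw interaction log.
raw = [
    {"user": "u1", "item": "i9", "dwell_s": 42, "is_bot": False},
    {"user": "crawler", "item": "i9", "dwell_s": 0, "is_bot": True},
    {"user": "u2", "item": "i3", "dwell_s": None, "is_bot": False},
]

MEDIAN_DWELL = 30  # missing-data strategy: impute a typical value

def bucket(dwell_s):
    """Bucket continuous dwell time into coarse engagement levels."""
    if dwell_s < 5:
        return "bounce"
    if dwell_s < 60:
        return "engaged"
    return "deep"

def clean(events):
    out = []
    for e in events:
        if e["is_bot"]:  # dropping bots
            continue
        dwell = e["dwell_s"] if e["dwell_s"] is not None else MEDIAN_DWELL
        out.append({"user": e["user"], "item": e["item"],
                    "engagement": bucket(dwell)})
    return out
```

In production this logic would live in a feature pipeline so the same transformations are reused at training and serving time.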
16. Modelling
Latency is key!
18. Collaborative Filtering
Coba, L., Rook, L., Zanker, M., & Symeonidis, P. (2019, March). Decision making strategies differ in the presence of collaborative explanations: two conjoint studies. IUI 19.
19. Content-based
Dominguez, V., Messina, P., Donoso-Guzmán, I., & Parra, D. (2019, March). The effect of explanations and algorithmic accuracy on visual recommender systems of artistic images. IUI 2019.
21. Selecting the model
• There is no single winner
• Usually, models are ensembled
• Multi-stage architectures are common
• Live predictions or in batches
• DNNs are not always deployable (cost-benefit tradeoff and latency)
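One common multi-stage shape is retrieve-then-rank: a cheap retriever narrows the catalogue, then a costlier per-user ranker orders only that shortlist. A minimal sketch with made-up scores:

```python
# Stage 1 signal: a precomputed popularity score per item (illustrative).
CATALOG = {"i1": 0.9, "i2": 0.8, "i3": 0.7, "i4": 0.2, "i5": 0.1}

def retrieve(catalog, k):
    """Stage 1: cheap filter, e.g. top-k popular or ANN candidates."""
    return sorted(catalog, key=catalog.get, reverse=True)[:k]

def rank(user_affinity, candidates):
    """Stage 2: expensive per-user model, run only on the shortlist."""
    return sorted(candidates,
                  key=lambda i: user_affinity.get(i, 0.0),
                  reverse=True)

affinity = {"i2": 0.95, "i1": 0.4, "i3": 0.6}  # hypothetical user scores
shortlist = retrieve(CATALOG, k=3)
ordered = rank(affinity, shortlist)
```

The payoff is latency: the expensive model scores 3 items instead of the whole catalogue.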
22. A non-personalized recommender
• We can use a hybrid recommender: collaborative filtering + content-based
• Collaborative filtering suffers from cold start, but it performs better
• If the catalogue doesn’t change often, then we can pre-compute interactions
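One simple way to combine the two signals is a weighted blend that falls back to the content-based score when collaborative filtering has no data (cold start). The weighting scheme here is an illustrative assumption:

```python
def hybrid_score(cf_score, content_score, alpha=0.7):
    """Blend CF and content-based scores; alpha favours CF
    where it has enough interaction data."""
    if cf_score is None:  # cold start: fall back to content-based
        return content_score
    return alpha * cf_score + (1 - alpha) * content_score
```

More elaborate hybrids learn the blending weights per user or per item segment rather than fixing a global alpha.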
24. Non-personalized recommender
Edge device → API call → ANN search → Recommendations
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point: https://github.com/spotify/annoy
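For intuition, here is the exact computation that Annoy approximates — brute-force top-k by cosine similarity over toy embeddings (the vectors are illustrative). Annoy trades a little accuracy for sublinear lookups over millions of such vectors:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(query, index, k=2):
    """Exact top-k neighbours; O(n) per query, unlike Annoy's trees."""
    return sorted(index, key=lambda key: cosine(query, index[key]),
                  reverse=True)[:k]

item_vectors = {  # hypothetical item embeddings
    "i1": [1.0, 0.0], "i2": [0.9, 0.1], "i3": [0.0, 1.0],
}
neighbours = nearest([1.0, 0.05], item_vectors, k=2)
```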
25. Scalable personalized recommender
Covington, Paul, Jay Adams, and Emre Sargin. "Deep neural networks for youtube recommendations." Proceedings of the 10th ACM conference on recommender systems. 2016.
26. Scalable personalized recommender
Amatriain, Xavier. “Blueprints for recommender system architectures: 10th anniversary edition.” https://amatriain.net/blog/RecsysArchitectures
27. Modelling
• Latency is key!
• Choose the right architecture based on the situation
• Pick the right metrics:
• recall for retrieval
• nDCG@K or MRR@K for ranking, or precision if ranking order is irrelevant
• Go beyond traditional metrics: measure diversity and novelty, and check whether the models are biased
• UAT
• Define data contracts
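The retrieval and ranking metrics above can be sketched as follows, assuming binary relevance (an item is either relevant or not):

```python
from math import log2

def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k list."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Discounted gain of hits vs. the ideal ordering (binary relevance)."""
    dcg = sum(1 / log2(pos + 2)
              for pos, item in enumerate(recommended[:k])
              if item in relevant)
    ideal = sum(1 / log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / ideal
```

Recall ignores position, which is fine for the retrieval stage; nDCG penalises relevant items that appear low in the list, which is what matters for the final ranking.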
29. Deploying
1. Test – high code coverage
32. Deploying
1. Test – high code coverage
2. Test – smart deployment strategy
33. Tech debt sources
• Glue code
• Pipeline Jungles
• Dead Experimental Codepaths
• Plain-Old-Data Type Smell
• Multiple-Language Smell
• Prototype Smell
• Reproducibility Debt
• Cultural Debt
35. Deploying
1. Test – high code coverage
2. Test – smart deployment strategy
3. Test – A/B testing, interleaving or MAB
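As one illustration of the MAB option: a multi-armed bandit shifts traffic toward the better variant during the test instead of holding a fixed split. A minimal epsilon-greedy sketch, with hypothetical CTR estimates:

```python
import random

def epsilon_greedy(ctr_estimates, epsilon=0.1, rng=random):
    """Mostly exploit the best-performing variant, sometimes explore.
    `ctr_estimates` maps variant name -> observed click-through rate."""
    if rng.random() < epsilon:
        return rng.choice(list(ctr_estimates))        # explore
    return max(ctr_estimates, key=ctr_estimates.get)  # exploit
```

In practice the estimates are updated online after every impression, and Thompson sampling or UCB usually beats plain epsilon-greedy.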
37. Deploying
1. Test – high code coverage
2. Test – smart deployment strategy
3. Test – A/B testing, interleaving or MAB
4. Keep monitoring
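To make step 4 concrete: a minimal drift check compares the live score distribution against a frozen baseline. The z-score threshold and window are illustrative assumptions — real monitoring would use tests like PSI or KS on several signals:

```python
from statistics import mean, stdev

def drifted(baseline, current, z_threshold=3.0):
    """Flag drift when the live mean moves more than `z_threshold`
    baseline standard deviations away from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return mean(current) != mu
    return abs(mean(current) - mu) / sigma > z_threshold
```

Run this on model scores, input feature distributions, and online KPIs alike: data drift often shows up in the inputs before it hurts the business metric.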