Most large-scale online recommender systems, such as newsfeed ranking, people recommendations, and job recommendations, have multiple utilities or metrics that need to be optimized simultaneously. Machine learning models trained to optimize a single utility are combined through a set of parameters to generate the final ranking function, and these combination parameters drive business metrics. Finding the right parameter values is typically done through online A/B experimentation, which can be incredibly complex and time-consuming, especially given the non-linear effects of these parameters on the metrics of interest.
In this tutorial, we will show how Bayesian Optimization techniques can be applied to obtain the parameters of such complex online systems in order to balance competing metrics. First, we will provide an in-depth introduction to Bayesian Optimization, covering the basics as well as recent advances in the field. Second, we will show how to formulate a real-world recommender system problem as a black-box optimization problem that can be solved via Bayesian Optimization, focusing on a few key applications such as newsfeed ranking, people recommendations, and job recommendations. Third, we will describe the architecture of the solution and how we deploy it for large-scale systems. Finally, we will discuss extensions and some future directions in this domain.
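To make the black-box formulation concrete, here is a hypothetical sketch (not code from the tutorial): item scores from three single-utility models are blended by weights (alpha, beta), and the business metric is treated as an unknown, noisy function of those weights. The utility names, the simulated metric, and the true optimum are all invented for illustration.

```python
import random

# Hypothetical example: each item has scores from three single-utility models,
# and the final ranking score is a weighted combination driven by (alpha, beta).
def final_score(item_scores, alpha, beta):
    click, connect, job = item_scores
    return click + alpha * connect + beta * job

# In production the business metric comes from an online A/B test; here we
# simulate it with a noisy black-box function of the combination parameters
# (the true optimum, (0.7, 0.3), is of course unknown to the tuner).
def online_metric(alpha, beta, rng):
    return -(alpha - 0.7) ** 2 - (beta - 0.3) ** 2 + rng.gauss(0, 0.001)

rng = random.Random(0)
candidates = [(a / 4, b / 4) for a in range(5) for b in range(5)]
best = max(candidates, key=lambda ab: online_metric(ab[0], ab[1], rng))
print("best weights found by exhaustive sweep:", best)
# Bayesian Optimization replaces this exhaustive sweep (25 costly A/B cells)
# with a model-guided search that needs far fewer metric evaluations.
```

The exhaustive sweep stands in for what naive A/B experimentation would do; the point of the tutorial is that a surrogate-model-guided search reaches comparable weights with far fewer online evaluations.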
Overview of recommender (recommendation) systems: RFM concepts in brief; item-based and user-based collaborative filtering; content-based recommendation; product-association recommender systems; stereotype recommendation, with its advantages and limitations; customer lifetime value; and the recommender-system analysis and solving cycle.
Product Recommendations Enhanced with Reviews (maranlar)
Tutorial presented by Muthusamy Chelliah (Flipkart, India) and Sudeshna Sarkar (IIT Kharagpur, India) at ACM RecSys 2017 https://recsys.acm.org/recsys17/tutorials/#content-tab-1-3-tab
E-commerce websites commonly deploy recommender systems that make use of user activity (e.g., ratings, views, and purchases) or content (product descriptions). These recommender systems can benefit enormously by also exploiting the information contained in customer reviews. Reviews capture the experience of multiple customers with diverse preferences, often on the fine-grained level of specific features of products. Reviews can also identify consumers’ preferences for product features and provide helpful explanations. The usefulness of reviews is evidenced by the prevalence of their use by customers to support shopping decisions online. With the appropriate techniques, recommender systems can benefit directly from user reviews.
This tutorial will present a range of techniques that allow recommender systems in e-commerce websites to take full advantage of reviews. Topics covered include text mining methods for feature-specific sentiment analysis of products, topic models and distributed representations that bridge the vocabulary gap between user reviews and product descriptions, and recommender algorithms that use review information to address the cold-start problem.
The tutorial sessions will be interspersed with examples from an online marketplace (Flipkart) and experience using data mining and natural language processing techniques (e.g., matrix factorization, LDA, word embeddings) in Web-scale systems.
Approximate nearest neighbor methods and vector models – NYC ML meetup (Erik Bernhardsson)
Nearest neighbor search is conceptually very simple: given a set of points in some space (possibly high-dimensional), we want to find the closest k neighbors of a query point quickly.
This presentation covers Annoy, a library built by me that helps you do (approximate) nearest neighbor queries in high-dimensional spaces. We go through vector models, how to measure similarity, and why nearest neighbor queries are useful.
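The query Annoy approximates is easy to state exactly. A brute-force NumPy version (an illustrative helper, not part of Annoy's API) makes the problem precise and shows why approximation matters at scale:

```python
import numpy as np

def exact_knn(points, query, k):
    """Brute-force k nearest neighbors by Euclidean distance.

    Annoy answers the same query approximately, using a forest of
    random-projection trees, so it scales to millions of vectors where
    this O(n * d) scan per query becomes too slow."""
    distances = np.linalg.norm(points - query, axis=1)
    return np.argsort(distances)[:k]

rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 40))          # 1000 points in 40 dimensions
query = points[0] + 0.01 * rng.normal(size=40)  # a slightly perturbed copy
print(exact_knn(points, query, k=3))          # point 0 should rank first
```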
Presentation at the Netflix Expo session at RecSys 2020 virtual conference on 2020-09-24. It provides an overview of recommendation and personalization at Netflix and then highlights some of the things we’ve been working on as well as some important open research questions in the field of recommendations.
Basic functions and terminology of recommendation systems, with some algorithmic implementations on sample datasets for understanding. All layers of the recommender-system framework are explained.
BERT - Part 1 Learning Notes of Senthil Kumar (Senthil Kumar M)
In this part 1 presentation, I have attempted to provide a '30,000-foot view' of BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art language model in NLP, with high-level technical explanations. I have collated useful information about BERT from various sources.
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec... (Balázs Hidasi)
Slides of my presentation at CIKM2018 about version 2 of the GRU4Rec algorithm, a recurrent neural network based algorithm for the session-based recommendation task.
We discuss sampling strategies and introduce additional sampling to the algorithm, redesigning the loss function to cope with it. The resulting BPR-max loss function can efficiently handle many negative samples without running into the vanishing gradient problem. We also introduce constrained embeddings, which speed up the convergence of item representations and reduce memory usage by a factor of 4. These improvements increase offline measures by up to 52%.
In the talk we also discuss online A/B tests and the implications of long-term observations. Most of these observations are exclusive to this talk and are not in the paper.
You can access the preprint version of the paper on arXiv: https://arxiv.org/abs/1706.03847
The code is available on GitHub: https://github.com/hidasib/GRU4Rec
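My reading of the BPR-max loss described above, as a NumPy sketch (not the reference implementation): softmax weights over negative-sample scores concentrate the pairwise BPR terms on the highest-scoring negatives, which is what keeps gradients alive with many negative samples.

```python
import numpy as np

def bpr_max_loss(target_score, negative_scores, lam=1.0):
    """BPR-max loss, per my reading of the GRU4Rec v2 paper.

    s_j is a softmax over the negative-sample scores; the pairwise
    sigmoid terms are averaged under s_j, and a score regularization
    term (weight lam) penalizes large negative scores."""
    r_j = np.asarray(negative_scores, dtype=float)
    s_j = np.exp(r_j - r_j.max())
    s_j /= s_j.sum()                                   # softmax over negatives
    pairwise = 1.0 / (1.0 + np.exp(-(target_score - r_j)))  # sigmoid(r_i - r_j)
    return -np.log(np.sum(s_j * pairwise)) + lam * np.sum(s_j * r_j ** 2)

negs = [0.2, -0.1, 0.05]
print(bpr_max_loss(2.0, negs), bpr_max_loss(0.0, negs))
```

As expected, the loss shrinks as the target item's score rises above the negatives.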
Knowledge graphs have proven to be extremely valuable to recommender systems, as they enable hybrid graph-based recommendation models encompassing both collaborative and content information. Leveraging this wealth of heterogeneous information for top-N item recommendation is a challenging task, as it requires the ability to effectively encode a diversity of semantic relations and connectivity patterns. In this work, we propose entity2rec, a novel approach to learning user-item relatedness from knowledge graphs for top-N item recommendation. We start from a knowledge graph modeling user-item and item-item relations and learn property-specific vector representations of users and items by applying neural language models to the network. These representations are used to create property-specific user-item relatedness features, which are in turn fed into learning-to-rank algorithms to learn a global relatedness model that optimizes top-N item recommendations. We evaluate the proposed approach in terms of ranking quality on the MovieLens 1M dataset, outperforming a number of state-of-the-art recommender systems, and we assess the importance of property-specific relatedness scores on the overall ranking quality.
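The feature-construction step in that pipeline can be sketched as follows. This is purely illustrative: the property names are hypothetical, and the embeddings are random placeholders standing in for the property-specific representations that entity2rec learns with neural language models over the graph.

```python
import numpy as np

rng = np.random.default_rng(0)
properties = ["feedback", "director", "genre"]   # hypothetical property set

# One embedding space per property; in entity2rec these come from walks
# restricted to edges of that property, here they are random placeholders.
emb = {p: {e: rng.normal(size=16) for e in ("u1", "i1", "i2")}
       for p in properties}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def relatedness_features(user, item):
    """One property-specific user-item relatedness score per property.

    The resulting feature vector is what gets fed to a learning-to-rank
    model, which learns the global relatedness score."""
    return [cosine(emb[p][user], emb[p][item]) for p in properties]

print(relatedness_features("u1", "i1"))
```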
What’s in a Query? Understanding query intent (FlipkartStories)
Rewind - Take a peek at the presentation deck from the talk by Flipkart data scientists Bharat Thakarar, Subhadeep Maji and Mohit Kumar at slash n 2018.
Bayesian Optimization for Balancing Metrics in Recommender Systems (Viral Gupta)
LinkedIn talk at Netflix ML Platform meetup Sep 2019 (Faisal Siddiqi)
In this talk at the Netflix Machine Learning Platform Meetup on 12 Sep 2019, Kinjal Basu from LinkedIn discussed Online Parameter Selection for Web-Based Ranking via Bayesian Optimization.
Using Optimal Learning to Tune Deep Learning Pipelines (SigOpt)
SigOpt talk from NVIDIA GTC 2017 and AWS SF AI Day
We'll introduce Bayesian optimization as an efficient way to optimize machine learning model parameters, especially when evaluating different parameters is time consuming or expensive. Deep learning pipelines are notoriously expensive to train and often have many tunable parameters, including hyperparameters, the architecture, and feature transformations, that can have a large impact on the efficacy of the model. We'll provide several example applications using multiple open source deep learning frameworks and open datasets. We'll compare the results of Bayesian optimization to standard techniques like grid search, random search, and expert tuning. Additionally, we'll present a robust benchmark suite for comparing these methods in general.
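The grid-vs-random comparison mentioned above is easy to demonstrate on a toy problem. This is my own illustration, not SigOpt's benchmark suite: the "training run" is a cheap synthetic loss whose optimum sits off the grid.

```python
import math
import random

def val_loss(lr, reg):
    # Stand-in for an expensive training run; the true optimum
    # (log10 lr = -1.3, log10 reg = -2.5) is unknown to the tuner.
    return (math.log10(lr) + 1.3) ** 2 + (math.log10(reg) + 2.5) ** 2

budget = 16

# Grid search: a fixed 4 x 4 grid over log-spaced values.
grid = [(10 ** -a, 10 ** -b) for a in (0.5, 1, 2, 3) for b in (1, 2, 3, 4)]
best_grid = min(val_loss(lr, reg) for lr, reg in grid)

# Random search: same budget, log-uniform draws over the same ranges.
rng = random.Random(0)
draws = [(10 ** rng.uniform(-3, -0.5), 10 ** rng.uniform(-4, -1))
         for _ in range(budget)]
best_rand = min(val_loss(lr, reg) for lr, reg in draws)

print("grid:", best_grid, "random:", best_rand)
# Bayesian optimization would spend the same 16 evaluations adaptively,
# modeling val_loss and concentrating trials near the promising region.
```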
Best Practices for Hyperparameter Tuning with MLflow (Databricks)
Hyperparameter tuning and optimization is a powerful tool in the area of AutoML, for both traditional statistical learning models and deep learning. There are many existing tools to help drive this process, including both black-box and white-box tuning. In this talk, we'll start with a brief survey of the most popular techniques for hyperparameter tuning (e.g., grid search, random search, Bayesian optimization, and Parzen estimators) and then discuss the open source tools which implement each of these techniques. Finally, we will discuss how we can leverage MLflow with these tools and techniques to analyze how our search is performing and to productionize the best models.
Speaker: Joseph Bradley
Tuning the Untunable - Insights on Deep Learning Optimization (SigOpt)
Patrick Hayes originally gave this talk at ODSC West in 2018. During this talk, Patrick discusses a couple key barriers to deep learning optimization and how SigOpt solves them. First, Patrick discusses the problem of lengthy training cycles and how novel techniques like multitask optimization are designed to use partial information to solve this challenge. Second, Patrick discusses automated cluster management and how solving this problem makes it much easier to manage training cycles for these models.
What is Rated Ranking Evaluator and how to use it (for both software engineers and IT managers). Talk given during the Chorus Workshops at Plainschwarz Salon.
6. Why Bayesian Optimization?
E.g. in machine learning: optimizing hyperparameters
[Diagram: a model f takes a feature vector, parameters, and hyperparameters]
f(): F1, precision, recall, accuracy, etc.
parameters: neural network weights, decision tree features to split on
hyperparameters: learning rate, classification threshold
7. Why Bayesian Optimization?
E.g. in machine learning: optimizing hyperparameters
[Diagram: a model f takes a feature vector, parameters, and hyperparameters]
parameters: fast/efficient to optimize
hyperparameters: slow/resource-intensive to optimize
8. Why Bayesian Optimization?
E.g. in machine learning: optimizing hyperparameters
Want to optimize with a method that:
• minimizes function evaluations
• can optimize a black box
• quantifies uncertainty in the solution
9. Bayesian optimization
• model f as a Gaussian random process
• fit the model to data
• use uncertainty quantification to inform the search for the optimal x
24. Optimization with Gaussian processes
The basic recipe:
1. Choose kernel and mean functions
2. Sample the function at several points
3. Use maximum likelihood to fit GP parameters
4. Are we done? If not, select new points to sample; GOTO step 3
33. Gaussian Process Regression: use maximum likelihood to fit GP parameters
Sample the function and treat the problem as a discrete Gaussian random variable
34. Gaussian Process Regression: use maximum likelihood to fit GP parameters
Use maximum likelihood to fit kernel parameters
35. Optimization with Gaussian processes
The basic recipe:
1. Choose kernel and mean functions
2. Sample the function at several points
3. Use maximum likelihood to fit GP parameters
4. Are we done? If not, select new points to sample; GOTO step 3
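Steps 2–3 of this recipe can be sketched in plain Python. This is a minimal toy illustration (all function names and the lengthscale grid are our own, not part of the tutorial): it fits the RBF lengthscale by a grid search over the log marginal likelihood, then computes the GP posterior mean at a new point.

```python
import math

def rbf(a, b, lengthscale):
    """RBF (squared exponential) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))

def cholesky(A):
    """Cholesky factor L of a symmetric positive definite matrix (A = L L^T)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def log_marginal_likelihood(xs, ys, lengthscale, noise=1e-2):
    """-0.5 y^T K^-1 y - 0.5 log|K| - (n/2) log 2π with K = K_rbf + noise·I."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], lengthscale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = chol_solve(L, ys)
    return (-0.5 * sum(y * a for y, a in zip(ys, alpha))
            - sum(math.log(L[i][i]) for i in range(n))
            - 0.5 * n * math.log(2 * math.pi))

def fit_and_predict(xs, ys, x_star, noise=1e-2):
    """Maximum likelihood over a small lengthscale grid, then posterior mean k_*^T K^-1 y."""
    grid = [0.25 * i for i in range(1, 13)]  # candidate lengthscales 0.25 .. 3.0
    best = max(grid, key=lambda ls: log_marginal_likelihood(xs, ys, ls, noise))
    n = len(xs)
    K = [[rbf(xs[i], xs[j], best) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = chol_solve(cholesky(K), ys)
    mean = sum(rbf(x_star, xi, best) * a for xi, a in zip(xs, alpha))
    return best, mean
```

A real deployment would optimize the kernel parameters with gradients (as BoTorch does) rather than a grid, but the recipe is the same: fit, predict, decide where to sample next.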
36. Additional Resources
Gaussian process book:
http://www.gaussianprocess.org/
• Especially chapters 2 and 5
Peter Frazier’s tutorial on Bayesian optimization: arXiv:1807.02811
Eytan Bakshy’s publications: https://eytan.github.io/
39. NAS Overlapped with BO
• Search Space
  • Both BO and NAS need to define a proper search space
• Search Strategy
  • BO is one important search strategy
• Performance Estimation Strategy
  • Standard training and evaluation is expensive; performance estimation is required
40. Search Space
The choice of the search space largely determines the difficulty of the optimization problem
41. Search Strategy
• Existing methods: Random Search, Bayesian Optimization, Evolutionary Methods, Reinforcement Learning, Gradient-based Methods
• BO has not been widely applied to NAS yet, since Gaussian Processes focus more on low-dimensional continuous optimization problems
42. Performance Estimation Strategy
• Training each architecture to be evaluated from scratch frequently yields computational demands on the order of thousands of GPU days for NAS
43. References
• Neural Architecture Search Survey: https://arxiv.org/pdf/1808.05377.pdf
• AutoML online book: http://www.automl.org/book/
• A blog for Neural Architecture Search: https://lilianweng.github.io/lil-log/2020/08/06/neural-architecture-search.html
45. Bayesian Optimization Framework Overview
• GPyOpt: a Bayesian Optimization library based on GPy. No longer actively developed or maintained.
• Ray Tune: integrates multiple hyperparameter tuning frameworks, including BayesianOptimization and BoTorch.
• BoTorch: a flexible and scalable Bayesian Optimization library built on GPyTorch and PyTorch.
46. Advantages of BoTorch
• BoTorch provides a modular and easily extensible interface for composing Bayesian optimization primitives, including probabilistic models, acquisition functions, and optimizers.
• BoTorch harnesses the power of PyTorch, including auto-differentiation, native support for highly parallelized modern hardware (e.g. GPUs) using device-agnostic code, and a dynamic computation graph.
47. BoTorch Models: SingleTaskGP
• A single-task exact GP model. Use this model when you have independent output(s) and all outputs use the same training data.
49. BoTorch Models: MultiTaskGP
• Multi-task exact GP that uses a simple ICM (Intrinsic Coregionalization Model) kernel.
50. BoTorch Model Fitting: fit_gpytorch_model
Parameters:
• mll (MarginalLogLikelihood) – MarginalLogLikelihood to be maximized
• optimizer (Callable) – The optimizer function
• kwargs (Any) – Arguments passed along to the optimizer function
53. BoTorch Optimizers: optimize_acqf
• Optimize the acquisition function
• Parameters
  • acq_function (AcquisitionFunction) – An AcquisitionFunction.
  • bounds (Tensor) – A 2 x d tensor of lower and upper bounds for each column of X.
  • q (int) – The number of candidates.
  • num_restarts (int) – The number of starting points for multistart acquisition function optimization.
  • raw_samples (int) – The number of samples for initialization.
59. LinkedIn Feed
Mission: Enable members to build an active professional community that advances their career.
Heterogeneous list:
• Shares from a member's connections
• Recommendations such as jobs, articles, courses, etc.
• Sponsored content or ads
[Diagram: a member's feed mixing shares, jobs, articles, and ads]
60. Important Metrics
• Viral Actions (VA): members liked, shared, or commented on an item
• Job Applies (JA): members applied for a job
• Engaged Feed Sessions (EFS): sessions where a member engaged with anything on feed
61. Ranking Function
S(m, u) := pVA(m, u) + x_EFS · pEFS(m, u) + x_JA · pJA(m, u)
• The weight vector x = (x_EFS, x_JA) controls the balance between the three business metrics: VA, EFS and JA.
• A sample business strategy is:
    Maximize VA(x)
    s.t. EFS(x) > c_EFS
         JA(x) > c_JA
• m – member, u – item
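As a toy illustration (our own sketch, not LinkedIn's implementation), the ranking function above combines per-item model scores, and the weight vector x decides which items win:

```python
def feed_score(p_va, p_efs, p_ja, x_efs, x_ja):
    """S(m, u) = pVA + x_EFS * pEFS + x_JA * pJA for one (member, item) pair."""
    return p_va + x_efs * p_efs + x_ja * p_ja

def rank_feed(items, x_efs, x_ja):
    """Order candidate items by score; items are dicts of model outputs."""
    return sorted(items,
                  key=lambda it: feed_score(it["p_va"], it["p_efs"], it["p_ja"],
                                            x_efs, x_ja),
                  reverse=True)

# Hypothetical candidates: a connection's share vs a job recommendation
items = [
    {"id": "share",   "p_va": 0.30, "p_efs": 0.10, "p_ja": 0.00},
    {"id": "job_rec", "p_va": 0.05, "p_efs": 0.05, "p_ja": 0.40},
]
```

With a small x_JA the share ranks first; raising x_JA promotes the job recommendation, which is exactly why tuning x moves the VA/EFS/JA business metrics.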
62. Modeling The Metrics
● Y_ij^k(x) ∈ {0, 1} denotes whether the i-th member, during the j-th session served with parameter x, did action k or not. Here k = VA, EFS or JA.
● We model this data as follows:
    Y_ij^k(x) ~ Gaussian(f_k(x), σ²)
    f_k(x) ~ N(0, K_RBF(x, x′))
● Assume a Gaussian process prior on each of the latent functions f_k.
63. Reformulation
We approximate each of the metrics as:
    VA(x) = f_VA(x)
    EFS(x) = f_EFS(x)
    JA(x) = f_JA(x)
The business strategy
    Maximize VA(x)
    s.t. EFS(x) > c_EFS
         JA(x) > c_JA
becomes
    Maximize f_VA(x)
    s.t. f_EFS(x) > c_EFS
         f_JA(x) > c_JA
which can be abstracted as
    Maximize f(x), x ∈ X
Benefit: the last problem can now be solved using techniques from the Bayesian Optimization literature. The original optimization problem can be written through this parametrization.
65. PYMK Recommendation
● Main driver of network growth at LinkedIn.
● Besides network growth, engagement is another important piece.
Business Metrics
● Accept Rate: the fraction of accepted invitations out of all invitations sent from PYMK.
● Invite Rate: the fraction of impressions for which members send invitations, out of all impressions delivered.
● Member Engagement Metric: the fraction of invitations sent that helps member engagement.
66. PYMK Scoring Function
S(m, c) := pInviteAccept(m, c) · (1 + α · pEngage(m, c))
• pInviteAccept is a score predicting a combination of the accept and invite scores.
• pEngage is a score predicting member engagement.
• α is the hyperparameter we need to tune.
• m – member, c – candidate
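A toy sketch of this multiplicative scoring function (candidate values are made up for illustration) shows how α trades invitations against downstream engagement:

```python
def pymk_score(p_invite_accept, p_engage, alpha):
    """S(m, c) = pInviteAccept * (1 + alpha * pEngage)."""
    return p_invite_accept * (1.0 + alpha * p_engage)

# Two hypothetical candidates: c1 is more likely to be invited and accepted,
# c2 is more likely to drive engagement after connecting.
c1 = {"p_invite_accept": 0.30, "p_engage": 0.1}
c2 = {"p_invite_accept": 0.25, "p_engage": 0.9}
```

At α = 0 the score reduces to pInviteAccept and c1 ranks first; as α grows, engagement-heavy candidates like c2 overtake it, which is the tradeoff Bayesian Optimization tunes.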
67. Reformulation
We approximate each of the metrics as:
    Invite(x) = f_invite(x)
    Accept(x) = f_accept(x)
    Engage(x) = f_engage(x)
The business strategy
    Maximize Invite(x)
    s.t. Accept(x) > c_accept
         Engage(x) > c_engage
becomes
    Maximize f_invite(x)
    s.t. f_accept(x) > c_accept
         f_engage(x) > c_engage
which can be abstracted as
    Maximize f(x), x ∈ X
Benefit: the last problem can now be solved using techniques from the Bayesian Optimization literature. The original optimization problem can be written through this parametrization.
68. Modeling The Metrics
● Y_ij^k(x) ∈ {0, 1} denotes whether the i-th member, during the j-th session served with parameter x, did action k or not. Here k = invite, accept or engage.
● We model this data as follows:
    Y_ij^k(x) ~ Gaussian(f_k(x), σ²)
    f_k(x) ~ N(0, K_RBF(x, x′))
● Assume a Gaussian process prior on each of the latent functions f_k.
69. LinkedIn Feed is a mixture of OU and SU
[Diagram: organic updates (OU) drive shares, comments and likes; sponsored updates (SU) drive revenue]
70. Need to find a desired balance
[Diagram: balancing engagement (shares, comments, likes) against revenue]
71. Revenue Engagement Tradeoff (RENT)
● RENT is a mechanism that blends sponsored updates (SU) and organic updates (OU) to optimize the revenue and engagement tradeoff
● Two important but conflicting objectives are considered
72. Fast blending system with a light-weight infrastructure
● High velocity on model development
● Ranking invariance
● Calibration with autotuning
73. Ads use-case
Minimize |SU_impressions(LCF) − SU_impressions(control)|
• Compare ads models M1 and M2 by making sure each has equal impressions.
• This is controlled by the LCF factor, which affects the score and hence the ranking on the feed.
• Increasing LCF gives a higher score and hence a higher rank, which increases the number of impressions.
75. Mathematical Abstraction – Constrained Optimization
max_x y(x)
s.t. y_1(x) ≥ c_1, ⋯, y_k(x) ≥ c_k, ⋯, y_K(x) ≥ c_K
Goal
• When launching a new model online, we need to tune hyperparameters to optimize the major metric while making sure the guardrail metrics do not drop
Notation
• x: the hyperparameter to tune
• y(x): the major business metric to optimize
• y_k(x): the guardrail metrics
• c_k: the constraint values
76. Model Each Metric via Gaussian Process
Examples
• y(x): member viral actions in Feed; invitation rate in PYMK
• y_k(x): job application metrics in Feed; acceptance rate in PYMK
Use a Gaussian Process to model online metrics. Different distributions can be used (n(x) is a normalizer if we use a Binomial or Poisson distribution):
• y(x) ~ Gaussian(f(x), σ²), f(x) ~ Gaussian(0, K(x, x′))
• y(x) ~ Binomial(n(x), Sigmoid(f(x))), f(x) ~ Gaussian(0, K(x, x′))
• y(x) ~ Poisson(n(x) · e^{f(x)}), f(x) ~ Gaussian(0, K(x, x′))
77. Combine All Metrics – Method of Lagrange Multipliers
Equivalent to maximizing the following:
L(x) = y(x) + λ_1 σ_α(y_1(x) − c_1) + ⋯ + λ_K σ_α(y_K(x) − c_K)
σ_α(z) = (1 + e^{−αz})^{−1}
• σ_α is a smoothed sigmoid function which takes values close to 1 for positive z and close to 0 for negative z.
• λ_k σ_α(y_k(x) − c_k) penalizes constraint violation.
• λ_k is a positive multiplier, which is tunable in different applications.
• The optimal value of L(x) optimizes y(x) while satisfying all constraints.
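The combined objective above is straightforward to compute once the metric values at a point x are known; a minimal sketch (names are ours):

```python
import math

def sigma_alpha(z, alpha=10.0):
    """Smoothed step: close to 1 when z > 0 (constraint satisfied),
    close to 0 when z < 0 (constraint violated)."""
    return 1.0 / (1.0 + math.exp(-alpha * z))

def lagrangian(y, guardrails, lambdas, alpha=10.0):
    """L(x) = y(x) + sum_k lambda_k * sigma_alpha(y_k(x) - c_k).

    guardrails is a list of (y_k, c_k) pairs, all evaluated at the same x."""
    return y + sum(lam * sigma_alpha(yk - ck, alpha)
                   for lam, (yk, ck) in zip(lambdas, guardrails))
```

A point that satisfies its guardrail earns nearly the full λ_k bonus, while a violating point earns almost none, so among feasible points the objective y(x) alone decides the ranking.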
78. Lagrange Multipliers + Gaussian Process
L(x) = y(x) + λ_1 σ_α(y_1(x) − c_1) + ⋯ + λ_K σ_α(y_K(x) − c_K)
Posterior samples of y(x) and y_k(x) can easily be obtained via Gaussian Process Regression, and hence so can posterior samples of L(x).
79. Collecting Metrics
• Metrics y(x) and y_k(x) are collected on evenly spread Sobol points x_1, ⋯, x_n.
• Sobol points offer lower discrepancy than the same number of random points.
80. Reweighting Points
• Initially, recommender systems are served equally by x_1, ⋯, x_n.
• After collecting metrics and applying Gaussian Process Regression, we generate samples from L(x).
• We need to assign each point x_i a probability value p_i to split traffic.
81. Thompson Sampling
• Use Thompson Sampling to assign a probability value p_i to each point x_i
• Step 1: For all candidate points x_1, ⋯, x_n, obtain posterior samples from L(x): L(x_1), L(x_2), ⋯, L(x_n).
• Step 2: Pick x* such that L(x*) is the largest among L(x_1), L(x_2), ⋯, L(x_n).
• Repeat Step 1 and Step 2 1000 times.
• Step 3: Aggregate x*_1, ⋯, x*_1000. Set p_i as the frequency of x_i in x*_1, ⋯, x*_1000.
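The three steps above can be sketched directly. In production the draws come from the joint GP posterior of L(x); here independent toy Gaussians stand in for them, and all names are our own:

```python
import random

def thompson_probabilities(posterior_samplers, n_draws=1000, seed=0):
    """posterior_samplers maps each candidate x_i to a function that returns
    one posterior draw of L(x_i); returns the serving probabilities p_i."""
    rng = random.Random(seed)
    wins = {x: 0 for x in posterior_samplers}
    for _ in range(n_draws):
        # Step 1: one posterior draw for every candidate
        draws = {x: sampler(rng) for x, sampler in posterior_samplers.items()}
        # Step 2: record which candidate wins this draw
        wins[max(draws, key=draws.get)] += 1
    # Step 3: p_i is the winning frequency of x_i
    return {x: w / n_draws for x, w in wins.items()}

# Toy stand-in posteriors for L(x) at three candidate points
samplers = {
    0.1: lambda rng: rng.gauss(0.0, 0.1),
    0.5: lambda rng: rng.gauss(1.0, 0.1),
    0.9: lambda rng: rng.gauss(0.2, 0.1),
}
```

Candidates whose posterior is clearly dominant absorb almost all the traffic, while overlapping posteriors keep several candidates alive for further exploration.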
82. Intuition of Thompson Sampling
• Thompson Sampling selects the optimal hyperparameters by sampling from a distribution (x_1, p_1), (x_2, p_2), ⋯, (x_n, p_n) over all candidates.
• p_i is equal to the probability of candidate x_i being optimal among all candidates.
• Thompson Sampling has been proven to converge to the optimal value in various settings.
• Thompson Sampling is widely used in online A/B testing, online advertising and other applications.
83. Exploration versus Exploitation
• Exploration: explore sub-optimal but high-potential regions of the search space
• Exploitation: search around the current optimal candidate
• Thompson Sampling automatically balances exploration and exploitation via a probability distribution
86. High Level Steps
We need to run the following steps in a loop:
• The optimization function emits a hyperparameter distribution
• Use the hyperparameters in online serving
• Collect metrics
• Call the optimization function again
87. Goals of the Infrastructure
A few realities about the clients of the Optimization package:
• Every team within LinkedIn can have a different online serving system, e.g. streaming, offline, or REST endpoints.
• Tracking data and metrics for each system can be very different, e.g. different calculation methodologies.
To build a minimum viable product and stay focused on solving the optimization problem well, we built the Optimization package as a library.
88. Flow Architecture
We built the Optimization package as a library that expands into a Spark flow. A client calls the Optimization package in its existing offline Spark flow.
The client flow contains the following:
• A Hadoop/Spark flow to collect and aggregate metrics.
• A call to the Optimization package, which expands into a Spark flow.
• A flow to use the output of the optimizer.
90. Notifications Example
Maximize Sessions(tr)
s.t. Clicks(tr) > c_Clicks
     SendVolume(tr) < c_SendVolume
Send the right volume of notifications at the most opportune time to improve sessions and clicks while keeping the send volume the same.
We have the following ML models:
pVisit(m, i): probability of a visit
pClick(m, i): probability of a click on item i by member m
91. Notifications Scoring Function
S(m, i) := pClick(m, i) + x_α · pVisit(m, i); send if S(m, i) > x_th
• The weight vector x = (x_α, x_th) controls the balance between the business metrics: Sessions, Clicks, Send Volume.
• A sample business strategy is:
    Maximize Sessions(x)
    s.t. Clicks(x) > c_Clicks
         SendVolume(x) < c_SendVolume
• m – member, i – item
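A toy sketch of this send decision (candidate scores below are invented for illustration) shows how the pair (x_α, x_th) moves send volume:

```python
def should_send(p_click, p_visit, x_alpha, x_th):
    """Send a notification when pClick + x_alpha * pVisit exceeds the threshold x_th."""
    return p_click + x_alpha * p_visit > x_th

def send_volume(candidates, x_alpha, x_th):
    """Fraction of candidate notifications sent under x = (x_alpha, x_th)."""
    sent = sum(1 for p_click, p_visit in candidates
               if should_send(p_click, p_visit, x_alpha, x_th))
    return sent / len(candidates)

# Hypothetical (pClick, pVisit) model scores for a batch of candidate notifications
candidates = [(0.05, 0.2), (0.30, 0.6), (0.10, 0.9), (0.02, 0.1)]
```

Raising x_th monotonically lowers the send volume, while x_α shifts weight between click-driven and session-driven notifications; Bayesian Optimization searches this 2-d space under the Clicks and Send Volume constraints.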
96. Tracking Data Aggregation
Tracking data from user activity logs is available on HDFS. The raw data gets aggregated by the client flows.
97. Library API: Problem Specification
• Problem Specification
  • Treatment models and a control model
  • Parameters and search ranges
  • Objective metric and constraint metrics
  • The number of exploration iterations
• Exploration – Exploitation Setup
  • The algorithm starts with a few exploration iterations that output a uniform serving probability.
  • The exploration stage explores each point equally, while the exploitation stage reassigns probabilities to different points.
99. Offline System
The heart of the product
Tracking
• All member activities are tracked with the parameter of interest.
• ETL into HDFS for easy consumption.
Utility Evaluation
• Using the tracking data, we generate (x, f_k(x)) for each function k.
• The data is kept in an appropriate schema that is problem agnostic.
Bayesian Optimization
• The data and the problem specifications are input to this step.
• Using the data, we first estimate the posterior distribution of each latent function.
• We sample from those distributions to get a distribution over the parameter x that is likely to be the maximizer.
100. The Parameter Store and Online Serving
• The Bayesian Optimization library generates:
  • A set of potential candidates to try in the next round (x_1, x_2, …, x_n)
  • A probability of how likely each point is the true maximizer (p_1, p_2, …, p_n), such that Σ_{i=1}^n p_i = 1
• To serve members with the above distribution, each memberId is mapped to [0, 1] using a hashing function h. For example, if
    Σ_{i=1}^k p_i < h(member) ≤ Σ_{i=1}^{k+1} p_i
  then the member's feed is served with parameter x_{k+1}.
• The parameter store (depending on the use-case) can contain:
  • <parameterValue, probability>, i.e. (x_i, p_i), or
  • <memberId, parameterValue>
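The hashing scheme can be sketched as follows. This is an illustrative implementation, not LinkedIn's: MD5 simply stands in for any well-mixing hash that maps memberIds uniformly into [0, 1).

```python
import hashlib

def assign_parameter(member_id, params, probs):
    """Map a memberId deterministically into [0, 1) and pick the parameter
    whose cumulative-probability bucket contains the hashed value."""
    digest = hashlib.md5(str(member_id).encode()).hexdigest()
    h = int(digest, 16) / float(16 ** 32)  # roughly uniform in [0, 1)
    cum = 0.0
    for x, p in zip(params, probs):
        cum += p
        if h <= cum:
            return x
    return params[-1]  # guard against floating-point rounding of the p_i
```

Because the hash is deterministic, each member sees a stable parameter value between optimizer rounds, while across the population the serving fractions match the p_i.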
101. Online System
Serving hundreds of millions of users
Parameter Sampling
• For each member m visiting LinkedIn, depending on the parameter store, we either evaluate <m, parameterValue> or directly call the store to retrieve <m, parameterValue>.
Online Serving
• Depending on the parameter value that is retrieved (say x), the notification item is scored according to the ranking function and served:
    S(m, i) := pClick(m, i) + x_α · pVisit(m, i); send if S(m, i) > x_th
103. Entities
• Task is the single unit of a problem that we want to solve.
• OptimizationToken can be either an objective or a constraint. A given Task object can have multiple OptimizationTokens.
• UtilitySegment encapsulates all the information for modelling a single latent function f_k.
• ObservationDataToken is used to store observed data: the raw metrics y (vector), hyperparameters x (matrix), and sampleSize n (vector).
106. Plots (When tuning a 1d parameter)
Shows how Send Volume for a cohort (FourByFour) changes as we change the threshold.
The red curve is the observed metric. The middle blue line is the mean of the GP estimate, and the left and right blue lines are the GP variance estimates.
109. Operational Efficiency
• Feed model tuning needs 0.5x the traffic and improves tuning speed by 2.5x.
• Ads model tuning speed improved by 4x.
• Notification model tuning speed improved by 3x.
110. Key Takeaways
● Removes the human in the loop: a fully automatic process to find the optimal parameters.
● Drastically improves developer productivity.
● Can scale to multiple competing metrics.
● Very easy onboarding infra for multiple vertical teams. Currently used by Ads, Feed, Notifications, PYMK, etc.
112. The Simplest Optimization Problem
Our goal is to efficiently solve an optimization problem of the form
    max_x y(x), x ∈ X
subject to constraints
    y_k(x) ≥ c_k, k = 1, …, K
113. Basic Bayesian Optimization
Observe data as
    y(x) = f(x) + ε
where the error term ε is assumed to follow a normal distribution, and the mean function of interest f is assumed to follow a Gaussian process
    f(x) ~ GP(μ(x), K(x, x′))
with mean μ, typically assumed to be identically zero, and covariance kernel K (common choices include the RBF kernel or the Matérn kernel).
114. Some Natural Extensions
1) Often there is time dependence in A/B experiments.
  • Day-over-day fluctuations
  • User behavior may differ on weekdays vs weekends
2) Can we incorporate other sources of information into the optimization?
3) How can we address heterogeneity in users' affinity towards treatments?
115. Time Dependence
Traditional Bayesian optimization strongly assumes metrics are stable over time. Is this a problem?
It can be when using explore/exploit methods.
Examples of time dependence in A/B experiments:
• Day-over-day fluctuations
• User behavior may differ on weekdays vs weekends
If an exploit step suggests a new point to sample, but this occurs on a Saturday when users tend to be more active, this may give an overly optimistic sense of the impact.
116. Additional Sources of Information
It is common in the internet industry to iterate through a large volume of models. Past experiment data may be suggestive of the best hyperparameters for a new model.
Many companies have developed "offline replay" systems which simulate the results that would have been seen had users been given a particular treatment.
Such sources of additional information can be leveraged to expedite optimization.
117. "Personalized" Hyperparameter Tuning
It is plausible that different groups of members may have different affinities towards model hyperparameters.
For instance, active users of a platform may be very interested in frequent updates. On the other hand, less active users may not appreciate frequent pings.
This can be handled in a notification system by allowing separate hyperparameters for the active and less active users.
118. Commonalities
• Each of these extensions is easily accommodated through modifications of the Bayesian optimization framework
• They are all used to improve the speed and accuracy of function fits
120. Time Dependence
A/B experiment metrics often see variations in time, such as day-over-day fluctuations or differences on weekdays vs weekends or holidays.
Bayesian optimization typically assumes data is observed as
    y(x) = f(x) + ε
A more realistic model might assume we have a time-dependent error term δ(t) in addition to the measurement error:
    y(x, t) = f(x) + δ(t) + ε
121. Reformulating this model
The model incorporating time fluctuations can be expressed as
    y(x, t) = f(x, t) + ε, where f(x, t) = f(x) + δ(t)
That is, we are now trying to optimize a function that is allowed to vary with time!
Practically, this can be accomplished by simple adjustments to the covariance kernel.
122. Time Dependent Gaussian Processes
Rather than assuming the function follows a Gaussian process indexed by the hyperparameter alone, i.e.
    f(x) ~ GP(0, K(x, x′))
we can instead assume that the function is a Gaussian process indexed by the hyperparameter and time:
    f(x, t) ~ GP(0, K((x, t), (x′, t′)))
And Bayesian optimization proceeds exactly as in the time-invariant case!
123. A Simple Example of a Time Dependent Kernel
A simple example of a kernel for a time dependent Gaussian process is the additive kernel
    K((x, t), (x′, t′)) = K_x(x, x′) + K_t(t, t′)
which specifies
    f(x, t) ~ GP(0, K_x + K_t)
and is equivalent to
    f(x, t) = f_x(x) + f_t(t), with independent f_x ~ GP(0, K_x) and f_t ~ GP(0, K_t)
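A sketch of such an additive kernel in plain Python, under the assumption (ours, for illustration) that both K_x and K_t are RBF kernels:

```python
import math

def rbf(a, b, lengthscale):
    """RBF kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))

def k_time(x, t, x2, t2, ls_x=1.0, ls_t=7.0):
    """Additive time-dependent kernel: an RBF over the hyperparameter plus an
    RBF over time, i.e. f(x, t) = f_x(x) + f_t(t) with independent GPs."""
    return rbf(x, x2, ls_x) + rbf(t, t2, ls_t)
```

Two observations of the same hyperparameter taken far apart in time share only the K_x part of the covariance, so stale measurements are naturally discounted when predicting today's metric.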
126. Additional Sources of Information
It is common in the internet industry to iterate through a large volume of models. Past experiment data may be suggestive of the best hyperparameters for a new model.
Many companies have developed "offline replay" systems which simulate the results that would have been seen had users been given a particular treatment.
Such sources of additional information can be leveraged to expedite our predictions of the metrics of interest.
127. Joint Model Of Experiment And Additional Information
Assume we observe the metric of interest as
    y(x) = f(x) + ε
as well as another "proxy" metric
    z(x) = g(x) + ε′
and we believe that g is informative of the function of interest f.
We can jointly model the functions f and g to improve our predictions of f.
128. Joint Gaussian Processes
We still assume
    f(x) ~ GP(0, K(x, x′))
and also that
    g(x) ~ GP(0, K(x, x′))
Further assume the Gaussian processes f and g are correlated. Namely,
    Cov(f(x), g(x′)) = ρ · K(x, x′)
(Note that the common marginal prior is not a restrictive assumption.)
129. Extensions To Multiple Sources Of Information
Suppose we are interested in modelling C different Gaussian Processes,
    f_1, ⋯, f_C
A simple joint model is specified by the Intrinsic Coregionalization Model (ICM). This specifies that marginally
    f_c(x) ~ GP(0, B_{c,c} · K(x, x′))
and also jointly
    Cov(f_c(x), f_c′(x′)) = B_{c,c′} · K(x, x′)
where B is a positive semi-definite C × C coregionalization matrix.
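The ICM cross-covariance is just a task-pair weight times a shared base kernel; a minimal sketch (the matrix B below is a hypothetical two-task example of our own):

```python
import math

def rbf(a, b, lengthscale=1.0):
    """Shared base kernel K(x, x')."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))

def icm_cov(B, c1, x1, c2, x2, base_kernel=rbf):
    """ICM cross-covariance: Cov(f_c1(x1), f_c2(x2)) = B[c1][c2] * K(x1, x2)."""
    return B[c1][c2] * base_kernel(x1, x2)

# Hypothetical 2-task coregionalization matrix: task 0 = online experiment
# metric, task 1 = offline-replay proxy, positively correlated via B[0][1].
B = [[1.0, 0.6],
     [0.6, 0.8]]
```

The off-diagonal entry B[0][1] is what lets cheap proxy observations shrink the posterior uncertainty of the experiment metric, which is the efficiency gain reported on the comparison slide.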
130. Notification Sessions Modeling
Send a notification if a score exceeds a threshold.
Our goal is to estimate the sessions that would be observed for a given threshold.
The sessions induced by sending a notification are easily modelled, for instance by a survival model.
Based on offline data, we can then estimate the induced sessions.
132. Comparison of GP Fit With Left Out Data
Mean Squared Error:
• Without offline data: 0.00928
• With offline data: 0.00325
Relative efficiency: 285%
134. Group Definitions
Often, segments of users can be given by ad-hoc definitions based on domain-specific knowledge:
• Demographic segments (e.g. students vs full-time employees)
• Usage patterns (e.g. heavy vs light users)
Recently, advances have been made towards identifying heterogeneous cohorts of users, which can be leveraged in the cohort definition phase.
135. Causal Discovery of Groups
1. Launch an A/B experiment for a variety of hyperparameter values.
2. Using this randomized experiment data, apply an automated method for discovering subgroups of the population which demonstrate heterogeneity in treatment effect.
3. An example of such a tool is the "causal tree" methodology proposed by Athey and Imbens (2016).
4. This takes in user features (usage, demographics, etc.), and results in a tree whose splits are chosen according to heterogeneity in treatment effect.
136. Optimization by Group
We now want to give each group a hyperparameter value which globally optimizes the metrics of interest.
Suppose we have G groups, and each group g is exposed to hyperparameter x_g.
Our optimization problem can be expressed as
    max_{x_1, …, x_G} y(x_1, …, x_G)
    s.t. y_k(x_1, …, x_G) ≥ c_k, k = 1, …, K
which is generally not tractable.
137. Grey Box Optimization
Generally, the metric of interest can be expressed as a (weighted) average of the metric across the cohorts. That is,
    y(x_1, …, x_G) = Σ_g w_g · y^(g)(x_g)
so we estimate the objective metric (and similarly the constraint metrics) separately for each cohort as a function of the hyperparameter for that cohort.
This estimation requires dramatically less data than estimating a single function of G variables.
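The decomposition above can be sketched with toy per-cohort metric curves (the quadratic curves and grid below are made-up illustrations):

```python
def global_metric(cohort_fns, weights, xs):
    """y(x_1, ..., x_G) = sum_g w_g * y_g(x_g): each cohort g contributes its
    own metric curve y_g evaluated at that cohort's hyperparameter x_g."""
    return sum(w * f(x) for f, w, x in zip(cohort_fns, weights, xs))

def optimize_per_cohort(cohort_fns, grid):
    """With non-negative weights and no coupling constraints, maximizing the
    separable sum splits into G independent one-dimensional searches."""
    return [max(grid, key=f) for f in cohort_fns]

# Two hypothetical cohorts whose metric curves peak at different thresholds
cohort_fns = [lambda x: -(x - 0.2) ** 2,   # e.g. daily active users
              lambda x: -(x - 0.8) ** 2]   # e.g. dormant users
weights = [0.7, 0.3]
```

With the global constraints of the full problem, the cohorts couple back together, which is where the GP explore/exploit machinery of the next slide takes over.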
138. Grey Box Optimization
Using the decomposition
    y(x_1, …, x_G) = Σ_g w_g · y^(g)(x_g)
Bayesian optimization (explore/exploit) proceeds exactly as previously seen.
Now, the means and variances of the objective and constraints are simply the weighted sums of the independent GPs for each cohort.
139. A Simple Example
Send a notification if a score exceeds a threshold.
Ad-hoc cohorts based on usage frequency:
• Four by four: daily active users
• One by three: weekly active users
• One by one: monthly active users
• Onboarding
• Dormant
140. Influence Of One Parameter On Number Of Sent Messages
Marginal plot of Send Volume for FourByFour members.
The two red lines are the maximum and minimum over all cohorts except FourByFour; the blue lines are the variance estimates.
141. Summary
• An advantage of Bayesian optimization is that it is easily modified to suit the nuances of the optimization problem at hand
• Some easy extensions:
  • Incorporate the time dependence seen in many online experiments
  • Incorporate additional sources of information
  • Personalize the user experience through group-specific optimization
142. Thank you
Marco Alvarado, Mavis Li, Yafei Wang, Jason Xu, Jinyun Yan, Shuo Yang, Wensheng Sun, Yiping Yuan, Ajith Muralidharan, Matthew Walker, Boyi Chen, Jun Jia, Haichao Wei, Huiji Gao, Zheng Li, Mohsen Jamali, Shipeng Yu, Hema Raghavan, Romer Rosales