A review of the paper “Ad Click Prediction: a View from the Trenches”
The paper discusses predicting ad click-through rates (CTR), a massive-scale learning problem central to the multibillion-dollar online advertising industry.
Presented by Mazen & Arzam in the Data Intensive Computing class at KTH, Stockholm, Sweden.
Link to the paper: http://research.google.com/pubs/pub41159.html
Large-Scale Ads CTR Prediction with Spark and Deep Learning: Lessons Learned ... (Databricks)
CTR prediction algorithms are essential, and are used extensively for ads bidding and sponsored search. While logistic regression models have proven effective for this kind of problem, rapid growth in the amount of data has created many challenges. Examples include how to train a logistic regression model with billions of parameters on a commodity hardware cluster, and how to improve the model’s accuracy with better feature engineering. Other challenges include figuring out how to benefit from popular deep learning technologies to reduce the dependence on human labor and expert knowledge, and how to improve job performance given such a complicated workload.
At Spark Summit East 2017, Hortonworks introduced vector-free L-BFGS to conquer the scalability challenge of MLlib and provide a very scalable logistic regression implementation. In this talk, hear about their experience integrating this implementation with different feature learning technologies to solve Ad CTR prediction problems, and the lessons they learned.
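As a rough illustration of the logistic regression approach to CTR prediction described above, here is a minimal Python sketch. It is not the vector-free L-BFGS implementation from the talk: it swaps in plain stochastic gradient descent on the log loss, and the field names, learning rate, epoch count and toy click log are all assumptions made purely for illustration.

```python
import math

def features(impression):
    """Turn an impression's categorical fields into sparse one-hot feature keys."""
    return [f"{field}={value}" for field, value in sorted(impression.items())]

def predict(weights, feats):
    """Predicted click probability: sigmoid of the summed weights of active features."""
    z = sum(weights.get(f, 0.0) for f in feats)
    z = max(-35.0, min(35.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, feats, clicked, lr=0.1):
    """One SGD update on the log loss; only weights of active features change."""
    p = predict(weights, feats)
    gradient = p - clicked  # d(log loss)/dz for label in {0, 1}
    for f in feats:
        weights[f] = weights.get(f, 0.0) - lr * gradient
    return p

def train(examples, epochs=200, lr=0.1):
    """Fit a sparse weight dict by repeatedly passing over (impression, clicked) pairs."""
    weights = {}
    for _ in range(epochs):
        for impression, clicked in examples:
            sgd_step(weights, features(impression), clicked, lr)
    return weights

# Hypothetical toy click log: (impression fields, clicked?)
TOY_LOG = [
    ({"ad": "shoes", "site": "news"}, 1),
    ({"ad": "loans", "site": "news"}, 0),
]

if __name__ == "__main__":
    w = train(TOY_LOG)
    for impression, _ in TOY_LOG:
        print(impression, round(predict(w, features(impression)), 3))
```

Storing weights in a dictionary keyed by feature string keeps the model sparse, so only features that actually occur take up memory. That matters when the full feature space is far larger than what any single impression activates.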
Scaling machine learning as a service at Uber - Li Erran Li at #papis2016 (PAPIs.io)
Machine learning as a service (MLaS) is imperative to the success of many companies, as many internal teams and organizations need to gain business intelligence from big data. Building a scalable MLaS is a very challenging problem. In this paper, we present the scalable MLaS we built for a company that operates globally. We focus on several scalability challenges and our technical solutions.
Video at https://www.youtube.com/watch?v=MpnszJ_3Ong
Couldn't attend PAPIs '16? Get access to the other presentations' slides and videos at https://gumroad.com/products/fehon/
Extracting information from images using deep learning and transfer learning ... (PAPIs.io)
For online businesses, recommender systems are paramount. There is an increasing need to take all the user information into account to tailor the best product offer to each new user.
Part of that information is the content that the user actually sees: the visuals of the products. When it comes to products like luxury hotels, pictures of the room, the building or even the nearby beach can significantly impact users’ decisions.
In this talk, we will describe how we improved an online vacation retailer recommender system by using the information in images. We’ll explain how to leverage open data and pre-trained deep learning models to derive information on user taste. We will use a transfer learning approach that enables companies to use state of the art machine learning methods without needing deep learning expertise.
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse... (Josef A. Habdank)
Prediction using Machine Learning (ML) techniques on Big Data is a computationally and system-wide challenging problem. Scalability is the prime concern, especially when the system processes approximately 10^9 observations per day. To rapidly train models covering the whole multivariate space, the time series vectors, which exhibit significant similarities, are clustered into groups. The resulting vector clusters can then be modelled using ML tools capable of coefficient estimation at massive scale (Apache Spark with Scikit Learn). The presentation describes the application of Linear Regression and Support Vector Regression with a Radial Basis Function kernel. This approach enables training models fast enough to complete the task within a couple of hours, allowing daily or even real-time updates of the coefficients. This machine learning framework is used to predict airfares as a support tool for Revenue Management systems.
Building machine learning service in your business - Eric Chen (Uber) @PAPIs ... (PAPIs.io)
When building machine learning applications at Uber, we identified a sequence of common practices and painful procedures, and thus built a machine learning platform as a service. Here we present the key components needed to build such a scalable and reliable machine learning service, which serves both our online and offline data processing needs.
The concept of the talk is as follows:
- to give a general idea of the user segmentation task in the DMP project and how solving this problem helps our business
- to explain how we use autoML to solve this task, and to explain its components
- to give insights into the techniques we apply to make our pipeline fast and stable on huge datasets
Netflix Recommendations Feature Engineering with Time Travel (Faisal Siddiqi)
Hua Jiang and Kedar Sadekar talked about feature engineering using time rewinding in the context of Netflix Recommendations at an ML Platform meetup at LinkedIn HQ. Jan 24, 2018
Building Intelligent Applications, Experimental ML with Uber’s Data Science W... (Databricks)
In this talk, we will explore how Uber enables rapid experimentation with machine learning models and optimization algorithms through Uber’s Data Science Workbench (DSW). DSW covers a series of stages in data scientists’ workflow, including data exploration, feature engineering, machine learning model training, testing and production deployment. DSW provides interactive notebooks for multiple languages with on-demand resource allocation, and lets users share their work through community features.
It also supports notebooks and intelligent applications backed by Spark job servers. Deep learning applications based on TensorFlow and Torch can be brought into DSW smoothly, with resource management taken care of by the system. The environment in DSW is customizable, so users can bring their own libraries and frameworks. Moreover, DSW provides support for Shiny and Python dashboards as well as many other in-house visualization and mapping tools.
In the second part of this talk, we will explore the use cases where custom machine learning models developed in DSW are productionized within the platform. Uber applies machine learning extensively to solve some hard problems. Use cases include calculating the right prices for rides in over 600 cities and applying NLP technologies to customer feedback to offer safe rides and reduce support costs. We will look at various options evaluated for productionizing custom models (server-based and serverless). We will also look at how DSW integrates into Uber’s larger ML ecosystem, e.g. model/feature stores and other ML tools, to realize the vision of a complete ML platform for Uber.
In this deck I’m going to show you how SigOpt can help you amplify your trading models by optimally tuning them using our black-box optimization platform.
In this video I’m going to show you how SigOpt can help you amplify your machine learning and AI models by optimally tuning them using our black-box optimization platform.
Video: https://youtu.be/EjGrRxXWg8o
The SigOpt platform provides an ensemble of state-of-the-art Bayesian and Global optimization algorithms via a simple Software-as-a-Service API.
Applied Machine Learning for Ranking Products in an Ecommerce Setting (Databricks)
As a leading e-commerce company in fashion in the Netherlands, Wehkamp dedicates itself to providing a better shopping experience for its customers. Using Spark, the data science team is able to develop various machine learning projects for this purpose based on large-scale data about products and customers. A major topic for the data science team is ranking products. If a visitor enters a search phrase, which products best fit the search phrase and in what order should they be shown? Ranking products is also important if a visitor enters a product overview page, where hundreds or even thousands of products of a certain article type are displayed.
In this project, Spark is used in the whole pipeline: retrieving and processing the search phrases and their results, making click models, creating feature sets, training and evaluating ranking models, pushing the models to production using ElasticSearch and creating Tableau dashboards. In this talk, we are going to demonstrate how we use Spark to build the whole product-ranking pipeline and the challenges we faced along the way.
Zipline - A Declarative Feature Engineering Framework (Databricks)
Zipline is Airbnb’s data management platform specifically designed for ML use cases. Previously, ML practitioners at Airbnb spent roughly 60% of their time on collecting and writing transformations for machine learning tasks.
Scalable Time Series Forecasting and Monitoring using Apache Spark and Elasti... (Fred Madrid)
Adyen enables integrating companies to accept payments from their customers using any payment method over any sales channel. We have designed and implemented a time series forecasting algorithm that allows us to predict the volume for each integration with confidence and thus flag anomalies such as traffic drops or abnormally low traffic. We use Apache Spark as our computational engine both to make this data available to the training process and to train over years of data in a scalable way. The prediction performance is benchmarked, and the models are served in production through custom real-time monitoring and alerting infrastructure that uses ElasticSearch as hot storage. With this state-of-the-art solution, Adyen knows whether a problem has happened and can alert the operational teams accordingly in record time.
This presentation will cover the journey we took, with a focus on the mathematical concepts, the present time constraints, the prediction performance, and the architecture needed to make this happen. We’ll go over lessons learned, pitfalls, and best practices discovered while modeling time series datasets with Apache Spark. Data Scientists will be able to gain insights on applying effective, real-life seasonality modeling techniques. We’ll share the approaches we used for sub-millisecond model serving, which may inspire Data Engineers who work on related problems.
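The idea of forecasting volume with a confidence band and flagging traffic drops can be sketched in a few lines of Python. This is not Adyen's algorithm, which the abstract does not specify. It substitutes a deliberately simple seasonal baseline (mean and standard deviation of past values at the same point in the cycle), and the function names, the `period` parameter and the threshold `k` are hypothetical.

```python
from statistics import mean, stdev

def seasonal_band(history, period, k=3.0):
    """Forecast the next point from past values at the same seasonal phase.

    Returns (forecast, lower_bound), where lower_bound = forecast - k * stdev.
    """
    phase = len(history) % period      # phase of the point about to be observed
    same_phase = history[phase::period]  # past values at that phase
    if len(same_phase) < 2:
        raise ValueError("need at least two full seasons of history")
    forecast = mean(same_phase)
    return forecast, forecast - k * stdev(same_phase)

def is_traffic_drop(history, observed, period, k=3.0):
    """Flag the new observation if it falls below the lower confidence bound."""
    _, lower = seasonal_band(history, period, k)
    return observed < lower

if __name__ == "__main__":
    # Hypothetical volumes with a repeating cycle of length 3.
    volumes = [100, 200, 300, 102, 198, 301, 98, 202, 299, 100, 200, 300]
    print(seasonal_band(volumes, period=3))
    print(is_traffic_drop(volumes, observed=40, period=3))
```

Comparing against the same phase of earlier cycles rather than the most recent values keeps ordinary seasonality, such as quiet nights or weekends, from being reported as a drop.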
Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma... (Databricks)
From training billions of ad impressions to scaling gradient boosted trees with more than three million nodes, Ad Targeting at Yelp uses Apache Spark in many stages of its large-scale machine learning pipeline.
This session will explore examples of how Yelp employed and tweaked Spark to support big data feature engineering, visualizations and machine learning model training, evaluation and diagnostics. You’ll also hear about the challenges in building and deploying such a large-scale intelligent system in a production environment.
In display and mobile advertising, the most significant development in recent years is Real-Time Bidding (RTB), which allows buying and selling one ad impression at a time in real time. The ability to make impression-level bid decisions and to target individual users in real time has fundamentally changed the landscape of digital media. The further demand for automation, integration and optimisation in RTB brings new research opportunities in the IR fields, including information matching with economic constraints, CTR prediction, user behaviour targeting and profiling, personalised advertising, and attribution and evaluation methodologies. In this tutorial, presenters from both industry and academia aim to share insights from real-world systems and to provide an overview of the fundamental mechanisms and algorithms, with a focus on the IR context. We will also introduce IR researchers to a few recently released datasets so that they can get hands-on quickly and pursue this research.
Int'l Conference on Predictive APIs: RTB Optimizer presentation (Datacratic)
Real-time bidding, in the context of digital marketing, refers to the purchase of advertising impressions one at a time, responding to tens of thousands of messages per second, paying a different price for each via an auction mechanism. This talk will cover in detail how Datacratic’s RTB Optimizer Prediction API predicts the outcome of buying a given impression, then computes the economic value of that outcome to produce optimal bidding behaviour.
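The predict-then-value step described above can be illustrated with a small expected-value calculation. This is a generic sketch, not Datacratic's RTB Optimizer API. It assumes the bid is simply the predicted outcome probability multiplied by the value of that outcome, which is the textbook truthful bid in a second-price auction. The function name, parameters and numbers are hypothetical.

```python
def expected_value_bid(p_outcome, value_per_outcome, max_bid=None):
    """Bid the expected economic value of one impression.

    p_outcome:         predicted probability of the desired outcome (e.g. a click)
    value_per_outcome: what that outcome is worth to the advertiser
    max_bid:           optional cap on the bid (hypothetical budget guard)
    """
    if not 0.0 <= p_outcome <= 1.0:
        raise ValueError("p_outcome must be a probability in [0, 1]")
    if value_per_outcome < 0:
        raise ValueError("value_per_outcome must be non-negative")
    bid = p_outcome * value_per_outcome
    if max_bid is not None:
        bid = min(bid, max_bid)
    return bid

if __name__ == "__main__":
    # Hypothetical impression: 2% predicted click probability, a click worth 1.50.
    print(expected_value_bid(0.02, 1.50))
    # The same impression with a cap of 0.01 per impression.
    print(expected_value_bid(0.02, 1.50, max_bid=0.01))
```

Because the bid scales linearly with the predicted probability, any miscalibration in the prediction translates directly into overpaying or underbidding, which is why calibrated predictions matter as much as good ranking.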
Advanced Optimization for the Enterprise Webinar (SigOpt)
Building on the TWIML eBook, TWIMLcon event and TWIML podcast series that explore Machine Learning Platforms in great detail, this webinar examines the machine learning platforms that power enterprise leaders in AI. SigOpt CEO Scott Clark will provide an overview of critical technical capabilities that our customers have prioritized in their ML platforms.
Review these slides to learn about:
- Critical capabilities for data, experiment and model management
- Tradeoffs between building and buying these capabilities
- Lessons from the implementation of these platforms by AI leaders
Why focus on these platforms and the capabilities that power them? Nearly every company is investing in machine learning that differentiates products or generates revenue. These so-called "differentiated models" represent the biggest opportunity for AI to transform the business. Most of these teams find success hiring expert data scientists and machine learning engineers who can build these models. But most of these teams also struggle to create a more sustainable, scalable and reproducible process for model development, and have begun building ML platforms to tackle this challenge.
From Labelling Open data images to building a private recommender system (Pierre Gutierrez)
Recommender systems are paramount for e-business companies. There is an increasing need to take all the user information into account to tailor the best product proposition. Part of that information is the content that the user actually sees: the visuals of the product.
When it comes to hostels, some people can be more attracted by pictures of the room, the building or even the nearby beach.
In this talk, we will describe how we improved an e-business vacation retailer recommender system using the content of images. We’ll explain how to leverage open datasets and pre-trained deep learning models to derive user taste information. This transfer learning approach enables companies to use state of the art machine learning methods without having deep learning expertise.
Gui Liberali, Morphing Websites - Digital Data Tips Tuesday #4 - Growth throu... (Webanalisten.nl)
Keynote of the 4th #DDTT meetup (http://digitaldata.tips) in Amsterdam, the Netherlands. Gui Liberali of RSM / Erasmus presented on optimization through algorithms: bandits and morphing websites.
Stay up to date about new #DDTT events through our meetup group: http://meetup.com/onlineoptimizers
Organizer: Webanalisten.nl / Online Optimizers meetup group
Sponsors: Adobe and Relay42
How to buy traffic from Facebook, Instagram and Facebook Audience Network (Travelpayouts)
Facebook is currently one of the biggest sources of mobile traffic in the world. Learn how to launch a successful campaign and convert your prospects into loyal customers!
Flash Series: Using Ad Testing Data To Set Up Foolproof Tests (Hanapin Marketing)
Ads are the only part of a PPC account that a searcher actually sees. If your ads are poorly written, you will get either no traffic or the wrong type of traffic. There’s only one way to know which offer or ad is best for your account: by testing it!
In this presentation, Google AdWords Guru and Founder of Certified Knowledge, Brad Geddes, discusses how to work with ad testing data to set up tests, how and when to end tests, and how to choose your metrics.
You’ll get expert-level PPC tips like:
*The types and uses of different text ad tests
*Examples of ad tests with results
*Determining your segments and defining your metrics
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We ended with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Key Trends Shaping the Future of Infrastructure.pdf (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
4. DISPLAY ADVERTISING
• Rapidly growing multi-billion dollar business (30% of internet advertising revenue in 2013).
• Marketplace between:
– Publishers: sell display opportunities
– Advertisers: pay for showing their ad
• Real Time Bidding:
– An auction among advertisers is held at the moment a user generates a display opportunity by visiting a publisher's web page.
5. BRANDING VS PERFORMANCE
PRICING TYPE
• CPM (Cost Per Mille): advertiser pays per thousand impressions
• CPC (Cost Per Click): advertiser pays only when the user clicks
• CPA (Cost Per Action): advertiser pays only when the user performs a predefined action such as a purchase.
CAMPAIGN TYPE
• Branding: CPM
• Performance-based advertising (retargeting): CPC, CPA
CONVERSIONS
eCPM = CPC * predicted click-through rate
eCPM = CPA * predicted conversion rate
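The two eCPM formulas put CPC and CPA bids on a common per-impression scale, so that ads priced differently can compete in the same auction. A minimal sketch of that conversion follows; the function names and the `bid` / `rate` keys are hypothetical and not from the slides. A strict per-mille figure would multiply by 1,000, which does not change the ranking.

```python
def expected_value(bid, predicted_rate):
    """Expected payment per impression: the CPC (or CPA) bid times the
    predicted click (or conversion) probability."""
    if not 0.0 <= predicted_rate <= 1.0:
        raise ValueError("predicted_rate must be a probability")
    return bid * predicted_rate


def rank_ads(ads):
    """Sort candidate ads by expected value per impression, best first.

    Each ad is a dict with hypothetical keys 'bid' and 'rate'."""
    return sorted(ads, key=lambda ad: expected_value(ad["bid"], ad["rate"]), reverse=True)
```

An ad with a high bid but a low predicted rate can still outrank a cheaper ad with a higher rate, which is why the accuracy of the predicted rate matters so much.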
7. RECOMMENDATION
Collaborative filtering
Propose related items to a user based on historical interactions from other users
Over 50% of Criteo-driven sales come from recommended products the user had never viewed on advertiser websites
10. FEATURES
• Three sources of features: user, ad, page
• In this talk: categorical features on ad and page.
Publisher hierarchy: Publisher network → Publisher → Site → Url
Advertiser hierarchy: Advertiser network → Advertiser → Campaign → Ad
11. HASHING TRICK
• Standard representation of categorical features: “one-hot” encoding
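For instance, the site feature is a vector with one position per distinct site (cnn.com, news.yahoo.com, ...), holding a 1 at the observed site's position and 0 elsewhere.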
• Dimensionality equal to the number of different values
– can be very large
• Hashing to reduce dimensionality (made popular by John Langford in VW)
• Dimensionality now independent of number of values
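The hashing trick can be sketched in a few lines: each "field=value" string is hashed straight to an index in a fixed-size vector, so no dictionary of known values is needed. This is an illustration under assumptions, not the VW implementation. The function name and the `num_bits` parameter are hypothetical, and `zlib.crc32` is only a convenient deterministic stand-in for VW's own hash function.

```python
import zlib


def hash_features(features, num_bits=18):
    """Map {field: value} categorical features to a sparse {index: count} vector.

    The vector has 2**num_bits positions regardless of how many distinct
    values each field takes; colliding values simply share a position."""
    dim = 1 << num_bits
    vec = {}
    for field, value in features.items():
        token = f"{field}={value}".encode("utf-8")
        idx = zlib.crc32(token) % dim
        vec[idx] = vec.get(idx, 0) + 1
    return vec
```

A value never seen in training needs no special handling: it hashes to some index like any other value.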
12. HASHING VS FEATURE SELECTION
• “Small” problem with 35M different values.
• Methods that require a dictionary have a larger model.
13. QUADRATIC FEATURES
• Outer product between two features.
• Example: between site and advertiser,
Feature is 1 when site=finance.yahoo.com & advertiser=bank of america
[Figure: publisher-side features (Publisher network, Publisher, Site, Url) crossed with advertiser-side features (Advertiser network, Advertiser, Campaign, Ad)]
Similar to a polynomial kernel of degree 2
Large number of values → hashing trick
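Quadratic features combine naturally with hashing: the conjunction of two values is just another string to hash, so the very large number of pairs never has to be enumerated. A minimal sketch, assuming hypothetical function names and `zlib.crc32` as a stand-in hash. It crosses every pair of fields, whereas the slide's diagram suggests crossing publisher-side with advertiser-side features.

```python
import zlib
from itertools import combinations


def _bucket(token, dim):
    """Deterministic hash of a string token into [0, dim)."""
    return zlib.crc32(token.encode("utf-8")) % dim


def hash_with_quadratic(features, num_bits=20):
    """Hash the original features plus all pairwise conjunctions.

    Returns a sparse {index: count} vector with 2**num_bits positions."""
    dim = 1 << num_bits
    tokens = [f"{field}={value}" for field, value in sorted(features.items())]
    # A conjunction such as "advertiser=bank of america&site=finance.yahoo.com"
    # is active only when both values are present.
    tokens += [f"{a}&{b}" for a, b in combinations(tokens, 2)]
    vec = {}
    for token in tokens:
        idx = _bucket(token, dim)
        vec[idx] = vec.get(idx, 0) + 1
    return vec
```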
14. ADVANTAGES OF HASHING
• Practical
– Straightforward to implement; no need to maintain dictionaries
• Statistical
– Regularization (infrequent values are washed away by frequent ones)
• Most powerful when combined with quadratic features
Quote of John Langford about hashing: "At first it's scary, then you love it"
15. LEARNING
• Regularized logistic regression
– Vowpal Wabbit open source package
• Regularization with hierarchical features
– Backoff smoothing: a well-estimated weight plus a weight that stays small if the value is rare
• Negative data subsampled for computational reasons
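Subsampling negatives shifts the predicted probabilities upward, so they need to be mapped back before use in an auction. The slide does not say how this is done. The sketch below assumes the standard odds correction for a model trained with negatives kept at rate `rate`, and all names are hypothetical.

```python
import random


def subsample_negatives(examples, rate, seed=0):
    """Keep every positive example and each negative with probability `rate`.

    `examples` is a list of (features, label) pairs with label in {0, 1}."""
    rng = random.Random(seed)
    return [(x, y) for x, y in examples if y == 1 or rng.random() < rate]


def recalibrate(p_sampled, rate):
    """Map a probability predicted on subsampled data back to the original scale.

    Subsampling multiplies the odds by 1 / rate, so the correction divides
    the odds by the same factor."""
    return p_sampled / (p_sampled + (1.0 - p_sampled) / rate)
```

With `rate = 0.1`, a prediction of 0.5 on the subsampled data corresponds to odds of 1:10 on the full data, that is, a probability of 1/11.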
16. EVALUATION
• Comparison with (Agarwal et al. '10)
– Probabilistic model for the same display advertising prediction problem
– Leverages the hierarchical structures on the ad and publisher sides
– Sparse prior for smoothing
• Model trained on three weeks of data, tested on the 3 following days
– auROC: +3.1%
– auPRC: +10.0%
– Log likelihood: +7.1%
D. Agarwal et al., Estimating Rates of Rare Events with Multiple Hierarchies through Scalable Log-linear Models, KDD, 2010
18. MODEL UPDATE
• Needed because ads / campaigns keep changing.
• The posterior distribution of a previously trained model can be used as the prior for training a new model with a new batch of data
[Figure: timeline of Day 1 to Day 5 with successive models M0, M1, M2]
• Influence of the update frequency (auPRC):
– 1 day: +3.7%
– 6 hours: +5.1%
– 2 hours: +5.8%
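One way to read "posterior as prior" is to keep a Gaussian over each weight and update it batch by batch. The slides do not specify the approximation. The sketch below assumes an independent Gaussian per weight with a Laplace approximation, and plain gradient descent in place of a production optimizer. All names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bayesian_update(mean, precision, batch, steps=500):
    """Fit one batch with N(mean_i, 1 / precision_i) as the prior on weight i.

    `batch` is a list of (x, y) pairs, x a list of floats, y in {0, 1}.
    Returns the approximate posterior as (mean, precision)."""
    # Step size from an upper bound on the curvature keeps gradient descent stable.
    bound = max(precision) + 0.25 * sum(xi * xi for x, _ in batch for xi in x)
    w = list(mean)
    for _ in range(steps):
        # Gradient of the negative log posterior: pull toward the prior mean
        # plus the logistic loss on the batch.
        grad = [q * (wi - mi) for q, wi, mi in zip(precision, w, mean)]
        for x, y in batch:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                grad[i] += (p - y) * xi
        w = [wi - g / bound for wi, g in zip(w, grad)]
    # Laplace approximation: add the diagonal curvature of the batch likelihood.
    new_precision = list(precision)
    for x, _ in batch:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            new_precision[i] += xi * xi * p * (1.0 - p)
    return w, new_precision
```

Calling `bayesian_update` again with the returned mean and precision and the next batch gives the chain M0, M1, M2. Weights for features absent from a batch keep their previous mean and precision.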
20. PARALLEL LEARNING
• Large training set
– 2B training samples; 16M parameters
– 400GB (compressed)
• Proposed method: less than one hour with 500 machines
• Optimize:
• SGD is fast on a single machine, but difficult to parallelize.
• Batch (quasi-Newton) methods are straightforward to parallelize
– L-BFGS with distributed gradient computation.
21. ALLREDUCE
• Aggregate and broadcast across nodes
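[Figure: nodes arranged in a tree; local values are summed up the tree and the total (37 in the example) is broadcast back so that every node holds it]
• Very few modifications to existing code: just insert several AllReduce operations.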
• Compatible with Hadoop / MapReduce
– Build a spanning tree on the gateway
– Single MapReduce job
– Leverage speculative execution to alleviate the slow node issue
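The aggregate-and-broadcast pattern can be simulated in a single process to show what each node ends up holding. This is a sketch of the idea only: real implementations exchange messages over a network, and the function name and binary-tree layout are assumptions.

```python
def allreduce_sum(local_vectors):
    """Simulate a tree AllReduce over equal-length vectors, one per node.

    Node i's children are nodes 2*i + 1 and 2*i + 2. Partial sums flow up
    to the root (node 0), then the total is broadcast back to every node."""
    n = len(local_vectors)
    partial = [list(v) for v in local_vectors]
    # Reduce phase: visit nodes from last to first so that each child has
    # already absorbed its own children before it is added to its parent.
    for child in range(n - 1, 0, -1):
        parent = (child - 1) // 2
        partial[parent] = [a + b for a, b in zip(partial[parent], partial[child])]
    total = partial[0]
    # Broadcast phase: every node receives its own copy of the total.
    return [list(total) for _ in range(n)]
```

In distributed L-BFGS, each node would compute the gradient on its own shard and call an operation like this, after which all nodes hold the full gradient and take the same step.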
22. ONLINE INITIALIZATION
• Hybrid approach:
– One pass of online learning on each node
– Average the weights from each node to get a warm start for batch optimization
• Best of both (online / batch) worlds.
[Figures: two panels, splice site prediction (Sonnenburg et al. '10) and display advertising]
S. Sonnenburg and V. Franc, COFFIN: A Computational Framework for Linear SVMs, ICML 2010
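The hybrid approach can be sketched as one pass of stochastic gradient descent per shard, followed by an average of the resulting weights. The sketch assumes a uniform average and a fixed learning rate, and the function names are hypothetical. The slides say only "average", so the weighting actually used may differ.

```python
import math


def sgd_pass(examples, dim, lr=0.1):
    """One online pass of logistic regression over a node's local examples.

    `examples` is a list of (x, y) pairs, x a list of `dim` floats, y in {0, 1}."""
    w = [0.0] * dim
    for x, y in examples:
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(x):
            w[i] -= lr * (p - y) * xi
    return w


def warm_start(shards, dim, lr=0.1):
    """Average the per-node weights to initialize the batch optimizer."""
    local = [sgd_pass(shard, dim, lr) for shard in shards]
    return [sum(ws) / len(local) for ws in zip(*local)]
```

The returned vector would then serve as the starting point for the batch (L-BFGS) phase instead of all zeros.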
24. THOMPSON SAMPLING
• Heuristic to address the Explore / Exploit problem, dating back to Thompson (1933)
• Simple to implement
• Good performance in practice (Graepel et al. '10, Chapelle and Li '11)
• Rarely used, maybe because of a lack of theoretical guarantees.
Draw model parameter θ_t according to P(θ | D)
Select best action according to θ_t
Observe reward and update model
T. Graepel et al., Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine, ICML 2010
O. Chapelle and L. Li, An Empirical Evaluation of Thompson Sampling, NIPS 2011
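The draw / select / update loop is easiest to see on a simplified model. The sketch below swaps the talk's click model for an independent Beta posterior per ad (a Bernoulli bandit), which is an assumption made for brevity. The class and method names are hypothetical.

```python
import random


class BetaThompson:
    """Thompson sampling with a Beta(1, 1) prior on each arm's click rate."""

    def __init__(self, n_arms, seed=None):
        self.alpha = [1.0] * n_arms  # 1 + observed clicks
        self.beta = [1.0] * n_arms   # 1 + observed non-clicks
        self.rng = random.Random(seed)

    def select(self):
        """Draw one click rate per arm from its posterior and pick the largest."""
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, clicked):
        """Observe the reward and update the chosen arm's posterior."""
        if clicked:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0
```

Exploration comes for free: an arm with few observations has a wide posterior, so its draws are occasionally the largest even when its mean is not.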
26. EVALUATION
• Semi-simulated environment: real input features, but labels generated.
• Set of eligible ads varies from 1 to 5,910. Total ads = 66,373
• Comparison of E/E algorithms:
– 4 days of data
– Cold start
• Algorithms:
– UCB: mean + std. dev.
– ε-greedy
– Thompson sampling
[Figure: simulation loop with X candidates, X selected, Y generated from a random w (ground truth), and a learned w with model update]
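For comparison with Thompson sampling, the two other selection rules can be sketched as follows. The means and standard deviations are assumed here to come from a per-ad Beta posterior, a simplification rather than the model used in the talk, and the function names are hypothetical.

```python
import math
import random


def beta_mean_std(clicks, non_clicks):
    """Mean and standard deviation of a Beta(1 + clicks, 1 + non_clicks) posterior."""
    a, b = 1.0 + clicks, 1.0 + non_clicks
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))
    return mean, std


def select_ucb(means, stds):
    """UCB as on the slide: pick the ad with the largest mean + std. dev."""
    scores = [m + s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)


def select_epsilon_greedy(means, epsilon, rng=random):
    """With probability epsilon pick a random ad, otherwise the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=means.__getitem__)
```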
27. RESULTS
• CTR regret (in percentage):
– Thompson: 3.72
– UCB: 4.14
– ε-greedy: 4.98
– Exploit-only: 5.00
– Random: 31.95
• Regret over time:
28. OPEN QUESTIONS
• Hashing
– Theoretical performance guarantees
• Low rank matrix factorization
– Better predictions on unseen (publisher, advertiser) pairs
• Sample selection bias
– System is trained only on selected ads, but all ads are scored.
– Possible solution: inverse propensity scoring
– But we still need to bias the training data toward good ads.
• Explore / exploit
– Evaluation framework
– Regret analysis of Thompson sampling
– E/E with a budget; with multiple slots; with delayed feedback
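Inverse propensity scoring, mentioned above only as a possible solution, reweights each logged impression by the inverse of the probability that the serving system chose to show it. A minimal sketch of a self-normalized estimate follows. The function name, the log format, and the clipping floor `min_propensity` are assumptions, not something the slides describe.

```python
def ips_ctr_estimate(logs, min_propensity=1e-3):
    """Self-normalized inverse-propensity estimate of click-through rate.

    `logs` is an iterable of (clicked, propensity) pairs, where propensity
    is the probability that the logging policy displayed that ad."""
    num = 0.0
    den = 0.0
    for clicked, propensity in logs:
        # Clip tiny propensities so that a single rare impression cannot dominate.
        weight = 1.0 / max(propensity, min_propensity)
        num += weight * clicked
        den += weight
    return num / den if den else 0.0
```

Impressions that were unlikely to be shown get a larger weight, which compensates for their under-representation in the training data.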
29. CONCLUSION
• Simple yet efficient techniques for click prediction
• Main difficulty in applied machine learning: avoid the bias (because of academic papers) toward complex systems
– It's easy to get lured into building a complex system
– It's difficult to keep it simple
See paper for more details:
Simple and scalable response prediction for display advertising
O. Chapelle, E. Manavoglu, R. Rosales, 2014