CIKM 2013 Tutorial: Real-time Bidding: A New Frontier of Computational Advert...Shuai Yuan
Computational Advertising has been an important topical area in information retrieval and knowledge management. This tutorial will be focused on real-time advertising, aka Real-Time Bidding (RTB), the fundamental shift in the field of computational advertising. It is strongly related to CIKM areas such as user log analysis and modelling, information retrieval, text mining, knowledge extraction and management, behaviour targeting, recommender systems, personalization, and data management platform.
This tutorial aims to provide not only a comprehensive and systemic introduction to RTB and computational advertising in general, but also the emerging research challenges and research tools and datasets in order to facilitate the research. Compared to previous Computational Advertising tutorials in relevant top-tier conferences, this tutorial takes a fresh, neutral, and the latest look of the field and focuses on the fundamental changes brought by RTB.
We will begin by giving a brief overview of the history of online advertising and present the current eco-system in which RTB plays an increasingly important part. Based on our field study and the DSP optimisation contest organised by iPinyou, we analyse optimization problems both from the demand side (advertisers) and the supply side (publishers), as well as the auction mechanism design challenges for Ad exchanges. We discuss how IR, DM and ML techniques have been applied to these problems. In addition, we discuss why game theory is important in this area and how it could be extended beyond the auction mechanism design.
CIKM is an ideal venue for this tutorial because RTB is an area of multiple disciplines, including information retrieval, data mining, knowledge discovery and management, and game theory, most of which are traditionally the key themes of the conference. As an illustration of practical application in the real world, we shall cover algorithms in the iPinyou global DSP optimisation contest on a production platform; for the supply side, we also report experiments of inventory management, reserve price optimisation, etc. in production systems.
We expect the audience, after attending the tutorial, to understand the real-time online advertising mechanisms and the state of the art techniques, as well as to grasp the research challenges in this field. Our motivation is to help the audience acquire domain knowledge and obtain relevant datasets, and to promote research activities in RTB and computational advertising in general.
Realtime Indexing for Fast Queries on Massive Semi-Structured DataScyllaDB
Rockset is a realtime indexing database that powers fast SQL over semi-structured data such as JSON, Parquet, or XML without requiring any schematization. All data loaded into Rockset are automatically indexed and a fully featured SQL engine powers fast queries over semi-structured data without requiring any database tuning. Rockset exploits the hardware fluidity available in the cloud and automatically grows and shrinks the cluster footprint based on demand. Available as a serverless cloud service, Rockset is used by developers to build data-driven applications and microservices.
In this talk, we discuss some of the key design aspects of Rockset, such as Smart Schema and Converged Index. We describe Rockset's Aggregator Leaf Tailer (ALT) architecture that provides low latency queries on large datasets.Then we describe how you can combine lightweight transactions in ScyllaDB with realtime analytics on Rockset to power an user-facing application.
A Comprehensive and detailed approach using Machine Learning - An international E-Commerce company wants to use some of the most advanced machine learning techniques to analyse their customers with respect to their services and some important customer success matrix. They also have future expansion plans to India.
Netflix talk at ML Platform meetup Sep 2019Faisal Siddiqi
In this talk at the Netflix Machine Learning Platform Meetup on 12 Sep 2019, Fernando Amat and Elliot Chow from Netflix talk about the Bandit infrastructure for Personalized Recommendations
CIKM 2013 Tutorial: Real-time Bidding: A New Frontier of Computational Advert...Shuai Yuan
Computational Advertising has been an important topical area in information retrieval and knowledge management. This tutorial will be focused on real-time advertising, aka Real-Time Bidding (RTB), the fundamental shift in the field of computational advertising. It is strongly related to CIKM areas such as user log analysis and modelling, information retrieval, text mining, knowledge extraction and management, behaviour targeting, recommender systems, personalization, and data management platform.
This tutorial aims to provide not only a comprehensive and systemic introduction to RTB and computational advertising in general, but also the emerging research challenges and research tools and datasets in order to facilitate the research. Compared to previous Computational Advertising tutorials in relevant top-tier conferences, this tutorial takes a fresh, neutral, and the latest look of the field and focuses on the fundamental changes brought by RTB.
We will begin by giving a brief overview of the history of online advertising and present the current eco-system in which RTB plays an increasingly important part. Based on our field study and the DSP optimisation contest organised by iPinyou, we analyse optimization problems both from the demand side (advertisers) and the supply side (publishers), as well as the auction mechanism design challenges for Ad exchanges. We discuss how IR, DM and ML techniques have been applied to these problems. In addition, we discuss why game theory is important in this area and how it could be extended beyond the auction mechanism design.
CIKM is an ideal venue for this tutorial because RTB is an area of multiple disciplines, including information retrieval, data mining, knowledge discovery and management, and game theory, most of which are traditionally the key themes of the conference. As an illustration of practical application in the real world, we shall cover algorithms in the iPinyou global DSP optimisation contest on a production platform; for the supply side, we also report experiments of inventory management, reserve price optimisation, etc. in production systems.
We expect the audience, after attending the tutorial, to understand the real-time online advertising mechanisms and the state of the art techniques, as well as to grasp the research challenges in this field. Our motivation is to help the audience acquire domain knowledge and obtain relevant datasets, and to promote research activities in RTB and computational advertising in general.
Realtime Indexing for Fast Queries on Massive Semi-Structured DataScyllaDB
Rockset is a realtime indexing database that powers fast SQL over semi-structured data such as JSON, Parquet, or XML without requiring any schematization. All data loaded into Rockset are automatically indexed and a fully featured SQL engine powers fast queries over semi-structured data without requiring any database tuning. Rockset exploits the hardware fluidity available in the cloud and automatically grows and shrinks the cluster footprint based on demand. Available as a serverless cloud service, Rockset is used by developers to build data-driven applications and microservices.
In this talk, we discuss some of the key design aspects of Rockset, such as Smart Schema and Converged Index. We describe Rockset's Aggregator Leaf Tailer (ALT) architecture that provides low latency queries on large datasets.Then we describe how you can combine lightweight transactions in ScyllaDB with realtime analytics on Rockset to power an user-facing application.
A Comprehensive and detailed approach using Machine Learning - An international E-Commerce company wants to use some of the most advanced machine learning techniques to analyse their customers with respect to their services and some important customer success matrix. They also have future expansion plans to India.
Netflix talk at ML Platform meetup Sep 2019Faisal Siddiqi
In this talk at the Netflix Machine Learning Platform Meetup on 12 Sep 2019, Fernando Amat and Elliot Chow from Netflix talk about the Bandit infrastructure for Personalized Recommendations
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteStreamNative
In this talk, Till Rohrmann and Addison Higham discuss how Flink allows for ambitious stream processing workflows and how Pulsar and Flink enable new capabilities that push forward the state-of-the-art in streaming. They will also share upcoming features and new capabilities in the integrations between Flink and Pulsar and how these two communities are working together to truly advance the power of stream processing.
Big data real time architectures -
How do to big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What Advantages / Pitfalls they contain.
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )SANG WON PARK
몇년 전부터 Data Architecture의 변화가 빠르게 진행되고 있고,
그 중 Cloud DW는 기존 Data Lake(Hadoop 기반)의 한계(성능, 비용, 운영 등)에 대한 대안으로 주목받으며,
많은 기업들이 이미 도입했거나, 도입을 검토하고 있다.
본 자료는 이러한 Cloud DW에 대해서 개념적으로 이해하고,
시장에 존재하는 다양한 Cloud DW 중에서 기업의 환경에 맞는 제품이 어떤 것인지 성능/비용 관점으로 비교했다.
- 왜기업들은 CloudDW에주목하는가?
- 시장에는어떤 제품들이 있는가?
- 우리Biz환경에서는 어떤 제품을 도입해야 하는가?
- CloudDW솔루션의 성능은?
- 기존DataLake(EMR)대비 성능은?
- 유사CloudDW(snowflake vs redshift) 대비성능은?
앞으로도 Data를 둘러싼 시장은 Cloud DW를 기반으로 ELT, Mata Mesh, Reverse ETL등 새로운 생테계가 급속하게 발전할 것이고,
이를 위한 데이터 엔지니어/데이터 아키텍트 관점의 기술적 검토와 고민이 필요할 것 같다.
https://blog.naver.com/freepsw/222654809552
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유Hyojun Jeon
NDC18에서 발표하였습니다. 현재 보고 계신 슬라이드는 1부 입니다.(총 2부)
- 1부 링크: https://goo.gl/3v4DAa
- 2부 링크: https://goo.gl/wpoZpY
(SlideShare에 슬라이드 300장 제한으로 2부로 나누어 올렸습니다. 불편하시더라도 양해 부탁드립니다.)
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...Amazon Web Services Korea
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study
이 세션에서는 데브시스터즈의 Case Study를 통하여 Data Lake를 만들고 사용하는데 있어 요구 되는 사항들에 대해 공유합니다. 여러 목적에 맞는 데이터를 전달하기 위해 AWS 를 활용하여 Data Lake 를 구축하게된 계기와 실제 구축 작업을 하면서 경험하게 된 것들에 대해 말씀드리고자 합니다. 기존 인프라 구조 대비 효율성 및 비용적 측면을 소개해드리고, 빅데이터를 이용한 부서별 데이터 세분화를 진행할 때 어떠한 Architecture가 사용되었는지 소개드리고자 합니다.
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...hoondong kim
[Tensorflow-KR Offline 세미나 발표자료]
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps Cycle 구성 방법론. (Azure Docker PaaS 위에서 1만 TPS Tensorflow Inference Serving 방법론 공유)
이윤희 : 다짜고짜 배워보는 인과추론
발표영상 https://youtu.be/fShRiqe1Cf0
---
PAP가 준비한 팝콘 시즌1에서 프로덕트와 함께 성장하는 데이터 실무자들의 이야기를 담았습니다.
---
PAP(Product Analytics Playground)는 프로덕트 데이터 분석에 대해 편안하게 이야기할 수 있는 커뮤니티입니다.
우리는 데이터 드리븐 프로덕트 문화를 더 많은 분들이 각자의 자리에서 이끌어갈 수 있도록 하는 것을 목표로 합니다.
다양한 직군의 사람들이 모여 프로덕트를 만들듯 PAP 역시 다양한 멤버로 구성되어 있으며, 여러분들의 참여로 만들어집니다.
---
공식 페이지 : https://playinpap.oopy.io
페이스북 그룹 : https://www.facebook.com/groups/talkinpap
팀블로그 : https://playinpap.github.io
Recurrent Neural Networks for Recommendations and Personalization with Nick P...Databricks
In the last few years, RNNs have achieved significant success in modeling time series and sequence data, in particular within the speech, language, and text domains. Recently, these techniques have been begun to be applied to session-based recommendation tasks, with very promising results.
This talk explores the latest research advances in this domain, as well as practical applications. I will provide an overview of RNNs, covering common architectures and applications, before diving deeper into RNNs for session-based recommendations. I will pay particular attention to the challenges inherent in common personalization tasks and the specific adjustments to models and optimization techniques required for success.
(Randall Hauch, Confluent) Kafka Summit SF 2018
The Kafka Connect framework makes it easy to move data into and out of Kafka, and you want to write a connector. Where do you start, and what are the most important things to know? This is an advanced talk that will cover important aspects of how the Connect framework works and best practices of designing, developing, testing and packaging connectors so that you and your users will be successful. We’ll review how the Connect framework is evolving, and how you can help develop and improve it.
In display and mobile advertising, the most significant development in recent years is the Real-Time Bidding (RTB), which allows selling and buying in real-time one ad impression at a time. The ability of making impression level bid decision and targeting to an individual user in real-time has fundamentally changed the landscape of the digital media. The further demand for automation, integration and optimisation in RTB brings new research opportunities in the IR fields, including information matching with economic constraints, CTR prediction, user behaviour targeting and profiling, personalised advertising, and attribution and evaluation methodologies. In this tutorial, teamed up with presenters from both the industry and academia, we aim to bring the insightful knowledge from the real-world systems, and to provide an overview of the fundamental mechanism and algorithms with the focus on the IR context. We will also introduce to IR researchers a few datasets recently made available so that they can get hands-on quickly and enable the said research.
Machine learning for profit: Computational advertising landscapeSharat Chikkerur
Online advertising is a multibillion-dollar industry than spans several subfields of computer science and engineering including information retrieval, machine learning, auction theory, optimization and distributed computing. Competitiveness in the marketplace is directly dependent on accuracy, scalability and sophistication of machine learning algorithms involved. The talk will present an overview of the computational advertising landscape, outline several key challenges, and discuss how machine learning addresses them.
http://www.buffalo.edu/cdse/CDSEdayslanding1/cdse-days2017/faculty_directory/SharatChikkerur.html
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteStreamNative
In this talk, Till Rohrmann and Addison Higham discuss how Flink allows for ambitious stream processing workflows and how Pulsar and Flink enable new capabilities that push forward the state-of-the-art in streaming. They will also share upcoming features and new capabilities in the integrations between Flink and Pulsar and how these two communities are working together to truly advance the power of stream processing.
Big data real time architectures -
How do to big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What Advantages / Pitfalls they contain.
Cloud dw benchmark using tpd-ds( Snowflake vs Redshift vs EMR Hive )SANG WON PARK
몇년 전부터 Data Architecture의 변화가 빠르게 진행되고 있고,
그 중 Cloud DW는 기존 Data Lake(Hadoop 기반)의 한계(성능, 비용, 운영 등)에 대한 대안으로 주목받으며,
많은 기업들이 이미 도입했거나, 도입을 검토하고 있다.
본 자료는 이러한 Cloud DW에 대해서 개념적으로 이해하고,
시장에 존재하는 다양한 Cloud DW 중에서 기업의 환경에 맞는 제품이 어떤 것인지 성능/비용 관점으로 비교했다.
- 왜기업들은 CloudDW에주목하는가?
- 시장에는어떤 제품들이 있는가?
- 우리Biz환경에서는 어떤 제품을 도입해야 하는가?
- CloudDW솔루션의 성능은?
- 기존DataLake(EMR)대비 성능은?
- 유사CloudDW(snowflake vs redshift) 대비성능은?
앞으로도 Data를 둘러싼 시장은 Cloud DW를 기반으로 ELT, Mata Mesh, Reverse ETL등 새로운 생테계가 급속하게 발전할 것이고,
이를 위한 데이터 엔지니어/데이터 아키텍트 관점의 기술적 검토와 고민이 필요할 것 같다.
https://blog.naver.com/freepsw/222654809552
[NDC18] 야생의 땅 듀랑고의 데이터 엔지니어링 이야기: 로그 시스템 구축 경험 공유Hyojun Jeon
NDC18에서 발표하였습니다. 현재 보고 계신 슬라이드는 1부 입니다.(총 2부)
- 1부 링크: https://goo.gl/3v4DAa
- 2부 링크: https://goo.gl/wpoZpY
(SlideShare에 슬라이드 300장 제한으로 2부로 나누어 올렸습니다. 불편하시더라도 양해 부탁드립니다.)
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study (박주홍 데이터 분석 및 인프라 팀...Amazon Web Services Korea
데브시스터즈 데이터 레이크 구축 이야기 : Data Lake architecture case study
이 세션에서는 데브시스터즈의 Case Study를 통하여 Data Lake를 만들고 사용하는데 있어 요구 되는 사항들에 대해 공유합니다. 여러 목적에 맞는 데이터를 전달하기 위해 AWS 를 활용하여 Data Lake 를 구축하게된 계기와 실제 구축 작업을 하면서 경험하게 된 것들에 대해 말씀드리고자 합니다. 기존 인프라 구조 대비 효율성 및 비용적 측면을 소개해드리고, 빅데이터를 이용한 부서별 데이터 세분화를 진행할 때 어떠한 Architecture가 사용되었는지 소개드리고자 합니다.
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps...hoondong kim
[Tensorflow-KR Offline 세미나 발표자료]
Auto Scalable 한 Deep Learning Production 을 위한 AI Serving Infra 구성 및 AI DevOps Cycle 구성 방법론. (Azure Docker PaaS 위에서 1만 TPS Tensorflow Inference Serving 방법론 공유)
이윤희 : 다짜고짜 배워보는 인과추론
발표영상 https://youtu.be/fShRiqe1Cf0
---
PAP가 준비한 팝콘 시즌1에서 프로덕트와 함께 성장하는 데이터 실무자들의 이야기를 담았습니다.
---
PAP(Product Analytics Playground)는 프로덕트 데이터 분석에 대해 편안하게 이야기할 수 있는 커뮤니티입니다.
우리는 데이터 드리븐 프로덕트 문화를 더 많은 분들이 각자의 자리에서 이끌어갈 수 있도록 하는 것을 목표로 합니다.
다양한 직군의 사람들이 모여 프로덕트를 만들듯 PAP 역시 다양한 멤버로 구성되어 있으며, 여러분들의 참여로 만들어집니다.
---
공식 페이지 : https://playinpap.oopy.io
페이스북 그룹 : https://www.facebook.com/groups/talkinpap
팀블로그 : https://playinpap.github.io
Recurrent Neural Networks for Recommendations and Personalization with Nick P...Databricks
In the last few years, RNNs have achieved significant success in modeling time series and sequence data, in particular within the speech, language, and text domains. Recently, these techniques have been begun to be applied to session-based recommendation tasks, with very promising results.
This talk explores the latest research advances in this domain, as well as practical applications. I will provide an overview of RNNs, covering common architectures and applications, before diving deeper into RNNs for session-based recommendations. I will pay particular attention to the challenges inherent in common personalization tasks and the specific adjustments to models and optimization techniques required for success.
(Randall Hauch, Confluent) Kafka Summit SF 2018
The Kafka Connect framework makes it easy to move data into and out of Kafka, and you want to write a connector. Where do you start, and what are the most important things to know? This is an advanced talk that will cover important aspects of how the Connect framework works and best practices of designing, developing, testing and packaging connectors so that you and your users will be successful. We’ll review how the Connect framework is evolving, and how you can help develop and improve it.
In display and mobile advertising, the most significant development in recent years is the Real-Time Bidding (RTB), which allows selling and buying in real-time one ad impression at a time. The ability of making impression level bid decision and targeting to an individual user in real-time has fundamentally changed the landscape of the digital media. The further demand for automation, integration and optimisation in RTB brings new research opportunities in the IR fields, including information matching with economic constraints, CTR prediction, user behaviour targeting and profiling, personalised advertising, and attribution and evaluation methodologies. In this tutorial, teamed up with presenters from both the industry and academia, we aim to bring the insightful knowledge from the real-world systems, and to provide an overview of the fundamental mechanism and algorithms with the focus on the IR context. We will also introduce to IR researchers a few datasets recently made available so that they can get hands-on quickly and enable the said research.
Machine learning for profit: Computational advertising landscapeSharat Chikkerur
Online advertising is a multibillion-dollar industry than spans several subfields of computer science and engineering including information retrieval, machine learning, auction theory, optimization and distributed computing. Competitiveness in the marketplace is directly dependent on accuracy, scalability and sophistication of machine learning algorithms involved. The talk will present an overview of the computational advertising landscape, outline several key challenges, and discuss how machine learning addresses them.
http://www.buffalo.edu/cdse/CDSEdayslanding1/cdse-days2017/faculty_directory/SharatChikkerur.html
Computational Advertising in Yelp Local Adssoupsranjan
How does Yelp decide which relevant business or service to show you as an ad within 10s of milliseconds of your visit? What are the criteria and metrics by which we measure success of our ad serving system?
In this talk, the audience will learn about how Yelp figures out the best ad to show a user during his visit to Yelp: via a 2nd price auction amongst all the matching advertisers. Powering this 2nd price auction is a Machine Learning based system that predicts Click Through Rates (CTR) for all ads and an Auto-Bidding system that determines the optimal bid price for each ad per user request.
Yelp's local advertising presents challenges that are unique compared to display, social or mobile advertising. I'll motivate this via some trends and data observations. One of the interesting aspects is business categories and geolocation: How far are people willing to travel to visit a restaurant? What about professional services like plumbers: are users less or more sensitive to how far those are compared to restaurants?
I'll provide examples of how we use our open-sourced Map Reduce package (MRJob) to scale ML feature engineering and performance metric computation. I'll also provide details on our Machine Learning pipeline built using the popular open source packages: Python scikit-learn, Vowpal Wabbit and Apache Spark.
This talk would give you an in-depth overview of advertising systems, and why with increasingly sophisticated ad systems, in future we will wonder why we ever hated ads!
Weinan Zhang's KDD15 Talk: Statistical Arbitrage Mining for Display AdvertisingJun Wang
We study and formulate arbitrage in display advertising. Real-Time Bidding (RTB) mimics stock spot exchanges and utilises computers to algorithmically buy display ads per impression via a real-time auction. Despite the new automation, the ad markets are still informationally inefficient due to the heavily fragmented marketplaces. Two display impressions with similar or identical effectiveness (e.g., measured by conversion or click-through rates for a targeted audience) may sell for quite different prices at different market segments or pricing schemes. In this paper, we propose a novel data mining paradigm called Statistical Arbitrage Mining (SAM) focusing on mining and exploiting price discrepancies between two pricing schemes. In essence, our SAMer is a meta-bidder that hedges advertisers' risk between CPA (cost per action)-based campaigns and CPM (cost per mille impressions)-based ad inventories; it statistically assesses the potential profit and cost for an incoming CPM bid request against a portfolio of CPA campaigns based on the estimated conversion rate, bid landscape and other statistics learned from historical data. In SAM, (i) functional optimisation is utilised to seek for optimal bidding to maximise the expected arbitrage net profit, and (ii) a portfolio-based risk management solution is leveraged to reallocate bid volume and budget across the set of campaigns to make a risk and return trade-off. We propose to jointly optimise both components in an EM fashion with high efficiency to help the meta-bidder successfully catch the transient statistical arbitrage opportunities in RTB. Both the offline experiments on a real-world large-scale dataset and online A/B tests on a commercial platform demonstrate the effectiveness of our proposed solution in exploiting arbitrage in various model settings and market environments.
TripleLift: Preparing for a New Programmatic Ad-Tech WorldVoltDB
Michael Harroun, Director of Backend Architecture at TripleLift explores the benefits of leveraging real-time databases to power their programmatic native advertisement exchange.
RTB tutorial Version 2.
In display and mobile advertising, the most significant development in recent years is the Real-Time Bidding (RTB), which allows selling and buying in real-time one ad impression at a time. Since then, RTB has fundamentally changed the landscape of the digital marketing by scaling the buying process across a large number of available inventories. The demand for automation, integration and optimisation in RTB brings new research opportunities in the IR/DM/ML fields. However, despite its rapid growth and huge potential, many aspects of RTB remain unknown to the research community for many reasons. In this tutorial, together with invited distinguished speakers from online advertising industry, we aim to bring the insightful knowledge from the real-world systems to bridge the gaps and provide an overview of the fundamental infrastructure, algorithms, and technical and research challenges of this new frontier of computational advertising. We will also introduce to researchers the datasets, tools, and platforms which are publicly available thus they can get hands-on quickly.
This tutorial aims to provide not only a comprehensive and systematic introduction to RTB and computational advertising in general, but also the emerging research challenges and research tools and datasets in order to facilitate the research. Compared to previous Computational Advertising tutorials in relevant top-tier conferences, this tutorial takes a fresh, neutral, and the latest look of the field and focuses on the fundamental changes brought by RTB. We expect the audience, after attending the tutorial, to understand the real-time online advertising mechanisms and the state of the art techniques, as well as to grasp the research challenges in this field. Our motivation is to help the audience acquire domain knowledge and obtain relevant datasets, and to promote research activities in RTB and computational advertising in general.
Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Predi...MLconf
Machine Learning for Display Advertising @ Scale: In this talk, we will briefly introduce the display advertising marketplace, its stakeholders and the key performance metrics. We will then present the models we have developed at Criteo for bidding in real-time auctions, product recommendation, and look & feel optimization at scale (1B+ monthly users, 3B+ products in our catalog, and 30K ad displayed / sec at peak traffic). For these tasks, we’ve moved over time from predicting rare, binary events (clicks) to predicting very rare events (sales) and continuous events (sales amounts), all of them being quite noisy, and we’ll discuss the different methods that we have tried to build these models (such as generalized linear models, trees or factorization machines). We’ll continue by discussing how we evaluate these models both offline and online. We will describe the infrastructure for large-scale distributed data processing that these algorithms rely upon and discuss different optimization techniques we have experimented with (such as SGD, L-BFGS, SVRG). Finally, we will conclude with future areas of research and discuss open challenges we are currently facing.”
Big Data in Advertising Industry — Oleksandr Fedirko, Danylo StepanchukGlobalLogic Ukraine
This presentation deals with Big Data usage in advertising industry, namely in ad exchange business. It contains an overview of Big Data tools, a respective architectural approach and Java application for MapReduce.
This presentation was held by Oleksandr Fedirko (Lead Software Engineer, Consultant, GlobalLogic) and Danylo Stepanchuk (Lead Software Engineer, Consultant, GlobalLogic) at GlobalLogic Kyiv Java Career Day on August 11, 2018.
Learn more: https://www.globallogic.com/ua/events/globallogic-kyiv-java-career-day
Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...Databricks
From training billions of ad impressions to scaling gradient boosted trees with more than three million nodes, Ad Targeting at Yelp uses Apache Spark in many stages of its large-scale machine learning pipeline.
This session will explore examples of how Yelp employed and tweaked Spark to support big data feature engineering, visualizations and machine learning model training, evaluation and diagnostics. You’ll also hear about the challenges in building and deploying such a large-scale intelligent system in a production environment.
Int'l Conference on Predictive APIs: RTB Optimizer presentationDatacratic
Real-time bidding, in the context of digital marketing, refers to the purchase of advertising impressions one at a time, responding to tens of thousands of messages per second, paying a different price for each via an auction mechanism. This talk will cover in detail how Datacratic’s RTB Optimizer Prediction API predicts the outcome of buying a given impression, then computes the economic value of that outcome to produce optimal bidding behaviour.
Understanding Natural Language Instructions for Fetching Daily Objects Using ...Aly Magassouba
Presentation given at IROS 2019 Macau: We address multimodal language understanding with natural fetching instruction for domestic service robots. A typical fetching instruction such as ``Bring me the yellow toy from the white shelf'' requires to infer the user intention, {\it i.e.}, what object (target) to fetch and from where (source). To solve the task, we propose a Multimodal Target-source Classifier Model (MTCM), which predicts the region-wise likelihood of target and source candidates in the scene.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
RTB Bid Landscape in Adform
1. RTB Bid Landscape
survival models
powered by bitmaps
Łukasz Mączewski Senior Data Scientist
Przemysław Piotrowski Senior Data Engineer
AI&Analytics, Adform
14.05.2019
1
2. Agenda
• Real Time Bidding – RTB
• Bidding Strategy
• Bid Landscape
• Bitmap population selection
• Survival analysis
• Validation
• Product demo
2
3. Real-Time Bidding (RTB)
RTB allows for fully automated ad
inventory selling / buying in a real-time
3
Advertiser 1
Advertiser 2
Advertiser 3
Advertiser n
AudienceInventory
Auction
Demand Side Platform is to
estimate optimal
bid price according to:
• Targeting (audience)
• Business requirements (KPIs)
• Ad inventory (context, ad slot price)
• Campaign budget and it's spend plan
4. First price vs. second price auction
First price auction:
- Pay exact amount that was bidded
- Bid price is close to a real impression value
- Seller revenue maximization
4
Second price auction:
- Pay 0.00001$ more than second-highest bid
price
- Bid as high as possible to maximize winning
probability
In the past the second price auction was preferred, but currently
the first price auction becomes more and more popular
5. RTB ecosystem
5
diverse
audience
web page
Publisher 1 and its
inventory SSP 1
Supply Side
Platform
Publisher n and its
inventory
.
.
. Ad Exchange
DMP
Data
Management
Platform
DSP 1
Demand Side Platform
SSP n DSP n
.
.
.
advertisers
demographics, browsing, interest segments data
cookie profile data
feedback data
ad-slot / banner data
campaign data
Bidding strategy
optimisation
Time to bid is ~100 ms
6. Bidding strategy optimisation
6
Bidding Strategy
Optimization
Candidate impression valuation with
respect to KPIs and budget
constraints
DSP algorithms
Evaluation of bid request business potential
Bid Landscape
Estimation of winning probability at bid price
Budget throttle
Control budget spend
Optimal bid price
7. Numbers
7
Adform provides an integrated Software as a Service
platform for the buying, managing and serving of
digital advertising.
Presence in EU, US, APAC
3 millions bid request per second
billions of engagements daily
5 millions scoring of algorithms per second
200 thousands of cookie profiles in DMP
9. Discovering knowledge from massive RTB data
9
Cookie profile/
Targeting:
• device type
• os type
• browser type
• screen size
• country
• regions
• city
• clicks made
• visited log points
• visited domains
• urls
• IAB categories
• ...
Feedback:
• acquisition/conversion
• click
• order status
• revenue
• viewability
• ...
Adslot/ banners/
creatives:
• inventory name
• domain
• URL
• banner position
• banner size
• banner type
• floor price
• video player properties
• ...
Data Management
Platform:
• demography
• interests and preferences
• profession
• ...
Campaign:
• budget
• pricing type
• deals
• targeting
• brand safety
• ...
10. Discovering knowledge from massive RTB data
10
Huge class
imbalance
Mostly
categorical
data
High
cardinality
Sparsity
High risk of
overfit
Data summary
Reduction of dimensionality
Preselection of features RegularisationData sampling
Cross validationEnsembling
11. RTB models
11
CPC
Cost per Click
CPL
Cost per Lead
netCPL
Confirmed lead
CPV
Cost per
Viewability
ROAS
Return on Ad
Spend
VCR
Video Completion
Rate
12. Adform machine learning pipeline
12
Train/Test data
sets preparation
(dedicated ETLs)
Data
preprocessing
and feature
engineering
Feature selection
Building model
and
hyperparameters
optimization
Offline backtests Online A/B tests Deployment Live monitoring
After successful model deployment it is
periodically retrained
14. Backtests
• Offline procedure to check performance of
two competing models in time
• Sequential analysis of historical data:
- Data divided into training and test sets
• Time series analysis of critical performance
KPIs:
- AUC
- Sparsity
- Target rates
• If the backtests succeed the algorithm is
submitted to online A/B testing procedure
14
AUC
SPARSITY
Target rate
Time
15. A/B test online monitoring
15
Split traffic to
hash(cookie_id)%1024
buckets
For each experiment monitor
• Cost per target
• Target rate
• Target count
• Number of impressions
• ... many more
18. User Stories
As a Trafficker I want to know volume of impressions my campaign would win in RTB given
the selected inventory, targeting and budget settings (CPM).
As a DSP I want to enable Advertisers winning impressions for lowest possible price.
18
Bid Shading
Demand Side Platform protects Advertiser from overpaying while
AdTech ecosystem migrating from second-price to first-price RTB auctions.
19. Forecasting & Bid Landscape
Forecasting
• Total number of matching Bid Requests for
given Campaign Targeting
• Bid Opportunities
19
Bid Landscape
• Predicting probability of winning
auction with a given bid price and
Campaign Settings
• Win rate evolution with a bid price
Impressions
CPM
20. Bid Landscape
• Building Bid Landscape model using
historical data regarding won and failed
auctions:
For lost auctions win price is unknown
• Right Censored Data
Survival Analysis
The exact moment of failure for majority of
objects is unknown
20
All Bids
Won Bids
Relative Rate
Bid Price Buckets
22. The Data
22
Input
Bids and Impressions with features
Impression is a Bid that won the auction
100000 2000 5000 100 4000 3000 ... 4500
Output
cardinality of selected (Campaign Settings)
Bids bucketed by bid price
Impressions bucketed by bid price
Impressions bucketed by win price
100 buckets in our case
How?
Population Selection – select relevant population from high-dimensional
space for survival analysis.
source:biretail.com
26. Roaring Bitmap
26
Roaring Bitmap – state of the art compressed bitmap algorithm.
Hybrid approach between sparse and dense bitmaps representation.
roaringbitmap.org/about/
„Use Roaring for bitmap compression whenever possible.
Do not use other bitmap compression methods” [1]
[1] Wang, Jianguo, et al.
"An experimental study of bitmap compression vs. inverted list compression." 2017.
27. Bitmap Execution Database
27
• Roaring Bitmap – state of the art compressed bitmap algorithm
• Whole 7 days of Internet bid opportunities – 12GB (sample 1/1000) Naive 50TB
• Bids&Impressions – 5GB (sample 1/100)
• Majority of responses under 1 second
30. Principles of survival analysis
• The basic assumption in survival analysis is that the risk of failure changes with time
• Two major approaches for survival probability modelling can be distinguished:
- Parametric – can be extrapolated for new experiences (Weibull)
- Non-parametric – very flexible but can't be extended for new experience (Kaplan-Maier)
30
FailureRate
32. Weibull based survival probability estimation
• Weibull distribution is derived from hazard function:
where:
- S(t) - survival function (cdf)
- F(t) = 1 – S(t) - failure function (cdf)
- f(t) = dF(t)/dt - pdf
• Weibull distribution is defined by 2 parameters:
- Shape parameter p - failure mode type
- Scale parameter l - position
32
Time
Survive .
0 t ∆t
Fail .
33. Weibull based survival probability estimation
• One can distinguish 3 classes of failure types:
- Infant mortality p < 1
- Random failures p = 1
- Ware out p > 1
33
Weibull examples
Parametric approach easy to extrapolate
age
Hazard
34. Kaplan-Maier based survival probability estimation
• In Kaplan-Meier method we are to predict chance of surviving time ∆t after surviving time t:
34
No parameters to be estimated, very flexible and easy in implementation
S(t,Δt) = S(t) • S(Δt|t)
Time
Survive .
0 t ∆t
Survive.
37. RTB auctions modelling with survival methods
From survival models to Bid Landscape:
- time -> cost per 1k bought impressions (CPM)
- died -> won bid (left censored, known win price)
- alive -> lost bid (right censored, unknown exact win price)
37
our win price (second price auction)
competitor outbid price (unknown)
our bid price
38. Bid Landscape vs. Survival Analysis
Survival analysis
• Risk of failure evolves with time
• Incomplete knowledge regarding failure
time, right censoring:
- Majority of object survive
• Multiple competing failure modes
• Very high disproportion between positive
(failures) and negative (survived) class
representatives
• Usually very limited data
• Wrong predictions may have catastrophic
consequences – quality
monitoring required
38
Bid Landscape analysis
• Winning probability evolves with price
• Incomplete knowledge regarding win price, right
censoring:
- Only small fraction of bids are won
• Different inventory sources
• Very high disproportion between positive (won)
and negative (lost) class representatives
• Usually huge data sets – subsampling
• Wrong predictions can lead to not efficient
campaign money spend – quality monitoring
required
39. Survival model for Bid Landscape modelling
39
• Let's apply survival techniques to auction data:
- Instead of time a bid price is to be used
• Legend:
• number of lost auctions
• number of won impressions
- win price CDF
- Kaplan-Maier estimate
- Weibull estimate
Adform Bid Landscape exploits
Kaplan-Maier estimate exclusively
Bid Price (CPM)
Multiplicity
WinRate
41. Bid Landscape validation – qualitative
41
Win rate evolution with bid price Win rate evolution over day
The bid landscape predictions are in line with the intuition:
• The win rate increases with bid price
• The win rate changes periodically over a day
42. Bid Landscape validation – quantitative (real RTB data)
42
Real auction data versus Bid Landscape predictions
47. Bid Multipliers optimisation
47
Polish accountants'
portal
Jobs for professionals
Video oriented website
Impressions
Source of traffic:
• Campaign targeting Warsaw people to be
run on three different domains
• Global CPM makes no sense, needs to be
adjusted
48. Bid Landscape dynamic CPM algorithm
48
Given the Campaign Settings including hourly budget maximize
bought relevant impressions by using lowest CPM.
First tests are showing multiplied Viewable Impressions,
Clicks comparing to same Campaign using static price.
49. Summary and conclusions
• The Adform Bid Landscape is composed of two base elements ensuring speed and quality:
- Bitmaps – fast and efficient population selection
- Kaplan-Maier win rates – simple and precise estimate
• The Adform Bid Landscape is in production for more than one year
- It is mostly used by Adform customers optimising the CPM campaigns
- Currently we develop new RTB algorithm that is based on BL predictions
- In the short time perspective we are to extend usage of BL in other RTB models
49