Traditional randomized experiments allow us to determine the overall causal impact of a treatment program (e.g., marketing, medical, social, educational, or political). Uplift modeling (also known as true lift, net lift, or incremental lift) goes a step further, using data mining and machine learning to identify the individuals who are truly positively influenced by a treatment. This technique allows us to identify the “persuadables” and thus optimize target selection to maximize treatment benefits. This important subfield of data mining, data science, and business analytics has gained significant attention in areas such as personalized marketing, personalized medicine, and political campaigns, with many publications and presentations from both industry practitioners and academics appearing in recent years.
In this workshop, I will introduce the concept of uplift, review existing methods, contrast them with the traditional approach, and introduce a new method that can be implemented with standard software. A method and metrics for model assessment will be recommended. Our discussion will include new approaches to handling the general situation where only observational data are available, i.e., without randomized experiments, using techniques from causal inference. Additionally, an integrated modeling approach for uplift and direct response (where it can be identified who actually responded, e.g., click-through or coupon scanning) will be discussed. Last but not least, extension to the multiple-treatment situation, with solutions for optimizing treatments at the individual level, will also be discussed. While the talk is geared towards marketing applications (“personalized marketing”), the same methodologies can be readily applied in other fields such as insurance, medicine, education, politics, and social programs. Examples from the retail and non-profit industries will be used to illustrate the methodologies.
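The core idea of uplift modeling can be sketched with the common "two-model" (T-learner) approach: fit separate response models on the treated and control groups of a randomized experiment, then score each individual's uplift as the difference in predicted response probabilities. The snippet below is a minimal illustration on synthetic data, not a method from the workshop.

```python
# Two-model (T-learner) uplift sketch on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=(n, 3))                 # customer features
t = rng.integers(0, 2, size=n)              # randomized treatment flag
# Simulated outcome: treatment helps only when feature 0 is positive
# (the "persuadables"); baseline response depends on feature 1.
p = 1 / (1 + np.exp(-(0.5 * x[:, 1] + t * (x[:, 0] > 0))))
y = rng.binomial(1, p)

# Fit one response model per arm, then take the difference in scores.
m_treat = LogisticRegression().fit(x[t == 1], y[t == 1])
m_ctrl = LogisticRegression().fit(x[t == 0], y[t == 0])
uplift = m_treat.predict_proba(x)[:, 1] - m_ctrl.predict_proba(x)[:, 1]

# Persuadables (feature 0 > 0) should receive higher uplift scores.
print(uplift[x[:, 0] > 0].mean() > uplift[x[:, 0] <= 0].mean())
```

Targeting only the highest-uplift customers is what "optimizing target selection" means in practice: the model ranks individuals by estimated incremental effect rather than by raw response probability.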
Uplift Modelling as a Tool for Making Causal Inferences at Shopify - Mojan Hamed, Rising Media Ltd.
For many businesses, it is not enough to model the probability of an outcome; the real question is, “given a predictive model, what can we do to change the probability of this outcome?” The goal of this talk is to present how uplift modelling is used to make causal inferences that guide acquisition strategy at Shopify. Mojan will walk through a case study focused on the statistics and experimental design behind uplift modelling, in addition to the learnings gained from bringing this model to production. The Python implementation of this presentation will be made available to attendees.
This is a brief review of net lift models based on the presentation I gave at Truecar in June 2013, after attending the training provided by SAS Institute. Recently, I added a few things to the slides after reviewing several online examples and papers.
Beyond Churn Prediction: An Introduction to Uplift Modeling - Pierre Gutierrez
These slides are from a talk I gave at the PAPIs conference in Boston in 2016. The main subject is uplift modelling. Starting from a churn model approach for an e-gaming company, we introduce when to apply uplift methods, how to mathematically model them, and finally, how to evaluate them.
I tried to bridge the gap between causal inference theory and uplift theory, especially concerning how to properly cross-validate the results. The notation used is the one from uplift modelling.
These slides are from a talk I gave at Google Campus Madrid for the Machine Learning Meetup. The main subject is uplift modelling. Starting from a churn model approach for an e-gaming company, we introduce when to apply uplift methods, how to mathematically model them, and finally, how to evaluate them.
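The slides cover how to evaluate uplift models, which is subtle because the true individual effect is never observed. One common evaluation idea can be sketched as follows (synthetic data and a toy metric, not taken from the slides): rank customers by the model's uplift score, then compare the observed treated-vs-control response gap in the top-ranked bucket against the rest. A useful model concentrates the incremental effect in its top-ranked group.

```python
# Bucket-based uplift evaluation sketch on synthetic randomized data.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
score = rng.uniform(size=n)            # pretend uplift-model scores
t = rng.integers(0, 2, size=n)         # randomized treatment flag
# True effect correlated with the score: treatment lifts response
# probability by 0.3 * score.
y = rng.binomial(1, 0.2 + 0.3 * score * t)

def observed_uplift(mask):
    """Treated minus control response rate within a subgroup."""
    return y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()

top = score >= np.median(score)
print(observed_uplift(top) > observed_uplift(~top))
```

Sweeping the cutoff over all ranks instead of a single median split yields the familiar uplift (Qini) curve.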
Why start using uplift models for more efficient marketing campaigns - Data Con LA
Data Con LA 2020
Description
What is uplift modeling, and why use it? A real marketing use case at Uber, with uplift modeling done using CausalML, an open-source Python package developed by Uber that provides a suite of uplift modeling and causal inference methods based on machine learning algorithms and recent research.
* What is uplift modeling?
* From A/B testing to personalization
* From propensity to incremental effect
* Use case for uplift modeling at Uber
* Algorithms in the CausalML package
Speaker
Pan Jing, Uber, Data Scientist
Concept Drift: Monitoring Model Quality In Streaming ML Applications - Lightbend
Most machine learning algorithms are designed to work with stationary data. Yet, real-life streaming data is rarely stationary. Machine learned models built on data observed within a fixed time period usually suffer loss of prediction quality due to what is known as concept drift.
The most common method to deal with concept drift is periodically retraining the models with new data. The length of the period is usually determined based on cost of retraining. The changes in the input data and the quality of predictions are not monitored, and the cost of inaccurate predictions is not included in these calculations.
A better alternative is monitoring the model quality by testing the inputs and predictions for changes over time, and using change points in retraining decisions. There has been significant development in this area within the last two decades.
In this webinar, Emre Velipasaoglu, Principal Data Scientist at Lightbend, Inc., will review the successful methods of machine learned model quality monitoring.
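The monitoring idea described above can be illustrated with a very simple change detector (this is a generic sketch, not the webinar's method, and the threshold and window sizes are illustrative): compare a recent window of an input or quality metric against a reference window and flag a change point when the means differ by more than a few standard errors.

```python
# Minimal window-comparison drift detector on synthetic data.
import numpy as np

def drift_detected(reference, recent, threshold=3.0):
    """Flag drift when window means differ by more than `threshold`
    standard errors (a simple z-test on the means)."""
    ref, rec = np.asarray(reference), np.asarray(recent)
    se = np.sqrt(ref.var(ddof=1) / len(ref) + rec.var(ddof=1) / len(rec))
    return abs(rec.mean() - ref.mean()) / se > threshold

rng = np.random.default_rng(0)
stable = rng.normal(0.0, 1.0, 500)
shifted = rng.normal(0.8, 1.0, 500)   # the underlying distribution moved

print(drift_detected(stable[:250], stable[250:]),   # stationary case
      drift_detected(stable, shifted))              # drifted case
```

A detector like this can trigger retraining only when the data actually changes, instead of retraining on a fixed schedule.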
Applying Statistical Modeling and Machine Learning to Perform Time-Series For... - PyData
Forecasting time-series data has applications in many fields, including finance and health. There are potential pitfalls when applying classic statistical and machine learning methods to time-series problems. This talk will give folks the basic toolbox to analyze time-series data and perform forecasting using statistical and machine learning models, as well as interpret and convey the outputs.
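One classic pitfall of the kind the talk mentions (a plausible example, not necessarily one from the slides) is evaluation: time-series splits must be chronological, with no look-ahead, and any model should be judged against a naive "last value" baseline. A minimal sketch on synthetic data:

```python
# Chronological split and naive-baseline comparison on a synthetic series.
import numpy as np

rng = np.random.default_rng(0)
trend = np.arange(200) * 0.1
series = trend + rng.normal(0, 1, 200)

split = 150                      # chronological split: past vs future
train, test = series[:split], series[split:]

# Naive baseline: predict each point as the previous observed value.
naive_pred = np.concatenate(([train[-1]], test[:-1]))
naive_mae = np.abs(test - naive_pred).mean()

# Simple linear-trend model fit only on the past (no look-ahead).
coef = np.polyfit(np.arange(split), train, 1)
trend_pred = np.polyval(coef, np.arange(split, 200))
trend_mae = np.abs(test - trend_pred).mean()

print(round(naive_mae, 2), round(trend_mae, 2))
```

A model that cannot beat the naive baseline on a chronological split is adding no forecasting value, however good its in-sample fit looks.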
This deck summarizes the Counterfactual Explanation session from the "설명가능한 인공지능 기획!" (Explainable AI) track of the 18th cohort of 풀잎스쿨.
It was compiled from papers, YouTube videos, and the following resource:
https://christophm.github.io/interpretable-ml-book/
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013, Hong... - Anmol Bhasin
Tutorials at ACM RecSys 2013
Social Networks
Learning to Rank
Beyond Friendship
Pref. Handling
Beyond Friendship: The Art, Science and Applications of Recommending People to People in Social Networks
by Luiz Augusto Pizzato (University of Sydney, Australia)
& Anmol Bhasin (LinkedIn, USA)
While Recommender Systems are powerful drivers of engagement and transactional utility in social networks, people recommenders are a fairly involved and diverse subdomain. Consider that movies are recommended to be watched and news is recommended to be read; people, however, are recommended for a plethora of reasons, such as people to befriend, follow, partner with, target for an advertisement or service, recruit, partner with romantically, or join in thematic interest groups.
This tutorial aims to first describe the problem domain, touch upon classical approaches like link analysis and collaborative filtering, and then take a rapid deep dive into the unique aspects of this problem space: reciprocity, intent understanding of both the recommender and the recommendee, contextual people recommendations in communication flows, and Social Referrals, a paradigm for delivering recommendations using the social graph. These aspects will be discussed in the context of published original work developed by the authors and their collaborators, in many cases deployed in massive-scale real-world applications on professional networks such as LinkedIn.
Introduction
The basics of Social Recommenders
People recommender systems
Special Topics in People Recommenders
Why reciprocal (people) recommenders are different to traditional (product) recommendations
Multi-Objective Optimization
Intent Understanding
Feature Engineering
Social Referral
Pathfinding
Concluding remarks
The prerequisite for this tutorial is some familiarity with the foundational Recommender Systems, Data Mining, Machine Learning, and Social Network Analysis literature.
Date
Oct 13, 2013 (08:30 – 10:15)
(Presented at the Deep Learning Re-Work SF Summit on 01/25/2018)
In this talk, we go through the traditional recommendation systems set-up and show that deep learning approaches in that set-up don't bring a lot of extra value. We then focus on different ways to leverage these techniques, most of which rely on breaking away from that traditional set-up: providing additional data to your recommendation algorithm, modeling different facets of user/item interactions, and, most importantly, re-framing the recommendation problem itself. In particular, we show a few results obtained by casting the problem as a contextual sequence prediction task and using it to model time (a very important dimension in most recommendation systems).
Netflix talk at ML Platform meetup Sep 2019 - Faisal Siddiqi
In this talk at the Netflix Machine Learning Platform Meetup on 12 Sep 2019, Fernando Amat and Elliot Chow from Netflix discuss the bandit infrastructure for personalized recommendations.
Machine learning operations (MLOps) brings data science to the world of DevOps. Data scientists create models on their workstations; MLOps adds automation, validation, and monitoring to any environment, including machine learning on Kubernetes. In this session you'll hear about the latest developments and see them in action.
DoWhy Python library for causal inference: An End-to-End tool - Amit Sharma
As computing systems are more frequently and more actively intervening in societally critical domains such as healthcare, education, and governance, it is critical to correctly predict and understand the causal effects of these interventions. Without an A/B test, conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for causal reasoning.
Much like machine learning libraries have done for prediction, "DoWhy" is a Python library that aims to spark causal thinking and analysis. DoWhy provides a unified interface for causal inference methods and automatically tests many assumptions, thus making inference accessible to non-experts.
For a quick introduction to causal inference, check out amit-sharma/causal-inference-tutorial. We also gave a more comprehensive tutorial at the ACM Knowledge Discovery and Data Mining (KDD 2018) conference: causalinference.gitlab.io/kdd-tutorial.
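As a library-free illustration of the kind of reasoning DoWhy automates (this uses only NumPy and synthetic numbers, not DoWhy's API): when a confounder W affects both treatment and outcome, the naive treated-vs-control difference is biased, while adjusting for W via the backdoor criterion recovers the true effect.

```python
# Backdoor adjustment vs naive comparison on synthetic confounded data.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
w = rng.integers(0, 2, n)                      # binary confounder
t = rng.binomial(1, 0.2 + 0.6 * w)             # W makes treatment likely
y = 1.0 * t + 2.0 * w + rng.normal(0, 1, n)    # true effect of T is 1.0

# Naive comparison is biased because treated units have higher W.
naive = y[t == 1].mean() - y[t == 0].mean()

# Backdoor adjustment: average the within-stratum effect over P(W).
adjusted = sum(
    (y[(t == 1) & (w == v)].mean() - y[(t == 0) & (w == v)].mean())
    * (w == v).mean()
    for v in (0, 1)
)

print(round(naive, 2), round(adjusted, 2))
```

DoWhy's contribution is to make this identification step explicit and automatically test the assumptions behind it, rather than leaving the adjustment implicit in the analyst's code.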
Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a service. Though originally used within the telecommunications industry, it has become common practice for banks, ISPs, insurance firms, and other verticals. More: http://info.mapr.com/WB_PredictingChurn_Global_DG_17.06.15_RegistrationPage.html
The prediction process is data-driven and often uses advanced machine learning techniques. In this webinar, we'll look at customer data, do some preliminary analysis, and generate churn prediction models – all with Spark machine learning (ML) and a Zeppelin notebook.
The goal of Spark’s ML library is to make machine learning scalable and easy. Zeppelin with Spark provides a web-based notebook that enables interactive machine learning and visualization.
In this tutorial, we'll do the following:
Review classification and decision trees
Use Spark DataFrames with Spark ML pipelines
Predict customer churn with Apache Spark ML decision trees
Use Zeppelin to run Spark commands and visualize the results
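The webinar itself uses Spark ML and Zeppelin; as a standalone sketch of the same classification idea that runs without a cluster, here is a hypothetical decision-tree churn model built with scikit-learn on synthetic data (the features and churn rule are invented for illustration):

```python
# Decision-tree churn classification sketch on synthetic customer data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 5000
calls_to_support = rng.poisson(1.5, n)
monthly_minutes = rng.normal(300, 80, n)
# Synthetic rule: heavy support callers with low usage tend to churn.
churn_prob = 1 / (1 + np.exp(-(calls_to_support - monthly_minutes / 150)))
churn = rng.binomial(1, churn_prob)

X = np.column_stack([calls_to_support, monthly_minutes])
X_tr, X_te, y_tr, y_te = train_test_split(X, churn, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(round(tree.score(X_te, y_te), 2))   # held-out accuracy
```

The Spark ML version follows the same shape, with a DataFrame-based pipeline in place of the NumPy arrays and a distributed tree learner in place of scikit-learn's.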
Past, Present & Future of Recommender Systems: An Industry Perspective - Justin Basilico
Slides from our talk at the RecSys 2016 conference in Boston, MA, on 2016-09-18, giving our perspective on important areas for future work in recommender systems.
Déjà Vu: The Importance of Time and Causality in Recommender Systems - Justin Basilico
Talk at RecSys 2017 in Como, Italy on 2017-08-29.
Abstract:
Time plays a key role in recommendation. Handling it properly is especially critical when using recommender systems in real-world applications, which may not be as clear when doing research with historical data. In this talk, we will discuss some of the important challenges of handling time in recommendation algorithms at Netflix. We will focus on challenges related to how our users, items, and systems all change over time. We will then discuss some strategies for tackling these challenges, which revolve around proper treatment of causality in our systems.
MLOps: Bridging the gap between Data Scientists and Ops - Knoldus Inc.
Through this session we're going to introduce the MLOps lifecycle and discuss the hidden pitfalls that can affect an ML project. Then we'll cover the ML model lifecycle and the problems that arise during training. Finally, we'll introduce the MLflow Tracking module for tracking experiments.
Recommendation systems today are widely used across many applications, such as multimedia content platforms, social networks, and ecommerce, to provide suggestions to users that are most likely to fulfill their needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don’t directly translate into improvements in the real world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to shine a light on challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
Meetup: French Video Game Analysts
April 2016
Company : Dataiku
Speaker: Pierre Gutierrez, Data Scientist
Predicting that a player is going to leave is good. Acting on that player only when it will actually be useful is better.
Theory and practice of uplift models, illustrated by churn prediction at Ankama.
Most data scientists are focused on predictive (aka supervised) projects, yet the real growth is usually in the estimation of action effects and the optimization of action policies. To this end, I will present causal inference and related packages.
There are three layers of analytics: descriptive (BI), predictive (supervised modeling), and prescriptive. The last, less-known layer focuses on answering the most important business questions, for example, "What was the effect of giving a discount?" or "What should I do to create the desired effect?" In this talk, we will first discuss the frameworks used to answer these questions, namely causal inference and reinforcement learning. Then we will take a deep dive into causal inference, with a crash course on causality and why it is important. Last but not least, we will present existing open-source causal-inference packages and their limitations.
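For the discount question in the abstract above, the simplest prescriptive estimate can be sketched as follows (assuming the discount was randomized; all numbers here are synthetic): the average treatment effect (ATE) is just the difference in mean spend, reported with a standard error to be honest about uncertainty.

```python
# Difference-in-means ATE estimate for a randomized discount, synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
discount = rng.integers(0, 2, n)                    # randomized assignment
spend = 50 + 5 * discount + rng.normal(0, 20, n)    # true effect: +5

treated, control = spend[discount == 1], spend[discount == 0]
ate = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated)
             + control.var(ddof=1) / len(control))

print(f"ATE = {ate:.2f} +/- {1.96 * se:.2f}")
```

Without randomization this simple difference is biased by confounding, which is exactly where the causal inference frameworks and packages discussed in the talk come in.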
Building Institutional Capacity in Thailand to Design and Implement Climate P... - UNDP Climate
23-25 November 2016, Thailand - A centerpiece of the Integrating Agriculture in National Adaptation Plans Programme (NAP-Ag) in Thailand is its support for developing a new five-year Strategy on Climate Change in Agriculture (2017-2021). This is spearheaded by the Ministry of Agriculture and Cooperatives (MOAC) and its Office of Agriculture Economics (OAE). The strategy was unveiled after a series of meetings by a Technical Working Group at a three-day workshop held on 23-25 November 2016 in Bangkok, organized by UNDP. Over 60 participants from each MOAC line department and 10 participants from academia and civil society were briefed by the Office of the Natural Resources and Environmental Policy and Planning (ONEP) and GIZ on the status of the National Adaptation Plan (NAP) and learned how NAP-Ag programme efforts could support a broader NAP process and align with the Sector Plan. The new strategy focuses on improving evidence and data for informing policy choices, building the capacity of farmers and agri-businesses to adapt, promoting low-carbon development and productivity growth in the sector, and building institutional and managerial capacities to cope with climate change impacts.
Market Research using SPSS _ Edu4Sure Sept 2023.ppt - Edu4Sure
SPSS training-related content. The session includes practical training on the tool; this PPT is for reference purposes.
For any training needs, kindly connect with us at partner@edu4sure.com or call us at +91-9555115533.
For more courses at our LMS, you can also refer www.testformula.com
Critical Checks for Pharmaceuticals and Healthcare: Validating Your Data Inte... - Minitab, LLC
Watch online at: https://hubs.ly/H0hswm60
Organizations in the pharmaceutical and health sectors are being asked by regulators to:
- Apply more complete methods to validate analytical techniques and measurement systems, known as Data Integrity
- Monitor and evaluate the performance of production processes, otherwise called Statistical Process Control (SPC)
In this presentation you will learn how to:
- Improve the precision and accuracy of analytical techniques, using Minitab's tools for Gage R&R, Gage Linearity and Bias studies, and Design of Experiments
- Select the relevant control charts and capability analyses for data that does and does not follow the normal distribution
The presentation will explain how data integrity and process monitoring are critical to each other for regulatory compliance. If the data is not healthy, the evaluation of the process could also be incorrect.
You will finish with the confidence to use more sophisticated statistical techniques, in particular for data integrity.
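The presentation demonstrates control charts in Minitab; the underlying arithmetic of a 3-sigma individuals chart can be sketched in Python (synthetic measurements and an injected out-of-control point; the d2 = 1.128 constant is the standard unbiasing factor for moving ranges of size 2):

```python
# 3-sigma individuals control chart limits via the moving-range method.
import numpy as np

rng = np.random.default_rng(0)
assay = rng.normal(100.0, 2.0, 60)      # in-control measurements
assay = np.append(assay, 112.0)         # one out-of-control point

center = assay.mean()
# Estimate sigma from the average moving range divided by d2 = 1.128.
sigma = np.abs(np.diff(assay)).mean() / 1.128
ucl, lcl = center + 3 * sigma, center - 3 * sigma

out_of_control = np.flatnonzero((assay > ucl) | (assay < lcl))
print(out_of_control)
```

Estimating sigma from moving ranges rather than the overall standard deviation keeps a sustained process shift from inflating the limits and masking itself.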
As the importance of having a data strategy in place is sinking in, many organizations have added a chief data officer (CDO) to their executive team to help create and implement that strategy. But every organization is doing this a little bit differently. This talk will describe how a variety of industries and organizations are using CDOs and will make recommendations for best practices.
I’ll present the new knowledge discovery tools we are building at Diffeo. Unlike traditional search engines that use keywords, Diffeo provides an in-browser knowledge base that accelerates information gathering about people, companies, chemical compounds, cyber events, or other real world entities. I’ll describe how Diffeo uses active learning to encourage long and deep user interactions in order to recommend new content for in-progress articles. As you write, the search results get better and more interesting, because the system can see more precisely which entity you mean and which you don’t (disambiguation) and also what you don’t know yet about the entity (discovery).
Finally, in this presentation I’ll describe our experience organizing the Text REtrieval Conference (TREC) tracks on Knowledge Base Acceleration (KBA) and Dynamic Domain (DD), which are pushing the state of the art in knowledge discovery on large streams. I’ll show you how to access the largest corpus of streaming text data ever released for public evaluations.
An exposé on human-centered design, as related to data science and “medium data”. Examples of great API design will be showcased, as well as other end-user facing tools that can enable data scientists to share their observations with the world.
Mobile technology Usage by Humanitarian Programs: A Metadata Analysis - odsc
CommCare, developed by Dimagi Inc., is an open-source mobile technology platform that supports hundreds of humanitarian frontline programs worldwide. The objective of this analysis is to demonstrate how CommCare metadata contains a wealth of information that can inform humanitarian programs in their use of mobile technology. This understanding can help programs determine the most effective way to implement CommCare or other mobile technology in resource-poor settings. A typical CommCare user is a frontline worker, such as a community health worker who provides outreach to pregnant women and children. An important feature of CommCare is that it supports case management, allowing users to register, update, and close cases in their CommCare application. A case is usually a user’s client, e.g., a pregnant woman who is supported by the CommCare user. While using CommCare, the user fills out electronic forms which eventually get submitted to the CommCare cloud server. The cumulative number of forms submitted by CommCare users as of December 2014 was just over 10 million. Metadata for each form submitted through CommCare are stored in Dimagi’s data platform; included in a form’s metadata are date and time stamps for when each form was started and ended by the user and when the form was eventually received by the cloud server.
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hiveodsc
The main objective of this workshop is to give the audience hands-on experience with several Hadoop technologies and jump-start their Hadoop journey. In this workshop, you will load data and submit queries using Hadoop! Before jumping into the technology, the founders of DataKitchen review Hadoop and some of its technologies (MapReduce, Hive, Pig, Impala, and Spark), look at performance, and present a rubric for choosing which technology to use when.
We’ve all been told to “work smarter, not harder.” But what does working smarter really mean? In the world of finance and trading, working smarter means working differently. None of us can compete against computers stacked inches away from the stock exchange or blue chip companies with multi-million dollar marketing campaigns. The key to winning is to go where the big guys haven’t, and the way to do that is through diverse datasets. In this talk, you will learn the theory and tools to discover new datasets from unexpected sources in order to gain an upper hand in both finance and business. So whether you’re a quant that trades in his bedroom or a restaurateur looking to grow his business, you’ll learn how the diversity of data can be the sharpest knife in your set.
Data Science at Dow Jones: Monetizing Data, News and Information - odsc
In this presentation I will describe the way Data Science supports the business of information and news at Dow Jones. Specifically, I will describe how we are introducing innovative and advanced large-scale information mining and analytic approaches not only into Dow Jones’ products but also into our strategy and decision making processes. Our goal is to impact every aspect of Dow Jones: from the way journalism is produced in the newsroom, to the way we create and deliver institutional products, to the way we improve retention and acquisition of subscribers. While the task seems broad and daunting, we have already achieved various successes through the application of machine learning, data mining, advanced analytics and big data approaches. In this presentation I will describe how we have achieved this, including our tools, data, approaches and mechanisms, as well as describe what our plans are going forward.
Have you been in the situation where you’re about to start a new project and ask yourself, what’s the right tool for the job here? I’ve been in that situation many times and thought it might be useful to share with you a recent project we did and why we selected Spark, Python, and Parquet. My plan is to take you through a use case that involves loading, transforming, aggregating, and persisting the dataset. We’ll use an open dataset consisting of full fund holdings graciously provided by Morningstar. My goal in presenting this use case is to have the audience learn how these technologies can be applied to a real-world problem and to inspire members of the audience to start learning these technologies and applying them to their own projects.
Building a Predictive Analytics Solution with Azure ML - odsc
Create and operationalize a predictive model using Microsoft Azure Machine Learning.
– Perform the typical steps involved in building a predictive analytics solution such as data ingestion, data cleansing, data exploration, feature engineering, model selection and evaluation of model results
– Learn how to use machine learning in big data scenarios using tools like Hadoop and SQL Server to process and work with such data.
Finding and classifying the mentions of the things named in text, often called Named Entity Recognition or NER, is a fundamental task in many search and analysis applications. Mature, robust NER technology is available for many languages and domains, from people, places, and products, to diseases, genes, and molecules. However, for emerging tasks like knowledge-base construction, mentions alone are insufficient.
In this presentation we’ll explore techniques that go beyond names to:
link mentions to one another and to rich knowledge sources like Wikidata
discover and characterise the relationships between entities that are explicit in the text
And we’ll discuss some of the most important practical implications of these advancements for open data science.
According to Credit Suisse’s Gender 3000 report, at the end of 2013 women accounted for 12.9% of top management in 3,000 companies across 40 countries. However, since 2009, companies where women made up 25-50% of the management team returned 22-29%. If companies with women in management outperform so dramatically, what would happen if you invested in women-led companies? Karen Rubin will explore this question and share her findings after running a 12-year investment simulation.
Data science allows us to turn a dark forest into a world of perpetual twilight by giving us the tools to better understand the data that surrounds us. Unfortunately, in this world of twilight we still need a flashlight to get a clean, crisp image of our immediate surroundings. We will talk about how to use deep domain expertise as that flashlight, shedding light on our understanding of data. Our focus will be on using text analysis as a means to examine qualitative information in a structured, quantitative way. We will draw heavily from examples in complex central bank policy and financial regulation.
Open Source Tools & Data Science Competitions - odsc
This talk shares the presenter’s experience with open source tools in data science competitions. In the past several years, Kaggle and other competitions have created a large online community of data scientists. In addition to competing with each other for fame and glory, members of this community also generously share knowledge and insights through forums and open source code. The open competition and sharing have resulted in rapid progress in the sophistication of the entire community. This presentation will briefly cover this journey from a competitor’s perspective, and share hands-on tips on open source tools that have proven popular and useful in recent competitions.
scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry.
scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models.
The tutorial will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross validation, and model selection. We will see how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline.
We will also cover how to build machine learning models on text data, and how to handle very large datasets.
Bridging the Gap Between Data and Insight using Open-Source Tools - odsc
Despite the proliferation of open-source tools for analysis (such as Python and R) and those used for visualization
(such as Javascript / D3), there often exist significant gaps between these areas, and those of us trying to navigate the complete arc from data to insight can encounter many obstacles along the way. Fortunately, in recent years there have been many efforts to fill these needs, and today distilling a meaningful visualization from raw data is faster and easier than ever before.
In this talk we will use examples in geospatial analysis and visualization to illustrate how open-source tools like Python, geopandas, and TileMill work together. Using examples from the RunKeeper mobile app, we will show how we currently use these tools to better understand our customers and their data, and to communicate with our colleagues, external partners, and the data community at large.
Human-generated text may be the next frontier for big data analysis, but we humans are complicated beasts and the text we generate is messy and complicated in ways that can confound analysis. We’ll describe the top ten mistakes people make when they start doing text analysis, and hopefully save you from making a few of these mistakes yourself.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains come only when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating a custom Ruby SDK for your web service or Rails API using Smithy - g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Key Trends Shaping the Future of Infrastructure.pdf - Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
1. Open Data Science Conference, Boston 2015
@opendatasci
Machine Learning Based Personalization Using Uplift Analytics: Examples and Applications
Uplift Modeling Workshop
Victor S.Y. Lo
May 2015
2. Outline
Why do we need Uplift modeling? 10 min
Various methods for Uplift modeling 30 min
Break 5 min
Direct response vs. Uplift modeling 10 min
Prescriptive Analytics for Multiple Treatments 20 min
Q&A 10 min
4. Response Modeling
[Bar chart: response rate by decile (1-10); average response rate 2.5%]
Top decile lift (over random) = 4 times
Top 3 deciles lift = 2.6 times
Big Lift
Modelers: VERY SUCCESSFUL MODEL!
5. Campaign Results
            Top 3 Deciles   Random
Treatment   6.7%            2.5%
Control     6.7%            2.5%
Lift        0.0%            0.0%
No Lift
Marketers: VERY DISAPPOINTING!
Modelers: Not my problem, it is the mail design!
7. What’s wrong with this picture?
A successful response model:
[Bar chart: incidence of treatment responders by decile: 14%, 7%, 4%, 2%, 1%, 1%, 0%, 0%, 0%, 0%]
[Cumulative % of treatment responders vs. % of treatment group, compared with random. DM lift?]
A successful marketing campaign:
                          Test 1   Test 2   Total
Treatment Response Rate   3.3%     2.7%     3.0%
Control Response Rate     2.3%     1.7%     2.0%
8. Motivation
Based on the following campaign result, which of the customer groups is the best for future targeting?

Response Rate by Age and Treatment/Control:
Age     Treatment   Control   Difference
<35     0.5%        0.2%      0.3%
35-60   2.5%        0.5%      2.0%
>60     3.5%        2.5%      1.0%

• >60 has the highest response rate – treatment-only focus (common practice)
• 35-60 has the highest lift (most positively influenced by the treatment)
9. Framework for Causal and Association Analysis
[2x2 diagram; vertical axis: From Association to Causality; horizontal axis: Granularity (Population / Sub-population vs. Personalized)]
• Association, population level: Reporting / Summary Statistics
• Causality, population level: Causal Inference (Lift Analysis, Average Treatment Effect)
• Association, personalized: Response Modeling / Propensity Modeling
• Causality, personalized: Uplift Modeling (Heterogeneous Treatment Effect, Effect Modification)
10. The Uplift Model Objective
Maximize the treatment responders while minimizing the control “responders”
[Two decile charts of hypothetical data, each showing Treatment “Responders” vs. Control “Responders” by decile: a standard response model vs. an uplift response model (ideal); the gap between treatment and control is the true lift]
11. Uplift Approaches
[Diagram comparing the Traditional Approach and Uplift Modeling: in each, previous campaign data are split into Control and Treatment groups and into training and holdout data sets, from which the model is built. Source: Lo (2002)]
12. Uplift model solutions
0. Baseline results: Standard response model –
treatment-only (as a benchmark)
1. Two Model Approach: Take difference of two
models, Treatment Minus Control
2. Treatment Dummy Approach: Single combined
model using treatment interactions
3. Four Quadrant Method
13. Method 1: Two Model Approach: Treatment - Control
Model 1 predicts P(R | Treatment); model sample = Treatment group
Model 2 predicts P(R | no Treatment); model sample = Control group
Final prediction of lift = Treatment Response Score – Control Response Score
Pros: simple concept, familiar execution (x2)
Cons: indirectly models uplift; the difference may be only noise; 2x the work; scales may not be comparable; 2x the error; variable reduction done on indirect dependent variables
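To make Method 1 concrete, here is a minimal sketch in Python with scikit-learn. The synthetic data, feature roles, and coefficients are my own assumptions for illustration, not the talk's data or implementation:

```python
# Method 1 (Two Model Approach) sketch on synthetic data -- illustrative
# assumptions only, not the talk's actual data or implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 3))          # customer features
T = rng.integers(0, 2, size=n)       # 1 = treated, 0 = control (randomized)
# Simulated response: feature 1 drives natural response, feature 0 drives uplift
logit = -2.0 + X[:, 1] + 0.8 * X[:, 0] * T
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Model 1 predicts P(R | Treatment): fit on the treatment group only
m_t = LogisticRegression(max_iter=1000).fit(X[T == 1], Y[T == 1])
# Model 2 predicts P(R | no Treatment): fit on the control group only
m_c = LogisticRegression(max_iter=1000).fit(X[T == 0], Y[T == 0])

# Final prediction of lift = treatment response score - control response score
uplift = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]
```

The cons listed above show up directly in the code: each model is fit on its own sample, so the two scores need not be on comparable scales, and the uplift is only formed by subtraction at scoring time.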
14. Method 2: Treatment Dummy Approach, Lo (2002)
1. Estimate both E(Yi | Xi; treatment) and E(Yi | Xi; control) in a single model, using a dummy Ti to differentiate between treatment and control.
Linear logistic regression:
  Pi = E(Yi | Xi, Ti) = exp(α + β'Xi + γTi + δ'Xi·Ti) / [1 + exp(α + β'Xi + γTi + δ'Xi·Ti)]
2. Predict the lift value (treatment minus control) for each individual:
  Lift_i = Pi|treatment − Pi|control
         = exp(α + γ + β'Xi + δ'Xi) / [1 + exp(α + γ + β'Xi + δ'Xi)] − exp(α + β'Xi) / [1 + exp(α + β'Xi)]
Pros: simple concept, tests for presence of interaction effects
Cons: multicollinearity issues
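A sketch of the treatment dummy approach under the same kind of synthetic setup (again, the data and names are assumptions; note that scikit-learn's default L2 penalty makes this a regularized fit rather than the plain maximum-likelihood logistic regression on the slide):

```python
# Method 2 (Treatment Dummy Approach, Lo 2002) sketch: one logistic
# regression on [X, T, X*T]. Synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(size=(n, 3))
T = rng.integers(0, 2, size=n)
logit = -2.0 + X[:, 1] + 0.8 * X[:, 0] * T
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

def design(X, T):
    """Columns: X, the treatment dummy T, and the X*T interactions."""
    return np.hstack([X, T[:, None], X * T[:, None]])

model = LogisticRegression(max_iter=1000).fit(design(X, T), Y)

# Score every individual twice: as if treated, and as if in control
p_treat = model.predict_proba(design(X, np.ones(n)))[:, 1]
p_ctrl = model.predict_proba(design(X, np.zeros(n)))[:, 1]
lift = p_treat - p_ctrl              # predicted lift per individual
```

Because both predictions come from one model, the scale-mismatch problem of the two-model approach goes away, at the price of the multicollinearity between X and the X·T interactions noted above.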
15. Method 3: Four Quadrant Method
Model predicts probability of being in one of four categories.
Dependent variable outcome (nominal) = TR, CR, TN, or CN:

                     Response
                     Yes   No
  Treatment   Yes    TR    TN
              No     CR    CN

Model population = Treatment & Control groups together.
Prediction of lift:
  Z(x) = (1/2) [ P(TR|x)/P(T) + P(CN|x)/P(C) − P(TN|x)/P(T) − P(CR|x)/P(C) ]
Lai (2006), generalized by Kane, Lo, Zheng (2014)
Pros: only one model required; more “success cases” to model after
Cons: not that intuitive…
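The four-quadrant score can likewise be sketched with a single multinomial classifier. The class encoding and synthetic data below are my assumptions; the Z(x) line follows the formula on the slide:

```python
# Method 3 (Four Quadrant Method) sketch: one multinomial model over the
# classes TR, TN, CR, CN. Synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000
X = rng.normal(size=(n, 3))
T = rng.integers(0, 2, size=n)
logit = -1.5 + X[:, 1] + 0.8 * X[:, 0] * T
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# Encode the four quadrants: 0 = TR, 1 = TN, 2 = CR, 3 = CN
label = np.where(T == 1, np.where(Y == 1, 0, 1), np.where(Y == 1, 2, 3))
clf = LogisticRegression(max_iter=1000).fit(X, label)
P = clf.predict_proba(X)             # columns follow clf.classes_ = [0, 1, 2, 3]
p_T = T.mean()                       # P(T): share of treated
p_C = 1.0 - p_T                      # P(C): share of control

# Z(x) = 1/2 [ P(TR|x)/P(T) + P(CN|x)/P(C) - P(TN|x)/P(T) - P(CR|x)/P(C) ]
Z = 0.5 * (P[:, 0] / p_T + P[:, 3] / p_C - P[:, 1] / p_T - P[:, 2] / p_C)
```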
16. Gini and Top 15% Gini in Holdout Sample
Source: Kane, Lo, and Zheng (2014)
17. Simulated Example: Charity Donation
80-20% split between treatment and control
Randomly split into training (300K) and holdout (200K)
Predictors available:
• Age of donor
• Frequency – # times a donation was made in the past
• Spent – average $ donation in the past
• Recency – year of the last donation
• Income
• Wealth
18. Holdout Sample Performance
Lift Chart on Simulated Data
Theoretical model: two logistic regressions for treatment and control
[Lift chart by score bucket (1-20) comparing Baseline, Two Model, Lo (2002), Four Quadrant (KLZ), and Random; lift values range from about -0.1 to 0.6]
19. Gains Chart on Simulated Data
[Gains chart (cumulative % of lift vs. % of population) comparing Baseline, Random, Two Model Approach, Treatment Dummy Approach, and Four Quadrant (KLZ)]

                                         Gini     Gini 15%   Gini repeatability (R^2)
Baseline                                 5.6420   0.5412     0.7311
Method 1: Two Model Approach             6.0384   0.7779     0.7830
Method 2: Lo (2002), Treatment Dummy     6.0353   0.7766     0.7836
Method 3: Four Quadrant Method (KLZ)     5.9063   0.7484     0.7884
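Charts like these reduce to a simple computation: rank holdout individuals by predicted uplift, then compare treatment and control response rates within each band. A minimal sketch on synthetic scores (all numbers below are assumptions):

```python
# Observed lift by predicted-uplift decile on a holdout sample -- the
# computation behind uplift lift/gains charts. Synthetic data, illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
score = rng.normal(size=n)           # predicted uplift score per individual
T = rng.integers(0, 2, size=n)       # randomized treatment flag
# True lift of 5 points for the half of the population with positive score
p = 0.05 + 0.05 * T * (score > 0)
Y = rng.binomial(1, p)

# Assign deciles by descending score (decile 1 = highest predicted uplift)
order = np.argsort(-score)
decile = np.empty(n, dtype=int)
decile[order] = np.arange(n) * 10 // n + 1

for d in range(1, 11):
    m = decile == d
    rate_t = Y[m & (T == 1)].mean()  # treatment response rate in the decile
    rate_c = Y[m & (T == 0)].mean()  # control response rate in the decile
    print(d, round(rate_t - rate_c, 4))
```

A good uplift model shows large observed lift in the top deciles and little or none in the bottom ones; the Gini-style metrics above summarize the same curve in a single number.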
20. Online Merchandise Data
From blog.minethatdata.com, with women’s merchandise online visit as response
50-50% split between treatment and control (43K in total)
Randomly split into training (70%) and holdout (30%)
Predictors available:
• Recency
• Dollar spent last year
• Merchandise purchased last year (men’s, women’s, both)
• Urban, suburban, or rural
• Channel – web, phone, or both for purchase last year
21. Holdout Sample Performance
Lift Chart on Email Online Merchandise Data
[Lift chart by score bucket (1-20) comparing Baseline, Lo (2002) treatment dummy, Two Model Approach, Four Quadrant (KLZ), and Random; lift values range from about -0.06 to 0.12]
22. Gains Chart on Email Online Merchandise Data
[Gains chart comparing Baseline, Random, Two Model Approach, Treatment Dummy Approach, and Four Quadrant (KLZ)]

                                         Gini     Gini 15%   Gini repeatability (R^2)
Baseline                                 1.8556   -0.0240    0.2071
Method 1: Two Model Approach             2.0074   0.0786     0.2941
Method 2: Lo (2002), Treatment Dummy     2.4392   0.0431     0.2945
Method 3: Four Quadrant Method (KLZ)     2.3703   0.2288     0.3290
23. Ideal Conditions for Uplift Modeling
A randomized control group is withheld!
Treatment does not cause all “responses,” i.e. control response rate > 0
Natural response is not highly correlated with lift
Lift signal-to-noise ratio (lift / control rate) is large enough
30. Case II: Optimization of Multiple Treatments
From Predictive Analytics to Prescriptive Analytics
31. From Random Selection to Optimization
[2x2 diagram: Improving Targeting (vertical) vs. Improving Treatment (horizontal)]
1) Random Targeting: no targeting, single treatment A (or B)
2) Target Selection: individual-level targeting, model-based, single treatment A (or B)
3) One Size Fits All: no targeting, single best treatment for all individuals (best of A and B)
4) Optimal Treatment for Each Individual: individual-level targeting with the best of A and B
32. Integer Program Formulation
Maximize   Σ_{i=1..n} Σ_{j=1..m} Δp_ij · x_ij
Subject to:
  Σ_{i=1..n} Σ_{j=1..m} c_ij · x_ij ≤ B   (budget constraint)
  Σ_{j=1..m} x_ij ≤ 1,  for i = 1, …, n
  x_ij = 0 or 1,  i = 1, …, n;  j = 1, …, m
where Δp_ij = estimated lift value for individual i and treatment j;
x_ij (decision variable) = 1 if treatment j is assigned to individual i and 0 otherwise;
and c_ij = cost of promoting treatment j to i.
E.g., if the size of the target population = 30 million and the # of treatment combinations = 10, then the # of decision variables = 300 million, and the total # of possible combinations without constraints = 2^300,000,000!
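At toy scale, the formulation can be handed directly to an off-the-shelf solver, which makes the notation concrete. This sketch uses scipy.optimize.milp (SciPy 1.9+) with made-up lift values and unit costs:

```python
# Tiny instance of the integer program above: 4 individuals, 2 treatments,
# unit costs, budget of 2 contacts. All numbers are illustrative assumptions.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

dp = np.array([[0.05, 0.01],     # dp[i, j]: estimated lift of treatment j on i
               [0.02, 0.08],
               [0.03, 0.03],
               [0.01, 0.00]])
n, m = dp.shape
budget = 2.0
cost = np.ones(n * m)            # c_ij: unit cost per contact

obj = -dp.ravel()                # milp minimizes, so negate the lift
budget_con = LinearConstraint(cost[None, :], -np.inf, budget)
# At most one treatment per individual: sum_j x_ij <= 1
one_per_person = LinearConstraint(np.kron(np.eye(n), np.ones(m)), -np.inf, 1.0)

res = milp(obj, constraints=[budget_con, one_per_person],
           integrality=np.ones(n * m), bounds=Bounds(0, 1))
x = res.x.reshape(n, m)          # x_ij = 1 if treatment j goes to individual i
print(x, -res.fun)               # optimal assignment and total expected lift
```

The slide's point still stands: this is trivial for four individuals, but with 300 million binary variables the exact integer program is hopeless, which is what motivates the heuristic on the next slide.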
33. A Heuristic Algorithm
1. Perform cluster analysis of the m model-based lift scores in the holdout sample
2. Compute cluster-level lift scores for each treatment, using sample mean differences
3. Apply the cluster solution to new data (for a future marketing program)
4. Solve a linear programming model to optimize treatment assignment at the cluster level
Source: Lo and Pachamanova (2015)
34. Becomes a Much Simpler Optimization Problem
Maximize   Σ_{c=1..C} Σ_{j=1..m} Δp_cj · x_cj
Subject to:
  Σ_{c=1..C} Σ_{j=1..m} cost_j · x_cj ≤ Budget   (budget constraint)
  Σ_{j=1..m} x_cj ≤ N_c,  for c = 1, …, C   (cluster size constraint)
  x_cj ≥ 0,  c = 1, …, C;  j = 1, …, m
where x_cj = # of individuals in cluster c to receive treatment j,
and cost_j = cost of treatment j for each individual.
Can be solved by Excel Solver
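The same cluster-level LP can be solved outside Excel Solver as well; here is a sketch with scipy.optimize.linprog, where the cluster sizes, lift values, and costs are made-up assumptions:

```python
# Cluster-level LP from the slide, solved with SciPy instead of Excel
# Solver. Cluster sizes, lift values, and costs are made-up assumptions.
import numpy as np
from scipy.optimize import linprog

lift = np.array([[0.16, 0.02],       # Δp_cj: rows = clusters, cols = treatments
                 [0.07, -0.01],
                 [0.07, 0.08]])
size = np.array([4000, 6000, 9000])  # N_c: individuals per cluster
cost = np.array([1.0, 1.0])          # cost of each treatment per individual
budget = 8000.0

C, m = lift.shape
obj = -lift.ravel()                  # linprog minimizes, so negate the lift

# Budget row: sum_cj cost_j * x_cj <= budget
A_budget = np.tile(cost, C)[None, :]
# Cluster-size rows: sum_j x_cj <= N_c for each cluster c
A_size = np.kron(np.eye(C), np.ones(m))
res = linprog(obj,
              A_ub=np.vstack([A_budget, A_size]),
              b_ub=np.concatenate([[budget], size]),
              bounds=(0, None), method="highs")
x = res.x.reshape(C, m)              # x_cj: # in cluster c given treatment j
```

With C clusters and m treatments there are only C·m continuous variables, which is why the problem shrinks from hundreds of millions of binary decisions to something a spreadsheet solver can handle.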
37. Linear Programming Solution from Excel Solver
(Overall observed lift: men’s 0.07408, women’s 0.04386)

Cluster  Size in    Obs. lift:  Obs. lift:  Cost per       Decision var:  Decision var:  Total treated
         new data   men’s       women’s     treatment ($)  # men’s        # women’s      by cluster
1        4,180      0.1587      0.0224      1              4,180          -              4,180
2        5,650      0.0652      -0.0055     1              -              -              -
4        60,220     0.0658      0.0628      1              2,340          -              2,340
5        12,370     0.1290      0.0618      1              12,370         -              12,370
6        8,940      0.0672      0.0760      1              -              8,940          8,940
7        29,240     0.0519      0.0213      1              -              -              -
8        28,070     0.0868      0.0254      1              28,070         -              28,070
9        4,100      0.2249      0.0239      1              4,100          -              4,100
10       37,080     0.0572      0.0426      1              -              -              -
Total    189,850               obj value:   5,773 (men’s) + 680 (women’s) = 6,453
Cost                                        $51,060        $8,940         $60,000
Budget: $60,000
38. Stochastic Optimization
Lift estimates can have a high degree of uncertainty; stochastic optimization solutions take that uncertainty into account:
• Stochastic Programming
• Robust Optimization
• Mean-Variance Optimization
40. Conclusion
• Uplift is a very impactful emerging subfield
• Deserves more R&D
• Extensions are plenty (Lo (2008)):
  • Multiple treatments
  • Optimization
  • Non-randomized experiments
  • Direct tracking
• Applications in other fields
  • E.g. Potter (2013), Yong (2015)
41. References
Cai, T., Tian, L., Wong, P., and Wei, L.J. (2011), “Analysis of Randomized Comparative Clinical Trial Data for Personalized Treatment,” Biostatistics, 12:2, p.270-282.
Collins, F.S. (2010), The Language of Life: DNA and the Revolution in Personalized Medicine, HarperCollins.
Conrady, S. and Jouffe, L. (2011), “Causal Inference and Direct Effects,” Bayesia and Conrady Applied Science, at http://www.conradyscience.com/index.php/causality
Freedman, D. (2010), Statistical Methods and Causal Inference, Cambridge.
Hamburg, M.A. and Collins, F.S. (2010), “The path to personalized medicine,” The New England Journal of Medicine, 363:4, p.301-304.
Haughton, D. and Haughton, J. (2011), Living Standards Analytics, Springer.
Holland, C. (2005), Breakthrough Business Results with MVT, Wiley.
Kane, K., Lo, V.S.Y., and Zheng, J. (2014), “Mining for the Truly Responsive Customers and Prospects Using True-Lift Modeling: Comparison of New and Existing Methods,” Journal of Marketing Analytics, v.2, Issue 4, p.218-238.
Lai, Lilly Y.-T. (2006), Influential Marketing: A New Direct Marketing Strategy Addressing the Existence of Voluntary Buyers, Master of Science thesis, Simon Fraser University School of Computing Science, Burnaby, BC, Canada.
Lo, V.S.Y. (2002), “The True Lift Model – A Novel Data Mining Approach to Response Modeling in Database Marketing,” SIGKDD Explorations, 4, Issue 2, p.78-86, at: http://www.acm.org/sigs/sigkdd/explorations/issues/4-2-2002-12/lo.pdf
Lo, V.S.Y. (2008), “New Opportunities in Marketing Data Mining,” in Encyclopedia of Data Warehousing and Mining, Wang (2008) ed., 2nd edition, Idea Group Publishing.
Lo, V.S.Y. and Pachamanova, D. (2015), “A Practical Approach to Treatment Optimization While Accounting for Estimation Risk,” Technical Report.
McKinney, R.E. et al. (1998), “A randomized study of combined zidovudine-lamivudine versus didanosine monotherapy in children with symptomatic therapy-naïve HIV-1 infection,” Journal of Pediatrics, 133, no.4, p.500-508.
Mehr, I.J. (2000), “Pharmacogenomics and Industry Change,” Applied Clinical Trials, 9, no.5, p.34,36.
Morgan, S.L. and Winship, C. (2007), Counterfactuals and Causal Inference, Cambridge University Press.
Pearl, J. (2000), Causality, Cambridge University Press.
Potter, Daniel (2013), Pinpointing the Persuadables: Convincing the Right Voters to Support Barack Obama, presented at Predictive Analytics World, Oct, Boston, MA; http://www.predictiveanalyticsworld.com/patimes/pinpointing-the-persuadables-convincing-the-right-voters-to-support-barack-obama/ (available with free subscription).
Radcliffe, N.J. and Surry, P. (1999), “Differential response analysis: modeling true response by isolating the effect of a single action,” Proceedings of Credit Scoring and Credit Control VI, Credit Research Centre, University of Edinburgh Management School.
Radcliffe, N.J. (2007), “Using Control Groups to Target on Predicted Lift,” DMA Analytics Annual Journal, Spring, p.14-21.
Robins, J.M. and Hernan, M.A. (2009), “Estimation of the Causal Effects of Time-Varying Exposures,” in Fitzmaurice, G., Davidian, M., Verbeke, G., and Molenberghs, G., eds., Longitudinal Data Analysis, Chapman & Hall/CRC, p.553-599.
Rosenbaum, P.R. (2002), Observational Studies, Springer.
Rosenbaum, P.R. (2010), Design of Observational Studies, Springer.
Rubin, D.B. (2006), Matched Sampling for Causal Effects, Cambridge University Press.
Rubin, D.B. (2008), “For Objective Causal Inference, Design Trumps Analysis,” The Annals of Applied Statistics, p.808-840.
Rubin, D.B. and Waterman, R.P. (2006), “Estimating the Causal Effects of Marketing Interventions Using Propensity Score Methodology,” Statistical Science, p.206-222.
Russek-Cohen, E. and Simon, R.M. (1997), “Evaluating treatments when a gender by treatment interaction may exist,” Statistics in Medicine, 16, Issue 4, p.455-464.
Signorovitch, J. (2007), “Estimation and Evaluation of Regression for Patient-Specific Efficacy,” Harvard School of Public Health working paper.
Spirtes, P., Glymour, C., and Scheines, R. (2000), Causation, Prediction, and Search, 2nd edition, MIT Press.
Wikipedia (2010), “Uplift Modeling,” at http://en.wikipedia.org/wiki/Uplift_modelling
Yong, Florence H. (2015), “Quantitative Methods for Stratified Medicine,” PhD Dissertation, Department of Biostatistics, Harvard T.H. Chan School of Public Health.