
Criteo AI Lab: from applied to fundamental AI

Jérémie Mary (Criteo) at the International Workshop Machine Learning and Artificial Intelligence

Published in: Technology

  1. Jérémie Mary, 17/09/18. From applied to fundamental research
  2. Copyright © 2018 Criteo. AI applied to Criteo Dynamic Retargeting since 2008, optimized on CTR + CR + order value. Personalized ads, optimized performance: Universal Match (one user profile across all devices); Product Recommendations (chooses the right products to display); Kinetic Design (chooses the right look and feel for the banners in real time); Predictive Bidding (chooses the right users / advertiser / publisher to display). eCPM = CPC × pCTR × pCR × pOV.
  3. Outline: 1. Fusion of modalities; 2. Auction theory meets Machine Learning; 3. Hot topics
  4. Fusion of heterogeneous data. Problem: how to build a predictor based on completely different kinds of data? E.g. you have pictures and texts and you want to predict the interest of the user for the item. Diagram: picture → your favorite neural network for pictures (ResNet?) → embedding → prediction 1; description text or tags → your favorite neural network for this (BiGRU with GA?) → embedding → prediction 2; then vote or average.
  5. Fusion of heterogeneous data. Problem: how to build a predictor based on completely different kinds of data? E.g. you have pictures and texts and you want to predict the interest of the user for the item. Diagram: picture → your favorite neural network for pictures (ResNet?) → embedding; question ("What is the color of the cat?") → your favorite neural network for this (BiGRU with GA?) → embedding; merge both embeddings → prediction. Is it actually good to build the embeddings independently?
  6. Idea: batch norm parameters. In a good network, the activations of the neurons should be similarly distributed across the data [1]. Batch normalization was introduced as a reparametrization trick to ensure faster convergence. [1] S. Ioffe and C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML, 2015.
  7. Few parameters but… While batch norm parameters usually account for only 0.2 to 5% of the parameters of the net, their impact on the output is huge [2]. [2] V. Dumoulin, J. Shlens, and M. Kudlur. A Learned Representation For Artistic Style. In Proc. of ICLR, 2017.
  8. An alternative way to fuse modalities (diagram: image, text)
  9. … and this works well on VQA. [13] Modulating early visual processing by language. H. De Vries, F. Strub, J. Mary, H. Larochelle, O. Pietquin, A. C. Courville. NIPS'17.
  10. And actually change the embedding construction from
  11. And actually change the embedding construction to
  12. Doing it using several states of the RNN
  13. ReferIt / Guesswhat oracle problem
  14. ReferIt / Guesswhat oracle. Visual Reasoning with a Multi-hop FiLM Generator. Florian Strub, Mathieu Seurin, Ethan Perez, Harm De Vries, Jeremie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin.
  15. Cherry picking
  16. Cherry picking: failures
  17. Outline: 1. Fusion of modalities; 2. Auction theory meets Machine Learning; 3. Hot topics
  18. We are a bidding company. More than 300 billion bids a day. Less than 10 ms to make a price. Setting: 1 seller with 1 item; n bidders, where bidder i has private valuation v_i ("valuation" = maximum willingness-to-pay; "private" = initially known only to bidder i). Second-price auction: collect bid b_i from each bidder i; winner = highest bidder; price = second-highest bid. Theorem: the second-price auction renders truthful bidding a dominant strategy. Problem: very often our price is way higher than the competition.
  19. Reserve prices (seller point of view). A reserve price will extract more $$$ at the cost of not selling some displays. How to choose it? Assumptions: • the bidder's valuation v is drawn from a distribution F (F known to the seller, v unknown); • the seller aims to maximize expected revenue (w.r.t. v ~ F). Solution: offer r* = argmax_{r ≥ 0} r · (1 − F(r)), where r is the revenue of a sale and 1 − F(r) is the probability of a sale.
  20. Reserve price with several bidders. Theorem [Myerson 81]: with n symmetric i.i.d. bidders, in a second-price auction with a reserve price, the revenue-maximizing reserve price is independent of the number of bidders. Theorem [Bulow-Klemperer 96]: for every n, expected revenue of reserve price 0 with (n+1) i.i.d. bidders ≥ expected revenue of the monopoly reserve with n i.i.d. bidders.
  21. Personalized reserves (1)… Theorem [Hartline/Roughgarden 09]: for any valuation distributions F1,...,Fn, expected revenue with monopoly reserves (ri = monopoly price for Fi) ≥ 50% of the expected revenue of Myerson's optimal auction for F1,...,Fn. (1) Yes, a bidder can lose the auction while having the highest bid.
  22. In real bidding, F is unknown and is estimated from the bids. This was done by [Ostrovsky/Schwarz 09] at Yahoo. The analysis leads to finite-time, ML-style bounds by [Morgenstern/Roughgarden 15, 16]: typically O(n log n) samples are required in the multiple-bidder setting to achieve expected revenue within ε of the best possible. This assumes that the bidders reveal their true values.
  23. One strategic bidder setting: a two-stage game. First day: the seller receives billions of bids from the bidders (we do not consider any approximation error). Second day: she sets, for each bidder, a reserve price equal to the exact monopoly price computed on the bids she received during the first stage. We denote by F1, ..., FN the distributions of the bidders. We assume bidder 1 is strategic and the others continue to bid truthfully. G is the distribution of the maximum value of the competitors of bidder 1. In all illustrations, the true distribution of values is U[0, 1].
  24. Myerson lemma: defining virtual values. Suppose bidder i has values Xi with distribution Fi and associated density fi, where fi is assumed to be positive on the support of Xi. For any incentive-compatible auction, when G represents the distribution of the bids faced by bidder i and r is the reserve price set by the seller, we have [equation], regardless of whether ψi is increasing.
  25. Visualization of Myerson's lemma
  26. β shading. The payoff of the strategic bidder using the strategy β (ψB denotes the virtual value associated with the new distribution of bids) is: [equation]. We can remark that it suffices to find a "good" ψB and then the corresponding β.
  27. Which is the nicest?
  28. Thresholded virtual value. Just solve [equation]. On the uniform example this is [equation] below 0.5, and the identity for values > 0.5.
  29. Comparison of revenue
     • The strategic bidder's payoff increases from 0.083 to 0.132 (a 59% increase!).
     • The payoff of the truthful bidder remains unchanged.
     • The payoff of the seller remains unchanged. In particular, the seller does not lose money.
     • Welfare increases from 0.583 to 0.632 (an 8% increase!).
  30. More on the topic
     • Does it cost something to the strategic bidder during the learning stage of the auctioneer? No! Since the strategy only changes bids below the reserve price, the strategic bidder pays nothing to try to convince the seller to decrease the reserve price.
     • Can we do better? Yes! We only presented the simplest way to improve a bidding strategy. There exist better strategies that lead to even higher payoffs.
     • In this setting, can we find a Nash equilibrium when all the bidders become strategic? Yes!
     • Are our proposed strategies stable against some approximation error of the seller? Yes!
     References: Thomas Nedelec, Marc Abeille, Clément Calauzènes, Noureddine El Karoui, Benjamin Heymann, Vianney Perchet. Thresholding the virtual value: a simple method to increase welfare and lower reserve prices in online auction systems. Marc Abeille, Clément Calauzènes, Noureddine El Karoui, Thomas Nedelec, Vianney Perchet. Explicit shading strategies for repeated truthful auctions. arXiv preprint arXiv:1805.00256, 2018.
  31. Outline: 1. Fusion of modalities; 2. Auction theory meets Machine Learning; 3. Hot topics
  32. Alternating Recommender Systems (example sequence: A B A B B B A A B A B)
     Recommender systems
     • Users can get bored seeing similar movies over and over.
     • Getting to know a new system can take time: curiosity increases at first and then decreases after a while.
     Task scheduling
     • It might take a while to master a new task, so performance increases as the task is repeated.
     • Always repeating the same task can reduce productivity because of weariness.
     Resource balancing
     • Always exploiting the same area can diminish returns if the population cannot grow again.
  33. On Criteo's A/B tests (NIPS'18). Romain Warlop, Alessandro Lazaric, Jeremie Mary.
     • We use a real-world A/B testing dataset where our model assumptions are no longer satisfied. Users have been exposed to both A and B. We investigate how a long-term policy alternating A and B on the basis of past choices can outperform each solution individually.
     • Simulator: measures the click probability on a version based on the last w = 10 pulled versions. The click for a given (version, state) is drawn as Bernoulli(p).
     State | click probability on A
     [A,A,B,B,A,A,A,B,B,A] | 8.53%
     [A,B,B,A,B,B,A,B,A,B] | 9.12%
     [B,B,B,B,A,A,A,B,B,A] | 8.91%
     Compared algorithms
     • Oracle optimal: optimal policy given the true parameters
     • Oracle greedy: greedy policy given the true parameters
     • UCRL (Auer, Jaksch, and Ortner 2009): considers each action and state independently
     • linUCRL: our algorithm
     • Only B: always play B (click rate of state [B, …, B])
     • Only A: always play A (click rate of state [A, …, A])
     Figures: average reward over the T steps; average reward after T = 1600.
  34. More
     • DPPs for basket completion (look at the work of Mike Gartrell)
     • Exploration / exploitation under Brownian evolution of the world
     • GANs
     • RNNs (and approximations) for session modeling
     • Causality, incrementality and offline A/B tests
  35. Thank you! j.mary@criteo.com https://aiaheadofusbycriteoailab.splashthat.com/
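
The reserve price rule from the "Reserve prices" slide, r* = argmax over r ≥ 0 of r · (1 − F(r)), can be checked numerically. The sketch below is only an illustration: the function names, the grid search, and the exponential example are assumptions made here and do not come from the talk. For U[0, 1], the distribution used in the talk's illustrations, the search should return a reserve of 0.5.

```python
import math


def monopoly_reserve(cdf, upper, steps=100_000):
    """Grid search for r* = argmax_{r >= 0} r * (1 - F(r)) on [0, upper].

    `cdf` is the valuation distribution F assumed known to the seller.
    Returns the best reserve on the grid and the expected revenue it yields.
    """
    best_r, best_revenue = 0.0, 0.0
    for k in range(steps + 1):
        r = upper * k / steps
        # revenue of a sale (r) times probability of a sale (1 - F(r))
        revenue = r * (1.0 - cdf(r))
        if revenue > best_revenue:
            best_r, best_revenue = r, revenue
    return best_r, best_revenue


def uniform_cdf(v):
    """CDF of U[0, 1]."""
    return min(max(v, 0.0), 1.0)


def exponential_cdf(v):
    """CDF of an exponential distribution with rate 1 (illustrative only)."""
    return 1.0 - math.exp(-v) if v > 0 else 0.0


if __name__ == "__main__":
    print("uniform:", monopoly_reserve(uniform_cdf, upper=1.0))
    print("exponential:", monopoly_reserve(exponential_cdf, upper=10.0))
```

For the uniform case the objective is r(1 − r), maximized at r = 0.5 with expected revenue 0.25. For the rate-1 exponential it is r·exp(−r), maximized at r = 1.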

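
The figures on the "Comparison of revenue" slide can be approximately reproduced under a set of assumptions that are not stated in this transcript and are made here only for illustration: two bidders with values drawn from U[0, 1]; a second-price auction with personalized reserves in which the highest bidder wins only if it clears its own reserve and then pays the maximum of the other bid and that reserve; a truthful competitor with reserve 0.5; and a shading function `shade` equal to 0.25 / (1 − x) below 0.5 and the identity above. That shading formula is a reconstruction that makes the virtual value of the bid distribution zero below 0.5. It is not copied from the slides, whose equations were not captured. The strategic bidder's new reserve is assumed to be 0.25, the lowest shaded bid.

```python
def shade(x, r=0.5):
    """Hypothetical thresholded shading: identity above r, r(1-r)/(1-x) below."""
    return x if x >= r else r * (1.0 - r) / (1.0 - x)


def truthful(x):
    return x


def outcome(bid_fn, reserve1, reserve2=0.5, n=400):
    """Midpoint-rule integration over (x, v2) in [0, 1]^2.

    Bidder 1 has value x and bids bid_fn(x); bidder 2 bids its value v2.
    Returns (payoff of bidder 1, payoff of bidder 2, seller revenue, welfare).
    """
    payoff1 = payoff2 = revenue = welfare = 0.0
    weight = 1.0 / (n * n)
    for i in range(n):
        x = (i + 0.5) / n
        b1 = bid_fn(x)
        for j in range(n):
            v2 = (j + 0.5) / n
            if b1 > v2:
                # Bidder 1 is the highest bidder; it must clear its own reserve.
                if b1 >= reserve1:
                    price = max(v2, reserve1)
                    payoff1 += (x - price) * weight
                    revenue += price * weight
                    welfare += x * weight
            elif v2 >= reserve2:
                # Bidder 2 is the highest bidder and clears its reserve.
                price = max(b1, reserve2)
                payoff2 += (v2 - price) * weight
                revenue += price * weight
                welfare += v2 * weight
    return payoff1, payoff2, revenue, welfare


if __name__ == "__main__":
    print("truthful: ", outcome(truthful, reserve1=0.5))
    print("strategic:", outcome(shade, reserve1=0.25))
```

Under these assumptions the strategic bidder's payoff should move from about 0.083 to about 0.132 and welfare from about 0.583 to about 0.632, while the seller's revenue and the truthful bidder's payoff stay the same, which matches the numbers reported on the slide.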