김창연, 고형권, 김동희, 김준호, 송헌, 이민경, 이재윤
DEAR: Deep Reinforcement Learning
for Online Advertising Impression in
Recommender Systems
An Online Ad Recommendation System Using Reinforcement Learning
Introduction
Online Advertising
Problem Statement
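Online ad recommendation
The goal is to maximize revenue by showing ads to the consumers most likely to buy.

Target metrics
• CTR (Click-Through Rate)

- ratio of clicks to impressions for the ad

• ROI (Return on Investment)

- net profit relative to ad spend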
Online Advertising
Existing approaches
• Guaranteed Delivery (GD)
• Ads are shown a fixed number of times, as agreed in a contract with the advertiser.

• Real-Time Bidding (RTB)
• Ad slots are sold in real time and the winning ad is served (AdExchange).
Online Advertising
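Limitations of existing approaches
• Problems common to GD and RTB

1. offline/static optimization algorithms that treat each impression independently

2. maximize the immediate revenue for each impression

• As a result, various attempts to apply reinforcement learning to ad recommendation have emerged.
Problems with existing RL models
• Most studies focus only on maximizing revenue.

• However, as the number of ads grows, user experience is negatively affected.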
Online Advertising
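Considerations when recommending ads
Considerations when modeling the recommendation
A. Whether or not to insert an ad into the current recommendation list

B. If an ad is inserted, which ad to insert

C. At which position to insert the ad

→ These three decisions are closely interrelated.

Goals of the recommendation model
D. Maximize ad revenue

E. Minimize the negative impact on the user experience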
RL in Brief
MDP
Q-Learning
DQN
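RL in Brief
MDP (Markov Decision Process)
RL in Brief
Q-learning
• Model-free: starts with no knowledge of the environment and learns by trial and error.

• Off-policy: current policy ≠ target policy

• Of the value functions mentioned earlier, it seeks the policy that maximizes Q (the action-value function).

• Exploitation & Exploration
• Typically ε-greedy (explore with a fixed probability, otherwise exploit) -> gathers information while still heading toward the optimal path.

• Storing Q for each (s, a) pair lets the update be applied recursively.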
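The Q-learning bullets above can be sketched as a small tabular example. This is a minimal illustration, not code from the deck or the paper: the four-state chain environment, the function names, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all assumptions chosen for the demo.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]  # hypothetical toy action set

def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One Q-learning step: the target uses the greedy action in s_next (off-policy)."""
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])

def step(state, action):
    """Toy chain with states 0..3; reaching state 3 ends the episode with reward 1."""
    nxt = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 3 else 0.0), nxt == 3

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # the Q table, keyed by (state, action)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(q, s, epsilon=0.2, rng=rng)
            s_next, r, done = step(s, a)
            q_update(q, s, a, r, s_next)
            s = s_next
    return q

q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(3)]
print(greedy_policy)
```

The agent is never told how the chain works (model-free), it acts ε-greedily while the update bootstraps from the greedy action (off-policy), and the table stores one Q-value per (s, a) pair.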
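RL in Brief
DQN
• As the state and action spaces grow, storing Q in a table becomes impractical.

• A deep neural network is used as the function approximator instead.

• Features specific to DQN

• Replay buffer

• The agent stores each observation in the replay buffer as it is obtained.

• Mini-batches are sampled from the stored buffer (history) and used for the Q-learning update.
• Fixed target network Q'

• Prevents the training failure caused by the target and the trained network changing together.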
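The two DQN ingredients above can be sketched as follows. This is an illustrative sketch only: the class and function names are hypothetical, and the network weights are represented as a plain dict rather than a real deep-learning model.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next) transitions; the oldest are dropped first."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self._buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Uniformly sample a minibatch without replacement."""
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)

def sync_target(online_params, target_params):
    """Copy the online weights into the target network, which otherwise stays fixed."""
    target_params.clear()
    target_params.update(online_params)

buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.store(t, "a", float(t), t + 1)  # only the 3 newest transitions survive
batch = buffer.sample(2)

online, target = {"w": 0.5}, {"w": 0.0}
sync_target(online, target)  # done only occasionally, not on every update
```

Sampling from the buffer breaks the correlation between consecutive transitions, and syncing the target weights only occasionally keeps the regression target from moving on every gradient step.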
Introduction
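Back to the problem addressed in the paper
• The problem the paper sets out to solve is formulated as an MDP.

• State $s_t$: the user's browsing history + current rec-list, contextual info at time t

• Action $a_t$: the action the AA (advertising agent) chooses with respect to the three decisions

• Reward $r(s_t, a_t)$: the user's feedback on action $a_t$ (ad income + user experience)

• Transition probability $p(s_{t+1} \mid s_t, a_t)$

• Discount factor $\gamma$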
Introduction
How the paper solves the problem
• Ultimate goal

• Given the historical MDP $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$,

find a policy $\pi$ that maximizes the cumulative reward from users.
Q&A
Proposed
Algorithm
State & action feature
DQN architecture
optimization
Proposed Architecture
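DQN models used previously
• Model (a): can determine the ad's location

• Model (b): can select a specific ad

To choose a specific ad together with its location

→ the two models are mutually exclusive

• DEAR

1. applies a single model that covers both.

2. Time complexity: $O(|A| \cdot L)$ => $O(|A|)$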
DEAR
Processing of State and Action Features
• State feature
• $s_t$: user's rec/ads browsing history, contextual information and rec-list of current request

• rec/ads browsing history: sequential preferences $p_t^{rec}$ and $p_t^{ad}$ extracted with GRUs

• contextual information $c_t$: OS, app version, feed type, etc., encoded as a vector

• rec-list of current request: concatenation of the L items -> fed into an FFN:
$rec_t = \tanh(W_{rec}\,\mathrm{concat}(rec_1, rec_2, \ldots, rec_L) + b_{rec})$

• $s_t = \mathrm{concat}(p_t^{rec}, p_t^{ad}, c_t, rec_t)$
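The state construction above can be sketched in plain Python. This is a rough illustration under assumptions: the GRU outputs and the context vector are replaced by random placeholder vectors, the weights are random rather than trained, and the sizes (64, 64, 13, 360, with L = 6 items of 60 features) are taken from the implementation-detail slide later in the deck.

```python
import math
import random

def dense_tanh(x, weights, bias):
    """Fully connected layer with tanh activation: tanh(W x + b)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def build_state(p_rec, p_ad, c, rec_items, w_rec, b_rec):
    """s_t = concat(p_rec, p_ad, c, rec_t), rec_t = tanh(W_rec concat(rec_1..rec_L) + b_rec)."""
    flat_items = [v for item in rec_items for v in item]
    rec = dense_tanh(flat_items, w_rec, b_rec)
    return p_rec + p_ad + c + rec

rng = random.Random(0)

def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

L, ITEM_DIM, REC_DIM = 6, 60, 360
p_rec, p_ad = rand_vec(64), rand_vec(64)   # placeholders for the two GRU outputs
c = rand_vec(13)                           # placeholder contextual vector
rec_items = [rand_vec(ITEM_DIM) for _ in range(L)]
w_rec = [rand_vec(L * ITEM_DIM) for _ in range(REC_DIM)]  # untrained weights
b_rec = rand_vec(REC_DIM)

state = build_state(p_rec, p_ad, c, rec_items, w_rec, b_rec)
print(len(state))
```

With these sizes the state vector has 64 + 64 + 13 + 360 entries.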
DEAR
Processing of State and Action Features
• Action Feature
• Action $a_t = (a_t^{ad}, a_t^{loc}) \in \mathcal{A}$

• $a_t^{ad}$: item feature of a candidate ad

• $a_t^{loc} \in \mathbb{R}^{L+1}$: one-hot location vector indicating where to insert the selected ad
DQN Architecture
• Input: state and candidate ad pair $(s_t, a_t^{ad})$

• Output: action-values (Q-values) corresponding to the L+2 locations

1. Where optimal + which optimal
• For a given pair $(s_t, a_t^{ad})$, get the Q-values for all possible locations $a_t^{loc}$
DEAR
DQN Architecture
• Input: state and candidate ad pair $(s_t, a_t^{ad})$

• Output: action-values (Q-values) corresponding to the L+2 locations

2. Which ad + where ad location + Whether to insert the ad

• $Q(s_t, a_t^{ad})^l$ for integers $0 \le l \le L + 1$

• Why L+2?

• L+1 locations + 1 (the case where no ad is inserted)

• Time complexity: $O(|A| \cdot L)$ => $O(|A|)$
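The selection step implied by this architecture can be sketched as a single loop over candidate ads. This is a minimal illustration: `q_network` is a hypothetical stand-in for the trained DQN, and treating output index 0 as "no ad" is an assumed convention for the demo.

```python
def select_action(state, candidate_ads, q_network):
    """Return (ad, slot, q) with the highest Q-value; ad and slot are None for 'no ad'."""
    best_ad, best_index, best_q = None, 0, float("-inf")
    for ad in candidate_ads:
        q_values = q_network(state, ad)  # one forward pass per ad, L + 2 outputs
        for index, q in enumerate(q_values):
            if q > best_q:
                best_ad, best_index, best_q = ad, index, q
    if best_index == 0:  # index 0 is assumed to mean "do not insert an ad"
        return None, None, best_q
    return best_ad, best_index - 1, best_q  # slots 0..L in the rec-list

# Hypothetical Q-values for L = 2, so each ad yields L + 2 = 4 outputs.
TOY_Q = {
    "ad_a": [0.1, 0.5, 0.2, 0.3],
    "ad_b": [0.4, 0.1, 0.9, 0.0],
}
calls = []

def toy_q_network(state, ad):
    calls.append(ad)  # count forward passes
    return TOY_Q[ad]

choice = select_action("s_t", list(TOY_Q), toy_q_network)
print(choice, len(calls))
```

Each candidate ad needs one forward pass that scores every location plus the no-ad option, which is where the reduction from $O(|A| \cdot L)$ to $O(|A|)$ evaluations comes from.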
DEAR
DQN Architecture
• Divide Q function

• Value function $V(s_t)$

• Advantage function $A(s_t, a_t^{ad})$

• Why?

• Whether to insert an ad is determined mainly by the state $s_t$

• The ad's location is determined by all the features
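The slides do not say how the value and advantage streams are recombined. A common choice in dueling DQN architectures is to subtract the mean advantage, and the sketch below assumes that aggregation. It is not confirmed as the rule used in DEAR, and the function name is hypothetical.

```python
def dueling_q(value, advantages):
    """Combine V(s) with per-output advantages: Q_l = V + A_l - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

# One state value and three per-location advantages (toy numbers).
q_values = dueling_q(1.0, [0.0, 1.0, 2.0])
print(q_values)
```

Subtracting the mean keeps V and A identifiable: shifting every advantage by a constant leaves the resulting Q-values unchanged.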
DEAR
Reward function
• $r_t(s_t, a_t) = r_t^{ad} + \alpha \cdot r_t^{ex}$

• The AA must maximize the income of ads while minimizing the negative influence of ads on user experience.

• $r_t^{ex} = 1$ if the user continues to browse the next list, else $r_t^{ex} = 0$

• $r_t^{ad}$: the revenue of the ad; measured only when an ad is inserted, otherwise 0
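The reward definition on this slide translates directly into a small function. The function and parameter names are hypothetical, and the numbers are toy values.

```python
def reward(ad_inserted, ad_revenue, user_continues, alpha):
    """r_t = r_ad + alpha * r_ex, following the definitions on the slide."""
    r_ad = ad_revenue if ad_inserted else 0.0  # revenue counts only when an ad is shown
    r_ex = 1.0 if user_continues else 0.0      # 1 if the user keeps browsing, else 0
    return r_ad + alpha * r_ex

with_ad = reward(ad_inserted=True, ad_revenue=2.0, user_continues=True, alpha=0.5)
user_left = reward(ad_inserted=True, ad_revenue=2.0, user_continues=False, alpha=0.5)
no_ad = reward(ad_inserted=False, ad_revenue=2.0, user_continues=True, alpha=0.5)
print(with_ad, user_left, no_ad)
```

A larger `alpha` makes keeping the user browsing more valuable relative to the revenue of any single ad.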
DEAR
Optimal Action-value function
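• Using all the conditions above, the action-value function $Q^*(s_t, a_t)$ under the optimal policy can be obtained:

$Q^*(s_t, a_t) = \mathbb{E}_{s_{t+1}}\left[ r_t + \gamma \max_{a_{t+1}} Q^*(s_{t+1}, a_{t+1}) \mid s_t, a_t \right]$

The max must be evaluated over all ads and locations at t+1.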
DEAR
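Model training procedure
• Trained off-policy from historical data.

• Storing transitions stage: store the transitions $(s_t, a_t, r_t, s_{t+1})$ generated by the existing ad-exposure strategy $b(s_t)$ to build the replay buffer

• Training stage: sample a minibatch $(s, a, r, s')$ from the replay buffer and train the AA's parameters.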
DEAR
Optimization
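• $L(\theta) = \mathbb{E}_{s_t, a_t, r_t, s_{t+1}} \left( y_t - Q(s_t, a_t; \theta) \right)^2$

• $y_t = \mathbb{E}_{s_{t+1}} \left[ r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta^T) \mid s_t, a_t \right]$

• $y_t$: target for the current iteration (the historical data are known in advance, so the target is computed from them)

• While $L(\theta)$ is being optimized, the target network parameters $\theta^T$ are held fixed. (Mnih et al. 2013)

• $\nabla_\theta L(\theta) = \mathbb{E}_{s_t, a_t, r_t, s_{t+1}} \left( y_t - Q(s_t, a_t; \theta) \right) \nabla_\theta Q(s_t, a_t; \theta)$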
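The target and loss above can be sketched for one minibatch. This is illustrative only: the two Q-functions are hypothetical lookup tables standing in for the online network (θ) and the fixed target network (θ^T), and the expectation is approximated by the minibatch mean.

```python
def td_targets(batch, target_q, actions, gamma=0.95):
    """y = r + gamma * max_a' Q(s', a'; theta_T), using the fixed target network."""
    return [r + gamma * max(target_q(s_next, a2) for a2 in actions)
            for (_s, _a, r, s_next) in batch]

def mse_loss(batch, targets, online_q):
    """Mean of (y - Q(s, a; theta))^2 over the minibatch."""
    errors = [(y - online_q(s, a)) ** 2
              for (s, a, _r, _s2), y in zip(batch, targets)]
    return sum(errors) / len(errors)

# Hypothetical Q-values; a real implementation would query two neural networks.
TARGET_TABLE = {(1, "x"): 2.0, (1, "y"): 4.0}
ONLINE_TABLE = {(0, "x"): 4.0}

batch = [(0, "x", 1.0, 1)]  # one logged transition (s, a, r, s_next)
targets = td_targets(batch, lambda s, a: TARGET_TABLE[(s, a)], ["x", "y"])
loss = mse_loss(batch, targets, lambda s, a: ONLINE_TABLE[(s, a)])
print(targets, loss)
```

Because the targets come from the target network, they are treated as constants when differentiating the loss with respect to θ.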
DEAR
Off-policy Training of DEAR Framework
DEAR
Online Test of DEAR Framework
Q&A
Experiment
Dataset
• A new dataset, not previously available, was built and used.

• Data from Douyin (the Chinese version of TikTok), March 2019

• Chronological train (70%) / test (30%) split

• normal videos (recommended video) + ad videos

• Normal video feature: id, like score, comment score, follow score, group score

• Ad video feature: id, image size, bid-price, hidden-cost, predicted-ctr, predicted-recall
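A chronological 70/30 split like the one described above could look like the sketch below. The record layout, the `timestamp` field, and the function name are assumptions made for illustration. The slides do not describe the actual data schema.

```python
def chronological_split(records, train_percent=70, key=lambda r: r["timestamp"]):
    """Sort by time and put the earliest train_percent% of records in the training set."""
    ordered = sorted(records, key=key)
    cut = len(ordered) * train_percent // 100  # integer arithmetic avoids float rounding
    return ordered[:cut], ordered[cut:]

# Ten toy sessions in arbitrary order.
records = [{"timestamp": t, "session": f"s{t}"} for t in (5, 1, 9, 3, 7, 0, 8, 2, 6, 4)]
train, test = chronological_split(records)
print(len(train), len(test))
```

Splitting by time rather than at random keeps every test interaction later than every training interaction, so the model is never evaluated on data that precedes what it was trained on.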
Experiment
Implementation Detail
• Length of recommendation list $L$ = 6

• Dimensions

• Ad, video features: 60

• $p_t^{rec}, p_t^{ad}, c_t, rec_t, a_t^{ad}$: 64, 64, 13, 360, 60

• Discount factor $\gamma$: 0.95

• Size of replay buffer: 10000
Experiment
Metrics
A. Accumulated rewards $R = \sum_{t=1}^{T} r_t$, where $r_t = r_t^{ad} + \alpha \cdot r_t^{ex}$

B. Improvement of DEAR

C. Corresponding p-value
Experiment
Baselines
A. Wide & Deep

Jointly trains an FFN with embeddings and a linear model with feature transformations

B. DeepFM

An upgrade of A: feature embeddings remove the need for manual feature engineering

C. GRU4Rec

A recommender system built on an RNN with GRUs

D. HDQN

Hierarchical DQN framework (high-level -> location, low-level -> specific ad)
Experiment
Result
Component Study
DEAR-1: supervised learning

DEAR-2: GRU -> FCNs

DEAR-3: DQN Fig2-(b)

DEAR-4: One Q function

DEAR-5: randomly selected ad

DEAR-6: random slot
Experiment
Parameter Sensitivity Analysis
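• As the influence of $R^{ex}$ grows

• fewer ads are inserted, which reduces the negative impact on users, but

• ad revenue naturally declines.

• Vice versa

• Each online platform should choose the value of $\alpha$ carefully according to its own business characteristics.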
Experiment
Q&A
Conclusion
Contribution of DEAR
1. Decides three internally related actions at the same time.

2. Simultaneously maximizes ad revenue and minimizes the negative influence of ads

3. Significantly improves online advertising performance in recommender systems.
Conclusion
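Open questions about the DEAR model
1. The detailed architecture of the model has not been disclosed.

2. Doubts about the design of the experimental dataset (is the model simply optimized for the existing data?)

3. Could the performance gain come from the features rather than the model?

1. Most of the features are values predicted in-house.

2. Doubts about reproducibility

4. Would a different RL approach give better results? (Policy-based)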
THANK YOU

More Related Content

What's hot

Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationReview : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Dongmin Choi
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCN
Dat Nguyen
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
Allen Wu
 
Shai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble trackingShai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble tracking
wolf
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
Rishabh Indoria
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Dalei Li
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
Usman Qayyum
 
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
Dongmin Choi
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Dongmin Choi
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
Jinwon Lee
 
#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation
Matthew Opala
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
Learning visual representation without human label
Learning visual representation without human labelLearning visual representation without human label
Learning visual representation without human label
Kai-Wen Zhao
 
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From ScratchPR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
Sunghoon Joo
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization
太一郎 遠藤
 
Image Object Detection Pipeline
Image Object Detection PipelineImage Object Detection Pipeline
Image Object Detection Pipeline
Abhinav Dadhich
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
AllenWu
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
NamHyuk Ahn
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...
Dongmin Choi
 

What's hot (20)

Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationReview : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCN
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
Shai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble trackingShai Avidan's Support vector tracking and ensemble tracking
Shai Avidan's Support vector tracking and ensemble tracking
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...Two strategies for large-scale multi-label classification on the YouTube-8M d...
Two strategies for large-scale multi-label classification on the YouTube-8M d...
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation
 
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
Object Segmentation (D2L7 Insight@DCU Machine Learning Workshop 2017)
 
Learning visual representation without human label
Learning visual representation without human labelLearning visual representation without human label
Learning visual representation without human label
 
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From ScratchPR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization
 
Image Object Detection Pipeline
Image Object Detection PipelineImage Object Detection Pipeline
Image Object Detection Pipeline
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...
 
crfasrnn_presentation
crfasrnn_presentationcrfasrnn_presentation
crfasrnn_presentation
 

Similar to Dear - 딥러닝 논문읽기 모임 김창연님

Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fitting
Wush Wu
 
Price movement prediction in Hong Kong equity market
Price movement prediction in Hong Kong equity marketPrice movement prediction in Hong Kong equity market
Price movement prediction in Hong Kong equity market
Tc. Ying
 
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
Craig Chao
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and Tricks
Ben Ball
 
R user group meeting 25th jan 2017
R user group meeting 25th jan 2017R user group meeting 25th jan 2017
R user group meeting 25th jan 2017
Garrett Teoh Hor Keong
 
K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션
DACON AI 데이콘
 
Deep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataDeep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudData
WeCloudData
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
NAVER Engineering
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial Usecases
Arvind Rapaka
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
DongHyun Kwak
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
DataRobot
 
Understanding GBM and XGBoost in Scikit-Learn
Understanding GBM and XGBoost in Scikit-LearnUnderstanding GBM and XGBoost in Scikit-Learn
Understanding GBM and XGBoost in Scikit-Learn
철민 권
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
Seiya Tokui
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
DongHyun Kwak
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in R
Babu Priyavrat
 
Applying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPKApplying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPK
Jeremy Chen
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
Databricks
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
PyData
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
Junho Cho
 
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Databricks
 

Similar to Dear - 딥러닝 논문읽기 모임 김창연님 (20)

Online advertising and large scale model fitting
Online advertising and large scale model fittingOnline advertising and large scale model fitting
Online advertising and large scale model fitting
 
Price movement prediction in Hong Kong equity market
Price movement prediction in Hong Kong equity marketPrice movement prediction in Hong Kong equity market
Price movement prediction in Hong Kong equity market
 
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
Leveraging R in Big Data of Mobile Ads (R在行動廣告大數據的應用)
 
TensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and TricksTensorFlow and Deep Learning Tips and Tricks
TensorFlow and Deep Learning Tips and Tricks
 
R user group meeting 25th jan 2017
R user group meeting 25th jan 2017R user group meeting 25th jan 2017
R user group meeting 25th jan 2017
 
K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션
 
Deep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataDeep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudData
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial Usecases
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Understanding GBM and XGBoost in Scikit-Learn
Understanding GBM and XGBoost in Scikit-LearnUnderstanding GBM and XGBoost in Scikit-Learn
Understanding GBM and XGBoost in Scikit-Learn
 
Introduction to Chainer
Introduction to ChainerIntroduction to Chainer
Introduction to Chainer
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Supervised Machine Learning in R
Supervised  Machine Learning  in RSupervised  Machine Learning  in R
Supervised Machine Learning in R
 
Applying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPKApplying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPK
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
 
150807 Fast R-CNN
150807 Fast R-CNN150807 Fast R-CNN
150807 Fast R-CNN
 
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
 

More from taeseon ryu

VoxelNet
VoxelNetVoxelNet
VoxelNet
taeseon ryu
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
taeseon ryu
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
taeseon ryu
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
taeseon ryu
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
taeseon ryu
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
taeseon ryu
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
taeseon ryu
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
taeseon ryu
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
taeseon ryu
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
taeseon ryu
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
taeseon ryu
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
taeseon ryu
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
taeseon ryu
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
taeseon ryu
 
mPLUG
mPLUGmPLUG
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
taeseon ryu
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
taeseon ryu
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
taeseon ryu
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
taeseon ryu
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
taeseon ryu
 

More from taeseon ryu (20)

VoxelNet
VoxelNetVoxelNet
VoxelNet
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
mPLUG
mPLUGmPLUG
mPLUG
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
 



DEAR - Deep Learning Paper Reading Group (딥러닝 논문읽기 모임), presented by 김창연

  • 1. 김창연, 고형권, 김동희, 김준호, 송헌, 이민경, 이재윤. DEAR: Deep Reinforcement Learning for Online Advertising Impression in Recommender Systems (an online ad recommendation system using reinforcement learning)
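  • 3. Online Advertising: online ad recommendation. The goal is to maximize revenue by showing ads to consumers who are likely to buy.
      Target metrics:
      • CTR (Click-Through Rate): clicks relative to impressions of the ad
      • ROI (Return on Investment): net profit relative to ad spend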
  • 4. Online Advertising: existing approaches
      • Guaranteed Delivery (GD): ads are shown a fixed number of times, as agreed with the advertiser.
      • Real-Time Bidding (RTB): ad slots are sold in real time and the winning ad is shown (AdExchange).
  • 5. Online Advertising: limitations of the existing approaches
      • Problems shared by GD and RTB:
          1. offline/static optimization algorithms that treat each impression independently
          2. maximize the immediate revenue for each impression
      • As a result, various attempts to apply reinforcement learning to ad recommendation have appeared.
  • 6. Online Advertising: problems with existing RL models
      • Most studies focused only on maximizing revenue.
      • However, showing more ads has a negative effect on user experience.
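  • 7. Online Advertising: what to consider when recommending ads
      Considerations when modeling the recommendation:
      A. Should an ad be inserted into the current recommendation list at all?
      B. If so, which ad should be inserted?
      C. At which position should that ad be inserted?
      → These three decisions are closely related.
      Objectives of the recommendation model:
      D. Maximize ad revenue
      E. Minimize the negative effect on user experience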
  • 8.
  • 10. RL in brief: MDP (Markov Decision Process)
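  • 11. RL in brief: Q-learning
      • Model-free: starts with no information about the environment and learns by trial and error.
      • Off-policy: current policy ≠ target policy
      • Among the value functions mentioned earlier, it looks for the policy that maximizes Q (the action-value function).
      • Exploitation & exploration: typically ε-greedy (explore with some fixed probability, otherwise exploit), which gathers information while still heading toward the optimal path.
      • If Q is stored for each (s, a), the update can be applied recursively.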
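The Q-learning points on this slide can be sketched as a small tabular example. This is a minimal illustration, not code from the talk or the paper: the state and action names, the learning rate `lr`, and the helper names are all assumptions made for the example.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_learning_update(q, s, a, r, s_next, actions, lr=0.1, gamma=0.95):
    """One off-policy update: the target uses the greedy action in s_next,
    regardless of which action the behaviour policy will actually take."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    q[(s, a)] += lr * (td_target - q[(s, a)])
    return q[(s, a)]


# Hypothetical two-action problem; the Q-table defaults every entry to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
rng = random.Random(0)

new_value = q_learning_update(q, "s0", "right", r=1.0, s_next="s1", actions=actions)
greedy_action = epsilon_greedy(q, "s0", actions, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the agent always exploits, so after the single update above it picks the action whose stored Q-value was just raised.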
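  • 12. RL in brief: DQN
      • As the state and action spaces grow, storing Q in a table becomes impractical.
      • A deep neural network is used as the function approximator instead.
      • Features specific to DQN:
          • Replay buffer
              • Each time the agent obtains an observation, it stores it in the replay buffer.
              • A batch is sampled from the stored buffer (history) and used for the Q-learning update.
          • Fixed target network Q'
              • Prevents training from failing because the target and the trained network change together.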
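A replay buffer of the kind described on this slide can be sketched with a bounded deque. The class name, the `capacity` and `seed` parameters, and the toy transitions below are assumptions for illustration only.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next) transitions with uniform sampling."""

    def __init__(self, capacity, seed=0):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(f"s{t}", "a", float(t), f"s{t + 1}")
batch = buf.sample(2)
```

Only the three most recent transitions survive here, because the capacity is smaller than the number of pushes.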
  • 13. Introduction: back to the problem the paper addresses
      • The paper formulates its problem as an MDP.
      • State $s_t$: the user's browsing history + the current rec-list and contextual information at time $t$
      • Action $a_t$: the action the advertising agent (AA) chooses with respect to the three decisions
      • Reward $r(s_t, a_t)$: the user's feedback to the action (ad income + user experience)
      • Transition probability $p(s_{t+1} \mid s_t, a_t)$
      • Discount factor $\gamma$
  • 14. Introduction: how the paper frames the solution
      • Ultimate goal: given the historical MDP $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma)$, find the policy $\pi$ that maximizes the cumulative reward from users.
  • 15. Q&A
  • 16. Proposed Algorithm: state & action features, DQN architecture, optimization
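  • 17. Proposed Architecture: previously used DQN models
      • Model (a): can determine the ad location.
      • Model (b): can select a specific ad.
      • Choosing both a specific ad and its location → the two models are mutually exclusive.
      • DEAR:
          1. Applies a model that covers both.
          2. Reduces time complexity from $O(|A| \cdot L)$ to $O(|A|)$.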
  • 18. DEAR: Processing of State and Action Features
      • State feature $s_t$: the user's rec/ads browsing history, contextual information, and the rec-list of the current request
          • rec/ads browsing history: a GRU extracts the sequential preference ($p_t^{rec}$, $p_t^{ad}$)
          • contextual information $c_t$: OS, app version, feed type, etc., turned into a vector
          • rec-list of the current request: the concatenation of the $L$ items is fed into an FFN:
            $rec_t = \tanh(W_{rec}\,\mathrm{concat}(rec_1, rec_2, \ldots, rec_L) + b_{rec})$
      • $s_t = \mathrm{concat}(p_t^{rec}, p_t^{ad}, c_t, rec_t)$
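The state construction on this slide can be sketched in plain Python. This is only an illustration under assumptions: the GRU outputs `p_rec` and `p_ad` are taken as given vectors, the dimensions are tiny toy values rather than the ones used in the experiments, and the function names are made up for the example.

```python
import math


def rec_list_feature(items, weights, bias):
    """rec_t = tanh(W_rec * concat(rec_1, ..., rec_L) + b_rec)."""
    x = [v for item in items for v in item]  # concatenate the L item vectors
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def state_feature(p_rec, p_ad, context, rec):
    """s_t = concat(p_rec, p_ad, c_t, rec_t)."""
    return list(p_rec) + list(p_ad) + list(context) + list(rec)


# Toy sizes: L = 2 items with 2 features each, so the concatenation has 4 entries.
items = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.5, 0.0, 0.0, 0.5], [0.0, 0.0, 0.0, 0.0]]  # 2 x 4 matrix
bias = [0.0, 0.0]

rec_t = rec_list_feature(items, weights, bias)
s_t = state_feature([0.1, 0.2], [0.3], [1.0, 0.0], rec_t)
```

The first output row sums the first and last inputs with weight 0.5 each, giving tanh(1.0); the second row is all zeros, giving tanh(0.0) = 0.0.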
  • 19. DEAR: Processing of State and Action Features
      • Action feature
          • Action $a_t = (a_t^{ad}, a_t^{loc}) \in \mathcal{A}$
          • $a_t^{ad}$: item feature of a candidate ad
          • $a_t^{loc} \in \mathbb{R}^{L+1}$: one-hot location vector indicating where to insert the selected ad
  • 20. DEAR: DQN Architecture
      • Input: a state and candidate-ad pair $(s_t, a_t^{ad})$
      • Output: the action-values (Q-values) corresponding to the $L+2$ locations
      1. Optimal location + optimal ad
          • For a given pair $(s_t, a_t^{ad})$, obtain the Q-values for all possible locations $a_t^{loc}$.
  • 21. DEAR: DQN Architecture
      • Input: a state and candidate-ad pair $(s_t, a_t^{ad})$
      • Output: the action-values (Q-values) corresponding to the $L+2$ locations
      2. Which ad + where to place it + whether to insert an ad at all
          • $Q(s_t, a_t^{ad})^{l}$ for integers $0 \le l \le L+1$
          • Why $L+2$? The $L+1$ insertion positions plus 1 for the case where no ad is inserted.
          • Time complexity: $O(|A| \cdot L) \Rightarrow O(|A|)$
  • 22. DEAR: DQN Architecture
      • The Q function is split into two parts:
          • Value function $V(s_t)$
          • Advantage function $A(s_t, a_t^{ad})$
      • Why?
          • Whether to insert an ad is determined mainly by $s_t$.
          • Where to place the ad is determined by all of the features.
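The output layout and the value/advantage split described on the DQN Architecture slides can be sketched as follows. Several details are assumptions made for illustration and are not stated on the slides: the two parts are combined by simple addition, index 0 of the $L+2$ outputs stands for "insert no ad", and the Q-values are hard-coded rather than produced by a network.

```python
def combine_q_values(state_value, advantages):
    """Q(s, a_ad)^l = V(s) + A(s, a_ad)^l for each of the L + 2 outputs (assumed sum)."""
    return [state_value + adv for adv in advantages]


def best_action(q_per_ad):
    """Pick the (ad, location) pair with the highest Q-value.

    q_per_ad maps each candidate ad to its list of L + 2 Q-values, so one
    forward pass per ad is enough: O(|A|) passes instead of O(|A| * L).
    Index 0 is assumed to mean "do not insert an ad".
    """
    ad, loc, _ = max(
        ((ad, loc, q) for ad, qs in q_per_ad.items() for loc, q in enumerate(qs)),
        key=lambda triple: triple[2],
    )
    return (None, None) if loc == 0 else (ad, loc)


L = 6  # recommendation-list length, so each ad has L + 2 = 8 outputs
q_per_ad = {
    "ad_1": combine_q_values(0.5, [0.0, 0.1, -0.2, 0.3, 0.0, 0.0, 0.0, 0.05]),
    "ad_2": combine_q_values(0.5, [0.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]),
}
choice = best_action(q_per_ad)
```

A single argmax over all ads and all output slots answers the three questions (whether, which, where) at once.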
  • 23. DEAR: Reward function
      • The AA must maximize the income from ads and minimize the negative influence of ads on user experience.
      • $r_t(s_t, a_t) = r_t^{ad} + \alpha \cdot r_t^{ex}$
      • $r_t^{ex} = 1$ if the user continues to browse the next list, else $r_t^{ex} = 0$
      • $r_t^{ad}$: the revenue of the ad, measured only when the ad is inserted and 0 otherwise
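The reward on this slide can be written as a small function. The function and parameter names are assumptions for illustration, and the value used for the "user leaves" case simply follows what the slide text shows.

```python
def reward(ad_revenue, user_continues, alpha, ad_inserted=True):
    """r_t = r_ad + alpha * r_ex, following the definitions on the slide."""
    r_ad = ad_revenue if ad_inserted else 0.0  # revenue counts only if an ad was shown
    r_ex = 1.0 if user_continues else 0.0      # user-experience term
    return r_ad + alpha * r_ex


r_stay = reward(ad_revenue=2.0, user_continues=True, alpha=0.5)
r_leave = reward(ad_revenue=2.0, user_continues=False, alpha=0.5)
r_no_ad = reward(ad_revenue=2.0, user_continues=True, alpha=0.5, ad_inserted=False)
```

A larger `alpha` makes keeping the user browsing worth more relative to the revenue of a single ad.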
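  • 24. DEAR: Optimal action-value function
      • With all of the conditions above, the action-value function $Q^*(s_t, a_t)$ under the optimal policy can be obtained:
        $Q^*(s_t, a_t) = \mathbb{E}_{s_{t+1}}\left[ r_t + \gamma \max_{a_{t+1}} Q^*(s_{t+1}, a_{t+1}) \mid s_t, a_t \right]$
      • The max requires looking up every ad and every location at $t+1$.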
  • 25. DEAR: Model training procedure
      • Trained off-policy from past data.
      • Storing transitions stage: transitions $(s_t, a_t, r_t, s_{t+1})$ generated by the existing ad impression strategy $b(s_t)$ are stored to build the replay buffer.
      • Training stage: a minibatch $(s, a, r, s')$ is drawn from the replay buffer to train the AA's parameters.
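  • 26. DEAR: Optimization
      • $L(\theta) = \mathbb{E}_{s_t, a_t, r_t, s_{t+1}} \left( y_t - Q(s_t, a_t; \theta) \right)^2$
      • $y_t = \mathbb{E}_{s_{t+1}}\left[ r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta^T) \mid s_t, a_t \right]$
      • $y_t$ is the target for the current iteration (the past data is known in advance, so the target is computed from it).
      • While optimizing $L(\theta)$, the target network parameters $\theta^T$ are held fixed (Mnih et al. 2013).
      • $\nabla_\theta L(\theta) = \mathbb{E}_{s_t, a_t, r_t, s_{t+1}} \left( y_t - Q(s_t, a_t; \theta) \right) \nabla_\theta Q(s_t, a_t; \theta)$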
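The target and loss on this slide can be sketched numerically for one minibatch. Everything here is a toy stand-in: `q_online` and `q_target` are hypothetical callables playing the role of the network with parameters θ and the frozen target network with parameters θ^T, and the expectation is replaced by a minibatch average.

```python
def td_target(r, next_q_values, gamma=0.95):
    """y_t = r_t + gamma * max over next actions of Q(s_{t+1}, a; theta_T)."""
    return r + gamma * max(next_q_values)


def minibatch_loss(batch, q_online, q_target, gamma=0.95):
    """Mean squared TD error over a minibatch of (s, a, r, s_next) transitions.

    q_online(s, a) returns the current estimate Q(s, a; theta).
    q_target(s_next) returns the frozen network's Q-values for every next action.
    """
    errors = [
        td_target(r, q_target(s_next), gamma) - q_online(s, a)
        for s, a, r, s_next in batch
    ]
    return sum(e * e for e in errors) / len(errors)


# Toy stand-ins for the two networks.
q_online = lambda s, a: 1.0
q_target = lambda s_next: [0.0, 2.0]

batch = [("s0", "a0", 1.0, "s1")]
y = td_target(1.0, q_target("s1"))
loss = minibatch_loss(batch, q_online, q_target)
```

Because `q_target` is a separate callable, updating `q_online` between minibatches leaves the targets unchanged until the target parameters are refreshed.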
  • 28. DEAR Online Test of DEAR Framework
  • 29. Q&A
  • 31. Experiment: Dataset
      • A new dataset was built for the experiments.
      • It uses data from Douyin (the Chinese version of TikTok) from March 2019.
      • Split chronologically into train (70%) / test (30%).
      • Normal videos (recommended videos) + ad videos
      • Normal video features: id, like score, comment score, follow score, group score
      • Ad video features: id, image size, bid-price, hidden-cost, predicted-ctr, predicted-recall
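The chronological 70/30 split mentioned on this slide can be sketched as below. The record layout and the `timestamp` field name are hypothetical, since the slide does not describe how the sessions are stored.

```python
def chronological_split(records, train_percent=70, key=lambda r: r["timestamp"]):
    """Sort by time, then cut so the earliest train_percent% becomes the training set."""
    ordered = sorted(records, key=key)
    cut = len(ordered) * train_percent // 100  # integer arithmetic avoids float rounding
    return ordered[:cut], ordered[cut:]


# Ten hypothetical sessions, deliberately out of order.
records = [{"timestamp": t, "session": f"u{t}"} for t in [5, 3, 9, 1, 7, 2, 8, 4, 10, 6]]
train, test = chronological_split(records)
```

Splitting by time rather than at random keeps every test session later than every training session, which avoids training on the future.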
  • 33. Experiment: Implementation Detail
      • Length of recommendation list: $L = 6$
      • Dimensions
          • Ad, video features: 60
          • $p_t^{rec}, p_t^{ad}, c_t, rec_t, a_t^{ad}$: 64, 64, 13, 360, 60
      • Discount factor $\gamma$: 0.95
      • Size of replay buffer: 10000
  • 34. Experiment: Metrics
      A. Accumulated rewards $R = \sum_{t=1}^{T} r_t$, with $r_t = r_t^{ad} + \alpha \cdot r_t^{ex}$
      B. Improvement of DEAR
      C. Corresponding p-value
  • 35. Experiment: Baselines
      A. Wide & Deep: jointly trains an FFN with embeddings and a linear model with feature transformations
      B. DeepFM: an upgrade of A; feature embeddings remove the need for feature engineering
      C. GRU4Rec: an RNN (GRU) based recommender system
      D. HDQN: hierarchical DQN framework (high-level -> location, low-level -> specific ad)
  • 37. Experiment: Component Study
      • DEAR-1: supervised learning
      • DEAR-2: GRU -> FCNs
      • DEAR-3: DQN Fig2-(b)
      • DEAR-4: one Q function
      • DEAR-5: randomly selected ad
      • DEAR-6: random slot
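  • 38. Experiment: Parameter Sensitivity Analysis
      • As the influence of $R^{ex}$ grows:
          • fewer ads are inserted, so the negative effect on users shrinks,
          • but ad revenue naturally falls as well.
      • Vice versa.
      • Each online platform should choose the value of $\alpha$ carefully according to its own business characteristics.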
  • 39. Q&A
  • 41. Conclusion: Contribution of DEAR
      1. Handles three internally related actions at the same time.
      2. Simultaneously maximizes ad revenue and minimizes the negative influence of ads.
      3. Significantly improves online advertising performance in recommender systems.
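  • 42. Conclusion: open questions about the DEAR model
      1. The detailed structure of the model has not been released.
      2. The design of the experimental dataset is questionable (is the model merely optimized for the existing data?).
      3. Could the performance gain come from the features rather than the model?
          1. Most of the features are values the authors predicted themselves.
          2. Reproducibility is in doubt.
      4. Would other RL approaches (policy-based) give better results?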