Multi-Objective Deep Reinforcement Learning

This document discusses multi-objective reinforcement learning and introduces Deep OLS Learning (DOL), which combines Optimistic Linear Support (OLS), a multi-objective planning method, with deep Q-networks. It presents Deep OLS Learning with Partial Reuse (DOL-PR) and Full Reuse (DOL-FR) for solving multi-objective Markov decision processes by finding a convex coverage set of policies for multiple conflicting objectives, such as maximizing server performance while minimizing power consumption. The approach is evaluated on multi-objective versions of the Mountain Car and Deep Sea Treasure problems.

Multi-Objective DRL
https://arxiv.org/pdf/1610.02707.pdf
Dealing with Multiple Criteria

Before we start
Single-objective Markov decision process (MDP)
Optimal action-value function (Q function)
Deep Q-Network (DQN)
Minimizing the loss
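For reference, the standard DQN learning target is y = r + γ max_a' Q(s', a'; θ⁻), computed with a frozen target network θ⁻, and the loss is the squared difference between y and the Q-value of the action actually taken. A minimal pure-Python sketch of both; the function names and the plain-list mini-batch are illustrative choices, not code from the paper.

```python
from typing import List, Sequence


def dqn_targets(rewards: Sequence[float],
                next_q_target: Sequence[Sequence[float]],
                dones: Sequence[bool],
                gamma: float = 0.99) -> List[float]:
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrapping from terminal states."""
    return [r + (0.0 if done else gamma * max(q_next))
            for r, q_next, done in zip(rewards, next_q_target, dones)]


def dqn_loss(q_pred: Sequence[Sequence[float]],
             actions: Sequence[int],
             targets: Sequence[float]) -> float:
    """Mean squared TD error, using only the Q-value of the action actually taken."""
    errors = [y - q[a] for q, a, y in zip(q_pred, actions, targets)]
    return sum(e * e for e in errors) / len(errors)


# Toy mini-batch of two transitions (numbers are arbitrary)
targets = dqn_targets([1.0, 0.0], [[0.5, 2.0], [1.0, 3.0]], [False, True], gamma=0.9)
loss = dqn_loss([[2.0, 0.0], [0.0, 1.0]], [0, 1], targets)
```

In a real DQN, `q_pred` comes from the online network and gradients of `loss` update θ; here everything is plain numbers to keep the arithmetic visible.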

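Before we start
Multi-Objective MDP (MOMDP)
RL: maximize the sum of a received scalar reward signal; in an MOMDP the reward is a vector, one component per objective
Linear case: scalarize with a weight vector w, V_w = w · V
http://roijers.info/motutorial.html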
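In the linear case, a user's preferences are a weight vector w and a policy's scalar value is the dot product of w with its value vector. The sketch below uses two hypothetical server-control policies with value vectors (performance, −power consumption) to show that which policy is best depends on w; the numbers and names are made up for illustration.

```python
from typing import Dict, Sequence


def scalarize(w: Sequence[float], v: Sequence[float]) -> float:
    """Linear scalarization: V_w = w . V (one weight per objective)."""
    return sum(wi * vi for wi, vi in zip(w, v))


def best_policy(w: Sequence[float], values: Dict[str, Sequence[float]]) -> str:
    """Name of the policy whose value vector maximizes w . V."""
    return max(values, key=lambda name: scalarize(w, values[name]))


# Hypothetical value vectors: (server performance, -power consumption)
values = {"A": (10.0, -8.0), "B": (6.0, -2.0)}
perf_first = best_policy((0.9, 0.1), values)   # A: 8.2 vs B: 5.2
power_first = best_policy((0.2, 0.8), values)  # A: -4.4 vs B: -0.4
```

Weighting performance heavily selects "A", weighting power heavily selects "B", so no single policy is optimal for every w.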

Before we start
Multi-Objective MDP
Multiple policies
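With several objectives, more than one policy can be worth keeping because none is best on every objective. A small sketch: filter a list of hypothetical policy value vectors down to those no other vector Pareto-dominates. In the linear setting the paper addresses, a smaller set suffices (the convex coverage set), which OLS computes; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u Pareto-dominates v: at least as good on every objective and strictly better on one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def undominated(vectors: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the value vectors that no other vector dominates (the Pareto front)."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]


front = undominated([(10.0, -8.0), (6.0, -2.0), (5.0, -3.0), (2.0, -1.0)])
```

Here (5, −3) is dropped because (6, −2) is better on both objectives; the other three remain as candidate policies.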

Before we start
Optimistic Linear Support (OLS)
Linear Support
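OLS builds the coverage set incrementally: it calls a single-objective solver at corner weights of the current partial solution, adds any value vector that improves on it, and stops when no corner weight yields an improvement. Below is a simplified two-objective sketch, assuming an exact solver and weights w = (w1, 1 − w1). It processes corner weights first-in first-out instead of by OLS's optimistic priority, which should affect efficiency but not the final set when the solver is exact. `ols_2d` and `make_toy_solver` are hypothetical names.

```python
from typing import Callable, List, Tuple

Vec = Tuple[float, float]


def value(w1: float, v: Vec) -> float:
    """Scalarized value for the weight vector w = (w1, 1 - w1)."""
    return w1 * v[0] + (1.0 - w1) * v[1]


def ols_2d(solve: Callable[[float], Vec], tol: float = 1e-9) -> List[Vec]:
    """Linear-support loop for two objectives with an exact single-objective solver.

    Corner weights are processed first-in first-out; OLS proper orders them by an
    optimistic estimate of the possible improvement.
    """
    S: List[Vec] = []                # partial convex coverage set (value vectors)
    queue: List[float] = [1.0, 0.0]  # start at the extrema of the weight simplex
    visited: List[float] = []
    while queue:
        w1 = queue.pop(0)
        visited.append(w1)
        v = solve(w1)                # in DOL this call would train a DQN for w
        best_so_far = max((value(w1, s) for s in S), default=float("-inf"))
        if value(w1, v) <= best_so_far + tol:
            continue                 # no improvement at this corner weight
        S.append(v)
        # New corner weights: where v's line meets another vector's line on the upper surface
        for s in S[:-1]:
            denom = (v[0] - v[1]) - (s[0] - s[1])
            if abs(denom) < tol:
                continue             # parallel lines never intersect
            c = (s[1] - v[1]) / denom
            on_surface = value(c, v) >= max(value(c, t) for t in S) - tol
            is_new = all(abs(c - x) > tol for x in visited + queue)
            if 0.0 < c < 1.0 and on_surface and is_new:
                queue.append(c)
    return S


def make_toy_solver(candidates: List[Vec]) -> Callable[[float], Vec]:
    """Hypothetical stand-in solver: best of a fixed list of policy value vectors."""
    return lambda w1: max(candidates, key=lambda v: value(w1, v))


ccs = ols_2d(make_toy_solver([(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.4, 0.4)]))
```

With these toy candidates, (0.4, 0.4) is never optimal for any weight, so it does not enter the returned set.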

Introduction
DQN, AlphaGo, robot control
=> Focus on a single objective
In the real world -> multiple conflicting objectives, e.g.
Maximize
server performance
Minimize
power consumption
=> No single optimal policy
=> Need a coverage set (multiple policies)

Method
Deep OLS Learning (DOL): OLS multi-objective learning + DQN
-> Deep OLS Learning with Partial Reuse (DOL-PR) and Full Reuse (DOL-FR)
> Deep OLS Learning
Run deep Q-learning at each OLS step (for the current corner weight)
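Because OLS needs a value vector for every policy it adds, the Q-learner run at each step has to estimate vector-valued returns. A minimal sketch of one way to form the learning target, assuming a network that outputs one Q-vector per action and picks greedy actions by w · Q; this is an illustration and may differ in detail from the paper's network and loss. The helper names are hypothetical.

```python
from typing import List, Sequence


def dot(w: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, v))


def mo_td_target(reward_vec: Sequence[float],
                 next_q_vecs: Sequence[Sequence[float]],
                 w: Sequence[float],
                 gamma: float,
                 done: bool) -> List[float]:
    """Vector TD target for the corner weight w currently being trained.

    next_q_vecs[a] is the vector Q-value of action a in s' (from a target network).
    The greedy action is chosen on the scalarized value w . Q(s', a).
    """
    if done:
        return list(reward_vec)
    a_star = max(range(len(next_q_vecs)), key=lambda a: dot(w, next_q_vecs[a]))
    return [r + gamma * q for r, q in zip(reward_vec, next_q_vecs[a_star])]


next_q = [(2.0, -3.0), (0.0, 0.0)]
y_balanced = mo_td_target((-1.0, 0.0), next_q, (0.5, 0.5), 0.9, False)  # picks action 1
y_first = mo_td_target((-1.0, 0.0), next_q, (0.9, 0.1), 0.9, False)     # picks action 0
```

Taking the dot product of this target with w gives w · r + γ w · Q(s', a*), which is the ordinary DQN target for the scalarized reward w · r.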

Method
Deep OLS Learning (DOL): OLS multi-objective learning + DQN
-> Deep OLS Learning with Partial Reuse (DOL-PR) and Full Reuse (DOL-FR)
> Deep OLS Learning
If max value
https://github.com/hossam-mossalam/multi-objective-deep-rl
DOL
DOL-FR
DOL-PR
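The reuse variants differ only in how the network for a new corner weight is initialised. A sketch under our reading of the paper: DOL-FR copies all parameters of the network trained for the closest previously processed weight, DOL-PR copies all but the last layer, and plain DOL starts from scratch. The list-of-matrices parameter representation and all names are hypothetical.

```python
import copy
from typing import Callable, Dict, List, Sequence, Tuple

Params = List[List[List[float]]]  # one weight matrix per layer (biases omitted)


def closest_weight(w: Sequence[float],
                   trained: Dict[Tuple[float, ...], Params]) -> Tuple[float, ...]:
    """Previously trained weight vector nearest to w (squared Euclidean distance)."""
    return min(trained, key=lambda u: sum((a - b) ** 2 for a, b in zip(u, w)))


def init_params(w: Sequence[float],
                trained: Dict[Tuple[float, ...], Params],
                fresh_init: Callable[[], Params],
                mode: str) -> Params:
    """Starting parameters for the network trained at the new corner weight w.

    mode: "none" (plain DOL), "full" (DOL-FR) or "partial" (DOL-PR).
    """
    if mode not in ("none", "full", "partial"):
        raise ValueError(f"unknown reuse mode: {mode}")
    fresh = fresh_init()
    if mode == "none" or not trained:
        return fresh
    params = copy.deepcopy(trained[closest_weight(w, trained)])
    if mode == "partial":
        params[-1] = fresh[-1]  # keep the hidden layers, re-initialise the output layer
    return params


def zeros() -> Params:
    return [[[0.0]], [[0.0]]]


trained = {(1.0, 0.0): [[[1.0]], [[2.0]]], (0.0, 1.0): [[[3.0]], [[4.0]]]}
full = init_params((0.8, 0.2), trained, zeros, "full")        # copies the (1, 0) network
partial = init_params((0.8, 0.2), trained, zeros, "partial")  # same, but fresh output layer
```

The deep copy keeps the stored networks unchanged, so later corner weights can still reuse them.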

Experiment
Multi-objective mountain car problem
Objective 1 (same as the single-objective variant): −1 for every time step, 0 when the goal is reached
Objective 2: fuel consumption at each time step, proportional to the force exerted by the car
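A sketch of the two-objective per-step reward described above. The slide gives no proportionality constant, so `FUEL_COEF` is a hypothetical value, and we assume fuel enters as a negative reward proportional to the magnitude of the applied force.

```python
from typing import Tuple

FUEL_COEF = 1.0  # hypothetical proportionality constant; the slide gives no value


def mo_mountain_car_reward(force: float, reached_goal: bool) -> Tuple[float, float]:
    """Two-objective reward for one time step of mountain car."""
    time_reward = 0.0 if reached_goal else -1.0  # objective 1: same as the single-objective task
    fuel_reward = -FUEL_COEF * abs(force)        # objective 2 (assumed): cost grows with |force|
    return (time_reward, fuel_reward)
```

Pushing in either direction costs the same fuel, while coasting costs none, so a faster policy generally uses more fuel.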

Experiment
Deep Sea Treasure
Objective 1: the treasure value received
Objective 2: a time penalty of −1 for each time step
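A sketch of the Deep Sea Treasure reward and the resulting vector return. The treasure value 5 is hypothetical, and we assume the −1 time penalty also applies on the step where the treasure is reached; the function names are illustrative.

```python
from typing import Iterable, Optional, Tuple


def dst_reward(treasure_value: Optional[float]) -> Tuple[float, float]:
    """(treasure, time) reward for one step; treasure_value is None if no treasure was reached."""
    treasure = treasure_value if treasure_value is not None else 0.0
    return (treasure, -1.0)  # time penalty of -1 on every step


def episode_return(rewards: Iterable[Tuple[float, float]]) -> Tuple[float, ...]:
    """Undiscounted vector return: component-wise sum of the per-step rewards."""
    return tuple(sum(component) for component in zip(*rewards))


# Hypothetical episode: two moves, then a treasure worth 5 ends the episode on the third step
ret = episode_return([dst_reward(None), dst_reward(None), dst_reward(5.0)])
```

Treasures that lie further away are typically worth more but cost more time steps, which is the trade-off the coverage set captures.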
