SlideShare a Scribd company logo
1 of 54
Download to read offline
Policy	Gradients	
	
Pieter	Abbeel	–	OpenAI	/	UC	Berkeley	/	Gradescope	
	
Slides	authored	with	John	Schulman	and	Xi	(Peter)	Chen
8:30:	Breakfast	/	Check-in	
9-10:	Core	Lecture	1	Intro	to	MDPs	and	Exact	SoluQon	Methods	(Pieter	Abbeel)	
10-10:10	Brief	Sponsor	Intros	
10:10-11:10	Core	Lecture	2	Sample-based	ApproximaQons	and	FiTed	Learning	(Rocky	Duan)	
11:10-11:30:	Coffee	Break	
11:30-12:30	Core	Lecture	3	DQN	+	variants	(Vlad	Mnih)	
12:30-1:30	Lunch	[catered]	
1:30-2:15	Core	Lecture	4a	Policy	Gradients	and	Actor	Cri<c	(Pieter	Abbeel)	
2:15-3:00	Core	Lecture	4b	Pong	from	Pixels	(Andrej	Karpathy)	
3:00-3:30	Core	Lecture	5	Natural	Policy	Gradients,	TRPO,	and	PPO	(John	Schulman)	
3:30-3:50	Coffee	Break	
3:50-4:30	Core	Lecture	6	Nuts	and	Bolts	of	Deep	RL	ExperimentaQon		(John	Schulman)	
4:30-7	Labs	1-2-3			
7-8	FronQers	Lecture	I:	Recent	Advances,	FronQers	and	Future	of	Deep	RL	(Vlad	Mnih)	
	
Schedule	--	Saturday
Reinforcement	Learning	
[Figure	source:	SuTon	&	Barto,	1998]	
ut
Policy	OpQmizaQon	in	the	RL	Landscape
Policy	OpQmizaQon	
⇡✓(u|s)
ut
[Figure	source:	SuTon	&	Barto,	1998]
Policy	OpQmizaQon	
n  Consider	control	policy	parameterized	
by	parameter	vector	
	
n  StochasQc	policy	class	(smooths	out	
the	problem):	
																				:	probability	of	acQon	u	in	state	s		
✓
max
✓
E[
HX
t=0
R(st)|⇡✓]
⇡✓(u|s)
⇡✓(u|s)
ut
[Figure	source:	SuTon	&	Barto,	1998]
n  Oien						can	be	simpler	than	Q	or	V	
n  E.g.,	roboQc	grasp	
n  V:	doesn’t	prescribe	acQons	
n  Would	need	dynamics	model	(+	compute	1	Bellman	back-up)	
n  Q:	need	to	be	able	to	efficiently	solve	
n  Challenge	for	conQnuous	/	high-dimensional	acQon	spaces*	
Why	Policy	OpQmizaQon	
⇡
*some	recent	work	(parQally)	addressing	this:		
	NAF:	Gu,	Lillicrap,	Sutskever,	Levine	ICML	2016	
	Input	Convex	NNs:	Amos,	Xu,	Kolter	arXiv	2016	
	Deep	Energy	Q:	Haarnoja,	Tang,	Abbeel,	Levine,	ICML	2017		
arg max
u
Q✓(s, u)
n  Conceptually:	
Policy	OpQmizaQon					Dynamic	Programming	
n  Empirically:	
Optimize what you care
about
Indirect, exploit the problem
structure, self-consistency
More compatible with rich
architectures (including
recurrence)
More versatile
More compatible with
auxiliary objectives
More compatible with
exploration and off-policy
learning
More sample-efficient when
they work
Likelihood	RaQo	Policy	Gradient
Likelihood	RaQo	Policy	Gradient	
[Aleksandrov,	Sysoyev,	&	Shemeneva,	1968]	
[Rubinstein,	1969]	
[Glynn,	1986]	
[Reinforce,	Williams	1992]	
[GPOMDP,	Baxter	&	BartleT,	2001]
Likelihood	RaQo	Policy	Gradient	
[Aleksandrov,	Sysoyev,	&	Shemeneva,	1968]	
[Rubinstein,	1969]	
[Glynn,	1986]	
[Reinforce,	Williams	1992]	
[GPOMDP,	Baxter	&	BartleT,	2001]
Likelihood	RaQo	Policy	Gradient	
[Aleksandrov,	Sysoyev,	&	Shemeneva,	1968]	
[Rubinstein,	1969]	
[Glynn,	1986]	
[Reinforce,	Williams	1992]	
[GPOMDP,	Baxter	&	BartleT,	2001]
Likelihood	RaQo	Policy	Gradient	
[Aleksandrov,	Sysoyev,	&	Shemeneva,	1968]	
[Rubinstein,	1969]	
[Glynn,	1986]	
[Reinforce,	Williams	1992]	
[GPOMDP,	Baxter	&	BartleT,	2001]
Likelihood	RaQo	Policy	Gradient	
[Aleksandrov,	Sysoyev,	&	Shemeneva,	1968]	
[Rubinstein,	1969]	
[Glynn,	1986]	
[Reinforce,	Williams	1992]	
[GPOMDP,	Baxter	&	BartleT,	2001]
Likelihood	RaQo	Policy	Gradient	
[Aleksandrov,	Sysoyev,	&	Shemeneva,	1968]	
[Rubinstein,	1969]	
[Glynn,	1986]	
[Reinforce,	Williams	1992]	
[GPOMDP,	Baxter	&	BartleT,	2001]
n  Valid	even	when		
n  R	is	disconQnuous	and/or	unknown	
n  Sample	space	(of	paths)	is	a	discrete	set		
Likelihood	RaQo	Gradient:	Validity	
rU(✓) ⇡ ˆg =
1
m
mX
i=1
r✓ log P(⌧(i)
; ✓)R(⌧(i)
)
n  Gradient	tries	to:	
n  Increase	probability	of	paths	with	
posiQve	R	
n  Decrease	probability	of	paths	with	
negaQve	R	
Likelihood	RaQo	Gradient:	IntuiQon	
rU(✓) ⇡ ˆg =
1
m
mX
i=1
r✓ log P(⌧(i)
; ✓)R(⌧(i)
)
!	Likelihood	raQo	changes	probabiliQes	of	experienced	paths,	
does	not	try	to	change	the	paths	(<->	Path	DerivaQve)
Let’s	Decompose	Path	into	States	and	AcQons
Let’s	Decompose	Path	into	States	and	AcQons
Let’s	Decompose	Path	into	States	and	AcQons
Let’s	Decompose	Path	into	States	and	AcQons
Likelihood	RaQo	Gradient	EsQmate
DerivaQon	from	Importance	Sampling	
U(✓) = E⌧⇠✓old

P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓U(✓) = E⌧⇠✓old

r✓P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓ U(✓)|✓=✓old
= E⌧⇠✓old

r✓ P(⌧|✓)|✓old
P(⌧|✓old)
R(⌧)
= E⌧⇠✓old
⇥
r✓ log P(⌧|✓)|✓old
R(⌧)
⇤
Note:	Suggests	we	can	also	look	at	more	than	just	gradient!	[Tang&Abbeel,	NIPS	2011]
DerivaQon	from	Importance	Sampling	
U(✓) = E⌧⇠✓old

P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓U(✓) = E⌧⇠✓old

r✓P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓ U(✓)|✓=✓old
= E⌧⇠✓old

r✓ P(⌧|✓)|✓old
P(⌧|✓old)
R(⌧)
= E⌧⇠✓old
⇥
r✓ log P(⌧|✓)|✓old
R(⌧)
⇤
Note:	Suggests	we	can	also	look	at	more	than	just	gradient!	[Tang&Abbeel,	NIPS	2011]
DerivaQon	from	Importance	Sampling	
U(✓) = E⌧⇠✓old

P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓U(✓) = E⌧⇠✓old

r✓P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓ U(✓)|✓=✓old
= E⌧⇠✓old

r✓ P(⌧|✓)|✓old
P(⌧|✓old)
R(⌧)
= E⌧⇠✓old
⇥
r✓ log P(⌧|✓)|✓old
R(⌧)
⇤
Note:	Suggests	we	can	also	look	at	more	than	just	gradient!	[Tang&Abbeel,	NIPS	2011]
DerivaQon	from	Importance	Sampling	
U(✓) = E⌧⇠✓old

P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓U(✓) = E⌧⇠✓old

r✓P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓ U(✓)|✓=✓old
= E⌧⇠✓old

r✓ P(⌧|✓)|✓old
P(⌧|✓old)
R(⌧)
= E⌧⇠✓old
⇥
r✓ log P(⌧|✓)|✓old
R(⌧)
⇤
Note:	Suggests	we	can	also	look	at	more	than	just	gradient!	[Tang&Abbeel,	NIPS	2011]
DerivaQon	from	Importance	Sampling	
U(✓) = E⌧⇠✓old

P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓U(✓) = E⌧⇠✓old

r✓P(⌧|✓)
P(⌧|✓old)
R(⌧)
r✓ U(✓)|✓=✓old
= E⌧⇠✓old

r✓ P(⌧|✓)|✓old
P(⌧|✓old)
R(⌧)
= E⌧⇠✓old
⇥
r✓ log P(⌧|✓)|✓old
R(⌧)
⇤
Suggests	we	can	also	look	at	more	than	just	gradient!	
E.g.,	can	use	importance	sampled	objecQve	as	“surrogate	loss”	(locally)	
[Tang&Abbeel,	NIPS	2011]
n  As	formulated	thus	far:	unbiased	but	very	noisy	
n  Fixes	that	lead	to	real-world	pracQcality	
n  Baseline	
n  Temporal	structure	
n  Also:	KL-divergence	trust	region	/	natural	gradient	(=	general	trick,	
equally	applicable	to	perturbaQon	analysis	and	finite	differences)	
Likelihood	RaQo	Gradient	EsQmate
n  Gradient	tries	to:	
n  Increase	probability	of	paths	with	
posiQve	R	
n  Decrease	probability	of	paths	with	
negaQve	R	
Likelihood	RaQo	Gradient:	IntuiQon	
rU(✓) ⇡ ˆg =
1
m
mX
i=1
r✓ log P(⌧(i)
; ✓)R(⌧(i)
)
!	Likelihood	raQo	changes	probabiliQes	of	experienced	paths,	
does	not	try	to	change	the	paths	(<->	Path	DerivaQve)
à  Consider	baseline	b:	
							
Likelihood	RaQo	Gradient:	Baseline	
rU(✓) ⇡ ˆg =
1
m
mX
i=1
r✓ log P(⌧(i)
; ✓)R(⌧(i)
)
rU(✓) ⇡ ˆg =
1
m
mX
i=1
r✓ log P(⌧(i)
; ✓)(R(⌧(i)
) b)
sQll	unbiased!	
[Williams	1992]	
E [r✓ log P(⌧; ✓)b]
=
X
⌧
P(⌧; ✓)r✓ log P(⌧; ✓)b
=
X
⌧
P(⌧; ✓)
r✓P(⌧; ✓)
P(⌧; ✓)
b
=
X
⌧
r✓P(⌧; ✓)b
=r✓
X
⌧
P(⌧)b
!
= br✓(
X
⌧
P(⌧)) = b ⇥ 0
OK	as	long	as	baseline	doesn’t	depend	on	acQon	
in	logprob(acQon)
n  	Current	esQmate:	
	
	
n  Removing	terms	that	don’t	depend	on	current	acQon	can	lower	variance:	
Likelihood	RaQo	and	Temporal	Structure	
ˆg =
1
m
mX
i=1
r✓ log P(⌧(i)
; ✓)(R(⌧(i)
) b)
=
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
! H 1X
t=0
R(s
(i)
t , u
(i)
t ) b
!
[Policy	Gradient	Theorem:	SuTon	et	al,	NIPS	1999;	GPOMDP:	BartleT	&	Baxter,	JAIR	2001;	Survey:	Peters	&	Schaal,	IROS	2006]		
Doesn’t	depend	on		u
(i)
t Ok	to	depend	on		s
(i)
t
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k ) b(s
(i)
t )
!
=
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
" t 1X
k=0
R(s
(i)
k , u
(i)
k ) +
H 1X
k=t
R(s
(i)
k , u
(i)
k ) b
# !
n  Good	choice	for	b?		
n  Constant	baseline:	
n  OpQmal	Constant	baseline:	
n  Time-dependent	baseline:		
n  State-dependent	expected	return:		
	
à	Increase	logprob	of	acQon	proporQonally	to	how	much	its	returns	are	
beTer	than	the	expected	return	under	the	current	policy	
Baseline	Choices	
b(st) = E [rt + rt+1 + rt+2 + . . . + rH 1]
b = E [R(⌧)] ⇡
1
m
mX
i=1
R(⌧(i)
)
[See:	Greensmith,	BartleT,	Baxter,	JMLR	2004	for	variance	reducQon	techniques.]		
bt =
1
m
mX
i=1
H 1X
k=t
R(s
(i)
k , u
(i)
k )
= V ⇡
(st)
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k ) V ⇡
(s
(i)
k )
!
How to estimate?
V ⇡
n  Init		
n  Collect	trajectories		
n  Regress	against	empirical	return:	
V ⇡
0
⌧1, . . . , ⌧m
i+1 arg min
1
m
mX
i=1
H 1X
t=0
V ⇡
✓ (s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k )
!2
EsQmaQon	of
n  Bellman	EquaQon	for		
n  Init		
n  Collect	data	{s,	u,	s’,	r}	
n  FiTed	V	iteraQon:	
EsQmaQon	of						
V ⇡
(s) =
X
u
⇡(u|s)
X
s0
P(s0
|s, u)[R(s, u, s0
) + V ⇡
(s0
)]
V ⇡
V ⇡
V ⇡
0
i+1 min
X
(s,u,s0,r)
kr + V ⇡
i
(s0
) V (s)k2
2 + k ik2
2
Vanilla	Policy	Gradient	
~	[Williams,	1992]
n  EsQmaQon	of	Q	from	single	roll-out	
Recall	Our	Likelihood	RaQo	PG	EsQmator	
Q⇡
(s, u) = E[r0 + r1 + r2 + · · · |s0 = s, a0 = a]
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k ) V ⇡
(s
(i)
k )
!
n  =	high	variance	per	sample	based	/	no	generalizaQon	used	
n  Reduce	variance	by	discounQng	
n  Reduce	variance	by	funcQon	approximaQon	(=criQc)
n  EsQmaQon	of	Q	from	single	roll-out	
Recall	Our	Likelihood	RaQo	PG	EsQmator	
Q⇡
(s, u) = E[r0 + r1 + r2 + · · · |s0 = s, a0 = a]
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k ) V ⇡
(s
(i)
k )
!
n  =	high	variance	per	sample	based	/	no	generalizaQon	used	
n  Reduce	variance	by	discounQng	
n  Reduce	variance	by	funcQon	approximaQon	(=criQc)
n  EsQmaQon	of	Q	from	single	roll-out	
Recall	Our	Likelihood	RaQo	PG	EsQmator	
Q⇡
(s, u) = E[r0 + r1 + r2 + · · · |s0 = s, a0 = a]
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k ) V ⇡
(s
(i)
k )
!
n  =	high	variance	per	sample	based	/	no	generalizaQon	used	
n  Reduce	variance	by	discounQng	
n  Reduce	variance	by	funcQon	approximaQon	(=criQc)
n  EsQmaQon	of	Q	from	single	roll-out	
Recall	Our	Likelihood	RaQo	PG	EsQmator	
Q⇡
(s, u) = E[r0 + r1 + r2 + · · · |s0 = s, a0 = a]
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k ) V ⇡
(s
(i)
k )
!
n  =	high	variance	per	sample	based	/	no	generalizaQon	
n  Reduce	variance	by	discounQng	
n  Reduce	variance	by	funcQon	approximaQon	(=criQc)
n  EsQmaQon	of	Q	from	single	roll-out	
Further	Refinements	
Q⇡
(s, u) = E[r0 + r1 + r2 + · · · |s0 = s, a0 = a]
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k ) V ⇡
(s
(i)
k )
!
n  =	high	variance	per	sample	based	/	no	generalizaQon	
n  Reduce	variance	by	discounQng	
n  Reduce	variance	by	funcQon	approximaQon	(=criQc)
n  EsQmaQon	of	Q	from	single	roll-out	
Recall	Our	Likelihood	RaQo	PG	EsQmator	
Q⇡
(s, u) = E[r0 + r1 + r2 + · · · |s0 = s, a0 = a]
1
m
mX
i=1
H 1X
t=0
r✓ log ⇡✓(u
(i)
t |s
(i)
t )
H 1X
k=t
R(s
(i)
k , u
(i)
k ) V ⇡
(s
(i)
k )
!
n  =	high	variance	per	sample	based	/	no	generalizaQon	
n  Reduce	variance	by	discounQng	
n  Reduce	variance	by	funcQon	approximaQon	(=criQc)
à	introduce	discount	factor	as	a	hyperparameter	to	improve	
esQmate	of	Q:	
Variance	ReducQon	by	DiscounQng	
Q⇡
(s, u) = E[r0 + r1 + r2 + · · · |s0 = s, a0 = a]
Q⇡,
(s, u) = E[r0 + r1 + 2
r2 + · · · |s0 = s, a0 = a]
n  Generalized	Advantage	Es<ma<on	uses	an	exponenQally	
weighted	average	of	these	
n  ~	TD(lambda)	
Reducing	Variance	by	FuncQon	ApproximaQon	
Q⇡,
(s, u) = E[r0 + r1 + 2
r2 + · · · | s0 = s, u0 = u]
= E[r0 + V ⇡
(s1) | s0 = s, u0 = u]
= E[r0 + r1 + 2
V ⇡
(s2) | s0 = s, u0 = u]
= E[r0 + r1 + + 2
r2 + 3
V ⇡
(s3) | s0 = s, u0 = u]
= · · ·
n  Generalized	Advantage	Es<ma<on	uses	an	exponenQally	
weighted	average	of	these	
n  ~	TD(lambda)	
Reducing	Variance	by	FuncQon	ApproximaQon	
Q⇡,
(s, u) = E[r0 + r1 + 2
r2 + · · · | s0 = s, u0 = u]
= E[r0 + V ⇡
(s1) | s0 = s, u0 = u]
= E[r0 + r1 + 2
V ⇡
(s2) | s0 = s, u0 = u]
= E[r0 + r1 + + 2
r2 + 3
V ⇡
(s3) | s0 = s, u0 = u]
= · · ·
n  Generalized	Advantage	Es<ma<on	uses	an	exponenQally	
weighted	average	of	these	
n  ~	TD(lambda)	
Reducing	Variance	by	FuncQon	ApproximaQon	
Q⇡,
(s, u) = E[r0 + r1 + 2
r2 + · · · | s0 = s, u0 = u]
= E[r0 + V ⇡
(s1) | s0 = s, u0 = u]
= E[r0 + r1 + 2
V ⇡
(s2) | s0 = s, u0 = u]
= E[r0 + r1 + + 2
r2 + 3
V ⇡
(s3) | s0 = s, u0 = u]
= · · ·
n  Async	Advantage	Actor	Cri<c	(A3C)	[Mnih	et	al,	2016]	
n  												one	of	the	above	choices	(e.g.		k=5	step	lookahead)	
Reducing	Variance	by	FuncQon	ApproximaQon	
Q⇡,
(s, u) = E[r0 + r1 + 2
r2 + · · · | s0 = s, u0 = u]
= E[r0 + V ⇡
(s1) | s0 = s, u0 = u]
= E[r0 + r1 + 2
V ⇡
(s2) | s0 = s, u0 = u]
= E[r0 + r1 + + 2
r2 + 3
V ⇡
(s3) | s0 = s, u0 = u]
= · · ·
ˆQ
n  Generalized	Advantage	Es<ma<on	(GAE)	[Schulman	et	al,	ICLR	2016]	
n  							=	lambda	exponenQally	weighted	average	of	all	the	above	
n  ~	TD(lambda)	/	eligibility	traces	[SuTon	and	Barto,	1990]	
Reducing	Variance	by	FuncQon	ApproximaQon	
Q⇡,
(s, u) = E[r0 + r1 + 2
r2 + · · · | s0 = s, u0 = u]
= E[r0 + V ⇡
(s1) | s0 = s, u0 = u]
= E[r0 + r1 + 2
V ⇡
(s2) | s0 = s, u0 = u]
= E[r0 + r1 + + 2
r2 + 3
V ⇡
(s3) | s0 = s, u0 = u]
= · · ·
(1 )
(1 )
(1 ) 2
(1 ) 3
ˆQ
n  Generalized	Advantage	Es<ma<on	(GAE)	[Schulman	et	al,	ICLR	2016]	
n  							=	lambda	exponenQally	weighted	average	of	all	the	above	
n  ~	TD(lambda)	/	eligibility	traces	[SuTon	and	Barto,	1990]	
Reducing	Variance	by	FuncQon	ApproximaQon	
Q⇡,
(s, u) = E[r0 + r1 + 2
r2 + · · · | s0 = s, u0 = u]
= E[r0 + V ⇡
(s1) | s0 = s, u0 = u]
= E[r0 + r1 + 2
V ⇡
(s2) | s0 = s, u0 = u]
= E[r0 + r1 + + 2
r2 + 3
V ⇡
(s3) | s0 = s, u0 = u]
= · · ·
(1 )
(1 )
(1 ) 2
(1 ) 3
ˆQ
n  Policy	Gradient	+	Generalized	Advantage	EsQmaQon:	
n  Init		
n  Collect	roll-outs	{s,	u,	s’,	r}	and		
n  Update:			
Actor-CriQc	with	A3C	or	GAE	
V ⇡
0
⇡✓0
Note:	many	variaQons,	e.g.	could	instead	use	1-step	for	V,	full	roll-out	for	pi:	
i+1 min
X
(s,u,s0,r)
kr + V ⇡
i
(s0
) V (s)k2
2 + k ik2
2
✓i+1 ✓i + ↵
1
m
mX
k=1
H 1X
t=0
r✓ log ⇡✓i (u
(k)
t |s
(k)
t )
H 1X
t0=t
r
(k)
t0 V ⇡
i
(s
(k)
t0 )
!
ˆQi(s, u)
✓i+1 ✓i + ↵
1
m
mX
k=1
H 1X
t=0
r✓ log ⇡✓i (u
(k)
t |s
(k)
t )
⇣
ˆQi(s
(k)
t , u
(k)
t ) V ⇡
i
(s
(k)
t )
⌘
i+1 min
X
(s,u,s0,r)
k ˆQi(s, u) V ⇡
(s)k2
2 + k ik2
2
n  [Mnih	et	al,	ICML	2016]	
n  Likelihood	RaQo	Policy	Gradient	
n  n-step	Advantage	EsQmaQon	
Async	Advantage	Actor	CriQc	(A3C)
A3C	--	labyrinth
GAE:	Effect	of	gamma	and	lambda	
[Schulman	et	al,	2016	--	GAE]
Learning	LocomoQon	(TRPO	+	GAE)	
[Schulman,	Moritz,	Levine,	Jordan,	Abbeel,	2016]
In	Contrast:	Darpa	RoboQcs	Challenge

More Related Content

What's hot

C0314018022
C0314018022C0314018022
C0314018022theijes
 
Complex Dynamics and Statistics in Hamiltonian 1-D Lattices - Tassos Bountis
Complex Dynamics and Statistics  in Hamiltonian 1-D Lattices - Tassos Bountis Complex Dynamics and Statistics  in Hamiltonian 1-D Lattices - Tassos Bountis
Complex Dynamics and Statistics in Hamiltonian 1-D Lattices - Tassos Bountis Lake Como School of Advanced Studies
 
Compatible Mapping and Common Fixed Point Theorem
Compatible Mapping and Common Fixed Point TheoremCompatible Mapping and Common Fixed Point Theorem
Compatible Mapping and Common Fixed Point TheoremIOSR Journals
 
Computing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDComputing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDChristos Kallidonis
 
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...asahiushio1
 
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...asahiushio1
 
Course on Optimal Transport
Course on Optimal TransportCourse on Optimal Transport
Course on Optimal TransportBruno Levy
 
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...asahiushio1
 
Common Fixed Point Theorems in Uniform Spaces
Common Fixed Point Theorems in Uniform SpacesCommon Fixed Point Theorems in Uniform Spaces
Common Fixed Point Theorems in Uniform SpacesIJLT EMAS
 
Soluções dos exercícios de cinética química digitados
Soluções dos exercícios de cinética química digitadosSoluções dos exercícios de cinética química digitados
Soluções dos exercícios de cinética química digitadosMárcio Martins
 
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Complexity of exact solutions of many body systems: nonequilibrium steady sta...Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Complexity of exact solutions of many body systems: nonequilibrium steady sta...Lake Como School of Advanced Studies
 
Lyapunov-type inequalities for a fractional q, -difference equation involvin...
Lyapunov-type inequalities for a fractional q, -difference equation involvin...Lyapunov-type inequalities for a fractional q, -difference equation involvin...
Lyapunov-type inequalities for a fractional q, -difference equation involvin...IJMREMJournal
 
10.11648.j.pamj.20170601.11
10.11648.j.pamj.20170601.1110.11648.j.pamj.20170601.11
10.11648.j.pamj.20170601.11DAVID GIKUNJU
 
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTIONA COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTIONijsc
 
On the k-Riemann-Liouville fractional integral and applications
On the k-Riemann-Liouville fractional integral and applications On the k-Riemann-Liouville fractional integral and applications
On the k-Riemann-Liouville fractional integral and applications Premier Publishers
 

What's hot (20)

Quantum chaos of generic systems - Marko Robnik
Quantum chaos of generic systems - Marko RobnikQuantum chaos of generic systems - Marko Robnik
Quantum chaos of generic systems - Marko Robnik
 
MM2020-AV
MM2020-AVMM2020-AV
MM2020-AV
 
C0314018022
C0314018022C0314018022
C0314018022
 
Complex Dynamics and Statistics in Hamiltonian 1-D Lattices - Tassos Bountis
Complex Dynamics and Statistics  in Hamiltonian 1-D Lattices - Tassos Bountis Complex Dynamics and Statistics  in Hamiltonian 1-D Lattices - Tassos Bountis
Complex Dynamics and Statistics in Hamiltonian 1-D Lattices - Tassos Bountis
 
Compatible Mapping and Common Fixed Point Theorem
Compatible Mapping and Common Fixed Point TheoremCompatible Mapping and Common Fixed Point Theorem
Compatible Mapping and Common Fixed Point Theorem
 
Computing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDComputing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCD
 
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
 
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
2017-03, ICASSP, Projection-based Dual Averaging for Stochastic Sparse Optimi...
 
Course on Optimal Transport
Course on Optimal TransportCourse on Optimal Transport
Course on Optimal Transport
 
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
2017-12, Keio University, Projection-based Regularized Dual Averaging for Sto...
 
Control system bode plot
Control system bode plotControl system bode plot
Control system bode plot
 
Common Fixed Point Theorems in Uniform Spaces
Common Fixed Point Theorems in Uniform SpacesCommon Fixed Point Theorems in Uniform Spaces
Common Fixed Point Theorems in Uniform Spaces
 
Soluções dos exercícios de cinética química digitados
Soluções dos exercícios de cinética química digitadosSoluções dos exercícios de cinética química digitados
Soluções dos exercícios de cinética química digitados
 
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Complexity of exact solutions of many body systems: nonequilibrium steady sta...Complexity of exact solutions of many body systems: nonequilibrium steady sta...
Complexity of exact solutions of many body systems: nonequilibrium steady sta...
 
Lyapunov-type inequalities for a fractional q, -difference equation involvin...
Lyapunov-type inequalities for a fractional q, -difference equation involvin...Lyapunov-type inequalities for a fractional q, -difference equation involvin...
Lyapunov-type inequalities for a fractional q, -difference equation involvin...
 
10.11648.j.pamj.20170601.11
10.11648.j.pamj.20170601.1110.11648.j.pamj.20170601.11
10.11648.j.pamj.20170601.11
 
Beamerpresentation
BeamerpresentationBeamerpresentation
Beamerpresentation
 
basal-ganglia
basal-gangliabasal-ganglia
basal-ganglia
 
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTIONA COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
A COMPARISON OF PARTICLE SWARM OPTIMIZATION AND DIFFERENTIAL EVOLUTION
 
On the k-Riemann-Liouville fractional integral and applications
On the k-Riemann-Liouville fractional integral and applications On the k-Riemann-Liouville fractional integral and applications
On the k-Riemann-Liouville fractional integral and applications
 

Similar to Policy Gradients Lecture

Neurobiological Models of Instrumental Conditioning
Neurobiological Models of Instrumental ConditioningNeurobiological Models of Instrumental Conditioning
Neurobiological Models of Instrumental ConditioningMatthew Crossley
 
Teknik Analisis Jalur
Teknik Analisis JalurTeknik Analisis Jalur
Teknik Analisis JalurKita Sekolah
 
Measurement-induced long-distance entanglement with optomechanical transducers
Measurement-induced long-distance entanglement with optomechanical transducersMeasurement-induced long-distance entanglement with optomechanical transducers
Measurement-induced long-distance entanglement with optomechanical transducersOndrej Cernotik
 
Restricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theoryRestricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theorySeongwon Hwang
 
Quantum Annealing for Dirichlet Process Mixture Models with Applications to N...
Quantum Annealing for Dirichlet Process Mixture Models with Applications to N...Quantum Annealing for Dirichlet Process Mixture Models with Applications to N...
Quantum Annealing for Dirichlet Process Mixture Models with Applications to N...Shu Tanaka
 
Lecture no 9(extended master method)
Lecture no 9(extended master method)Lecture no 9(extended master method)
Lecture no 9(extended master method)jeetesh036
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureNecip Oguz Serbetci
 
Teaching Population Genetics with R
Teaching Population Genetics with RTeaching Population Genetics with R
Teaching Population Genetics with RBruce Cochrane
 
Talk in BayesComp 2018
Talk in BayesComp 2018Talk in BayesComp 2018
Talk in BayesComp 2018JeremyHeng10
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEKai-Wen Zhao
 
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...André Panisson
 
TENSOR DECOMPOSITION WITH PYTHON
TENSOR DECOMPOSITION WITH PYTHONTENSOR DECOMPOSITION WITH PYTHON
TENSOR DECOMPOSITION WITH PYTHONAndré Panisson
 
Transverse optimization of spin relaxation in NMR
Transverse optimization of spin relaxation in NMRTransverse optimization of spin relaxation in NMR
Transverse optimization of spin relaxation in NMRKonstantin Pervushin
 
Convolution problems
Convolution problemsConvolution problems
Convolution problemsPatrickMumba7
 

Similar to Policy Gradients Lecture (20)

Neurobiological Models of Instrumental Conditioning
Neurobiological Models of Instrumental ConditioningNeurobiological Models of Instrumental Conditioning
Neurobiological Models of Instrumental Conditioning
 
Presentation ECMTB14
Presentation ECMTB14Presentation ECMTB14
Presentation ECMTB14
 
BSE and TDDFT at work
BSE and TDDFT at workBSE and TDDFT at work
BSE and TDDFT at work
 
Teknik Analisis Jalur
Teknik Analisis JalurTeknik Analisis Jalur
Teknik Analisis Jalur
 
Units Dimensions Error 4
Units Dimensions Error 4Units Dimensions Error 4
Units Dimensions Error 4
 
Measurement-induced long-distance entanglement with optomechanical transducers
Measurement-induced long-distance entanglement with optomechanical transducersMeasurement-induced long-distance entanglement with optomechanical transducers
Measurement-induced long-distance entanglement with optomechanical transducers
 
Restricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theoryRestricted Boltzman Machine (RBM) presentation of fundamental theory
Restricted Boltzman Machine (RBM) presentation of fundamental theory
 
Quantum Annealing for Dirichlet Process Mixture Models with Applications to N...
Quantum Annealing for Dirichlet Process Mixture Models with Applications to N...Quantum Annealing for Dirichlet Process Mixture Models with Applications to N...
Quantum Annealing for Dirichlet Process Mixture Models with Applications to N...
 
Lecture no 9(extended master method)
Lecture no 9(extended master method)Lecture no 9(extended master method)
Lecture no 9(extended master method)
 
Hierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic ArchitectureHierarchical Reinforcement Learning with Option-Critic Architecture
Hierarchical Reinforcement Learning with Option-Critic Architecture
 
Teaching Population Genetics with R
Teaching Population Genetics with RTeaching Population Genetics with R
Teaching Population Genetics with R
 
Talk in BayesComp 2018
Talk in BayesComp 2018Talk in BayesComp 2018
Talk in BayesComp 2018
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
 
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
TENSOR DECOMPOSITION WITH PYTHON
TENSOR DECOMPOSITION WITH PYTHONTENSOR DECOMPOSITION WITH PYTHON
TENSOR DECOMPOSITION WITH PYTHON
 
Transverse optimization of spin relaxation in NMR
Transverse optimization of spin relaxation in NMRTransverse optimization of spin relaxation in NMR
Transverse optimization of spin relaxation in NMR
 
Classification
ClassificationClassification
Classification
 
C0751720
C0751720C0751720
C0751720
 
Convolution problems
Convolution problemsConvolution problems
Convolution problems
 

More from Ronald Teo

07 regularization
07 regularization07 regularization
07 regularizationRonald Teo
 
Eac4f222d9d468a0c29a71a3830a5c60 c5_w3l08-attentionmodel
 Eac4f222d9d468a0c29a71a3830a5c60 c5_w3l08-attentionmodel Eac4f222d9d468a0c29a71a3830a5c60 c5_w3l08-attentionmodel
Eac4f222d9d468a0c29a71a3830a5c60 c5_w3l08-attentionmodelRonald Teo
 
Lec7 deeprlbootcamp-svg+scg
Lec7 deeprlbootcamp-svg+scgLec7 deeprlbootcamp-svg+scg
Lec7 deeprlbootcamp-svg+scgRonald Teo
 
Lec5 advanced-policy-gradient-methods
Lec5 advanced-policy-gradient-methodsLec5 advanced-policy-gradient-methods
Lec5 advanced-policy-gradient-methodsRonald Teo
 
Lec6 nuts-and-bolts-deep-rl-research
Lec6 nuts-and-bolts-deep-rl-researchLec6 nuts-and-bolts-deep-rl-research
Lec6 nuts-and-bolts-deep-rl-researchRonald Teo
 
Lec4b pong from_pixels
Lec4b pong from_pixelsLec4b pong from_pixels
Lec4b pong from_pixelsRonald Teo
 
Lec2 sampling-based-approximations-and-function-fitting
Lec2 sampling-based-approximations-and-function-fittingLec2 sampling-based-approximations-and-function-fitting
Lec2 sampling-based-approximations-and-function-fittingRonald Teo
 
Lec1 intro-mdps-exact-methods
Lec1 intro-mdps-exact-methodsLec1 intro-mdps-exact-methods
Lec1 intro-mdps-exact-methodsRonald Teo
 
02 linear algebra
02 linear algebra02 linear algebra
02 linear algebraRonald Teo
 

More from Ronald Teo (16)

Mc td
Mc tdMc td
Mc td
 
07 regularization
07 regularization07 regularization
07 regularization
 
Dp
DpDp
Dp
 
06 mlp
06 mlp06 mlp
06 mlp
 
Mdp
MdpMdp
Mdp
 
04 numerical
04 numerical04 numerical
04 numerical
 
Eac4f222d9d468a0c29a71a3830a5c60 c5_w3l08-attentionmodel
 Eac4f222d9d468a0c29a71a3830a5c60 c5_w3l08-attentionmodel Eac4f222d9d468a0c29a71a3830a5c60 c5_w3l08-attentionmodel
Eac4f222d9d468a0c29a71a3830a5c60 c5_w3l08-attentionmodel
 
Intro rl
Intro rlIntro rl
Intro rl
 
Lec7 deeprlbootcamp-svg+scg
Lec7 deeprlbootcamp-svg+scgLec7 deeprlbootcamp-svg+scg
Lec7 deeprlbootcamp-svg+scg
 
Lec5 advanced-policy-gradient-methods
Lec5 advanced-policy-gradient-methodsLec5 advanced-policy-gradient-methods
Lec5 advanced-policy-gradient-methods
 
Lec6 nuts-and-bolts-deep-rl-research
Lec6 nuts-and-bolts-deep-rl-researchLec6 nuts-and-bolts-deep-rl-research
Lec6 nuts-and-bolts-deep-rl-research
 
Lec4b pong from_pixels
Lec4b pong from_pixelsLec4b pong from_pixels
Lec4b pong from_pixels
 
Lec3 dqn
Lec3 dqnLec3 dqn
Lec3 dqn
 
Lec2 sampling-based-approximations-and-function-fitting
Lec2 sampling-based-approximations-and-function-fittingLec2 sampling-based-approximations-and-function-fitting
Lec2 sampling-based-approximations-and-function-fitting
 
Lec1 intro-mdps-exact-methods
Lec1 intro-mdps-exact-methodsLec1 intro-mdps-exact-methods
Lec1 intro-mdps-exact-methods
 
02 linear algebra
02 linear algebra02 linear algebra
02 linear algebra
 

Recently uploaded

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Policy Gradients Lecture