WIND AND ENERGY STORAGE
Optimal Control of Power Systems in Context of Wind Energy Generation and Storage
Shuyang Li
Overview
• Motivation
• Model
• Algorithms
• Methodology
• Data
• Results
• Further Exploration
MOTIVATION
• Increasing reliance on wind energy
• Wind energy variance
• Avoiding waste
• Storing wind energy
• Storage/unit allocation
MOTIVATION – Dynamic Programming
• Solving Bellman's equation:
$V(S^n) = \max_a \left( C(S^n, a) + \gamma\,\mathbb{E}\!\left[ V(S^{n+1}) \mid S^n \right] \right)$
From Optimal Learning (Powell & Ryzhov, 2012)
[Diagram: backward pass over the states $S_T \leftarrow S_{T-1} \leftarrow S_{T-2} \leftarrow S_{T-3} \leftarrow \dots \leftarrow S_0$]
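As a reference point, the backward pass can be written down directly for a small finite problem. The sketch below is illustrative only: the toy transition matrix, reward table, horizon, and discount factor are placeholder assumptions, not the presentation's model.

```python
import numpy as np

# Backward dynamic programming on a toy finite-horizon problem (a sketch, not the
# presentation's model): solve Bellman's equation exactly from t = T-1 down to t = 0.
T, n_states, n_actions, gamma = 5, 4, 3, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
C = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # C(s, a)

V = np.zeros((T + 1, n_states))                   # terminal value V_T = 0
policy = np.zeros((T, n_states), dtype=int)
for t in reversed(range(T)):                      # backward pass
    Q = C + gamma * P @ V[t + 1]                  # C(s, a) + gamma * E[V_{t+1} | s, a]
    V[t] = Q.max(axis=1)                          # Bellman optimality
    policy[t] = Q.argmax(axis=1)

print(np.round(V[0], 3))                          # optimal value of each starting state
```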
MOTIVATION – ADP/RL
• Monte Carlo simulation
[Diagram: forward pass over the states $S_0 \rightarrow S_1 \rightarrow S_2 \rightarrow S_3 \rightarrow \dots \rightarrow S_T$]
MOTIVATION – Literature
• "A Comparison of Approximate Dynamic Programming Techniques on Benchmark Energy Storage Problems: Does Anything Work?" (Jiang, Pham & Powell, 2014)
• "Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems" (Salas & Powell, 2014)
PROBLEM
• Algorithmic performance
  • Q-Learning
  • SARSA(λ)
• Step sizes
  • Ryzhov, Frazier & Powell (2014)
Overview
• Motivation
• Model
• Algorithms
• Methodology
• Data
• Results
• Further Exploration
MODEL
[Diagram: energy flows among the Wind, Battery, Grid, and Demand nodes, labelled with the actions $a_t^{WD}$, $a_t^{RD}$, $a_t^{GD}$, $a_t^{WR}$, $a_t^{GR}$, $a_t^{RG}$]
State: $S_t = (R_t, E_t, D_t, P_t)$
Action: $a_t = (a_t^{WD}, a_t^{RD}, a_t^{GD}, a_t^{WR}, a_t^{GR}, a_t^{RG})$
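One possible way to hold the state and action vectors in code is the sketch below; the field names simply mirror the slide's notation and are not taken from any particular implementation.

```python
from dataclasses import dataclass

@dataclass
class State:
    R: float  # R_t: energy in storage
    E: float  # E_t: available wind energy
    D: float  # D_t: demand
    P: float  # P_t: electricity price

@dataclass
class Action:
    WD: float  # wind -> demand
    RD: float  # storage -> demand
    GD: float  # grid -> demand
    WR: float  # wind -> storage
    GR: float  # grid -> storage
    RG: float  # storage -> grid
```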
MODEL – Action Constraints
Storage capacity:
• $a_t^{WR} + a_t^{GR} \le R^{max} - R_t$
• $a_t^{RD} + a_t^{RG} \le R_t$
Demand satisfied:
• $a_t^{WD} + \eta^d a_t^{RD} + a_t^{GD} = D_t$
Charging / discharging:
• $a_t^{WR} + a_t^{GR} \le \gamma^c$
• $a_t^{RD} + a_t^{RG} \le \gamma^d$
Flow conservation:
• $a_t^{WR} + a_t^{WD} \le E_t$
Maximal wind usage:
• $a_t^{WD} = \min(D_t, E_t)$
• $a_t^{WR} = \min(R^{max} - R_t,\; E_t - a_t^{WD})$
From Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems (Salas & Powell) [except maximal wind usage]
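A straightforward feasibility check over these constraints might look like the sketch below; the dictionary-based action representation and the numerical tolerance are assumptions made for illustration.

```python
def is_feasible(a, R_t, E_t, D_t, R_max, eta_d, gamma_c, gamma_d, tol=1e-9):
    """Check the action constraints above; `a` is a dict with keys
    'WD', 'RD', 'GD', 'WR', 'GR', 'RG' (notation follows the slide)."""
    return (
        all(v >= -tol for v in a.values())                         # non-negative flows
        and a['WR'] + a['GR'] <= R_max - R_t + tol                 # storage capacity
        and a['RD'] + a['RG'] <= R_t + tol                         # cannot exceed stored energy
        and abs(a['WD'] + eta_d * a['RD'] + a['GD'] - D_t) <= tol  # demand satisfied
        and a['WR'] + a['GR'] <= gamma_c + tol                     # charging limit
        and a['RD'] + a['RG'] <= gamma_d + tol                     # discharging limit
        and a['WR'] + a['WD'] <= E_t + tol                         # wind flow conservation
    )
```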
MODEL – Transition Functions
Storage:
$R_{t+1} = R_t + \phi^T a_t, \quad \phi = (0, -1, 0, \eta^c, \eta^c, -1)$
Simplified in the stochastic model:
$R_{t+1} = R_t - a_t^{RD} + a_t^{WR} + a_t^{GR} - a_t^{RG}$
From Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems (Salas & Powell)
MODEL – Transition Functions
Wind:
• $E_{t+1} = E_t + \hat{E}_{t+1}$, with $\hat{E}_t \sim \mathcal{N}(\mu_E, \sigma_E^2)$
Demand:
• $D_t = \max\left(0,\; 3 - 4\sin\frac{2\pi t}{T}\right)$
Price:
• $P_{t+1} = P_t + \hat{P}_{0,t+1} + 1_{\{u_{t+1} \le 0.031\}}\,\hat{P}_{1,t+1}$
• $\hat{P}_{0,t} \sim \mathcal{N}(\mu_P, \sigma_P^2)$, $\hat{P}_{1,t} \sim \mathcal{N}(0, 50^2)$, $u_t \sim \mathcal{U}(0, 1)$
From Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems (Salas & Powell)
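The exogenous processes above translate almost directly into simulation code. The sketch below is an interpretation: the default means and variances are placeholders, and clipping to the E and P bounds listed later under the stochastic parameters is an assumption about how those bounds are applied.

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def storage_transition(R_t, a):
    """Simplified stochastic-model storage update (a keyed by 'RD', 'WR', 'GR', 'RG')."""
    return R_t - a['RD'] + a['WR'] + a['GR'] - a['RG']

def wind_transition(E_t, mu_E=0.0, sigma_E=0.5, E_min=1.0, E_max=7.0):
    """E_{t+1} = E_t + Gaussian noise, clipped to [E_min, E_max]."""
    return float(np.clip(E_t + rng.normal(mu_E, sigma_E), E_min, E_max))

def demand(t, T):
    """D_t = max(0, 3 - 4 sin(2 pi t / T))."""
    return max(0.0, 3.0 - 4.0 * math.sin(2.0 * math.pi * t / T))

def price_transition(P_t, mu_P=0.0, sigma_P=0.5, P_min=30.0, P_max=70.0):
    """Random walk plus a rare jump that fires when u <= 0.031, clipped to [P_min, P_max]."""
    jump = rng.normal(0.0, 50.0) if rng.uniform() <= 0.031 else 0.0
    return float(np.clip(P_t + rng.normal(mu_P, sigma_P) + jump, P_min, P_max))
```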
MODEL – Objective Function
Reward function:
$C(S_t, a_t) = P_t D_t - P_t\left(a_t^{GR} - \eta^d a_t^{RG} + a_t^{GD}\right) - c^h R_{t+1}$
Objective function:
$F^{\pi^*} = \max_{\pi \in \Pi} \mathbb{E}\left[\sum_{t \in T} C\!\left(S_t, A_t^\pi(S_t)\right)\right]$
From Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems (Salas & Powell)
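Read as code, the one-step contribution could be computed as below. This is a sketch: the grouping of the grid terms follows the reconstruction above, and the default holding cost is a placeholder.

```python
def contribution(P_t, D_t, a, R_next, eta_d, c_h=0.0):
    """C(S_t, a_t): revenue for meeting demand, minus net grid purchases
    (buying for storage and for demand, less energy sold back from storage),
    minus a holding cost on the next storage level R_{t+1}."""
    return P_t * D_t - P_t * (a['GR'] - eta_d * a['RG'] + a['GD']) - c_h * R_next
```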
Overview
• Motivation
• Model
• Algorithms
• Methodology
• Data
• Results
• Further Exploration
ALGORITHMS – Q-Learning
• Action-value function Q:
$Q_{t+1}(S_t, a_t) = Q_t(S_t, a_t) + \alpha\left[R_{t+1} + \gamma \max_a Q_t(S_{t+1}, a) - Q_t(S_t, a_t)\right]$
• Off-policy
  • Selection policy: ε-greedy
  • Evaluation policy: pure greedy
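A generic tabular version of the update and the ε-greedy selection rule is sketched below; this is the textbook algorithm, not the project's code, and it assumes states and actions have already been discretized to integer indices.

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng):
    """Selection policy: explore with probability epsilon, otherwise act greedily."""
    if rng.uniform() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap on the greedy value of the next state."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```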
ALGORITHMS – Q-Learning
[Diagram: transition $S_t \xrightarrow{a_t} S_{t+1}$; the update backs up $\max_a Q_t(S_{t+1}, a)$ over the candidate actions $a_1, a_2, a_3$ into $Q_{t+1}(S_t, a_t)$]
ALGORITHMS – SARSA
• SARSA: $[S_t,\, a_t,\, R_{t+1},\, S_{t+1},\, a_{t+1}]$
$Q_{t+1}(S_t, a_t) = Q_t(S_t, a_t) + \alpha\left[R_{t+1} + \gamma Q_t(S_{t+1}, a_{t+1}) - Q_t(S_t, a_t)\right]$
• On-policy
  • Evaluation policy is the selection policy
๐‘บ ๐’•
๐’‚ ๐’•
๐‘ธ ๐’•+๐Ÿ ๐‘บ ๐’•, ๐’‚ ๐’• = ๐‘ธ ๐’• ๐‘บ ๐’•, ๐’‚ ๐’• + ๐›ผ[๐‘น๐’•+๐Ÿ + ๐›พ๐‘ธ ๐’• ๐‘บ ๐’•+๐Ÿ, ๐’‚ ๐’•+๐Ÿ โˆ’ ๐‘„๐‘ก ๐‘†๐‘ก, ๐‘Ž ๐‘ก ]
๐‘ธ ๐’•(๐‘บ ๐’•, ๐’‚ ๐’•) ๐‘บ ๐’•+๐Ÿ
๐’‚ ๐’•+๐Ÿ
๐‘ธ ๐’•(๐‘บ ๐’•+๐Ÿ, ๐’‚ ๐’•+๐Ÿ)
ALGORITHMS - SARSA
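The corresponding on-policy update differs only in the bootstrap term; again this is a generic sketch with integer state and action indices.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy SARSA: bootstrap on the action a_next that the behaviour policy
    actually chose in s_next, rather than on the greedy action."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```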
ALGORITHMS – SARSA(λ)
• SARSA(λ) includes a backward pass along a path (eligibility trace)
$Q_{t+1}(S, a) = Q_t(S, a) + \alpha\,\delta_t\,c_t(S, a)$
$\delta_t = R_{t+1} + \gamma Q_t(S_{t+1}, a_{t+1}) - Q_t(S_t, a_t)$
$c_t(S, a) = \begin{cases} \gamma\lambda\,c_{t-1}(S, a) + 1, & S = S_t,\ a = a_t \\ \gamma\lambda\,c_{t-1}(S, a), & \text{otherwise} \end{cases}$
ALGORITHMS – SARSA(λ)
[Figure from Reinforcement Learning: An Introduction, 2nd ed. (Richard Sutton & Andrew Barto, 2014)]
ALGORITHMS – Step Sizes
• Updating equation:
$Q_{t+1}(S_t, a_t) = Q_t(S_t, a_t) + \alpha\left[R_{t+1} + \gamma \max_a Q_t(S_{t+1}, a) - Q_t(S_t, a_t)\right]$
• Rearranged:
$Q_{t+1}(S_t, a_t) = (1 - \alpha)\,Q_t(S_t, a_t) + \alpha\left[R_{t+1} + \gamma \max_a Q_t(S_{t+1}, a)\right]$
ALGORITHMS – Step Sizes
[Figure: estimates approaching the target under a high α vs. a low α]
ALGORITHMS – Step Sizes
• Constant: $\alpha = k$
• Harmonic: $\alpha = \frac{a}{a + n}$
• 1/n: $\alpha = \frac{1}{n}$
• Ryzhov formula
  • Ryzhov, Frazier & Powell (2014)
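The first three rules are one-liners; a small sketch follows (the constant in the usage line is an arbitrary illustration, and the Ryzhov rule is treated separately later).

```python
def constant_step(k):
    """alpha_n = k for every update."""
    return lambda n: k

def harmonic_step(a):
    """alpha_n = a / (a + n); a larger a keeps the step size high for longer."""
    return lambda n: a / (a + n)

def one_over_n_step():
    """alpha_n = 1 / n."""
    return lambda n: 1.0 / max(n, 1)

# Usage sketch: schedules are indexed by the number of updates made so far.
alpha = harmonic_step(a=25)
print([round(alpha(n), 3) for n in (1, 10, 100, 1000)])
```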
Overview
• Motivation
• Model
• Algorithms
• Methodology
• Data
• Results
• Further Exploration
METHODOLOGY – Simulation
• 1 deterministic problem
• 17 stochastic problems
• 256 sample paths per stochastic problem
• Training iterations: transitions generated by Monte Carlo simulation
• ε-greedy action selection
• Evaluation: averaged over all sample paths
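Putting the pieces together, the training/evaluation loop described above might be organized as in the sketch below; `env_step`, `select_action`, and `update` are hypothetical hooks standing in for the model and the learning algorithm, not names from the presentation.

```python
import numpy as np

def train_and_evaluate(Q, env_step, select_action, update,
                       n_train_iterations, n_sample_paths, T, rng):
    """Train with epsilon-greedy Monte Carlo passes, then evaluate the greedy
    policy averaged over sample paths (a sketch under the stated assumptions)."""
    for _ in range(n_train_iterations):
        s = 0
        for t in range(T):
            a = select_action(Q, s, rng)          # epsilon-greedy during training
            s_next, r = env_step(s, a, t, rng)
            update(Q, s, a, r, s_next)
            s = s_next
    totals = []
    for _ in range(n_sample_paths):               # evaluation over all sample paths
        s, total = 0, 0.0
        for t in range(T):
            a = int(np.argmax(Q[s]))              # pure greedy at evaluation time
            s, r = env_step(s, a, t, rng)
            total += r
        totals.append(total)
    return float(np.mean(totals))
```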
Overview
• Motivation
• Model
• Algorithms
• Methodology
• Data
• Results
• Further Exploration
DATA – Deterministic Parameters
• $R^{max} = 100$
• $R^{min} = 0$
• $R_0 = 0$
• $\eta^c = \eta^d = 0.90$
• $\gamma^c = \gamma^d = 0.10$
• $T = 1000$
DATA – Deterministic Problems
[Figure slides: deterministic problem setups]
DATA – Stochastic Parameters
• $R^{max} = 30$
• $R^{min} = 0$
• $R_0 = 25$
• $\eta^c = \eta^d = 1.00$
• $\gamma^c = \gamma^d = 5$
• $T = 100$
• $P^{max} = 70$
• $P^{min} = 30$
• $E^{max} = 7.00$
• $E^{min} = 1.00$
DATA – Stochastic Problems
[Figure slides: stochastic problem setups]
Overview
• Motivation
• Model
• Algorithms
• Methodology
• Data
• Results
• Further Exploration
RESULTS – METRICS
• Action traces (deterministic)
• Step size over updates
• Stored energy over time vs. benchmark
• Performance over number of training iterations
DETERMINISTIC ACTIONS
[Figure: Energy Charged (WR) over Time vs. Benchmark; energy units vs. time step]
DETERMINISTIC ACTIONS
[Figure: Energy Discharged (RD) over Time vs. Benchmark; energy units vs. time step]
DETERMINISTIC ACTIONS
[Figure: Energy Sold (RG) over Time vs. Benchmark; energy units vs. time step]
DETERMINISTIC STORAGE
[Figure: Energy in Storage vs. Benchmark; energy units vs. time step]
DETERMINISTIC PERFORMANCE
[Figure: Performance of Constant Step Sizes on problem D2; energy units vs. training iteration]
STEP SIZE TUNING
• Declining step sizes
• Tunable parameters
  • Harmonic: $\frac{a}{a + n}$
  • Ryzhov: estimator update factor $\nu$
HARMONIC STEP SIZES
[Figure: Harmonic Step Sizes vs. Iteration; step size vs. training iteration]
HARMONIC PERFORMANCE
[Figure: Harmonic Step Size Performance; proportion of optimal vs. training iteration]
RYZHOV STEP SIZE
• No-discount formula:
$\alpha_{n-1} = \dfrac{(\bar{c}^n)^2}{(\bar{c}^n)^2 + (\bar{\sigma}^n)^2}$
• Tunable parameter $\nu$:
$\bar{c}^n = (1 - \nu_{n-1})\,\bar{c}^{n-1} + \nu_{n-1}\,\hat{c}^n$
$(\bar{\sigma}^n)^2 = (1 - \nu_{n-1})\,(\bar{\sigma}^{n-1})^2 + \nu_{n-1}\,(\hat{c}^n - \bar{c}^{n-1})^2$
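A small class can track the smoothed target and its variance and return the step size. This is a sketch of the recursion as reconstructed above; the initialization, the use of a fixed ν, and the handling of the first observation are assumptions.

```python
class RyzhovStepSize:
    """Track a smoothed observation c_bar and its variance, and return
    alpha = c_bar^2 / (c_bar^2 + variance)."""
    def __init__(self, nu=0.05):
        self.nu = nu
        self.c_bar = 0.0
        self.var = 0.0
        self.n = 0

    def update(self, c_hat):
        nu = 1.0 if self.n == 0 else self.nu       # first observation seeds c_bar
        self.var = (1 - nu) * self.var + nu * (c_hat - self.c_bar) ** 2
        self.c_bar = (1 - nu) * self.c_bar + nu * c_hat
        self.n += 1
        denom = self.c_bar ** 2 + self.var
        return 1.0 if denom == 0.0 else self.c_bar ** 2 / denom
```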
RYZHOV STEP SIZES
[Figure panels: Ryzhov step size vs. update step at training iterations 1, 128, 2k, 4k, 8k, 16k, 32k, 65k, 131k, 262k, 524k, 1M, 2M, and 4M]
Ryzhov explanation
• The step size should level off at a constant determined by the variance in the rewards.
• For this problem the rewards are deterministic, so the variance being measured is the variance across the different states we reach.
RYZHOV STEP SIZES vs. ESTIMATOR VARIANCE
[Figure: step size and estimator variance vs. update step]
RYZHOV PERFORMANCE
[Figure: Ryzhov Step Size Performance; proportion of optimal vs. training iteration]
AGGREGATE PARAMETERS
From Benchmarking a Scalable Approximate Dynamic Programming Algorithm for Stochastic Control of Multidimensional Energy Storage Problems (Salas & Powell)
AGGREGATE STORAGE
[Figure: Storage vs. Benchmark over time steps]
AGGREGATE PERFORMANCE
[Figure: Performance of All Algorithms on problem S13; proportion of optimal vs. training iteration]
AGGREGATE PERFORMANCE
[Figure: Performance of Q-Learning on problem S13; proportion of optimal vs. training iteration]
AGGREGATE PERFORMANCE
[Figure: Performance of SARSA on problem S13; proportion of optimal vs. training iteration]
Explanations
• Ryzhov flattens out very noticeably because the step size stays too high: the updates repeatedly overcompensate and the estimates bounce around.
• Harmonic still looks like it is improving. It needs to decline more slowly and not go to 0 so quickly; perhaps by using a Ryzhov-style harmonic constant?
STEP SIZE PERFORMANCE
[Figure panels: proportion of optimal vs. training iteration across all problems, for Constant 0.1, Harmonic, 1/n, and Ryzhov step sizes]
AGGREGATE PERFORMANCE
Overview
• Motivation
• Model
• Algorithms
• Methodology
• Data
• Results
• Further Exploration
FURTHER EXPLORATION
• Additional renewable energy sources / storage devices
• Finer levels of discretization
• Remove the wind usage restriction
• Additional algorithms
  • Actor-Critic
  • Gradient descent (linear/non-linear VFAs)
Questions?