- 1. 1/26 Factored MDPs for Optimal Prosumer Decision-Making Angelos Angelidakis aggelos@intelligence.tuc.gr Georgios Chalkiadakis gehalk@intelligence.tuc.gr School of Electronic and Computer Engineering Technical University of Crete Angelos Angelidakis & Georgios Chalkiadakis Factored MDPs for Optimal Prosumer Decision-Making
- 2. Outline
     1. Introduction
     2. Background
     3. Our Model
     4. Solving the Factored MDP
     5. Prosumer Production and Consumption Models
     6. Experiments and Results
- 3. Prosumer
     - Produces and consumes energy
     - Can be a single residence, an industry, or a neighbourhood
     - Connected to the electric Grid (or not)
     - Plays a key role in the stabilization of the electricity network
- 4. What we do in this paper
     - Focus on micro-grid prosumers, encompassing, e.g., wind turbine generators (WTG), photovoltaic systems (PVS), batteries and household neighbourhoods
     - Optimize prosumer operation decisions, while ensuring consumer needs are satisfied:
       - buy and sell energy from/to utility companies
       - store energy
       - select electricity tariffs to subscribe to
- 5. Key concepts and contributions
     - A complete framework for microgrid-prosumer decision making:
       - A Factored Markov Decision Process to model the prosumer decision problem, 24 hours ahead
       - Exact optimal solution, works for a microgrid of any size
       - Consumption and production-predicting submodels
       - Test on a real-world dataset
       - Comparison with SPUDD, a robust method for stochastic planning in large environments
- 7. Stochastic Planning Using Decision Diagrams (SPUDD)
     - Finds (near-)optimal policies in very large problems
     - Combines value iteration with algebraic decision diagrams
     - In our problem, SPUDD:
       - produces policies that coincide with ours
       - but cannot solve the problem within the required 24 hours
       - operates over an input script which can grow large
- 8. Outline
     1. Introduction
     2. Background
     3. Our Model
        - FMDPs
        - Factored Representation
        - Physical Constraints
        - Transition Function
        - Factored Reward Representation
     4. Solving the Factored MDP
     5. Prosumer Production and Consumption Models
     6. Experiments and Results
- 9. Factored Markov Decision Processes (FMDPs)
     - A compact alternative to the standard MDP representation
     - The set of states corresponds to multivariate random variables, $s = \{s_i\}$, with $s_i \in \mathrm{DOM}(s_i)$
     - The reward functions used are assumed to be factored into specific components
     - FMDPs allow for external signals affecting state variables
     - Various solution methods exist [1], e.g.:
       - linear value functions
       - approximate linear programming
       - SPUDD
     - [1] Guestrin, Carlos, et al. "Efficient solution algorithms for factored MDPs." Journal of Artificial Intelligence Research, 2003; Hoey, Jesse, et al. "SPUDD: Stochastic planning using decision diagrams." Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999.
- 10. A Factored Representation of our model
      - States
        - Hour of day, $\mathrm{DOM}(tms) = \{1, \dots, 24\}$
        - Energy stored in the batteries, $\mathrm{DOM}(bat) = \{0, \dots, Battery_{max}\}$
        - Tariff the prosumer has subscribed to, $\mathrm{DOM}(tf) = \{tf_1, \dots, tf_K\}$
      - Actions
        - Buy energy, $\mathrm{DOM}(buy) = \{-RES_{nom}, \dots, Load_{max}\}$
        - Charge batteries, $\mathrm{DOM}(chg) = \{-Battery_{max}, \dots, Battery_{max}\}$
        - Tariff selection by the prosumer, $\mathrm{DOM}(seltf) = \{0, \dots, K\}$
      - External signals
        - Available price tariffs: buying and selling prices provided by multiple utility companies, for each hour of the day
        - Predicted production, $\mathrm{DOM}(prod) = \{0, \dots, RES_{nom}\}$
        - Predicted consumption, $\mathrm{DOM}(cons) = \{0, \dots, Load_{max}\}$
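As a rough sketch of how these domains could be laid out in code: the names `State`, `Action`, `state_domain` and `action_domain` are hypothetical, and unit steps are assumed for every integer range. Nothing here is taken from the authors' implementation.

```python
from typing import List, NamedTuple


class State(NamedTuple):
    tms: int  # hour of day, 1..24
    bat: int  # energy stored in the batteries, 0..battery_max
    tf: int   # index of the subscribed tariff, 1..K


class Action(NamedTuple):
    buy: int    # energy bought (negative values mean selling)
    chg: int    # energy charged (negative values mean discharging)
    seltf: int  # tariff selection, 0..K


def state_domain(battery_max: int, n_tariffs: int) -> List[State]:
    """Enumerate every state, assuming unit steps (an assumption of this sketch)."""
    return [
        State(t, b, k)
        for t in range(1, 25)
        for b in range(battery_max + 1)
        for k in range(1, n_tariffs + 1)
    ]


def action_domain(res_nom: int, load_max: int, battery_max: int, n_tariffs: int) -> List[Action]:
    """Enumerate every action before any physical constraint is applied."""
    return [
        Action(buy, chg, sel)
        for buy in range(-res_nom, load_max + 1)
        for chg in range(-battery_max, battery_max + 1)
        for sel in range(n_tariffs + 1)
    ]
```

The external signals (tariff prices, predicted production and consumption) are left out on purpose, since they are inputs to the model rather than state variables or actions.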
- 11. Physical Constraints
      - The electricity energy balance must be maintained: $prod_t - cons_t - chg_t + buy_t = 0$
      - The storage unit cannot be charged over its capacity: $chg_t \le Battery_{max} - bat_t$
      - The energy quantity discharged cannot exceed the current quantity stored: $-chg_t \le bat_t$
      - The state of charge must be between 20% and 100% [2]: $0.2 \le \frac{bat_t}{Battery_{max}} \le 1$
      - [2] Chiasson, John, and Baskar Vairamohan. "Estimating the state of charge of a battery." IEEE Transactions on Control Systems Technology, 2005.
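The four constraints can be checked mechanically. The sketch below is illustrative only: the function names `required_buy` and `is_feasible` are hypothetical, and the state-of-charge bound is applied to the current battery level, as the slide writes it.

```python
def required_buy(prod: float, cons: float, chg: float) -> float:
    """Energy to buy (negative = sell) implied by prod - cons - chg + buy = 0."""
    return cons + chg - prod


def is_feasible(chg: float, bat: float, battery_max: float, soc_min: float = 0.2) -> bool:
    """Check the storage constraints for a charge action chg in battery state bat."""
    # The storage unit cannot be charged over its capacity.
    if chg > battery_max - bat:
        return False
    # The energy discharged cannot exceed the energy currently stored.
    if -chg > bat:
        return False
    # The state of charge must lie between soc_min (20%) and 100%.
    soc = bat / battery_max
    return soc_min <= soc <= 1.0
```

Because the balance equation fixes `buy` once `chg` is chosen, only the charge action needs a feasibility test in this sketch.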
- 12. Transition Function
      - Stochastic state transitions in our model:
        - successful charge (store $c$) with probability $p$: $\Pr(bat_{t+1} = bat_t + c \mid chg_t = c, bat_t) = p$
        - unsuccessful charge (store $c$) with probability $1 - p$: $\Pr(bat_{t+1} = bat' \in bound_{bat} \mid chg_t = c, bat_t) = (1 - p)/N$
        - the tariff is affected by the tariff selection action $seltf_1, \dots, seltf_K$
      - Overall transition probability: $\Pr(tms_{t+1}, bat_{t+1}, tf_{t+1} \mid tms_t, bat_t, tf_t, chg_t, seltf_t) = \Pr(bat_{t+1} \mid bat_t, chg_t) \cdot \Pr(tf_{t+1} \mid tf_t, seltf_t)$
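A minimal sketch of this factored transition follows. Several details are assumptions that the slides do not confirm: the bounded region is passed in as an explicit list of alternative battery levels and excludes the intended target, tariff selection is treated as deterministic with `seltf = 0` meaning "keep the current tariff", and the hour of day wraps from 24 back to 1. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def battery_transition(bat: int, chg: int, p: float, bounded_region: List[int]) -> Dict[int, float]:
    """Distribution over the next battery level for charge action chg."""
    target = bat + chg
    # Assumption: the N alternative outcomes exclude the intended target.
    alternatives = [b for b in bounded_region if b != target]
    if not alternatives:
        return {target: 1.0}
    dist = {target: p}
    for b in alternatives:
        dist[b] = (1.0 - p) / len(alternatives)
    return dist


def tariff_transition(tf: int, seltf: int) -> Dict[int, float]:
    """Assumed deterministic: seltf == 0 keeps the tariff, otherwise switch to seltf."""
    return {(tf if seltf == 0 else seltf): 1.0}


def joint_transition(tms: int, bat: int, tf: int, chg: int, seltf: int,
                     p: float, bounded_region: List[int]) -> Dict[Tuple[int, int, int], float]:
    """Product of the per-variable distributions, as in the factored form."""
    next_tms = tms % 24 + 1  # hour of day advances deterministically
    joint = {}
    for b, pb in battery_transition(bat, chg, p, bounded_region).items():
        for k, pk in tariff_transition(tf, seltf).items():
            joint[(next_tms, b, k)] = pb * pk
    return joint
```

Keeping each factor as its own small dictionary is what makes the representation compact: the joint distribution is only materialized for the handful of reachable successor states.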
- 13. Factored Reward Representation
      - Our rewards correspond to costs: $Cost(s_t, a_t, s_{t+1}) = C_{energy} + C_{period} + C_{bl}$
        - $C_{energy}$: cost per Wh for buying electricity
        - $C_{period}$: periodic subscription cost of the tariff
        - $C_{bl}$: cost associated with battery life losses
- 14. $C_{energy}$ and $C_{period}$
      - $C_{energy}$, cost per Wh for buying and selling electricity: $C_{energy}(tf_{t+1}, buy_t) = \begin{cases} buy_t \cdot buying_{tf_{t+1}} & \text{if } buy_t \ge 0 \\ buy_t \cdot selling_{tf_{t+1}} & \text{if } buy_t < 0 \end{cases}$
      - $C_{period}$, periodic cost of the tariff [3]: $C_{period}(tf_{t+1}, price^{t+1}_{tf}) = C_1 \exp\{-C_2 \cdot (buying^{t+1}_{tf} - selling^{t+1}_{tf})\}$
      - [Figure: periodic cost as a function of buying price minus selling price]
      - [3] http://www.eia.gov/state/search/#?1=102&3=21&a=true&2=211
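The two tariff-dependent cost terms translate directly into code. In this sketch the function names are hypothetical and the constants `c1` and `c2` are left as parameters, since the slides do not give their values.

```python
import math


def energy_cost(buy: float, buying_price: float, selling_price: float) -> float:
    """C_energy: buy >= 0 is priced at the buying price, buy < 0 at the selling price."""
    price = buying_price if buy >= 0 else selling_price
    return buy * price


def periodic_cost(buying_price: float, selling_price: float, c1: float, c2: float) -> float:
    """C_period = C1 * exp(-C2 * (buying - selling)); c1 and c2 are unspecified constants."""
    return c1 * math.exp(-c2 * (buying_price - selling_price))
```

Selling yields a negative `energy_cost`, which is income under the cost convention. For positive `c2`, the periodic cost falls as the gap between buying and selling price widens.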
- 15. $C_{bl}$, costs associated with battery life losses
      - $C_{bl} = L_{loss} \cdot C_{init\text{-}bat}$, with $C_{init\text{-}bat}$ the initial investment cost for the batteries
      - $L_{loss} = \frac{A_c}{A_{total}}$, with $A_c$ the battery effective throughput and $A_{total}$ the total cumulative throughput [4]
      - [4] A battery of size $Q$ Ah will deliver an effective $A_{total} = 390 \cdot Q$ Ah over its lifetime
- 16. Effective throughput $A_c$
      - $A_c$ is then expressed as $A_c = \lambda_{soc} \cdot A'_c$, where $A'_c$ is the actual throughput
      - $\lambda_{soc}$ is an effective weighting factor: $\lambda_{soc} = k \cdot SOC + d$
      - State of charge of the battery: $SOC = \frac{bat_t}{Battery_{max}}$
      - Actual throughput: $A'_c = \frac{chg_t}{V_{battery}}$
      - [Figure: empirical datapoints $(SOC, \lambda_{soc})$ and the fitted line $\lambda_{soc} = k \cdot SOC + d$]
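Chaining the battery-life formulas gives a single cost function, sketched below under stated assumptions. The slope `k` and intercept `d` of the fitted line are not given in the slides, so they are parameters here. Taking the absolute value of the charge is an assumption of this sketch, made so that discharging also counts as throughput. The name `battery_life_cost` is hypothetical.

```python
def battery_life_cost(chg_wh: float, bat: float, battery_max: float,
                      v_battery: float, q_ah: float, c_init_bat: float,
                      k: float, d: float) -> float:
    """C_bl = L_loss * C_init_bat, with L_loss = A_c / A_total."""
    soc = bat / battery_max                # state of charge in [0, 1]
    lam_soc = k * soc + d                  # effective weighting factor (fitted line)
    actual_ah = abs(chg_wh) / v_battery    # actual throughput in Ah (abs is an assumption)
    effective_ah = lam_soc * actual_ah     # effective throughput A_c
    a_total = 390.0 * q_ah                 # total cumulative throughput over the lifetime
    return (effective_ah / a_total) * c_init_bat
```

With `k = 0` and `d = 1` the weighting factor is 1, so the cost reduces to the plain ratio of actual to lifetime throughput times the investment cost.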
- 18. Solving the Factored MDP
      - Algorithm:
        1. For all instantiations of $s$, set $V_{T+1}(s) = 0$.
        2. For all time-steps $t$ in descending order (i.e., with $1, \dots, T$ stages-to-go) and all instantiations of $s_t$: $V_t(s_t) \leftarrow \max_{a_t} \sum_{s_{t+1}} \Pr(s_{t+1} \mid a_t, s_t) \cdot \big[ R(s_t, a_t, s_{t+1}) + V_{t+1}(s_{t+1}) \big]$
        3. For all instantiations of $s$ and all time-steps $t$: $\pi(s, t) = \arg\max_{a} \sum_{s'} \Pr(s' \mid a, s) \, \big( R(s, a, s') + V_{t+1}(s') \big)$
      - Value Iteration operating on a finite-horizon problem provides the optimal solution for a prosumer of any size within the required time
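The backward pass can be sketched as a generic finite-horizon value iteration. This is an illustrative implementation, not the authors' code: the callables `actions`, `transition` and `reward` are hypothetical interfaces, and the toy problem in the usage note is invented purely to exercise the function.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

S = Hashable
A = Hashable


def finite_horizon_vi(
    states: List[S],
    actions: Callable[[S], Iterable[A]],
    transition: Callable[[S, A], Dict[S, float]],
    reward: Callable[[S, A, S], float],
    horizon: int,
) -> Tuple[Dict[int, Dict[S, float]], Dict[Tuple[S, int], A]]:
    """Backward induction over t = horizon, ..., 1 with V_{T+1}(s) = 0."""
    V = {horizon + 1: {s: 0.0 for s in states}}
    policy = {}
    for t in range(horizon, 0, -1):
        V[t] = {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions(s):
                # Expected immediate reward plus value of the successor state.
                q = sum(pr * (reward(s, a, s2) + V[t + 1][s2])
                        for s2, pr in transition(s, a).items())
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s] = best_q
            policy[(s, t)] = best_a
    return V, policy
```

As a usage example, take a toy battery with levels 0..2, actions `-1`, `0`, `+1` restricted to feasible moves, deterministic transitions `{s + a: 1.0}` and reward `-a` (discharging earns 1, charging costs 1). With `horizon=2`, the function returns `V[1][2] == 2.0` and `policy[(2, 1)] == -1`, i.e. discharge in both steps. Since the slides treat rewards as costs, a cost function would be passed in negated.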
- 19. Outline
      1. Introduction
      2. Background
      3. Our Model
      4. Solving the Factored MDP
      5. Prosumer Production and Consumption Models
         - Production Prediction
         - Consumption Prediction
      6. Experiments and Results
- 20. Production Prediction
      - RENES: a web-based PVS and WTG production prediction tool
      - Employs free-of-charge weather forecasts
      - Developed in our lab [5]
      - [5] http://www.intelligence.tuc.gr/renes/
- 21. Consumption prediction on real household data
      - MSE of Bayesian linear regression with polynomial Φ (basis) functions:

      | Polynomial degree | MSE |
      |---|---|
      | 1 | 0.022372 |
      | 2 | 0.021312 |
      | 3 | 0.020175 |
      | 4 | 0.017679 |
      | 5 | 0.016861 |
      | 6 | 0.017329 |
      | 7 | 0.017355 |
      | 8 | 0.017167 |
      | 9 | 0.017399 |
      | 10 | 0.017611 |

      - MSE of GP and Bayesian linear regression:

      | Method | MSE |
      |---|---|
      | GP with polynomial kernel (GP-poly) | 0.0173 |
      | GP with Gaussian kernel (GP-G) | 0.006943 |
      | Bayesian linear regression (BLR) | 0.0169 |

      - [Figure: consumption (kWh) over time, showing the training points, the test points, the variance of the trained area, and the GP-poly, GP-G and BLR predictions]
- 23. Experiments and Results
      - 30 households in New Hampshire
      - 20 PV modules with nominal power 60 kW per module
      - 2 wind turbines with nominal power 1000 kW each
      - 24 deep-cycle 12 V batteries, 212 Ah C20 / FMD200 (VRLA/AGM), costing €269.00 each; battery lifetime: 10-12 years
      - [Figure: RES production and load over time (kWh)]
- 24. Actions and States
      - Battery capacity: bat = [0 kWh : 1 kWh : 60 kWh]
      - Charge action: chg = [-60 kWh : 1 kWh : 60 kWh]
      - Tariffs:

      | Tariff | Buy | Sell |
      |---|---|---|
      | 1 | 0.1 | 0.1 |
      | 2 | 0.1 | 0.2 |
      | 3 | 0.1 | 0.3 |
      | 4 | 0.2 | 0.1 |
      | 5 | 0.2 | 0.2 |
      | 6 | 0.2 | 0.3 |
      | 7 | 0.4 | 0.1 |
      | 8 | 0.3 | 0.2 |
      | 9 | 0.3 | 0.3 |

      - Transition boundaries:
        - $boundary_{bat}$ = 1 kWh
        - $boundary_{tf}$ = 0.1 €
        - the maximum number of transitions is about 15
- 25. Results VI-SPUDD
      - Both SPUDD and our method compute the same (optimal) policies. However, their computation times differ (all times in hours):

      | Horizon | Size of S × A | Bounded region size | Our method | SPUDD script generation | SPUDD execution time | SPUDD total time |
      |---|---|---|---|---|---|---|
      | 24 | 664290 | 15 | 1.76 | 13.4992 | 0.184 | 13.6832 |
      | 24 | 664290 | 90 | 15.84 | 46.9188 | 1.19 | 48.1088 |
      | 24 | 2624490 | 15 | 8.7603 | 36.98 | 0.73975 | 37.71975 |
      | 48 | 664290 | 15 | 3.5 | 16.8221 | 0.4271 | 17.2492 |
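The slides do not spell out how the size of S × A is counted. One decomposition that happens to reproduce both reported figures multiplies the battery levels, the tariffs, the charge actions and the tariff selections, leaving out the hour of day and the buy action (which the energy balance fixes once the charge is chosen). This is an inference, not something the source states. The 120 kWh capacity used for the larger instance is likewise a guess that fits the number. The names below are hypothetical; the timing values are copied from the results table.

```python
def state_action_count(battery_max_kwh: int, n_tariffs: int) -> int:
    """Assumed count: |bat| * |tf| * |chg| * |seltf| with 1 kWh steps."""
    n_bat = battery_max_kwh + 1        # levels 0..battery_max
    n_chg = 2 * battery_max_kwh + 1    # actions -battery_max..battery_max
    n_seltf = n_tariffs + 1            # selections 0..K
    return n_bat * n_tariffs * n_chg * n_seltf


# (horizon, |S x A|, bounded region) -> (our method, SPUDD total), in hours.
TIMINGS = {
    (24, 664290, 15): (1.76, 13.6832),
    (24, 664290, 90): (15.84, 48.1088),
    (24, 2624490, 15): (8.7603, 37.71975),
    (48, 664290, 15): (3.5, 17.2492),
}


def speedups() -> dict:
    """Ratio of SPUDD total time to the value-iteration time, per experiment."""
    return {key: spudd / ours for key, (ours, spudd) in TIMINGS.items()}
```

Under these assumptions, `state_action_count(60, 9)` gives 664290 and `state_action_count(120, 9)` gives 2624490. The ratios from `speedups()` range from roughly 3 to roughly 8 across the four experiments.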
- 27. Wrapping-Up
      - A complete framework for optimal microgrid-prosumer decision-making
      - Simple yet effective solution method
      - Tested on a real-world dataset
      - Vastly outperforms a known stochastic model (SPUDD) in terms of solution computation time
      - In progress: test alternative methods [6] and develop novel techniques for tackling large scale problems
      - Thank you, any questions?
      - [6] Munos, Remi, and Csaba Szepesvari. "Finite-time bounds for fitted value iteration." The Journal of Machine Learning Research, 2008; Guestrin, Carlos, et al. "Efficient solution algorithms for factored MDPs." Journal of Artificial Intelligence Research, 2003.