Slide 3
Overview
Formulates the placement of charging stations as a reinforcement learning problem (PCRL)
Data Sources → Utility Model → RL Environment
Slide 4
Implementation Details
Score for charging plans
• A charging plan 𝑝 on 𝐺
• Road Network, 𝐺 = (𝑉,𝐸)
• 𝑉 = the road network junctions
• 𝐸 = the directed roads (edges) connecting the junctions
• A charging station 𝑠 is defined as a tuple 𝑠 = (𝑣,𝑡)
• 𝑣 = the location node
• 𝑡 = (𝑡1, ..., 𝑡𝑚 ), the charger configuration over the 𝑚 charger types
• 𝑡𝑖 ∈ N = the number of chargers of type 𝑖 at 𝑠
• A charging plan (CP) 𝑝 ⊆ 𝑆 is a subset of all possible charging stations (CS)
• Denote the set of all possible charging plans as 𝑃
Slide 5
Related Work
Electric vehicle charging station placement and related optimization algorithms
• NP-hard
• Minimize the drivers’ discomfort
• Maximize the gathered revenue
• Cover the entire demand region while minimizing
drivers’ travel times and the total number of
charging stations by adopting a genetic algorithm
• Minimize CO₂ emissions
• Considered optimizing the number of chargers per station
• Greedy search
• Deep RL
Slide 6
Implementation Details
Score for charging plans
• The benefit (𝑃 → R) of a CP 𝑝 quantifies the positive effect of plan 𝑝 on satisfying the charging demand.
• The cost (𝑃 → R) of 𝑝 quantifies the effort required to realize 𝑝.
• 𝜆 ∈ [0, 1] is a weighting parameter to trade off 𝑏𝑒𝑛𝑒𝑓𝑖𝑡(𝑝) and 𝑐𝑜𝑠𝑡(𝑝)
• Charging plans (CP)
• Charging stations (CS)
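The benefit/cost trade-off can be sketched as a convex combination; the exact scoring formula in the original work may differ, so the signature below is illustrative:

```python
def score(p, benefit, cost, lam=0.5):
    """Utility score of a charging plan p (sketch).

    benefit, cost: callables P -> R as defined above; lam is the
    weighting parameter lambda in [0, 1]. The exact combination used
    in the paper may differ from this convex form.
    """
    return lam * benefit(p) - (1.0 - lam) * cost(p)
```

With lam = 0.5 the benefit and cost terms are weighted equally, matching the evaluation setting later in the deck.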
Slide 7
Implementation Details
Purpose of RL
• Find the optimal charging plan
• Maximize the overall utility by trading off 𝑏𝑒𝑛𝑒𝑓𝑖𝑡(𝑝)
and 𝑐𝑜𝑠𝑡(𝑝).
Slide 8
Implementation Details
Premises for 𝑏𝑒𝑛𝑒𝑓𝑖𝑡
• A CS with a higher capacity should result in a higher benefit, since it can provide service for more vehicles.
• Existing nearby CS should decrease the benefit of a new CS
since existing demands are already partially satisfied.
Slide 9
Implementation Details
Premises for 𝑏𝑒𝑛𝑒𝑓𝑖𝑡
• Charging station capacity
• Let 𝑠 = (𝑣,𝑡) be a CS with 𝑚 charger types
• 𝑡 = (𝑡1, ..., 𝑡𝑚 ) ∈ N𝑚
• 𝐶(𝑠) = ∑𝑖=1..𝑚 𝑡𝑖𝑐𝑖
• The capacity 𝐶(𝑠) of a CS 𝑠 is the sum of the charging power of its chargers
• 𝑐𝑖 : the available charging power of a charger of type 𝑖
• Influential radius & coverage
• 𝑟max : The maximal influential radius
• CS 𝑠 with scaled-down and dimensionless capacity 𝐶̃(𝑠)
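The capacity sum is straightforward to compute; a minimal sketch follows (the scaling used to obtain the dimensionless 𝐶̃(𝑠) is not specified on this slide and is therefore omitted):

```python
def capacity(t, c):
    """C(s) = sum_{i=1}^{m} t_i * c_i: total charging power of a CS.

    t: numbers of chargers per type (t_1, ..., t_m)
    c: available charging power per charger type (c_1, ..., c_m)
    """
    return sum(t_i * c_i for t_i, c_i in zip(t, c))
```

For example, two 7 kW chargers and one 22 kW charger give a capacity of 36 kW.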
Slide 10
Implementation Details
𝑏𝑒𝑛𝑒𝑓𝑖𝑡
• The coverage cov(𝑣) of a vertex 𝑣 is the number of CS whose influential radius contains 𝑣: cov(𝑣) = |{𝑠 ∈ 𝑝 | 𝑑(𝑣,𝑠) ≤ 𝑟(𝑠)}|
• 𝑑(𝑣,𝑠) denotes the haversine distance between the CS 𝑠 and the junction 𝑣
• The coverage of a node describes how well it is supplied with charging stations.
• The coverage cov(𝑣) is less critical in areas with many opportunities for home charging, e.g. CS in private garages.
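A minimal sketch of the coverage computation using the haversine distance named above; the station tuple layout and the radius function are illustrative assumptions:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres.
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def coverage(v, plan, radius):
    """cov(v): number of stations in the plan whose influential radius
    contains junction v. Here v and the stations are (lat, lon) pairs and
    radius(s) returns the influential radius of station s in km."""
    return sum(1 for s in plan if haversine_km(*v, *s) <= radius(s))
```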
Slide 12
Implementation Details
Premises for cost
• Formulate the cost of a CP 𝑝 based on the expected required
time to charge a vehicle.
• Consider travel time, charging time, and waiting time (if all
chargers are occupied)
• Travel time considers …
• The distances between the junctions of the road network and the
respective nearest CS
• The charging demand at the individual junctions
• 𝑑𝑒𝑚(𝑣) ∈ [0, 1] : the charging demand of a junction
• weakened demand : The charging demand is partly satisfied by the home charging
infrastructure
Slide 13
Implementation Details
Cost : Travel time
• 𝑑𝑖𝑠𝑡(𝑠,𝑣) : the haversine distance
• V : the average velocity in town
• 𝑑𝑒𝑚(𝑣) ∈ [0, 1] : the charging demand of a junction
• weakened demand : The charging demand is partly satisfied
by the home charging infrastructure
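Combining the quantities on this slide, the demand-weighted travel time can be sketched as follows; the function signatures are assumptions, not the paper's API:

```python
def travel_time(plan, junctions, dem, dist, V=30.0):
    """Expected total travel time (sketch): each junction v contributes
    its demand dem(v) times the time to reach the nearest CS, at the
    average in-town velocity V (km/h)."""
    total = 0.0
    for v in junctions:
        nearest_km = min(dist(v, s) for s in plan)  # distance to the closest CS
        total += dem(v) * nearest_km / V
    return total
```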
Slide 14
Implementation Details
Cost : Charging Time
• 𝐷(𝑠) estimates the number of vehicles approaching a CS 𝑠
• Service rate 𝜇(𝑠)
• 𝐸 is the energy required to fully charge a single vehicle
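Assuming the service rate follows from station capacity and per-vehicle energy (μ(s) = C(s)/E — an assumption; the paper's exact definition may differ), a sketch:

```python
def service_rate(capacity_kw, energy_kwh=85.0):
    # mu(s): vehicles a station can fully charge per hour, assuming
    # service rate = capacity C(s) divided by per-vehicle energy E.
    return capacity_kw / energy_kwh

def charging_time(demand, capacity_kw, energy_kwh=85.0):
    # Expected total charging time: D(s) vehicles, each taking E / C(s) hours.
    return demand * energy_kwh / capacity_kw
```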
Slide 15
Implementation Details
Cost : Waiting Time
• The Pollaczek–Khinchine formula states a relationship between the queue length and the service-time distribution
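For an M/G/1 queue, the Pollaczek–Khinchine formula gives the mean waiting time as W = λ·E[S²] / (2(1 − ρ)) with utilization ρ = λ·E[S]; a direct transcription:

```python
def pk_waiting_time(arrival_rate, mean_service, service_var):
    """Mean waiting time in an M/G/1 queue via the Pollaczek-Khinchine
    formula: W = lambda * E[S^2] / (2 * (1 - rho)), rho = lambda * E[S]."""
    rho = arrival_rate * mean_service
    if rho >= 1.0:
        return float('inf')  # unstable queue: arrivals exceed service capacity
    second_moment = service_var + mean_service ** 2  # E[S^2] = Var[S] + E[S]^2
    return arrival_rate * second_moment / (2.0 * (1.0 - rho))
```

With deterministic service (variance 0), half-loaded servers wait half a service time on average.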
Slide 18
Constraints & Optimization
Optimization
• Installation costs
• The financial cost of installing new CS enters the budget constraint
• Estate cost for a charging station 𝑠 at its junction 𝑣
• Charger cost for the 𝑚 installed chargers
Slide 19
REINFORCEMENT LEARNING
RL Problem Formulation
Observations
• Captures the intermediate placement of the CS
• Consists of …
• The current CP
• The latitude and longitude coordinates
• The resulting charging demand 𝑑𝑒𝑚(𝑣)
• The private charging infrastructure 𝑝𝑟𝑖𝑣(𝑣)
• The estate price estate-cost(𝑣) for each node 𝑣 ∈ 𝑉
Actions
• Action 𝑎𝑖 ∈ {Create by benefit, Create by demand, Increase by benefit, Increase by demand,
Relocate}
• The agent chooses the action from the set of discrete actions and can either
• Create a new CS
• Increase the number of chargers at an existing CS
• Relocate a charger from one CS to another
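The five discrete actions can be captured in an enum; the names mirror the bullet list above (a sketch, not the paper's code):

```python
from enum import Enum

class Action(Enum):
    CREATE_BY_BENEFIT = 0    # create a new CS at the highest-benefit node
    CREATE_BY_DEMAND = 1     # create a new CS at the highest-demand node
    INCREASE_BY_BENEFIT = 2  # add a charger at the highest-benefit existing CS
    INCREASE_BY_DEMAND = 3   # add a charger at the highest-demand existing CS
    RELOCATE = 4             # move a charger from one CS to another
```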
Slide 20
REINFORCEMENT LEARNING
RL Problem Formulation
Rewards
• Calculate the reward based on the proposed utility function
• Compare the current CP 𝑝 with the CP resulting from action 𝑎𝑖 using the 𝑠𝑐𝑜𝑟𝑒(𝑝)
function
• The initial score is zero if we start with an empty CP.
• If we extend the existing charging infrastructure, we initiate the training with the score of
the original CP.
• Use the score difference to obtain the reward function
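The reward logic described above reduces to a per-step score difference; a minimal sketch with an illustrative environment wrapper (score_fn and apply_action are hypothetical names):

```python
class ChargingEnv:
    """Minimal sketch of the reward logic: the agent is rewarded by the
    score difference between the plan before and after its action."""

    def __init__(self, score_fn, initial_plan=None):
        self.score_fn = score_fn
        # Empty CP -> initial score 0; an existing infrastructure
        # initializes training with the score of the original CP.
        self.plan = initial_plan or []
        self.prev_score = score_fn(self.plan)

    def step(self, apply_action):
        self.plan = apply_action(self.plan)
        new_score = self.score_fn(self.plan)
        reward = new_score - self.prev_score  # reward = score difference
        self.prev_score = new_score
        return reward
```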
Slide 21
REINFORCEMENT LEARNING
RL Implementation
Deep Q-learning
• Combines Q-learning with a convolutional neural network
• Trained via stochastic gradient descent with an experience replay mechanism
• Addresses data correlation and non-stationary distributions
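The experience replay mechanism mentioned above can be sketched as a fixed-size buffer sampled uniformly at random:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay (sketch): store transitions and sample random
    minibatches to break the correlation between consecutive observations."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform random minibatch for a stochastic-gradient-descent update.
        return random.sample(self.buffer, batch_size)
```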
Parameters
• 𝛼 = 0.4 (weighting parameter for cost)
• 𝑚 = 3 (number of charger types)
• 𝐾 = 8 (total number of chargers per CS)
• 𝐵 = €5 Mio. (installation budget)
• 𝐸 = 85 kWh (energy per full vehicle charge)
• 𝑟max = 1 km (maximal influential radius)
• 𝑤 = 0.1 (weighting parameter for benefit)
• 𝜆 = 0.5 (prioritizes the cost and benefit functions equally)
Slide 22
EVALUATION SETUP
Parameters
Charging power
• The charging powers 𝑐1, 𝑐2, 𝑐3 of the three charger types are 7 kW, 22 kW, and 50 kW
• The respective costs are €300, €750, and €28,000
• Prices are based on the usual manufacturer prices
Slide 23
EVALUATION SETUP
Baselines
Existing Charging
• Status quo of the existing charging infrastructure
Bounding & Optimizing
• Bounding & Optimizing Based Greedy
Bounding & Optimizing +
• Replace the unbounded knapsack algorithm with the same algorithm as in the
PCRL action space to get an initial charging configuration.
Best Benefit
• Always chooses the junction with the highest possible 𝑏𝑒𝑛𝑒𝑓𝑖𝑡 as the next CS location
Highest Demand
• Places the next CS at the junction with the highest demand
Charger-based Greedy
• A fast charger-based greedy algorithm
• Always chooses the action that increases the utility score the most
Slide 24
EVALUATION SETUP
Datasets
• Two German cities : Hannover, Dresden
Charging Demand
• Divide the datasets into 32 x 32 grid cells
• Count the number of trips ending in individual grid cells for the period.
• A high number of trips ending in a cell is expected to indicate a high charging demand
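The grid-based demand estimation can be sketched as a simple 2-D histogram over trip destinations (the bounding-box handling below is an assumption):

```python
def demand_grid(trip_endpoints, bounds, n=32):
    """Count trip destinations per cell of an n x n grid over the city
    bounding box (sketch; bounds = (min_lat, min_lon, max_lat, max_lon))."""
    min_lat, min_lon, max_lat, max_lon = bounds
    grid = [[0] * n for _ in range(n)]
    for lat, lon in trip_endpoints:
        # Clamp the top edge of the box into the last cell.
        i = min(int((lat - min_lat) / (max_lat - min_lat) * n), n - 1)
        j = min(int((lon - min_lon) / (max_lon - min_lon) * n), n - 1)
        grid[i][j] += 1
    return grid
```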
Charging infrastructure
• Open Charge Map portal
• Provides CS information including location, the total number of ports, and the port types
Private (home) charging
• How much of the city area is used as residential land (OpenStreetMap)
Estate price
• Extract the estate price data from the city municipality’s rent index 2021
Slide 25
EVALUATION SETUP
Evaluation
Metrics
Score (𝑠𝑐𝑜𝑟𝑒)
• The overall score of the CP according to the utility
Benefit (𝑏𝑒𝑛𝑒𝑓𝑖𝑡)
• Quantifies the potential usefulness of the respective model (higher is better)
Waiting Time (𝑤𝑎𝑖𝑡)
• Waiting times occurring at all CS in the road network (lower is better)
Travel Time (𝑡𝑟𝑎𝑣𝑒𝑙)
• Sum of travelling times within the road network (lower is better)
Charging Time (𝑐h𝑎𝑟𝑔𝑖𝑛𝑔)
• Sum of charging times within the road network (lower is better)
Maximum Travel Time (𝑡𝑟𝑎𝑣𝑒𝑙max)
• The maximum time it takes to reach a CS from any junction in the road network (lower is better)
Maximum Waiting Time (𝑤𝑎𝑖𝑡max)
• The maximum time spent waiting at any CS in the road network (lower is better)