
Journal of Building Engineering 73 (2023) 106805
Available online 8 May 2023
2352-7102/© 2023 Elsevier Ltd. All rights reserved.
Indoor temperature preference setting control method for thermal
comfort and energy saving based on reinforcement learning
Wei Li a,b,*, Yifan Zhao a, Jili Zhang c, Changwei Jiang a, Siyu Chen a, Liangxi Lin a, Yuegui Wang a
a School of Energy and Power Engineering, Changsha University of Science and Technology, Changsha, 410114, China
b State Key Laboratory of Green Building in Western China, Xian University of Architecture & Technology, Xi’an, 710055, China
c Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian, 116024, China
ARTICLE INFO
Keywords:
Indoor temperature
Thermal comfort
Energy consumption
Smart control
Reinforcement learning
ABSTRACT
The optimal control of heating, ventilation, and air-conditioning (HVAC) systems affects both occupants’
living conditions and building energy consumption. Many studies have introduced personal thermal sensation
into air-conditioning control to reduce energy consumption while improving the indoor thermal environment.
However, these methods have two limitations: the temperature setting must be adjusted multiple times before
the room thermal environment meets personal thermal comfort needs, and the impact of the room temperature
set value on air-conditioning energy consumption is often ignored. This study proposes an indoor temperature
preference setting control method (ITPSCM) for thermal comfort and energy saving based on reinforcement
learning (RL), which uses personal thermal comfort and the energy-saving effect of the system as the reward
function. Q-learning is employed to obtain optimized room temperature settings that meet users’ temperature
preferences. Several comparative experiments are carried out in an indoor environment control testbed, and
the experimental data are used to train the RL algorithm. The results show that, compared with the
temperature set value-based control method, the ITPSCM can realize an indoor thermal environment that meets
personal thermal comfort preferences in the early stages of air-conditioning system operation. With thermal
comfort alone as the reward, 22.34% of daily energy consumption can be saved; when both the thermal comfort
reward and the energy-saving effect reward are considered, 26.48% of daily energy consumption can be saved.
1. Introduction
As people spend about 90% of their time indoors [1], the indoor conditions of buildings are an essential part of people’s living
environment, and buildings account for about 40% of global energy consumption [2]. Heating, ventilation, and air-conditioning
(HVAC) systems, which are mainly responsible for providing a satisfactory thermal environment in buildings, are the major energy
consumers within buildings, accounting for 40–60% of the energy demand [3]. Therefore, thermal comfort and HVAC system energy
consumption are the primary concerns in building smart artificial environments and intelligent HVAC systems.
In fact, most buildings use the static temperature set value-based control method for air-conditioning systems, which requires
managers to continuously monitor the HVAC system’s behavior and set a suitable room temperature set value according to the relevant
standard during the operation. However, under this control method, constant temperature set values are used to construct the room
* Corresponding author. School of Energy and Power Engineering, Changsha University of Science and Technology, Changsha, 410114, China.
E-mail address: dlut_liwei@163.com (W. Li).
https://doi.org/10.1016/j.jobe.2023.106805
Received 31 January 2023; Received in revised form 5 May 2023; Accepted 8 May 2023

thermal environment, which ignores occupants’ individual preferences. To improve this situation, many studies
have been published on introducing thermal sensation into the HVAC system operation strategy to achieve the optimal control of
people-centric HVAC systems [4–6]. However, such a control method must obtain the accurate thermal sensation of personnel.
Chaudhuri et al. [7,8] explored the possibility of predicting personal thermal sensation through physiological parameters, and then
proposed a thermal state evaluation model with a skin temperature and gradient feature. It is possible to obtain satisfactory personal
thermal sensation prediction (TSP) results using the local physiological parameters from different body positions [9–11]. With the
development of infrared technology, some researchers have utilized infrared thermography to monitor the body’s facial temperature,
then predict personal thermal sensation using a specific algorithm [12–17]. Nevertheless, the identification accuracy of this method is
limited when the occupant is far from the location of the monitoring device or in a state of motion, resulting in unstable TSP results.
Other scholars have studied the use of wearable devices to monitor physiological parameters and predict personal thermal sensation,
finding this method more accurate and reliable [18–20]. Developing a fast and accurate prediction model of personal thermal
sensation is important for achieving optimal operation control of HVAC systems that better meets personal requirements. Some
scholars predicted personal thermal comfort from physiological parameters collected by a wristband device and then adjusted the
temperature set value in a timely manner, which was shown to save energy while improving the occupants’ thermal comfort
level [21,22]. Li et al. [23,24] put forward a neural network model for predicting room temperatures, which can more effectively
achieve the control of indoor thermal environments. Xiong et al. [4] developed an adaptive thermal comfort model to enable the
construction of a personalized indoor thermal comfort environment that can be applied to actual building systems. By introducing
personal TSP into the operation strategy of HVAC systems, the energy consumption of air-conditioning can be reduced while improving
the satisfaction of personnel.
With the development of machine learning algorithms, reinforcement learning (RL) as a branch of machine learning in the artificial
intelligence category has been widely applied in actual control engineering. In recent years, some researchers have begun to focus on
RL as a possible control method for HVAC systems because it can handle HVAC systems with time delays. Costanzo et al. [25] proposed
a data-driven building climate control method based on RL and verified it in the laboratory, showing that it could converge within 20
days and obtain a 90% optimal solution for different outdoor air temperatures. Yu et al. [26] applied deep RL to building energy
management systems and developed an energy management algorithm based on deep deterministic policy gradients which had good
effectiveness and robustness when the thermal dynamics model was not built and there were many parameter uncertainties. On the
other hand, some studies used RL to optimize window opening and closing time, and the results indicated that the control method
enabled the improvement of the indoor thermal environment and air quality [27,28]. Moreover, researchers optimized HVAC system
units by using RL to reduce the energy consumption of the air-conditioning. Nagarathinam et al. [29] put forward a multi-agent RL
algorithm by adjusting the set values of air handling units and chillers, which not only improved comfort but also reduced energy
consumption by 17%. Zou et al. [30] realized the optimal control of air handling units using deep deterministic policy gradients, which
achieved energy reduction based on maintaining a 10% prediction discomfort percentage compared with actual energy consumption.
Valladares et al. [31] proposed an RL agent for the combined control of ventilation fans and air-conditioning units in subtropical
environments without heating. Other researchers attempted to optimize the operation strategy of HVAC systems to improve indoor
thermal comfort while decreasing energy consumption. Gao et al. [32] developed a deep deterministic policy gradient-based HVAC
system optimal control method that achieved the reduction of laboratory energy consumption while maintaining indoor thermal
comfort through the combinatorial regulation of the temperature and humidity set values. Yuan et al. [33] put forward an RL-based
model-free control strategy that combined rule-based and RL-based control algorithms as well as a complete process, resulting in the
RL controller performing the best in terms of both the discomfort time and energy cost of the air-conditioning system. Gupta et al. [34]
suggested a residential building HVAC system control method based on model-free RL that could save energy costs and reduce thermal
discomfort. The above studies indicate that an accurate mathematical model of the HVAC system need not be built when applying
the RL method to the HVAC system smart control process.
Most existing control methods introduce personal thermal sensation into HVAC systems in order to save energy
while satisfying the thermal comfort needs of personnel. However, there is still much room for improvement in their control
performance and energy-saving effect. First, the temperature set value must be adjusted multiple times to meet the thermal comfort
needs of personnel in early HVAC system operation. If an air-conditioning system can remember personal preferences for room
temperature set values, it can quickly reach a suitable temperature setting during operation and ensure system efficiency and
stability. Second, most existing control methods take meeting human thermal comfort as the primary goal and do not include energy
consumption in the control algorithm, so there remains further potential for energy saving.
To set the temperature value according to the occupants’ preferences, this paper learns the occupants’ personalized setting
characteristics for the room temperature set value and considers the influence of the set value on the system’s energy-saving
effect. It proposes an indoor temperature preference setting control method (ITPSCM) for thermal comfort and energy saving based
on reinforcement learning that can drive the optimal regulation of air-conditioning systems through actual operation data.
Experiments are conducted, the RL model for room temperature setting is trained with the experimental data obtained, and the
effectiveness of the ITPSCM for air-conditioning systems is ultimately verified.
The rest of the paper is organized as follows: Section 2 makes a comprehensive analysis of the thermal comfort and energy-saving
effects of the temperature set value; Section 3 introduces the ITPSCM and explains the experimental process; Section 4 presents and
discusses the experimental results; and Section 5 draws conclusions, reviews limitations and looks ahead to future work.
2. Room temperature setting comprehensive effect analysis
The room temperature influences not only human thermal comfort but also the energy consumption of air-conditioning systems.
Existing studies have shown that increasing the temperature set value within the range of the room temperature comfort zone can
reduce energy consumption of air-conditioning systems while ensuring personal thermal comfort [35,36]. Thus, it is possible to further
optimize the energy-saving potential of air-conditioning systems by raising the temperature set value to meet the thermal comfort of
the occupants. When the indoor air temperature is lower than the occupants’ most comfortable temperature, increasing the
room temperature set value improves the energy-saving effect and, to a certain extent, the occupants’ thermal comfort.
When the indoor air temperature is higher than the occupants’ most comfortable temperature, increasing the room
temperature set value decreases the occupants’ thermal comfort. Therefore, the process of determining the room temperature
setting for the air-conditioning system can be divided into the following two cases.
(1) When only the occupants’ thermal comfort is considered, the temperature set value should be the room temperature that
corresponds to the occupants’ optimal thermal comfort point. At this temperature, the energy-saving effect of the
air-conditioning system is not optimal.
(2) When the temperature setting takes into account not only the occupants’ thermal comfort but also the energy-saving effect, we
should first ensure that the occupants’ thermal comfort is within the comfort zone, then appropriately increase the room
temperature set value so that the sum of thermal comfort and energy-saving effect reaches its maximum at a certain room
temperature. This temperature can effectively meet the occupants’ thermal comfort needs while obtaining a better
energy-saving effect at the same time.
As such, in determining the room temperature set value, we must first determine the comfort zone that can meet the
occupants’ thermal comfort needs. However, the temperature set value requires multiple adjustments to meet these needs
during early air-conditioning system operation. In fact, people adapt well to ambient temperature, so the
occupants’ thermal comfort will lie within a fixed range. If the air-conditioning system can remember personal preferences for room
temperature set values, it can quickly achieve air-conditioning temperature settings during operation and ensure efficiency and
stability. In addition, the temperature set value influences both the occupants’ thermal comfort and the energy consumption of the
system, so the energy-saving potential can be further explored. Therefore, an RL algorithm is applied to the room temperature setting
process and further research is carried out on the smart control of the room temperature set value in order to improve the energy-saving
effect while maintaining the occupants’ thermal comfort.
The room temperature set value control conforms to the Markov Decision Process (MDP) in terms of three basic characteristics in
the interaction process between the indoor environment and air-conditioning controller: first, the temperature value selection only
relies on the current room temperature and the occupants’ thermal sensation state, which are independent of the previous room
temperature and the occupants’ thermal comfort state; second, the temperature set value output decision is completely made by the
air-conditioning system controller, enabling it to change the room temperature and the occupants’ thermal sensation state by adjusting
the room temperature set value; and third, the interaction process between the indoor environment and the air-conditioning system
controller is based on the time dimension, which means that the environment state will change with the passage of time after the
controller outputs the control command of the room temperature set value, and the cycle will be repeated when it senses a new
environment state. Therefore, the control process of temperature setting can be regarded as a classical MDP, and the RL method can be
used to achieve the energy-efficient regulation of room temperature settings.
3. Method
RL is an intelligent method in which an agent acquires habitual behaviors that maximize benefits in response to external stimuli,
improving its learning ability through continuous trial and error to achieve continuous optimal control [37]. Fig. 1 shows a typical
information interaction process between the agent and environment in the RL process. At moment $t$, when the environment state is
$s_t$, the agent observes the current state and implements action $a_t$, which causes the environment state to change to $s_{t+1}$,
for which the agent is rewarded with reward $r_{t+1}$.
Fig. 1. Schematic diagram of RL process [38].
The policy π is a mapping from states to actions in the RL process: it either assigns one action to each state or takes different
actions with different probabilities. The RL agent begins with a random policy and obtains a series of state, action and reward
training samples, which can be expressed as:
$$\{s_1, a_1, r_1, s_2, a_2, r_2, \cdots, s_t, a_t, r_t\} \tag{1}$$
where $s$ represents the environment state, $a$ is the action that the agent implements, and $r$ is the reward that the agent
obtains. The goal is to seek out the policy that maximizes the reward.
After obtaining a training sample, RL optimizes the policy based on the sample so that it obtains the optimal reward for the current
sample.
3.1. Model construction
In the interaction process between the indoor environment and the air-conditioning system controller, the controller adjusts the
room temperature set value according to the current room temperature and the occupants’ group thermal sensation. The operation of
the air-conditioning system then changes the room temperature state, and the controller calculates the corresponding reward
from the environment and thermal sensation data it gathers. The room temperature preference setting RL algorithm code was
implemented in Python. The room temperature set value RL flow is shown in Fig. 2.
3.1.1. Parameter setting
The MDP represents the RL problem in terms of five constituent elements $\{S, A, P, R, \gamma\}$, wherein $S$ is the state; $A$ is the
action; $P$ is the state transition probability that describes the probability of the state transfer from $s_t$ to $s_{t+1}$ after
executing action $a_t$; $R$ is the reward function that provides the reward obtained by taking action $a_t$; and $\gamma$ is the
discount factor, $\gamma \in [0,1]$.
(1) State $S_t$: In the temperature setting process, the controller responds according to the current room temperature and the
group thermal sensation. Hence, room temperature $t_r$ and group thermal sensation $GTS$ are defined as the state parameters in the
RL model, and the state space at moment $t$ can be defined as:
$$S(s_t) = [t_r^{t}, GTS_t] \tag{2}$$
where the room temperature is monitored by temperature sensors, and personal skin temperature and heart rate are
collected by a wearable sensor. The data are transmitted to a smartphone app to predict each person’s individualized thermal
sensation, which is expressed on the ASHRAE 7-point scale [39]. The occupants’ group thermal sensation is then calculated by the
multi-occupant thermal sensation fuzzy comprehensive evaluation method [40]. Finally, the result is uploaded to the controller
through the smartphone app. The group thermal sensation is a comprehensive representation
of the individualized thermal sensation of all personnel in a controlled zone, which can reflect the group thermal comfort of the
personnel in the zone.
(2) Action $A_t$: The temperature set value is defined as the action parameter. Action $a_t$ is executed in state $s_t$ and changes
to $a_{t+1}$ in state $s_{t+1}$. All actions are selected from the action space, which at moment $t$ can be defined as:
$$A(a_t) = [t_{set}^{t}] \tag{3}$$
Fig. 2. Room temperature setting RL flow.
(3) Reward $R_t$: The thermal comfort and the energy-saving effect are defined as reward functions. The thermal comfort
reward is evaluated from the group thermal sensation derived from the wearable device monitoring data; the energy-saving
effect mainly considers the temperature set value, with a better energy-saving effect when the temperature set value is higher
under summer conditions. Therefore, the reward at moment $t$ can be defined as:
$$r_{t+1} = f(GTS_{t+1}, t_{set}^{t}) \tag{4}$$
3.1.2. Reward function
In the process of using the RL model to solve the room temperature optimization problem, the reward function design will decide
the implementation effect of the policy. The room temperature set value RL control algorithm takes two indices into account: thermal
comfort and the energy-saving effect. Thermal comfort means that the room temperature needs to meet the thermal comfort needs of a
majority of occupants, which is reflected in the group thermal sensation being as close as possible to 0 (neutral); the energy-saving
effect means that the room temperature set value under summer conditions should not be set too low but as high as possible while
still meeting the thermal comfort needs. The two indices interact during the operation of the room temperature set value RL
algorithm: improving the energy-saving effect means raising the room temperature set value, which reduces the occupants’ thermal
comfort to some extent, whereas taking the occupants’ thermal comfort as the sole control target decreases the energy-saving
effect. Hence, suitable rewards are needed for both thermal comfort and the energy-saving effect.
3.1.2.1. Thermal comfort reward. The thermal comfort reward is mainly the reward value brought by the occupants’ group thermal
sensation. In designing the thermal comfort reward, the occupants’ group thermal sensation should be kept between −1 (slightly cool)
and 1 (slightly warm) as far as possible. The system obtains a greater reward as the group thermal sensation approaches 0, and
obtains a punishment value when the group thermal sensation is greater than 1 or less than −1. The formula
for calculating the thermal comfort reward is shown in Equation (5):
$$R_{comfort} = \begin{cases} 1 - |GTS|, & |GTS| \le 1 \\ -|GTS|, & |GTS| > 1 \end{cases} \tag{5}$$
where $R_{comfort}$ is the thermal comfort reward and $GTS$ is the occupants’ group thermal sensation. When the occupants’ GTS value
is 0, the $R_{comfort}$ value obtained by the system is at its maximum of 1. The $R_{comfort}$ value becomes smaller as the
occupants’ GTS moves away from 0, and it is less than −1 when the occupants’ GTS value is greater than 1 or less than −1.
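Equation (5) can be sketched as a small Python function. This is a minimal illustration rather than the authors' code; the function name `comfort_reward` is an assumption introduced here.

```python
def comfort_reward(gts: float) -> float:
    """Thermal comfort reward following Equation (5).

    gts is the group thermal sensation on the 7-point scale (-3 to 3).
    The name and signature are illustrative assumptions.
    """
    magnitude = abs(gts)
    if magnitude <= 1.0:
        # Inside the comfort band: reward grows as GTS approaches neutral (0).
        return 1.0 - magnitude
    # Outside the comfort band: punishment proportional to the deviation.
    return -magnitude


if __name__ == "__main__":
    for gts in (-2.0, -1.0, 0.0, 0.5, 1.5):
        print(f"GTS = {gts:+.1f} -> R_comfort = {comfort_reward(gts):+.2f}")
```

Note that the reward is 0 at |GTS| = 1 and falls below −1 as soon as |GTS| exceeds 1, so leaving the comfort band is penalized sharply.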
3.1.2.2. Energy-saving effect reward. The energy-saving effect reward is mainly the reward value brought by the size of the room
temperature set value. To make the output of the RL model more reasonable and reliable, and to reduce the amount of controller
learning, a rule for changing the temperature set value is added when designing the energy-saving effect reward. GTS ≥ 0 indicates
that the occupants are in a relatively hot state, so the room temperature set value should be adjusted downward. As far as possible,
the adjusted room temperature set value should approach the current room temperature, since the lower the room temperature set
value, the higher the energy consumption of the air-conditioning system; that is, the system obtains a larger reward when the
adjusted room temperature set value is closer to the current room temperature, and vice versa. GTS < 0 indicates that the occupants
are in a relatively cold state, so the room temperature set value should be increased. As far as possible, the adjusted room
temperature set value should approach the upper limit of the room temperature set value. As a result, as the room temperature set
value approaches the upper limit, the system receives a larger reward, and vice versa. The formula for calculating the
energy-saving effect reward is shown in Equations (6) and (7):
$$\text{if } GTS \ge 0,\quad R_{energy\_1} = \begin{cases} 1 - |T_{set} - T_r|, & |T_{set} - T_r| \le 1 \\ -|T_{set} - T_r|, & |T_{set} - T_r| > 1 \end{cases} \tag{6}$$
$$\text{if } GTS < 0,\quad R_{energy\_2} = \begin{cases} 1 - |T_{set} - T_{su}|, & |T_{set} - T_{su}| \le 1 \\ -|T_{set} - T_{su}|, & |T_{set} - T_{su}| > 1 \end{cases} \tag{7}$$
where $R_{energy\_1}$ is the energy-saving effect reward when GTS ≥ 0; $R_{energy\_2}$ is the energy-saving effect reward when
GTS < 0; $T_{set}$ is the next moment room temperature set value; $T_r$ is the current moment room temperature; and $T_{su}$ is
the upper limit value of the room temperature set value.
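Equations (6) and (7) differ only in the reference temperature, so they can be sketched as a single function. The following is a minimal illustration, not the authors' implementation; the name `energy_reward` and its parameter names are assumptions.

```python
def energy_reward(gts: float, t_set: float, t_room: float, t_upper: float) -> float:
    """Energy-saving effect reward following Equations (6) and (7).

    t_set   : next-moment room temperature set value (T_set)
    t_room  : current room temperature (T_r)
    t_upper : upper limit of the set value (T_su)
    All names are illustrative assumptions.
    """
    # GTS >= 0: the set value should stay close to the current room temperature.
    # GTS < 0 : the set value should stay close to the upper limit.
    reference = t_room if gts >= 0 else t_upper
    gap = abs(t_set - reference)
    return 1.0 - gap if gap <= 1.0 else -gap


if __name__ == "__main__":
    # Occupants slightly warm, set value lowered by 0.5 degC from a 27 degC room.
    print(energy_reward(gts=0.5, t_set=26.5, t_room=27.0, t_upper=30.0))
    # Occupants slightly cool, set value 2 degC below the 30 degC upper limit.
    print(energy_reward(gts=-0.5, t_set=28.0, t_room=25.0, t_upper=30.0))
```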
Based on the above analysis, the sum of the thermal comfort reward and the energy-saving effect reward is taken as the total
reward of the room temperature preference setting RL control algorithm. The specific calculation formula is shown in Equation (8):
$$R_{total} = \begin{cases} R_{comfort} + \beta R_{energy\_1}, & GTS \ge 0 \\ R_{comfort} + \beta R_{energy\_2}, & GTS < 0 \end{cases} \tag{8}$$
where $R_{total}$ is the total reward and $\beta$ is the weight of the energy-saving effect reward in the total reward,
$\beta \in [0,1]$. From Equation (8), only the thermal comfort reward is considered when $\beta$ is equal to 0; when $\beta$ is
equal to 1, the thermal comfort reward and energy-saving effect reward have the same weight.
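As a quick numerical check of Equations (5), (6) and (8), consider the following values, which are assumptions chosen for illustration only and are not taken from the experiments: GTS = 0.5, T_r = 27 °C, T_set = 26.5 °C and β = 0.5.

```latex
\begin{aligned}
R_{comfort}   &= 1 - |0.5| = 0.5 \\
R_{energy\_1} &= 1 - |26.5 - 27| = 0.5 \\
R_{total}     &= R_{comfort} + \beta R_{energy\_1} = 0.5 + 0.5 \times 0.5 = 0.75
\end{aligned}
```

With the same assumed GTS but T_set = 25 °C, the gap |25 − 27| = 2 exceeds 1, so R_energy_1 = −2 and R_total = 0.5 + 0.5 × (−2) = −0.5. The larger drop in set value is therefore penalized.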
3.1.3. Q-learning algorithm
In the RL process, it is necessary to find the optimal policy to obtain the maximum reward. The accumulative reward that the agent
obtains from moment $t$ can be expressed as:
$$R_t = r_{t+1} + \gamma r_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \tag{9}$$
where $\gamma$ is the discount factor, $\gamma \in [0,1]$, which describes the importance of future rewards relative to the current
reward; a smaller $\gamma$ value means a smaller future reward weight. The state value function is the expectation of future
accumulative rewards, and the accumulative rewards obtained in state $s$ can be expressed as:
$$v(s) = \mathbb{E}[r_{t+1} + \gamma v(s_{t+1}) \mid s_t = s] \tag{10}$$
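The discounted sum in Equation (9) can be illustrated on a finite reward sequence. This is a minimal sketch: truncating the infinite sum to the rewards available is an assumption made for the example, and the function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Finite-horizon version of Equation (9).

    rewards[k] plays the role of r_{t+k+1}; gamma is the discount factor in [0, 1].
    """
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    # A smaller gamma gives later rewards less weight.
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, discounted_return(rewards, gamma))
```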
In the process of seeking out the optimal policy, not only is the value of the current state considered, but also the value obtained
by taking each action in the current state; the action that has the maximum value is then selected. Thus, the action value function
is defined as:
$$q^{*}(s, a) = \max q(s, a) = \mathbb{E}[r_{t+1} + \gamma \max_{a_{t+1}} q(s_{t+1}, a_{t+1}) \mid s_t = s, a_t = a] \tag{11}$$
The Q-learning algorithm calculates the optimal valuation of the action value function by the value iteration method, then calculates
the error between the optimal valuation and the old valuation, and uses this error to update the Q-value in a manner similar to a
gradient update. The calculation formula is shown in Equation (12), where $\alpha$ is the learning rate:
$$q(s_t, a_t) \leftarrow q(s_t, a_t) + \alpha[r_{t+1} + \gamma \max_{a_{t+1}} q(s_{t+1}, a_{t+1}) - q(s_t, a_t)] \tag{12}$$
In the calculation process of the Q-learning algorithm, a limited set of samples can be used to calculate the update, and iterating
continuously over the samples updates all Q-values and yields the optimal policy. The ε-greedy policy is adopted in this paper: a
random action is selected with probability ε and the greedy action with probability 1 − ε, where ε should be small. After training
is completed, the agent can quickly find the action corresponding to the maximum Q-value in a given state of the environment. The
specific flow is shown in Table 1.
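The ε-greedy selection and the update in Equation (12) can be sketched with a dictionary-based Q-table. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the `(state, action)` key layout and the example states and set values are all hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one Q-learning update (Equation (12)) in place and return the new value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


if __name__ == "__main__":
    actions = [25.0, 26.0, 27.0]      # hypothetical set values in degC
    q = defaultdict(float)            # Q-table initialised to 0
    state = (27.0, 1.0)               # hypothetical (room temperature, GTS)
    next_state = (26.0, 0.0)
    action = epsilon_greedy(q, state, actions, epsilon=0.1)
    q_update(q, state, action, reward=1.0, next_state=next_state,
             actions=actions, alpha=0.5, gamma=0.9)
    print(action, q[(state, action)])
```

A `defaultdict` is used so that unseen state-action pairs start at 0 without pre-allocating the whole table; a fixed-size array indexed by discretized temperature and GTS would work equally well.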
3.2. Model learning and operation
Based on the constructed room temperature preference setting RL model in the previous section, we first need to train the algorithm
to obtain the optimal policy. The flow of the room temperature preference setting RL algorithm is shown in Fig. 3.
The room temperature preference setting RL algorithm learning flow can be divided into the following four steps.
(1) Initialize the Q-table and set the learning rate α, reward discount factor γ, exploration probability ε, maximum number of
iterations, etc.
(2) Obtain the current room temperature and the occupants’ group thermal sensation, determine the room temperature set
value according to the ε-greedy policy, and have the air-conditioning system apply this set value until the next moment.
(3) Monitor the current room temperature and the occupants’ group thermal sensation, calculate the reward obtained, and
update the Q-table.
(4) If the maximum number of iterations has been reached, the learning is complete and the optimal policy is obtained;
otherwise, repeat steps (2) and (3).
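The four learning steps can be sketched as a short training loop. This is only an illustration under loud assumptions: `toy_env_step` is a made-up stand-in for the testbed (the room is assumed to reach the set value within one period, and GTS is assumed to vary linearly around a hypothetical neutral temperature of 26 °C), and the hyperparameter defaults are placeholders rather than the values used in the paper.

```python
import random
from collections import defaultdict


def train(env_step, initial_state, actions, alpha=0.1, gamma=0.9,
          epsilon=0.1, max_iterations=1000, seed=0):
    """Tabular Q-learning loop following steps (1) to (4)."""
    rng = random.Random(seed)
    q = defaultdict(float)                      # step (1): initialise the Q-table
    state = initial_state
    for _ in range(max_iterations):             # step (4): stop at the iteration limit
        # Step (2): choose a set value with the epsilon-greedy policy.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(state, a)])
        # Step (3): observe the next state and reward, then update the Q-table.
        next_state, reward = env_step(state, action)
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def toy_env_step(state, t_set):
    """Hypothetical environment, NOT the paper's room model."""
    t_room = t_set                               # assumed: room reaches the set value
    gts = max(-3.0, min(3.0, t_room - 26.0))     # assumed linear comfort response
    reward = 1.0 - abs(gts) if abs(gts) <= 1.0 else -abs(gts)   # Equation (5) only
    return (t_room, gts), reward


if __name__ == "__main__":
    set_values = [24.0, 25.0, 26.0, 27.0, 28.0]
    start = (25.0, -1.0)
    q_table = train(toy_env_step, start, set_values)
    print("greedy set value from start state:",
          max(set_values, key=lambda a: q_table[(start, a)]))
```

In a real deployment `env_step` would wait one setting period, read the sensors and the uploaded GTS, and compute the total reward of Equation (8) instead of the comfort-only reward used in this toy.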
After the optimal policy of the room temperature preference setting RL has been obtained, the algorithm can be applied to the actual
air-conditioning control system; the operation flow is shown in Fig. 4. During operation, the control system selects, according to
the current room temperature and group thermal sensation, the room temperature set value that corresponds to the maximum Q-value,
and the reward obtained when the system moves to the next state is used to further update the Q-table. As the number of learning
iterations increases, the control system can achieve effective room temperature and energy-saving regulation for
different group thermal sensation states.
3.3. Experimental data-based RL algorithm training
To evaluate the performance of the room temperature preference setting RL algorithm, a number of experiments were performed by
using temperature set value-based control and TSP-based control under summer conditions, and the experimental data thus obtained
was used to train and analyze the ITPSCM.
3.3.1. Experimental process
The experiments were held in Office Rooms No. 1 to No. 3 at the Dalian University of Technology, China, which were equipped with
a variable air volume system. Fig. 5 shows that the testbed has a construction area of about 160 m², including one equipment room
and three office rooms. A field photo of the experiments and the heat transfer coefficients of the building envelopes are also
included in Fig. 5, and the basic information of the office rooms is shown in Table 2.
Table 1
Pseudo-code of Q-Learning algorithm.
1. Initialize $q(s, a)$;
2. Based on the current state $s_t$, select action $a_t$ according to a certain policy;
3. After performing the action and obtaining reward $r_{t+1}$, the state transfers to $s_{t+1}$;
4. Update q: $q(s_t, a_t) \leftarrow q(s_t, a_t) + \alpha[r_{t+1} + \gamma \max_{a_{t+1}} q(s_{t+1}, a_{t+1}) - q(s_t, a_t)]$;
5. Repeat steps 2 to 4 until the end condition is satisfied.
The experiments were performed over 6 days in August 2020 with 31 healthy subjects, whose basic information is listed in Table 3.
All subjects took part in the experiments in stable physiological condition, without any illness such as a cold or a fever, and had
slept and eaten well before the experiments. The number of days and the weather conditions under the two control methods were
similar. The subjects were divided into two groups: the first group of 15 subjects performed experiments over 4 days, and the
second group of 16 subjects performed experiments over 2 days.
Fig. 6 shows the time schedule of the experiments. On each day, the experiment ran from 8:30 to 17:00, with 2 h of breaks. The
subjects were required to perform regular office activities in the office room during the experiments, such as working and walking
around. The subjects wore smart wristbands and submitted their satisfaction votes every 30 min on the smartphone app [41]. The
app collected the real-time thermal feedback of the occupants and uploaded it to the air-conditioning control system. The
occupants could also control the air-conditioning system by entering the temperature set point through the app interface. The
app’s room settings interface is shown in Fig. 7.
The temperature setting period was 30 min, meaning that the control system updated the optimal room temperature set value every 30 min. The initial room temperature set value was 25 °C for the two control methods; only the temperature set value changed, and all other control strategies of the air-conditioning system used the same settings. The variable air volume box controlled the supply airflow with a Proportional-Integral-Derivative (PID) algorithm, acting on the actual value and set value of the room temperature. Parameters other than indoor temperature, including relative humidity and CO2 concentration, had to be kept within a certain range to reduce their influence on the thermal comfort of the personnel. During the experiments, the subjects were requested to wear a short-sleeve T-shirt and trousers, without a jacket, so that the clothing insulation value was nearly 0.57 clo. The CO2 concentration was kept under 800 ppm, air velocity was kept below 0.2 m/s, and relative humidity was kept between 40% and 60% during the experiment, according to ASHRAE Standard 62.1 [42]. Appropriate sensors monitored the environmental parameters every minute; the equipment information is summarized in Table 4.
Fig. 3. Learning flow chart of room temperature setting RL algorithm.
Fig. 4. Operation flow chart of room temperature setting RL algorithm.
Fig. 5. Layout of the testbed and building information.
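The PID regulation of the variable air volume box mentioned above can be illustrated with a minimal discrete controller. This is only a sketch under stated assumptions: the gains, the 0 to 1 airflow command range, and the 60 s sampling interval are hypothetical and are not reported in the paper.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete PID controller; the gains are illustrative, not from the paper."""
    kp: float
    ki: float
    kd: float
    out_min: float = 0.0   # assumed minimum airflow command (fraction)
    out_max: float = 1.0   # assumed maximum airflow command (fraction)
    _integral: float = 0.0
    _prev_error: float = 0.0

    def step(self, measured: float, setpoint: float, dt: float) -> float:
        # Cooling mode: a room warmer than the set value needs more airflow,
        # so the error is defined as measured - setpoint.
        error = measured - setpoint
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error
        raw = self.kp * error + self.ki * self._integral + self.kd * derivative
        # Clamp the command to the assumed physical range of the actuator.
        return max(self.out_min, min(self.out_max, raw))


# Room at 27 °C with a 25 °C set value: the command saturates at full airflow.
airflow_warm = PID(kp=0.4, ki=0.01, kd=0.0).step(measured=27.0, setpoint=25.0, dt=60.0)
# Room at 24 °C with a 25 °C set value: the command drops to the minimum.
airflow_cool = PID(kp=0.4, ki=0.01, kd=0.0).step(measured=24.0, setpoint=25.0, dt=60.0)
```

A real variable air volume controller would typically add anti-windup for the integral term and a minimum ventilation airflow; both are omitted here for brevity.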
Table 2
Basic information of office rooms.
| Room | Building structure | Area (m²) | Window-wall ratio | Number of office staff | Wall thickness (mm) |
|---|---|---|---|---|---|
| No.1 office room | Steel structure | 33.0 | 0.11 | 6 | 180 |
| No.2 office room | Concrete construction | 39.6 | 0.19 | 4 | 320 |
| No.3 office room | Steel structure | 50.6 | 0.25 | 6 | 320 |
Table 3
Basic information of subjects.
| Subjects | Male | Female | Total |
|---|---|---|---|
| Number | 20 | 11 | 31 |
| Age | 24.65 ± 1.82 | 24.27 ± 1.28 | 24.52 ± 1.66 |
| Body Mass Index (BMI) | 22.63 ± 3.08 | 19.62 ± 2.16 | 21.55 ± 3.14 |
| Clothing insulation (clo) | 0.57 | 0.57 | 0.57 |
Fig. 6. Schedule of experiments.
Fig. 7. Interface of room temperature setting and satisfaction evaluation.
3.3.2. Model training
By collecting the experimental data, the variation range of room temperature Tr was determined to be 24–30 °C, and the upper limit of the temperature set value Tsu was set at 30 °C in the RL process. The variation range of the group thermal sensation was −3 to 3. Since the group thermal sensation was not obtained at every room temperature during the experiment, linear interpolation was used to supplement the data and obtain the occupants’ group thermal sensation in the different room temperature states, which guarantees the completeness of the RL training data. To reduce the frequency of controller learning, the following rule of room temperature setting was introduced: the room temperature was decreased when the occupants’ group thermal sensation was greater than 0, and increased when the occupants’ group thermal sensation was less than 0.
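The interpolation step and the sign-based setting rule described above can be sketched as follows. This is an illustration only: the measured points, the 0.5 °C step size, and the choice to hold the set value when the sensation is exactly 0 are assumptions, not values from the experiment.

```python
def interpolate_sensation(temp, known):
    """Linearly interpolate the group thermal sensation at room temperature `temp`.

    `known` maps measured room temperatures (°C) to group thermal sensation.
    Temperatures outside the measured range are clamped to the nearest endpoint.
    """
    points = sorted(known.items())
    if temp <= points[0][0]:
        return points[0][1]
    if temp >= points[-1][0]:
        return points[-1][1]
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if t0 <= temp <= t1:
            return s0 + (s1 - s0) * (temp - t0) / (t1 - t0)


def allowed_actions(sensation, actions=(-0.5, 0.0, 0.5)):
    """Restrict the candidate changes according to the sign of the group sensation."""
    if sensation > 0:      # occupants feel warm: only decreases are allowed
        return [a for a in actions if a < 0]
    if sensation < 0:      # occupants feel cool: only increases are allowed
        return [a for a in actions if a > 0]
    return [0.0]           # neutral: hold (an assumption, not stated in the text)


# Hypothetical measurements, for illustration only (not data from the experiment).
measured = {24.0: -1.0, 26.0: -0.2, 28.0: 0.4, 30.0: 1.2}
s_27 = interpolate_sensation(27.0, measured)   # halfway between -0.2 and 0.4
candidates = allowed_actions(s_27)
```

Pruning the action set in this way is what reduces the amount of exploration the controller has to perform in each state.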
To ensure the stability of the RL algorithm, the learning rate α was set at 0.1. The reward factor γ, which discounts future rewards, was set at 0.1 because the room temperature control process mainly considers the current reward rather than the future reward. In addition, the exploration probability ε was set at 0.1 and the maximum number of iterations was set at 1000. The initial room temperature for the three office rooms was set at 28 °C, mainly because the initial room temperature of the three rooms approached 28 °C during the experiments under both the temperature set value-based and the TSP-based control systems.
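To make the role of these hyperparameters concrete, the following is a minimal tabular Q-learning sketch that uses α = 0.1, γ = 0.1, ε = 0.1, 1000 iterations and a 28 °C starting point as stated above. Everything else is assumed for illustration: the 0.5 °C state grid, the three actions, the toy sensation model, and the form of the β-weighted reward are hypothetical and do not reproduce the paper's Equation (8) or its training data.

```python
import random

ALPHA, GAMMA, EPSILON, MAX_ITER = 0.1, 0.1, 0.1, 1000   # values stated in the text
TEMPS = [24.0 + 0.5 * i for i in range(13)]              # 24-30 °C grid (step size assumed)
ACTIONS = (-1, 0, 1)                                     # lower, hold, raise by one grid step


def reward(temp, beta):
    """Hypothetical reward: comfort term plus a beta-weighted energy term."""
    sensation = 0.4 * (temp - 27.0)                      # toy sensation model, not from the paper
    comfort = -abs(sensation)
    energy = (temp - TEMPS[0]) / (TEMPS[-1] - TEMPS[0])  # higher set value, less cooling
    return comfort + beta * 0.2 * energy


def q_update(q, s, a, r, s_next):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    q[s][a] += ALPHA * (r + GAMMA * max(q[s_next]) - q[s][a])


def train(beta, start_temp=28.0, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in TEMPS]
    s = TEMPS.index(start_temp)
    for _ in range(MAX_ITER):
        if rng.random() < EPSILON:                       # explore with probability epsilon
            a = rng.randrange(len(ACTIONS))
        else:                                            # otherwise act greedily
            a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s_next = min(max(s + ACTIONS[a], 0), len(TEMPS) - 1)
        q_update(q, s, a, reward(TEMPS[s_next], beta), s_next)
        s = s_next
    return q, TEMPS[s]


q_table, end_temp = train(beta=0.0)
```

With γ as small as 0.1, the update target is dominated by the immediate reward, which matches the stated intent of weighting the current reward over future rewards.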
4. Results and discussion
During the experiment, the room and outdoor parameters and system energy consumption were monitored, and 1846 satisfaction
votes were collected from the smartphone app. The experimental data was then used to train a room temperature preference setting RL
model. To evaluate the performance of the ITPSCM for air-conditioning systems, the room temperature control performance, personal
thermal comfort, and system energy consumption were compared and analyzed.
4.1. Model training results
In order to analyze the optimal room temperature set value under different reward conditions, two values of β in Equation (8) were considered. In the first condition, only the thermal comfort reward was considered (β = 0); that is, the room temperature setting targeted the occupants’ group thermal sensation alone, and energy consumption was not considered. In the second condition, thermal comfort and the energy-saving effect were weighted equally (β = 1); that is, the room temperature set value was chosen to save as much air-conditioning energy as possible while still meeting the thermal comfort of the occupants.
Fig. 8 shows the end state information for the room temperature set value (Tset) and group thermal comfort under the RL system. When only the thermal comfort reward was considered (β = 0), the room temperature set value settled where the group thermal sensation was closest to 0, which means that the occupants obtained maximum thermal comfort. When both the thermal comfort reward and the energy-saving effect reward were considered (β = 1), the room temperature set value in all office rooms increased, with an average increase of 0.3 °C. Meanwhile, the thermal sensation in Office Rooms No. 1 and No. 2 increased slightly, and the thermal sensation in Office Room No. 3 also changed, but its distance from the optimal thermal sensation (0) remained the same. The variation in group thermal sensation was within an acceptable range, indicating that the occupants’ thermal comfort was not substantially reduced after the energy-saving effect reward was considered. Therefore, the ITPSCM could achieve effective and energy-saving operation of the air-conditioning system while ensuring the occupants’ thermal comfort.
Table 4
Information of all equipment.
| Parameter | Model | Range | Resolution | Accuracy |
|---|---|---|---|---|
| Indoor air temperature | TS-FTD04 | 0–50 °C | 0.1 °C | ±0.3 °C |
| Indoor air relative humidity | TS-FTD04 | 0–100% | 0.1% | ±3% |
| Outdoor air temperature | Weather Station PTS-3 | −40 to +80 °C | 0.1 °C | ±0.4 °C |
| Outdoor air relative humidity | TS-FTD04 | 0–100% | 0.1% | ±3% |
| CO2 concentration | C7232A5810 | 0–2000 ppm | 1.5% | ±30 ppm |
Fig. 8. End state information of RL algorithm.
4.2. Experiment and simulation comparison
The ITPSCM was trained on the experimental data, and the trained model calculated the optimal room temperature set value. The calculated results were then compared with the experimental results of the temperature set value-based and TSP-based control systems. The control performance and energy consumption were compared on two typical sunny days with similar outdoor air temperatures: the average outdoor air temperature was 27.3 °C under TSP-based control and 27.6 °C under temperature set value-based control. Fig. 9 shows the variations of the outdoor air temperature for the two control strategies on a typical day.
4.2.1. Control performance
Fig. 10 shows the variations of the room temperature set value (Tset) for the three office rooms under temperature set value-based control and the ITPSCM. Under temperature set value-based control, the average room temperature set value was 26.2 °C in Office Room No. 1, 25.8 °C in Office Room No. 2, and 25.2 °C in Office Room No. 3. The average temperature set value across the three office rooms was 25.7 °C.
For the three offices, the temperature set values under both reward functions of the ITPSCM exceeded the room temperature set value under temperature set value-based control. Office Room No. 3 received more heat gain from outside over the same period than Office Rooms No. 1 and No. 2 because of its larger exterior wall area. In such a situation, the original static temperature set points no longer met the thermal needs of the personnel, so users would set a lower temperature set point for rapid indoor cooling. However, the personnel often neglected to readjust the temperature to a suitable range once they had settled into work, which affected both the thermal comfort of the occupants and the energy consumption of the air-conditioning system.
Compared with temperature set value-based control, when the ITPSCM only considered the thermal comfort reward (β = 0), the room temperature set value increased by an average of 1.4 °C in Office Room No. 1, 1.7 °C in Office Room No. 2, and 1.7 °C in Office Room No. 3. The average room temperature set value of the three office rooms increased by 1.6 °C. When the ITPSCM considered both the thermal comfort reward and the energy-saving effect reward (β = 1), the room temperature set value increased by an average of 1.8 °C in Office Room No. 1, 1.9 °C in Office Room No. 2, and 2.0 °C in Office Room No. 3. The average room temperature set value of the three office rooms increased by 1.9 °C.
Fig. 11 shows room temperature set value variations under TSP-based control and the ITPSCM. Under TSP-based control, the average room temperature set value was 27.4 °C in Office Room No. 1, 27.5 °C in Office Room No. 2, and 26.5 °C in Office Room No. 3. The average room temperature set value across the three office rooms was 27.1 °C. The system adjusted the room temperature set value several times in response to the indoor occupants’ thermal sensation before reaching a stable set value. There were fewer adjustments in the afternoon than in the morning, indicating that the TSP-based control system found the optimal room temperature set value through regulation in the morning hours and then stabilized.
Under the ITPSCM, the control system calculated the optimal temperature set value that met the thermal comfort of the personnel with minimum energy consumption, based on the personnel’s thermal comfort information and the energy-saving effect. Since the personnel were well adapted to the environment’s temperature, the optimal temperature set value was used to control the indoor environment directly, which reduced the number of adjustments compared with TSP-based control. That is, the ITPSCM skipped the process of searching for the optimal room temperature set value from the initial set value. This reduced the energy consumed in adjusting the equipment power, guaranteed the system’s stability to a certain degree, and achieved better control performance.
Relative to TSP-based control, when the ITPSCM only considered the thermal comfort reward (β = 0), the room temperature set value increased by an average of 0.2 °C in Office Room No. 1, was unchanged in Office Room No. 2, and increased by an average of 0.4 °C in Office Room No. 3. The average room temperature set value of the three office rooms increased by 0.2 °C. When the ITPSCM considered both the thermal comfort reward and the energy-saving effect reward (β = 1), the room temperature set value increased by an average of 0.6 °C in Office Room No. 1, 0.2 °C in Office Room No. 2, and 0.7 °C in Office Room No. 3. The average room temperature set value of the three office rooms increased by 0.5 °C. Under both rewards of the ITPSCM, the average room temperature set value increased only slightly. The main reason is that the TSP-based control method introduced information about personal thermal sensation into the control process, which brought its final room temperature set value close to that of the ITPSCM when only the thermal comfort reward was considered. This indicates that the room temperature set value calculated by the ITPSCM was the optimal temperature set value that met personal thermal comfort. However, TSP-based control did not consider the energy-saving effect, whereas the ITPSCM can further exploit the energy-saving effect of the system, while still meeting the thermal comfort requirements of the personnel, when the energy-saving effect reward is considered.
Fig. 9. The variations of outdoor air temperature.
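The per-room and average set-value increases quoted in this subsection follow directly from the average room temperature set values reported in Table 5. A small sketch of that arithmetic is given below; the dictionary keys and the function name are hypothetical labels chosen for this illustration.

```python
# Average room temperature set values (°C) from Table 5, ordered No.1, No.2, No.3.
set_values = {
    "set_value_based": [26.2, 25.8, 25.2],
    "tsp_based": [27.4, 27.5, 26.5],
    "itpscm_beta0": [27.6, 27.5, 26.9],
    "itpscm_beta1": [28.0, 27.7, 27.2],
}


def mean_increase(method, baseline):
    """Mean per-room increase of `method` over `baseline`, rounded to 0.1 °C."""
    diffs = [m - b for m, b in zip(set_values[method], set_values[baseline])]
    return round(sum(diffs) / len(diffs), 1)


vs_set_value = (mean_increase("itpscm_beta0", "set_value_based"),
                mean_increase("itpscm_beta1", "set_value_based"))
vs_tsp = (mean_increase("itpscm_beta0", "tsp_based"),
          mean_increase("itpscm_beta1", "tsp_based"))
```

Under these inputs the averages come out as 1.6 °C and 1.9 °C against temperature set value-based control, and 0.2 °C and 0.5 °C against TSP-based control, in agreement with the text.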
4.2.2. Personal thermal comfort analysis
Fig. 12 shows group thermal sensation variations under the TSP-based control method and the ITPSCM. Under the ITPSCM, the optimal room temperature set value was calculated according to the thermal sensation of the indoor personnel. The occupants’ group thermal sensation in the environment was between −0.5 and 0.5, indicating that they were working in a comfortable environment.
Under the TSP-based control method, the group thermal sensation changed several times because the initial room temperature set value could not meet the requirements of personal thermal comfort. The air-conditioning system adjusted the room temperature set value according to the real-time personal thermal sensation information to build the indoor environment that best met the personal demand. The group thermal sensation in the three office rooms experienced larger changes in the morning and tended to be stable in the afternoon, while the group thermal sensation value tended to fluctuate under the ITPSCM. The indoor thermal sensation when the ITPSCM considered both the thermal comfort reward and the energy-saving effect reward (β = 1) was higher than when the ITPSCM only considered the thermal comfort reward (β = 0), with all values above 0. This indicates that the energy-saving potential of the air-conditioning system can be further exploited by moderately raising the personal thermal sensation within the thermal comfort range when the energy-saving effect reward is included in the strategy.
Fig. 10. Room temperature set value variations under temperature set value-based control and the ITPSCM.
According to the above analysis, the group thermal sensation can stabilize between −0.5 and 0.5 under the TSP-based control method, but only after a long period of adjustment driven by personal thermal sensation feedback. Since humans are good at adapting to environmental temperature, the air-conditioning system regulated the office environment at the optimal temperature set value calculated by the ITPSCM from the personal thermal sensation and energy-saving effect, creating a long-term comfortable indoor environment and reducing the influence of thermal sensation changes on the occupants.
4.2.3. Energy consumption comparison
Based on the data of the room temperature set value and system energy consumption under the TSP-based control method and the ITPSCM, a linear equation was fitted to relate the change in room temperature set value to the energy consumption of the air-conditioning system. This equation was then used to estimate the energy consumption of the air-conditioning system under the ITPSCM. The fitting result is as follows:
ΔE = 0.146 + 9.30 Δt (13)
where Δt is the increment of the room temperature set value (°C) and ΔE is the reduction in the daily energy consumption of the air-conditioning system (kWh). The fit gives R² = 0.97 and RMSE = 1.14, which indicates that the linear equation can effectively predict the energy consumption of the air-conditioning system under room temperature set value variation.
Fig. 11. Room temperature set value variations under the TSP-based control and the ITPSCM.
According to the experimental data, the daily energy consumption of the air-conditioning system was 67.29 kWh when the average temperature set value was 25.7 °C under the temperature set value-based control strategy. According to Equation (13), the energy consumption of the air-conditioning system is reduced by 15.03 kWh when the room temperature set value is raised by 1.6 °C, as it is when the ITPSCM only considers the thermal comfort reward (β = 0). When the ITPSCM considers both the thermal comfort reward and the energy-saving effect reward (β = 1), the room temperature set value is raised by 1.9 °C and energy consumption is reduced by 17.82 kWh. Fig. 13 shows the daily energy consumption of the system under the different control methods. Compared with temperature set value-based control, 22.34% of the daily energy consumption can be saved when the ITPSCM only considers the thermal comfort reward (β = 0), and 26.48% can be saved when the ITPSCM considers both the thermal comfort reward and the energy-saving effect reward (β = 1), an improvement of 4.14 percentage points. Compared with TSP-based control, 2.84% of the daily energy consumption can be saved when the ITPSCM only considers the thermal comfort reward (β = 0), and 8.01% can be saved when the ITPSCM considers both the thermal comfort reward and the energy-saving effect reward (β = 1), an improvement of 5.17 percentage points.
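The estimates above can be retraced with Equation (13) and the measured daily consumption values listed in Table 5. The sketch below is an arithmetic check only; the function names and the intermediate rounding to two decimals are choices made for this illustration.

```python
def energy_reduction(delta_t):
    """Equation (13): estimated reduction in daily consumption for a set-value increase."""
    return 0.146 + 9.30 * delta_t


E_SET_VALUE = 67.29   # daily consumption under temperature set value-based control (Table 5)
E_TSP = 53.79         # daily consumption under TSP-based control (Table 5)


def savings(delta_t):
    """Return (reduction, ITPSCM consumption, % saved vs set value-based, % saved vs TSP)."""
    reduction = round(energy_reduction(delta_t), 2)
    e_itpscm = round(E_SET_VALUE - reduction, 2)
    pct_vs_set_value = 100 * reduction / E_SET_VALUE
    pct_vs_tsp = 100 * (E_TSP - e_itpscm) / E_TSP
    return reduction, e_itpscm, pct_vs_set_value, pct_vs_tsp


beta0 = savings(1.6)   # ITPSCM with the thermal comfort reward only
beta1 = savings(1.9)   # ITPSCM with both rewards
```

For Δt = 1.6 °C this gives a reduction of 15.03 and an estimated ITPSCM consumption of 52.26; for Δt = 1.9 °C it gives 17.82 and 49.47. The resulting percentages agree with the reported figures to within a few hundredths of a percentage point.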
The additional energy saving of the ITPSCM over TSP-based control is small compared with its saving over temperature set value-based control. This is because thermal sensation was introduced into the operation of the air-conditioning system under both TSP-based control and the ITPSCM. However, under the ITPSCM, the initial room temperature set value that meets the personal thermal preference can be used to regulate the indoor environment directly, and it is higher than the temperature set value under the other control methods. The operation time of the system at a low temperature set value is therefore reduced, and the number of adjustments to the temperature set value is also reduced. This prevents the energy consumption and device breakdown caused by repeated adjustments of device power, thus reducing the operating energy consumption of the air-conditioning system and achieving a better energy-saving effect.
To present the experimental results under the three control methods more intuitively, the environmental parameters, average thermal sensation and daily energy consumption are summarized in Table 5. The results show that, compared with the TSP-based and temperature set value-based control methods, the ITPSCM has more stable control performance and a better energy-saving effect while effectively meeting indoor personal thermal comfort, giving it better practical application prospects.
Fig. 12. Group thermal sensation variations for different control strategies.
5. Conclusion
This paper analyzes the process of room temperature setting by both the personnel and the air-conditioning system, then proposes an ITPSCM to solve the qualitative comprehensive evaluation problem of thermal comfort and the energy consumption of the air-conditioning system. The proposed method considers not only the occupants’ preferred room temperature set value but also the influence of the room temperature set value on the energy consumption of the air-conditioning system. Compared with the control method based on temperature set points, the proposed method can achieve an indoor thermal environment that meets the personnel’s thermal comfort preferences at the early stage of air-conditioning system operation, reduce the impact of continuous changes in thermal sensation on users, and improve the control performance and energy use efficiency of the air-conditioning system. In the ITPSCM, the room temperature setting preference of the occupants and the energy-saving effect of the air-conditioning system are set as rewards; the room temperature and occupants’ group thermal sensation are set as states; and the room temperature set value is set as an action. The influence of different weights for the thermal comfort reward and the energy-saving effect reward on the ITPSCM is then compared.
To study the effectiveness of the ITPSCM, experiments were performed under temperature set value-based and TSP-based control, and the experimental data obtained were used to train and analyze an RL room temperature preference setting model. The room temperature set value that meets personal thermal comfort while maximizing the energy-saving effect was obtained by training the RL model, and this temperature set value was used to regulate the indoor environment in the early stages of air-conditioning system operation.
Fig. 13. Air-conditioning system daily power consumption under different control methods.
Table 5
Environmental parameters, thermal sensation and daily energy consumption under different control methods.
| Parameter | Temperature set value-based | TSP-based | ITPSCM (β = 0) | ITPSCM (β = 1) |
|---|---|---|---|---|
| Average outdoor air temperature (°C) | 27.6 | 27.3 | – | – |
| Average outdoor relative humidity (%) | 73.7 | 85.7 | – | – |
| Average room temperature set value (°C), No.1 office | 26.2 | 27.4 | 27.6 | 28 |
| Average room temperature set value (°C), No.2 office | 25.8 | 27.5 | 27.5 | 27.7 |
| Average room temperature set value (°C), No.3 office | 25.2 | 26.5 | 26.9 | 27.2 |
| Average thermal sensation, No.1 office | −0.36 | −0.14 | 0.1 | 0.3 |
| Average thermal sensation, No.2 office | −0.33 | −0.25 | 0.1 | 0.2 |
| Average thermal sensation, No.3 office | −0.20 | −0.05 | −0.1 | 0.1 |
| Daily energy consumption (kWh) | 67.29 | 53.79 | 52.26 | 49.47 |
| (Eset value − EITPSCM)/Eset value | – | – | 22.34% | 26.48% |
| (Eprediction − EITPSCM)/Eprediction | – | – | 2.84% | 8.01% |
In summary, the main conclusions are as follows.
1) Compared with temperature set value-based control and TSP-based control, the ITPSCM can skip the process of finding the optimal temperature set value during the working time. The room temperature set value that meets both the occupants’ thermal comfort preference and minimum energy consumption can regulate the indoor environment during air-conditioning system operation, reduce the energy consumption and device wastage caused by multiple adjustments to the room temperature set value, guarantee the stability of the air-conditioning system and gain better control performance.
2) Temperature set value-based and TSP-based control can basically meet the occupants’ thermal comfort demands, but there are still
cases where personnel feel relatively cold or hot. The ITPSCM can obtain the preference setting features of the occupants online,
regulate the indoor environment using the initial temperature set value to meet the occupants’ thermal comfort demands, and
avoid influence on the occupants due to continuous changes in thermal sensation caused by multiple adjustments to the room
temperature set value.
3) The room temperature set value under the ITPSCM is greater than those under temperature set value-based and TSP-based control. When the ITPSCM only considers the thermal comfort reward (β = 0), 22.34% of the daily energy consumption of the air-conditioning system can be saved when compared with the temperature set value-based control, and 2.84% can be saved when compared with TSP-based control. When the ITPSCM considers both the thermal comfort reward and the energy-saving effect reward (β = 1), 26.48% of the daily energy consumption of the air-conditioning system can be saved when compared with the temperature set value-based control, and 8.01% can be saved when compared with TSP-based control.
The limitation of this study is that a linear equation, fitted to the room temperature set value and the historical energy consumption of the air-conditioning system, is used to estimate the energy consumption of the system; this quantifies the energy consumption but differs from the actual energy consumption. Because the experiment was conducted during the COVID-19 epidemic, the experiment period was relatively short, limited mainly by the number of volunteers and by the policies in force at the time. Therefore, we will conduct the experiment over an extended period and carry out extended-period simulation in our future work. A complete experiment with the ITPSCM will be performed in an environment testbed to verify the effectiveness of this method. When the indoor temperature preference setting control method is applied in the air-conditioning systems of commercial and residential buildings, the thermal comfort of personnel and the energy-saving effect of the air-conditioning system are expected to improve, meeting the personnel’s demand for optimal thermal comfort with minimum energy consumption and contributing to lower building energy consumption.
Author statement
Wei Li: Methodology, Funding acquisition, Writing-original draft, Writing-review & editing.
Yifan Zhao: Methodology, Writing-review & editing.
Jili Zhang: Formal analysis, Funding acquisition, Writing-review.
Changwei Jiang: Formal analysis, Writing-review.
Siyu Chen: Data curation, Investigation, Formal analysis, Writing-review.
Liangxi Lin: Data curation, Investigation, Writing-review.
Yuegui Wang: Data curation, Investigation, Writing-review.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Data availability
Data will be made available on request.
Acknowledgments
This work was supported by the Opening Fund of State Key Laboratory of Green Building in Western China (LSKF202326), the
Scientific Research Fund of Hunan Provincial Education Department (NO. 22C0156), and the National Natural Science Foundation of
China (Grant No. 51978120).
References
[1] N.E. Klepeis, W.C. Nelson, W.R. Ott, et al., The National Human Activity Pattern Survey (NHAPS): a resource for assessing exposure to environmental pollutants,
J. Expo. Anal. Environ. Epidemiol. 11 (3) (2001) 231–252, https://doi.org/10.1038/sj.jea.7500165.
[2] A. Costa, M.M. Keane, J.I. Torrens, et al., Building operation and energy performance: monitoring, analysis and optimisation toolkit, Appl. Energy 101 (2013)
310–316, https://doi.org/10.1016/j.apenergy.2011.10.037.
[3] A. Allouhi, Y. El Fouih, T. Kousksou, et al., Energy consumption and efficiency in buildings: current status and future trends, J. Clean. Prod. 109 (2015)
118–130, https://doi.org/10.1016/j.jclepro.2015.05.139.
[4] L. Xiong, Y. Yao, Study on an adaptive thermal comfort model with K-nearest-neighbors (KNN) algorithm, Build. Environ. 202 (2021), 108026,
https://doi.org/10.1016/j.buildenv.2021.108026.
[5] F. Jazizadeh, W. Jung, Personalized thermal comfort inference using RGB video images for distributed HVAC control, Appl. Energy 220 (2018) 829–841,
https://doi.org/10.1016/j.apenergy.2018.02.049.
[6] Y.R. Yoon, H.J. Moon, Performance based thermal comfort control (PTCC) using deep reinforcement learning for space cooling, Energy Build. 203 (2019),
109420, https://doi.org/10.1016/j.enbuild.2019.109420.
[7] T. Chaudhuri, D. Zhai, Y.C. Soh, et al., Random forest based thermal comfort prediction from gender-specific physiological parameters using wearable sensing
technology, Energy Build. 166 (2018) 391–406, https://doi.org/10.1016/j.enbuild.2018.02.035.
[8] T. Chaudhuri, D. Zhai, Y.C. Soh, et al., Thermal comfort prediction using normalized skin temperature in a uniform built environment, Energy Build. 159 (2018)
426–440, https://doi.org/10.1016/j.enbuild.2017.10.098.
[9] Z. Fang, H. Liu, B. Li, et al., Experimental investigation on thermal comfort model between local thermal sensation and overall thermal sensation, Energy Build.
158 (2018) 1286–1295, https://doi.org/10.1016/j.enbuild.2017.10.099.
[10] B. Salehi, A.H. Ghanbaran, M. Maerefat, Intelligent models to predict the indoor thermal sensation and thermal demand in steady state based on occupants’ skin
temperature, Build. Environ. 169 (2020), 106579, https://doi.org/10.1016/j.buildenv.2019.106579.
[11] C. Shan, J. Hu, J. Wu, et al., Towards non-intrusive and high accuracy prediction of personal thermal comfort using a few sensitive physiological parameters,
Energy Build. 207 (2020), 109594, https://doi.org/10.1016/j.enbuild.2019.109594.
[12] W. Duan, Y. Wang, J. Li, et al., Real-time surveillance-video-based personalized thermal comfort recognition, Energy Build. 244 (2021), 110989,
https://doi.org/10.1016/j.enbuild.2021.110989.
[13] A. Ghahramani, G. Castro, B. Becerik-Gerber, et al., Infrared thermography of human face for monitoring thermoregulation performance and estimating
personal thermal comfort, Build. Environ. 109 (2016) 1–11, https://doi.org/10.1016/j.buildenv.2016.09.005.
[14] A. Ghahramani, G. Castro, S.A. Karvigh, et al., Towards unsupervised learning of thermal comfort using infrared thermography, Appl. Energy 211 (2018) 41–49,
https://doi.org/10.1016/j.apenergy.2017.11.021.
[15] D. Li, C.C. Menassa, V.R. Kamat, Non-intrusive interpretation of human thermal comfort through analysis of facial infrared thermography, Energy Build. 176
(2018) 246–261, https://doi.org/10.1016/j.enbuild.2018.07.025.
[16] H. Metzmacher, D. Woelki, C. Schmidt, et al., Real-time human skin temperature analysis using thermal image recognition for thermal comfort assessment,
Energy Build. 158 (2018) 1063–1078, https://doi.org/10.1016/j.enbuild.2017.09.032.
[17] B. Yang, X. Li, Y. Hou, et al., Non-invasive (non-contact) measurements of human thermal physiology signals and thermal comfort/discomfort poses -A review,
Energy Build. 224 (2020), 110261, https://doi.org/10.1016/j.enbuild.2020.110261.
[18] S. Liu, S. Schiavon, H.P. Das, et al., Personal thermal comfort models with wearable sensors, Build. Environ. 162 (2019), 106281,
https://doi.org/10.1016/j.buildenv.2019.106281.
[19] M.H. Fakir, J.K. Kim, Prediction of individual thermal sensation from exhaled breath temperature using a smart face mask, Build. Environ. 207 (2022), 108507,
https://doi.org/10.1016/j.buildenv.2021.108507.
[20] J.-H. Choi, D. Yeom, Investigation of the relationships between thermal sensations of local body areas and the whole body in an indoor built environment,
[21] W. Li, J. Zhang, T. Zhao, Indoor thermal environment optimal control for thermal comfort and energy saving based on online monitoring of thermal sensation,
[22] Z. Deng, Q. Chen, Development and validation of a smart HVAC control system for multi-occupant offices by using occupants’ physiological signals from
wristband, Energy Build. 214 (2020), 109872, https://doi.org/10.1016/j.enbuild.2020.109872.
[23] X. Li, Z. Han, T. Zhao, et al., Online model for indoor temperature control based on building thermal process of air conditioning system, J. Build. Eng. 39 (2021),
102270, https://doi.org/10.1016/j.jobe.2021.102270.
[24] X. Li, Z. Han, T. Zhao, et al., Modeling for indoor temperature prediction based on time-delay and Elman neural network in air conditioning system, J. Build.
Eng. 33 (2021), 101854, https://doi.org/10.1016/j.jobe.2020.101854.
[25] G.T. Costanzo, S. Iacovella, F. Ruelens, et al., Experimental analysis of data-driven control for a building heating system, Sustainable Energy, Grids and
Networks 6 (2016) 81–90, https://doi.org/10.1016/j.segan.2016.02.002.
[26] L. Yu, W.W. Xie, D. Xie, et al., Deep reinforcement learning for smart home energy management, IEEE Internet Things J. 7 (4) (2020) 2751–2762,
https://doi.org/10.1109/jiot.2019.2957289.
[27] Y. Chen, L.K. Norford, H.W. Samuelson, et al., Optimal control of HVAC and window systems for natural ventilation through reinforcement learning, Energy
Build. 169 (2018) 195–205, https://doi.org/10.1016/j.enbuild.2018.03.051.
[28] M. Han, R. May, X. Zhang, et al., A novel reinforcement learning method for improving occupant comfort via window opening and closing, Sustain. Cities Soc.
61 (2020), 102247, https://doi.org/10.1016/j.scs.2020.102247.
[29] S. Nagarathinam, V. Menon, A. Vasan, et al., MARCO - Multi-Agent Reinforcement Learning Based COntrol of Building HVAC Systems, 11th ACM International
Conference on Future Energy Systems, E-Energy 2020, Association for Computing Machinery, Inc, Virtual, Australia, June 22–26, 2020, pp. 57–67,
https://doi.org/10.1145/3396851.3397694.
[30] Z.B. Zou, X.R. Yu, S. Ergan, Towards optimal control of air handling units using deep reinforcement learning and recurrent neural network, Build. Environ. 168
(2020), 106535, https://doi.org/10.1016/j.buildenv.2019.106535.
[31] W. Valladares, M. Galindo, J. Gutierrez, et al., Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning
algorithm, Build. Environ. 155 (2019) 105–117, https://doi.org/10.1016/j.buildenv.2019.03.038.
[32] G.Y. Gao, J. Li, Y.G. Wen, DeepComfort: energy-efficient thermal comfort control in buildings via reinforcement learning, IEEE Internet Things J. 7 (9) (2020)
8472–8484, https://doi.org/10.1109/jiot.2020.2992117.
[33] X. Yuan, Y. Pan, J. Yang, et al., Study on the application of reinforcement learning in the operation optimization of HVAC system, Build. Simulat. 14 (1) (2020)
75–87, https://doi.org/10.1007/s12273-020-0602-9.
[34] A. Gupta, Y. Badr, A. Negahban, et al., Energy-efficient heating control for smart buildings with deep reinforcement learning, J. Build. Eng. 34 (2021), 101739,
https://doi.org/10.1016/j.jobe.2020.101739.
[35] A.C. Roussac, J. Steinfeld, R. de Dear, A preliminary evaluation of two strategies for raising indoor air temperature setpoints in office buildings, Architect. Sci.
Rev. 54 (2) (2011) 148–156, https://doi.org/10.1080/00038628.2011.582390.
[36] S.B. Sadineni, R.F. Boehm, Measurements and simulations for peak electrical load reduction in cooling dominated climate, Energy 37 (1) (2012) 689–697,
https://doi.org/10.1016/j.energy.2011.10.026.
[37] D. Azuatalam, W.-L. Lee, F. de Nijs, et al., Reinforcement learning for whole-building HVAC control and demand response, Energy and AI 2 (2020), 100020,
https://doi.org/10.1016/j.egyai.2020.100020.
[38] R. Sutton, A. Barto, Reinforcement Learning: an Introduction, MIT Press, 1998.
[39] ASHRAE, Standard 55-2017. Thermal Environmental Conditions for Human Occupancy (ANSI Approved), American Society of Heating, Refrigerating, and
Air-Conditioning Engineers (ASHRAE), 2017.
[40] W. Li, J. Zhang, T. Zhao, et al., Experimental study of an indoor temperature fuzzy control method for thermal comfort and energy saving using wristband
device, Build. Environ. 187 (2021), 107432, https://doi.org/10.1016/j.buildenv.2020.107432.
[41] W. Li, S. Chen, J. Zhang, et al., Development and validation of mobile app and data management system for intelligent control of indoor thermal environment,
J. Build. Eng. 69 (2023), 106272, https://doi.org/10.1016/j.jobe.2023.106272.
[42] ASHRAE Standard 62.1. Ventilation for Acceptable Indoor Air Quality, American Society of Heating, Refrigerating, and Air-Conditioning Engineers, Inc, Atlanta, 2016.