Data-Driven (Reinforcement Learning-Based) Control

DATA DRIVEN (REINFORCEMENT LEARNING BASED) CONTROL
INDUSTRIAL APPLICATIONS OF REINFORCEMENT LEARNING
DEBMALYA BISWAS
WIPRO AI

AGENDA
- Introduction
- Reinforcement Learning (RL) Fundamentals
- Industrial Control Systems
- Control Theory Limitations
- RL/Data-driven Control to the Rescue
- Case Study: RL based HVAC Energy Optimization

INTRODUCTION
Reinforcement Learning in ChatGPT
[Figure: diagram with the labels: Prompt (Natural Language Query - NLQ); Understand User Intent (Natural Language Understanding - NLU); Knowledge Base; Synthesize Response (Natural Language Generation - NLG); Return Response; User feedback loop (Reinforcement Learning with Human Feedback - RLHF); Security; Explain]
“A significant amount of manual effort has been incorporated,
in the form of user feedback, to improve the accuracy of ChatGPT
leveraging Reinforcement Learning*.”
*E. Ricciardelli, D. Biswas. Self-improving Chatbots based
on Reinforcement Learning. 4th Multidisciplinary
Conference on Reinforcement Learning and Decision
Making (RLDM), 2019.

BACKGROUND
Reinforcement Learning (RL) Basics
- RL is a branch of AI/ML targeted at goal-oriented problems.
- RL algorithms achieve complex goals by maximizing a reward
function over many steps, e.g. the points won in a game over
many moves.
- The reward function works much like incentivizing a child
with candy and spankings: the algorithm is penalized when it
makes a wrong decision and rewarded when it makes a right
one. This is reinforcement.
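As an illustration of reward maximization over many steps, here is a minimal sketch of tabular Q-learning. The corridor environment, the hyperparameters and all names are assumptions made for this example; they do not come from the talk.

```python
import random

# Hypothetical toy environment (not from the slides): a corridor of 5 cells.
# The agent starts in cell 0 and earns +1 only on reaching cell 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right

def step(state, action):
    """Return (next_state, reward, done) for one move in the corridor."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = step(s, ACTIONS[a])
            # Reinforcement: move the estimate towards reward + discounted future value.
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q_table = train()
greedy = [("left", "right")[row.index(max(row))] for row in q_table[:GOAL]]
print(greedy)
```

The reward arrives only at the goal cell, yet the learned values propagate backwards so that the greedy action in every earlier cell points towards it.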

RL FORMULATION
RL Policy and Rewards Functions
- Reward: the feedback by which we measure the success or
failure of an agent’s recommended action.
- Policy: the strategy that the agent employs to select
the next best action.
- The roles and responsibilities of the Reward function vs. the RL
Agent policy are not very well defined, and can vary
between architectures. A naïve understanding would be
that, given a reward / cost associated with every
state-action pair, the policy would always try to minimize the
overall cost. In practice, keeping the ecosystem in a
stable state can sometimes be more important than
minimizing the cost (e.g. in a climate control use-case).
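The split between reward and policy can be sketched with a one-step example. The action table, the numbers and the stability weight are all hypothetical; the point is only that the same greedy policy picks a different action once the reward also values stability.

```python
# Hypothetical one-step climate example; all numbers are illustrative.
# Each action maps to (energy_cost, resulting_temperature_deviation).
ACTIONS = {
    "idle":        (0.0, 3.0),
    "gentle_cool": (1.0, 0.5),
    "full_cool":   (3.0, 0.0),
}

def cost_only_reward(action):
    """Reward that only penalizes the energy cost of an action."""
    cost, _ = ACTIONS[action]
    return -cost

def stability_aware_reward(action, stability_weight=2.0):
    """Reward that also penalizes drifting away from the stable state."""
    cost, deviation = ACTIONS[action]
    return -cost - stability_weight * deviation

def greedy_policy(reward_fn):
    """Policy: pick the action with the highest immediate reward."""
    return max(ACTIONS, key=reward_fn)

print(greedy_policy(cost_only_reward))
print(greedy_policy(stability_aware_reward))
```

With cost as the only signal the policy does nothing, which is cheap but lets the temperature drift; with the stability term it settles on the moderate action.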

RL IN RECOMMENDATION SYSTEMS
Recommendation Systems
- Recommenders: Given a user profile and
categorized content, the system makes a
recommendation based on popularity,
interests, demographics, frequency and other
features.
- The reinforcement aspect of RL allows it to
adapt faster to real-time changes in user
sentiment and profile, without need for
explicit (re-)training.
- Enterprise adoption also seems to be gaining
momentum with the availability of Cloud APIs
(e.g. Azure Personalizer) and Google’s RecSim.
[Figure: Article recommendation based on Azure Personalizer]
*D. Biswas. Delayed Rewards in the Context of Reinforcement
Learning based Recommender Systems. AAI4H@ECAI 2020: 49-53.
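To make the "no explicit (re-)training" point concrete, here is a minimal epsilon-greedy bandit sketch that updates its estimates after every piece of feedback. It is not the Azure Personalizer or RecSim API; the class, its parameters and the article names are assumptions made for illustration.

```python
import random

class EpsilonGreedyRecommender:
    """Tracks a recency-weighted reward estimate (click = 1, skip = 0) per article."""

    def __init__(self, articles, epsilon=0.1, step_size=0.3, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.step_size = step_size  # constant step so recent feedback weighs more
        self.values = {a: 0.0 for a in articles}

    def recommend(self):
        # Occasionally explore a random article, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def feedback(self, article, reward):
        # Online update: no batch retraining, one nudge per user reaction.
        self.values[article] += self.step_size * (reward - self.values[article])

rec = EpsilonGreedyRecommender(["sports", "finance"], epsilon=0.0)
rec.feedback("finance", 1.0)   # user clicked a finance article
first = rec.recommend()
rec.feedback("finance", 0.0)   # user's interest shifts: finance skipped...
rec.feedback("sports", 1.0)    # ...and sports clicked
second = rec.recommend()
print(first, second)
```

Exploration is switched off (`epsilon=0.0`) in the demo only to keep the output deterministic.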

RL FOR INDUSTRIAL CONTROL
RL is a good fit for Industrial Control Systems as it is able to learn and adapt
to multi-parameterized system dynamics in real-time, without requiring any
knowledge of the underlying system model.
This has led to widespread adoption of RL in Control systems, from controlling
combustion engines, to robotic arms cutting metal, to air conditioning systems in
buildings.
“We define Data Driven Control as simply Machine Learning (ML) techniques
applied to Control Systems.”
We take a deep dive into the underlying reasons / trends, starting with an understanding of
the limitations of Control Theory for Control Systems.

CONTROL THEORY
System & Controller
A Control System consists of a System & Controller:
- System: the system to be controlled.
- Controller: applies a control strategy to control the system
in an optimal fashion.
Any strategy that the Controller can apply is constrained by:
- its knowledge of the system state, in most cases
provided by the System Sensors;
- the system parameters that it can control, also
referred to as the System Actuators. E.g., an engine can
only drive a car within a certain speed range, at a certain
acceleration.
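The sensor and actuator constraints can be sketched with a small PID controller whose output is clamped to the actuator range. PID is chosen only because it is the classic controller type named later in the deck; the gains, limits and readings below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    out_min: float  # lowest command the actuator accepts
    out_max: float  # highest command the actuator accepts
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint, measurement, dt):
        """Compute one control command from a single sensor reading."""
        error = setpoint - measurement          # all the controller "knows"
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # The actuator cannot exceed its physical range, so clamp the command.
        return min(max(raw, self.out_min), self.out_max)

pid = PID(kp=2.0, ki=0.1, kd=0.0, out_min=0.0, out_max=1.0)
too_cold = pid.update(setpoint=22.0, measurement=10.0, dt=1.0)
too_warm = pid.update(setpoint=22.0, measurement=30.0, dt=1.0)
print(too_cold, too_warm)
```

However large the error, the command never leaves the 0.0 to 1.0 range the (assumed) heating valve can realize.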

CONTROL THEORY - LIMITATIONS
Linear Equations
Designing a control strategy then consists of solving the equations
characterizing the system behavior, often modeled in the form of
linear equations.
Most of Control Theory is targeted towards solving linear
equations.
Unfortunately, real-world systems are (mostly) non-linear. E.g., even
the equation capturing the motion of a pendulum is non-linear.
There has been a lot of research on linearization methods, i.e.
techniques that convert non-linear equations to linear ones, which
can then be solved using linear state space control theory.
Unfortunately, such linearization methods are limited to
specific classes of non-linear equations and cannot be
generalized easily.
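The pendulum makes the limitation easy to see. The sketch below uses the standard frictionless pendulum equation and its small-angle linearization (textbook facts, not taken from the slides) and measures how quickly the linear approximation degrades as the swing angle grows.

```python
import math

# Frictionless pendulum:      theta'' = -(g / L) * sin(theta)   (non-linear)
# Small-angle linearization:  sin(theta) ~ theta, so
#                             theta'' = -(g / L) * theta        (linear)

def linearization_error(theta_deg):
    """Relative error of replacing sin(theta) by theta, for theta in degrees."""
    theta = math.radians(theta_deg)
    return abs(theta - math.sin(theta)) / abs(math.sin(theta))

for deg in (5, 30, 90):
    print(f"{deg:>3} deg: relative error {linearization_error(deg):.1%}")
```

The approximation is excellent for small swings and becomes poor for large ones, which is why a linearized design is only valid near the operating point it was derived for.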

CONTROL THEORY – LIMITATIONS (2)
Model Driven Control
A model of the system and its corresponding
equations are required.
This is why traditional control strategies,
also referred to as Model Driven Control, still
exclude a lot of systems that we do not know how to
model (whose system equations are not known).
And the complexity of such systems keeps increasing
as we try to solve hyper-scale problems, e.g.,
climate control, disease control, automated vehicles,
financial markets, etc.

MACHINE LEARNING (ML) TO THE RESCUE
Data Driven Control
ML/Data based approaches show a lot of
promise in this context.
The underlying logic here is that even for a very high
dimensional system that we cannot model, there are
dominant patterns that characterize the system
behavior — and Machine Learning (Deep
Learning) is very good at learning these
patterns.
The learned model would (most likely) be an approximation,
and we still would not understand the system fully,
but it is good enough for most real-life use-cases
(including predictions), barring some exceptional
scenarios.
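A toy version of this idea: fit an approximate model purely from logged data. Ordinary least squares stands in here for the deep learning model mentioned above, and `hidden_system` is a made-up plant with a small non-linear term that the fitted model never sees explicitly. All names and numbers are assumptions.

```python
import math
import random

def hidden_system(x, u):
    # Stand-in for a plant whose equations we pretend not to know.
    return 0.9 * x + 0.5 * u + 0.01 * math.sin(x)

def fit_linear_dynamics(xs, us):
    """Least-squares fit of x[t+1] ~ a * x[t] + b * u[t] from logged data."""
    sxx = sxu = suu = sxy = suy = 0.0
    for x, u, y in zip(xs[:-1], us, xs[1:]):
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

rng = random.Random(0)
us = [rng.uniform(-1.0, 1.0) for _ in range(200)]  # logged control inputs
xs = [1.0]                                         # logged sensor readings
for u in us:
    xs.append(hidden_system(xs[-1], u))

a, b = fit_linear_dynamics(xs, us)
print(f"learned model: x[t+1] ~ {a:.3f} * x[t] + {b:.3f} * u[t]")
```

The fit captures the dominant behavior of the plant without recovering its exact equations, which is the "approximation that is good enough" trade-off described above.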

MODEL BASED RL
Offline Training
RL allows further fine-tuning of the
developed ML Model.
In Model based RL, it is possible to develop a
model of the problem scenario, and bootstrap
initial RL training based on the model
simulation values.
For complex scenarios (e.g. games, robotic tasks),
where it is not possible to build a model of the
problem scenario, it might still be possible to
bootstrap an RL model based on historical values –
referred to as Offline Training.
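Offline training can be sketched as replaying a fixed log of past transitions instead of interacting with the live system. The example below applies plain tabular Q-learning updates to a hypothetical four-row history; the log format `(state, action, reward, next_state, done)` and all values are assumptions.

```python
def offline_q_learning(transitions, n_states, n_actions,
                       alpha=0.5, gamma=0.9, sweeps=200):
    """Fit a Q-table by repeatedly replaying logged (s, a, r, s2, done) tuples."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s2, done in transitions:
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
    return q

# Hypothetical historical log from a 3-state system:
# action 1 moves one state forward, action 0 stays put; reaching state 2 pays 1.0.
history = [
    (0, 1, 0.0, 1, False),
    (0, 0, 0.0, 0, False),
    (1, 1, 1.0, 2, True),
    (1, 0, 0.0, 1, False),
]

q = offline_q_learning(history, n_states=3, n_actions=2)
policy = [row.index(max(row)) for row in q[:2]]  # greedy action per non-terminal state
print(policy)
```

The resulting table can serve as a starting point that later online learning refines. Real logs rarely cover every state-action pair, so the bootstrapped values are only trustworthy where the history has data.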
[Figure: data and MLOps pipeline. Labels: Structured / Unstructured sources; Raw / Staging (Bronze); Cleansed / Standardized (Silver); Transformed / Modeled (Gold); DataOps (DQ/Validation, Filtering, Historization, Aggregation); BI / Reporting; AI/ML (DQ/Cleaning, Encoding, Selection, Normalization); Exploratory Data Analysis; Feature extraction; Training dataset; Test dataset; Model Training; Model Serving (Inference); Model Monitoring; ML Outputs (Inferences, Predictions); RL based Model Improvement]
*D. Biswas. MLOps for Compositional AI. NeurIPS Workshop on
Challenges in Deploying and Monitoring Machine Learning Systems
(DMML), 2022.

CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION
HVAC Optimization
The primary goal of the HVAC (Heating, Ventilation and Air
Conditioning) units is to keep the temperature and
(relative) humidity within the prescribed manufacturing
tolerance ranges, by controlling 4 output valves: Cooling,
Heating, Re-heating and Humidifier. This needs to be
balanced with energy savings and CO2 emission reductions
to offset the environmental impact of running them.
This is a complex problem as it requires computing an
optimal state taking into account multiple variable factors,
e.g. the occupancy in a building zone, temperature
requirements of operating machines, air flow dynamics within
the building, external weather conditions, etc.
D. Biswas. Reinforcement Learning based Energy Optimization in
Factories. In proceedings of the 11th ACM e-Energy Conference, Jun
2020.

CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION (2)
HVAC - RL Formulation
At any point in time, a factory zone is in a state
characterized by the temperature and (relative)
humidity values observed inside and outside the
zone.
The game environment in this case corresponds
to the temperature and humidity tolerance levels,
which basically mandate that the zone temperature
and humidity values should be within the
range: 19–25 degrees and 45–55%.
The set of available actions in this case consists of
the Cooling, Heating, Re-heating and Humidifier valve
opening percentages (%).
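The formulation above can be written down as two small data types. The tolerance ranges are the ones stated on the slide; the class and field names, and the choice to validate valve openings as 0 to 100 percent, are assumptions made for this sketch.

```python
from dataclasses import dataclass

TEMP_RANGE = (19.0, 25.0)      # degrees, as stated on the slide
HUMIDITY_RANGE = (45.0, 55.0)  # percent relative humidity, as stated on the slide

@dataclass(frozen=True)
class ZoneState:
    """State: values observed inside and outside the factory zone."""
    temp_in: float
    humidity_in: float
    temp_out: float
    humidity_out: float

    def within_tolerance(self):
        return (TEMP_RANGE[0] <= self.temp_in <= TEMP_RANGE[1]
                and HUMIDITY_RANGE[0] <= self.humidity_in <= HUMIDITY_RANGE[1])

@dataclass(frozen=True)
class ValveAction:
    """Action: opening percentage of each of the four valves."""
    cooling: float
    heating: float
    reheating: float
    humidifier: float

    def __post_init__(self):
        for name, pct in vars(self).items():
            if not 0.0 <= pct <= 100.0:
                raise ValueError(f"{name} opening must be 0-100%, got {pct}")

state = ZoneState(temp_in=22.0, humidity_in=50.0, temp_out=30.0, humidity_out=70.0)
action = ValveAction(cooling=40.0, heating=0.0, reheating=0.0, humidifier=10.0)
print(state.within_tolerance(), action)
```

Only the inside readings are checked against the tolerance ranges; the outside readings are part of the state because they influence how the zone will evolve.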

The Reward Function assigns a reward to each action based on three parameters: Setpoint Closeness (SC), Energy Cost
(EC) and Tolerance Violation (TV). A control strategy amounts to deciding on the weightage of these three parameters.
The Energy Cost is captured in terms of electricity consumption and CO2 emission.
Setpoint Closeness encourages a "business friendly" policy where the RL model attempts to keep the zone temperature
and humidity as close as possible to their setpoints, implicitly reducing the risk of violations, but at a higher
Energy Cost.
We opt for a "balanced" control policy which maximizes Energy Savings and Setpoint Closeness, while
minimizing the risk of Tolerance Violations.
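One way to picture the weighting is a reward that combines the three parameters linearly. The slides do not give the actual formula, so the functional form, the setpoints (taken as the midpoints of the tolerance ranges), the normalized energy cost and the weights below are all assumptions.

```python
def hvac_reward(temp, humidity, energy_cost,
                temp_setpoint=22.0, humidity_setpoint=50.0,
                w_sc=1.0, w_ec=1.0, w_tv=10.0):
    """Hypothetical reward: reward SC, penalize EC and TV, each with its own weight."""
    # Setpoint Closeness: 1.0 at the setpoint, 0.0 at (or beyond) the range edge.
    sc = 1.0 - max(abs(temp - temp_setpoint) / 3.0,
                   abs(humidity - humidity_setpoint) / 5.0)
    sc = max(sc, 0.0)
    # Tolerance Violation: 1 if either value leaves 19-25 degrees / 45-55 %.
    tv = 0.0 if (19.0 <= temp <= 25.0 and 45.0 <= humidity <= 55.0) else 1.0
    # energy_cost is assumed to be pre-normalized to the range 0..1.
    return w_sc * sc - w_ec * energy_cost - w_tv * tv

at_setpoint_cheap = hvac_reward(22.0, 50.0, energy_cost=0.2)
violation = hvac_reward(26.0, 50.0, energy_cost=0.0)
business_friendly = hvac_reward(24.0, 50.0, energy_cost=0.1, w_sc=3.0)
print(at_setpoint_cheap, violation, business_friendly)
```

Raising `w_sc` relative to `w_ec` pushes towards the "business friendly" policy; the large `w_tv` makes leaving the tolerance range costlier than any energy saving can offset.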

Optimization Results
Within a 6-month pilot, we were able to develop
and operationalize an RL based HVAC controller
that is able to learn and adapt to real-life factory
settings, without the need for any offline training.
It showcases the successful transition of an
Industrial Control System run by a traditional
PID controller for the last 10+ years, to a more
efficient RL based controller.
Benchmarking results show the potential to
improve energy efficiency by up to 25% (as
compared to when the units were operated by
PID controllers).

Thank You & Questions
Contact: Debmalya Biswas
LinkedIn: https://www.linkedin.com/in/debmalya-biswas-3975261/
Medium: https://medium.com/@debmalyabiswas
