DATA DRIVEN (REINFORCEMENT LEARNING BASED) CONTROL
INDUSTRIAL APPLICATIONS OF REINFORCEMENT LEARNING
DEBMALYA BISWAS
WIPRO AI
AGENDA
- Introduction
- Reinforcement Learning (RL) Fundamentals
- Industrial Control Systems
- Control Theory Limitations
- RL/Data-driven Control to the Rescue
- Case Study: RL based HVAC Energy Optimization
INTRODUCTION
Reinforcement Learning in ChatGPT
Figure: a conversational AI pipeline with the components Prompt (Natural Language Query - NLQ), Understand User Intent (Natural Language Understanding - NLU), Knowledge Base, Synthesize Response (Natural Language Generation - NLG) and Return Response. It also includes a User feedback loop (Reinforcement Learning from Human Feedback - RLHF) and cross-cutting SECURITY and EXPLAIN layers.
“A significant amount of manual effort has been incorporated in the form of user feedback, to improve the accuracy of ChatGPT leveraging Reinforcement Learning*.”
*E. Ricciardelli, D. Biswas. Self-improving Chatbots based
on Reinforcement Learning. 4th Multidisciplinary
Conference on Reinforcement Learning and Decision
Making (RLDM), 2019.
BACKGROUND
Reinforcement Learning (RL) Basics
- RL is a branch of AI/ML targeted at goal-oriented problems.
- RL algorithms are able to achieve complex goals by maximizing the cumulative reward over many steps, e.g., the points won in a game over many moves.
- The reward function works much like incentivizing a child with candy and spankings. The algorithm is penalized when it takes a wrong decision and rewarded when it takes a right one; this is the reinforcement.
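The following example makes the reward-maximization idea concrete. It is a minimal tabular Q-learning sketch on a hypothetical toy "game": a six-cell corridor with +10 for reaching the goal and -1 for every other step. All of these values are assumptions for illustration. Q-learning is one standard RL algorithm, not necessarily the method used in the works cited in this deck.

```python
import random

# Hypothetical toy "game": a corridor of N_STATES cells. The agent starts in
# cell 0; reaching the last cell wins +10, every other step costs -1.
N_STATES = 6
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 10.0, True   # reward: goal reached
    return next_state, -1.0, False      # penalty: a step that did not win


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn the long-term value of each (state, action)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best-known action, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards reward + discounted value of what follows
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

After training, the greedy policy should move right in every cell. Although each step is penalized, that path maximizes the cumulative reward.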
RL FORMULATION
RL Policy and Rewards Functions
- Reward: refers to the feedback by which we measure the
success or failure of an agent’s recommended action.
- Policy: is the strategy that the agent employs to select
the next best action.
- The roles and responsibilities of the Reward function vs. the RL Agent policy are not very well defined, and can vary between architectures. A naïve understanding would be that, given a reward / cost associated with every state-action pair, the policy would always try to minimize the overall cost. However, keeping the ecosystem in a stable state can sometimes be more important than minimizing the cost (e.g., in a climate control use-case).
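The following sketch illustrates this distinction. Every number in it is a hypothetical assumption: a zone at 25 degrees, a setpoint of 22, toy cooling dynamics and a linear energy cost. A greedy policy that only minimizes energy cost leaves the cooling off. Adding a stability term that penalizes deviation from the setpoint changes the action the same policy selects.

```python
# Hypothetical single-zone example: all numbers below are illustrative assumptions.
SETPOINT = 22.0                       # desired zone temperature (degrees)
ACTIONS = (0, 20, 40, 60, 80, 100)    # cooling valve opening (%)


def predicted_temperature(current, valve_pct):
    # Toy dynamics: every 10% of cooling valve opening lowers the temperature by 0.5 degrees
    return current - 0.05 * valve_pct


def energy_cost(valve_pct):
    return 0.1 * valve_pct            # assumed linear energy cost


def cost_only(current, valve_pct):
    """Naive view: the cost of a state-action pair is just its energy cost."""
    return energy_cost(valve_pct)


def cost_with_stability(current, valve_pct, weight=5.0):
    """Energy cost plus a penalty for leaving the stable (setpoint) state."""
    deviation = abs(predicted_temperature(current, valve_pct) - SETPOINT)
    return energy_cost(valve_pct) + weight * deviation


def greedy_policy(cost_fn, current):
    # Pick the action with the lowest cost, i.e. the highest reward (reward = -cost)
    return min(ACTIONS, key=lambda a: cost_fn(current, a))


print(greedy_policy(cost_only, 25.0), greedy_policy(cost_with_stability, 25.0))
```

With these assumed numbers, the cost-only policy picks 0% (no cooling). The stability-aware policy picks 60%, the opening that brings the zone back to the setpoint.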
RL IN RECOMMENDATION SYSTEMS
Recommendation Systems
- Recommenders: Given a user profile and
categorized content, the system makes a
recommendation based on popularity,
interests, demographics, frequency and other
features.
- The reinforcement aspect of RL allows it to adapt faster to real-time changes in user sentiment and profile, without the need for explicit (re-)training.
- Enterprise adoption also seems to be gaining momentum with the availability of Cloud APIs (e.g., Azure Personalizer) and simulation frameworks such as Google's RecSim.
Article recommendation based on Azure
Personalizer
*D. Biswas. Delayed Rewards in the Context of Reinforcement
Learning based Recommender Systems. AAI4H@ECAI 2020: 49-
53
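The sketch below illustrates how the reinforcement aspect lets a recommender adapt online, without explicit retraining. It implements a minimal epsilon-greedy multi-armed bandit. This is a deliberate simplification: it ignores user context, and the article names and click-through rates are hypothetical. It is not the Azure Personalizer or RecSim API.

```python
import random


class EpsilonGreedyRecommender:
    """Hypothetical article recommender: a (non-contextual) epsilon-greedy bandit.

    Each article is an arm; the reward is 1 for a click and 0 otherwise.
    Estimates are updated incrementally after every interaction, so the
    recommender adapts to user behaviour without an offline retraining step.
    """

    def __init__(self, articles, epsilon=0.1, seed=0):
        self.articles = list(articles)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.articles}
        self.values = {a: 0.0 for a in self.articles}  # running mean reward per article

    def recommend(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.articles)        # explore
        return max(self.articles, key=self.values.get)   # exploit

    def update(self, article, reward):
        self.counts[article] += 1
        # Incremental mean: no need to store or replay past interactions
        self.values[article] += (reward - self.values[article]) / self.counts[article]


# Simulated users with assumed click-through rates (illustrative only)
ctr = {"sports": 0.1, "tech": 0.6, "finance": 0.3}
sim = random.Random(42)
rec = EpsilonGreedyRecommender(ctr)
for _ in range(2000):
    article = rec.recommend()
    clicked = 1.0 if sim.random() < ctr[article] else 0.0
    rec.update(article, clicked)

print(max(rec.values, key=rec.values.get), rec.counts)
```

Production systems such as Azure Personalizer use contextual bandits, which also condition on user and content features. The incremental update shown here is the part that avoids explicit retraining.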
RL FOR INDUSTRIAL CONTROL
RL is a good fit for Industrial Control Systems as it is able to learn and adapt
to multi-parameterized system dynamics in real-time, without requiring any
knowledge of the underlying system model.
This has led to widespread adoption of RL in control systems, from controlling combustion engines, to robotic arms cutting metal, to air conditioning systems in buildings.
“We define Data Driven Control as simply Machine Learning (ML) techniques
applied to Control Systems.”
Below, we dive into the underlying reasons and trends, starting with the limitations of Control Theory for Control Systems.
CONTROL THEORY
System & Controller
A Control System consists of a System & Controller:
- System to control
- Controller applies a control strategy to control the system
in an optimal fashion.
Any strategy that the Controller can apply is constrained by:
- its knowledge of the system state, in most cases provided by the System Sensors;
- and the system parameters that it can control, also referred to as the System Actuators. E.g., an engine can only drive a car within a certain speed range, at a certain acceleration.
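For reference, here is a minimal sketch of a classical controller operating under the two constraints listed above. It observes the system only through a sensor measurement and acts only through an actuator clamped to 0–100%. It is a textbook discrete PID controller, the kind of controller the case study later replaces. The gains, limits and first-order thermal plant are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDController:
    kp: float
    ki: float
    kd: float
    out_min: float = 0.0     # actuator lower limit, e.g. valve fully closed (0%)
    out_max: float = 100.0   # actuator upper limit, e.g. valve fully open (100%)
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        """Return the actuator command for one control step."""
        error = setpoint - measurement                     # sensor reading -> error
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate_integral = self.integral + error * dt
        unclamped = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        output = min(max(unclamped, self.out_min), self.out_max)  # actuator saturation
        if output == unclamped:
            # Simple anti-windup: only keep integrating while the actuator is not saturated
            self.integral = candidate_integral
        return output


def simulate(steps: int = 200, dt: float = 1.0, setpoint: float = 21.0, ambient: float = 15.0):
    pid = PIDController(kp=2.0, ki=0.2, kd=0.0)
    temp = ambient
    commands = []
    for _ in range(steps):
        u = pid.update(setpoint, temp, dt)   # e.g. heating valve opening (%)
        commands.append(u)
        # Assumed first-order thermal plant: heating raises the temperature,
        # heat losses pull it back towards the ambient temperature
        temp += dt * (0.05 * u - 0.1 * (temp - ambient))
    return temp, commands


final_temp, commands = simulate()
print(f"final temperature: {final_temp:.2f}, max command: {max(commands):.1f}%")
```

The gains here were chosen by hand for this toy plant. In practice, PID tuning depends on a model or on experiments, which links to the model-driven limitation discussed next.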
CONTROL THEORY - LIMITATIONS
Linear Equations
Designing a control strategy then consists of solving the equations characterizing the system behavior, which are often modeled as linear equations.
Most of Control Theory is targeted towards solving linear equations.
Unfortunately, real-world systems are (mostly) non-linear. E.g., even the equation of motion of a simple pendulum is non-linear.
There has been a lot of research on linearization methods: techniques that convert non-linear equations into linear ones, which are then solved using linear state space control theory.
Unfortunately, such linearization methods are limited to specific classes of non-linear equations and cannot be generalized easily.
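The pendulum example can be checked numerically. The exact dynamics are θ'' = -(g/L)·sin θ. The standard linearization around θ = 0 replaces sin θ with θ. This sketch, with an assumed g and pendulum length, measures how far the linear model drifts from the non-linear one as the swing angle grows.

```python
import math

G = 9.81       # assumed gravitational acceleration (m/s^2)
LENGTH = 1.0   # assumed pendulum length (m)


def nonlinear_accel(theta):
    """Exact pendulum dynamics: theta'' = -(g / L) * sin(theta)."""
    return -(G / LENGTH) * math.sin(theta)


def linearized_accel(theta):
    """Linearization around theta = 0, using sin(theta) ~ theta."""
    return -(G / LENGTH) * theta


def linearization_error(degrees):
    """Relative error of the linear model at a given deflection angle."""
    theta = math.radians(degrees)
    exact = nonlinear_accel(theta)
    return abs(linearized_accel(theta) - exact) / abs(exact)


for degrees in (5, 30, 90, 150):
    print(f"{degrees:>3} deg: relative error {linearization_error(degrees):.1%}")
```

The error is negligible for small swings but reaches about 57% at 90 degrees. A linearized model is therefore only trustworthy near its operating point.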
CONTROL THEORY – LIMITATIONS (2)
Model Driven Control
A model of the system and its corresponding
equations are required.
This is why traditional control strategies, also referred to as Model Driven Control, still exclude many systems that we do not know how to model (i.e., whose system equations are not known).
Moreover, the complexity of such systems keeps increasing as we try to solve hyper-scale problems, e.g., climate control, disease control, automated vehicles, financial markets, etc.
MACHINE LEARNING (ML) TO THE RESCUE
Data Driven Control
ML/Data based approaches show a lot of
promise in this context.
The underlying logic is that even for a very high-dimensional system that we cannot model, there are dominant patterns that characterize the system behavior, and Machine Learning (Deep Learning) is very good at learning these patterns.
The learned model would (most likely) be an approximation. While we still would not understand the system fully, it is good enough for most real-life use-cases (including predictions), barring some exceptional scenarios.
MODEL BASED RL
Offline Training
RL allows further fine-tuning of the developed ML model.
In Model based RL, it is possible to develop a model of the problem scenario and bootstrap the initial RL training based on the model's simulation values.
For complex scenarios (e.g., games, robotic tasks) where it is not possible to build a model of the problem scenario, it might still be possible to bootstrap an RL model based on historical values. This is referred to as Offline Training.
Figure: MLOps pipeline with RL based model improvement. Components:
- data layers for structured and unstructured data: Raw / Staging (Bronze), Cleansed / Standardized (Silver), Transformed / Modeled (Gold);
- DataOps steps: DQ / Validation, Filtering, Historization, Aggregation;
- consumers: BI / Reporting and AI/ML;
- ML steps: Exploratory Data Analysis, DQ / Cleaning, Encoding, Selection, Normalization, Feature extraction, Training and Test datasets, Model Training, Model Serving (Inference), Model Monitoring;
- ML Outputs (Inferences, Predictions);
- an RL based Model Improvement loop.
*D. Biswas. MLOps for Compositional AI. NeurIPS Workshop on
Challenges in Deploying and Monitoring Machine Learning Systems
(DMML), 2022.
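The following sketch makes the Offline Training idea on this slide concrete. It runs tabular Q-learning repeatedly over a fixed log of historical transitions and never interacts with the system while learning. The corridor system, rewards and random "legacy" policy are hypothetical and exist only to synthesize the log. This is one simple offline RL scheme, not the pipeline shown in the figure.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)


def step(state, action):
    """Hypothetical system, used ONLY to generate a synthetic historical log."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (10.0 if nxt == GOAL else -1.0), nxt == GOAL


def collect_log(episodes=200, seed=0):
    """Synthetic 'historical' transitions recorded while a random legacy policy ran."""
    rng = random.Random(seed)
    log = []
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            log.append((state, action, reward, next_state, done))
            state = next_state
    return log


def offline_q_learning(log, gamma=0.9, alpha=0.1, sweeps=50):
    """Bootstrap Q-values purely from logged data: no new interaction with the system."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state, done in log:
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


q = offline_q_learning(collect_log())
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)])
```

Even though the logged policy was random, the learned greedy policy should head straight for the goal. This works only because the log covers the relevant state-action pairs. Limited data coverage is the main practical caveat of offline training.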
CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION
HVAC Optimization
The primary goal of the HVAC (Heating, Ventilation and Air Conditioning) units is to keep the temperature and (relative) humidity within the prescribed manufacturing tolerance ranges.
This is achieved by controlling 4 output valves: Cooling, Heating, Re-heating and Humidifier. It needs to be balanced with energy savings and CO2 emission reductions to offset the environmental impact of running the units.
This is a complex problem as it requires computing an
optimal state taking into account multiple variable factors,
e.g. the occupancy in a building zone, temperature
requirements of operating machines, air flow dynamics within
the building, external weather conditions, etc.
D. Biswas. Reinforcement Learning based Energy Optimization in
Factories. In proceedings of the 11th ACM e-Energy Conference, Jun
2020.
CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION (2)
HVAC - RL Formulation
At any point in time, a factory zone is in a state
characterized by the temperature and (relative)
humidity values observed inside and outside the
zone.
The game environment in this case corresponds to the temperature and humidity tolerance levels. These mandate that the zone temperature stays within 19–25 degrees and the relative humidity within 45–55%.
The set of available actions in this case are the
Cooling, Heating, Re-heating and Humidifier valve
opening percentages (%).
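The formulation above can be encoded directly as data structures. In the sketch below, the tolerance ranges and the four valves come from the slide. The class names, field names and the three-level discretization of valve openings are hypothetical choices for illustration.

```python
import itertools
from dataclasses import dataclass

# Tolerance ranges stated in the case study
TEMP_RANGE = (19.0, 25.0)        # degrees
HUMIDITY_RANGE = (45.0, 55.0)    # % relative humidity


@dataclass(frozen=True)
class ZoneState:
    """Observed state of a factory zone (inside and outside readings)."""
    inside_temp: float
    inside_humidity: float
    outside_temp: float
    outside_humidity: float

    def within_tolerance(self) -> bool:
        return (TEMP_RANGE[0] <= self.inside_temp <= TEMP_RANGE[1]
                and HUMIDITY_RANGE[0] <= self.inside_humidity <= HUMIDITY_RANGE[1])


@dataclass(frozen=True)
class ValveAction:
    """Opening percentage (0-100%) of each of the four output valves."""
    cooling: float
    heating: float
    reheating: float
    humidifier: float

    def __post_init__(self):
        for name in ("cooling", "heating", "reheating", "humidifier"):
            value = getattr(self, name)
            if not 0.0 <= value <= 100.0:
                raise ValueError(f"{name} opening must be within 0-100%, got {value}")


def discrete_action_space(levels=(0.0, 50.0, 100.0)):
    """Hypothetical discretisation: every combination of the given opening levels."""
    return [ValveAction(*combo) for combo in itertools.product(levels, repeat=4)]


state = ZoneState(inside_temp=23.5, inside_humidity=52.0, outside_temp=31.0, outside_humidity=70.0)
print(state.within_tolerance(), len(discrete_action_space()))
```

A coarse discretization keeps the action space small (3^4 = 81 combinations) for tabular methods. Continuous-action RL algorithms could instead output the four percentages directly.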
CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION (3)
The Reward Function assigns a reward to each action based on three parameters: Setpoint Closeness (SC), Energy Cost (EC) and Tolerance Violation (TV). The Energy Cost is captured in terms of electricity consumption and CO2 emission. A control strategy amounts to deciding on the weightage of these three parameters.
Setpoint Closeness encourages a "business friendly" policy, where the RL model attempts to keep the zone temperature and humidity as close as possible to their setpoints. This implicitly reduces the risk of violations, but at a higher Energy Cost.
We opt for a "balanced" control policy, which maximizes Energy Savings and Setpoint Closeness while minimizing the risk of Tolerance Violations.
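A hypothetical weighted reward along these lines is sketched below. The slide names the three parameters but not their formulas. The setpoints (assumed to be the midpoints of the tolerance ranges), the normalization, the CO2 weighting and both sets of weights are therefore illustrative assumptions, not the values used in the pilot.

```python
from dataclasses import dataclass

TEMP_RANGE, HUMIDITY_RANGE = (19.0, 25.0), (45.0, 55.0)
# Assumption: setpoints at the middle of the tolerance ranges
TEMP_SETPOINT = sum(TEMP_RANGE) / 2           # 22.0
HUMIDITY_SETPOINT = sum(HUMIDITY_RANGE) / 2   # 50.0
TEMP_HALF_RANGE = (TEMP_RANGE[1] - TEMP_RANGE[0]) / 2
HUMIDITY_HALF_RANGE = (HUMIDITY_RANGE[1] - HUMIDITY_RANGE[0]) / 2


@dataclass(frozen=True)
class RewardWeights:
    setpoint_closeness: float
    energy_cost: float
    tolerance_violation: float


# Hypothetical weightings for the two policies described above
BUSINESS_FRIENDLY = RewardWeights(setpoint_closeness=1.0, energy_cost=0.1, tolerance_violation=10.0)
BALANCED = RewardWeights(setpoint_closeness=0.5, energy_cost=0.5, tolerance_violation=10.0)


def reward(temp, humidity, kwh, co2_kg, weights, co2_weight=1.0):
    # Setpoint Closeness (SC): 0 at the setpoints, more negative further away
    sc = -(abs(temp - TEMP_SETPOINT) / TEMP_HALF_RANGE
           + abs(humidity - HUMIDITY_SETPOINT) / HUMIDITY_HALF_RANGE)
    # Energy Cost (EC): electricity consumption plus CO2 emission, as a penalty
    ec = -(kwh + co2_weight * co2_kg)
    # Tolerance Violation (TV): fixed penalty whenever the zone leaves the range
    inside = (TEMP_RANGE[0] <= temp <= TEMP_RANGE[1]
              and HUMIDITY_RANGE[0] <= humidity <= HUMIDITY_RANGE[1])
    tv = 0.0 if inside else -1.0
    return (weights.setpoint_closeness * sc
            + weights.energy_cost * ec
            + weights.tolerance_violation * tv)


# Two candidate outcomes: on setpoint but energy hungry vs. in range but frugal
on_setpoint = dict(temp=22.0, humidity=50.0, kwh=4.0, co2_kg=1.0)
frugal = dict(temp=24.5, humidity=53.0, kwh=1.0, co2_kg=0.25)
for name, w in (("business friendly", BUSINESS_FRIENDLY), ("balanced", BALANCED)):
    print(name, reward(**on_setpoint, weights=w), reward(**frugal, weights=w))
```

Under the assumed business friendly weights, the on-setpoint outcome scores higher. Under the balanced weights, the frugal outcome wins. In both cases a tolerance violation is penalized heavily.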
CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION (4)
Optimization Results
Within a 6-month pilot, we were able to develop and operationalize an RL based HVAC controller that learns and adapts to real-life factory settings, without the need for any offline training.
It showcases the successful transition of an Industrial Control System, run by a traditional PID controller for the last 10+ years, to a more efficient RL based controller.
Benchmarking results show the potential to save up to 25% in energy, compared to when the units were operated by PID controllers.
Thank You & Questions
Contact: Debmalya Biswas
LinkedIn: https://www.linkedin.com/in/debmalya-biswas-3975261/
Medium: https://medium.com/@debmalyabiswas

 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 

Data-Driven (Reinforcement Learning-Based) Control

  • 1. DATA DRIVEN (REINFORCEMENT LEARNING BASED) CONTROL: INDUSTRIAL APPLICATIONS OF REINFORCEMENT LEARNING. DEBMALYA BISWAS, WIPRO AI
  • 2. AGENDA: Introduction; Reinforcement Learning (RL) Fundamentals; Industrial Control Systems; Control Theory Limitations; RL/Data-driven Control to the Rescue; Case Study: RL based HVAC Energy Optimization
  • 3. INTRODUCTION: Reinforcement Learning in ChatGPT. Diagram: Prompt (Natural Language Query, NLQ), Understand User Intent (Natural Language Understanding, NLU), Knowledge Base, Synthesize Response (Natural Language Generation, NLG) and Return Response, closed by a user feedback loop (Reinforcement Learning from Human Feedback, RLHF), with Security and Explainability layers alongside. A significant amount of manual effort, in the form of user feedback, has been incorporated to improve the accuracy of ChatGPT leveraging Reinforcement Learning*. *E. Ricciardelli, D. Biswas. Self-improving Chatbots based on Reinforcement Learning. 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2019.
  • 4. BACKGROUND: Reinforcement Learning (RL) Basics - RL is a branch of AI/ML targeted at goal-oriented problems. - RL algorithms achieve complex goals by maximizing a reward function over many steps, e.g., the points won in a game. - The reward function works similarly to incentivizing a child with candy and spankings: the algorithm is penalized when it takes a wrong decision and rewarded when it takes a right one. This is reinforcement.
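To make "maximizing a reward function over many steps" concrete, the sketch below computes the discounted return that an RL agent typically tries to maximize. The discount factor `gamma` and the point sequence are illustrative assumptions, not values from the slides.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return G = r_0 + gamma*r_1 + gamma**2*r_2 + ... for one episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical game: points won (+) or lost (-) at each step.
points = [1.0, 0.0, -1.0, 2.0]
print(discounted_return(points, gamma=0.9))
```

With `gamma=1.0` every step counts equally; smaller values make the agent favour rewards that arrive sooner.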
  • 5. RL FORMULATION: RL Policy and Reward Functions - Reward: the feedback by which we measure the success or failure of an agent's recommended action. - Policy: the strategy that the agent employs to select the next best action. - The roles and responsibilities of the Reward function vs. the RL agent's policy are not well defined, and can vary between architectures. A naïve understanding would be that, given a reward / cost associated with every state-action pair, the policy always tries to minimize the overall cost. In practice, however, keeping the ecosystem in a stable state can sometimes be more important than minimizing cost (e.g., in a climate control use case).
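A minimal sketch of the reward/policy split in a tabular setting, assuming a cost is attached to each state-action pair: the policy picks the lowest-cost action and occasionally explores (epsilon-greedy, a common choice that the slides do not prescribe). The state and action names and the cost table are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple

CostTable = Dict[Tuple[Hashable, Hashable], float]


def epsilon_greedy(costs: CostTable, state: Hashable, actions: List[Hashable],
                   epsilon: float = 0.1, rng: Optional[random.Random] = None) -> Hashable:
    """Pick the lowest-cost action for `state`, exploring with probability epsilon."""
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore: try a random action
    # Exploit: unseen state-action pairs default to a cost of 0.0.
    return min(actions, key=lambda a: costs.get((state, a), 0.0))


# Hypothetical cost table for a single "too warm" state.
costs = {("too_warm", "cool"): 1.0, ("too_warm", "heat"): 5.0, ("too_warm", "idle"): 2.0}
print(epsilon_greedy(costs, "too_warm", ["cool", "heat", "idle"], epsilon=0.0))
```

A stability concern like the one on the slide could be expressed by adding a penalty for large changes between consecutive actions to the cost; that extension is not shown here.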
  • 6. RL IN RECOMMENDATION SYSTEMS: Recommendation Systems - Recommenders: given a user profile and categorized content, the system makes a recommendation based on popularity, interests, demographics, frequency and other features. - The reinforcement aspect of RL allows it to adapt faster to real-time changes in user sentiment and profile, without the need for explicit (re-)training. - Enterprise adoption also seems to be gaining momentum with the availability of cloud APIs (e.g., Azure Personalizer) and simulation platforms such as Google's RecSim. Figure: Article recommendation based on Azure Personalizer. *D. Biswas. Delayed Rewards in the Context of Reinforcement Learning based Recommender Systems. AAI4H@ECAI 2020: 49-53.
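The "adapt without explicit (re-)training" point can be illustrated with a generic epsilon-greedy multi-armed bandit that updates its estimates incrementally after each piece of feedback. This is a sketch under stated assumptions: it is not the Azure Personalizer API, and the class name, item names and step size are hypothetical.

```python
import random
from typing import Dict, List, Optional


class BanditRecommender:
    """Epsilon-greedy multi-armed bandit over a fixed catalogue of items."""

    def __init__(self, items: List[str], epsilon: float = 0.1,
                 step_size: float = 0.2, seed: Optional[int] = None) -> None:
        self.items = list(items)
        self.epsilon = epsilon
        # A constant step size weights recent feedback more, so drifting preferences are tracked.
        self.step_size = step_size
        self.values: Dict[str, float] = {item: 0.0 for item in self.items}
        self.rng = random.Random(seed)

    def recommend(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)  # explore
        return max(self.items, key=self.values.__getitem__)  # exploit

    def update(self, item: str, reward: float) -> None:
        # Incremental update V <- V + alpha * (reward - V); no batch retraining needed.
        self.values[item] += self.step_size * (reward - self.values[item])


rec = BanditRecommender(["sports", "tech", "finance"], epsilon=0.1, seed=7)
item = rec.recommend()
rec.update(item, reward=1.0)  # e.g. the user clicked the recommended article
```

Because `update` can be called whenever feedback arrives, it can in principle also absorb delayed rewards, although attributing a late reward to the right recommendation is a separate problem this sketch does not address.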
  • 7. RL FOR INDUSTRIAL CONTROL: RL is a good fit for Industrial Control Systems, as it can learn and adapt to multi-parameter system dynamics in real time, without requiring knowledge of the underlying system model. This has led to widespread adoption of RL in control systems, from combustion engines, to robotic arms cutting metal, to air conditioning systems in buildings. "We define Data Driven Control as simply Machine Learning (ML) techniques applied to Control Systems." We examine the underlying reasons and trends, starting with the limitations of Control Theory for Control Systems.
  • 8. CONTROL THEORY: System & Controller. A Control System consists of a System and a Controller: - System: the system to control. - Controller: applies a control strategy to control the system in an optimal fashion. Any strategy that the Controller can apply is constrained by: - its knowledge of the system state, in most cases provided by the System Sensors; - the system parameters that it can control, also referred to as the System Actuators. E.g., an engine can only drive a car within a certain speed range, at a certain acceleration.
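The System/Controller split, and the way sensors and actuators constrain the controller, can be sketched with the slide's car example. The toy speed model, its limits and the proportional control strategy are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """Toy 'system to control': speed in m/s, with hypothetical physical limits."""
    speed: float = 0.0
    max_accel: float = 3.0   # actuator limit in m/s^2 (assumed)
    max_speed: float = 50.0  # reachable speed range in m/s (assumed)

    def sense(self) -> float:
        # Sensor: the controller only knows the state the system exposes.
        return self.speed

    def actuate(self, accel: float, dt: float) -> None:
        # Actuator: the requested acceleration is clamped to what the engine can deliver.
        accel = max(-self.max_accel, min(self.max_accel, accel))
        self.speed = max(0.0, min(self.max_speed, self.speed + accel * dt))


def controller(target: float, measured: float, gain: float = 0.5) -> float:
    """Proportional control strategy: request acceleration proportional to the speed error."""
    return gain * (target - measured)


car = Car()
for _ in range(100):
    car.actuate(controller(target=20.0, measured=car.sense()), dt=0.1)
print(round(car.sense(), 2))
```

Early on the controller asks for more acceleration than the actuator allows, so progress is limited by `max_accel`, exactly the kind of constraint the slide describes.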
  • 9. CONTROL THEORY - LIMITATIONS: Linear Equations. Designing a control strategy consists of solving the equations characterizing the system behavior, often modeled as linear equations. Most of Control Theory is targeted towards solving linear equations. Unfortunately, real-world systems are (mostly) non-linear: even the equation capturing the motion of a pendulum is non-linear. There has been a lot of research on linearization methods, i.e., techniques that convert non-linear equations into linear ones and then solve them using linear state-space control theory. However, such linearization methods are limited to specific classes of non-linear equations and do not generalize easily.
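The pendulum example can be made concrete: its motion follows theta'' = -(g/L) sin(theta), and the usual linearization replaces sin(theta) with theta (the small-angle approximation). The sketch below integrates both models and compares them; the pendulum length, step size and horizon are assumed values.

```python
import math


def simulate(theta0: float, linear: bool, g: float = 9.81, length: float = 1.0,
             dt: float = 1e-3, t_end: float = 2.0) -> float:
    """Integrate a frictionless pendulum released from rest; return its angle at t_end."""
    theta, omega = theta0, 0.0
    for _ in range(int(round(t_end / dt))):
        # Non-linear model uses sin(theta); the linearized model uses sin(theta) ~ theta.
        restoring = theta if linear else math.sin(theta)
        omega -= (g / length) * restoring * dt  # semi-implicit (symplectic) Euler step
        theta += omega * dt
    return theta


for theta0 in (0.1, 1.5):  # radians: a small swing vs. a large one
    gap = abs(simulate(theta0, linear=False) - simulate(theta0, linear=True))
    print(f"theta0={theta0} rad: |non-linear - linearized| = {gap:.4f} rad")
```

For a small initial angle the two models nearly coincide, while for a large swing they diverge noticeably, which is why linearization only holds near an operating point.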
  • 10. CONTROL THEORY - LIMITATIONS (2): Model Driven Control. A model of the system and its corresponding equations are required. This is why traditional control strategies, also referred to as Model Driven Control, still exclude many systems that we do not know how to model (i.e., whose system equations are not known). Moreover, the complexity of such systems keeps increasing as we try to solve hyper-scale problems, e.g., climate control, disease control, automated vehicles, financial markets, etc.
  • 11. MACHINE LEARNING (ML) TO THE RESCUE: Data Driven Control. ML/Data based approaches show a lot of promise in this context. The underlying logic is that even for a very high-dimensional system that we cannot model, there are dominant patterns that characterize the system behavior, and Machine Learning (Deep Learning) is very good at learning these patterns. The result would (most likely) be an approximation, and while we still would not understand the system fully, it is good enough for most real-life use cases (including predictions), barring some exceptional scenarios.
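As a deliberately simple stand-in for the deep-learning approaches the slide mentions, the sketch below learns a system's dominant dynamics purely from logged data via least squares (linear system identification), without ever seeing the plant's equations. The linear model form, the hidden plant coefficients and the noise level are all assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def fit_linear_dynamics(xs: Sequence[float], us: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of (a, b) in x[t+1] ~ a*x[t] + b*u[t], from logged data only."""
    # Accumulate the 2x2 normal equations [[sxx, sxu], [sxu, suu]] @ [a, b] = [sxy, suy].
    sxx = sxu = suu = sxy = suy = 0.0
    for x, u, y in zip(xs, us, xs[1:]):
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("logged data is not informative enough (singular normal equations)")
    # Cramer's rule for the 2x2 system.
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def log_hidden_system(steps: int, seed: int = 42) -> Tuple[List[float], List[float]]:
    """Simulate an 'unknown' plant (a=0.9, b=0.5, small noise) and log its states and inputs."""
    rng = random.Random(seed)
    xs, us = [0.0], []
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)  # random excitation input
        us.append(u)
        xs.append(0.9 * xs[-1] + 0.5 * u + rng.gauss(0.0, 0.01))
    return xs, us


xs, us = log_hidden_system(500)
print(fit_linear_dynamics(xs, us))  # should be close to (0.9, 0.5)
```

Real systems would call for richer model classes (e.g. neural networks), but the principle is the same: the model is fitted to observed behavior rather than derived from first principles.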
  • 12. MODEL BASED RL: Offline Training. RL allows further fine-tuning of the developed ML model. In Model based RL, it is possible to develop a model of the problem scenario and bootstrap initial RL training based on the model's simulation values. For complex scenarios (e.g., games, robotic tasks) where it is not possible to build a model of the problem scenario, it may still be possible to bootstrap an RL model based on historical values, referred to as Offline Training. Figure: MLOps pipeline. Structured and unstructured data flow through Raw / Staging (Bronze), Cleansed / Standardized (Silver) and Transformed / Modeled (Gold) layers (DataOps: DQ/Validation, Filtering, Historization, Aggregation), feeding BI / Reporting and AI/ML (Exploratory Data Analysis; DQ/Cleaning, Encoding, Selection, Normalization; Feature extraction; Training and Test datasets; Model Training; Model Serving (Inference); Model Monitoring), which produce ML Outputs (Inferences, Predictions), plus an RL based Model Improvement step. *D. Biswas. MLOps for Compositional AI. NeurIPS Workshop on Challenges in Deploying and Monitoring Machine Learning Systems (DMML), 2022.
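Offline Training, bootstrapping an RL model from historical values, can be sketched as tabular Q-learning replayed over a log of past transitions with no live interaction. The transition format, states, actions, rewards and hyperparameters below are hypothetical; production offline RL usually also needs safeguards against actions the log never covers.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Hashable, Iterable, List, Tuple

Transition = Tuple[Hashable, Hashable, float, Hashable]  # (state, action, reward, next_state)
QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def offline_q_learning(log: Iterable[Transition], actions: List[Hashable],
                       alpha: float = 0.1, gamma: float = 0.9, epochs: int = 50) -> QTable:
    """Bootstrap a Q-table by repeatedly replaying historical transitions."""
    data = list(log)
    q: QTable = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s_next in data:
            best_next = max(q[(s_next, b)] for b in actions)
            # Q-learning target r + gamma * max_b Q(s', b), computed from logged data only.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q


def greedy_policy(q: QTable, states: List[Hashable], actions: List[Hashable]) -> Dict[Hashable, Hashable]:
    """Pick the highest-value action in each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical historical log from an existing controller.
history = [("hot", "cool", 1.0, "ok"), ("hot", "idle", -1.0, "hot"),
           ("ok", "idle", 1.0, "ok"), ("ok", "cool", -0.5, "ok")]
q_table = offline_q_learning(history, ["cool", "idle"])
print(greedy_policy(q_table, ["hot", "ok"], ["cool", "idle"]))
```

The resulting Q-table would then serve as the starting point for online fine-tuning rather than as a finished policy.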
  • 13. CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION: HVAC Optimization. The primary goal of HVAC (Heating, Ventilation and Air Conditioning) units is to keep the temperature and (relative) humidity within the prescribed manufacturing tolerance ranges, by controlling four output valves: Cooling, Heating, Re-heating and Humidifier. This needs to be balanced with energy savings and CO2 emission reductions to offset the environmental impact of running them. This is a complex problem, as computing an optimal state requires taking into account multiple variable factors, e.g., the occupancy in a building zone, temperature requirements of operating machines, air flow dynamics within the building, external weather conditions, etc. D. Biswas. Reinforcement Learning based Energy Optimization in Factories. In Proceedings of the 11th ACM e-Energy Conference, Jun 2020.
  • 14. CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION (2): HVAC RL Formulation. At any point in time, a factory zone is in a state characterized by the temperature and (relative) humidity values observed inside and outside the zone. The game environment in this case corresponds to the temperature and humidity tolerance levels, which mandate that the zone temperature and humidity stay within 19–25 °C and 45–55% respectively. The set of available actions consists of the Cooling, Heating, Re-heating and Humidifier valve opening percentages (%).
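The formulation above can be captured in a few data types: a state made of inside and outside temperature and humidity, a tolerance check against the 19–25 °C / 45–55% bands, and an action consisting of the four valve openings. The class and field names are hypothetical; the paper's exact state encoding is not given here.

```python
from dataclasses import dataclass

# Tolerance bands from the case study.
TEMP_RANGE = (19.0, 25.0)  # zone temperature, degrees C
RH_RANGE = (45.0, 55.0)    # zone relative humidity, %


@dataclass(frozen=True)
class ZoneState:
    """Observed conditions inside and outside a factory zone (hypothetical field names)."""
    inside_temp: float
    inside_rh: float
    outside_temp: float
    outside_rh: float

    def within_tolerance(self) -> bool:
        return (TEMP_RANGE[0] <= self.inside_temp <= TEMP_RANGE[1]
                and RH_RANGE[0] <= self.inside_rh <= RH_RANGE[1])


@dataclass(frozen=True)
class ValveAction:
    """Opening percentages (0-100) of the four HVAC output valves."""
    cooling: float
    heating: float
    reheating: float
    humidifier: float

    def __post_init__(self) -> None:
        for name in ("cooling", "heating", "reheating", "humidifier"):
            value = getattr(self, name)
            if not 0.0 <= value <= 100.0:
                raise ValueError(f"{name} opening must be in [0, 100] %, got {value}")


state = ZoneState(inside_temp=23.5, inside_rh=52.0, outside_temp=31.0, outside_rh=70.0)
action = ValveAction(cooling=40.0, heating=0.0, reheating=10.0, humidifier=0.0)
print(state.within_tolerance(), action)
```

Validating valve openings at construction time keeps an agent from emitting physically meaningless actions.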
  • 15. CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION (3): The Reward Function assigns a reward to each action based on three parameters: Setpoint Closeness (SC), Energy Cost (EC) and Tolerance Violation (TV). A control strategy then amounts to deciding the weighting of these three parameters. The Energy Cost is captured in terms of electricity consumption and CO2 emission. Setpoint Closeness encourages a "business friendly" policy, where the RL model attempts to keep the zone temperature and humidity as close as possible to their setpoints, implicitly reducing the risk of violations, but at a higher Energy Cost. We opt for a "balanced" control policy, which maximizes Energy Savings and Setpoint Closeness while minimizing the risk of Tolerance Violations.
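A weighted reward combining the three parameters might look like the sketch below. The setpoints (band midpoints), the normalization by half-band width, the default weights and the single `energy_cost` input (assumed to already merge electricity and CO2) are all illustrative assumptions, not the weights used in the pilot.

```python
from typing import Tuple

TEMP_RANGE: Tuple[float, float] = (19.0, 25.0)  # temperature tolerance band, degrees C
RH_RANGE: Tuple[float, float] = (45.0, 55.0)    # relative humidity tolerance band, %


def _band_excess(value: float, low: float, high: float) -> float:
    """Distance by which `value` lies outside [low, high]; 0.0 when inside."""
    return max(low - value, 0.0, value - high)


def hvac_reward(temp: float, rh: float, energy_cost: float,
                w_sc: float = 1.0, w_ec: float = 1.0, w_tv: float = 10.0) -> float:
    """Weighted reward from Setpoint Closeness, Energy Cost and Tolerance Violation.

    Setpoints are assumed to be the band midpoints; a higher reward is better.
    """
    t_mid, t_half = sum(TEMP_RANGE) / 2, (TEMP_RANGE[1] - TEMP_RANGE[0]) / 2
    rh_mid, rh_half = sum(RH_RANGE) / 2, (RH_RANGE[1] - RH_RANGE[0]) / 2
    # SC: normalized distance from the setpoints (0.0 means exactly on setpoint).
    sc = abs(temp - t_mid) / t_half + abs(rh - rh_mid) / rh_half
    # TV: normalized distance outside the tolerance bands (0.0 when compliant).
    tv = _band_excess(temp, *TEMP_RANGE) / t_half + _band_excess(rh, *RH_RANGE) / rh_half
    # EC: energy_cost is assumed to already combine electricity and CO2 into one number.
    return -(w_sc * sc + w_ec * energy_cost + w_tv * tv)


# Staying on setpoint at high energy vs. drifting within tolerance at low energy.
print(hvac_reward(22.0, 50.0, energy_cost=2.0), hvac_reward(24.0, 50.0, energy_cost=0.5))
```

Raising `w_sc` relative to `w_ec` shifts the preference toward the on-setpoint, higher-energy option (the "business friendly" policy), while comparable weights favor the cheaper in-tolerance option; a large `w_tv` keeps violations expensive in either case.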
  • 16. CASE STUDY: RL BASED HVAC ENERGY OPTIMIZATION (4): Optimization Results. Within a 6-month pilot, we were able to develop and operationalize an RL based HVAC controller that learns and adapts to real-life factory settings, without the need for any offline training. It showcases the successful transition of an Industrial Control System, run by a traditional PID controller for the last 10+ years, to a more efficient RL based controller. Benchmarking results show the potential for energy savings of up to 25% (as compared to operation by PID controllers).
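For context on the baseline being replaced, a textbook discrete PID controller for a single valve loop is sketched below, with output clamped to a 0-100% valve opening. The gains and the heating-valve framing are hypothetical; the factory's actual PID configuration is not described in the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID controller driving one valve (output in % opening)."""
    kp: float
    ki: float
    kd: float
    out_min: float = 0.0    # valve fully closed
    out_max: float = 100.0  # valve fully open
    _integral: float = field(default=0.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def step(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self._integral += error * dt
        # No derivative kick on the first call: there is no previous error yet.
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt
        self._prev_error = error
        u = self.kp * error + self.ki * self._integral + self.kd * derivative
        return max(self.out_min, min(self.out_max, u))  # clamp to the actuator range


heating_valve = PID(kp=8.0, ki=0.5, kd=0.0)
print(heating_valve.step(setpoint=22.0, measurement=20.5, dt=60.0))
```

Each such loop tracks one setpoint with fixed gains and has no notion of energy cost, which is one way to read the room for improvement that an RL controller balancing several objectives can exploit. Production PIDs usually add anti-windup, omitted here for brevity.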