We present an example of using Deep Reinforcement Learning (DRL) to control the fan speeds inside a storage server, as a proxy for cooling energy usage, in relation to the stochastic read and write storage workloads. We use a model-free RL approach, and we train on the real environment, not through a simulator, and we present a reward shaping designed for this use case. We used Advantage Actor Critic (A2C) as the DRL algorithm. This work is presented at the ECML-PKDD 2018 conference in Dublin Ireland.
is an area of machine learning an agent learns from interaction with an environment what actions to take in order to optimize a long-term objective (expectation of cumulative rewards)