This article discusses how deep reinforcement learning can be applied to optimise energy consumption systems, using an indoor ice arena as an illustrative example. The study explores how IoT data and AI control agents can learn optimal control strategies to reduce energy use in systems such as refrigeration plants.
Overview
In the context of global warming and rising energy prices driven by the energy crisis, optimising energy consumption systems has become increasingly important for businesses and governments alike. Digital transformation in the energy sector enables the adoption of advanced artificial intelligence methods to support smarter energy management.
The goal of this research is to minimise electricity costs at an ice arena by using data collected from IoT sensors to develop a reinforcement learning model that maintains optimal ice surface conditions. Simulation results indicate that applying reinforcement learning algorithms can lead to reductions in energy consumption.
Introduction and Context
Maintaining climate control and energy systems inside large ice arenas — where refrigeration equipment can account for 60–80% of the site’s total energy use — is a major operational challenge. Digitalising such environments with IoT devices allows real-time monitoring without replacing expensive legacy hardware.
IoT sensors are installed on the building and equipment to collect energy and climate data, which is then fed into a control system. This rich dataset supports analysis of energy use and enables recommendations for energy-saving actions based on observed patterns.
Deep reinforcement learning (DRL) combines reinforcement learning and deep neural networks, allowing an AI agent to learn optimal control policies through interaction with the environment. In this work, the agent is trained to manage compressor stages on a refrigeration plant by maximising rewards linked to maintaining target ice temperature.
Reinforcement Learning Framework
Reinforcement learning is based on a Markov decision process (MDP), in which an agent interacts with an environment, selects actions, and receives rewards. The objective is to maximise cumulative future reward by discovering an optimal policy through trial and error.
In this case, the agent observes environmental states (from per-minute aggregated sensor data) and chooses actions — such as adjusting compressor stage settings — to keep surface ice temperature near a target value (e.g. –3.5 °C). The algorithm used in the study is a Double Deep Q-Network (Double DQN), a deep reinforcement learning technique that uses two neural networks to estimate state-action value functions.
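The core of Double DQN is how the bootstrap target is formed: the online network selects the best next action, while the separate target network evaluates it, which reduces the overestimation bias of plain Q-learning. A minimal sketch of that target computation (the network architectures and hyperparameters are not specified in the article, so the function below is illustrative):

```python
import numpy as np

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN bootstrap target.

    The online network's Q-values select the next action (argmax), and the
    target network's Q-values evaluate that same action. Decoupling selection
    from evaluation is what distinguishes Double DQN from vanilla DQN.
    """
    if done:
        return reward  # no future reward from a terminal state
    best_action = int(np.argmax(next_q_online))       # selection: online net
    return reward + gamma * next_q_target[best_action]  # evaluation: target net
```

During training, the squared difference between this target and the online network's current Q-estimate for the taken action would serve as the loss.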
The agent’s learning balances exploitation (choosing known good actions) and exploration (trying new actions) using an ε-greedy strategy, where with some probability a random action is taken to discover better strategies.
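The ε-greedy rule described above can be stated in a few lines; the exact exploration schedule used in the study is not given, so this is a generic sketch:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon take a random action (exploration);
    otherwise take the action with the highest Q-value (exploitation)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

In practice, ε typically starts high and decays over training so the agent explores early and exploits its learned policy later.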
Data and Model Evaluation
A dataset collected from the ice arena includes minute-level aggregated measurements from sensors:
- Dehumidifier energy
- Pump energy
- Compressor energy (4 stages)
- Indoor/outdoor temperature and humidity
- Motion and activity levels
- Glycol supply/return temperatures
- Time index
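One minute of these aggregated readings would be flattened into a fixed-length state vector for the agent. The field names below are hypothetical, since the article does not publish the dataset schema; the sketch only illustrates how the listed signals could form an observation:

```python
import numpy as np

def build_state(row):
    """Flatten one minute of aggregated sensor readings (a dict) into a
    fixed-length observation vector. Field names are illustrative only."""
    return np.array([
        row["dehumidifier_kwh"],
        row["pump_kwh"],
        *row["compressor_kwh"],            # energy for each of the 4 compressor stages
        row["indoor_temp_c"], row["indoor_rh"],
        row["outdoor_temp_c"], row["outdoor_rh"],
        row["activity_level"],             # motion/activity in the arena
        row["glycol_supply_c"], row["glycol_return_c"],
        row["minute_of_day"] / 1440.0,     # time index, normalised to [0, 1)
    ], dtype=np.float32)
```

Normalising each feature (as done here only for the time index) is generally advisable before feeding the vector to a neural network.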
The reinforcement learning agent was trained to hold the ice temperature at the target setpoint over time. In simulation, this approach reduced energy consumption by up to roughly 42% compared with a basic rule-based control strategy, demonstrating that the learned policies adapt effectively to changing conditions.
Results and Insights
The agent’s reward function penalises deviations from the target temperature. When unexpected events occur, such as a sudden temperature rise after ice resurfacing, the trained agent adapts by proactively adjusting compressor stages based on predicted future conditions, reducing energy use while maintaining ice quality.
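A reward of this shape can be sketched as a penalty on the temperature error around the −3.5 °C setpoint, optionally combined with an energy penalty; the weights below are illustrative assumptions, not values from the study:

```python
def reward(ice_temp_c, energy_kw, target_c=-3.5,
           temp_weight=1.0, energy_weight=0.01):
    """Negative reward proportional to the temperature deviation from the
    setpoint plus a smaller penalty on instantaneous energy draw.
    Weights are illustrative and would need tuning in practice."""
    return -temp_weight * abs(ice_temp_c - target_c) - energy_weight * energy_kw
```

The relative weights encode the trade-off the agent learns: a larger energy weight tolerates more temperature drift in exchange for lower consumption.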
This demonstrates the strength of reinforcement learning in managing complex control systems where dynamic and non-linear interactions occur between environmental conditions and operational states.
Discussion and Practical Considerations
While the simulation results are promising, real-world deployment requires additional considerations. For example, cold-start behaviour must be addressed: an agent in a live system cannot act randomly for long periods as it learns. Pre-training on simulated or historical data may be necessary to shorten adaptation time. Also, systems must be capable of re-learning after significant changes such as equipment upgrades or facility remodels.
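Pre-training from historical data is commonly done by seeding a replay buffer with logged (state, action, reward, next state, done) transitions so the agent starts from plausible behaviour instead of random actions on the live plant. A minimal sketch, assuming such logs are available:

```python
from collections import deque
import random

def pretrain_buffer(logged_transitions, capacity=50_000):
    """Seed a replay buffer from historical (s, a, r, s', done) tuples so
    the agent need not explore randomly on live equipment."""
    buffer = deque(maxlen=capacity)  # oldest transitions drop out when full
    buffer.extend(logged_transitions)
    return buffer

def sample_batch(buffer, batch_size=32):
    """Draw a uniform random minibatch for an offline training step."""
    return random.sample(list(buffer), min(batch_size, len(buffer)))
```

The same buffer can then keep collecting live transitions after deployment, which also supports the re-learning requirement after equipment upgrades.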
Conclusion
Deep reinforcement learning offers a robust approach for energy consumption optimisation in complex environments like ice arenas. By learning control policies that adapt to changing conditions and forecast system behaviour, such models can deliver substantial energy savings and operational efficiency improvements. This opens pathways for broader application in HVAC systems and other energy-intensive infrastructure.