Applying Experiment Insights in a Custom RL Environment
Using real-world power and confidence data for adaptive edge inference (15.08.2025)
Funding Year 2024 / Scholarship Call #19 / Project ID: 7383 / Project: Dynamic Power Management for Edge AI: A Sustainable Self-Adaptive Approach

My experiment results on how model type, frame rate, and resolution affect YOLO power use and confidence now power a custom RL environment for adaptive inference on energy-constrained edge devices.

In my last post, I shared the results of my experiment on how model type, frame rate, and resolution affect both power consumption and prediction confidence in YOLO object detection. Those findings weren't collected out of mere curiosity - they now form the backbone of a custom reinforcement learning (RL) environment I've built to explore adaptive inference on energy-constrained edge devices.

The Goal

The environment is designed to help an agent learn how to dynamically choose model configurations that balance energy use with detection quality. This is especially relevant for devices that run on limited or intermittent power sources, such as solar-charged batteries.

How the Environment Works

At its core is a simulated edge device. Each RL step represents a short operational period (e.g., 10 minutes), during which the agent chooses:

  • A YOLO model variant (YOLO11n or YOLO11s)

  • A frame rate (1-30 FPS)

  • A resolution (128-640 px)

This creates a multi-discrete action space of possible configurations.
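For concreteness, here is a minimal sketch of how such an action space can be declared with Gymnasium. The specific FPS and resolution choice lists are my assumptions for illustration, not the environment's actual values:

```python
import numpy as np
from gymnasium import spaces

# Hypothetical discretizations of the three decision dimensions.
MODELS = ["yolo11n", "yolo11s"]
FPS_CHOICES = [1, 5, 10, 15, 20, 25, 30]      # within the 1-30 FPS range
RES_CHOICES = [128, 192, 256, 320, 448, 640]  # within the 128-640 px range

action_space = spaces.MultiDiscrete(
    [len(MODELS), len(FPS_CHOICES), len(RES_CHOICES)]
)

def decode_action(action: np.ndarray) -> tuple[str, int, int]:
    """Map a MultiDiscrete sample to a concrete (model, fps, resolution)."""
    model_idx, fps_idx, res_idx = action
    return MODELS[model_idx], FPS_CHOICES[fps_idx], RES_CHOICES[res_idx]
```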

Power Consumption Module

The Power module predicts energy draw based on the configuration. These predictions are not made up - they come from a RandomForestRegressor trained on my real-world measurement data from the experiment. This way, the simulation reflects the trends we actually observed: for example, resolutions above 320 px waste energy without much gain, and higher FPS has diminishing returns on detection quality.
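A minimal sketch of how such a regressor can be fit with scikit-learn follows; the file name, column names, and hyperparameters are my assumptions, not the project's exact pipeline:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumed schema: one row per measured configuration, with the model
# encoded as an index (0 = YOLO11n, 1 = YOLO11s).
df = pd.read_csv("power_measurements.csv")  # hypothetical file name
X = df[["model_idx", "fps", "resolution"]]
y = df["power_watts"]

power_model = RandomForestRegressor(n_estimators=200, random_state=42)
power_model.fit(X, y)

# Query the regressor for a candidate config: YOLO11s at 15 FPS, 320 px.
query = pd.DataFrame([[1, 15, 320]], columns=X.columns)
predicted_watts = power_model.predict(query)[0]
```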

Confidence Estimation Module

To model detection performance, the Inference Engine samples from Kernel Density Estimates (KDEs) fitted to the confidence score distributions I collected for each parameter combination. This lets the environment realistically reflect that, for example, YOLO11s at 320 px might deliver high confidence at higher power cost, while YOLO11n at 128 px is cheap to run but much less accurate.
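In code, this pattern is straightforward with scipy; the sketch below assumes one array of collected confidence scores per configuration (the beta samples are only a stand-in for the real measurements):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Assumed layout: per-configuration arrays of observed confidence scores.
confidence_scores = {
    ("yolo11s", 15, 320): np.random.beta(8, 2, size=500),  # placeholder data
}
kdes = {cfg: gaussian_kde(scores) for cfg, scores in confidence_scores.items()}

def sample_confidence(cfg: tuple) -> float:
    """Draw one confidence value from the KDE fitted for this configuration,
    clipped to the valid [0, 1] range."""
    sample = kdes[cfg].resample(size=1)[0, 0]
    return float(np.clip(sample, 0.0, 1.0))
```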

Solar Charging and Battery Management

The environment also models a solar panel (varying input by time of day and random weather effects) and a battery (tracking charging and discharging). These constraints make the agent’s choices more realistic: it can’t just pick the “best” model every time - it needs to manage its energy budget over time.
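A toy version of these dynamics might look like the following. The sinusoidal daylight curve, the random cloud-cover factor, and the 10 W peak / 50 Wh capacity constants are illustrative assumptions, not values from the real setup:

```python
import numpy as np

def solar_input_watts(hour: float, rng: np.random.Generator) -> float:
    """Toy solar model: sinusoidal daylight curve, random weather attenuation."""
    daylight = max(0.0, np.sin(np.pi * (hour - 6) / 12))  # zero outside 6:00-18:00
    weather = rng.uniform(0.4, 1.0)  # random cloud-cover factor
    return 10.0 * daylight * weather

def update_battery(level_wh, solar_w, load_w, step_hours=1/6, capacity_wh=50.0):
    """Charge/discharge the battery over one 10-minute step, clipped to capacity."""
    level_wh += (solar_w - load_w) * step_hours
    return float(np.clip(level_wh, 0.0, capacity_wh))
```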

A Step in Action

For each step:

  1. The agent picks a model, FPS, and resolution.

  2. The environment simulates solar input and power usage.

  3. The battery level is updated.

  4. A confidence value is sampled.

  5. The next state and a reward are returned.

The observation space includes battery level, solar input, previous power usage, and previous confidence - all of which can guide the agent toward better adaptive strategies.
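Stitching the earlier sketches together, the step logic could look roughly like this. The class name EdgeInferenceEnv, its attributes, and the zero reward are illustrative placeholders, not the actual implementation:

```python
import numpy as np
import pandas as pd
import gymnasium as gym

class EdgeInferenceEnv(gym.Env):
    """Sketch of the simulated edge device, using the helpers defined above."""

    def step(self, action):
        # 1. Decode the agent's choice of model, FPS, and resolution.
        model, fps, res = decode_action(action)

        # 2. Simulate solar input and predict power draw for this config.
        solar_w = solar_input_watts(self.hour, self.rng)
        query = pd.DataFrame([[MODELS.index(model), fps, res]],
                             columns=["model_idx", "fps", "resolution"])
        load_w = float(power_model.predict(query)[0])

        # 3. Update the battery over the 10-minute step.
        self.battery_wh = update_battery(self.battery_wh, solar_w, load_w)

        # 4. Sample a detection confidence for this configuration.
        confidence = sample_confidence((model, fps, res))

        # 5. Return the next state and a reward.
        obs = np.array([self.battery_wh, solar_w, load_w, confidence],
                       dtype=np.float32)
        reward = 0.0  # placeholder; reward design is the next post's topic
        terminated = self.battery_wh <= 0.0
        self.hour = (self.hour + 1 / 6) % 24  # advance 10 simulated minutes
        return obs, reward, terminated, False, {}
```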

Next Steps

The environment is fully integrated with PPO from Stable Baselines3 and ready for experiments. My next focus will be designing reward functions that balance short-term detection quality with long-term energy sustainability - a challenge that deserves its own post.
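Hooking the environment up to PPO then takes only a few lines; this assumes the hypothetical EdgeInferenceEnv class sketched above:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = EdgeInferenceEnv()  # the custom environment sketched above
check_env(env)            # sanity-check the Gymnasium interface first

agent = PPO("MlpPolicy", env, verbose=1)
agent.learn(total_timesteps=100_000)
```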
