Motivation: This project aims to develop a reinforcement learning-augmented model predictive control (RL-augmented MPC) framework for bipedal locomotion on challenging deformable terrain, such as gravel and sand.

Model predictive control (MPC)-based controllers offer stability and safety guarantees thanks to their physics-grounded, optimization-based formulation. However, their performance is limited by the accuracy of the underlying state dynamics model and constraints. Conversely, learning-based control methods, such as reinforcement learning (RL), achieve agile locomotion but lack formal safety guarantees and are sample-inefficient. RL-augmented MPC frameworks, such as those based on residual learning, combine the strengths of both approaches and have shown promising results on aerial and quadrupedal robots. Nevertheless, their effectiveness for bipedal or humanoid robots on challenging deformable terrain remains to be demonstrated. The primary challenges are modeling complex terrain dynamics within the MPC's state dynamics and switching between gaits based on the state of the robot and the environment, rather than relying on predefined periodic gaits.

Methodology: Our work explores two types of architectures: (1) a hierarchical control architecture, where RL outputs are passed to the MPC, which then handles the control, and (2) a parallel control architecture, where both RL and MPC predict the same action and their outputs are combined (summed) to form the final control command.

The first approach augments the MPC's single rigid body dynamics (SRBD) model by predicting linear and angular perturbation accelerations. These residual accelerations capture the effect of the reaction forces and moments exerted by the deformable terrain.
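As a rough sketch (the notation is ours, not a finalized formulation), the augmented SRBD model inside the MPC could take the form

    m \ddot{p}_c = \sum_i f_i + m g + m a_res
    d/dt (I \omega) = \sum_i (r_i - p_c) \times f_i + I \alpha_res

where p_c is the center-of-mass position, f_i is the ground reaction force at contact location r_i, and a_res, \alpha_res are the RL-predicted linear and angular perturbation accelerations that absorb the unmodeled terrain reaction.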

The second approach augments the ground reaction wrench (GRW) computed by the SRBD MPC through RL. The RL policy implicitly compensates for uncertain terrain dynamics by maximizing a task reward, learning a residual term that is added to the MPC output.
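A minimal sketch of this parallel residual scheme (symbols ours; \pi_\theta denotes the RL policy and s the observed robot/terrain state):

    w_cmd = w_MPC + \Delta w,   \Delta w = clip(\pi_\theta(s), -\Delta w_max, \Delta w_max)

Bounding the residual keeps the commanded wrench close to the MPC solution, so the optimization-based controller remains the backbone while RL only corrects for terrain-induced model error.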

Additionally, we apply residual learning to gait parameters such as contact durations, swing durations, and foot clearance. With this framework, we aim to develop a stable yet adaptive controller capable of robust locomotion on challenging terrain.
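To make the intended interplay concrete, the sketch below combines both residual channels in one control tick. It assumes hypothetical mpc.solve and policy interfaces and illustrative residual bounds; it is not our final implementation.

    import numpy as np

    # Nominal gait parameters: timings in seconds, foot clearance in meters (illustrative values).
    NOMINAL_GAIT = {"contact_dur": 0.35, "swing_dur": 0.35, "clearance": 0.08}

    def control_step(state, mpc, policy):
        # RL policy outputs a 9-D vector: 6 residual wrench terms and 3 residual gait terms.
        rl_out = policy(state)
        d_wrench = 20.0 * np.tanh(rl_out[:6])  # bounded residual force/moment (illustrative limits)
        d_gait = 0.1 * np.tanh(rl_out[6:9])    # bounded residual timing/clearance (illustrative limits)

        # Hierarchical channel: RL residuals adjust the gait parameters fed to the MPC.
        gait = {
            "contact_dur": NOMINAL_GAIT["contact_dur"] + d_gait[0],
            "swing_dur": NOMINAL_GAIT["swing_dur"] + d_gait[1],
            "clearance": NOMINAL_GAIT["clearance"] + d_gait[2],
        }

        # MPC solves for the ground reaction wrench under the adjusted gait schedule.
        wrench_mpc = mpc.solve(state, gait)    # shape (6,): stacked force and moment

        # Parallel channel: RL residual wrench added to the MPC output.
        wrench_cmd = wrench_mpc + d_wrench
        return wrench_cmd, gait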

Current status: As a preliminary step, we are testing the algorithm on a gravel field simulated in Isaac Lab, NVIDIA's robotics simulation framework. The main challenge lies in training an effective RL policy to augment the MPC.