PPO Reinforcement Learning for Quadruped Locomotion

Trained a robust locomotion policy for the Unitree Go2 quadruped robot within the Isaac Lab physics simulator.

Project Overview

This project centered on training a robust locomotion policy for the Unitree Go2 quadruped robot within the Isaac Lab physics simulator. Training used Proximal Policy Optimization (PPO), a reinforcement learning algorithm that stabilizes training by limiting how much the policy can change per update. The goal was to evolve a deliberately weak baseline policy into one capable of stable walking and trotting gaits.
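
To make the "limiting how much the policy can change per update" idea concrete, here is a minimal sketch of the standard PPO clipped surrogate loss in plain Python. It is illustrative only and is not the project's training code. The function name `ppo_clipped_loss` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negated mean clipped surrogate objective (a loss to minimize).

    All three inputs are equal-length lists of floats, one entry per sampled
    action. clip_eps is an assumed value, not one taken from the project.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to move the ratio
        # outside the clip range in the direction the advantage favors.
        total += min(ratio * adv, clipped_ratio * adv)
    return -total / len(advantages)
```

When the new policy doubles the probability of an action with positive advantage, the objective is capped at `1 + clip_eps` times the advantage, so the gradient through that sample vanishes and the update cannot keep pushing in that direction.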

Through reward shaping and regularization, the policy was optimized for smooth footfall timing, base height maintenance, and accurate tracking of forward, lateral, and yaw velocity commands. The policy was then pushed further with additional objectives, including bipedal walking and stable gaits across procedurally generated, uneven terrain.
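
As a rough illustration of how shaped reward terms like these are often combined, the sketch below sums a velocity-tracking reward, a base height penalty, and an action-rate regularizer. It is not the project's reward function. The exponential tracking kernel, every weight, `sigma`, and the `target_height` placeholder are assumptions chosen for the example, and footfall timing terms are omitted for brevity.

```python
import math

def locomotion_reward(cmd_vel, base_vel, base_height, action, prev_action,
                      target_height=0.3, sigma=0.25,
                      w_lin=1.0, w_yaw=0.5, w_height=-10.0, w_action_rate=-0.01):
    """Hypothetical per-step reward for command-tracking locomotion.

    cmd_vel and base_vel are (forward, lateral, yaw_rate) tuples.
    All default values are placeholders, not tuned or measured numbers.
    """
    # Squared tracking errors for planar velocity and yaw rate.
    lin_err = (cmd_vel[0] - base_vel[0]) ** 2 + (cmd_vel[1] - base_vel[1]) ** 2
    yaw_err = (cmd_vel[2] - base_vel[2]) ** 2

    # Exponential kernel: 1.0 at perfect tracking, decaying toward 0 with error.
    r_lin = math.exp(-lin_err / sigma)
    r_yaw = math.exp(-yaw_err / sigma)

    # Penalize deviation from the desired base height.
    height_pen = (base_height - target_height) ** 2

    # Regularizer: penalize large changes between consecutive actions.
    action_rate_pen = sum((a - p) ** 2 for a, p in zip(action, prev_action))

    return (w_lin * r_lin + w_yaw * r_yaw
            + w_height * height_pen + w_action_rate * action_rate_pen)
```

The negative weights turn the height and action-rate terms into penalties, so the policy earns the most reward by matching the commanded velocity while keeping its base level and its actions smooth.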

Reinforcement Learning, Simulation, Robotics