Unlocking the Potential of MAPPO with Asynchronous Optimization

Abstract

It has almost become a consensus that off-policy algorithms dominate the research benchmarks of multi-agent reinforcement learning (MARL), while recent work demonstrates that an on-policy MARL algorithm, Multi-Agent Proximal Policy Optimization (MAPPO), can also attain comparable performance. In this paper, we propose a training framework based on MAPPO, named async-MAPPO, which supports scalable asynchronous training. We further re-examine async-MAPPO in the StarCraft II micromanagement domain and obtain state-of-the-art performance on several hard and super-hard maps. Finally, we analyze three experimental phenomena and provide hypotheses behind the performance improvement of async-MAPPO.
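The asynchronous training pattern the abstract refers to can be illustrated with a minimal producer-consumer sketch. All names here are hypothetical and the rollout/update bodies are placeholders, not the paper's actual implementation: rollout workers push finished trajectories into a shared buffer without waiting for the learner, while a learner thread consumes whatever data is ready.

```python
import queue
import threading

def make_rollout(worker_id, step):
    # Placeholder trajectory: in practice each worker would run its own
    # copy of the environment and the current policy to collect data.
    return {"worker": worker_id, "step": step, "obs": [0.0] * 4}

def rollout_worker(worker_id, buffer, n_rollouts):
    # Workers push trajectories as soon as they finish, so a slow
    # environment instance never blocks the policy update loop.
    for step in range(n_rollouts):
        buffer.put(make_rollout(worker_id, step))

def learner(buffer, total, updates):
    # The learner consumes available trajectories; in async-MAPPO-style
    # training, this is where the PPO update step would run.
    for _ in range(total):
        batch = buffer.get()
        updates.append(batch)

buffer = queue.Queue()
updates = []
workers = [threading.Thread(target=rollout_worker, args=(i, buffer, 5))
           for i in range(4)]
learner_thread = threading.Thread(target=learner, args=(buffer, 20, updates))
for t in workers:
    t.start()
learner_thread.start()
for t in workers:
    t.join()
learner_thread.join()
print(len(updates))  # 4 workers x 5 rollouts = 20 trajectories consumed
```

The design choice this sketch highlights is decoupling data collection from optimization: workers and the learner run at their own pace, which is what makes the training loop scale across many environment instances.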

Publication
In CAAI International Conference on Artificial Intelligence
Yunfei Li