STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning
STAIRS-Former is a transformer architecture for offline multi-task multi-agent reinforcement learning. It combines spatio-temporal attention, an interleaved recursive structure, and token dropout to handle varying agent populations and long-horizon dependencies, achieving state-of-the-art performance across diverse benchmarks.
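The abstract names three mechanisms without detailing them, so the following is only a rough numpy sketch of how they could fit together, not the paper's actual architecture: per-timestep attention over agents (spatial), per-agent attention over timesteps (temporal), whole-token dropout, and repeated application of the same block as the "recursive" structure. Every function name, shape convention, and the exact attention factorization here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(x):
    # Single-head scaled dot-product self-attention over rows of x: (n, d).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def spatio_temporal_block(tokens, drop_prob=0.0):
    # tokens: (T, N, d) -- T timesteps, N agents, d-dim features (assumed layout).
    T, N, _ = tokens.shape
    # Spatial attention: agents attend to each other within one timestep,
    # so the block works for any agent count N.
    spatial = np.stack([attention(tokens[t]) for t in range(T)])
    # Temporal attention: each agent attends over its own history.
    temporal = np.stack([attention(spatial[:, n]) for n in range(N)], axis=1)
    # Token dropout: zero out whole agent-timestep tokens (a training-time
    # regularizer; here purely illustrative).
    if drop_prob > 0:
        mask = rng.random((T, N, 1)) >= drop_prob
        temporal = temporal * mask
    return tokens + temporal  # residual connection

# "Interleaved recursive structure" read here as reapplying the same block.
x = rng.standard_normal((4, 3, 8))  # 4 timesteps, 3 agents, 8-dim tokens
out = x
for _ in range(2):
    out = spatio_temporal_block(out)
print(out.shape)  # (4, 3, 8)
```

Because the spatial step attends only within a timestep and the temporal step only within one agent's trajectory, the same parameters apply to any (T, N), which is one plausible way to "handle varying agent populations" as the abstract claims.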