Routing without Forgetting
The paper introduces Routing without Forgetting (RwF), a transformer architecture for Online Continual Learning. Instead of specializing through iterative gradient-based updates, RwF uses energy-based layers to retrieve input-conditioned prompts dynamically, in a single associative step. The paper reports superior performance on class-incremental benchmarks, without relying on explicit task identifiers.
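
To make "single-step associative retrieval of input-conditioned prompts" concrete, here is a minimal sketch in plain Python. It assumes a softmax-weighted lookup in the style of modern Hopfield networks, which is one common form of energy-based retrieval; the summary does not confirm that RwF uses this exact update. The function `retrieve_prompt`, the inverse temperature `beta`, and the toy `keys` and `prompts` are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_prompt(query, keys, prompts, beta=1.0):
    """Hypothetical single-step associative retrieval.

    Scores every stored key against the query with a dot product,
    converts the scores to softmax weights, and returns the weighted
    average of the stored prompts together with the weights.
    No gradient step and no task identifier is involved.
    """
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(prompts[0])
    mixed = [sum(w * p[d] for w, p in zip(weights, prompts)) for d in range(dim)]
    return mixed, weights


# Toy memory (assumed values): two keys, each paired with one prompt vector.
keys = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]

# The query matches the first key, so the first prompt dominates the result.
prompt, weights = retrieve_prompt([1.0, 0.0], keys, prompts, beta=8.0)
```

The point of the sketch is that the prompt is selected by the input itself in one forward pass: a larger `beta` makes the retrieval closer to picking a single stored prompt, while a smaller `beta` blends several.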