SaiVLA-0: Cerebrum--Pons--Cerebellum Tripartite Architecture for Compute-Aware Vision-Language-Action
SaiVLA-0 is a neuroscience-inspired, compute-aware Vision-Language-Action (VLA) framework built on a tripartite Cerebrum-Pons-Cerebellum architecture. By decoupling high-level semantics from real-time control, it targets modular scalability and active foveated vision, and it reports significant improvements in training efficiency and task success rates.
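
One way to picture the decoupling of high-level semantics from real-time control is a dual-rate loop, in which an expensive semantic module runs occasionally while a cheap controller runs on every tick. The sketch below is purely illustrative and is not the SaiVLA-0 implementation: the role assigned to each of the three parts, the class `DualRateLoop`, the parameter `slow_period`, and the arithmetic inside each method are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DualRateLoop:
    """Toy dual-rate loop: slow semantic module, adapter, fast controller.

    All names and behaviors here are hypothetical stand-ins.
    """

    slow_period: int  # assumed: run the slow module once every `slow_period` ticks
    slow_calls: int = 0
    context: float = 0.0
    actions: List[float] = field(default_factory=list)

    def cerebrum(self, observation: float) -> float:
        # Stand-in for an expensive semantic model; here it just passes the input through.
        self.slow_calls += 1
        return observation

    def pons(self, semantic: float) -> float:
        # Stand-in adapter that turns the semantic output into a control context.
        return 0.5 * semantic

    def cerebellum(self, observation: float, context: float) -> float:
        # Stand-in for a cheap controller that runs on every tick.
        return context - observation

    def step(self, tick: int, observation: float) -> float:
        # Refresh the cached context only on slow ticks.
        if tick % self.slow_period == 0:
            self.context = self.pons(self.cerebrum(observation))
        # The fast path always runs, reusing the most recent context.
        action = self.cerebellum(observation, self.context)
        self.actions.append(action)
        return action


def run(n_ticks: int, slow_period: int) -> DualRateLoop:
    """Drive the loop with the tick index as a dummy observation."""
    loop = DualRateLoop(slow_period=slow_period)
    for tick in range(n_ticks):
        loop.step(tick, float(tick))
    return loop
```

Calling `run(10, 5)` invokes the slow path on ticks 0 and 5 only, while the fast path produces an action on all ten ticks. In this sketch the compute cost is governed by `slow_period` alone, which is the sense in which such a loop could be called compute-aware.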