BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
BitVLA introduces a natively 1-bit Vision-Language-Action (VLA) model for robotic manipulation. By constraining weights to ternary values and applying a Quantize-then-Distill strategy to the vision backbone, it achieves performance comparable to full-precision baselines while significantly reducing memory footprint and latency.
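A "native ternary parameter design" means each weight is stored as one of {-1, 0, +1} plus a shared scale. As a rough illustration, here is a minimal sketch of absmean ternary quantization in the style of BitNet b1.58; the function names and details are illustrative assumptions, not BitVLA's actual implementation:

```python
import numpy as np

def ternary_quantize(w, eps=1e-6):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor scale.

    Uses absmean scaling (BitNet b1.58-style): divide by the mean
    absolute value, round to the nearest integer, clip to [-1, 1].
    Hypothetical sketch, not BitVLA's exact scheme.
    """
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate full-precision tensor from (q, scale)."""
    return q * scale

# Example: a small weight vector quantized to ternary values.
w = np.array([0.8, -0.05, -1.2, 0.3])
q, s = ternary_quantize(w)
print(q)  # ternary codes in {-1, 0, +1}
```

Storing only the ternary codes and one scale per tensor is what drives the memory savings: each weight needs about 1.58 bits (log2(3)) instead of 16 or 32.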