AR2-4FV: Anchored Referring and Re-identification for Long-Term Grounding in Fixed-View Videos
The paper proposes AR2-4FV, a framework for long-term language-guided referring in fixed-view videos. It combines an Anchor Bank derived from the static background with a ReID-Gating mechanism to maintain identity continuity and to accelerate re-capture when the target is occluded or leaves the scene. The framework significantly outperforms existing baselines in re-capture rate and latency.
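The summary does not specify how the ReID-Gating mechanism is implemented. As a rough illustration of the general idea of gating re-capture on identity, the following minimal sketch accepts a candidate only when its appearance embedding is close enough to the stored target embedding. The function names (`cosine_similarity`, `reid_gate`), the use of cosine similarity, and the `threshold` value are all assumptions made for illustration; this is a generic re-identification pattern, not the paper's method.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reid_gate(
    target: Sequence[float],
    candidates: Sequence[Sequence[float]],
    threshold: float = 0.8,  # hypothetical value, not taken from the paper
) -> Optional[int]:
    """Return the index of the best candidate whose similarity to the
    target embedding reaches the threshold, or None if no candidate passes."""
    best_index: Optional[int] = None
    best_score = threshold
    for i, candidate in enumerate(candidates):
        score = cosine_similarity(target, candidate)
        if score >= best_score:
            best_index, best_score = i, score
    return best_index
```

In this sketch, returning `None` means the gate stays closed and the tracker keeps waiting, so a look-alike that enters the scene while the target is occluded does not take over the identity.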