Context-Dependent Affordance Computation in Vision-Language Models
Through a large-scale study of Qwen-VL and LLaVA-1.5, this paper demonstrates that vision-language models exhibit significant context-dependent affordance drift: the affordances a model reports for the same scene vary substantially, both lexically and semantically, with the agentic persona given in the prompt. The authors argue that this motivates dynamic, query-dependent ontological projection in robotics rather than static world modeling.
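The following is a minimal sketch of how persona-conditioned affordance drift could be measured, not the paper's actual protocol. It assumes the VLM is wrapped as a simple callable; the persona list, the `toy_model` stub, and the choice of mean pairwise Jaccard distance as a lexical-drift metric are all illustrative assumptions (a semantic-drift metric would additionally compare embeddings of the responses).

```python
from itertools import combinations

def query_affordances(model, image, persona: str) -> set[str]:
    # Hypothetical affordance query: in the actual study the callable would
    # wrap Qwen-VL or LLaVA-1.5; here it is a stand-in for a VLM call.
    prompt = f"You are {persona}. List, comma-separated, the actions you could take with this object."
    response = model(image, prompt)
    return {tok.strip().lower() for tok in response.split(",")}

def lexical_drift(affordance_sets: list[set[str]]) -> float:
    # Lexical drift as mean pairwise Jaccard distance between the affordance
    # sets elicited under different personas (assumed metric, 0.0 = no drift).
    dists = []
    for a, b in combinations(affordance_sets, 2):
        union = a | b
        jaccard = len(a & b) / len(union) if union else 1.0
        dists.append(1.0 - jaccard)
    return sum(dists) / len(dists) if dists else 0.0

def toy_model(image, prompt):
    # Stub returning persona-dependent affordances, mimicking the drift
    # the paper reports; a real run would call the VLM on an image.
    if "chef" in prompt:
        return "cut, chop, slice"
    if "firefighter" in prompt:
        return "pry, break, cut"
    return "grasp, lift, poke"

personas = ["a chef", "a firefighter", "a robot arm with a two-finger gripper"]
sets_ = [query_affordances(toy_model, image=None, persona=p) for p in personas]
print(f"lexical affordance drift: {lexical_drift(sets_):.2f}")
```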