EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering
This paper introduces EgoCross, a benchmark of 1,000 QA pairs spanning four challenging domains: surgery, industry, extreme sports, and animal perspective. The benchmark evaluates current Multimodal Large Language Models on egocentric video question answering and exposes their poor cross-domain generalization.