AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering
This paper introduces AutoViVQA, a large-scale, automatically constructed dataset for Vietnamese Visual Question Answering. It evaluates transformer-based multimodal models on the dataset and examines how well various automatic metrics align with human judgment in the Vietnamese context.