RGB-Event HyperGraph Prompt for Kilometer Marker Recognition based on Pre-trained Foundation Models

This paper addresses Kilometer Marker Recognition for autonomous metro trains in complex environments. It proposes a robust multi-modal method that adapts a pre-trained RGB OCR foundation model to event camera data, and it introduces EvMetro5K, the first large-scale synchronized RGB-Event dataset, to validate the approach.

Xiaoyu Xian, Shiao Wang, Xiao Wang + 2 more | 2026-02-26 | cs.AI

Brain3D: Brain Report Automation via Inflated Vision Transformers in 3D

The paper introduces Brain3D, a specialized vision-language framework that converts 2D pretrained encoders into native 3D architectures to automate neuroradiology report generation from brain tumor MRIs. Through a three-stage alignment process, it achieves significantly higher clinical accuracy than 2D baselines, along with perfect specificity on healthy scans.

Mariano Barone, Francesco Di Serio, Giuseppe Riccio + 4 more | 2026-02-26 | cs

GeoDiv: Framework For Measuring Geographical Diversity In Text-To-Image Models

The paper introduces GeoDiv, a framework that uses large language and vision-language models to systematically measure geographical diversity in text-to-image generation. It reveals significant geographical biases and socio-economic stereotypes, showing that current models disproportionately portray countries such as India, Nigeria, and Colombia as impoverished.

Abhipsa Basu, Mohana Singh, Shashank Agnihotri + 2 more | 2026-02-26 | cs

WeaveTime: Stream from Earlier Frames into Emergent Memory in VideoLLMs

WeaveTime is a model-agnostic framework that addresses the time-agnostic behavior of current Video-LLMs in streaming scenarios. It introduces a lightweight Temporal Reconstruction objective to instill order-aware representations and a Past-Current Dynamic Focus Cache for uncertainty-triggered retrieval, improving accuracy and reducing latency without architectural changes.

Yulin Zhang, Cheng Shi, Sibei Yang | 2026-02-26 | cs