FluenceFormer: Transformer-Driven Multi-Beam Fluence Map Regression for Radiotherapy Planning

This paper introduces FluenceFormer, a two-stage, transformer-driven framework for geometry-aware fluence map prediction in radiotherapy planning. It is trained with a physics-informed Fluence-Aware Regression loss and is reported to outperform existing CNN and single-stage methods in energy conservation and structural fidelity.

Ujunwa Mgboh, Rafi Ibn Sultan, Joshua Kim + 2 more · 2026-03-06 · cs

Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments

This paper introduces a new dataset, derived from football highlight reels, for evaluating how well foundation models identify contextually important video moments. It finds that current state-of-the-art models perform near chance level because they rely on a single dominant modality and fail to synthesize cross-modal information.

Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle · 2026-03-06 · cs

Pailitao-VL: Unified Embedding and Reranker for Real-Time Multi-Modal Industrial Search

Pailitao-VL is a unified multi-modal retrieval system for real-time industrial search. It replaces traditional contrastive embeddings with an absolute ID-recognition paradigm and recasts reranking as a compare-and-calibrate listwise policy. The authors report state-of-the-art performance and present the design as a way to overcome granularity, noise, and latency challenges in large-scale production environments.

Lei Chen, Chen Ju, Xu Chen + 13 more · 2026-03-06 · cs