Fusion Complexity Inversion: Why Simpler Cross View Modules Outperform SSMs and Cross View Attention Transformers for Pasture Biomass Regression

This study shows that, for pasture biomass regression on scarce agricultural data, high-quality backbone pretraining combined with simple local fusion modules significantly outperforms complex global architectures such as SSMs and cross-view attention transformers, a phenomenon termed "fusion complexity inversion."

Mridankan Mandal | 2026-03-10 | 🤖 cs.LG

Structure and Progress Aware Diffusion for Medical Image Segmentation

This paper proposes Structure and Progress Aware Diffusion (SPAD), a framework for medical image segmentation in which a progress-aware scheduler guides coarse-to-fine learning. Semantic-concentrated and boundary-centralized diffusion modules balance stable understanding of anatomical structure with refinement of ambiguous target boundaries.

Siyuan Song, Guyue Hu, Chenglong Li, Dengdi Sun, Zhe Jin, Jin Tang | 2026-03-10 | 💻 cs

Beyond Heuristic Prompting: A Concept-Guided Bayesian Framework for Zero-Shot Image Recognition

This paper proposes a Concept-Guided Bayesian Framework for zero-shot image recognition that enhances Vision-Language Models by treating class-specific concepts as latent variables, utilizing an LLM-driven synthesis pipeline with diversity enforcement and a training-free adaptive soft-trim likelihood to achieve superior performance over heuristic prompting methods.

Hui Liu, Kecheng Chen, Jialiang Wang, Xianming Liu, Wenya Wang, Haoliang Li | 2026-03-10 | 💻 cs

IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation

The paper proposes IMSE, a test-time adaptation method that fine-tunes only the singular values of Vision Transformer linear layers via a spectral mixture of experts and a diversity maximization loss to prevent feature collapse, achieving state-of-the-art performance with significantly fewer trainable parameters.

Sunghyun Baek (Korea Advanced Institute of Science and Technology), Jaemyung Yu (Korea Advanced Institute of Science and Technology), Seunghee Koh (Korea Advanced Institute of Science and Technology), Minsu Kim (LG Energy Solution), Hyeonseong Jeon (LG Energy Solution), Junmo Kim (Korea Advanced Institute of Science and Technology) | 2026-03-10 | 💻 cs

Text to Automata Diagrams: Comparing TikZ Code Generation with Direct Image Synthesis

This study evaluates the effectiveness of vision-language and large language models in converting scanned student-drawn automata diagrams into TikZ code, finding that while direct image-to-text generation often yields errors, human-corrected descriptions significantly improve the accuracy of the resulting digital diagrams for educational applications like automated grading.

Ethan Young, Zichun Wang, Aiden Taylor, Chance Jewell, Julian Myers, Satya Sri Rajiteswari Nimmagadda, Anthony White, Aniruddha Maiti, Ananya Jana | 2026-03-10 | 💻 cs

VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer

VisualAD is a language-free, zero-shot anomaly detection framework that leverages a frozen Vision Transformer backbone with learnable normality and abnormality tokens, along with spatial-aware cross-attention and self-alignment modules, to achieve state-of-the-art performance across industrial and medical domains without relying on text encoders or cross-modal alignment.

Yanning Hou, Peiyuan Li, Zirui Liu, Yitong Wang, Yanran Ruan, Jianfeng Qiu, Ke Xu | 2026-03-10 | 💻 cs