Remote Sensing Image Classification Using Deep Ensemble Learning

This paper proposes a deep ensemble learning framework that fuses four independent CNN-ViT hybrid models to overcome the performance bottlenecks of redundant feature representations, achieving state-of-the-art accuracy on remote sensing image classification datasets while maintaining computational efficiency.

Niful Islam, Md. Rayhan Ahmed, Nur Mohammad Fahad, Salekul Islam, A. K. M. Muzahidul Islam, Saddam Mukta, Swakkhar Shatabda · 2026-03-09 · 🤖 cs.AI

TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis

This paper introduces TumorChain, a multimodal interleaved reasoning framework paired with the large-scale TumorCoT dataset, to enhance the traceability, accuracy, and reliability of clinical tumor analysis by integrating 3D CT imaging with step-by-step Chain-of-Thought reasoning for lesion characterization and pathology prediction.

Sijing Li, Zhongwei Qiu, Jiang Liu, Wenqiao Zhang, Tianwei Lin, Yihan Xie, Jianxiang An, Boxiang Yun, Chenglin Yang, Jun Xiao, Guangyu Guo, Jiawen Yao, Wei Liu, Yuan Gao, Ke Yan, Weiwei Cao, Zhilin Zheng, Tony C. W. Mok, Kai Cao, Yu Shi, Jiuyu Zhang, Jian Zhou, Beng Chin Ooi, Yingda Xia, Ling Zhang · 2026-03-09 · 💻 cs

PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues

The paper proposes PatchCue, a novel patch-based visual cue paradigm that enhances Vision-Language Model reasoning by aligning with human perceptual habits and leveraging patch-tokenized inputs through a two-stage training process, thereby outperforming existing pixel-level and point-based approaches across diverse benchmarks.

Yukun Qi, Pei Fu, Hang Li, Yuhan Liu, Chao Jiang, Bin Qin, Zhenbo Luo, Jian Luan · 2026-03-09 · 💻 cs

CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis

CylinderSplat is a feed-forward framework for panoramic 3D Gaussian Splatting that introduces a novel cylindrical Triplane representation and a dual-branch architecture to effectively handle occlusions and geometric distortions in $360^\circ$ scenes, achieving state-of-the-art results in both single-view and multi-view novel view synthesis.

Qiwei Wang, Xianghui Ze, Jingyi Yu, Yujiao Shi · 2026-03-09 · 💻 cs

InnoAds-Composer: Efficient Condition Composition for E-Commerce Poster Generation

The paper proposes InnoAds-Composer, a single-stage framework that efficiently generates e-commerce posters by integrating subject, text, and style controls through an optimized token routing mechanism and a text enhancement module, while also introducing a new high-quality dataset and benchmark for this task.

Yuxin Qin, Ke Cao, Haowei Liu, Ao Ma, Fengheng Li, Honghe Zhu, Zheng Zhang, Run Ling, Wei Feng, Xuanhua He, Zhanjie Zhang, Zhen Guo, Haoyi Bian, Jingjing Lv, Junjie Shen, Ching Law · 2026-03-09 · 💻 cs

Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification

This paper proposes three bias mitigation techniques—top-k concept filtering, removal of biased concepts, and adversarial debiasing—to address information leakage in Concept Bottleneck Models, thereby achieving superior fairness-performance tradeoffs for interpretable image classification compared to prior work.

Schrasing Tong, Antoine Salaun, Vincent Yuan, Annabel Adeyeri, Lalana Kagal · 2026-03-09 · 🤖 cs.LG

CollabOD: Collaborative Multi-Backbone with Cross-scale Vision for UAV Small Object Detection

CollabOD is a lightweight collaborative detection framework designed to enhance UAV small object detection by integrating structural detail preservation, cross-path feature alignment, and localization-aware lightweight strategies to overcome challenges like scale variation and feature degradation in high-altitude imagery.

Xuecheng Bai, Yuxiang Wang, Chuanzhi Xu, Boyu Hu, Kang Han, Ruijie Pan, Xiaowei Niu, Xiaotian Guan, Liqiang Fu, Pengfei Ye · 2026-03-09 · 💻 cs

CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning

This paper introduces CORE-Seg, a reinforcement learning-driven framework that integrates a Semantic-Guided Prompt Adapter with a progressive SFT-to-GRPO training strategy to bridge the gap between visual segmentation and cognitive reasoning for complex medical lesions, achieving state-of-the-art performance on the newly proposed ComLesion-14K Chain-of-Thought benchmark.

Yuxin Xie, Yuming Chen, Yishan Yang, Yi Zhou, Tao Zhou, Zhen Zhao, Jiacheng Liu, Huazhu Fu · 2026-03-09 · 🤖 cs.AI

BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation

This paper introduces BlackMirror, a novel black-box, training-free framework that detects backdoored text-to-image models by identifying and verifying the stability of partial semantic deviations between instructions and generated images, overcoming the limitations of existing image-similarity-based methods against diverse backdoor attacks.

Feiran Li, Qianqian Xu, Shilong Bao, Zhiyong Yang, Xilin Zhao, Xiaochun Cao, Qingming Huang · 2026-03-09 · 🤖 cs.AI