Catch Me If You Can Describe Me: Open-Vocabulary Camouflaged Instance Segmentation with Diffusion

This paper proposes a novel diffusion-based method for Open-Vocabulary Camouflaged Instance Segmentation (OVCIS) that effectively fuses multi-scale textual-visual features to overcome the challenges of object boundaries that blend into the background and of segmenting unseen object classes, demonstrating superior performance on benchmarks with applications in surveillance, wildlife monitoring, and military reconnaissance.

Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo + 4 more · 2026-03-05 · cs.AI

Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph

This paper proposes a novel framework that integrates Large Language Models with the Australian National University's Computer Science Scholarly Knowledge Graph, utilizing a Deep Document Model and KG-enhanced Query Processing to enable accurate, fine-grained semantic retrieval of research artifacts through automatic LLM-SPARQL fusion.

Runsong Jia, Bowen Zhang, Sergio J. Rodríguez Méndez + 1 more · 2026-03-05 · cs.AI

Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information

This paper introduces PubHealthBench, a new benchmark comprising over 8,000 questions derived from UK government guidance, to evaluate LLMs on public health knowledge and finds that while state-of-the-art models excel in multiple-choice tasks, their performance on free-form responses remains limited, highlighting the need for additional safeguards in real-world applications.

Joshua Harris, Fan Grayson, Felix Feldman + 8 more · 2026-03-05 · cs.LG

Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering

This paper introduces Multi-Objective Balanced Covering (MoB), a novel visual token pruning framework that leverages Hausdorff distance and ε-covering theory to derive a closed-form error bound and dynamically balance prompt alignment with visual preservation, achieving significant inference acceleration with minimal performance loss across diverse multimodal models.

Yangfu Li, Hongjian Zhan, Tianyi Chen + 2 more · 2026-03-05 · cs.CL

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

This paper introduces Supervised Calibration (SC), a loss-minimization framework that enhances In-Context Learning in Large Language Models by learning optimal per-class affine transformations to correct systematic biases and alter decision boundary orientations, thereby achieving state-of-the-art performance across multiple models and datasets.
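The core idea of per-class affine calibration can be illustrated with a minimal sketch: learn a scale and bias for each class's score by minimizing cross-entropy on a small labeled set, then predict with the transformed scores. This is only an illustration of the general technique; the paper's actual SC objective, parameterization, and training procedure may differ.

```python
import numpy as np

def fit_affine_calibration(logits, labels, lr=0.1, epochs=500):
    """Fit per-class affine transforms z_c -> w_c * z_c + b_c by minimizing
    cross-entropy on a small labeled set (gradient descent sketch)."""
    n, k = logits.shape
    w, b = np.ones(k), np.zeros(k)
    onehot = np.eye(k)[labels]
    for _ in range(epochs):
        z = logits * w + b
        z -= z.max(axis=1, keepdims=True)          # numerical stability
        p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        g = (p - onehot) / n                       # d(loss)/d(z)
        w -= lr * (g * logits).sum(axis=0)         # chain rule through w_c * z_c
        b -= lr * g.sum(axis=0)
    return w, b

def calibrated_predict(logits, w, b):
    """Predict with the affinely transformed class scores."""
    return np.argmax(logits * w + b, axis=1)
```

A toy case shows the effect: if the model systematically inflates one class's logit, the naive argmax is biased toward that class, while the fitted per-class bias shifts the decision boundary back.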

Korel Gundem, Juncheng Dong, Dennis Zhang + 2 more · 2026-03-05 · cs.AI

Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models

This paper investigates how preference models overrely on superficial artifacts like length and style rather than substantive quality, quantifying this miscalibration against human preferences and demonstrating that a counterfactual data augmentation method effectively mitigates these biases while preserving overall performance.
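The intuition behind counterfactual augmentation against a length artifact can be sketched as follows: for each preference pair, add a counterfactual copy in which the superficial attribute (here, length) is flipped while the label stays tied to the underlying response, so length no longer predicts the label in the training data. The function names and the padding scheme are hypothetical illustrations, not the paper's method, which operates on actual model responses.

```python
def augment_counterfactual(pairs, filler=" indeed"):
    """For each (chosen, rejected) pair where the rejected response is longer,
    add a counterfactual copy whose chosen response is padded past the
    rejected one, decorrelating length from the preference label."""
    out = list(pairs)
    for chosen, rejected in pairs:
        # Pad with enough filler to make the chosen response the longer one.
        reps = max(1, (len(rejected) - len(chosen)) // len(filler) + 1)
        out.append((chosen + filler * reps, rejected))
    return out

def length_bias(pairs):
    """Fraction of pairs in which the chosen response is the longer one;
    0.5 means length carries no signal about the label."""
    return sum(len(c) > len(r) for c, r in pairs) / len(pairs)
```

On a toy dataset where the rejected answer is always longer, the raw data lets a model win by simply preferring shorter text; after augmentation the length cue is uninformative.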

Anirudh Bharadwaj, Chaitanya Malaviya, Nitish Joshi + 1 more · 2026-03-05 · cs.CL

CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering

This paper introduces CounselBench, a large-scale benchmark developed with mental health professionals to evaluate and adversarially stress-test large language models on realistic open-ended mental health questions, revealing significant gaps in safety, personalization, and the reliability of automated evaluation compared to human expert judgment.

Yahan Li, Jifan Yao, John Bosco S. Bunyi + 3 more · 2026-03-05 · cs.CL