HypoSpace: Evaluating LLM Creativity as Set-Valued Hypothesis Generators under Underdetermination

The paper introduces HypoSpace, a diagnostic suite that evaluates large language models as set-valued hypothesis generators in underdetermined scientific domains by measuring validity, uniqueness, and recovery to reveal mode collapse that traditional correctness-only metrics fail to detect.
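
A minimal sketch of what such set-valued scoring might look like, assuming simplified definitions (validity as membership in an enumerable admissible set, uniqueness as the deduplication rate among valid outputs, recovery as coverage of the admissible set); the paper's exact formulations may differ:

```python
# Hedged sketch: set-level metrics over generated hypotheses. The definitions
# below are illustrative assumptions, not HypoSpace's exact formulas.

def hypothesis_set_metrics(generated: list[str], admissible: set[str]) -> dict[str, float]:
    valid = [h for h in generated if h in admissible]   # consistent with the observations
    distinct_valid = set(valid)
    return {
        # share of generations that are admissible at all
        "validity": len(valid) / len(generated) if generated else 0.0,
        # share of valid generations that are not duplicates (low => mode collapse)
        "uniqueness": len(distinct_valid) / len(valid) if valid else 0.0,
        # share of the full admissible set the model actually produced
        "recovery": len(distinct_valid) / len(admissible) if admissible else 0.0,
    }

# A model that keeps repeating one correct hypothesis scores well on validity
# but poorly on uniqueness and recovery, which is what exposes the collapse.
print(hypothesis_set_metrics(["h1", "h1", "h1", "h4"], {"h1", "h2", "h3"}))
```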

Tingting Chen, Beibei Lin, Zifeng Yuan, Qiran Zou, Hongyu He, Anirudh Goyal, Yew-Soon Ong, Dianbo Liu · Tue, 10 Ma · cs.CL

FOR-Prompting: From Objection to Revision via an Asymmetric Prompting Protocol

The paper introduces FOR-Prompting, a model-agnostic, asymmetric prompting protocol that enhances reasoning and iterative refinement across diverse tasks by structuring interactions between a Defender, a Questioner, and an optional Host, enabling even small models to achieve performance comparable to or better than standard baselines without requiring training or access to model internals.
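
A rough sketch of the protocol's control flow, assuming a generic `llm(prompt) -> str` callable and illustrative role prompts (the paper's actual prompts and stopping rule are not reproduced here):

```python
# Hedged sketch of a FOR-Prompting-style objection/revision loop. Role prompts,
# the stop condition, and `llm` are illustrative assumptions.

def for_prompting(task: str, llm, max_rounds: int = 3) -> str:
    answer = llm(f"You are the Defender. Solve the task:\n{task}")
    for _ in range(max_rounds):
        objection = llm(
            "You are the Questioner. Raise one concrete objection to the answer, "
            f"or reply NO OBJECTION.\nTask: {task}\nAnswer: {answer}"
        )
        if "NO OBJECTION" in objection.upper():
            break  # Questioner is satisfied; no further revision needed
        answer = llm(
            "You are the Defender. Revise the answer to address the objection.\n"
            f"Task: {task}\nAnswer: {answer}\nObjection: {objection}"
        )
    return answer  # an optional Host could summarize or adjudicate here
```

On one plausible reading, the asymmetry is that the Questioner never proposes answers, only objections, which keeps its role cheap enough for small models.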

He Zhang, Anzhou Zhang, Jian Dai · Tue, 10 Ma · cs.CL

A Simple "Motivation" Can Enhance Reinforcement Finetuning of Large Reasoning Models

This paper introduces MeRF, a method that enhances reinforcement finetuning of large reasoning models by injecting reward specifications directly into prompts as "motivation," thereby leveraging in-context learning to align generation with optimization objectives and achieve substantial performance gains over standard RLVR baselines.
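
In code, the core idea reduces to prompt construction; the reward wording below is an illustrative assumption rather than the paper's template:

```python
# Hedged sketch: surface the RLVR reward specification in-context as "motivation".

REWARD_SPEC = (
    "Reward: +1 if the final answer in \\boxed{} is correct, "
    "-1 if it is wrong, 0 if no \\boxed{} answer is given."
)

def merf_prompt(question: str) -> str:
    # The policy sees the objective it is being optimized for, so in-context
    # learning can pull generation toward what the verifier will actually score.
    return f"{REWARD_SPEC}\n\nQuestion: {question}\nThink step by step."
```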

Junjie Zhang, Guozheng Ma, Shunyu Liu, Haoyu Wang, Jiaxing Huang, Ting-En Lin, Fei Huang, Yongbin Li, Dacheng Tao · Tue, 10 Ma · cs.CL

SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving

SwingArena is a competitive evaluation framework that simulates real-world software development workflows by pairing LLMs as patch submitters and reviewers within a CI-driven pipeline, utilizing a retrieval-augmented code generation module to effectively solve long-context GitHub issues across multiple programming languages.
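
One way the submitter/reviewer duel might be wired up, with `retrieve_context` and `run_ci` as hypothetical stand-ins for the retrieval module and the CI pipeline:

```python
# Hedged sketch of a single SwingArena-style round; all function names and the
# win condition are assumptions for illustration.

def swing_round(issue, repo, submitter, reviewer, retrieve_context, run_ci):
    context = retrieve_context(repo, issue)        # retrieval-augmented generation
    patch = submitter(
        f"Issue:\n{issue}\n\nRelevant code:\n{context}\n\nWrite a patch."
    )
    tests = reviewer(
        f"Issue:\n{issue}\n\nPatch under review:\n{patch}\n\nWrite tests it must pass."
    )
    passed = run_ci(repo, patch, tests)            # CI arbitrates the match
    return "submitter" if passed else "reviewer"   # round winner
```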

Wendong Xu, Jing Xiong, Chenyang Zhao, Qiujiang Chen, Haoran Wang, Hui Shen, Zhongwei Wan, Jianbo Dai, Taiqiang Wu, He Xiao, Chaofan Tao, Z. Morley Mao, Ying Sheng, Zhijiang Guo, Hongxia Yang, Bei Yu, Lingpeng Kong, Quanquan Gu, Ngai Wong · Tue, 10 Ma · cs.CL

HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture

This paper proposes HaLoRA, a hardware-aware low-rank adaptation method that enhances the robustness of Large Language Models deployed on hybrid Compute-in-Memory architectures by training noise-resilient LoRA branches on SRAM while storing pretrained weights on noisy RRAM, achieving significant energy efficiency without compromising accuracy and improving performance by up to 22.7 points.
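
A toy PyTorch rendering of that split, assuming a multiplicative Gaussian model for RRAM noise (the paper's device model and training objective are not reproduced here):

```python
import torch
import torch.nn as nn

# Hedged sketch: the frozen base weight sees simulated RRAM noise during
# training, while the trainable low-rank branch is kept noise-free (SRAM).
# The noise model and hyperparameters are assumptions, not HaLoRA's.

class NoisyLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, noise_std: float = 0.02):
        super().__init__()
        self.base = base.requires_grad_(False)      # pretrained weights on RRAM
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.noise_std = noise_std

    def forward(self, x):
        w = self.base.weight
        if self.training:                           # expose the branch to device noise
            w = w + torch.randn_like(w) * self.noise_std * w.abs()
        y = nn.functional.linear(x, w, self.base.bias)
        return y + x @ self.A.T @ self.B.T          # noise-free LoRA path (SRAM)
```

Training against the noisy base weight is what would push the LoRA branch to compensate for device non-idealities rather than overfit to a clean forward pass.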

Taiqiang Wu, Chenchen Ding, Wenyong Zhou, Yuxin Cheng, Xincheng Feng, Shuqi Wang, Wendong Xu, Chufan Shi, Zhengwu Liu, Ngai Wong · Tue, 10 Ma · cs.CL

Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck

This paper introduces a Discrete Key-Value Bottleneck (DKVB) for encoder-only small language models that enables efficient continual learning by alleviating catastrophic forgetting through localized updates and task-independent initialization, achieving competitive performance with lower computational costs even in challenging single-head scenarios.
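
A compact sketch of the bottleneck itself, with frozen random keys and trainable values; the dimensions and the nearest-key rule are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hedged sketch: discrete key-value bottleneck over a frozen encoder's outputs.

class DiscreteKVBottleneck(nn.Module):
    def __init__(self, dim: int = 768, num_pairs: int = 4096, num_classes: int = 2):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_pairs, dim), requires_grad=False)
        self.values = nn.Parameter(torch.zeros(num_pairs, num_classes))  # trainable

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Snap each representation to its nearest key; gradients then update
        # only the selected values, so learning stays localized and earlier
        # tasks are largely untouched (less catastrophic forgetting).
        idx = torch.cdist(h, self.keys).argmin(dim=-1)   # (batch,)
        return self.values[idx]                          # (batch, num_classes)
```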

Andor Diera, Lukas Galke, Fabian Karl, Ansgar Scherp · Tue, 10 Ma · cs.CL