Exploring the Reasoning Depth of Small Language Models in Software Architecture: A Multidimensional Evaluation Framework Towards Software Engineering 2.0

This study benchmarks ten small language models on architectural decision record generation to establish a multidimensional evaluation framework, revealing that models exceeding 3 billion parameters excel at zero-shot reasoning, that sub-2-billion models benefit most from fine-tuning, that few-shot prompting effectively calibrates mid-sized models, and that high semantic diversity often correlates with hallucinations.

Ha Vo, Nhut Tran, Khang Vo, Phat T. Tran-Truong, Son Ha · 2026-03-10 · 💻 cs
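One concrete metric behind the diversity-hallucination finding is semantic diversity over a model's generated ADRs. A minimal sketch, assuming mean pairwise cosine distance as the diversity measure and a toy hashing bag-of-words encoder in place of whatever sentence embedder the study actually used (the abstract names neither):

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashing bag-of-words embedding; a stand-in for the
    (unnamed) sentence encoder the framework would use."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

def semantic_diversity(adrs: list[str]) -> float:
    """Mean pairwise cosine distance across generated ADRs; per the
    study, higher diversity often co-occurs with hallucination."""
    vecs = np.stack([embed(a) for a in adrs])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-12
    sims = vecs @ vecs.T
    i, j = np.triu_indices(len(adrs), k=1)  # each unordered pair once
    return float(1.0 - sims[i, j].mean())
```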

Randomise Alone, Reach as a Team

This paper investigates concurrent graph games with distributed randomization, where team players lack a shared random source, establishing that memoryless strategies suffice for the threshold problem (placing it in $\exists\mathbb{R}$ and proving NP-hardness) and that almost-sure reachability is NP-complete, while introducing the IRATL logic and a corresponding solver.

Léonard Brice, Thomas A. Henzinger, Alipasha Montaseri, Ali Shafiee, K. S. Thejaswini · 2026-03-10 · 💻 cs
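For orientation, the threshold problem placed in $\exists\mathbb{R}$ can be stated along the following standard lines; the notation is a plausible reconstruction, not taken from the paper:

```latex
% Sketch of the threshold problem (notation assumed, not the paper's).
Given a concurrent game $\mathcal{G}$ with target set $T$, a team of
players $1,\dots,k$ with no shared random source, and a threshold
$\lambda \in [0,1]$: do there exist \emph{memoryless} randomized
strategies $\sigma_1,\dots,\sigma_k$ (a private distribution over
actions per player and state) such that
\[
  \Pr\nolimits^{\sigma_1,\dots,\sigma_k}_{\mathcal{G}}\bigl[\lozenge T\bigr] \;\ge\; \lambda\,?
\]
With memoryless strategies fixed, the reachability probabilities are
defined by polynomial constraints in the strategy variables, so the
question is a sentence of the existential theory of the reals,
giving membership in $\exists\mathbb{R}$.
```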

Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints

This paper empirically investigates whether large language models can synthesize executable Unity game code from Goal Playable Patterns under strict structural constraints, revealing that while intermediate representations improve performance, project-level grounding and hygiene failures remain the primary bottlenecks to achieving high compilation success rates.

Hugh Xuechen Liu, Kıvanç Tatar · 2026-03-10 · 💻 cs
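The compilation-success metric suggests a straightforward measurement harness. A minimal sketch, assuming generated C# is dropped into a scratch Unity project and compiled via Unity's standard batch-mode flags; the paper's actual pipeline, file layout, and success criteria are not specified in the abstract:

```python
import pathlib
import subprocess

def compiles_cleanly(csharp_source: str, project: pathlib.Path,
                     unity_exe: str = "Unity") -> bool:
    """Write LLM-generated code into a Unity project's Assets folder and
    launch Unity headless to force a script compile. Only the flags
    (-batchmode -quit -projectPath) are standard Unity CLI; everything
    else here is an assumed harness, not the paper's."""
    (project / "Assets" / "Generated.cs").write_text(csharp_source)
    result = subprocess.run(
        [unity_exe, "-batchmode", "-quit", "-projectPath", str(project)],
        capture_output=True, text=True,
    )
    return result.returncode == 0  # non-zero signals compile/load errors
```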

Efficient Personalized Reranking with Semi-Autoregressive Generation and Online Knowledge Distillation

This paper proposes the Personalized Semi-Autoregressive with online knowledge Distillation (PSAD) framework, which uses a semi-autoregressive teacher model and a User Profile Network to balance generation quality with low-latency inference while better capturing user-item interactions, thereby outperforming state-of-the-art baselines in both ranking performance and efficiency.

Kai Cheng, Hao Wang, Wei Guo, Weiwen Liu, Yong Liu, Yawen Li, Enhong Chen · 2026-03-10 · 💻 cs
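Stripped of the architecture specifics, the online distillation objective described here has a familiar shape: the semi-autoregressive teacher's soft scores over candidate items are blended with the supervised ranking loss to train the low-latency student. A minimal PyTorch sketch; the function name, temperature, and blending weight are illustrative, not the paper's:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      alpha: float = 0.5,
                      tau: float = 2.0) -> torch.Tensor:
    """Blend a temperature-scaled KL term (teacher -> student) with the
    usual cross-entropy against the positive item. Both logit tensors
    are [batch, num_candidates]; alpha/tau are illustrative defaults."""
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits.detach() / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau  # rescale gradients as in standard distillation
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```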

Vision Language Models Cannot Reason About Physical Transformation

This paper introduces ConservationBench to demonstrate that current Vision Language Models systematically fail to reason about physical transformations and to maintain invariant representations of physical quantities, often performing at near-chance levels despite strong textual priors favoring invariance.

Dezhi Luo, Yijiang Li, Maijunxian Wang, Tianwei Zhao, Bingyang Wang, Siheng Wang, Pinyuan Feng, Pooyan Rahmanzadehgervi, Ziqiao Ma, Hokin Deng · 2026-03-10 · 💻 cs
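Operationally, a conservation probe reduces to asking the same quantity question before and after a conserving transformation and checking that the answer is invariant. A minimal sketch, assuming a hypothetical query_vlm(image, question) wrapper; ConservationBench's actual prompts, stimuli, and scoring are not reproduced here:

```python
def conservation_consistency(pairs, query_vlm) -> float:
    """pairs: iterable of (image_before, image_after, question) where the
    depicted transformation conserves the queried quantity (e.g., liquid
    poured into a taller glass). query_vlm is a hypothetical callable
    returning the model's short answer. A model with an invariant
    representation answers identically across the pair."""
    consistent, total = 0, 0
    for before, after, question in pairs:
        a_before = query_vlm(before, question).strip().lower()
        a_after = query_vlm(after, question).strip().lower()
        consistent += int(a_before == a_after)
        total += 1
    return consistent / max(total, 1)
```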