Complexity-Regularized Proximal Policy Optimization

This paper introduces Complexity-Regularized Proximal Policy Optimization (CR-PPO), a novel algorithm that replaces standard entropy regularization with a self-regulating complexity term, defined as the product of Shannon entropy and disequilibrium. The term maintains beneficial stochasticity while reducing sensitivity to hyperparameter tuning and preventing the regularizer from overriding the reward signal.

Luca Serfilippi, Giorgio Franceschelli, Antonio Corradi + 1 more · 2026-03-06 · cs
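
Entropy times disequilibrium is the classic López-Ruiz-Mancini-Calbet (LMC) statistical complexity, and the summary's definition reads naturally that way. Below is a minimal sketch assuming that reading, with disequilibrium taken as the squared distance from the uniform distribution; the function names and the `beta` coefficient are illustrative, not from the paper.

```python
import numpy as np

def lmc_complexity(probs: np.ndarray) -> float:
    """Statistical complexity C = H * D (LMC-style) of an action distribution.

    Disequilibrium D is taken here as the squared Euclidean distance from
    the uniform distribution; the paper's exact normalization may differ.
    """
    n = probs.size
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps))     # Shannon entropy H
    disequilibrium = np.sum((probs - 1.0 / n) ** 2)    # distance from uniform D
    return float(entropy * disequilibrium)

def cr_ppo_loss(ratio, advantage, probs, clip_eps=0.2, beta=0.01):
    """Hypothetical per-sample objective: PPO's clipped surrogate with the
    complexity bonus substituted for the usual entropy bonus."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = -min(ratio * advantage, clipped * advantage)
    return surrogate - beta * lmc_complexity(probs)
```

Note that the product vanishes for both a uniform policy (zero disequilibrium) and a deterministic one (zero entropy), so the bonus cannot dominate the reward at either extreme; this is one plausible mechanism behind the self-regulating behavior the summary describes.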

Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

This paper demonstrates that narrow finetuning leaves distinct, interpretable traces in LLM activations that can be extracted via model diffing to reconstruct characteristics of the training data and aid interpretability. The authors warn that such narrowly finetuned models may not represent broader finetuning scenarios, and suggest that mixing in pretraining data mitigates these overfitting traces.

Julian Minder, Clément Dumas, Stewart Slocum + 4 more · 2026-03-06 · cs
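
A minimal sketch of one common model-diffing recipe consistent with the summary: compare hidden states of the base and finetuned models on the same neutral prompts and take the per-layer mean difference as a "diff direction". The data layout and helper name are assumptions; the paper's actual pipeline and tooling are richer than this.

```python
import numpy as np

def activation_diff_directions(base_acts: dict, tuned_acts: dict) -> dict:
    """Per-layer mean activation difference between base and finetuned model.

    base_acts / tuned_acts: layer_name -> array of shape (n_prompts, d_model),
    hidden states collected on the same (finetuning-unrelated) prompts for
    both models. Returns a unit direction per layer; in narrowly finetuned
    models this direction tends to encode the finetuning domain.
    """
    directions = {}
    for layer, base in base_acts.items():
        delta = tuned_acts[layer].mean(axis=0) - base.mean(axis=0)
        directions[layer] = delta / (np.linalg.norm(delta) + 1e-12)
    return directions
```

Downstream, such a direction can be inspected (e.g., by projecting it through the unembedding to see which tokens it promotes) to read off what the narrow finetuning was about.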

SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes

This paper introduces SceneCOT, a novel framework for grounded question answering in 3D scenes that decouples complex reasoning into manageable steps tied to visual clues. Supported by the newly created SCENECOT-185K dataset, the framework achieves state-of-the-art performance and represents the first successful application of Chain-of-Thought reasoning to 3D scene understanding.

Xiongkun Linghu, Jiangyong Huang, Ziyu Zhu + 2 more · 2026-03-06 · cs
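
To make the "decoupled steps with visual clues" idea concrete, here is a heavily simplified sketch of a staged grounded-QA loop: classify the task, ground candidate objects as clues, then reason over the grounded evidence. All callables and attributes (`reasoner`, `grounder`, `clue.id`) are hypothetical stand-ins, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class GroundedStep:
    thought: str          # natural-language reasoning for this step
    object_ids: list      # 3D object instances cited as visual clues

def grounded_scene_qa(question, scene, reasoner, grounder):
    """Hypothetical staged chain-of-thought over a 3D scene."""
    task = reasoner.classify_task(question)                   # step 1: task type
    clues = grounder.locate(question, scene, task=task)       # step 2: ground visual clues
    steps = [GroundedStep(reasoner.explain(question, c), [c.id]) for c in clues]
    return reasoner.answer(question, steps)                   # step 3: answer from grounded steps
```

The point of the staging is that each reasoning step cites concrete grounded objects rather than free-floating text, which is what distinguishes grounded chain-of-thought from ordinary CoT prompting.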