When Token Pruning is Worse than Random: Understanding Visual Token Information in VLLMs
This paper shows that visual token information in Vision Large Language Models (VLLMs) progressively vanishes with depth: past a depth-dependent "information horizon," existing pruning methods underperform random token selection. Building on this finding, the authors propose a strategy that integrates random pruning in deep layers, achieving state-of-the-art efficiency without sacrificing accuracy.
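To make the idea concrete, here is a minimal PyTorch sketch of depth-aware pruning, not the paper's actual implementation: before a hypothetical `horizon_layer`, tokens are kept by a top-k importance score (e.g., attention received from text tokens); beyond it, a uniform random subset is kept instead. All names, shapes, and the scoring signal are assumptions for illustration.

```python
import torch

def prune_visual_tokens(
    hidden_states: torch.Tensor,  # (batch, num_visual_tokens, dim)
    importance: torch.Tensor,     # (batch, num_visual_tokens), e.g. attention scores
    keep_ratio: float,            # fraction of visual tokens to keep
    layer_idx: int,               # index of the current decoder layer
    horizon_layer: int,           # assumed depth of the information horizon
) -> torch.Tensor:
    batch, n, dim = hidden_states.shape
    k = max(1, int(n * keep_ratio))
    if layer_idx < horizon_layer:
        # Before the horizon: importance scores still track token
        # information, so keep the top-k tokens by score.
        idx = importance.topk(k, dim=1).indices
    else:
        # Beyond the horizon: scores no longer reflect information
        # content, so a uniform random subset avoids their bias.
        idx = torch.stack(
            [torch.randperm(n, device=hidden_states.device)[:k]
             for _ in range(batch)]
        )
    idx = idx.sort(dim=1).values  # preserve original token order
    return hidden_states.gather(1, idx.unsqueeze(-1).expand(-1, -1, dim))
```

The single branch on `layer_idx` captures the paper's core claim: the useful pruning criterion changes with depth, so a fixed score-based rule applied at every layer can do worse than random selection in deep layers.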