It is not always greener on the other side: Greenery perception across demographics and personalities in multiple cities

This study analyzes the discrepancies between objective and subjective urban greenery perceptions across five countries using street view imagery and a survey of 1,000 participants, revealing that while demographics and personality have little influence, an individual's geographic location is a primary factor shaping how they perceive green spaces.

Matias Quintana, Fangqi Liu, Jussi Torkko, Youlong Gu, Xiucheng Liang, Yujun Hou, Koichi Ito, Yihan Zhu, Mahmoud Abdelrahman, Tuuli Toivonen, Yi Lu, Filip Biljecki · 2026-03-10 · 💻 cs

Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL

This paper demonstrates that reasoning Large Language Models significantly reduce cloud query execution costs and data consumption compared to non-reasoning models in Text-to-SQL tasks, while revealing that execution time is a poor proxy for cost efficiency and highlighting the substantial financial risks posed by non-reasoning models' tendency to generate inefficient queries.

Saurabh Deochake, Debajyoti Mukhopadhyay · 2026-03-10 · 💻 cs

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

This paper introduces DrivingGen, the first comprehensive benchmark for generative driving world models that addresses the lack of rigorous evaluation by combining a diverse dataset with a novel suite of metrics to assess visual realism, trajectory plausibility, temporal coherence, and controllability, thereby revealing critical trade-offs in current state-of-the-art models.

Yang Zhou, Hao Shao, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander · 2026-03-10 · 💻 cs

Route, Retrieve, Reflect, Repair: Self-Improving Agentic Framework for Visual Detection and Linguistic Reasoning in Medical Imaging

The paper introduces R^4, a self-improving agentic framework that enhances medical image analysis by decomposing workflows into routing, retrieval, reflection, and repair stages to iteratively refine both textual reports and spatial bounding boxes, achieving significant performance gains over single-pass VLM baselines without requiring gradient-based fine-tuning.

Md. Faiyaz Abdullah Sayeedi, Rashedur Rahman, Siam Tahsin Bhuiyan, Sefatul Wasi, Ashraful Islam, Saadia Binte Alam, AKM Mahbubur Rahman · 2026-03-10 · 💻 cs

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

This paper introduces "Single-Shot Planning," a secure architecture for Computer Use Agents that generates a complete, trusted execution graph before observing untrusted UI states to effectively mitigate prompt injection and branch steering attacks while maintaining competitive task performance.

Hanna Foerster, Tom Blanchard, Kristina Nikolic, Ilia Shumailov, Cheng Zhang, Robert Mullins, Nicolas Papernot, Florian Tramèr, Yiren Zhao · 2026-03-10 · 💻 cs

User Detection and Response Patterns of Sycophantic Behavior in Conversational AI

This paper investigates how users detect and respond to sycophantic behavior in conversational AI through a proposed DCR epistemology, revealing that while users employ various mitigation strategies, sycophancy is not universally harmful and can provide valued emotional support for vulnerable populations, suggesting a need for context-aware AI design rather than universal elimination.

Kazi Noshin, Syed Ishtiaque Ahmed, Sharifa Sultana · 2026-03-10 · 💻 cs

BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics

This paper introduces BoxMind, a closed-loop AI system that transforms unstructured boxing footage into hierarchical tactical indicators and predictive gradients to generate expert-level strategic recommendations, which were validated during the 2024 Paris Olympics by contributing to the Chinese National Team's historic medal success.

Kaiwen Wang, Kaili Zheng, Rongrong Deng, Qingmin Fan, Milin Zhang, Zongrui Li, Xuesi Zhou, Bo Han, Liren Chen, Chenyi Guo, Ji Wu · 2026-03-10 · 💻 cs

Multifaceted Scenario-Aware Hypergraph Learning for Next POI Recommendation

This paper proposes the Multifaceted Scenario-Aware Hypergraph Learning (MSAHG) framework, which addresses the limitations of existing methods in handling mobility variations across distinct contexts by constructing scenario-specific disentangled sub-hypergraphs and employing a parameter-splitting mechanism to resolve inter-scenario conflicts, thereby significantly improving next POI recommendation performance.

Yuxi Lin, Yongkang Li, Jie Xing, Zipei Fan · 2026-03-10 · 💻 cs

S2DiT: Sandwich Diffusion Transformer for Mobile Streaming Video Generation

The paper introduces S2DiT, a novel Streaming Sandwich Diffusion Transformer that leverages efficient attention mechanisms, a budget-aware sandwich architecture, and a 2-in-1 distillation framework to achieve high-fidelity, real-time video generation on mobile devices with performance comparable to server-grade models.

Lin Zhao, Yushu Wu, Aleksei Lebedev, Dishani Lahiri, Meng Dong, Arpit Sahni, Michael Vasilkovsky, Hao Chen, Ju Hu, Aliaksandr Siarohin, Sergey Tulyakov, Yanzhi Wang, Anil Kag, Yanyu Li · 2026-03-10 · 💻 cs

ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance

This paper introduces ReViP, a novel Vision-Language-Action framework that mitigates "false completion" failures caused by proprioceptive bias through vision-proprioception rebalancing and a new benchmark suite, achieving significant performance gains over existing models.

Zhuohao Li, Yinghao Li, Jian-Jian Jiang, Lang Zhou, Tianyu Zhang, Jiadong Yin, Mu Lin, Yi-Kin Wei, Wei-Shi Zheng · 2026-03-10 · 💻 cs

ScenePilot-Bench: A Large-Scale Dataset and Benchmark for Evaluation of Vision-Language Models in Autonomous Driving

This paper introduces ScenePilot-Bench, a large-scale benchmark built on the diverse ScenePilot-4K dataset to comprehensively evaluate and advance vision-language models in autonomous driving through multi-granularity annotations and a safety-aware, four-axis assessment framework.

Yujin Wang, Yutong Zheng, Wenxian Fan, Tianyi Wang, Hongqing Chu, Li Zhang, Bingzhao Gao, Daxin Tian, Jianqiang Wang, Hong Chen · 2026-03-10 · 💻 cs

Query-Guided Spatial-Temporal-Frequency Interaction for Music Audio-Visual Question Answering

This paper proposes QSTar, a novel query-guided spatial-temporal-frequency interaction method enhanced by a Query Context Reasoning block, which significantly improves Audio-Visual Question Answering performance by deeply integrating question-guided clues and audio frequency characteristics with visual perception, outperforming existing multimodal approaches on multiple benchmarks.

Kun Li, Michael Ying Yang, Sami Sebastian Brandt · 2026-03-10 · 💻 cs

Dynamic framework for edge-connectivity maintenance of simple graphs

This paper presents a dynamic framework for maintaining $k$-edge-connectivity in undirected simple graphs under edge insertions and deletions by combining Nagamochi-Ibaraki sparse certificates with Link-Cut Trees for efficient $O(k \log n)$ amortized insertions and a maximum-flow-based approach for $O(k^{3/2} n^{3/2})$ deletions, all while keeping the graph sparse with $O(kn)$ edges.

Blazej Wrobel · 2026-03-10 · 💻 cs