cs.AI papers | Gist.Science

The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor

This paper audits the LAION-Aesthetics Predictor to reveal how its algorithmic gaze reinforces Western, male, and imperial biases by disproportionately filtering content and prioritizing specific cultural aesthetics, ultimately urging a shift toward pluralistic evaluation methods in AI development.

Jordan Taylor, William Agnew, Maarten Sap, Sarah E. Fox, Haiyi Zhu · 2026-03-10 · 💻 cs

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

This paper introduces "Single-Shot Planning," a secure architecture for Computer Use Agents that generates a complete, trusted execution graph before observing untrusted UI states to effectively mitigate prompt injection and branch steering attacks while maintaining competitive task performance.

Hanna Foerster, Tom Blanchard, Kristina Nikolic, Ilia Shumailov, Cheng Zhang, Robert Mullins, Nicolas Papernot, Florian Tramèr, Yiren Zhao · 2026-03-10 · 💻 cs

BoxMind: Closed-loop AI strategy optimization for elite boxing validated in the 2024 Olympics

This paper introduces BoxMind, a closed-loop AI system that transforms unstructured boxing footage into hierarchical tactical indicators and predictive gradients to generate expert-level strategic recommendations, which were validated during the 2024 Paris Olympics by contributing to the Chinese National Team's historic medal success.

Kaiwen Wang, Kaili Zheng, Rongrong Deng, Qingmin Fan, Milin Zhang, Zongrui Li, Xuesi Zhou, Bo Han, Liren Chen, Chenyi Guo, Ji Wu · 2026-03-10 · 💻 cs

Multifaceted Scenario-Aware Hypergraph Learning for Next POI Recommendation

This paper proposes the Multifaceted Scenario-Aware Hypergraph Learning (MSAHG) framework, which addresses the limitations of existing methods in handling mobility variations across distinct contexts by constructing scenario-specific disentangled sub-hypergraphs and employing a parameter-splitting mechanism to resolve inter-scenario conflicts, thereby significantly improving next POI recommendation performance.

Yuxi Lin, Yongkang Li, Jie Xing, Zipei Fan · 2026-03-10 · 💻 cs

DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models

DevBench is a realistic, telemetry-driven benchmark comprising 1,800 instances across six languages that evaluates LLMs on code completion tasks with a focus on ecological validity, contamination-free assessment, and detailed diagnostic insights to guide practical model selection and development.

Pareesa Ameneh Golnari, Adarsh Kumarappan, Wen Wen, Xiaoyu Liu, Gabriel Ryan, Yuting Sun, Shengyu Fu, Elsie Nallipogu · 2026-03-10 · 🤖 cs.LG

MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks

This paper introduces MAS-Orchestra, a training-time framework that optimizes multi-agent system orchestration via function-calling reinforcement learning, alongside the MASBENCH benchmark, to demonstrate that multi-agent benefits are task-dependent and to achieve significant performance gains with over 10x efficiency on complex reasoning tasks.

Zixuan Ke, Yifei Ming, Austin Xu, Ryan Chin, Xuan-Phi Nguyen, Prathyusha Jwalapuram, Jiayu Wang, Semih Yavuz, Caiming Xiong, Shafiq Joty · 2026-03-10 · 💬 cs.CL

Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents

This paper introduces the Determinism-Faithfulness Assurance Harness (DFAH), a framework and set of financial benchmarks demonstrating that decision determinism and task accuracy in LLM agents are uncorrelated, thereby necessitating independent measurement to ensure reliable regulatory audit replay in financial services.

Raffi Khatchadourian · 2026-03-10 · 💬 cs.CL

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA

This paper proposes a novel data-rate-aware continuous-flow architecture for CNN inference on FPGAs that mitigates hardware underutilization caused by data reduction in pooling and strided convolution layers by interleaving signals and sharing resources, thereby enabling the high-throughput implementation of complex models like MobileNet on a single device.

Tobias Habermann, Michael Mecik, Zhenyu Wang, César David Vera, Martin Kumm, Mario Garrido · 2026-03-10 · 🤖 cs.LG

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

MeanCache is a training-free framework that accelerates Flow Matching inference by replacing instantaneous velocity caching with an average-velocity approach using cached Jacobian-vector products and a trajectory-stability scheduling strategy, achieving significant speedups (up to 4.56X) while maintaining high generation quality across models like FLUX.1 and HunyuanVideo.

Huanlin Gao, Ping Chen, Fuyuan Shi, Ruijia Wu, Li YanTao, Qiang Hui, Yuren You, Ting Lu, Chao Tan, Shaoan Zhao, Zhaoxiang Liu, Fang Zhao, Kai Wang, Shiguo Lian · 2026-03-10 · 🤖 cs.LG

BioAgent Bench: An AI Agent Evaluation Suite for Bioinformatics

This paper introduces BioAgent Bench, a comprehensive evaluation suite and dataset for assessing AI agents in bioinformatics, which reveals that while frontier models can reliably construct multi-step pipelines, they lack robustness against perturbations and may be unsuitable for privacy-sensitive applications compared to open-weight alternatives.

Dionizije Fa, Marko Čuljak, Bruno Pandža, Mateo Čupic · 2026-03-10 · 💻 cs

RedSage: A Cybersecurity Generalist LLM

The paper introduces RedSage, an open-source, locally deployable cybersecurity LLM trained on a massive curated dataset and agentic augmentation pipeline, which achieves state-of-the-art performance on specialized cybersecurity benchmarks while also improving general reasoning capabilities.

Naufal Suryanto, Muzammal Naseer, Pengfei Li, Syed Talal Wasim, Jinhui Yi, Juergen Gall, Paolo Ceravolo, Ernesto Damiani · 2026-03-10 · 💬 cs.CL

Real-Time Aligned Reward Model beyond Semantics

This paper introduces R2M, a novel lightweight RLHF framework that mitigates reward overoptimization by leveraging real-time policy hidden states to dynamically align the reward model with the policy's evolving distribution, rather than relying solely on static semantic representations.

Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuefeng Xiao, Hongyan Xie, Li Huaqiu, Songshi Liang, Zhongxiang Dai, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang · 2026-03-10 · 💻 cs

Bitcoin Price Prediction using Machine Learning and Combinatorial Fusion Analysis

This paper proposes a Bitcoin price prediction model using Combinatorial Fusion Analysis (CFA) to integrate diverse machine learning models via rank-score characteristics and weighted combinations, achieving a superior Mean Absolute Percentage Error (MAPE) of 0.19% that outperforms individual models and existing prediction methods.

Yuanhong Wu, Wei Ye, Jingyan Xu, D. Frank Hsu · 2026-03-10 · 🤖 cs.LG
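
For readers unfamiliar with the metric quoted in the summary above, here is a minimal sketch of how Mean Absolute Percentage Error (MAPE) is conventionally computed. It uses the standard textbook definition, not code from the paper; the function name `mape` and the sample prices are hypothetical and chosen only for illustration.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned in percent.

    Standard definition: 100 * mean(|actual - predicted| / |actual|).
    Undefined when any actual value is zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical prices, not data from the paper.
actual_prices = [100.0, 200.0, 400.0]
predicted_prices = [101.0, 198.0, 400.0]
error_percent = mape(actual_prices, predicted_prices)
```

A MAPE of 0.19% would mean that, on average, predictions deviate from the true price by about 0.19% of that price.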

Impact of LLMs news Sentiment Analysis on Stock Price Movement Prediction

This paper evaluates the impact of LLM-based news sentiment analysis on stock price prediction, demonstrating that DeBERTa outperforms other models and that an ensemble approach achieves 80% accuracy, while sentiment features provide modest improvements to various time-series forecasting architectures.

Walid Siala (SnT, University of Luxembourg, Luxembourg), Ahmed Khanfir (RIADI, ENSI, University of Manouba, Tunisia, SnT, University of Luxembourg, Luxembourg), Mike Papadakis (SnT, University of Luxembourg, Luxembourg) · 2026-03-10 · 💻 cs

In-Run Data Shapley for Adam Optimizer

This paper introduces Adam-Aware In-Run Data Shapley, a novel method that overcomes the limitations of SGD-based attribution in adaptive optimizers by deriving a closed-form approximation and a Linearized Ghost Approximation to achieve near-perfect fidelity in data contribution estimation while maintaining high training efficiency.

Meng Ding, Zeqing Zhang, Di Wang, Lijie Hu · 2026-03-10 · 🤖 cs.LG

Do Schwartz Higher-Order Values Help Sentence-Level Human Value Detection? A Study of Hierarchical Gating and Calibration

This paper investigates whether Schwartz higher-order values improve sentence-level human value detection, finding that while hierarchical gating offers limited benefits, calibration techniques and hybrid ensembles significantly boost performance, suggesting the value hierarchy is more effective as an inductive bias than a rigid routing mechanism.

Víctor Yeste, Paolo Rosso · 2026-03-10 · 🤖 cs.LG

Thickening-to-Thinning: Reward Shaping via Human-Inspired Learning Dynamics for LLM Reasoning

This paper introduces T2T (Thickening-to-Thinning), a dynamic reward shaping framework inspired by human learning dynamics that enhances LLM reasoning by encouraging longer, exploratory trajectories on incorrect attempts and penalizing length upon success, thereby outperforming standard baselines on mathematical benchmarks.

Wenze Lin, Zhen Yang, Xitai Jiang, Pony Ma, Gao Huang · 2026-03-10 · 🤖 cs.LG
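
The summary above describes a reward that favors longer trajectories on incorrect attempts and shorter ones on correct attempts. The sketch below illustrates that general idea only; the linear form, the function name `shaped_reward`, and the parameters `bonus_weight`, `penalty_weight`, and `max_length` are all assumptions made for illustration and are not taken from the paper.

```python
def shaped_reward(is_correct, length, max_length,
                  bonus_weight=0.2, penalty_weight=0.2):
    """Hypothetical length-aware reward, NOT the paper's formula.

    Incorrect attempts: small bonus that grows with length ("thickening").
    Correct attempts: base reward of 1 minus a length penalty ("thinning").
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Normalize length to [0, 1] so the weights are scale-independent.
    fraction = min(max(length, 0), max_length) / max_length
    if is_correct:
        return 1.0 - penalty_weight * fraction
    return bonus_weight * fraction


# Illustrative values only.
short_correct = shaped_reward(True, 100, 1000)
long_correct = shaped_reward(True, 900, 1000)
short_wrong = shaped_reward(False, 100, 1000)
long_wrong = shaped_reward(False, 900, 1000)
```

With weights below 0.5, every correct answer still scores higher than every incorrect one in this sketch, so the length terms only reorder attempts within each group.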

Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software

This paper introduces FSTab, a framework that demonstrates how LLM-generated software exhibits predictable, recurring vulnerabilities by enabling black-box attacks based on frontend features and quantifying the consistency of these flaws across different domains and model variations.

Tomer Kordonsky, Maayan Yamin, Noam Benzimra, Amit LeVi, Avi Mendelson · 2026-03-10 · 💻 cs

Semantic Search over 9 Million Mathematical Theorems

This paper introduces a scalable semantic search system for a corpus of 9.2 million mathematical theorems, demonstrating that representing theorems with natural-language descriptions significantly improves retrieval accuracy for both specific theorems and entire papers compared to existing baselines.

Luke Alexander, Eric Leonen, Sophie Szeto, Artemii Remizov, Ignacio Tejeda, Jarod Alper, Giovanni Inchiostro, Vasily Ilin · 2026-03-10 · 🔢 math

LMMRec: LLM-driven Motivation-aware Multimodal Recommendation

This paper introduces LMMRec, a model-agnostic framework that leverages large language models and chain-of-thought prompting to extract fine-grained user and item motivations from heterogeneous text data, effectively aligning them with interaction signals to significantly improve multimodal recommendation performance.

Yicheng Di, Zhanjie Zhang, Yun Wang, Jinren Liu, Jiaqi Yan, Jiyu Wei, Xiangyu Chen, Yuan Liu · 2026-03-10 · 💻 cs
