Adaptive Capacity Allocation for Vision Language Action Fine-tuning
This paper introduces LoRA-SP, a rank-adaptive fine-tuning method for Vision-Language-Action (VLA) models. Instead of the fixed rank used by standard LoRA, LoRA-SP dynamically allocates parameter capacity across layers via a lightweight router and energy-based rank selection, achieving superior multi-task generalization and efficiency on real robots.
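The energy-based selection idea can be illustrated with a minimal sketch, which is an assumption about the general technique rather than the paper's actual implementation: given a layer's weight update, pick the smallest rank whose leading singular values capture a target fraction of the total spectral energy. The function name `energy_based_rank` and the 0.9 threshold are hypothetical.

```python
import numpy as np

def energy_based_rank(delta_w, energy_frac=0.9):
    """Smallest rank whose singular values capture `energy_frac`
    of the total squared singular-value energy (illustrative heuristic)."""
    s = np.linalg.svd(delta_w, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index where cumulative energy reaches the threshold, 1-indexed.
    return int(np.searchsorted(energy, energy_frac) + 1)

# A synthetic update with two dominant directions and a weak tail:
delta = np.diag([10.0, 10.0] + [0.1] * 62)
print(energy_based_rank(delta, 0.9))  # -> 2
```

Under such a scheme, layers whose updates concentrate energy in few directions receive a small rank, freeing capacity for layers with flatter spectra.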