cs papers | Gist.Science

Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning

This paper introduces VASR, a multimodal reasoning framework for Context-Aware Visual Speech Recognition (CAVSR) that leverages an Audio-Visual Chain-of-Thought (AV-CoT) to explicitly ground acoustic signals with rich visual context like scenes and on-screen text, thereby overcoming single-modality dominance and achieving state-of-the-art performance.

Wenjie Tian, Mingchen Shao, Bingshen Mu, Xuelong Geng, Chengyou Wang, Yujie Liao, Zhixian Zhao, Ziyu Zhang, Jingbin Hu, Mengqi Wei, Lei Xie | 2026-03-10 | 💻 cs

Kinematics-Aware Latent World Models for Data-Efficient Autonomous Driving

This paper proposes a kinematics-aware latent world model that integrates vehicle kinematic information and geometry-aware supervision into the Recurrent State-Space Model (RSSM) to enhance spatial representation and long-horizon imagination fidelity, thereby achieving more data-efficient and stable autonomous driving policy learning compared to existing baselines.

Jiazhuo Li, Linjiang Cao, Qi Liu, Xi Xiong | 2026-03-10 | 💻 cs

Towards Network-Aware Operation of Integrated Energy Systems: A Comprehensive Review

This paper provides a comprehensive review of network-aware modeling, optimization, and control methods for Integrated Energy Systems, highlighting the critical role of network constraints in addressing operational challenges and outlining future research directions to achieve scalable, efficient, and low-carbon energy operations.

Alessandra Parisio | 2026-03-10 | 💻 cs

How to Steal Reasoning Without Reasoning Traces

This paper introduces "trace inversion" models that can reconstruct detailed reasoning traces from only a target model's inputs, answers, and summaries, demonstrating that hiding reasoning chains fails to prevent the theft of reasoning capabilities and enabling significant performance gains for student models fine-tuned on these synthetic traces.

Tingwei Zhang, John X. Morris, Vitaly Shmatikov | 2026-03-10 | 💻 cs

Sketch-Oriented Databases

This paper introduces sketch-oriented databases, a categorical framework that unifies various graph-based paradigms and features through finite-limit sketches, while proposing localizers for lazy path inference and stuttering sketches to enable modular composition and scalable model growth.

Dominique Duval, Rachid Echahed | 2026-03-10 | 💻 cs

AutoDataset: A Lightweight System for Continuous Dataset Discovery and Search

AutoDataset is a lightweight, automated system that continuously monitors arXiv to detect, extract, and index newly released datasets from research papers, enabling real-time discovery and significantly improving search efficiency by up to 80%.

Junzhe Yang, Xinghao Chen, Yunuo Liu, Zhijing Sun, Wenjin Guo, Xiaoyu Shen | 2026-03-10 | 💻 cs

VisualDeltas: Learning Preferences from Visual Quality Perturbations

VisualDeltas is a lightweight, label-free preference-learning framework that leverages systematic visual quality perturbations to generate informative supervision signals, thereby improving multimodal model performance and generalization without relying on human annotations.

Hailiang Huang, Yihao Liu, Shengyue Guan, Haoze Li, Sujian Li | 2026-03-10 | 💻 cs

Worst-Case to Average-Case Reductions for SIS over integers

This paper establishes a worst-case to average-case reduction for a non-modular Short Integer Solution (SIS) problem over the integers, showing that an efficient solver for random instances of this problem yields a polynomial-time approximation of the Shortest Independent Vectors Problem (SIVP) within a factor of $\widetilde{O}(n^{3/2})$.

Konstantinos A. Draziotis, Myrto Eleftheria Gkogkou | 2026-03-10 | 💻 cs

From Passive Consumption to Active Interaction: Exploring Interactive LLM Scaffolding to Support Learning Engagement

This paper presents a small-scale laboratory study demonstrating that embedding lightweight interactive components into Large Language Model-generated scaffolding can shift learners from passive consumption to active engagement, thereby improving perceived attention and short-term learning outcomes.

Zixin Chen, Haotian Li, Zhe Liu, Huamin Qu, Xing Xie | 2026-03-10 | 💻 cs

LLM-FK: Multi-Agent LLM Reasoning for Foreign Key Detection in Large-Scale Complex Databases

LLM-FK is a novel multi-agent framework that overcomes the limitations of conventional heuristic and naive LLM methods in detecting foreign keys within large-scale complex databases by coordinating specialized agents to prune the search space, enhance reasoning with domain knowledge, and ensure global schema consistency, thereby achieving superior accuracy and scalability.

Zijian Tang, Ying Zhang, Sibo Cai, Ruoxuan Wang | 2026-03-10 | 💻 cs

Complexity Lower Bounds of Small Matrix Multiplication over Finite Fields via Backtracking and Substitution

This paper presents a novel automated method combining the substitution technique with systematic backtracking and dynamic programming to prove that the bilinear complexity of $3 \times 3$ matrix multiplication over $\mathbb{F}_2$ is at least 20, thereby improving the previous lower bound of 19.

Chengu Wang | 2026-03-10 | 💻 cs

Do Deployment Constraints Make LLMs Hallucinate Citations? An Empirical Study across Four Models and Five Prompting Regimes

This empirical study demonstrates that deployment-motivated prompting constraints significantly exacerbate citation hallucinations across four large language models, with no model achieving a citation existence rate above 47.5% and a substantial portion of unverifiable outputs being fabricated, thereby underscoring the critical need for post-hoc verification in academic and software engineering contexts.

Chen Zhao, Yuan Tang, Yitian Qian | 2026-03-10 | 💻 cs

Virtual Try-On for Cultural Clothing: A Benchmarking Study

This paper introduces BD-VITON, a new benchmark dataset of culturally diverse Bangladeshi garments that pose complex draping and layering challenges, and benchmarks state-of-the-art virtual try-on models on it, reporting significant improvements over zero-shot inference.

Muhammad Tausif Ul Islam, Shahir Awlad, Sameen Yeaser Adib, Md. Atiqur Rahman, Sabbir Ahmed, Md. Hasanul Kabir | 2026-03-10 | 💻 cs

TopRank-Based Delivery Rate Optimization for Coded Caching under Non-Uniform Demands

This paper proposes a TopRank-based coded caching strategy that optimizes delivery rates under non-uniform, unknown file demands by ranking files based on request count differences rather than estimating exact popularities, thereby achieving superior performance and sublinear regret in scenarios with limited users, small cache capacities, or noisy observation data.

Mohammadsaber Bahadori, Seyed Pooya Shariatpanahi, Behnam Bahrak | 2026-03-10 | 💻 cs

MAviS: A Multimodal Conversational Assistant For Avian Species

This paper introduces MAviS, a domain-adaptive multimodal conversational assistant for avian species that leverages the newly created MAviS-Dataset and is evaluated on the MAviS-Bench to achieve state-of-the-art performance in fine-grained bird species understanding and multimodal question answering.

Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jinxing Zhou, Fahad Shahbaz Khan, Rao Anwer, Salman Khan, Hisham Cholakkal | 2026-03-10 | 💻 cs

A Cortically Inspired Architecture for Modular Perceptual AI

This paper proposes a modular, cortically inspired architecture for perceptual AI that leverages neuroscientific principles like predictive processing and specialized modules to overcome the interpretability and generalization limitations of current monolithic models, thereby enabling more transparent and human-aligned reasoning.

Prerna Luthra | 2026-03-10 | 💻 cs

Training for Trustworthy Saliency Maps: Adversarial Training Meets Feature-Map Smoothing

This paper proposes a training-centered approach that combines adversarial training with a lightweight feature-map smoothing block to generate saliency maps that are simultaneously sparse, input-stable, and output-stable, thereby enhancing their perceived trustworthiness and sufficiency.

Dipkamal Bhusal, Md Tanvirul Alam, Nidhi Rastogi | 2026-03-10 | 💻 cs

Tursio for Credit Unions: Powering Structured Data Search with Automated Context Graph

This paper introduces Tursio, a secure, on-premises platform that empowers credit unions to query complex structured databases using natural language by leveraging Large Language Models to automatically generate semantic knowledge graphs and compliant query plans.

Shivani Tripathi, Ravi Shetye, Shi Qiao, Alekh Jindal | 2026-03-10 | 💻 cs

Seeing the Reasoning: How LLM Rationales Influence User Trust and Decision-Making in Factual Verification Tasks

This study reveals that in factual verification tasks, users' trust and decision-making are primarily driven by the correctness and certainty framing of LLM rationales rather than their presentation format, highlighting the dual potential of well-designed rationales to either support decision-making or miscalibrate trust.

Xin Sun, Shu Wei, Jos A Bosch, Isao Echizen, Saku Sugawara, Abdallah El Ali | 2026-03-10 | 💻 cs

Soft Rigid Hybrid Gripper with Inflatable Silicone Pockets for Tunable Frictional Grasping

This paper presents a soft-rigid hybrid gripper that uses inflatable silicone pockets to actively modulate surface friction via internal air pressure, enabling secure grasping of diverse objects, from heavy and slippery to fragile items, without relying on excessive normal force.

Hoang Hiep Ly, Cong-Nhat Nguyen, Doan-Quang Tran, Quoc-Khanh Dang, Ngoc Duy Tran, Thi Thoa Mac, Anh Nguyen, Xuan-Thuan Nguyen, Tung D. Ta | 2026-03-10 | 💻 cs
