EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation

EasyAnimate is a high-performance video generation framework that leverages diffusion transformers enhanced by Hybrid Window Attention for improved efficiency, reward backpropagation for better quality alignment, and additional optimizations like token-length training and multimodal text encoding to achieve state-of-the-art results.

Jiaqi Xu, Kunzhe Huang, Xinyi Zou + 5 more · 2026-03-06 · cs
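The summary above does not spell out how Hybrid Window Attention is built. As a rough, generic illustration of the idea behind windowed attention (restricting each token to a local neighborhood to cut the quadratic cost), here is a minimal NumPy sketch; the function names and the fixed symmetric window are illustrative assumptions, not EasyAnimate's actual mechanism.

```python
import numpy as np

def window_attention_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask letting each token attend only to tokens within
    +/- `window` positions (a generic local-window pattern, not the
    paper's exact hybrid scheme)."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; positions outside the window are
    set to -inf before the softmax so they get zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 4))
mask = window_attention_mask(8, window=2)
out = masked_attention(q, k, v, mask)
```

With `window=2`, each of the 8 tokens attends to at most 5 neighbors instead of all 8; hybrid schemes typically alternate or combine such local windows with occasional global attention.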

Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation

This study demonstrates that Retrieval-Augmented Generation (RAG) significantly enhances the accuracy and transparency of pancreatic cancer staging in large language models by enabling them to retrieve and cite relevant clinical guidelines, outperforming both non-RAG versions of the same model and models provided with guidelines but lacking retrieval capabilities.

Hisashi Johno, Yuki Johno, Akitomo Amakawa + 9 more · 2026-03-06 · cs
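The core RAG loop the summary describes (retrieve relevant guideline passages, then have the model answer while citing them) can be sketched in a few lines. Everything below is a toy assumption: the guideline snippets, the bag-of-words retriever, and the prompt format are hypothetical stand-ins for the study's actual clinical-guideline index and retrieval system.

```python
from collections import Counter
import math

# Hypothetical guideline snippets; a real system would index the
# actual staging guidelines (e.g. full AJCC/NCCN text).
GUIDELINES = {
    "T2": "Tumor >2 cm but <=4 cm in greatest dimension, limited to the pancreas.",
    "T3": "Tumor >4 cm in greatest dimension.",
    "N1": "Metastasis in 1 to 3 regional lymph nodes.",
}

def bow(text: str) -> Counter:
    """Naive bag-of-words representation."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2):
    """Rank guideline passages by cosine similarity to the case text."""
    q = bow(query)
    ranked = sorted(GUIDELINES.items(),
                    key=lambda kv: cosine(q, bow(kv[1])), reverse=True)
    return ranked[:k]

def build_prompt(case: str) -> str:
    """Prepend retrieved, citable passages so the model can quote IDs."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in retrieve(case))
    return (f"Guidelines:\n{context}\n\nCase: {case}\n"
            "Stage this tumor, citing passage IDs.")
```

The citation IDs in the prompt are what make the model's output auditable, which is the transparency gain the study attributes to RAG over merely pasting guidelines into context.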

Enhancing multimodal analogical reasoning with Logic Augmented Generation

This paper introduces a Logic Augmented Generation (LAG) framework that combines semantic knowledge graphs with prompt heuristics to enhance multimodal analogical reasoning, demonstrating superior performance and explainability in metaphor detection tasks compared to existing baselines and human benchmarks, while also highlighting current limitations in domain-specific understanding.

Anna Sofia Lippolis, Andrea Giovanni Nuzzolese, Aldo Gangemi · 2026-03-06 · cs

Computational Fact-Checking of Online Discourse: Scoring scientific accuracy in climate change related news articles

This paper presents a semi-automated workflow using LLMs and knowledge graphs to quantify the scientific accuracy of climate change news, finding that while expert-validated tools offer beneficial veracity indications, current limitations in knowledge graph completeness and processing scale hinder widespread application.

Tim Wittenborg, Constantin Sebastian Tremel, Markus Stocker + 1 more · 2026-03-06 · cs

SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

The paper introduces SealQA, a new benchmark comprising three challenging flavors (Seal-0, Seal-Hard, and LongSeal) designed to evaluate search-augmented language models on fact-seeking tasks with noisy or conflicting web results, revealing that even frontier models struggle significantly with reasoning accuracy, robustness to noise, and long-context document retrieval.

Thinh Pham, Nguyen Nguyen, Pratibha Zunjare + 3 more · 2026-03-06 · cs

EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements

This paper introduces EDINET-Bench, a challenging open-source benchmark derived from ten years of Japanese financial reports to evaluate LLMs on complex tasks like fraud detection and earnings forecasting, revealing that current models struggle significantly without specialized scaffolding and highlighting the need for more realistic evaluation frameworks.

Issa Sugiura, Takashi Ishida, Taro Makino + 4 more · 2026-03-06 · cs

Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective

This paper demonstrates that Reinforcement Fine-Tuning (RFT) outperforms Supervised Fine-Tuning (SFT) in preserving prior knowledge for multimodal large language models by leveraging training data with smaller influence magnitudes and better alignment to the base model's probability landscape, thereby mitigating catastrophic forgetting while enabling effective task adaptation.

Zhihao Zhang, Qiaole Dong, Qi Zhang + 12 more · 2026-03-06 · cs

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

MuRating is a scalable framework that transfers high-quality English data-quality signals to a unified multilingual evaluator via pairwise comparisons and translation, enabling the selection of balanced, high-quality datasets that significantly improve the performance of multilingual large language models on both English and non-English benchmarks.

Zhixun Chen, Ping Guo, Wenhan Han + 10 more · 2026-03-06 · cs
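Turning pairwise quality comparisons into per-document scores, as the MuRating summary describes, is commonly done with a Bradley-Terry-style model: fit one score per document so that the probability of one beating another is a sigmoid of their score difference. The comparison data and the plain gradient-ascent fit below are illustrative assumptions, not MuRating's actual training setup.

```python
import math

# Hypothetical pairwise judgments: (winner, loser) document IDs,
# standing in for the English-derived preference signals the
# framework transfers across languages.
comparisons = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]

def bradley_terry(pairs, iters=500, lr=0.1):
    """Fit one quality score per document so that
    P(i beats j) = sigmoid(s_i - s_j), via gradient ascent on
    the log-likelihood of the observed comparisons."""
    scores = {d: 0.0 for p in pairs for d in p}
    for _ in range(iters):
        grads = {d: 0.0 for d in scores}
        for w, l in pairs:
            p_win = 1.0 / (1.0 + math.exp(scores[l] - scores[w]))
            grads[w] += 1.0 - p_win  # push winner's score up
            grads[l] -= 1.0 - p_win  # push loser's score down
        for d in scores:
            scores[d] += lr * grads[d]
    return scores

scores = bradley_terry(comparisons)
ranked = sorted(scores, key=scores.get, reverse=True)  # best-first
```

Once every document has a scalar score, selecting a "balanced, high-quality" pretraining subset reduces to thresholding or sampling by score within each language.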