EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation

EasyAnimate is a high-performance video generation framework that leverages diffusion transformers enhanced by Hybrid Window Attention for improved efficiency, reward backpropagation for better quality alignment, and additional optimizations like token-length training and multimodal text encoding to achieve state-of-the-art results.

Jiaqi Xu, Kunzhe Huang, Xinyi Zou + 5 more · 2026-03-06 · cs
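The summary above does not spell out how Hybrid Window Attention is built. As a rough, generic illustration of the idea behind windowed attention (restricting each token to a local neighborhood to cut the quadratic cost), here is a minimal NumPy sketch; the function names and the fixed symmetric window are illustrative assumptions, not EasyAnimate's actual mechanism.

```python
import numpy as np

def window_attention_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask letting each token attend only to tokens within
    +/- `window` positions (a generic local-window pattern, not the
    paper's exact hybrid scheme)."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; positions outside the window are
    set to -inf before the softmax so they get zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 4))
mask = window_attention_mask(8, window=2)
out = masked_attention(q, k, v, mask)
```

With `window=2`, each of the 8 tokens attends to at most 5 neighbors instead of all 8; hybrid schemes typically alternate or combine such local windows with occasional global attention.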

Enhancing Pancreatic Cancer Staging with Large Language Models: The Role of Retrieval-Augmented Generation

This study demonstrates that Retrieval-Augmented Generation (RAG) significantly enhances the accuracy and transparency of pancreatic cancer staging in large language models by enabling them to retrieve and cite relevant clinical guidelines, outperforming both non-RAG versions of the same model and models provided with guidelines but lacking retrieval capabilities.

Hisashi Johno, Yuki Johno, Akitomo Amakawa + 9 more · 2026-03-06 · cs
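The core RAG loop the summary describes (retrieve relevant guideline passages, then have the model answer while citing them) can be sketched in a few lines. Everything below is a toy assumption: the guideline snippets, the bag-of-words retriever, and the prompt format are hypothetical stand-ins for the study's actual clinical-guideline index and retrieval system.

```python
from collections import Counter
import math

# Hypothetical guideline snippets; a real system would index the
# actual staging guidelines (e.g. full AJCC/NCCN text).
GUIDELINES = {
    "T2": "Tumor >2 cm but <=4 cm in greatest dimension, limited to the pancreas.",
    "T3": "Tumor >4 cm in greatest dimension.",
    "N1": "Metastasis in 1 to 3 regional lymph nodes.",
}

def bow(text: str) -> Counter:
    """Naive bag-of-words representation."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2):
    """Rank guideline passages by cosine similarity to the case text."""
    q = bow(query)
    ranked = sorted(GUIDELINES.items(),
                    key=lambda kv: cosine(q, bow(kv[1])), reverse=True)
    return ranked[:k]

def build_prompt(case: str) -> str:
    """Prepend retrieved, citable passages so the model can quote IDs."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in retrieve(case))
    return (f"Guidelines:\n{context}\n\nCase: {case}\n"
            "Stage this tumor, citing passage IDs.")
```

The citation IDs in the prompt are what make the model's output auditable, which is the transparency gain the study attributes to RAG over merely pasting guidelines into context.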

Enhancing multimodal analogical reasoning with Logic Augmented Generation

This paper introduces a Logic Augmented Generation (LAG) framework that combines semantic knowledge graphs with prompt heuristics to enhance multimodal analogical reasoning, demonstrating superior performance and explainability in metaphor detection tasks compared to existing baselines and human benchmarks, while also highlighting current limitations in domain-specific understanding.

Anna Sofia Lippolis, Andrea Giovanni Nuzzolese, Aldo Gangemi · 2026-03-06 · cs

Computational Fact-Checking of Online Discourse: Scoring scientific accuracy in climate change related news articles

This paper presents a semi-automated workflow using LLMs and knowledge graphs to quantify the scientific accuracy of climate change news, finding that while expert-validated tools offer beneficial veracity indications, current limitations in knowledge graph completeness and processing scale hinder widespread application.

Tim Wittenborg, Constantin Sebastian Tremel, Markus Stocker + 1 more · 2026-03-06 · cs

SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models

The paper introduces SealQA, a new benchmark comprising three challenging flavors (Seal-0, Seal-Hard, and LongSeal) designed to evaluate search-augmented language models on fact-seeking tasks with noisy or conflicting web results, revealing that even frontier models struggle significantly with reasoning accuracy, robustness to noise, and long-context document retrieval.

Thinh Pham, Nguyen Nguyen, Pratibha Zunjare + 3 more · 2026-03-06 · cs

EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements

This paper introduces EDINET-Bench, a challenging open-source benchmark derived from ten years of Japanese financial reports to evaluate LLMs on complex tasks like fraud detection and earnings forecasting, revealing that current models struggle significantly without specialized scaffolding and highlighting the need for more realistic evaluation frameworks.

Issa Sugiura, Takashi Ishida, Taro Makino + 4 more · 2026-03-06 · cs

Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective

This paper demonstrates that Reinforcement Fine-Tuning (RFT) outperforms Supervised Fine-Tuning (SFT) in preserving prior knowledge for multimodal large language models by leveraging training data with smaller influence magnitudes and better alignment to the base model's probability landscape, thereby mitigating catastrophic forgetting while enabling effective task adaptation.

Zhihao Zhang, Qiaole Dong, Qi Zhang + 12 more · 2026-03-06 · cs

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining

MuRating is a scalable framework that transfers high-quality English data-quality signals to a unified multilingual evaluator via pairwise comparisons and translation, enabling the selection of balanced, high-quality datasets that significantly improve the performance of multilingual large language models on both English and non-English benchmarks.

Zhixun Chen, Ping Guo, Wenhan Han + 10 more · 2026-03-06 · cs
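Turning pairwise quality comparisons into per-document scores, as the MuRating summary describes, is commonly done with a Bradley-Terry-style model: fit one score per document so that the probability of one beating another is a sigmoid of their score difference. The comparison data and the plain gradient-ascent fit below are illustrative assumptions, not MuRating's actual training setup.

```python
import math

# Hypothetical pairwise judgments: (winner, loser) document IDs,
# standing in for the English-derived preference signals the
# framework transfers across languages.
comparisons = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]

def bradley_terry(pairs, iters=500, lr=0.1):
    """Fit one quality score per document so that
    P(i beats j) = sigmoid(s_i - s_j), via gradient ascent on
    the log-likelihood of the observed comparisons."""
    scores = {d: 0.0 for p in pairs for d in p}
    for _ in range(iters):
        grads = {d: 0.0 for d in scores}
        for w, l in pairs:
            p_win = 1.0 / (1.0 + math.exp(scores[l] - scores[w]))
            grads[w] += 1.0 - p_win  # push winner's score up
            grads[l] -= 1.0 - p_win  # push loser's score down
        for d in scores:
            scores[d] += lr * grads[d]
    return scores

scores = bradley_terry(comparisons)
ranked = sorted(scores, key=scores.get, reverse=True)  # best-first
```

Once every document has a scalar score, selecting a "balanced, high-quality" pretraining subset reduces to thresholding or sampling by score within each language.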