Dial: A Knowledge-Grounded Dialect-Specific NL2SQL System

This paper introduces Dial, a knowledge-grounded framework that addresses the challenges of generating executable SQL across heterogeneous database systems by employing dialect-aware logical planning, a hierarchical intent-aware knowledge base, and an execution-driven debugging loop, achieving significant improvements in translation accuracy and dialect feature coverage on the newly constructed DS-NL2SQL benchmark.
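
The execution-driven debugging loop named in the summary can be illustrated with a minimal, generic sketch (not Dial's actual implementation): generate SQL, execute it, and feed any execution error back into the next generation attempt. `mock_generate_sql` is a hypothetical stand-in for the LLM call, and the toy schema is invented for illustration.

```python
import sqlite3

def mock_generate_sql(question, schema, error=None):
    """Stand-in for an LLM call; a real system would prompt the model
    with the schema, the question, and any prior execution error."""
    if error is None:
        return "SELECT nme FROM users"   # first attempt: typo in column name
    return "SELECT name FROM users"      # repaired after seeing the error

def debug_loop(question, schema, conn, max_rounds=3):
    """Generate SQL, execute it, and feed execution errors back for repair."""
    error = None
    for _ in range(max_rounds):
        sql = mock_generate_sql(question, schema, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)             # surfaced to the next generation round
    raise RuntimeError(f"gave up after {max_rounds} rounds: {error}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada')")
print(debug_loop("list user names", "users(name)", conn))  # [('ada',)]
```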

Xiang Zhang, Hongming Xu, Le Zhou, Wei Zhou, Xuanhe Zhou, Guoliang Li, Yuyu Luo, Changdong Liu, Guorun Chen, Jiang Liao, Fan Wu · Tue, 10 Ma · cs.LG

Skip to the Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs. Autoregressive LLMs

This paper reveals that diffusion language models develop distinct, hierarchical internal representations with early-layer redundancy compared to autoregressive models, enabling a novel, training-free layer-skipping inference method that significantly reduces computational costs while maintaining high performance.
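
Inference-time layer skipping in general can be sketched with a toy residual stack (this is not the paper's model, only an illustration of the idea): calibrate once on a probe input, mark layers whose residual update barely moves the hidden state, then skip them at inference. All names and scales below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy residual stack standing in for a transformer: h <- h + tanh(h @ W_i).
# The first two layers have tiny weights, i.e. near-identity updates.
scales = (0.01, 0.01, 0.5, 0.5)
weights = [rng.normal(scale=s, size=(8, 8)) for s in scales]
layers = [lambda h, W=W: h + np.tanh(h @ W) for W in weights]

def forward(h, skip=frozenset()):
    """Run the stack, omitting any layer index in `skip`."""
    for i, layer in enumerate(layers):
        if i not in skip:
            h = layer(h)
    return h

# Training-free calibration: a layer is skippable when its residual
# update is small relative to the hidden state on a probe input.
h = rng.normal(size=8)
skippable = set()
for i, layer in enumerate(layers):
    out = layer(h)
    if np.linalg.norm(out - h) / np.linalg.norm(h) < 0.1:
        skippable.add(i)
    h = out

print(sorted(skippable))   # the near-identity early layers
```

The calibration cost is one forward pass; skipping then saves the skipped layers' compute on every subsequent input.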

Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee, Fatih Porikli · Tue, 10 Ma · cs.CL

Bolbosh: Script-Aware Flow Matching for Kashmiri Text-to-Speech

This paper introduces Bolbosh, the first open-source neural Text-to-Speech system for Kashmiri, which utilizes a script-aware, supervised cross-lingual adaptation strategy based on Optimal Transport Conditional Flow Matching and a three-stage acoustic enhancement pipeline to overcome the limitations of zero-shot multilingual baselines and achieve significantly higher speech quality and intelligibility.
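
The training target in Optimal Transport Conditional Flow Matching can be written down in a few lines (a generic textbook sketch, not Bolbosh's implementation): sample a point on the straight path between noise and data, and regress the network onto the constant displacement velocity. The mel-frame example values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_pair(x0, x1, t):
    """Linear (OT displacement) interpolation used in conditional flow
    matching: the model is regressed onto the constant velocity x1 - x0."""
    xt = (1.0 - t) * x0 + t * x1   # point on the straight noise-to-data path
    ut = x1 - x0                   # target velocity at (xt, t)
    return xt, ut

x0 = rng.normal(size=4)                 # noise sample
x1 = np.array([1.0, 2.0, 3.0, 4.0])     # data sample, e.g. a mel-frame
xt, ut = cfm_pair(x0, x1, t=0.5)
# The CFM loss for a model v is the mean squared error ||v(xt, t) - ut||^2.
```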

Tajamul Ashraf, Burhaan Rasheed Zargar, Saeed Abdul Muizz, Ifrah Mushtaq, Nazima Mehdi, Iqra Altaf Gillani, Aadil Amin Kak, Janibul Bashir · Tue, 10 Ma · cs.CL

TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning

TableMind++ enhances the existing TableMind framework for tool-augmented table reasoning by introducing an uncertainty-aware inference framework that mitigates hallucinations through memory-guided plan pruning, confidence-based action refinement, and dual-weighted trajectory aggregation, thereby achieving superior performance on diverse benchmarks.
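
Dual-weighted trajectory aggregation can be sketched as a weighted vote over candidate answers (a minimal illustration; the two weights here are hypothetical stand-ins for TableMind++'s actual scores): each trajectory's answer accumulates the product of a confidence weight and a consistency weight, and the highest-scoring answer wins.

```python
from collections import defaultdict

def aggregate(trajectories):
    """Weighted vote over reasoning trajectories: each candidate answer
    accumulates confidence * consistency, and the top answer is returned."""
    scores = defaultdict(float)
    for answer, confidence, consistency in trajectories:
        scores[answer] += confidence * consistency
    return max(scores, key=scores.get)

trajectories = [
    ("42", 0.9, 0.8),   # (answer, confidence, consistency)
    ("41", 0.6, 0.9),
    ("42", 0.7, 0.5),
]
print(aggregate(trajectories))  # 42: 0.72 + 0.35 = 1.07 beats 41: 0.54
```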

Mingyue Cheng, Shuo Yu, Chuang Jiang, Xiaoyu Tao, Qingyang Mao, Jie Ouyang, Qi Liu, Enhong Chen · Tue, 10 Ma · cs.CL

MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

The paper introduces MAWARITH, a large-scale Arabic dataset and the MIR-E evaluation metric designed to benchmark and improve large language models' ability to perform complex, multi-step reasoning for Islamic inheritance law, revealing that while advanced models like Gemini-2.5-flash achieve high performance, many others struggle with critical legal rules and error propagation.

Abdessalam Bouchekif, Shahd Gaben, Samer Rashwani, Somaya Eltanbouly, Mutaz Al-Khatib, Heba Sbahi, Mohammed Ghaly, Emad Mohamed · Tue, 10 Ma · cs.CL

Nwāchā Munā: A Devanagari Speech Corpus and Proximal Transfer Benchmark for Nepal Bhasha ASR

This paper introduces Nwāchā Munā, the first manually transcribed Devanagari speech corpus for the endangered Nepal Bhasha, and demonstrates that proximal cross-lingual transfer from Nepali achieves competitive automatic speech recognition performance comparable to large multilingual models while being significantly more computationally efficient.

Rishikesh Kumar Sharma, Safal Narshing Shrestha, Jenny Poudel, Rupak Tiwari, Arju Shrestha, Rupak Raj Ghimire, Bal Krishna Bal · Tue, 10 Ma · cs.CL

KCoEvo: A Knowledge Graph Augmented Framework for Evolutionary Code Generation

KCoEvo is a knowledge graph-augmented framework that addresses the challenges of API-driven code evolution by decomposing migration into path retrieval and informed generation stages, significantly improving accuracy and execution success over standard LLM baselines through structured reasoning and synthetic supervision.
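
The path-retrieval stage can be illustrated with a generic breadth-first search over a migration knowledge graph (a minimal sketch; the graph, node names, and edge semantics below are invented, not KCoEvo's schema): the retrieved path is then handed to the generator as structured context.

```python
from collections import deque

# Hypothetical API-migration graph: an edge means "documented replacement".
graph = {
    "lib.old_call": ["lib.mid_call"],
    "lib.mid_call": ["lib.new_call"],
    "lib.other": ["lib.new_call"],
}

def retrieve_path(src, dst):
    """BFS over the knowledge graph: returns the shortest chain of
    documented replacements from the deprecated API to the target."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None   # no known migration path

print(retrieve_path("lib.old_call", "lib.new_call"))
# ['lib.old_call', 'lib.mid_call', 'lib.new_call']
```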

Jiazhen Kang, Yuchen Lu, Chen Jiang, Jinrui Liu, Tianhao Zhang, Bo Jiang, Ningyuan Sun, Tongtong Wu, Guilin Qi · Tue, 10 Ma · cs.CL

StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control

This paper introduces StyleBench, a multi-turn dialogue benchmark designed to systematically evaluate and quantify the ability of speech language models to control conversational speaking styles across emotion, speed, volume, and pitch dimensions, revealing performance gaps between current models and highlighting directions for future improvement.

Haishu Zhao, Aokai Hao, Yuan Ge, Zhenqiang Hong, Tong Xiao, Jingbo Zhu · Tue, 10 Ma · cs.CL

KohakuRAG: A simple RAG framework with hierarchical document indexing

KohakuRAG is an open-source, hierarchical RAG framework that achieves state-of-the-art performance on the WattBot 2025 Challenge by preserving document structure through a four-level tree representation, enhancing retrieval via LLM-powered query planning, and stabilizing outputs with ensemble voting, thereby outperforming existing methods in precision and citation accuracy.
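
A hierarchical tree index of this kind can be sketched in a few lines (a generic illustration, not KohakuRAG's code; the four levels and example text are invented): retrieval returns each matching leaf together with its structural path, which is what makes precise citation possible.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node in a document tree; levels here run
    doc > section > paragraph > sentence, a four-level hierarchy."""
    level: str
    text: str
    children: list = field(default_factory=list)

def retrieve(node, query, path=()):
    """Return matching sentences with their full ancestry, so an answer
    can cite exactly which section and paragraph it came from."""
    hits = []
    here = path + (node.text,)
    if node.level == "sentence" and query.lower() in node.text.lower():
        hits.append(here)
    for child in node.children:
        hits.extend(retrieve(child, query, here))
    return hits

doc = Node("doc", "Manual", [
    Node("section", "Safety", [
        Node("paragraph", "P1", [
            Node("sentence", "Unplug the unit before servicing."),
        ]),
    ]),
])
print(retrieve(doc, "unplug"))
# [('Manual', 'Safety', 'P1', 'Unplug the unit before servicing.')]
```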

Shih-Ying Yeh, Yueh-Feng Ku, Ko-Wei Huang, Buu-Khang Tu · Tue, 10 Ma · cs.CL

Scalable Training of Mixture-of-Experts Models with Megatron Core

This paper presents Megatron Core, a scalable and production-ready open-source framework that addresses the coupled memory, communication, and computation challenges of Mixture-of-Experts (MoE) training through integrated system-level optimizations, enabling high-performance training of models ranging from billions to trillions of parameters on large-scale GPU clusters.
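
The routing step at the heart of any MoE layer can be shown with standard top-k gating (a generic NumPy sketch of the common technique, not Megatron Core's implementation): each token keeps its k highest-scoring experts and renormalizes their gate weights.

```python
import numpy as np

def top_k_route(logits, k=2):
    """Standard top-k MoE routing: pick the k highest-scoring experts
    per token and softmax-normalize their gate weights."""
    idx = np.argsort(logits, axis=-1)[:, -k:]           # chosen expert ids
    gates = np.take_along_axis(logits, idx, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)               # softmax over top-k
    return idx, gates

tokens = np.array([[0.1, 2.0, 0.3, 1.5],
                   [1.2, 0.0, 0.9, 0.1]])   # router logits, 4 experts
idx, gates = top_k_route(tokens)
print(idx)    # per row: indices of the two selected experts
```

At scale, the systems challenge is that the tokens routed to each expert must be exchanged across GPUs (expert parallelism), which is where the communication optimizations the paper describes come in.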

Zijie Yan, Hongxiao Bai, Xin Yao, Dennis Liu, Tong Liu, Hongbin Liu, Pingtian Li, Evan Wu, Shiqing Fan, Li Tao, Robin Zhang, Yuzhong Wang, Shifang Xu, Jack Chang, Xuwen Chen, Kunlun Li, Yan Bai, Gao Deng, Nan Zheng, Vijay Anand Korthikanti, Abhinav Khattar, Ethan He, Soham Govande, Sangkug Lym, Zhongbo Zhu, Qi Zhang, Haochen Yuan, Xiaowei Ren, Deyu Fu, Tailai Ma, Shunkang Zhang, Jiang Shao, Ray Wang, Santosh Bhavani, Xipeng Li, Chandler Zhou, David Wu, Yingcan Wei, Ashwath Aithal, Michael Andersch, Mohammad Shoeybi, Jiajie Yao, June Yang (all NVIDIA) · Tue, 10 Ma · cs.LG

Large Language Model for Discrete Optimization Problems: Evaluation and Step-by-step Reasoning

This paper evaluates the capabilities of various large language models, including Llama-3 and ChatGPT, in solving diverse discrete optimization problems using natural language datasets, revealing that while stronger models generally perform better, Chain-of-Thought reasoning is not universally effective and data augmentation can improve performance on simpler tasks despite introducing instability.

Tianhao Qian, Guilin Qi, Z. Y. Wu, Ran Gu, Xuanyi Liu, Canchen Lyu · Tue, 10 Ma · cs.CL

3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models

To address the "spatial intelligence gap" where Vision-Language Models struggle with elementary 3D tasks despite strong logical reasoning, the paper introduces 3ViewSense, a framework that leverages an engineering-inspired "Simulate-and-Reason" mechanism to ground spatial understanding in orthographic views, significantly improving performance on occlusion-heavy counting and view-consistent reasoning benchmarks.

Shaoxiong Zhan, Yanlin Lai, Zheng Liu, Hai Lin, Shen Li, Xiaodong Cai, Zijian Lin, Wen Huang, Hai-Tao Zheng · Tue, 10 Ma · cs.CL

Whitening Reveals Cluster Commitment as the Geometric Separator of Hallucination Types

This paper demonstrates that applying PCA-whitening to GPT-2-small embeddings reveals cluster commitment as the geometric separator distinguishing hallucination types, specifically resolving the previously indistinguishable "wrong-well convergence" and "coverage gap" failures while identifying the inability to separate "center-drift" from "wrong-well convergence" as a model capacity limitation rather than a measurement artifact.
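
PCA-whitening itself is a standard transform and can be sketched directly (a generic implementation, not the paper's code; the toy anisotropic data below is invented): center the embeddings, rotate onto the principal axes, and rescale each axis to unit variance so that distances are not dominated by a few high-variance directions.

```python
import numpy as np

def pca_whiten(X, eps=1e-8):
    """PCA-whitening: center, rotate onto principal axes via the
    eigendecomposition of the covariance, and rescale to unit variance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)
    return Xc @ vecs / np.sqrt(vals + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 0.1])  # anisotropic
W = pca_whiten(X)
print(np.round(np.cov(W.T), 2))   # ≈ identity: all directions comparable
```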

Matic Korun · Tue, 10 Ma · cs.CL

QuadAI at SemEval-2026 Task 3: Ensemble Learning of Hybrid RoBERTa and LLMs for Dimensional Aspect-Based Sentiment Analysis

The QuadAI system for SemEval-2026 Task 3 achieves superior performance in dimensional aspect-based sentiment regression by employing an ensemble learning framework that combines a hybrid RoBERTa encoder with large language models, leveraging the complementary strengths of both architectures to significantly reduce RMSE and improve correlation scores.
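
Why ensembling two regressors of different architecture reduces RMSE can be shown with a simulation (an illustration of the general effect, not QuadAI's pipeline; the two "models" below are simulated with independent noise): averaging predictions with independent errors shrinks the error by roughly a factor of sqrt(2).

```python
import numpy as np

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

rng = np.random.default_rng(0)
target = rng.uniform(1, 9, size=500)    # gold sentiment-dimension scores

# Two imperfect regressors with independent error, standing in for an
# encoder-based model and an LLM scorer (both simulated here).
model_a = target + rng.normal(scale=1.0, size=500)
model_b = target + rng.normal(scale=1.0, size=500)
ensemble = (model_a + model_b) / 2      # simple averaging ensemble

print(rmse(model_a, target), rmse(ensemble, target))
# Averaging independent errors cuts RMSE by roughly sqrt(2).
```

The gain shrinks as the two models' errors become correlated, which is why combining architecturally different models (as QuadAI does) helps more than ensembling near-identical ones.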

A. J. W. de Vink, Filippos Karolos Ventirozos, Natalia Amat-Lefort, Lifeng Han · Tue, 10 Ma · cs.CL