cs.CL papers | Gist.Science

Position: Vector Prompt Interfaces Should Be Exposed to Enable Customization of Large Language Models

This position paper argues that model providers should expose vector prompt inputs as a public interface for large language models, citing evidence that they offer superior scalability and stability for inference-only customization compared to traditional text-based prompting.

Liangwei Yang, Shiyu Wang, Haolin Chen + 12 more | 2026-03-05 | ✓ Author reviewed | 💬 cs.CL

The Company You Keep: How LLMs Respond to Dark Triad Traits

This study investigates how Large Language Models respond to user prompts reflecting Dark Triad traits, revealing that while models predominantly exhibit corrective behavior, they occasionally reinforce harmful tendencies depending on the trait severity and sentiment, highlighting the need for safer conversational system designs.

Zeyi Lu, Angelica Henestrosa, Pavel Chizhov + 1 more | 2026-03-05 | 💬 cs.CL

$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

The paper introduces $V_1$, a framework that unifies generation and self-verification through efficient pairwise ranking and a tournament-based uncertainty-guided algorithm, significantly improving test-time scaling performance and efficiency on complex reasoning and code generation benchmarks compared to existing pointwise verification and standard reinforcement learning methods.

Harman Singh, Xiuyu Li, Kusha Sareen + 14 more | 2026-03-05 | 💬 cs.CL

World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

This paper demonstrates that static word embeddings derived solely from co-occurrence statistics retain significant spatial and temporal structure, indicating that linear probe recoverability in LLMs does not necessarily prove the existence of internal world models beyond what is already latent in raw text.

Elan Barenholtz | 2026-03-05 | 🤖 cs.AI

AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

The AILS-NTUA team achieved first place in SemEval-2026 Task 12 with a 0.95 accuracy score by deploying a three-stage system that integrates graph-based retrieval, reflective prompt evolution for LLM-driven abductive reasoning, and post-hoc consistency enforcement, while their cross-model analysis identified systematic failure modes in multi-label causal reasoning across 14 models.

Nikolas Karafyllis, Maria Lymperaiou, Giorgos Filandrianos + 2 more | 2026-03-05 | 💬 cs.CL

Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

Pointer-CAD is a novel LLM-based framework that unifies B-rep models and command sequences through pointer-based entity selection, effectively overcoming the limitations of traditional command-only methods by enabling complex geometric editing and significantly reducing topological errors caused by quantization.

Dacheng Qi, Chenyu Wang, Jingwei Xu + 6 more | 2026-03-05 | 💬 cs.CL

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

This paper introduces Dual-Modality Multi-Stage Adversarial Safety Training (DMAST), a three-stage framework that co-trains multimodal web agents and attackers via imitation learning, oracle-guided fine-tuning, and adversarial reinforcement learning to effectively defend against cross-modal attacks while significantly improving task completion efficiency.

Haoyu Liu, Dingcheng Li, Lukas Rutishauser + 1 more | 2026-03-05 | 🤖 cs.AI

$\tau$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

This paper introduces $\tau$-Knowledge, a new benchmark featuring the $\tau$-Banking domain to evaluate conversational agents' ability to coordinate unstructured knowledge retrieval with tool use in complex, policy-driven workflows, revealing that even frontier models struggle with low success rates and reliability in such realistic, long-horizon interactions.

Quan Shi, Alexandra Zytek, Pedram Razavi + 2 more | 2026-03-05 | 🤖 cs.AI

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

TaxonRL is a reinforcement learning framework that employs hierarchical intermediate rewards to decompose fine-grained visual reasoning into structured taxonomic steps, achieving state-of-the-art accuracy and interpretable decision-making that surpasses human performance on challenging species classification tasks.

Maximilian von Klinski, Maximilian Schall | 2026-03-05 | 💬 cs.CL

The 2020s Political Economy of Machine Translation

This paper argues that while the rapid deployment of machine translation in the 2020s will significantly lower language barriers to facilitate global communication and trade, it will simultaneously create uneven reductions in linguistic boundaries that pose new challenges for the equitable distribution of ideas, innovation, and economic growth.

Steven Weber | 2026-03-04 | 💬 cs.CL

Thought Flow Nets: From Single Predictions to Trains of Model Thought

This paper proposes "Thought Flow Nets," a self-correcting mechanism inspired by Hegelian dialectics that enables AI models to iteratively refine predictions through a sequence of thoughts, thereby improving both model accuracy and human user performance compared to single-output approaches.

Hendrik Schuff, Heike Adel, Ngoc Thang Vu | 2026-03-04 | 🤖 cs.LG

VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering

This paper introduces VQA-MHUG, a novel multimodal gaze dataset, and demonstrates for the first time that a higher correlation between neural and human attention on text is a significant predictor of Visual Question Answering model performance, highlighting the need for improved text attention mechanisms in vision-language tasks.

Ekta Sood, Fabian Kögel, Florian Strohm + 2 more | 2026-03-04 | 💬 cs.CL

Multimodal Integration of Human-Like Attention in Visual Question Answering

The paper introduces MULAN, a novel Visual Question Answering model that integrates human-like attention from both image and text modalities into neural self-attention layers, achieving state-of-the-art performance on the VQAv2 dataset with significantly fewer trainable parameters than prior methods.

Ekta Sood, Fabian Kögel, Philipp Müller + 3 more | 2026-03-04 | 💬 cs.CL

Is Attention always needed? A Case Study on Language Identification from Speech

This paper proposes a CRNN-based Language Identification model using MFCC features that achieves over 98% accuracy across thirteen Indian languages and demonstrates robust performance in noisy conditions, while questioning the necessity of attention mechanisms through comparative analysis with state-of-the-art CNN and attention-based models.

Atanu Mandal, Santanu Pal, Indranil Dutta + 2 more | 2026-03-04 | ⚡ eess

Reproduction and Replication of an Adversarial Stylometry Experiment

This paper reproduces and replicates a seminal study on adversarial stylometry, confirming the original conclusion that anonymity is difficult to maintain but revealing that the effectiveness of certain defenses may be overstated due to a lack of control groups, while also highlighting round-trip translation as a promising automatic method for reducing authorship attribution accuracy.

Haining Wang, Patrick Juola, Allen Riddell | 2026-03-04 | 💬 cs.CL

Statistical Machine Translation for Indic Languages

This paper presents the development and evaluation of Statistical Machine Translation (SMT) systems using the MOSES toolkit to translate between English and fifteen low-resource Indian languages, leveraging the Samanantar and OPUS datasets for training, Flores-200 for testing, and various preprocessing and reordering techniques to optimize translation quality as measured by BLEU, METEOR, and RIBES metrics.

Sudhansu Bala Das, Divyajoti Panda, Tapas Kumar Mishra + 1 more | 2026-03-04 | 💬 cs.CL

Predictive Authoring for Brazilian Portuguese Augmentative and Alternative Communication

This paper proposes using the BERTimbau model to predict pictograms in Brazilian Portuguese Augmentative and Alternative Communication (AAC) systems, demonstrating that representing pictograms via captions yields the highest accuracy while also exploring the potential of using images for prediction.

Jayr Pereira, Rodrigo Nogueira, Cleber Zanchettin + 1 more | 2026-03-04 | 🤖 cs.AI

StarWhisper Telescope: An AI framework for automating end-to-end astronomical observations

The StarWhisper Telescope system is an AI agent framework that automates end-to-end astronomical observations by integrating large language models with specialized workflows to autonomously plan observations, analyze data, and trigger follow-ups, thereby reducing human intervention and demonstrating scalable potential for future large-scale telescope arrays.

Cunshi Wang, Yu Zhang, Yuyang Li + 25 more | 2026-03-04 | 🔭 astro-ph

BioChemInsight: An Online Platform for Automated Extraction of Chemical Structures and Activity Data from Patents

BioChemInsight is an open-source pipeline that integrates advanced optical recognition and large language models to automatically extract chemical structures and bioactivity data from patents with over 90% accuracy, thereby significantly accelerating drug discovery by unlocking complementary chemical space not found in public databases like ChEMBL.

Zhe Wang, Fangtian Fu, Wei Zhang + 10 more | 2026-03-04 | 🧬 q-bio

FeynTune: Large Language Models for High-Energy Theory

This paper introduces FeynTune, a suite of 20 specialized Large Language Models fine-tuned on High-Energy Physics arXiv abstracts that outperform both their base model and leading commercial LLMs in theoretical physics tasks, offering valuable insights for developing domain-specific AI in the field.

Paul Richmond, Prarit Agarwal, Borun Chowdhury + 2 more | 2026-03-02 | ⚛️ hep-th
