$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

The paper introduces $V_1$, a framework that unifies generation and self-verification through efficient pairwise ranking and a tournament-based uncertainty-guided algorithm, significantly improving test-time scaling performance and efficiency on complex reasoning and code generation benchmarks compared to existing pointwise verification and standard reinforcement learning methods.

Harman Singh, Xiuyu Li, Kusha Sareen + 14 more · 2026-03-05 · 💬 cs.CL

AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

The AILS-NTUA team achieved first place in SemEval-2026 Task 12 with a 0.95 accuracy score by deploying a three-stage system that integrates graph-based retrieval, reflective prompt evolution for LLM-driven abductive reasoning, and post-hoc consistency enforcement, while their cross-model analysis identified systematic failure modes in multi-label causal reasoning across 14 models.

Nikolas Karafyllis, Maria Lymperaiou, Giorgos Filandrianos + 2 more · 2026-03-05 · 💬 cs.CL

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

This paper introduces Dual-Modality Multi-Stage Adversarial Safety Training (DMAST), a three-stage framework that co-trains multimodal web agents and attackers via imitation learning, oracle-guided fine-tuning, and adversarial reinforcement learning to effectively defend against cross-modal attacks while significantly improving task completion efficiency.

Haoyu Liu, Dingcheng Li, Lukas Rutishauser + 1 more · 2026-03-05 · 🤖 cs.AI

τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

This paper introduces τ-Knowledge, a new benchmark featuring the τ-Banking domain to evaluate conversational agents' ability to coordinate unstructured knowledge retrieval with tool use in complex, policy-driven workflows, revealing that even frontier models struggle with low success rates and reliability in such realistic, long-horizon interactions.

Quan Shi, Alexandra Zytek, Pedram Razavi + 2 more · 2026-03-05 · 🤖 cs.AI

Reproduction and Replication of an Adversarial Stylometry Experiment

This paper reproduces and replicates a seminal study on adversarial stylometry, confirming the original conclusion that anonymity is difficult to maintain but revealing that the effectiveness of certain defenses may be overstated due to a lack of control groups, while also highlighting round-trip translation as a promising automatic method for reducing authorship attribution accuracy.

Haining Wang, Patrick Juola, Allen Riddell · 2026-03-04 · 💬 cs.CL

Statistical Machine Translation for Indic Languages

This paper presents the development and evaluation of Statistical Machine Translation (SMT) systems using the MOSES toolkit to translate between English and fifteen low-resource Indian languages, leveraging the Samanantar and OPUS datasets for training, Flores-200 for testing, and various preprocessing and reordering techniques to optimize translation quality as measured by BLEU, METEOR, and RIBES metrics.

Sudhansu Bala Das, Divyajoti Panda, Tapas Kumar Mishra + 1 more · 2026-03-04 · 💬 cs.CL

StarWhisper Telescope: An AI framework for automating end-to-end astronomical observations

The StarWhisper Telescope system is an AI agent framework that automates end-to-end astronomical observations by integrating large language models with specialized workflows to autonomously plan observations, analyze data, and trigger follow-ups, thereby reducing human intervention and demonstrating scalable potential for future large-scale telescope arrays.

Cunshi Wang, Yu Zhang, Yuyang Li + 25 more · 2026-03-04 · 🔭 astro-ph

BioChemInsight: An Online Platform for Automated Extraction of Chemical Structures and Activity Data from Patents

BioChemInsight is an open-source pipeline that integrates advanced optical recognition and large language models to automatically extract chemical structures and bioactivity data from patents with over 90% accuracy, thereby significantly accelerating drug discovery by unlocking complementary chemical space not found in public databases like ChEMBL.

Zhe Wang, Fangtian Fu, Wei Zhang + 10 more · 2026-03-04 · 🧬 q-bio