cs.CL papers | Gist.Science

FINEST: Improving LLM Responses to Sensitive Topics Through Fine-Grained Evaluation

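This paper introduces FINEST, an evaluation framework that breaks an LLM's response to a sensitive topic down into three fine-grained categories (content, logic, and appropriateness). It proposes a method that uses score-based and rationale-based feedback to improve safety and helpfulness at the same time, and validates its effectiveness.

Juhyun Oh, Nayeon Lee, Chani Jung + 5 more · 2026-03-05 · cs.CL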

BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

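This paper examines whether reinforcement learning with verifiable rewards can teach compact language models to reason about physics. It finds that even rigorous physics-based rewards can induce mere memorization of answer patterns, and that without structured reasoning scaffolding the models do not develop robust scientific reasoning.

Tarjei Paule Hage, Markus J. Buehler · 2026-03-05 · cond-mat.mtrl-sci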
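For readers unfamiliar with the term: a verifiable reward is computed by a program that checks the model's final answer against a known solution, rather than by a learned reward model. The sketch below is a hypothetical binary reward for a numeric beam-mechanics answer. The `Answer:` output convention, the function name `verifiable_reward`, and the 1% relative tolerance are assumptions for illustration, not details taken from BeamPERL.

```python
import math
import re

# Hypothetical output convention: the model states its result as "Answer: <number>".
ANSWER_PATTERN = re.compile(r"Answer:\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def verifiable_reward(response: str, reference: float, rel_tol: float = 0.01) -> float:
    """Return 1.0 if the last stated numeric answer matches the reference within rel_tol, else 0.0.

    Only the final answer is checked, so this reward cannot distinguish a correct
    derivation from a memorized answer pattern that happens to give the right number.
    """
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return 0.0  # no parsable answer: no reward
    predicted = float(matches[-1])  # use the last stated answer
    return 1.0 if math.isclose(predicted, reference, rel_tol=rel_tol) else 0.0
```

For example, for a simply supported beam with a 10 kN point load at midspan, each support reaction is 5 kN, so `verifiable_reward("R_A = P/2. Answer: 5.0", 5.0)` returns 1.0 while a response ending in `Answer: 4.0` returns 0.0. Because only the final number is scored, a model can earn full reward without producing a sound derivation, which is consistent with the memorization risk the summary describes.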

VietNormalizer: An Open-Source, Dependency-Free Python Library for Vietnamese Text Normalization in TTS and NLP Applications

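This paper presents VietNormalizer, an open-source, dependency-free, rule-based library for TTS and NLP that converts many kinds of non-standard text, including numbers, dates, currencies, abbreviations, and foreign words, into their spoken Vietnamese forms. It describes the library's design, compares it with existing approaches, and discusses how the approach generalizes to other low-resource languages.

Hung Vu Nguyen, Loan Do, Thanh Ngoc Nguyen + 5 more · 2026-03-05 · cs.CL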
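To make "rule-based normalization" concrete, here is a minimal sketch, not VietNormalizer's actual API, that spells out integers from 0 to 99 in Vietnamese using only the Python standard library. The function names are hypothetical, and the sketch encodes only a few well-known reading rules: `mười` for a tens digit of 1, `<digit> mươi` for higher tens, `mốt` for a final 1 after `mươi`, and `lăm` for a final 5 after a tens word. Regional and stylistic variants (for example, `tư` for a final 4) and larger numbers are deliberately ignored.

```python
import re

DIGITS = ["không", "một", "hai", "ba", "bốn", "năm", "sáu", "bảy", "tám", "chín"]


def read_number(n: int) -> str:
    """Spell out an integer 0-99 in Vietnamese (simplified rules)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 10:
        return DIGITS[n]
    tens, units = divmod(n, 10)
    # A tens digit of 1 is read "mười"; 2-9 are read "<digit> mươi".
    words = ["mười"] if tens == 1 else [DIGITS[tens], "mươi"]
    if units == 0:
        return " ".join(words)
    if units == 1 and tens >= 2:
        words.append("mốt")  # 21 -> "hai mươi mốt", but 11 -> "mười một"
    elif units == 5:
        words.append("lăm")  # 15 -> "mười lăm", 25 -> "hai mươi lăm"
    else:
        words.append(DIGITS[units])
    return " ".join(words)


def normalize_numbers(text: str) -> str:
    """Replace standalone one- or two-digit integers in text with their Vietnamese reading."""
    return re.sub(r"\b\d{1,2}\b", lambda m: read_number(int(m.group())), text)
```

As a usage example, `normalize_numbers("lớp có 21 học sinh")` yields `"lớp có hai mươi mốt học sinh"`, while a four-digit year such as `2026` is left untouched because it falls outside the sketch's scope. Keeping the rules as plain Python tables and conditionals mirrors the dependency-free design the summary describes.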

Traces of Social Competence in Large Language Models

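This paper analyzes how model scale and training method affect LLM performance on false-belief tests, which are used to assess social competence. In particular, it identifies a "crossover effect" in which the word "think" causally influences the models' reasoning patterns, and it uses Bayesian regression and vector manipulation to explain how this effect arises.

Tom Kouwenhoven, Michiel van der Meer, Max van Duijn · 2026-03-05 · cs.CL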

Code Fingerprints: Disentangled Attribution of LLM-Generated Code

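To enable model-level attribution of LLM-generated code, this paper proposes the Disentangled Code Attribution Network (DCAN), which separates semantic information from model-specific stylistic information. It validates DCAN on a newly built large-scale benchmark dataset covering four major LLMs and four programming languages.

Jiaxun Guo, Ziyuan Yang, Mengyu Sun + 3 more · 2026-03-05 · cs.CL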

When Do Language Models Endorse Limitations on Human Rights Principles?

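By evaluating 11 major large language models (LLMs) on 1,152 scenarios, this paper reveals systematic biases and problems in how the models endorse limitations on human rights principles: they accept limitations on economic, social, and cultural rights more readily than limitations on civil and political rights; their judgments are skewed by language (notably Chinese and Hindi); they are vulnerable to prompt manipulation; and their results differ depending on the response format.

Keenan Samway, Nicole Miu Takagi, Rada Mihalcea + 4 more · 2026-03-05 · cs.CL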

Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG

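This paper shows that performance gains on RAG benchmarks for multilingual and visually rich documents come mainly from better document representation (transcription and preprocessing) rather than from more advanced retrieval models, and argues that retrieval ability and transcription ability should be evaluated separately.

Martin Asenov, Kenza Benkirane, Dan Goldwater + 1 more · 2026-03-05 · cs.CL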

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

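To overcome the context limits of LLM agents on long-horizon tasks, this paper proposes Memex, a system that uses external memory and an index to compress evidence without discarding it, and uses reinforcement learning (MemexRL) to optimize when to summarize, store, and retrieve. It shows that Memex achieves higher task success rates while using less context.

Zhenting Wang, Huancheng Chen, Jiayun Wang + 1 more · 2026-03-05 · cs.LG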
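The idea of compressing context without discarding evidence can be illustrated with a toy data structure: the agent's working context holds only short summaries tagged with index keys, while the full evidence lives in an external store and can be retrieved verbatim by key. The class and method names below are hypothetical and are not the Memex interface; in Memex, the decisions about when to summarize, store, and retrieve are learned with MemexRL, whereas in this sketch they are left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedMemory:
    """Toy indexed experience memory: tagged summaries in context, full evidence stored externally."""

    store: dict[str, str] = field(default_factory=dict)  # index key -> full evidence
    context: list[str] = field(default_factory=list)  # what the agent keeps in its prompt

    def archive(self, key: str, evidence: str, summary: str) -> None:
        """Store the full evidence under `key` and keep only a tagged summary in context."""
        self.store[key] = evidence
        self.context.append(f"[{key}] {summary}")

    def retrieve(self, key: str) -> str:
        """Return the original evidence verbatim when the agent needs it again."""
        return self.store[key]

    def context_size(self) -> int:
        """Approximate the context cost as the total number of characters in context."""
        return sum(len(entry) for entry in self.context)
```

For instance, archiving a 100-line tool log under the key `tool-42` leaves a one-line summary such as `[tool-42] build failed; 100 identical error lines` in context, and `retrieve("tool-42")` still returns the full log. The index key is what distinguishes this from plain summarization: the summary is lossy, but the evidence it points to is not lost.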

Causality Elicitation from Large Language Models

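This paper proposes a pipeline that analyzes a collection of documents generated by a large language model, extracts and aggregates the events they describe, and applies causal discovery algorithms to visualize the set of causal hypotheses the model may hold.

Takashi Kameyama, Masahiro Kato, Yasuko Hio + 2 more · 2026-03-05 · cs.AI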

Position: Vector Prompt Interfaces Should Be Exposed to Enable Customization of Large Language Models

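This position paper points out the limitations of text prompts for customizing large language models and argues that vector prompt inputs should be exposed as a public interface to enable more scalable and stable control.

Liangwei Yang, Shiyu Wang, Haolin Chen + 12 more · 2026-03-05 · Author reviewed · cs.CL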

The Company You Keep: How LLMs Respond to Dark Triad Traits

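This study finds that when large language models receive prompts reflecting a user's Dark Triad traits (Machiavellianism, narcissism, and psychopathy), they mostly respond correctively but in certain situations produce reinforcing outputs, and it draws implications for designing safer dialogue systems.

Zeyi Lu, Angelica Henestrosa, Pavel Chizhov + 1 more · 2026-03-05 · cs.CL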

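$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

This paper proposes V1, a new framework that unifies generation and verification. By combining uncertainty-guided ranking based on pairwise comparisons between candidates with reinforcement learning, it substantially improves both the efficiency and the accuracy of test-time scaling on complex reasoning tasks.

Harman Singh, Xiuyu Li, Kusha Sareen + 14 more · 2026-03-05 · cs.CL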
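As a rough illustration of ranking candidates through pairwise comparison, the sketch below runs a plain round-robin tournament: a judge compares every pair of candidate answers, and candidates are ranked by number of wins. This is not the V1 algorithm, whose uncertainty-guided ranking and RL training are described in the paper; the function name and the `judge` callable (a hypothetical stand-in for a model-based verifier) are assumptions for illustration.

```python
from itertools import combinations
from typing import Callable, Sequence


def rank_by_pairwise_wins(
    candidates: Sequence[str],
    judge: Callable[[str, str], bool],
) -> list[str]:
    """Rank candidates by how many round-robin pairwise comparisons they win.

    judge(a, b) returns True if a is judged better than b.
    Candidates with equal win counts keep their original order (sorted() is stable).
    """
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if judge(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    order = sorted(range(len(candidates)), key=lambda k: -wins[k])
    return [candidates[k] for k in order]
```

With a toy judge that prefers longer strings, `rank_by_pairwise_wins(["a", "abc", "ab"], lambda x, y: len(x) > len(y))` returns `["abc", "ab", "a"]`. A full round robin needs n(n-1)/2 judge calls for n candidates, so its cost grows quickly as more parallel samples are drawn; choosing which pairs to compare selectively, rather than exhaustively, is one plausible way to reduce that cost.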

World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

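Through experiments with static word embeddings, this paper shows that the linear recoverability of geographic and temporal structure from LLM hidden states is not evidence that the models have internalized a "world model"; instead, this structure originates in information already latent in plain word co-occurrence statistics.

Elan Barenholtz · 2026-03-05 · cs.AI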

AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

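AILS-NTUA proposes a three-stage system that combines graph-based retrieval, LLM-driven abductive reasoning optimized through reflective prompt evolution, and post-hoc consistency enforcement. The system took first place in SemEval-2026 Task 12 (Abductive Event Reasoning) with an accuracy of 0.95, and an error analysis across 14 models identified three systematic failure patterns in causal reasoning.

Nikolas Karafyllis, Maria Lymperaiou, Giorgos Filandrianos + 2 more · 2026-03-05 · cs.CL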

Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

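To address the difficulty of entity selection and the topology errors introduced by discretization in LLM-based CAD generation, this paper proposes Pointer-CAD, a new framework that integrates B-Rep geometric information with a pointer-based selection mechanism, enabling the generation of complex shapes and high-precision editing.

Dacheng Qi, Chenyu Wang, Jingwei Xu + 6 more · 2026-03-05 · cs.CL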

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

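This paper finds that multimodal web agents are vulnerable to attacks that target both the screenshot and the accessibility tree, and proposes Dual-Modality Multi-Stage Adversarial Safety Training (DMAST), a three-stage pipeline consisting of imitation learning from a teacher model, supervised fine-tuning with a zero-acknowledgment strategy, and adversarial reinforcement learning with GRPO. It reports that DMAST doubles task efficiency while achieving greater robustness than existing defenses.

Haoyu Liu, Dingcheng Li, Lukas Rutishauser + 1 more · 2026-03-05 · cs.AI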


cs.CL

FINEST: Improving LLM Responses to Sensitive Topics Through Fine-Grained Evaluation

BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

VietNormalizer: An Open-Source, Dependency-Free Python Library for Vietnamese Text Normalization in TTS and NLP Applications

Traces of Social Competence in Large Language Models

Code Fingerprints: Disentangled Attribution of LLM-Generated Code

When Do Language Models Endorse Limitations on Human Rights Principles?

Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Causality Elicitation from Large Language Models

Position: Vector Prompt Interfaces Should Be Exposed to Enable Customization of Large Language Models

The Company You Keep: How LLMs Respond to Dark Triad Traits

$V_1$: Unifying Generation and Self-Verification for Parallel Reasoners

World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

AILS-NTUA at SemEval-2026 Task 12: Graph-Based Retrieval and Reflective Prompting for Abductive Event Reasoning

Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

$\tau$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

The 2020s Political Economy of Machine Translation

Thought Flow Nets: From Single Predictions to Trains of Model Thought
