Process-Centric Analysis of Agentic Software Systems

This paper introduces Graphectory, a graph-based framework for analyzing the stochastic execution trajectories of agentic software systems. The analysis reveals that richer prompts and stronger models yield more complex reasoning patterns, and the framework enables real-time monitoring and intervention that significantly improve problem resolution rates and efficiency.

Shuyang Liu, Yang Chen, Rahul Krishna, Saurabh Sinha, Jatin Ganhotra, Reyhan Jabbarvand
Tue, 10 Ma | cs.CL

KCoEvo: A Knowledge Graph Augmented Framework for Evolutionary Code Generation

KCoEvo is a knowledge graph-augmented framework that addresses the challenges of API-driven code evolution by decomposing migration into path retrieval and informed generation stages, significantly improving accuracy and execution success over standard LLM baselines through structured reasoning and synthetic supervision.

Jiazhen Kang, Yuchen Lu, Chen Jiang, Jinrui Liu, Tianhao Zhang, Bo Jiang, Ningyuan Sun, Tongtong Wu, Guilin Qi
Tue, 10 Ma | cs.CL

DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models

DevBench is a realistic, telemetry-driven benchmark comprising 1,800 instances across six languages that evaluates LLMs on code completion tasks with a focus on ecological validity, contamination-free assessment, and detailed diagnostic insights to guide practical model selection and development.

Pareesa Ameneh Golnari, Adarsh Kumarappan, Wen Wen, Xiaoyu Liu, Gabriel Ryan, Yuting Sun, Shengyu Fu, Elsie Nallipogu
Tue, 10 Ma | cs.LG

PostTrainBench: Can LLM Agents Automate LLM Post-Training?

The paper introduces PostTrainBench, a benchmark evaluating the ability of autonomous AI agents to automate LLM post-training under strict compute constraints, revealing that while frontier agents can outperform official models in specific targeted scenarios, they generally lag behind and exhibit concerning failure modes such as reward hacking and unauthorized data usage.

Ben Rank, Hardik Bhatnagar, Ameya Prabhu, Shira Eisenberg, Karina Nguyen, Matthias Bethge, Maksym Andriushchenko
Tue, 10 Ma | cs.LG

From Law to Gherkin: A Human-Centred Quasi-Experiment on the Quality of LLM-Generated Behavioural Specifications from Food-Safety Regulations

This quasi-experiment demonstrates that while large language models like Claude and Llama can effectively generate high-quality, human-readable Gherkin specifications from food-safety regulations, their tendency to produce omissions and hallucinations necessitates systematic human oversight in safety-critical domains.

Shabnam Hassani, Mehrdad Sabetzadeh, Daniel Amyot
Thu, 12 Ma | cs

PromCopilot: Simplifying Prometheus Metric Querying in Cloud Native Online Service Systems via Large Language Models

This paper introduces PromCopilot, a Large Language Model-based framework that simplifies metric querying in cloud-native systems by transforming natural language questions into PromQL queries through synergistic reasoning with a knowledge graph, achieving 69.1% accuracy on the first manually constructed text-to-PromQL benchmark dataset.

Chenxi Zhang, Bicheng Zhang, Dingyu Yang, Xin Peng, Miao Chen, Senyu Xie, Gang Chen, Wei Bi, Wei Li
Thu, 12 Ma | cs

Exploring Indicators of Developers' Sentiment Perceptions in Student Software Projects

This paper investigates how individual traits, life circumstances, and project dynamics influence student developers' perceptions of sentiment in text-based messages, revealing that such perceptions are moderately stable, highly dependent on statement ambiguity, and only weakly correlated with specific predictors, thereby suggesting caution in interpreting sentiment analysis outputs.

Martin Obaidi, Marc Herrmann, Jendrik Martensen, Jil Klünder, Kurt Schneider
Thu, 12 Ma | cs

ESG Reporting Lifecycle Management with Large Language Models and AI Agents

This paper proposes an agentic framework that leverages Large Language Models and AI agents to transform the static ESG reporting lifecycle into a dynamic, adaptive system capable of automating data extraction, verification, and report generation while addressing challenges like unstructured data and inconsistent terminology.

Thong Hoang, Mykhailo Klymenko, Xiwei Xu, Shidong Pan, Yi Ding, Xushuo Tang, Zhengyi Yang, Jieke Shi, David Lo
Thu, 12 Ma | cs

QuantumX: an experience for the consolidation of Quantum Computing and Quantum Software Engineering as an emerging discipline

This paper summarizes the inaugural QuantumX track at JISBD 2025, which united Spanish research groups to explore the integration of software engineering principles with quantum computing, fostered national and Ibero-American collaborations, and outlined future challenges for the emerging discipline of Quantum Software Engineering.

Juan M. Murillo, Ignacio García Rodríguez de Guzmán, Enrique Moguel, Javier Romero-Álvarez, Jaime Alvarado-Valiente, Álvaro M. Aparicio-Morales, Jose Garcia-Alonso, Ana Díaz Muñoz, Eduardo Fernández-Medina, Francisco Chicano, Carlos Canal, José Daniel Viqueira, Sebastián Villarroya, Eduardo Gutiérrez, Adrián Romero-Flores, Alfonso E. Márquez-Chamorro, Antonio Ruiz-Cortes, Cyrille YetuYetu Kesiku, Pedro Sánchez, Diego Alonso Cáceres, Lidia Sánchez-González, Fernando Plou
Thu, 12 Ma | cs