Human-AI Collaboration for Scaling Agile Regression Testing: An Agentic-AI Teammate from Manual to Automated Testing

This paper presents a retrieval-augmented, multi-agent AI system, developed in partnership with Hacon, that accelerates agile regression testing by automatically generating executable scripts from validated specifications. The approach significantly increases throughput and reduces manual effort, while underscoring the continued need for human oversight and clear requirements in quality assurance.

Moustapha El Outmani, Manthan Venkataramana Shenoy, Ahmad Hatahet, Andreas Rausch, Tim Niklas Kniep, Thomas Raddatz, Benjamin King · Tue, 10 Ma · cs

An explainable hybrid deep learning-enabled intelligent fault detection and diagnosis approach for automotive software systems validation

This paper proposes a novel explainable hybrid deep learning framework that combines 1D-CNN and GRU architectures with interpretability techniques such as Integrated Gradients (IG) and SHAP to enhance fault detection, diagnosis, and root cause analysis in automotive software system validation, overcoming the limitations of traditional black-box models.

Mohammad Abboush, Ehab Ghannoum, Andreas Rausch · Tue, 10 Ma · cs

The Effect of Code Obfuscation on Human Program Comprehension

This study investigates how varying levels of code obfuscation affect human program comprehension in Python and JavaScript, revealing that while obfuscation generally increases reasoning time and reduces accuracy, its impact is non-monotonic and language-specific. Moderate deliberation improves performance, and experience within a specific language proves more critical than experience across languages.

Anh H. N. Nguyen, Jack Le, Ilse Lahnstein Coronado, Tien N. Nguyen · Tue, 10 Ma · cs

AgentRaft: Automated Detection of Data Over-Exposure in LLM Agents

This paper introduces AgentRaft, an automated framework that combines program analysis and semantic reasoning to detect and quantify the systemic risk of Data Over-Exposure in LLM agents, demonstrating high accuracy and efficiency across thousands of real-world tools.

Yixi Lin (Sun Yat-sen University, Zhuhai, Guangdong, China), Jiangrong Wu (Sun Yat-sen University, Zhuhai, Guangdong, China), Yuhong Nan (Sun Yat-sen University, Zhuhai, Guangdong, China), Xueqiang Wang (University of Central Florida, Orlando, Florida, USA), Xinyuan Zhang (Sun Yat-sen University, Zhuhai, Guangdong, China), Zibin Zheng (Sun Yat-sen University, Zhuhai, Guangdong, China) · Tue, 10 Ma · cs

On the Effectiveness of Code Representation in Deep Learning-Based Automated Patch Correctness Assessment

This paper presents the first extensive study evaluating over 500 models to demonstrate that graph-based code representations consistently outperform other methods in predicting patch correctness, thereby significantly improving the effectiveness of automated program repair tools.

Quanjun Zhang, Chunrong Fang, Haichuan Hu, Yuan Zhao, Weisong Sun, Yun Yang, Tao Zheng, Zhenyu Chen · Tue, 10 Ma · cs

Empathy in Software Engineering Education: Evidence, Practices, and Opportunities

This systematic review of 43 studies reveals that while empathy is increasingly recognized as a vital capability for software engineers, its integration into education remains fragmented, prompting a call to evolve empathy from a peripheral soft skill into a structured, measurable pedagogical component to enhance collaboration, ethics, and inclusive design.

Matheus de Morais Leca, Kim Johnston, Ronnie de Souza Santos · Tue, 10 Ma · cs

Regression Testing in Remote and Hybrid Software Teams: An Exploratory Study of Processes, Tools, and Practices

This study investigates how remote and hybrid work environments reshape regression testing practices, revealing that while core phases remain stable, successful execution increasingly relies on automation, documentation, and standardized tooling to overcome communication challenges and support asynchronous collaboration.

Juliane Pascoal, Cleytton Magalhaes, Ronnie de Souza Santos · Tue, 10 Ma · cs

Do Deployment Constraints Make LLMs Hallucinate Citations? An Empirical Study across Four Models and Five Prompting Regimes

This empirical study demonstrates that deployment-motivated prompting constraints significantly exacerbate citation hallucinations across four large language models: no model achieves a citation existence rate above 47.5%, and a substantial share of unverifiable outputs are outright fabrications, underscoring the critical need for post-hoc verification in academic and software engineering contexts.

Chen Zhao, Yuan Tang, Yitian Qian · Tue, 10 Ma · cs

Exploring the Reasoning Depth of Small Language Models in Software Architecture: A Multidimensional Evaluation Framework Towards Software Engineering 2.0

This study benchmarks ten small language models on architectural decision record generation to establish a multidimensional evaluation framework, revealing that models exceeding 3 billion parameters excel at zero-shot reasoning while sub-2-billion-parameter models benefit most from fine-tuning. Few-shot prompting effectively calibrates mid-sized models, though high semantic diversity often correlates with hallucinations.

Ha Vo, Nhut Tran, Khang Vo, Phat T. Tran-Truong, Son Ha · Tue, 10 Ma · cs

Characterizing Faults in Agentic AI: A Taxonomy of Types, Symptoms, and Root Causes

This paper presents a comprehensive taxonomy of faults in agentic AI systems, derived from a large-scale empirical study of 13,602 issues and validated by 145 practitioners. It categorizes 37 distinct fault types along with their symptoms and root causes, revealing critical propagation patterns and mismatches between probabilistic LLM outputs and deterministic system constraints.

Mehil B Shah, Mohammad Mehdi Morovati, Mohammad Masudur Rahman, Foutse Khomh · Tue, 10 Ma · cs

Measuring Complexity at the Requirements Stage: Spectral Metrics as Development Effort Predictors

This paper demonstrates that spectral metrics, derived via natural language processing of requirements specifications, can predict integration effort with high accuracy (correlations above 0.95), offering a validated method to quantify structural complexity at the requirements stage and bridging the gap between architectural analysis and requirements engineering.

Maximilian Vierlboeck, Antonio Pugliese, Roshanak Nilchian, Paul Grogan, Rashika Sugganahalli Natesh Babu · Tue, 10 Ma · cs.CL