cs.SE papers | Gist.Science

Refactoring for Novices in Java: An Eye Tracking Study on the Extract vs. Inline Methods

An eye-tracking study involving Java novices reveals that while method extraction can significantly improve performance and reduce visual effort for complex tasks, it often hinders comprehension and increases cognitive load for simple tasks, suggesting that educators should be cautious about promoting premature modularization for beginners.

José Aldo Silva da Costa, Rohit Gheyi, José Júnior Silva da Costa + 5 more2026-03-06💻 cs

Toward architecting self-coding information systems

This extended abstract proposes and defines the novel research topic of self-coding information systems, which are agentic AI capable of autonomously evaluating, generating, testing, and redeploying their own source code at runtime to dynamically adapt and accelerate feature delivery.

Rodrigo Falcão, Frank Elberzhager, Karthik Vaidhyanathan2026-03-06💻 cs

RefAgent: A Multi-agent LLM-based Framework for Automatic Software Refactoring

This paper introduces RefAgent, a multi-agent LLM-based framework that autonomously plans, executes, and iteratively refines software refactoring through specialized agents, demonstrating significant improvements in code quality, test pass rates, and smell reduction compared to single-agent and traditional approaches.

Khouloud Oueslati, Maxime Lamothe, Foutse Khomh2026-03-06💻 cs

Vision Language Model-based Testing of Industrial Autonomous Mobile Robots

This paper presents RVSG, a Vision Language Model-based testing framework developed with PAL Robotics that automatically generates diverse, requirement-violating human interaction scenarios in simulation to safely and effectively evaluate the robustness of industrial Autonomous Mobile Robots against unpredictable behaviors.

Jiahui Wu, Chengjie Lu, Aitor Arrieta + 2 more2026-03-06💻 cs

MioHint: LLM-assisted Mutation for Whitebox API Testing

MioHint is a novel white-box API testing approach that overcomes the limitations of Large Language Models in analyzing entire codebases by synergizing static data-dependency analysis with LLMs to retrieve relevant code, thereby significantly improving line coverage and mutation accuracy for cloud applications compared to existing baselines.

Jia Li, Jiacheng Shen, Yuxin Su + 1 more2026-03-06💻 cs

Assessing the Impact of Code Changes on the Fault Localizability of Large Language Models

This paper introduces a large-scale, mutation-based evaluation framework to assess the robustness of Large Language Models in fault localization, revealing that their reasoning is often brittle and reliant on syntactic cues rather than deep semantic understanding, as evidenced by a 78% failure rate when subjected to semantic-preserving code changes.

Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun + 5 more2026-03-06💻 cs

Real-Time BDI Agents: a model and its implementation

This paper addresses the limitations of existing BDI agents in real-time environments by proposing a redefined control loop that explicitly manages temporal constraints and resources, which is then validated through an implementation in a resource-collection video game.

Andrea Traldi, Francesco Bruschetti, Marco Robol + 3 more2026-03-06💻 cs

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

This paper presents the first systematic audit revealing that widely used "shadow APIs," which claim to provide access to restricted frontier LLMs, frequently employ deceptive practices such as model substitution and safety manipulation, thereby compromising the reliability, reproducibility, and validity of downstream applications and academic research.

Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang2026-03-06🔒 cs.CR

Automated TEE Adaptation with LLMs: Identifying, Transforming, and Porting Sensitive Functions in Programs

This paper introduces AUTOTEE, the first LLM-based approach that automatically identifies, transforms, and ports sensitive functions from existing programs into Trusted Execution Environments (TEEs), achieving high accuracy and success rates in Java and Python while significantly reducing the manual effort and domain expertise required for developers.

Ruidong Han, Zhou Yang, Chengyan Ma, Ye Liu, Yuqing Niu, Siqi Ma, Debin Gao, David Lo2026-03-06🔒 cs.CR

Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

This paper presents the first multi-dimensional evaluation of 31 LLM safety benchmarks, revealing that while they do not outperform non-benchmark papers in academic influence, there is a critical misalignment where neither author prominence nor paper impact correlates with code quality, highlighting a significant need for improved repository readiness and ethical standards.

Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang2026-03-06🔒 cs.CR

Analyzing Dependency Distribution Changes Arising from Code Smell Interactions

This study analyzes 116 open-source Java systems to demonstrate that interactions between code smells significantly alter static dependency distributions, often increasing total dependencies, thereby offering valuable insights for improving code smell detection, prioritization, and refactoring strategies.

Zushuai Zhang, Elliott Wen, Ewan Tempero2026-03-05💻 cs

From Feedback to Failure: Automated Android Performance Issue Reproduction

This paper introduces RevPerf, an automated framework that synthesizes ambiguous user app reviews through semantic retrieval and prompt engineering to generate and execute reproduction steps, successfully identifying Android performance issues with a 72.73% success rate.

Zhengquan Li, Zhenhao Li, Zishuo Ding2026-03-05💻 cs

EasyRpl: A web-based tool for modelling and analysis of cross-organisational workflows

This paper introduces EasyRpl, a user-friendly web-based tool suite that facilitates the modelling, simulation, and analysis of complex cross-organisational workflows to optimize efficiency and identify resource bottlenecks.

Muhammad Rizwan Ali, Violet Ka I Pun, Guillermo Román-Díez2026-03-05💻 cs

Formal Analysis of the Contract Automata Runtime Environment with Uppaal: Modelling, Verification and Testing

This paper presents the formal modelling, verification using Uppaal, and testing of the Contract Automata Runtime Environment (CARE) as a network of stochastic timed automata to enhance the dependability of this open-source distributed application.

Davide Basile2026-03-05💻 cs

Natural Adversaries: Fuzzing Autonomous Vehicles with Realistic Roadside Object Placements

This paper introduces TrashFuzz, a black-box fuzzing algorithm that manipulates the realistic placement of common roadside objects to generate adversarial scenarios causing autonomous vehicles to misperceive traffic signals and violate traffic laws, demonstrating significant vulnerabilities in the Apollo system without relying on unnatural adversarial patches.

Yang Sun, Haoyu Wang, Christopher M. Poskitt + 1 more2026-03-05💻 cs

Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation

This large-scale empirical study evaluates the effectiveness of four LLMs using five prompting strategies for generating unit tests across 216,300 cases, revealing that while reasoning-based prompting improves reliability and readability compared to traditional search-based methods, persistent hallucination-driven compilation failures and maintainability issues necessitate hybrid approaches combining LLM generation with automated validation and refinement.

Wendkûuni C. Ouédraogo, Kader Kaboré, Yinghua Li + 5 more2026-03-05💻 cs

Code Fingerprints: Disentangled Attribution of LLM-Generated Code

This paper introduces the Disentangled Code Attribution Network (DCAN), a novel framework that leverages contrastive learning to disentangle source-specific stylistic fingerprints from semantic content, enabling accurate identification of the specific Large Language Model responsible for generating code snippets across multiple programming languages.

Jiaxun Guo, Ziyuan Yang, Mengyu Sun + 3 more2026-03-05💬 cs.CL

CONCUR: Benchmarking LLMs for Concurrent Code Generation

This paper introduces CONCUR, a novel benchmark comprising 115 concurrency-specific problems designed to evaluate and highlight the limitations of Large Language Models in generating complex concurrent code, addressing a critical gap left by existing benchmarks that focus solely on sequential code.

Jue Huang, Tarek Mahmud, Corina Pasareanu + 1 more2026-03-05🤖 cs.LG

A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development

This paper proposes and validates a dual-helix governance framework, implemented via a 3-track architecture and the open-source AgentLoom toolkit, which overcomes inherent LLM limitations in WebGIS development by externalizing domain knowledge and enforcing protocols, thereby achieving significant improvements in code quality and operational reliability as demonstrated in the FutureShorelines refactoring project.

Boyuan, Guan, Wencong Cui + 1 more2026-03-05🤖 cs.AI

LikeThis! Empowering App Users to Submit UI Improvement Suggestions Instead of Complaints

This paper presents LikeThis!, a GenAI-based approach that empowers users to transform vague UI complaints into constructive, actionable feedback by generating concrete design improvement alternatives from user comments and screenshots, which was validated through model benchmarking and a user study showing enhanced feedback quality and developer understanding.

Jialiang Wei, Ali Ebrahimi Pourasad, Walid Maalej2026-03-05🤖 cs.AI

← Previous Next →