Refactoring for Novices in Java: An Eye Tracking Study on the Extract vs. Inline Methods

An eye-tracking study involving Java novices reveals that while method extraction can significantly improve performance and reduce visual effort for complex tasks, it often hinders comprehension and increases cognitive load for simple tasks, suggesting that educators should be cautious about promoting premature modularization for beginners.

José Aldo Silva da Costa, Rohit Gheyi, José Júnior Silva da Costa + 5 more · 2026-03-06 · 💻 cs

Assessing the Impact of Code Changes on the Fault Localizability of Large Language Models

This paper introduces a large-scale, mutation-based evaluation framework to assess the robustness of Large Language Models in fault localization, revealing that their reasoning is often brittle and reliant on syntactic cues rather than deep semantic understanding, as evidenced by a 78% failure rate when subjected to semantic-preserving code changes.

Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun + 5 more · 2026-03-06 · 💻 cs

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

This paper presents the first systematic audit revealing that widely used "shadow APIs," which claim to provide access to restricted frontier LLMs, frequently employ deceptive practices such as model substitution and safety manipulation, thereby compromising the reliability, reproducibility, and validity of downstream applications and academic research.

Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang · 2026-03-06 · 🔒 cs.CR

Automated TEE Adaptation with LLMs: Identifying, Transforming, and Porting Sensitive Functions in Programs

This paper introduces AUTOTEE, the first LLM-based approach that automatically identifies, transforms, and ports sensitive functions from existing programs into Trusted Execution Environments (TEEs), achieving high accuracy and success rates in Java and Python while significantly reducing the manual effort and domain expertise required for developers.

Ruidong Han, Zhou Yang, Chengyan Ma, Ye Liu, Yuqing Niu, Siqi Ma, Debin Gao, David Lo · 2026-03-06 · 🔒 cs.CR

Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

This paper presents the first multi-dimensional evaluation of 31 LLM safety benchmarks, finding that they do not outperform non-benchmark papers in academic influence and, more critically, that neither author prominence nor paper impact correlates with code quality, highlighting a significant need for improved repository readiness and ethical standards.

Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang · 2026-03-06 · 🔒 cs.CR

Natural Adversaries: Fuzzing Autonomous Vehicles with Realistic Roadside Object Placements

This paper introduces TrashFuzz, a black-box fuzzing algorithm that manipulates the realistic placement of common roadside objects to generate adversarial scenarios causing autonomous vehicles to misperceive traffic signals and violate traffic laws, demonstrating significant vulnerabilities in the Apollo system without relying on unnatural adversarial patches.

Yang Sun, Haoyu Wang, Christopher M. Poskitt + 1 more · 2026-03-05 · 💻 cs

Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation

This large-scale empirical study evaluates the effectiveness of four LLMs under five prompting strategies for generating unit tests across 216,300 test cases, revealing that while LLM-generated tests improve reliability and readability over traditional search-based methods, persistent hallucination-driven compilation failures and maintainability issues call for hybrid approaches that combine LLM generation with automated validation and refinement.

Wendkûuni C. Ouédraogo, Kader Kaboré, Yinghua Li + 5 more · 2026-03-05 · 💻 cs

A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development

This paper proposes and validates a dual-helix governance framework, implemented via a three-track architecture and the open-source AgentLoom toolkit, which mitigates inherent LLM limitations in WebGIS development by externalizing domain knowledge and enforcing protocols, achieving significant improvements in code quality and operational reliability as demonstrated in the FutureShorelines refactoring project.

Boyuan Guan, Wencong Cui + 1 more · 2026-03-05 · 🤖 cs.AI

LikeThis! Empowering App Users to Submit UI Improvement Suggestions Instead of Complaints

This paper presents LikeThis!, a GenAI-based approach that empowers users to transform vague UI complaints into constructive, actionable feedback by generating concrete design improvement alternatives from user comments and screenshots, which was validated through model benchmarking and a user study showing enhanced feedback quality and developer understanding.

Jialiang Wei, Ali Ebrahimi Pourasad, Walid Maalej · 2026-03-05 · 🤖 cs.AI