cs.SE papers | Gist.Science

Towards a Goal-Centric Assessment of Requirements Engineering Methods for Privacy by Design

This paper proposes a goal-centric framework for assessing Requirements Engineering methods for Privacy by Design, arguing that practitioners should evaluate these methods based on organizational goals rather than solely on process characteristics to better support their selection and tailoring.

Oleksandr Kosenkov, Ehsan Zabardast, Jannik Fischbach, Tony Gorschek, Daniel Mendez | Wed, 11 Ma | cs

A Tale of 1001 LoC: Potential Runtime Error-Guided Specification Synthesis for Verifying Large-Scale Programs

This paper introduces Preguss, a modular framework that combines static analysis with LLM-aided synthesis to automatically generate and refine interprocedural specifications, enabling highly automated verification of large-scale programs (over 1,000 lines of code) while significantly reducing human effort.

Zhongyi Wang, Tengjie Lin, Mingshuai Chen, Haokun Li, Mingqi Yang, Xiao Yi, Shengchao Qin, Yixing Luo, Xiaofeng Li, Bin Gu, Liqiang Lu, Jianwei Yin | Wed, 11 Ma | cs

Floating-Point Usage on GitHub: A Large-Scale Study of Statically Typed Languages

This paper presents the first large-scale empirical study of floating-point arithmetic usage in statically typed languages across millions of GitHub repositories, revealing that while existing benchmarks are partially representative, they do not fully capture real-world code patterns, and releasing a dataset of 10 million extracted functions to guide future reasoning techniques.

Andrea Gilot, Tobias Wrigstad, Eva Darulova | Wed, 11 Ma | cs

Evaluating Large Language Models for Multilingual Vulnerability Detection at Dual Granularities

This paper presents a comprehensive empirical study evaluating state-of-the-art pre-trained and large language models for multilingual vulnerability detection across seven programming languages at both function and line levels, revealing that instruction-tuned GPT-4o significantly outperforms other models, particularly in identifying high-severity and unique multilingual vulnerabilities.

Honglin Shu, Michael Fu, Junji Yu, Dong Wang, Chakkrit Tantithamthavorn, Junjie Chen, Yasutaka Kamei | Wed, 11 Ma | cs

Towards a Taxonomy of Software Log Smells

This paper presents a taxonomy of nine log smells derived from a survey of 51 studies to help developers write better logging code, while also mapping these issues to existing repair tools and highlighting critical gaps in current research and tooling.

Nyyti Saarimäki, Donghwan Shin, Domenico Bianculli | Wed, 11 Ma | cs

"Should I Give Up Now?" Investigating LLM Pitfalls in Software Engineering

This study investigates the pitfalls of integrating large language models into software engineering workflows, revealing that persistent inaccuracies and cognitive overload often lead developers to abandon AI tools, though strategic prompt refinement can significantly mitigate the risk of such abandonment.

Jiessie Tie, Bingsheng Yao, Tianshi Li, Hongbo Fang, Syed Ishtiaque Ahmed, Dakuo Wang, Shurui Zhou | Wed, 11 Ma | cs

An Empirical Study of Interaction Smells in Multi-Turn Human-LLM Collaborative Code Generation

This paper introduces the concept of "Interaction Smells" in multi-turn human-LLM code generation, establishes a taxonomy based on real-world data, analyzes their distribution across leading models, and proposes the Invariant-aware Constraint Evolution (InCE) framework to effectively mitigate these issues and improve task success rates.

Binquan Zhang, Li Zhang, Lin Shi, Song Wang, Yuwei Qian, Linhui Zhao, Fang Liu, An Fu, Yida Ye | Wed, 11 Ma | cs

Preparing Students for AI-Driven Agile Development: A Project-Based AI Engineering Curriculum

This paper presents a project-based AI engineering curriculum that integrates agile practices with generative AI tools to prepare students for modern software development, demonstrating through a seven-sprint case study that embedding AI across the engineering lifecycle fosters hands-on competence while necessitating adaptations for tool evolution and foundational learning verification.

Andreas Rausch, Stefan Wittek, Tobias Geger, David Inkermann | Wed, 11 Ma | cs

EmbC-Test: How to Speed Up Embedded Software Testing Using LLMs and RAG

This paper introduces EmbC-Test, a Retrieval-Augmented Generation (RAG) pipeline that leverages project-specific artifacts to ground large language models, enabling the automated generation of syntactically correct and runtime-valid embedded C tests that reduce testing time by up to 66% while producing 270 tests per hour.

Maximilian Harnot, Sebastian Komarnicki, Michal Polok, Timo Oksanen | Wed, 11 Ma | cs

Towards Viewpoint-centric Artifact-based Regulatory Requirements Engineering for Compliance by Design

This paper reports on the synthesis of an Artefact Model for Regulatory Requirements Engineering (AM4RRE) and seeks feedback on its future evaluation; the model aims to bridge the gap between organizational regulatory processes and ad-hoc software development practices to achieve systematic, integrated compliance by design.

Oleksandr Kosenkov | Wed, 11 Ma | cs

Experience Report on the Adaptable Integration of Requirements Engineering Courses into Curricula for Professionals

This paper reports on the authors' experience developing three professional software engineering curricula and proposes a systematic, content-mapping-based approach with guiding principles for effectively integrating Requirements Engineering courses into these dynamic and modular programs.

Oleksandr Kosenkov, Konstantin Blaschke, Tony Gorschek, Michael Unterkalmsteiner, Oleksandr Adamov, Davide Fucci | Wed, 11 Ma | cs

Can ChatGPT Generate Realistic Synthetic System Requirement Specifications? Results of a Case Study

This case study demonstrates that while ChatGPT can generate realistic synthetic system requirement specifications across multiple industries using iterative prompt engineering, the resulting artifacts still contain significant flaws that necessitate thorough expert evaluation rather than relying solely on LLM-based quality assessments.

Alex R. Mattukat, Florian M. Braun, Horst Lichter | Wed, 11 Ma | cs

ToolRosetta: Bridging Open-Source Repositories and Large Language Model Agents through Automated Tool Standardization

ToolRosetta is a unified framework that automatically transforms heterogeneous open-source code repositories into standardized, secure, and executable Model Context Protocol (MCP) tools, enabling LLM agents to autonomously plan and invoke specialized software for complex tasks with minimal human intervention.

Shimin Di, Xujie Yuan, Hanghui Guo, Chaoqian Ouyang, Zhangze Chen, Ling Yue, Libin Zheng, Jia Zhu, Shaowu Pan, Jian Yin, Min-Ling Zhang, Yong Rui | Wed, 11 Ma | cs

AgenticCyOps: Securing Multi-Agentic AI Integration in Enterprise Cyber Operations

This paper introduces AgenticCyOps, a security framework for enterprise multi-agent AI systems that mitigates emerging attack surfaces by formalizing tool orchestration and memory management as primary trust boundaries and applying five defensive principles aligned with global compliance standards to significantly reduce exploitable vulnerabilities in SOC workflows.

Shaswata Mitra, Raj Patel, Sudip Mittal, Md Rayhanur Rahman, Shahram Rahimi | Wed, 11 Ma | cs

Class Model Generation from Requirements using Large Language Models

This paper evaluates the capability of state-of-the-art Large Language Models to automatically generate UML class diagrams from natural language requirements, demonstrating their effectiveness and reliability through a comprehensive dual-validation framework that combines LLM-as-a-Judge assessments with human expert evaluations.

Jackson Nguyen, Rui En Koe, Fanyu Wang, Chetan Arora, Alessio Ferrari | Wed, 11 Ma | cs

Synergistic Directed Execution and LLM-Driven Analysis for Zero-Day AI-Generated Malware Detection

This paper presents a novel hybrid framework that synergistically combines concolic execution, LLM-guided path prioritization, and deep learning to achieve provably sound and highly accurate detection of zero-day AI-generated malware, significantly outperforming conventional baselines on both standard and synthesized threat benchmarks.

George Edwards, Mahdi Eslamimehr | Wed, 11 Ma | cs

The Future of Software Engineering Conferences: A New Zealand Perspective

This paper examines the barriers faced by New Zealand researchers in attending software engineering conferences, such as high travel costs and scheduling misalignments, and proposes strategies like hybrid participation and governance reforms to foster more equitable global engagement.

Kelly Blincoe, Sherlock A. Licorish, Judith Fuchs, Amjed Tahir | Wed, 11 Ma | cs

Lockbox -- A Zero Trust Architecture for Secure Processing of Sensitive Cloud Workloads

This paper presents Lockbox, a Zero Trust architecture that ensures the secure processing of sensitive cloud workloads by enforcing strict isolation, least-privilege access, and end-to-end encryption, thereby enabling enterprises to safely leverage advanced capabilities like AI without compromising their security posture.

Vamshi Krishna Thotempudi, Mahima Agarwal, Raghav Batta, Anjali Mangal | Wed, 11 Ma | cs

Can AI Agents Generate Microservices? How Far are We?

This paper evaluates the capability of AI agents to generate functional microservices, finding that while they can produce maintainable code with high API contract adherence—particularly in clean-state scenarios—current limitations in consistency and the need for human oversight prevent fully autonomous generation.

Bassam Adnan, Matteo Esposito, Davide Taibi, Karthik Vaidhyanathan | Wed, 11 Ma | cs

GenAI Is No Silver Bullet for Qualitative Research in Software Engineering

This paper argues that while Generative AI offers potential assistance for qualitative research in software engineering, it is not a universal solution and requires careful, strategy-specific adaptation to avoid overgeneralization, ultimately outlining its promises, pitfalls, and implications for research quality.

Neil A. Ernst, Christoph Treude | Wed, 11 Ma | cs