SBOMs into Agentic AIBOMs: Schema Extensions, Agentic Orchestration, and Reproducibility Evaluation

Here is an explanation of the paper using simple language and creative analogies.

The Big Picture: From a Static Receipt to a Smart Bodyguard

Imagine you buy a complex machine, like a high-end drone. The manufacturer gives you a Software Bill of Materials (SBOM). Think of this like a grocery receipt. It lists every ingredient: "1 battery, 2 motors, 100 screws, 1 chip."

The Problem:
A receipt tells you what should be there, but it doesn't tell you what is actually happening right now.

- Did someone swap the battery for a fake one while you weren't looking?
- Is the motor overheating because of the weather?
- Is that specific screw actually loose, or is it just listed as "loose" in the manual?

In the world of software, a standard SBOM is just a static list. It can't tell you if a known "bug" (vulnerability) in one of those ingredients is actually dangerous in your specific situation. It's like a receipt saying "Contains Peanuts," but not telling you if the person eating it is actually allergic or if the peanuts are sealed in a jar they can't open.

The Solution:
Dr. Radanliev and his team propose extending the SBOM into an AIBOM (Agentic Artificial Intelligence Bill of Materials).

Think of an AIBOM not as a receipt, but as a team of smart, invisible bodyguards that follow your software around 24/7. They don't just list the ingredients; they watch how the software behaves, check if the ingredients are safe right now, and write a report that says, "Yes, there is a peanut, but it's sealed, so you are safe."

The Three "Bodyguards" (The Agents)

The paper describes a system with three specialized agents (smart software programs) that work together. Here is how they function:

1. MCP: The "Pre-Flight Inspector"

What it does: Before the software even starts running, this agent checks the "garage." It looks at the container, the operating system, and the list of ingredients to make sure everything is accounted for.
The Analogy: Imagine a pilot doing a pre-flight check. They don't just look at the manifest; they physically check the tires, the fuel, and the engine. If something is missing or looks weird, they stop the plane before it takes off.
Goal: To ensure the starting environment is perfect and complete.

2. A2A: The "Eyes on the Road"

What it does: While the software is running, this agent watches everything. It sees if the software suddenly downloads a new file, loads a strange library, or changes its behavior.
The Analogy: This is like a co-pilot watching the dashboard while you drive. If the engine starts making a weird noise or a warning light flickers that wasn't there before, the co-pilot shouts, "Hey, we have a problem!"
Goal: To catch "drift." Sometimes software changes its mind while running (like downloading a new plugin). This agent catches those changes in real-time.

3. AGNTCY: The "Judge and Jury"

What it does: This agent takes the list of bugs (vulnerabilities) and the observations from the other two agents to make a decision. It asks: "Is this bug actually dangerous right now?"
The Analogy: Imagine a security guard at a museum. A sign says "Doors are unlocked." The guard looks at the specific door. Is it actually open? Is there a security camera pointed at it? Is the alarm on?
- If the door is locked and the alarm is on, the guard says: "Not Affected." (The bug exists, but you are safe).
- If the door is open, the guard says: "Requires Review." (A person needs to look at this right away).
Goal: To stop panic. Instead of saying "We have 1,000 bugs! Shut it down!", it says, "We have 1,000 bugs, but 999 are locked up tight. Only 1 needs attention."

Why This Matters: The "VEX" Concept

The paper builds on an existing concept called VEX (Vulnerability Exploitability eXchange).

Old Way: "Warning! This software has a hole in it!" (Everyone panics, even if the hole is covered by a brick wall).
New Way (AIBOM): "Warning! This software has a hole in it, BUT our brick wall (security settings) is covering it, so it is safe."

The AIBOM uses international standards (like CSAF) to write these reports in a language that computers and regulators can both understand. It turns a scary "Bug List" into a helpful "Safety Report."

The Results: Did it Work?

The team tested this system in a controlled environment (like a high-security lab for medical data). They compared their "Smart Bodyguard" system against three existing reproducibility tools: ReproZip, SciUnit, and ProvStore.

Reproducibility: The AIBOM achieved a 98.6% reproducibility score, while the older tools scored only about 62-87%.
Speed: It didn't slow the computer down much (only about 4% extra CPU usage).
Trust: Because the agents write their reports using a standard format, auditors and regulators can trust the results without needing to be software experts.

The Bottom Line

This paper argues that in a world where software changes constantly and moves fast, a static list of ingredients isn't enough. We need active, intelligent monitoring.

By turning the "Bill of Materials" into a smart, reasoning system, we can stop wasting time worrying about bugs that aren't dangerous, and focus our energy on the real threats. It's the difference between having a list of every car in a parking lot and having a security system that knows exactly which cars are running, which are locked, and which one is actually being stolen.

Here is a detailed technical summary of the paper "SBOMs into Agentic AIBOMs: Schema Extensions, Agentic Orchestration, and Reproducibility Evaluation" by Petar Radanliev et al.

1. Problem Statement

The paper addresses the critical limitations of conventional Software Bills of Materials (SBOMs) in modern, dynamic software supply chains.

Static Nature: Traditional SBOMs (e.g., CycloneDX, SPDX) provide static inventories of dependencies but fail to capture runtime behavior, environment drift, or the specific context in which code is executed.
Exploitability Gap: They cannot distinguish between the presence of a vulnerable component and its actual exploitability within a specific runtime environment. This leads to high false-positive rates in vulnerability management (an estimated ~95% of SBOM-listed vulnerabilities are not exploitable in a given product).
Reproducibility Issues: In regulated environments (like Trusted Research Environments or TREs), unobserved software drift or misinterpreted vulnerabilities can invalidate analytical outputs, even if the code and data remain unchanged.
Lack of Automation: Manual patching and vulnerability assessment are infeasible due to the sheer volume of components and CVEs (over 200,000 disclosed), necessitating automated, context-aware solutions.

2. Methodology

The authors propose a shift from passive inventories to Agentic Artificial Intelligence Bills of Materials (AIBOMs). This framework integrates autonomous, policy-constrained reasoning agents into the SBOM lifecycle.

A. Agentic Architecture

The system employs a multi-agent architecture where specialized agents operate over distinct perception spaces:

MCP (Model-Container Profiler): A baseline environment reconstruction agent. It performs pre-execution capture, ensuring SBOM completeness (>95%) and detecting baseline drift before execution begins.
A2A (Agent-for-Agent Telemetry): A runtime dependency and drift-monitoring agent. It monitors live telemetry (dynamic imports, late-bound modules) to detect environment drift and determine if vulnerable code paths are actually executed.
AGNTCY (Governance and Policy Agent): A policy-aware vulnerability and VEX (Vulnerability Exploitability eXchange) reasoning agent. It synthesizes runtime evidence, dependency usage, and environmental mitigations to generate structured exploitability assertions.
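
The division of labour among the three agents can be pictured with a small rule-based sketch. This is an illustration under stated assumptions, not the paper's implementation: the `Finding` record, the function names, the placeholder CVE identifiers, and the use of plain sets to stand in for baseline capture and runtime telemetry are all hypothetical. Only the three status strings come from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One known vulnerability attached to a component (hypothetical record)."""
    cve_id: str
    component: str


def detect_drift(baseline: set[str], observed: set[str]) -> set[str]:
    """A2A-style check: components seen at runtime but absent from the baseline."""
    return observed - baseline


def assess(finding: Finding, executed: set[str], mitigated: set[str]) -> str:
    """AGNTCY-style rule: map runtime evidence to a VEX-like status string."""
    if finding.component not in executed:
        return "Not Affected"         # the vulnerable component never ran
    if finding.component in mitigated:
        return "Affected: Mitigated"  # it ran, but a control covers it
    return "Requires Review"          # it ran and no mitigation is recorded


# Baseline captured before execution (the MCP role in the summary).
baseline = {"pandas", "numpy"}
# Components observed in live telemetry (the A2A role in the summary).
observed = {"pandas", "numpy", "requests"}
drift = detect_drift(baseline, observed)

# Placeholder identifiers, not real CVEs.
findings = [
    Finding("CVE-EXAMPLE-1", "numpy"),
    Finding("CVE-EXAMPLE-2", "requests"),
    Finding("CVE-EXAMPLE-3", "lxml"),
]
statuses = {
    f.cve_id: assess(f, executed=observed, mitigated={"numpy"})
    for f in findings
}
```

In this toy run, `drift` contains only `requests`, and the three findings map to "Affected: Mitigated", "Requires Review", and "Not Affected" respectively. A real agent would need evidence at the level of code paths rather than whole components.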

B. Schema Extensions & Standards Alignment

Schema: The framework introduces minimal, standards-aligned extensions to CycloneDX and SPDX formats. These extensions capture execution context, dependency evolution timelines, and agent decision provenance.
VEX & CSAF: The system anchors agentic reasoning to ISO/IEC 20153:2025 (CSAF v2.0). Instead of simple enforcement, agents generate VEX assertions (e.g., "Not Affected," "Affected: Mitigated," "Requires Review") based on runtime evidence.
Cryptographic Binding: A composite hash ( $H = \text{SHA256}(\text{SBOM} + \text{Script} + \text{Config} + \text{Output})$ ) links the AIBOM artifact to the analytical output, ensuring forensic accountability.
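
The composite hash can be sketched with Python's standard `hashlib`. The summary gives only the formula, so the artifact order, the byte encoding, and the 8-byte length prefix in front of each part are assumptions made here for illustration. The prefix is one common way to keep the boundaries between concatenated parts unambiguous.

```python
import hashlib


def composite_hash(sbom: bytes, script: bytes, config: bytes, output: bytes) -> str:
    """SHA-256 over the four artifacts, each preceded by its 8-byte length."""
    digest = hashlib.sha256()
    for part in (sbom, script, config, output):
        # Length prefix (an assumption, not from the paper): without it,
        # ("ab", "c") and ("a", "bc") would hash to the same value.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


# Illustrative inputs only.
sbom = b'{"bomFormat": "CycloneDX"}'
script = b"print('fit model')"
config = b"seed=42"
h1 = composite_hash(sbom, script, config, b"coef=0.731")
h2 = composite_hash(sbom, script, config, b"coef=0.732")
```

Changing any one of the four inputs, here a single digit of the output, yields a different digest, which is what lets the AIBOM artifact be tied to one specific analytical result.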

C. Evaluation Strategy

The framework was evaluated in controlled, policy-bound environments (TREs) using:

Workloads: Logistic regression (R), data processing (Python/pandas), and disclosure control simulations (R/sdcMicro).
Benchmarks: Compared against ReproZip, SciUnit, and ProvStore.
Metrics:
- Exact Parity (EP): Byte-identical output match.
- Semantic Parity (SP): Numerical equivalence within tolerance ( $\epsilon$ ).
- Environment State Match: Fidelity of reconstructed dependencies.
- VEX Alignment: Consistency of exploitability assertions.
Ablation Studies: Agents were disabled one at a time to test whether each one is necessary.
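
The difference between the Exact Parity and Semantic Parity metrics listed above can be shown in a few lines. This is a minimal sketch: the function names, the default tolerance, and the choice of an absolute (rather than relative) tolerance are assumptions, since the summary states only that SP means numerical equivalence within $\epsilon$.

```python
import math


def exact_parity(a: bytes, b: bytes) -> bool:
    """EP: the two outputs are byte-for-byte identical."""
    return a == b


def semantic_parity(xs: list[float], ys: list[float], eps: float = 1e-9) -> bool:
    """SP: same length, and every pair of values differs by at most eps."""
    if len(xs) != len(ys):
        return False
    return all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=eps) for x, y in zip(xs, ys)
    )


# Two runs whose results differ only by floating-point rounding.
run_a = [0.1 + 0.2, 1.0]
run_b = [0.3, 1.0]

ep = exact_parity(repr(run_a).encode(), repr(run_b).encode())
sp = semantic_parity(run_a, run_b)
```

Here `ep` is `False` because the serialized outputs differ in their last digits, while `sp` is `True` because the values agree within the tolerance. This is the kind of case where the two metrics diverge.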

3. Key Contributions

Conceptual Shift: Formalizes the transition from static SBOMs to Agentic AIBOMs, transforming them into active, reasoning-capable provenance artifacts.
Multi-Agent Framework: Defines a concrete architecture (MCP, A2A, AGNTCY) that enables dynamic dependency discovery, real-time exploitability reasoning, and policy binding, which deterministic scripts cannot achieve.
Standards-Compliant Schema: Proposes schema extensions that preserve interoperability with existing SBOM generators while adding critical runtime context and agent decision traces.
VEX Integration: Demonstrates the generation of contextual VEX assertions grounded in runtime telemetry and ISO/IEC 20153:2025 semantics, moving beyond simple CVE listing.
Compliance Matrix: Maps AIBOM fields to regulatory requirements (GDPR, NIST SP 800-53, ISO/IEC 27001), providing a blueprint for automated compliance verification in regulated workflows.
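
To make the schema-extension contribution concrete, here is one way runtime context could ride along in a CycloneDX document without breaking existing tooling. It uses CycloneDX's standard per-component `properties` list of name/value pairs. The `aibom:` property names and every value below are hypothetical, chosen for illustration; the summary does not list the actual extension fields.

```python
import json

# A CycloneDX component carrying extra context in the standard "properties" list.
component = {
    "type": "library",
    "name": "pandas",
    "version": "2.2.0",
    "properties": [
        # Hypothetical property names; the paper's real field names are not given.
        {"name": "aibom:execution-context", "value": "tre-job"},
        {"name": "aibom:first-loaded", "value": "runtime"},
        {"name": "aibom:decided-by", "value": "AGNTCY"},
    ],
}

bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [component],
}

serialized = json.dumps(bom, indent=2)
property_names = [p["name"] for p in component["properties"]]
```

Because a consumer that does not recognize a property name can skip it, this style of extension is consistent with the interoperability claim made above.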

4. Results

Reproducibility: The Agentic AIBOM implementation achieved a 98.6% reproducibility score, significantly outperforming ReproZip (87.4%), SciUnit (73.2%), and ProvStore (61.8%).
Performance: The system introduced low computational overhead (~4% CPU, <300MB RAM), making it suitable for production environments.
Dependency Fidelity: Ablation studies showed that removing the A2A agent led to missed runtime dependencies and unavailable exploitability context, while removing MCP increased False Negative Rates (FNR) by 14% due to incomplete baseline capture.
VEX Accuracy: The system successfully generated contextual VEX assertions (e.g., distinguishing between "Not Affected" and "Mitigated") based on whether vulnerable code paths were executed, reducing noise in vulnerability reporting.
Cross-TRE Consistency: The framework enabled consistent computation and auditability across heterogeneous Trusted Research Environments (e.g., NHS Digital, University TREs).

5. Significance

Active Governance: The paper reframes the Bill of Materials from a passive inventory to an active cybersecurity object capable of issuing contextual vulnerability and reproducibility assertions.
Regulatory Readiness: By aligning with ISO/IEC 20153:2025 (CSAF v2.0) and integrating VEX, the framework provides machine-verifiable, regulator-ready evidence for software supply chain assurance.
Scalability: The low overhead and automated nature of the agentic approach address the scalability crisis in vulnerability management, where manual assessment of millions of components is impossible.
Future-Proofing: The architecture is designed to handle dynamic software ecosystems (late binding, federated services) and serves as a foundation for next-generation supply chain security, applicable beyond just research environments to critical infrastructure and CI/CD pipelines.

Limitations & Future Work:
The current implementation focuses on contextual VEX assertion generation; full policy-gated enforcement (automatically blocking jobs based on VEX status) and complete CSAF advisory lifecycle management are scoped as future work. Additionally, the system currently relies on rule-based agents, with potential for future learning-assisted decision support if explainability constraints are maintained.