Agentic AI-assisted coding offers a unique opportunity… — Plain-Language Explanation

Original authors: Magnus Palmblad, Jared M. Ragland, Benjamin A. Neely

Published 2026-04-24

📖 4 min read☕ Coffee break read

Original authors: Magnus Palmblad, Jared M. Ragland, Benjamin A. Neely

Original paper dedicated to the public domain under CC0 1.0 (http://creativecommons.org/publicdomain/zero/1.0/). ⚕️ This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content. Read full disclaimer

Imagine you've just hired a brilliant, hyper-fast, but slightly reckless apprentice to build you a custom house. This apprentice (the Agentic AI) can read your instructions and start laying bricks, wiring electricity, and painting walls at lightning speed. This is what the authors call "vibe coding": you tell the AI what you want ("I need a house with a blue door and a pool"), and it just goes for it.

The problem? The apprentice is incredibly talented at building, but they don't know the laws of physics or building codes. If you ask for a pool on the roof, they might build it exactly as requested, only for the whole house to collapse because gravity wasn't considered. In science, this is dangerous. If an AI writes code to analyze medical data but ignores the rules of statistics, it might produce a "cure" that doesn't actually work.

The Solution: The "Constitution" for AI

The authors propose a new file called GROUNDING.md. Think of this not as a to-do list, but as the Constitution or the Building Code for your specific field (in this case, proteomics, which is the study of proteins).

Here is how it works, using simple analogies:

1. The Hierarchy of Instructions

Usually, when you talk to an AI, you give it a plan. The authors suggest a hierarchy of documents, like a set of Russian nesting dolls:

plan.md (The To-Do List): "Build a house with a blue door." (This is temporary and changes every time).
AGENTS.md (The Project Rules): "Use red bricks and paint the trim white." (Specific to this job).
SKILL.md (The Toolbox): "Here is how to install a window." (General techniques).
GROUNDING.md (The Constitution): "NO HOUSE CAN BE BUILT WITHOUT A FOUNDATION."

The GROUNDING.md sits at the very bottom, the deepest layer. It is the Field-Scoped Epistemic Grounding. "Epistemic" is a fancy word for "knowledge about what is true." This document contains the non-negotiable truths of the scientific field.

2. Hard Constraints vs. Convention Parameters

The paper divides the rules in this "Constitution" into two types:

Hard Constraints (HCs): These are the Red Lines. They are like the laws of physics.
- Example: "You cannot calculate the probability of a protein match without using a specific safety check (False Discovery Rate)."
- Analogy: If the AI tries to build a wall without a foundation, the GROUNDING.md slams the brakes. It says, "STOP. This violates the laws of science. I will not build this." It overrides whatever the user asked for.
Convention Parameters (CPs): These are the Community Preferences.
- Example: "We usually use blue paint for the front door, but red is okay if you have a good reason."
- Analogy: If the AI uses red paint, it gets a gentle tap on the wrist: "Hey, we usually do blue, but okay, I'll note that." It warns the user but doesn't stop the work.

3. Why Do We Need This?

Currently, if a non-expert (a "research programmer") asks an AI to write complex scientific software, the AI might invent a new way to do things that looks cool but is scientifically wrong. It's like the apprentice inventing a new type of cement that dissolves in rain.

The GROUNDING.md ensures that even if the person asking for the software doesn't know the deep science, the software itself knows the rules. It acts as a guardrail.

Without it: The AI optimizes for "what the user wants" (e.g., "Make it fast!"), potentially breaking the science.
With it: The AI optimizes for "what is scientifically valid," even if the user didn't ask for it.

The Big Picture

The authors are saying: "We are entering an era where AI can write almost any code. But if we don't give the AI a 'Constitution' that encodes the hard-won wisdom of our scientific community, we will end up with a lot of fast, broken software."

By creating a GROUNDING.md file, the scientific community can say to the AI: "You are the genius builder, but you must follow our rules. If you try to break the rules, you must stop and ask for help."

This allows non-experts to build powerful, custom scientific tools with the confidence that the "foundation" is solid, ensuring that the final product is trustworthy, reproducible, and actually works in the real world. It turns the AI from a wild genius into a disciplined, rule-following master craftsman.

1. Problem Statement

The rapid adoption of "vibe coding" (high-level, intent-driven AI coding) and agent scaffolds (e.g., Claude Code, Cursor, Copilot) allows non-domain experts to generate bespoke scientific software. However, this democratization introduces a critical validity gap:

Epistemic Drift: AI agents may satisfy user intent while violating field-specific scientific invariants (e.g., incorrect False Discovery Rate calculations or uncontrolled modification searches in proteomics).
Context Engineering Limitations: Current context files (e.g., plan.md, AGENTS.md, SKILL.md) focus on workflow, style, or specific techniques but lack mechanisms to enforce non-negotiable domain validity constraints.
The Risk: Without formalized constraints, agentic AI could produce scientifically invalid software, leading to fragmentation of best practices and the "reinvention of the wheel" with errors.

2. Methodology

The authors propose a new layer in the context engineering hierarchy called GROUNDING.md, a community-governed, field-scoped epistemic grounding document.

A. The `GROUNDING.md` Framework

This document is designed to be loaded into agent scaffolds with the highest priority (via system prompts) to override conflicting instructions from user plans or project rules. It encodes two distinct types of constraints:

Hard Constraints (HCs): Non-negotiable validity invariants empirically required for scientific correctness (e.g., "FDR must be $\le$ 0.01 via target-decoy"). Violations trigger an error/refusal.
Convention Parameters (CPs): Community-agreed defaults (e.g., "use label-free intensity for quantification"). Violations trigger a warning but allow flexibility, enabling the field to evolve best practices without sacrificing core validity.

B. Hierarchy of Context

The paper establishes a hierarchy of context stability and authority (Table 1):

Session/Task (plan.md): Ephemeral, user intent.
Project (AGENTS.md): Persistent project rules.
Technique (SKILL.md): Reusable method steps.
Field (GROUNDING.md): Invariant, field-scoped. This layer is the most authoritative and constrains all layers below it.

C. Implementation Strategy

Format: Compact natural language (human-readable for governance) but structured for agent consumption.
Enforcement: The document acts as a "proteomics Code of Hammurabi," explicitly defining invariants, conventions, and failure modes.
Testing Protocol: The authors tested the concept using Claude Code with a Nemotron model. They employed an "adversarial" setup where a CLAUDE.md file instructed the AI to ignore scientific validity, while the GROUNDING.md enforced it. They utilized system prompts (rather than XML tags) to ensure GROUNDING.md took precedence due to context primacy bias.

3. Key Contributions

Conceptual Innovation: Introduction of the GROUNDING.md file type, distinguishing it from existing context files by its focus on epistemic validity rather than workflow or style.
Technical Mechanism: A formalized method to separate Hard Constraints (validity) from Convention Parameters (best practices), allowing the scientific community to update standards (CPs) without breaking the core validity of generated code (HCs).
Domain Application: A draft Proteomics GROUNDING.md is provided as a proof-of-concept, covering functional correctness, algorithmic efficiency, interoperability, and testing/validation standards specific to mass spectrometry.
Governance Model: A proposal for community-governed documents that act as a "contract" between domain experts and AI agents, ensuring that even non-expert developers produce software adhering to consensus standards (e.g., HUPO-PSI guidelines).

4. Results (Preliminary Testing)

The authors conducted preliminary "proof of principle" testing with the following outcomes:

Adversarial Resistance: When an adversarial prompt (CLAUDE.md) instructed the AI to ignore scientific validity, the GROUNDING.md (loaded via system prompt) successfully overrode the instruction.
Explicit Refusal: The agent explicitly refused to generate non-compliant code, cited the specific Hard Constraint from GROUNDING.md, and explained the scientific invalidity of the requested approach.
Implementation Nuance: The study found that system prompts were more effective than XML tagging for enforcing priority. Nesting GROUNDING.md in skill folders was found to be ineffective as it subordinated the field rules to method-specific rules.
Success Criteria: Success was defined as the agent refusing to generate code that violated HCs, rather than successfully generating the code.

5. Significance and Future Implications

Democratization with Safety: Enables non-domain experts to generate high-quality, bespoke scientific software by "baking in" best practices at the ground level, reducing the barrier to entry while maintaining scientific rigor.
Shift in Bottlenecks: As AI models approach a "country of geniuses" in coding capability, the bottleneck shifts from code generation to validity assurance. GROUNDING.md addresses this shift directly.
Reproducibility and Trust: By enforcing self-documentation of versions, packages, and the specific GROUNDING.md commit used, the approach enhances the reproducibility and portability of AI-generated tools.
Scalability: While currently demonstrated in proteomics, the framework is applicable to any domain requiring strict epistemic standards (e.g., biostatistics, systems biology) and could be adopted by organizations like HUPO-PSI or FAIR4RS.

Conclusion: The paper argues that for agentic AI to be a reliable partner in scientific discovery, it must be anchored by explicit, community-validated epistemic documents. GROUNDING.md serves as this anchor, ensuring that the "great power" of agentic coding comes with "great responsibility" regarding scientific validity.

Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development