Criteria-first, semantics-later: reproducible structure discovery in image-based sciences

The Big Idea: Stop Guessing the Name, Start Measuring the Shape

Imagine you are a detective looking at a blurry photo of a crime scene.

The Old Way (Semantics-First):
Right now, most image-analysis pipelines look at an image and immediately try to guess what it is. "Is that a dog? Is that a car? Is that a tumor?" Their models are trained to recognize specific labels (like "dog" or "tumor") taken from a fixed, predefined list.

The Problem: What if the "dog" is actually a wolf? What if the "tumor" looks different because the camera changed? What if, in 10 years, doctors decide to rename that type of cell? The whole system breaks because it was too obsessed with the name of the thing, not the thing itself.

The New Way (Criteria-First, Semantics-Later):
This paper proposes flipping the script. Instead of guessing the name first, the computer should first figure out the shape, boundaries, and structure of the objects based on explicit, inspectable rules (criteria).

The Solution: The computer says, "I don't know if this is a dog or a wolf yet. But I do know there is a distinct, round, furry shape here with a sharp edge against the grass."
The Benefit: Once the computer has mapped out the "furry shape," then humans can decide what to call it. If the definition of "dog" changes later, or if we want to call it a "wolf," we just change the label. The underlying map of the shape stays the same.

The "Lego" Analogy

Think of image analysis like building with Lego bricks.

The Old Way (Semantics-First): You try to build a specific model (like a "Spaceship") immediately. You only look for red bricks because the instructions say "Spaceships need red bricks." If you find a blue brick with exactly the same shape as a red one, you ignore it. If the instructions change tomorrow to say "Spaceships need blue bricks," you have to tear down your whole building and start over.
The New Way (Criteria-First): You first sort all the bricks by their physical properties: "This is a 2x4 brick," "This is a smooth plate," "This has a stud on top." You build a stable, solid base structure based on these physical rules.
- Why this is better: Once you have the stable base, you can decide later: "Okay, today we are calling this a Spaceship." Next year, if the rules change, we can say, "Actually, this is a Castle." The bricks and the base didn't change; only the story we tell about them changed.

The "Digital Twin" Problem

The paper pays particular attention to Digital Twins (virtual copies of real-world things, like a forest or a human heart) and Long-term Monitoring.

Imagine you are watching a forest over 50 years.

The Old Way: You label every tree as "Oak" or "Pine." But over 50 years, scientists might realize that what we thought was an "Oak" is actually two different species, or the definition of "Pine" changes. Suddenly, your 50-year data is a mess because the labels don't match anymore. You can't compare last year's data to this year's.
The New Way: You don't label them "Oak" or "Pine." Instead, you use a strict rule to say, "Here is a distinct, tall, green object with a trunk." You record the shape and size of that object.
- Now, 50 years later, even if the names of the trees have changed, you can still compare the shapes. You can see, "The tall green objects are getting smaller." The data remains stable and useful, regardless of how we name the trees.

The "Translation" Metaphor

Think of the image data as a foreign language that no one speaks yet.

Semantics-First is like trying to translate a sentence word-for-word into English immediately. If the grammar of the foreign language changes slightly, your translation becomes nonsense.
Criteria-First is like first identifying the syntax and grammar rules of the foreign language (where the nouns go, where the verbs go). Once you understand the structure of the sentence, you can translate it into English, French, or Spanish later. If the meaning of a specific word changes, you just update the translation dictionary; you don't have to re-analyze the whole sentence structure.

Why This Matters for Science

Reproducibility: If two scientists apply the same explicit "rules" (criteria) to the same data with the same deterministic procedure, they get the same structural result, even if they are in different countries or prefer different labels.
Future-Proofing: Science changes. New discoveries happen. If we lock our analysis into specific labels (like "Type A Cancer"), we get stuck when we discover "Type A" was actually wrong. If we lock it into "stable shapes," we can adapt to new discoveries easily.
AI Readiness: This makes data "AI-ready." AI can learn from the stable shapes without needing to be retrained every time a human decides to change a label.

Summary

The paper argues that we are too obsessed with naming things in science. Instead, we should focus on measuring the structure of things first using clear, explicit rules.

Old: "I see a dog." (What if it's not a dog?)
New: "I see a distinct, four-legged, furry shape." (This is true regardless of what we call it.)

By separating the structure (the shape) from the semantics (the name), we make science more stable, more reproducible, and ready for the future.

1. Problem Statement

The paper identifies a fundamental limitation in current image-based scientific analysis (spanning fields like remote sensing, medical imaging, microscopy, and astronomy): the dominance of the "semantics-first" paradigm.

The Current State: Analysis pipelines typically map raw image measurements directly to a predefined domain ontology or label set (e.g., classifying pixels as "forest," "tumor," or "cell type"). Success is evaluated based on agreement with these labels.
The Failure Mode: This approach fails under conditions where image-based science is most critical:
- Long-term monitoring: Domain ontologies and label sets drift culturally, institutionally, and ecologically over time.
- Cross-sensor/site variability: Changes in sensors, illumination, or geography cause "domain shift," breaking models trained on specific labels.
- Open-ended discovery: New phenomena cannot be detected if they do not fit into the pre-existing label space.
The Core Issue: Semantics are not intrinsic properties of an image but are community-specific interpretive schemes. By baking these contingent ontologies into the upstream analytic layer, current pipelines conflate structure recovery (finding patterns in data) with meaning assignment (labeling those patterns). This reduces transferability, reproducibility, and stability.

2. Methodology: The "Criteria-First, Semantics-Later" Framework

The author proposes a deductive inversion of the standard pipeline. Instead of predicting labels, the analysis should first extract a semantics-free structural product based on explicit, inspectable criteria, and only then map that structure to domain-specific meanings.

A. Formal Framework

The approach is defined by a minimal formal setup:

Measurement Field ($X$): The raw data (pixels, voxels, point clouds) defined over a carrier set $\Omega$.
Explicit Criterion ($C$): A fully specified, inspectable object (functional, constraints, or energy function) that defines how distinctions are drawn. Examples include stability under perturbation, scale coherence, boundary evidence, or global consistency.
Structure Extraction Operator ($S_C$): A deterministic procedure that maps $X$ to a structural product $S$ using $C$:
$S = S_C(X)$
Structural Product ($S$): The output is a domain-agnostic entity such as partitions, graphs, hierarchies, or scalar fields. It is reproducible by design because it depends only on $X$ and $C$, not on a label set.
Downstream Semantic Mapping ($M_i$): A separate, reversible step where the structural product $S$ is mapped to a domain ontology $O_i$:
$M_i: S \rightarrow O_i$
This allows multiple, evolving semantic interpretations (pluralism) to coexist on the same stable structural foundation without rewriting the upstream extraction.
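
The formal setup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the criterion (join neighbouring pixels whose values differ by at most `tau`), the function names, and the two toy ontologies are all hypothetical. It shows one structural product $S$ being computed once and then read through two different semantic mappings $M_i$.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Grid = List[List[float]]
LabelMap = List[List[int]]


@dataclass(frozen=True)
class Criterion:
    """Hypothetical explicit criterion C: two 4-neighbours belong to the
    same region when their values differ by at most tau."""
    tau: float


def extract_structure(x: Grid, c: Criterion) -> LabelMap:
    """S = S_C(X): deterministic region labelling (flood fill, raster order)."""
    rows, cols = len(x), len(x[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for q in range(cols):
            if labels[r][q] != -1:
                continue  # already assigned to an earlier region
            labels[r][q] = next_label
            stack = [(r, q)]
            while stack:
                i, j = stack.pop()
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and labels[ni][nj] == -1
                            and abs(x[ni][nj] - x[i][j]) <= c.tau):
                        labels[ni][nj] = next_label
                        stack.append((ni, nj))
            next_label += 1
    return labels


def describe(x: Grid, labels: LabelMap) -> Dict[int, Tuple[float, int]]:
    """Label-free descriptors per region: (mean value, area in pixels)."""
    sums: Dict[int, float] = {}
    areas: Dict[int, int] = {}
    for row_x, row_l in zip(x, labels):
        for value, region in zip(row_x, row_l):
            sums[region] = sums.get(region, 0.0) + value
            areas[region] = areas.get(region, 0) + 1
    return {region: (sums[region] / areas[region], areas[region]) for region in sums}


def map_semantics(desc: Dict[int, Tuple[float, int]],
                  rule: Callable[[float, int], str]) -> Dict[int, str]:
    """M_i: S -> O_i, applied after extraction and replaceable at any time."""
    return {region: rule(mean, area) for region, (mean, area) in desc.items()}


X = [
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1],
]
S = extract_structure(X, Criterion(tau=0.2))
descriptors = describe(X, S)

# Two hypothetical ontologies read the same structural product differently.
by_brightness = map_semantics(descriptors, lambda mean, area: "bright" if mean > 0.5 else "dark")
by_size = map_semantics(descriptors, lambda mean, area: "large" if area >= 6 else "small")
```

Replacing either lambda changes only the labels; `S` and `descriptors` are not recomputed, which is the separation the framework asks for.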

B. Theoretical Foundations

The methodology is grounded in:

Cybernetics & Observation-as-Distinction: Observation is an active operation of drawing distinctions. Semantics is the interpretive scheme that makes these distinctions communicable.
Information Theory: Following Shannon, the separation of information (structure) from meaning allows for the extraction of stable patterns (uncertainty reduction) independent of specific semantic labels.
Least-Commitment Principle: Postponing irreversible semantic commitments until after stable intermediate descriptions are computed.

3. Key Contributions

1. Conceptual Inversion

The paper formally argues for decoupling measurement-to-structure operations from structure-to-meaning operations. It posits that reproducibility resides in the explicit criteria ($C$) and the stability of the structural product ($S$), not in the alignment with a specific label set.

2. Unifying Cross-Domain Framework

The author demonstrates that this "criteria-first" pattern already exists implicitly in various fields but is often obscured by label-centric reporting. Table 1 in the paper synthesizes this across domains:

Earth Observation: Using spectral homogeneity and scale coherence to create regions before land-cover classification.
Medical Imaging: Extracting stable boundaries and organ contours via variational methods before assigning pathological diagnoses.
Seismology: Identifying coherent reflectors and faults via signal coherence before mapping them to stratigraphic units.
Robotics (SLAM): Building geometric maps via reprojection consistency before adding semantic labels (e.g., "room," "table").

3. New Validation Paradigm

The paper proposes shifting validation metrics away from "class accuracy" toward structural adequacy. Five evidence classes are defined for evaluating structural products:

Robustness: Stability under noise, sensor drift, and illumination changes.
Scale Coherence: Consistency across different resolutions and scale spaces.
Complexity Control: Preference for shorter descriptions (compressibility) that preserve salient regularities.
Global Optimality: Solutions derived from well-defined global criteria rather than ad-hoc heuristics.
Downstream Pluralism: The capacity of a single $S$ to support multiple, valid semantic mappings.
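
As one possible reading of the Robustness evidence class, the sketch below scores how well a partition survives a contrast change. The choices here are assumptions made for illustration, not the paper's metric: a hypothetical jump-threshold criterion on a 1D signal, and the Rand index (fraction of sample pairs on which two partitions agree) as the agreement score.

```python
from itertools import combinations
from typing import List


def segment_1d(signal: List[float], tau: float) -> List[int]:
    """Hypothetical criterion: start a new segment wherever the jump between
    neighbouring samples exceeds tau."""
    labels = [0]
    for prev, cur in zip(signal, signal[1:]):
        labels.append(labels[-1] + (1 if abs(cur - prev) > tau else 0))
    return labels


def rand_index(a: List[int], b: List[int]) -> float:
    """Fraction of sample pairs on which two partitions agree
    (grouped together in both, or separated in both)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(signal: List[float], tau: float, gain: float, offset: float) -> float:
    """Agreement between the partition of the original signal and the
    partition of a contrast-perturbed copy (gain * x + offset)."""
    perturbed = [gain * v + offset for v in signal]
    return rand_index(segment_1d(signal, tau), segment_1d(perturbed, tau))


signal = [0.0, 0.0, 0.1, 1.0, 1.0, 0.9, 0.2, 0.2]
mild = stability(signal, tau=0.3, gain=0.8, offset=0.05)    # both boundaries survive
severe = stability(signal, tau=0.3, gain=0.4, offset=0.05)  # the weaker boundary is lost
```

No class labels appear anywhere in the score: the partition is compared only with itself under perturbation, which is what distinguishes structural adequacy from class accuracy.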

4. FAIR Digital Objects for Digital Twins

The paper redefines structural products as AI-ready, FAIR (Findable, Accessible, Interoperable, Reusable) digital objects.

They should be versioned, citable, and machine-actionable.
They serve as stable "state variables" for digital twins, ensuring that long-term monitoring remains comparable even as domain ontologies evolve.
Metadata for these objects must explicitly declare the criterion $C$, software versions, and stability envelopes.
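
A metadata record of this kind might look like the following sketch. The field names, the identifier string, and the envelope format are hypothetical placeholders and do not follow any published FAIR schema. The only points being illustrated are that the criterion, software versions, and stability envelope travel with the product, and that the product itself gets a deterministic fingerprint.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class StructuralProductRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str                             # citable identifier (placeholder)
    version: str                                # version of this structural product
    criterion: Dict[str, Any]                   # the explicit criterion C and its parameters
    software: Dict[str, str]                    # software name -> version
    stability_envelope: Dict[str, List[float]]  # perturbation -> [min, max] tested range
    content_sha256: str                         # fingerprint of the structural product S


def fingerprint(labels: List[List[int]]) -> str:
    """Deterministic hash of a label map: identical S gives an identical digest."""
    payload = json.dumps(labels, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def to_json(record: StructuralProductRecord) -> str:
    """Machine-actionable serialization with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


S = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
record = StructuralProductRecord(
    identifier="example:structural-product/0001",  # placeholder, not a real identifier scheme
    version="1.0.0",
    criterion={"name": "neighbour-difference", "tau": 0.2},
    software={"extractor": "0.1.0"},
    stability_envelope={"contrast_gain": [0.8, 1.2]},
    content_sha256=fingerprint(S),
)
serialized = to_json(record)
```

Note that no label set appears in the record; semantic mappings would be versioned separately and would refer to the product by its identifier or digest.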

4. Results and Evidence

While the paper is primarily a theoretical and methodological proposal, it supports its argument with:

Synthetic Demonstrations: Figure 2 illustrates how a criteria-first approach yields stable object boundaries across perturbations (contrast changes, downsampling), whereas semantics-first labeling collapses or becomes inconsistent under the same shifts.
Cross-Domain Survey: The paper reviews literature across eight domains (Earth Observation, Medical Imaging, Microscopy, Seismology, Astronomy, Materials Science, 3D Sensing, Robotics). It shows that in all these fields, when labels are scarce, unstable, or too expensive, practitioners naturally revert to criteria-first methods (e.g., unsupervised clustering, variational segmentation, self-supervised learning) to extract structure before applying semantics.
Self-Supervised Learning (SSL) Alignment: The paper notes that modern SSL and foundation models (e.g., DINOv2, SAM) can be interpreted as powerful implementations of criteria-first structure extraction, provided they are used to generate structural products rather than merely to amplify existing labels.

5. Significance and Impact

Reproducibility: By making the "theory" (the criteria) explicit and inspectable rather than implicit in a label set, scientific results become reproducible across different communities and time periods.
Long-term Monitoring: It addresses the "ontology drift" problem. Digital twins and environmental monitoring systems can maintain stable state variables ($S$) while updating their semantic interfaces ($M_i$) as scientific understanding evolves.
Open-Ended Discovery: It enables the detection of novel phenomena. Instead of failing to classify an image because it doesn't fit a label, the system can flag it as a "structural deviation" from the stable regime, prompting scientific inquiry.
Infrastructure: It calls for a shift in research infrastructure: developing schemas for structural products, standardizing stability metrics, and treating structural extraction as a first-class research output rather than a preprocessing step.
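
The "structural deviation" idea in the Open-Ended Discovery point could be operationalized in many ways. The sketch below is one deliberately simple assumption, not something the paper specifies: track a single label-free descriptor of the structural product over time (here, a hypothetical region count per acquisition) and flag a new value that lies more than `k` sample standard deviations from the historical mean.

```python
from statistics import mean, stdev
from typing import Sequence


def is_structural_deviation(history: Sequence[float], new_value: float, k: float = 3.0) -> bool:
    """Flag new_value when it lies more than k sample standard deviations
    from the mean of the historical descriptor values."""
    if len(history) < 2:
        raise ValueError("need at least two historical values to estimate spread")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0.0:
        # A perfectly constant history: any change at all is a deviation.
        return new_value != mu
    return abs(new_value - mu) > k * sigma


# Hypothetical region counts from earlier acquisitions of the same site.
region_counts = [12, 11, 13, 12, 12, 11, 13]

within_regime = is_structural_deviation(region_counts, 13)  # inside the usual spread
outside_regime = is_structural_deviation(region_counts, 20)  # far outside it
```

The flag says nothing about what the new structure is; it only marks the acquisition for inspection, leaving the naming to a later semantic step.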

In summary, the paper argues that structure precedes semantics. To achieve robust, reproducible, and long-term scientific progress in image-based sciences, the community must prioritize the extraction of stable, criterion-defined structures and treat semantic labeling as a flexible, downstream application.