Correlation of divergency: c-delta. Being different in a similar way or not

Imagine you are a music critic trying to understand two different bands.

The Old Way (Pearson Correlation):
Usually, when statisticians compare two groups of data, they ask: "Do these two things move together?"
If Band A plays a loud note, does Band B also play a loud note? If Band A plays a soft note, does Band B play a soft note? This is like checking if two people are walking in step. If they are, they are "correlated."

The New Way (The $c\delta$ Coefficient):
Johan Hoorn's new paper introduces a different question. Instead of asking if the notes match, he asks: "Do the bands have the same 'personality' regarding how they vary?"

He calls this the Correlation of Divergency ( $c\delta$ ).

Here is the simple breakdown of what this paper is about, using everyday analogies:

1. The Core Idea: "Being Different in the Same Way"

Imagine you have two groups of people:

Group X: The scores of a class of students on a math test.
Group Y: The scores of the same students on a history test.

Standard Correlation would ask: "Did the student who got an 'A' in Math also get an 'A' in History?" (Are the specific scores linked?)

The $c\delta$ Coefficient asks a stranger question:

Look at Student #1 in Math. They are an outlier; they are very different from everyone else in the class (maybe they got a 99% while everyone else got 50%).
Now look at Student #1 in History. Are they also an outlier there? Are they very different from the other history students?
If Student #1 is the "weirdo" in both classes, and Student #5 is the "average Joe" in both classes, then the pattern of difference is the same.

The Analogy:
Think of two different forests.

Forest A has one giant redwood tree that towers over everything else, while the rest are small saplings.
Forest B has one giant oak tree that towers over everything else, while the rest are small saplings.

The trees aren't the same species (the values aren't the same), and they aren't growing in the same spot. But the structure of the forest is identical: both have one "loud" outlier and a bunch of "quiet" normal trees. The $c\delta$ coefficient measures this structural similarity. It answers: "Are these two groups 'different' in the same way?"

2. How It Works (The Recipe)

The paper explains a mathematical recipe to calculate this:

Look at one person in Group X.
Measure how far away they are from everyone else in Group X. (Are they an outlier? Or are they right in the middle?)
Do the exact same thing for the matching person in Group Y.
Compare the two measurements. If Person #1 is "far from the crowd" in both groups, that's a match. If Person #1 is "far from the crowd" in Group X but "right in the middle" in Group Y, that's a mismatch.
Do this for everyone and average the results.

3. Why Do We Need This?

The author argues that standard tools (like Pearson's correlation) miss a huge part of the story.

Quantum Physics: Imagine comparing two quantum machines. They might produce different numbers, but if the way those numbers scatter is identical, the machines might be working on the same principle. $c\delta$ can spot that hidden similarity.
Genetics: Maybe two different species of birds have different beak sizes overall, but the pattern of variation (some huge, some tiny, some average) is exactly the same. This suggests a shared evolutionary pressure.
Quality Control: If Machine A and Machine B both produce parts, but Machine A's "bad" parts are the ones that are wildly different, and Machine B's "bad" parts are also the wildly different ones, $c\delta$ tells you they have the same "flaw pattern," even if the actual sizes are different.

4. The Catch (Limitations)

The paper is very honest about the flaws of this new tool:

It can't say "No": Standard correlation can be negative (meaning "when one goes up, the other goes down"). $c\delta$ is never negative. It can tell you the patterns are similar, but on its own it can't tell you if they are "opposite" (like a mirror image). The author suggests a fix: run a standard correlation on the "divergence scores" to check the direction.
It hates outliers: Because it uses "squared differences" (each distance is multiplied by itself, so large gaps count far more than small ones), a single extreme number can dominate the whole result. It's as if one student in a class scored 1,000,000 on a test; it would make the whole class look "divergent." The author suggests using a "robust" version (using absolute differences) if the data is messy.
It's hard to compare: A score of "5" in one study might mean something totally different than a score of "5" in another. You have to normalize it (scale it) to make sense of it.

5. The Bottom Line

This paper introduces a new statistical lens.

Old Lens: "Do these two things move together?"
New Lens ( $c\delta$ ): "Do these two groups have the same 'shape' of variety?"

It's a tool for when you want to know if two groups are structurally similar in how they spread out, even if the actual numbers they produce are completely different. It's like realizing two different jazz bands are improvising in the exact same chaotic style, even if they are playing different instruments.

Here is a detailed technical summary of the paper "Correlation of Divergency: $c\delta$ . Being Different in a Similar Way or Not" by Johan F. Hoorn (2026).

1. Problem Statement

Traditional statistical correlation coefficients (e.g., Pearson's $r$ , Spearman's $\rho$ ) measure the linear or monotonic association between paired values ( $x_i$ and $y_i$ ). However, there is a distinct methodological gap in quantifying whether the internal structure of variability (divergence patterns) within one dataset mirrors that of another.

Existing metrics like Energy Distance, Maximum Mean Discrepancy (MMD), or Quantum Fidelity compare distributions or state distinguishability but do not specifically assess whether the pattern of how individual data points deviate from their group mean (or other group members) is similar across two paired groups. The author posits a need for a measure to answer: "Are these two groups 'different in the same way'?" This is particularly relevant in fields like quantum physics (comparing spread of measurement outcomes), genetics (comparing divergence patterns between species), and machine learning (benchmarking variability structures).

2. Methodology: The $c\delta$ Coefficient

The paper introduces the Correlation of Divergency ( $c\delta$ ), a custom statistic designed to measure the similarity of internal divergence patterns between two paired groups of values, $X$ and $Y$ , each of size $n$ .

Core Calculation Steps

The method computes three quantities in turn and then combines them into a single ratio:

Calculate Individual Divergence Magnitudes ( $D_{x,i}$ and $D_{y,i}$ ):
For every data point $i$ in a group, calculate its root mean square (RMS) distance from all other points in that same group.
$D_{x,i} = \sqrt{\frac{1}{n-1} \sum_{j \neq i} (x_i - x_j)^2}$
$D_{y,i} = \sqrt{\frac{1}{n-1} \sum_{j \neq i} (y_i - y_j)^2}$
Note: An absolute-difference variant (L1 norm) is also proposed to improve robustness against outliers.
Compute the Numerator (Signal):
Calculate the sum of the cross-products of these divergence magnitudes across the paired groups. This assesses if high divergence in $X$ corresponds to high divergence in $Y$ .
$\text{Numerator} = \sum_{i=1}^{n} (D_{x,i} \cdot D_{y,i})$
Compute the Denominator (Noise/Normalization):
Calculate the product of the average internal divergences for both groups to ensure scale invariance.
$\text{Denominator} = \bar{D}_x \cdot \bar{D}_y = \left( \frac{1}{n}\sum_{i=1}^{n} D_{x,i} \right) \cdot \left( \frac{1}{n}\sum_{i=1}^{n} D_{y,i} \right)$
Final Formula:
$c\delta = \frac{\sum_{i=1}^{n} (D_{x,i} \cdot D_{y,i})}{\bar{D}_x \cdot \bar{D}_y}$
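The calculation steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the formulas as written in this summary, not the author's reference implementation; the function names `divergences` and `c_delta` are hypothetical, and raising an error for zero-variance input is an assumption about how the undefined case might be handled.

```python
import math


def divergences(values):
    """RMS distance of each point from all other points in the same group."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values per group")
    return [
        math.sqrt(
            sum((v - w) ** 2 for j, w in enumerate(values) if j != i) / (n - 1)
        )
        for i, v in enumerate(values)
    ]


def c_delta(x, y):
    """Sum of paired divergence products over the product of mean divergences."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    dx, dy = divergences(x), divergences(y)
    n = len(x)
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = (sum(dx) / n) * (sum(dy) / n)
    if denominator == 0:
        # Assumption: treat the zero-variance case as an error.
        raise ValueError("c-delta is undefined when a group has zero variance")
    return numerator / denominator


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [2.0, 4.0, 6.0]  # y is x doubled, so its divergences are doubled too
    print(c_delta(x, y), c_delta(x, x))
```

Because the numerator is a sum over all $n$ pairs while the denominator uses means, the raw value produced by this formula grows with $n$, which is one reason raw scores from samples of different sizes are hard to compare.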

Extensions and Variants

Absolute Difference Variant: Replaces squared differences with absolute differences ( $|x_i - x_j|$ ) to reduce sensitivity to outliers (analogous to Gini Mean Difference).
Complex/Quantum Extension: Theoretically extendable to complex numbers (using squared modulus) or quantum density matrices (using trace distance or Hilbert-Schmidt distance), though the author notes this requires rigorous justification regarding contractivity properties.
Directionality Correction: Since $c\delta$ is non-negative by construction, it cannot distinguish between similar and inverted divergence patterns. The author proposes calculating a secondary Pearson/Spearman correlation between the vectors $D_x$ and $D_y$ to determine directionality.
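The absolute-difference variant and the directionality check might look like the sketch below. It assumes the L1 variant is the mean absolute difference from the other points (the exact normalization is not given in this summary), and it uses a hand-written Pearson correlation between the two divergence vectors. The names `divergences`, `pearson`, and the `robust` flag are hypothetical.

```python
import math


def divergences(values, robust=False):
    """Per-point divergence from the rest of the group.

    robust=False: RMS distance (squared differences).
    robust=True:  mean absolute distance (assumed form of the L1 variant).
    """
    n = len(values)
    result = []
    for i, v in enumerate(values):
        others = [w for j, w in enumerate(values) if j != i]
        if robust:
            result.append(sum(abs(v - w) for w in others) / (n - 1))
        else:
            result.append(math.sqrt(sum((v - w) ** 2 for w in others) / (n - 1)))
    return result


def pearson(a, b):
    """Plain Pearson correlation; no guard for constant input."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 10.0]   # the last point is the outlier
    y = [10.0, 0.0, 1.0, 2.0, 3.0]   # the first point is the outlier
    dx = divergences(x, robust=True)
    dy = divergences(y, robust=True)
    # A negative sign flags that the "far from the crowd" points do not line up.
    print(pearson(dx, dy))
```

In this toy data the outlier sits at a different index in each group, so the correlation between the divergence vectors comes out negative even though $c\delta$ itself would still be positive.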

3. Key Contributions

Novel Statistical Metric: Introduces $c\delta$ as a distinct measure focusing on structural similarity of dispersion rather than value association.
Scale Invariance: The normalization by average RMS divergence ensures the metric is invariant to linear scaling of the data.
Comparative Framework: Provides a comprehensive comparison (Table 1) between $c\delta$ and established metrics (Pearson, Spearman, GMD, Energy Distance, MMD, KL Divergence, Quantum Fidelity), highlighting that $c\delta$ occupies a unique niche.
Inference Framework: Proposes permutation testing and bootstrapping (BCa method) as the primary methods for hypothesis testing and confidence interval estimation, as no closed-form null distribution exists.
Normalization Strategy: Proposes rescaling $c\delta$ by a sample-specific maximum ( $c\delta_{max}$ , calculated by comparing a set to itself) to create a bounded index $[0, 1]$ for interpretability, while acknowledging the limitations of cross-study comparability.
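A permutation test of the kind mentioned above could be sketched as follows. The details are assumptions rather than the paper's procedure: the pairing is broken by shuffling one group's divergences (equivalent to shuffling that group's values, since each divergence depends only on the point and the rest of its group), the test is one-sided, and the p-value uses the common add-one correction. The function names and the `n_perm` and `seed` parameters are hypothetical.

```python
import math
import random


def divergences(values):
    """RMS distance of each point from all other points in the same group."""
    n = len(values)
    return [
        math.sqrt(
            sum((v - w) ** 2 for j, w in enumerate(values) if j != i) / (n - 1)
        )
        for i, v in enumerate(values)
    ]


def c_delta_from_divergences(dx, dy):
    """c-delta computed from two precomputed divergence vectors."""
    n = len(dx)
    return sum(a * b for a, b in zip(dx, dy)) / ((sum(dx) / n) * (sum(dy) / n))


def permutation_test(x, y, n_perm=2000, seed=0):
    """Return (observed c-delta, one-sided permutation p-value)."""
    rng = random.Random(seed)
    dx, dy = divergences(x), divergences(y)
    observed = c_delta_from_divergences(dx, dy)
    shuffled = list(dy)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between the groups
        # Small tolerance so ties caused by rounding still count as hits.
        if c_delta_from_divergences(dx, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
    y = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 40.0]
    print(permutation_test(x, y))
```

Shuffling leaves the denominator unchanged, so only the sum of cross-products varies across permutations.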

4. Results and Properties

Range: Theoretically $[0, \infty)$ . A value near 0 indicates dissimilar divergence patterns; a high value indicates similar patterns.
Self-Similarity: When $X = Y$ , $c\delta$ reaches its maximum possible value for that specific dataset ( $c\delta_{max}$ ).
Robustness: The standard squared-difference version has a breakdown point near 0 and is highly sensitive to outliers (quadratic influence function). The proposed absolute-difference (L1) variant offers significantly better robustness.
Small Sample Behavior: The metric is undefined for $n=1$ and unstable for $n < 10$ .
Limitations:
- Cannot produce negative values (cannot detect inverse divergence patterns without the proposed directional supplement).
- Undefined if one group has zero variance (all values identical).
- No universal upper bound without empirical rescaling.
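Some of these properties can be checked numerically against the formula as stated in this summary. The sketch below is illustrative only: `divergences` and `c_delta` are hypothetical names, and raising `ValueError` for zero variance is an assumed convention. It checks that shifting and rescaling either group leaves the value unchanged, that for $n = 2$ the formula returns 2 whatever the data are (one way to see why very small samples are uninformative), and that a constant group makes the ratio undefined.

```python
import math


def divergences(values):
    """RMS distance of each point from all other points in the same group."""
    n = len(values)
    return [
        math.sqrt(
            sum((v - w) ** 2 for j, w in enumerate(values) if j != i) / (n - 1)
        )
        for i, v in enumerate(values)
    ]


def c_delta(x, y):
    """c-delta as defined by the formula in this summary."""
    dx, dy = divergences(x), divergences(y)
    n = len(x)
    denominator = (sum(dx) / n) * (sum(dy) / n)
    if denominator == 0:
        raise ValueError("undefined: a group has zero variance")
    return sum(a * b for a, b in zip(dx, dy)) / denominator


x = [2.0, 4.0, 4.5, 5.0, 9.0]
y = [1.0, 1.5, 2.0, 2.5, 8.0]

# Shifting and rescaling either group does not change the value.
base = c_delta(x, y)
transformed = c_delta([3 * v + 7 for v in x], [0.5 * v - 1 for v in y])
assert math.isclose(base, transformed)

# With two points per group both divergences are equal, so the ratio is 2.
assert math.isclose(c_delta([0.0, 1.0], [5.0, 50.0]), 2.0)

# A constant group has all divergences equal to zero.
try:
    c_delta(x, [3.0] * 5)
    zero_variance_rejected = False
except ValueError:
    zero_variance_rejected = True
assert zero_variance_rejected
```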

5. Significance and Applications

The paper argues that $c\delta$ addresses a critical gap in statistical methodology by allowing researchers to compare variability structures rather than just means or direct associations.

Potential Applications:

Quantum Physics: Benchmarking quantum simulators by comparing the spread of simulated outcomes against theoretical or experimental reference systems.
Genetics & Evolution: Assessing if patterns of genetic divergence between individuals are analogous across different species.
Psychometrics: Determining if inter-individual differences are consistent across different test conditions.
Manufacturing: Comparing variability patterns between different production batches or machines.
Machine Learning: Validating clustering algorithms by comparing the internal cohesion/divergence of identified clusters.

Conclusion

The paper concludes that while $c\delta$ is not a replacement for Pearson or Spearman correlation, it is a powerful tool for specific questions regarding the similarity of dispersion patterns. The author emphasizes the need for robust variants (L1 or rank-based), permutation-based inference, and careful reporting (including raw values, normalized ratios, and p-values) to ensure valid interpretation. Future work is suggested to derive asymptotic distributions, refine quantum extensions, and develop open-source software implementations.
