WELLDOC property for words generated by morphisms

Imagine you are baking a giant, infinite loaf of bread. This isn't just any bread; it's made by a magical recipe (a morphism) that takes a small piece of dough, stretches it out, and replaces every crumb with a specific pattern of new crumbs. You keep doing this forever, creating a never-ending sequence of flavors.

Now, imagine you are a food critic trying to taste every possible combination of flavors in this infinite loaf. You want to know: Is the bread "fairly" distributed?

This paper is about a specific kind of fairness called WELLDOC (Well Distributed Occurrences).

The Big Idea: The "Fairness" Test

Let's say you are looking for a specific flavor combination, like "Chocolate-Chip" (a factor). You want to find every time "Chocolate-Chip" appears in the infinite loaf.

The WELLDOC property asks a very strict question:

"No matter what 'flavor profile' (a specific count of ingredients) you want to see before the Chocolate-Chip appears, can you find a spot in the bread where that profile exists?"

To make this concrete, imagine the bread is made of three ingredients: Flour, Sugar, and Salt.

The Parikh Vector: This is just a tally sheet. If you look at the first 100 crumbs, how many are Flour? How many are Sugar? How many are Salt? That's your vector.
The Modulo Test: The paper asks: "If I want to find a spot where, before the Chocolate-Chip, the tally sheet (with every count reduced modulo 5) reads, say, 2 Flours, 3 Sugars, and 0 Salts, can I find it?"

If the answer is "Yes" for every flavor combination, every possible tally, and every modulus you choose, the bread has the WELLDOC property. It means the ingredients are scattered so evenly that you can find any tally (modulo any number) before any specific flavor appears.
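To make the tally sheet tangible, here is a minimal Python sketch of a Parikh vector and its reduction modulo a chosen number. The function names, the letters F, S, L (Flour, Sugar, Salt), and the sample prefix are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def parikh_vector(word, alphabet):
    """Count how often each letter of `alphabet` occurs in `word`."""
    counts = Counter(word)
    return tuple(counts[letter] for letter in alphabet)

def parikh_mod(word, alphabet, m):
    """Parikh vector with every entry reduced modulo m."""
    return tuple(c % m for c in parikh_vector(word, alphabet))

# Hypothetical stretch of bread: F = Flour, S = Sugar, L = Salt.
prefix = "FSSLFSSSLF"
print(parikh_vector(prefix, "FSL"))  # (3, 5, 2)
print(parikh_mod(prefix, "FSL", 5))  # (3, 0, 2)
```

The WELLDOC question is then whether every such reduced tally shows up in front of some occurrence of every flavor combination.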

The Problem: The "Lattice" Trap

Why do we care? The authors explain that this concept comes from pseudorandom number generators (the math behind computer randomness).

Imagine a machine that generates numbers. Sometimes, these machines have a hidden flaw called a "lattice structure." It's like if you tried to scatter marbles on a floor, but they accidentally lined up in perfect, invisible rows and columns. If you look at the floor from a certain angle, you see gaps where no marbles exist. A truly random (or "well-distributed") floor should have marbles everywhere, with no predictable gaps.

The authors found that if a word (or a sequence of numbers) has the WELLDOC property, it has no lattice structure. It is perfectly "messy" in a good way.

The Solution: The Magic Recipe (Morphisms)

The paper focuses on words generated by morphisms (the magical recipes). The big question was: How do we know if a specific recipe will produce a perfectly fair loaf?

The authors discovered a simple "litmus test" based on the recipe's Matrix (a grid of numbers that describes how the recipe transforms ingredients).

1. The Binary Case (Two Ingredients: 0 and 1)

If your bread only has two ingredients (like a simple binary code), the test is incredibly simple:

Look at the recipe's matrix.
Calculate its Determinant (a single number that tells you how the recipe stretches or shrinks space).
The Rule: If the determinant is 1 or -1, the bread is perfectly fair (WELLDOC). If it's anything else (like 2, 3, or 4), the bread has "gaps" (lattice structure) and fails the test.
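As a rough illustration of the determinant test, the sketch below builds the matrix of a two-letter recipe and computes its determinant. The helper names are assumptions made for this example; the two recipes are the well-known Fibonacci and Thue-Morse morphisms, chosen here only as sample inputs.

```python
def incidence_matrix(morphism, alphabet):
    """Entry (i, j) counts the occurrences of letter i in the image of letter j."""
    return [[morphism[b].count(a) for b in alphabet] for a in alphabet]

def det2(matrix):
    """Determinant of a 2x2 integer matrix."""
    (a, b), (c, d) = matrix
    return a * d - b * c

fibonacci = {"0": "01", "1": "0"}    # 0 -> 01, 1 -> 0
thue_morse = {"0": "01", "1": "10"}  # 0 -> 01, 1 -> 10

for name, morphism in [("Fibonacci", fibonacci), ("Thue-Morse", thue_morse)]:
    matrix = incidence_matrix(morphism, "01")
    d = det2(matrix)
    verdict = "passes" if d in (1, -1) else "fails"
    print(name, matrix, d, verdict)
# Fibonacci [[1, 1], [1, 0]] -1 passes
# Thue-Morse [[1, 1], [1, 1]] 0 fails
```

Under the rule as stated here, the Fibonacci recipe passes the determinant test and the Thue-Morse recipe does not.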

Analogy: Think of the determinant as the "stretch factor." If you stretch a rubber sheet by a factor of 2, you leave empty spaces. If you stretch it by exactly 1 (or flip it, -1), you cover every inch perfectly without gaps.

2. The Complex Case (Three or More Ingredients)

If you have 3, 4, or more ingredients, the "Determinant = ±1" rule is necessary but not enough.

Condition A: The determinant must still be 1 or -1.
Condition B: You must also check the "Return Words."
- What is a Return Word? Imagine you are walking through the bread. Every time you see the first ingredient (say, "0"), you start counting. The "Return Word" is the path you take until you see "0" again.
- The Rule: The "paths" (vectors) you take between every "0" must be able to build any possible combination of ingredients. If your paths are too repetitive or restricted, you'll miss some flavor profiles, and the bread fails.
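The sketch below collects the return words to the first ingredient from a finite prefix and tallies each one. It uses the Tribonacci morphism (0 -> 01, 1 -> 02, 2 -> 0) purely as a sample three-ingredient recipe; the function names are assumptions, and in general a finite prefix is not guaranteed to reveal every return word.

```python
def fixed_point_prefix(morphism, start, length):
    """Iterate the morphism from `start` until the word has `length` letters.
    Assumes the image of `start` begins with `start` (a prolongable morphism)."""
    word = start
    while len(word) < length:
        longer = "".join(morphism[c] for c in word)
        if len(longer) == len(word):
            raise ValueError("morphism does not grow from this start letter")
        word = longer
    return word[:length]

def return_words(word, letter):
    """Factors running from one occurrence of `letter` up to (not including) the next."""
    positions = [i for i, c in enumerate(word) if c == letter]
    return {word[i:j] for i, j in zip(positions, positions[1:])}

def parikh(word, alphabet):
    return tuple(word.count(a) for a in alphabet)

tribonacci = {"0": "01", "1": "02", "2": "0"}
prefix = fixed_point_prefix(tribonacci, "0", 200)
returns = return_words(prefix, "0")
print({r: parikh(r, "012") for r in sorted(returns)})
# {'0': (1, 0, 0), '01': (1, 1, 0), '02': (1, 0, 1)}
```

Here the three tallies can be combined, by adding and subtracting, into any integer vector, which is the kind of "reach" Condition B asks for.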

Why This Matters

The authors didn't just find a rule; they proved it works both ways (if the rule holds, the property holds; if the property holds, the rule must hold).

They also showed that for complex recipes, you can algorithmically check if the "Return Words" are good enough. This means a computer can decide whether a specific mathematical recipe generates a sequence with the WELLDOC property.

The Takeaway

In simple terms, this paper gives us a cheat sheet for creating perfect randomness using simple, repeating rules.

The Goal: Create a sequence where every pattern appears with every possible statistical background.
The Tool: A mathematical recipe (morphism).
The Test:
1. Check the recipe's "stretch factor" (Determinant). It must be 1 or -1.
2. (For complex recipes) Check if the "loops" between consecutive appearances of the first ingredient can reach every corner of the ingredient space.

If you pass the test, you have a sequence that is mathematically "fair," with no hidden lattice structures, making it ideal for generating high-quality random numbers for cryptography, simulations, and computer science.

Here is a detailed technical summary of the paper "WELLDOC property for words generated by morphisms" by Svetlana Puzynina and Vladimir Shavelev.

1. Problem Statement

The paper addresses the WELLDOC (Well Distributed Occurrences) property of infinite words. This property is an abelian-type condition regarding the regularity of the distribution of factors (substrings) within an infinite word.

Context: The property was originally introduced in the context of pseudorandom number generators (specifically Linear Congruential Generators) to avoid the "lattice structure" defect, where generated points lie on equidistant hyperplanes rather than filling the space uniformly.
Definition: An infinite word $w$ over an alphabet $\Sigma$ satisfies the WELLDOC property if, for every factor $u$ of $w$ , every integer modulus $m$ , and every vector $v \in (\mathbb{Z}/m\mathbb{Z})^{|\Sigma|}$ , there exists an occurrence of $u$ such that the Parikh vector (the vector of letter counts) of the prefix preceding that occurrence is congruent to $v$ modulo $m$ .
The Gap: While it was known that Sturmian and episturmian words satisfy this property, a general characterization for words generated by morphisms (substitutions) was an open question. The authors aim to provide a necessary and sufficient criterion for morphic words to satisfy the WELLDOC property.
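The definition can be probed experimentally on a finite prefix. The sketch below records, for a chosen factor $u$ and modulus $m$, which residue vectors occur as Parikh vectors of prefixes preceding an occurrence of $u$. All function names are assumptions for illustration. A finite prefix can only confirm that residues are reached; a residue that is missing from the prefix proves nothing by itself.

```python
def fixed_point_prefix(morphism, start, length):
    """Prefix of the fixed point; assumes the morphism is prolongable on `start`."""
    word = start
    while len(word) < length:
        longer = "".join(morphism[c] for c in word)
        if len(longer) == len(word):
            raise ValueError("morphism does not grow from this start letter")
        word = longer
    return word[:length]

def residues_before_occurrences(word, factor, alphabet, m):
    """Parikh vectors mod m of the prefixes preceding each occurrence of `factor`."""
    index = {a: i for i, a in enumerate(alphabet)}
    counts = [0] * len(alphabet)  # Parikh vector of word[:pos]
    seen = set()
    for pos in range(len(word) - len(factor) + 1):
        if word.startswith(factor, pos):
            seen.add(tuple(c % m for c in counts))
        counts[index[word[pos]]] += 1
    return seen

fib = fixed_point_prefix({"0": "01", "1": "0"}, "0", 100)
tm = fixed_point_prefix({"0": "01", "1": "10"}, "0", 128)

print(sorted(residues_before_occurrences(fib, "00", "01", 2)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(residues_before_occurrences(tm, "00", "01", 2)))
# [(0, 1), (1, 0)]
```

For the Thue-Morse word the two missing residues never appear: every occurrence of 00 straddles two blocks of the form 01 or 10, so the preceding prefix always has odd length.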

2. Methodology

The authors employ a combination of combinatorics on words, algebraic number theory, and symbolic dynamics.

Morphic Words: They focus on infinite words generated by prolongable morphisms $\phi$ (where $w = \lim_{n \to \infty} \phi^n(a)$ ).
Algebraic Tools:
- Parikh Vectors: Used to map words to integer vectors of letter counts.
- Morphism Matrices ( $A_\phi$ ): The incidence matrix of the morphism, where entry $(i, j)$ counts the occurrences of letter $i$ in $\phi(j)$ .
- Group Theory: Analysis of additive groups generated by Parikh vectors of "return words" (factors between consecutive occurrences of a specific letter).
- Modular Arithmetic: Utilizing properties of invertible matrices over $\mathbb{Z}$ and $\mathbb{Z}/m\mathbb{Z}$ (specifically that a matrix is invertible over $\mathbb{Z}$ iff $\det = \pm 1$ ).
Symbolic Dynamics:
- Recognizability: They utilize the concept of "recognizable morphisms" (specifically Mossé's theorem) to establish the existence of "cutting factors" in aperiodic recurrent words.
- Cutting Points: Indices in the word that align with the boundaries of morphic images.
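The link between a morphism and its matrix is the standard identity that the Parikh vector of $\phi(u)$ equals $A_\phi$ times the Parikh vector of $u$. The sketch below checks this numerically on one sample; the helper names and the choice of the Tribonacci morphism as input are assumptions for illustration.

```python
def parikh(word, alphabet):
    """Vector of letter counts of `word`."""
    return [word.count(a) for a in alphabet]

def incidence_matrix(morphism, alphabet):
    """Entry (i, j) counts the occurrences of letter i in the image of letter j."""
    return [[morphism[b].count(a) for b in alphabet] for a in alphabet]

def apply_morphism(morphism, word):
    """Replace every letter of `word` by its image."""
    return "".join(morphism[c] for c in word)

def mat_vec(matrix, vector):
    """Integer matrix-vector product."""
    return [sum(r * v for r, v in zip(row, vector)) for row in matrix]

alphabet = "012"
tribonacci = {"0": "01", "1": "02", "2": "0"}
A = incidence_matrix(tribonacci, alphabet)
u = "0102010"

print(A)                                               # [[1, 1, 1], [1, 0, 0], [0, 1, 0]]
print(parikh(apply_morphism(tribonacci, u), alphabet))  # [7, 4, 2]
print(mat_vec(A, parikh(u, alphabet)))                  # [7, 4, 2]
```

This identity is what lets questions about letter counts in the word be translated into linear algebra on $A_\phi$, over $\mathbb{Z}$ or modulo $m$.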

3. Key Contributions and Results

The paper provides a complete characterization of the WELLDOC property for morphic words, distinguishing between binary and non-binary alphabets.

A. Main Theorems

Binary Case (Theorem 1):
For an infinite recurrent binary word generated by a morphism $\phi$ , the word satisfies the WELLDOC property if and only if the determinant of the morphism's matrix is $\pm 1$ ( $\det A_\phi = \pm 1$ ).
- Note: The authors explicitly exclude trivial periodic words like $0^\infty$ or $1^\infty$ from this specific determinant condition, noting they satisfy WELLDOC trivially despite having different determinants.
Non-Binary Case (Theorem 2):
For an infinite word generated by a morphism $\phi$ over an alphabet of size $\sigma \geq 2$ , the word satisfies the WELLDOC property if and only if:
- $\det A_\phi = \pm 1$ , AND
- The Parikh vectors of all returns to the first letter of the word generate the additive group $\mathbb{Z}^\sigma$ .
- Significance: The determinant condition alone is insufficient for $\sigma > 2$ . The structure of return words is crucial.

B. Decidability

The authors prove that the condition regarding the generation of $\mathbb{Z}^\sigma$ by return vectors is decidable (Proposition 7). They provide an algorithm that:

Computes the set of Parikh vectors of return words modulo $m$ .
Checks if these vectors generate $(\mathbb{Z}/m\mathbb{Z})^\sigma$ for all prime moduli.
Uses the fact that if the condition fails for any prime $p$ , the global condition fails.
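For a finite list of integer vectors, the generation condition can be tested with a standard lattice fact rather than the paper's own procedure: the vectors generate $\mathbb{Z}^\sigma$ exactly when the gcd of all $\sigma \times \sigma$ minors of the matrix they form is 1, which is the same as having full rank modulo every prime. The sketch below assumes the return vectors are already known; the function names are hypothetical.

```python
from itertools import combinations
from math import gcd

def int_det(matrix):
    """Exact integer determinant by Laplace expansion (fine for small sizes)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        if entry == 0:
            continue
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * int_det(minor)
    return total

def generates_full_lattice(vectors, dim):
    """True if the integer vectors generate Z^dim as an additive group."""
    g = 0  # running gcd of the dim x dim minors
    for subset in combinations(vectors, dim):
        g = gcd(g, abs(int_det([list(v) for v in subset])))
        if g == 1:
            return True
    return False

# Parikh vectors of the Tribonacci return words 0, 01, 02.
print(generates_full_lattice([(1, 0, 0), (1, 1, 0), (1, 0, 1)], 3))  # True
# These two vectors only reach points whose coordinates have even sum.
print(generates_full_lattice([(1, 1), (1, -1)], 2))                   # False
```

A gcd divisible by a prime $p$ means the vectors fail to span $(\mathbb{Z}/p\mathbb{Z})^\sigma$, which matches the prime-by-prime check described above.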

C. Applications and Examples

Sturmian Words: The paper re-proves that Sturmian words satisfy the WELLDOC property. Since Sturmian morphisms are compositions of elementary morphisms with determinant $\pm 1$ , the condition holds.
Episturmian Words: Similarly, standard episturmian words generated by morphisms satisfy the property because their morphisms also have determinant $\pm 1$ and their return vectors generate the necessary group.
Counter-Example: The authors construct a non-binary morphism where $\det A_\phi = 1$ , but the WELLDOC property fails because the return vectors do not generate $\mathbb{Z}^\sigma$ . This highlights the necessity of the second condition in the non-binary case.

4. Proof Strategy Overview

Sufficiency (Section 4):
- They first show that if $\det A_\phi = \pm 1$ , the matrix $A_\phi$ is invertible modulo every $m$ , so the morphism acts as an automorphism on $(\mathbb{Z}/m\mathbb{Z})^\sigma$ .
- They reduce the problem to checking the property only for the first letter (0) and only for prime moduli.
- They prove that if the return vectors generate $\mathbb{Z}^\sigma$ , then the set of Parikh vectors of prefixes preceding '0' covers all residue classes modulo $m$ .
Necessity (Section 5):
- They utilize the recognizability of morphisms for aperiodic points (Theorem 5 from [3]).
- They prove that for any aperiodic recurrent word generated by a morphism with $\det A_\phi \neq \pm 1$ , there exists a "cutting factor" (a factor that forces the prefix length to align with specific cutting points).
- They demonstrate that the existence of such a factor restricts the Parikh vectors of preceding prefixes to a proper subgroup of $(\mathbb{Z}/m\mathbb{Z})^\sigma$ for some $m$ , thereby violating the WELLDOC property.

5. Significance

Theoretical Advancement: This work resolves an open question regarding the combinatorial characterization of WELLDOC for morphic words. It bridges the gap between algebraic properties of morphisms (determinants) and combinatorial properties of the generated words (distribution of factors).
Pseudorandom Generation: The results offer a rigorous method for selecting morphisms to generate infinite words that, when used to drive Linear Congruential Generators, produce sequences free of lattice structure defects. This is vital for high-quality pseudorandom number generation.
Algorithmic Utility: By proving the decidability of the condition, the paper enables the automated verification of whether a given morphism will produce a "good" word for these applications.
Generalization: The distinction made between binary and non-binary alphabets reveals a deeper structural complexity in higher-dimensional morphic words, specifically the role of return words in generating the full lattice.

In summary, Puzynina and Shavelev establish that for morphic words, the "goodness" of factor distribution (WELLDOC) is strictly tied to the invertibility of the morphism's incidence matrix and the spanning capability of its return word vectors.