Infinite Words with very Low Factor Complexity: an introduction to Combinatorics on Words

Here is an explanation of the paper "Infinite Words with very Low Factor Complexity" using simple language, creative analogies, and metaphors.

The Big Picture: Counting Patterns in an Infinite Song

Imagine you have a song that never ends. It's made of a sequence of notes (or letters). In mathematics, we call this an infinite word.

The paper asks a very specific question: How "complicated" is this song?

To quantify this, the author uses Factor Complexity. Imagine pausing the song and listing every distinct chunk of 3 consecutive notes that appears in it.

If the song is just 1-1-1-1-1..., there is only one type of 3-note chunk (111). It's very simple.
If the song is 1-2-1-2-1-2..., there are two chunks: 121 and 212. Still simple.
If the song is random noise, you might find every possible combination of notes. That is maximum complexity.
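A few lines of Python make the chunk count concrete. The helper name `count_chunks` is our own. A computer can only hold a finite piece of a song, so the function counts chunks in a finite string. For the random-noise case, the sketch uses a stand-in string that lists all 27 three-note combinations over three notes, which reaches the maximum $3^3 = 27$.

```python
import itertools

def count_chunks(song: str, n: int) -> int:
    """Number of distinct chunks of n consecutive notes in a finite song."""
    return len({song[i:i + n] for i in range(len(song) - n + 1)})

constant = "1" * 30       # 1-1-1-1-...
alternating = "12" * 15   # 1-2-1-2-...
# Stand-in for "noise": all 27 three-note combinations of notes 1, 2, 3, back to back.
every_triple = "".join("".join(t) for t in itertools.product("123", repeat=3))

print(count_chunks(constant, 3))      # 1  ("111")
print(count_chunks(alternating, 3))   # 2  ("121" and "212")
print(count_chunks(every_triple, 3))  # 27 (every possible 3-note chunk)
```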

The paper is about finding the "Goldilocks" zone: words that are not boring (not just repeating the same pattern forever) but are also as simple as possible.

Chapter 1: The Binary Case (The Two-Note World)

The Old Rule (Morse & Hedlund, 1938):
Imagine a song made of only two notes, say "A" and "B".

If the song eventually settles into a repeating loop (like ABABAB...), it is considered "trivial" or boring.
The famous theorem says: if a song is NOT boring, it must have at least $n+1$ different chunks of length $n$, for every length $n$.
- Length 1: At least 2 chunks (A and B).
- Length 2: At least 3 chunks.
- Length 3: At least 4 chunks.

The "Sturmian" Stars:
The paper introduces a special class of songs called Sturmian words. These are the "perfect" non-boring songs. They hit the minimum limit exactly: for every length $n$, they have exactly $n+1$ chunks of length $n$.
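The best-known Sturmian word is the Fibonacci word. You get it by starting from A and repeatedly applying the substitution A → AB, B → A. The sketch below builds a long prefix and checks the $n+1$ count for small $n$. The helper names are our own. A finite prefix can only confirm the count for lengths short enough that every chunk has already appeared.

```python
def fibonacci_word(length: int) -> str:
    """Prefix of the Fibonacci word, the fixed point of A -> AB, B -> A."""
    rules = {"A": "AB", "B": "A"}
    word = "A"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]

def complexity(word: str, n: int) -> int:
    """Distinct chunks of length n in a finite prefix."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

w = fibonacci_word(2000)
print(w[:13])                                   # ABAABABAABAAB
print([complexity(w, n) for n in range(1, 8)])  # [2, 3, 4, 5, 6, 7, 8]
```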

The Magic Connection:
The author explains that these perfect songs are deeply connected to Continued Fractions (a way of writing numbers like a recipe of integers).

The Analogy: Think of a Sturmian word as a staircase climbing a hill with a fixed slope.
If the slope is a simple fraction (a rational number like 1/2), the staircase repeats itself (boring).
If the slope is an irrational number (like the Golden Ratio, $\phi$), the staircase never repeats, yet it hugs the straight line as closely as a staircase can. This "straightest possible non-repeating path" creates the Sturmian word.
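The staircase picture can be simulated directly. In this sketch, which uses our own helper names, each letter records whether the line $y = \alpha x$ climbs to a new integer level over one unit step. The $k$-th letter is $\lfloor (k+1)\alpha \rfloor - \lfloor k\alpha \rfloor$, a standard way to build a Sturmian word from a slope $\alpha$ between 0 and 1. To keep the slope below 1, the sketch uses $1/\phi \approx 0.618$ rather than $\phi$ itself. With slope 1/2 the pattern repeats, while with slope $1/\phi$ the chunk count follows $n+1$.

```python
import math

def staircase_word(alpha: float, length: int) -> str:
    """Letter k is 1 if the line y = alpha*x reaches a new integer level
    between x = k and x = k + 1, else 0 (assumes 0 < alpha < 1)."""
    return "".join(
        str(math.floor((k + 1) * alpha) - math.floor(k * alpha))
        for k in range(length)
    )

def complexity(word: str, n: int) -> int:
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

rational = staircase_word(0.5, 3000)
golden = staircase_word((math.sqrt(5) - 1) / 2, 3000)  # slope 1/phi

print(rational[:10])                                    # 0101010101
print([complexity(rational, n) for n in range(1, 8)])  # [2, 2, 2, 2, 2, 2, 2]
print([complexity(golden, n) for n in range(1, 8)])    # [2, 3, 4, 5, 6, 7, 8]
```

Floating-point rounding is harmless here because, for these lengths, $k/\phi$ never lands extremely close to an integer.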

Chapter 2: The Harder Problem (The Multi-Note World)

The Problem:
What happens if our song has 3, 4, or 10 different notes (a "d-ary" alphabet)?

In the 2-note world, "non-boring" just meant "doesn't repeat."
In the 3-note world, you can make a song that doesn't repeat but is still "fake" or "artificial." For example, you could take a 2-note Sturmian song and insert a "C" right after every "A". The result doesn't repeat, but it's not a true 3-note song: the C's simply shadow the A's, so it is a 2-note song in disguise.
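Here is one concrete disguise, written as a sketch with our own helper names. Take a prefix of the Fibonacci word, a Sturmian word over A and B, and insert a C right after every A. The chunk count becomes $n+2$ instead of $n+1$, and the frequency of C is forced to equal the frequency of A. That equality is a rational relation between the frequencies, which is exactly what the stricter rule below excludes.

```python
def fibonacci_word(length: int) -> str:
    """Prefix of the Fibonacci word (A -> AB, B -> A), a Sturmian word."""
    rules = {"A": "AB", "B": "A"}
    word = "A"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]

def complexity(word: str, n: int) -> int:
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

sturmian = fibonacci_word(5000)
disguised = sturmian.replace("A", "AC")  # insert a C right after every A

print([complexity(disguised, n) for n in range(1, 8)])  # [3, 4, 5, 6, 7, 8, 9]
print(disguised.count("A") == disguised.count("C"))     # True: C shadows A
```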

The New Rule (Tijdeman's Theorem):
To find the "true" non-boring songs with 3+ notes, we need a stricter rule. The notes must have Rationally Independent Frequencies.

The Analogy: Imagine a clock with 3 hands.
- If the hands move in a ratio like 1:2:3, they will eventually all line up and repeat the pattern.
- If the hands move at speeds that are "mathematically incompatible" (like $\sqrt{2}$, $\sqrt{3}$, and $\sqrt{5}$), they will never perfectly line up again. They are "rationally independent."
The Result: Tijdeman proved that for these "true" songs with $d$ notes, the number of chunks of length $n$ must be at least $(d-1)n + 1$.
- For 3 notes ($d=3$), you need at least $2n + 1$ chunks.
- This is the new "minimum complexity" for a rich, non-repeating world.
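A classical 3-note example that meets this bound exactly is the Tribonacci word, the fixed point of the substitution A → AB, B → AC, C → A. It is an Arnoux-Rauzy word, and its letter frequencies $1/\beta, 1/\beta^2, 1/\beta^3$ (where $\beta \approx 1.839$ is the Tribonacci constant) are rationally independent. The sketch below uses our own helper names and checks $2n+1$ on a long prefix, which can only confirm the count for small $n$.

```python
def tribonacci_word(length: int) -> str:
    """Prefix of the fixed point of A -> AB, B -> AC, C -> A."""
    rules = {"A": "AB", "B": "AC", "C": "A"}
    word = "A"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]

def complexity(word: str, n: int) -> int:
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

t = tribonacci_word(6000)
print(t[:13])                                   # ABACABAABACAB
print([complexity(t, n) for n in range(1, 8)])  # [3, 5, 7, 9, 11, 13, 15]
```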

Chapter 3: The New Proof (The Algebraic Detective Work)

The Old Way vs. The New Way:
Tijdeman proved his theorem in 1999 using heavy combinatorial logic (counting patterns like a detective).
In 2022, the author and a colleague (J. Cassaigne) found a new, algebraic proof.

The "Flow Matrix" Metaphor:
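Imagine the song as a city map (a graph), drawn for a fixed chunk length $n$.

The Intersections are the chunks of $n$ letters.
The Streets are the chunks of $n+1$ letters. Each one links the intersection formed by its first $n$ letters to the intersection formed by its last $n$ letters, where the two overlap.
The Traffic on a street is how often that chunk appears in the song.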

The authors built a special tool called a Flow Matrix. Think of it as Kirchhoff's Law for traffic.

In an electrical circuit, the current flowing into a junction must equal the current flowing out.
In a word, the frequency of a chunk must equal the total traffic entering it (the chunk preceded by each possible letter) and also the total traffic leaving it (the chunk followed by each possible letter).
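The traffic rule can be checked on a finite stretch of song. This sketch uses our own helper names and a prefix of the Fibonacci word (substitution A → AB, B → A). It counts how often each chunk of length $n+1$ (a street) occurs. Then, for each chunk of length $n$ (an intersection), it subtracts the traffic leaving from the traffic entering. On a finite prefix the balance is off by at most one unit, and only at the intersections formed by the first and last $n$ letters, because the song is cut off there. After dividing by the prefix length, this imbalance vanishes.

```python
from collections import Counter

def fibonacci_word(length: int) -> str:
    rules = {"A": "AB", "B": "A"}
    word = "A"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]

def traffic_imbalance(word: str, n: int) -> dict[str, int]:
    """For each length-n chunk u: (streets entering u) - (streets leaving u),
    where each occurrence of a length-(n+1) chunk is one unit of traffic."""
    streets = Counter(word[i:i + n + 1] for i in range(len(word) - n))
    imbalance: Counter = Counter()
    for street, count in streets.items():
        imbalance[street[1:]] += count   # street a+u ends at intersection u
        imbalance[street[:-1]] -= count  # street u+b starts at intersection u
    return dict(imbalance)

w = fibonacci_word(1000)
print(traffic_imbalance(w, 3))  # zeros, except possibly +1/-1 at w[-3:] and w[:3]
```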

The Breakthrough:
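By treating the word as a system of linear equations (linear algebra), they showed that if the "traffic" (the letter frequencies) is truly independent (rationally independent, which is stronger than merely irrational), the complexity cannot drop below Tijdeman's bound. When it sits exactly at that bound, the "map" (the word) must have a very specific, tree-like structure.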

The "Dendric" Discovery:
They discovered that all these minimal-complexity words are Dendric.

The Analogy: "Dendric" comes from the Greek word for Tree.
If you look at how a chunk of letters can be extended (what letter can go before it? what after it?), the possibilities form a Tree.
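A tree is connected and has no loops. A loop would mean extra ways to extend the chunk, which makes the complexity grow faster than the minimum. A tree is exactly the balanced case.
The Conclusion: The simplest possible "true" multi-note songs (those with rationally independent frequencies and minimal complexity) are those where the connections between letters branch out like a tree, never circling back on themselves.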

Summary: Why Does This Matter?

Order in Chaos: It tells us the absolute limit of how simple a non-repeating pattern can be.
Universal Language: It connects three different fields:
- Numbers: (Irrational numbers and continued fractions).
- Geometry: (Billiard balls bouncing on a table or walking on a torus).
- Algebra: (Matrices and flow).
The "Tree" Structure: It reveals that the simplest words with rationally independent letter frequencies share a hidden "tree-like" structure in the way their chunks can be extended, with no loops.

In short, the paper teaches us that to build the most complex-yet-simple infinite song, you must use a "tree" structure and ensure your ingredients (letters) are mathematically incompatible, so they never fall into a predictable rhythm.

Here is a detailed technical summary of the lecture notes "Infinite Words with very Low Factor Complexity" by Mélodie Andrieu.

1. Problem Statement

The paper addresses a fundamental problem in Combinatorics on Words: the factor complexity of infinite words. The complexity function $p_w(n)$ counts the number of distinct factors (blocks of consecutive letters) of length $n$ in an infinite word $w$.

The central questions are:

What is the minimal complexity for a non-trivial infinite word over a finite alphabet of size $d$?
How should "non-triviality" be formally defined, especially when moving from binary ($d=2$) to larger alphabets ($d \ge 3$)?
Which specific words achieve this minimal complexity, and what are their structural properties?

Context:

The Morse-Hedlund Theorem (1938) establishes that a word over a finite alphabet is eventually periodic if and only if its complexity is bounded, and that any non-eventually-periodic word satisfies $p_w(n) \ge n+1$ for all $n$. For binary words ($d=2$) this minimum is attained, and the non-eventually-periodic binary words with $p_w(n) = n+1$ are known as Sturmian words.
For $d$-ary words ($d \ge 3$), the situation is more complex. Generalizations of Sturmian words (like Arnoux-Rauzy words) exist, but the definition of "non-trivial" and the exact minimal complexity bound were not settled or characterized as rigorously as in the binary case.

2. Methodology

The paper employs a multi-faceted approach combining combinatorics, dynamical systems, number theory, and linear algebra.

Chapter 1 (Foundations): Provides a self-contained introduction to Sturmian words, establishing their connection to continued fractions, renormalization processes, and dynamical systems (billiards, circle rotations, torus flows). It defines Sturmian words via their complexity ($n+1$) and proves their existence via substitutions.
Chapter 2 (Generalization to $d$-ary alphabets):
- Analyzes why the standard "non-eventually periodic" condition is insufficient for $d \ge 3$ (as it allows for "quasi-Sturmian" words that are essentially distorted binary words).
- Proposes rational independence of letter frequencies as the correct generalization of non-periodicity.
- Introduces Tijdeman's Theorem (1999), which provides a lower bound on complexity for words with rationally independent frequencies.
Chapter 3 (Algebraic Proof and Structural Characterization):
- Develops a new algebraic proof of Tijdeman's theorem (joint work with J. Cassaigne, 2022).
- Introduces the Flow Matrix ($M$), a rectangular matrix derived from the Rauzy graph of the word.
- Utilizes linear algebra (specifically the Rank-Nullity Theorem and properties of kernels over $\mathbb{Q}$ and $\mathbb{R}$ ) to relate the dimension of the space spanned by letter frequencies to the growth rate of the complexity function.
- Uses the properties of extension graphs to characterize the combinatorial structure of minimal complexity words.
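The Flow Matrix and the rank-nullity step can be made concrete with a small computation. The conventions here are our own and the notes may order rows, columns, or signs differently. Rows are factors $u$ of length $n$ and columns are factors $v$ of length $n+1$. An entry is $+1$ if $v$ ends with $u$, $-1$ if $v$ starts with $u$, and $0$ otherwise, including when both hold. This makes $M$ the signed incidence matrix of the Rauzy graph. The helper names are hypothetical. Rank is computed exactly with `fractions.Fraction`, and the factor sets come from a long prefix of the Tribonacci word (substitution A → AB, B → AC, C → A), whose complexity is $2n+1$.

```python
from fractions import Fraction

def tribonacci_word(length: int) -> str:
    rules = {"A": "AB", "B": "AC", "C": "A"}
    word = "A"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]

def factors(word: str, n: int) -> list[str]:
    return sorted({word[i:i + n] for i in range(len(word) - n + 1)})

def flow_matrix(word: str, n: int) -> list[list[int]]:
    """Signed incidence matrix: +1 where street v ends at u, -1 where it starts."""
    rows, cols = factors(word, n), factors(word, n + 1)
    return [[(v[1:] == u) - (v[:-1] == u) for v in cols] for u in rows]

def rank(matrix: list[list[int]]) -> int:
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                scale = m[i][c] / m[r][c]
                m[i] = [a - scale * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

t = tribonacci_word(6000)
for n in range(1, 6):
    M = flow_matrix(t, n)
    p_n, p_next = len(M), len(M[0])
    # rank-nullity: dim ker(M) = p(n+1) - rank(M)
    print(n, p_n, p_next, rank(M), p_next - rank(M))
```

Here the rank is $p(n) - 1$ because the Rauzy graph is connected. The kernel dimension $p(n+1) - p(n) + 1$ therefore comes out as 3 for every $n$, which matches the number of letters.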

3. Key Contributions and Results

A. Refinement of "Non-Triviality" for $d \ge 3$

The author argues that for $d \ge 3$, the condition of being "non-eventually periodic" is too weak. Instead, the paper advocates rational independence of letter frequencies as the defining property of non-triviality.

Fact: Sturmian words ($d=2$) have rationally independent frequencies.
Result: No "quasi-Sturmian" word (complexity $n+d-1$) for $d \ge 3$ has rationally independent frequencies.

B. Tijdeman's Theorem (1999) and its Strengthening

The paper revisits R. Tijdeman's theorem, which states that for a $d$-ary word with rationally independent letter frequencies, the complexity satisfies:
$p_w(n) \ge (d-1)n + 1$
The author and Cassaigne provide a strengthened version of this theorem (Theorem 2.18 / 3.1):

Let $\Delta_w$ be the maximal degree of irrationality of the letter frequencies (the dimension of the $\mathbb{Q}$-vector space spanned by the frequencies).
The complexity bound is:
$p_w(n) \ge (\Delta_w - 1)(n - 1) + d$
This bound is stronger than the original: it uses the maximal degree of irrationality rather than the minimal one, and it holds even if the letter frequencies do not exist (using subsequential limits of the empirical frequencies instead).
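As a quick consistency check (plain algebra, not a statement quoted from the notes): if the $d$ letter frequencies are rationally independent, their $\mathbb{Q}$-span has dimension $\Delta_w = d$, and the strengthened bound collapses to Tijdeman's original one:

$$
p_w(n) \ge (d-1)(n-1) + d = (d-1)n - (d-1) + d = (d-1)n + 1 .
$$

For $d=2$ this is the Sturmian count $n+1$.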

C. The Algebraic Proof via Flow Matrices

The most significant methodological contribution is the algebraic proof of the theorem.

Flow Matrix ($M$): Defined from the Rauzy graph, where rows represent factors of length $n$ and columns represent factors of length $n+1$. Entries are $1$, $-1$, or $0$, according to whether each factor of length $n+1$ ends with, starts with, or does not extend a given factor of length $n$.
Kirchhoff's Law: The vector of pseudo-frequencies of the factors of length $n+1$ lies in the kernel of $M$ ($\ker(M)$).
Dimension Argument: By analyzing the dimensions of $\ker(M)$ and $\ker(M^T)$, the authors prove that if the complexity grows too slowly (violating the bound), the dimension of the $\mathbb{Q}$-vector space spanned by the letter frequencies must be strictly less than $\Delta_w$, which is a contradiction.
Significance: This shifts the proof from purely combinatorial arguments (Tijdeman's original "p-passing number") to linear algebra, offering a more robust and generalizable framework.
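The bookkeeping behind such a dimension argument can be sketched as follows. This is our own reconstruction under simplifying assumptions, not the proof in the notes. We assume that all $d$ letters occur in $w$ and that the letter frequencies exist. We also take $M$ to be the signed incidence matrix of the Rauzy graph $G_n$, whose vertices are the factors of length $n$ and whose edges are the factors of length $n+1$, with entry $+1$ where an edge ends and $-1$ where it starts. $G_n$ is connected because $w$ itself walks through every vertex, so:

$$
\begin{aligned}
\ker(M^{T}) &= \mathbb{R}\,(1,\dots,1) &&\Longrightarrow\; \operatorname{rank} M = p_w(n) - 1,\\
\dim \ker(M) &= p_w(n+1) - \operatorname{rank} M &&= p_w(n+1) - p_w(n) + 1 .
\end{aligned}
$$

Because $M$ has integer entries, $\ker(M)$ has a basis $r_1, \dots, r_k$ of rational vectors, with $k = p_w(n+1) - p_w(n) + 1$. Write the frequency vector of the factors of length $n+1$ as $f = \sum_i \lambda_i r_i$ with real $\lambda_i$. Then every factor frequency lies in the $\mathbb{Q}$-span of $\lambda_1, \dots, \lambda_k$, and so does every letter frequency, since each is a sum of factor frequencies. Therefore, for every $n \ge 1$:

$$
\Delta_w \le p_w(n+1) - p_w(n) + 1
\quad\Longrightarrow\quad
p_w(n) = p_w(1) + \sum_{m=1}^{n-1} \bigl(p_w(m+1) - p_w(m)\bigr) \ge d + (\Delta_w - 1)(n-1),
$$

which has the same form as the strengthened bound $p_w(n) \ge (\Delta_w - 1)(n - 1) + d$.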

D. Characterization via Dendricity

A major byproduct of the algebraic proof is a structural characterization of words achieving the minimal complexity $(d-1)n + 1$ .

Theorem: Every infinite $d$-ary word with rationally independent letter frequencies and minimal complexity is dendric.
Definition: A word is dendric if the extension graph of every factor is a tree (connected and acyclic).
Implication: This links the arithmetic property (rational independence) and the complexity bound directly to a specific combinatorial structure (dendricity), unifying classes like Sturmian words, Arnoux-Rauzy words, and episturmian words under a single structural umbrella.
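The extension graph of a factor $u$ can be built directly. It has one left vertex for each letter $a$ that can precede $u$, one right vertex for each letter $b$ that can follow it, and an edge between $a$ and $b$ whenever $aub$ is a factor. Such a graph is a tree exactly when it is connected and has one edge fewer than it has vertices. The sketch below uses our own helper names and tests this for every short factor of a long prefix of the Tribonacci word, which is an Arnoux-Rauzy word. A finite prefix can only support dendricity, not prove it, and only for factors short enough that all their extensions have already appeared.

```python
def tribonacci_word(length: int) -> str:
    rules = {"A": "AB", "B": "AC", "C": "A"}
    word = "A"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]

def extension_edges(word: str, u: str) -> set[tuple[str, str]]:
    """Pairs (a, b) such that a + u + b occurs in the finite word."""
    n = len(u)
    return {
        (word[i - 1], word[i + n])
        for i in range(1, len(word) - n)
        if word[i:i + n] == u
    }

def is_tree(edges: set[tuple[str, str]]) -> bool:
    """Bipartite graph with left vertices from a, right vertices from b."""
    vertices = {("L", a) for a, _ in edges} | {("R", b) for _, b in edges}
    if len(edges) != len(vertices) - 1:
        return False
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[("L", a)].add(("R", b))
        adjacency[("R", b)].add(("L", a))
    # connectivity via depth-first search from an arbitrary vertex
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == vertices

t = tribonacci_word(6000)
short_factors = {t[i:i + n] for n in range(0, 6) for i in range(len(t) - n + 1)}
print(all(is_tree(extension_edges(t, u)) for u in short_factors))  # True
```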

4. Significance

Unification of Theory: The paper bridges the gap between the well-understood binary case (Sturmian words) and the complex $d$-ary case, providing a unified definition of "minimal complexity" based on rational independence.
New Proof Technique: The introduction of Flow Matrices and the application of linear algebra to factor complexity offers a powerful new tool for researchers. It simplifies the proof of Tijdeman's theorem and allows for stronger results (the maximal degree of irrationality).
Structural Insight: The result that minimal complexity words are dendric provides a concrete combinatorial characterization. It suggests that the "most complex" words (in terms of structure) that still maintain "low" complexity are those with tree-like extension graphs.
Open Problems: While the paper characterizes the words with minimal complexity as dendric, it notes that the full characterization of which dendric words have rationally independent frequencies remains an open problem for $d \ge 3$.

Conclusion

Mélodie Andrieu's lecture notes provide a comprehensive and modern treatment of low-complexity infinite words. By generalizing the Morse-Hedlund theorem through the lens of rational independence and proving Tijdeman's bound via a novel algebraic approach, the paper establishes that dendric words are the natural $d$-ary generalization of Sturmian words. This work significantly advances the understanding of the interplay between combinatorics, dynamics, and arithmetic in the theory of infinite words.
