Complexity of Linear Subsequences of k-Automatic Sequences

This paper investigates the state complexity of automata recognizing relations and operations on k-automatic sequences. It establishes a link between the subword complexity of the original sequences and the state complexity of their linear subsequences, and resolves a recent question of Zantema and Bosma about most-significant-digit-first inputs.

Delaram Moradi, Narad Rampersad, Jeffrey Shallit

Published Tue, 10 Ma

Imagine you have a giant, infinite tape of numbers, like a never-ending roll of toilet paper. On this tape, there's a pattern of 0s and 1s (or other symbols) that repeats in a very specific, predictable way. In computer science, we call these Automatic Sequences. They are like a song that never ends but follows a strict set of rules.

Now, imagine you have a tiny, simple robot (called an Automaton) that can read this tape. The robot has a limited memory (a few "states" or mental modes). Its job is to look at a number written on the tape (like the 43rd number) and tell you what symbol is there. The State Complexity is just a fancy way of asking: "How big does the robot's brain need to be to do this job?"
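This robot can be made concrete. Here is a minimal sketch (an illustration, not a construction from the paper) of a two-state automaton computing the famous Thue-Morse sequence, where t(n) is the parity of the number of 1s in the binary digits of n:

```python
# A minimal sketch of a 2-state automaton computing the Thue-Morse
# sequence: t(n) = parity of the number of 1s in the binary digits of n.
# Thue-Morse is the classic example of a 2-automatic sequence.

def thue_morse(n: int) -> int:
    """Feed the base-2 digits of n to a 2-state automaton;
    the final state is the output symbol t(n)."""
    state = 0                    # start state: "even parity so far"
    for digit in bin(n)[2:]:     # digits of n, most significant first
        if digit == "1":
            state ^= 1           # reading a 1 toggles the parity state
    return state

# First 16 terms: 0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0
print([thue_morse(n) for n in range(16)])
```

Because parity does not depend on the order of the digits, this particular robot works equally well reading the digits in either direction; for other questions the reading direction matters a great deal, as the paper discusses.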

This paper is a deep dive into what happens when we mess with these sequences. The authors, Delaram Moradi, Narad Rampersad, and Jeffrey Shallit, ask three main questions:

  1. The "Addition" Problem: If we want the robot to check if A + B = C, how big does its brain need to be?
  2. The "Skip" Problem: If we only look at every 5th number on the tape (a "linear subsequence"), does the robot need a bigger brain?
  3. The "Speed" Problem: If we use a standard software tool (like a calculator for logic) to build these robots, how long does it take, and how big do the intermediate robots get before we finish?

Here is the breakdown of their findings using some creative analogies.

1. The Robot's Brain Size (State Complexity)

Think of the robot's memory states as rooms in a hotel.

  • Simple Math: If the robot just needs to check if two numbers add up to a third (like checking a receipt), it only needs 2 rooms (states), no matter how big the numbers are. It's like a simple "carry the one" mechanic.
  • Adding a Constant: If the robot needs to check if x + 5 = y, it needs a slightly bigger hotel. The size of the hotel grows with the logarithm of the constant being added (here, 5). Think of it like a ladder: to reach the 5th rung, you don't need a skyscraper; you just need a few extra steps.
  • The "Skip" Surprise: This is the paper's biggest "Aha!" moment.
    • Imagine you have a sequence of numbers. If you decide to only look at every 3rd number (the 3rd, 6th, 9th...), you might think the robot just needs to count to 3.
    • The Twist: The authors found that the size of the robot's brain for this "skipped" sequence is directly tied to the complexity of the patterns inside the original sequence.
    • The Analogy: Imagine the original sequence is a long, complex tapestry. If you look at the whole thing, it's messy. But if you only look at every 3rd thread, you are essentially looking at a specific "slice" of the tapestry. The authors proved that the size of the robot needed to read this slice is equal to the number of unique patterns of a certain length found in the original tapestry.
    • Why it matters: They solved a puzzle left open by other researchers (Zantema and Bosma) about how to predict this size. They showed that for certain sequences (like the famous Thue-Morse sequence, which is a pattern of 0s and 1s that avoids repeating itself too much), the robot's brain size grows in a very specific, predictable way based on how "jumbled" the patterns are.
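The "Simple Math" claim above (two states suffice to check a sum) can be sketched in code: the automaton reads the binary digits of x, y, and z in parallel, least significant digit first, and its only memory is the carry bit. This is an illustrative sketch, not the paper's construction:

```python
# A sketch of the 2-state "carry" automaton checking x + y = z.
# It reads the binary digits of (x, y, z) in parallel, least significant
# digit first; the only memory needed is the current carry bit.

def check_addition(x: int, y: int, z: int) -> bool:
    carry = 0                                # the automaton's single state bit
    while x or y or z or carry:
        xb, yb, zb = x & 1, y & 1, z & 1     # next LSD-first digit triple
        s = xb + yb + carry
        if s & 1 != zb:                      # digit mismatch: reject
            return False
        carry = s >> 1                       # transition to the next state
        x, y, z = x >> 1, y >> 1, z >> 1
    return True                              # accept: digits and carry all check out

print(check_addition(43, 19, 62))   # True:  43 + 19 = 62
print(check_addition(43, 19, 63))   # False
```

No matter how many digits the numbers have, the robot only ever needs to remember one bit: whether there is a carry pending. That is the "2 rooms" from the hotel analogy.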

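The "number of unique patterns of a certain length" has a standard name: the subword (or factor) complexity p(m). Here is a rough way to estimate it for the Thue-Morse sequence, by counting distinct length-m blocks in a long finite prefix (an approximation for illustration, not the paper's method):

```python
# A rough sketch of subword (factor) complexity: p(m) counts the distinct
# length-m blocks occurring in a sequence. We estimate it for Thue-Morse
# from a long finite prefix.

def thue_morse_prefix(length: int) -> str:
    # t(n) = parity of 1-bits in n, written out as a 0/1 string
    return "".join(str(bin(n).count("1") & 1) for n in range(length))

def subword_complexity(word: str, m: int) -> int:
    # number of distinct length-m blocks in the given finite word
    return len({word[i:i + m] for i in range(len(word) - m + 1)})

tm = thue_morse_prefix(4096)
print([subword_complexity(tm, m) for m in range(1, 7)])
# with a long enough prefix this matches the known values 2, 4, 6, 10, 12, 16
```

For example, p(3) = 6 for Thue-Morse: of the 8 possible length-3 blocks, only 000 and 111 never occur, because the sequence avoids repeating a symbol three times in a row. It is this count that the paper ties to the brain size of the robot reading the "skipped" sequence.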
2. Reading Direction: Left-to-Right vs. Right-to-Left

The paper also discusses how the robot reads the numbers.

  • LSD-First (Least Significant Digit first): Reading a number's digits from right to left. For the binary number 110, the robot sees 0, then 1, then 1. This is like adding numbers on paper; it's easy for a robot to handle carries.
  • MSD-First (Most Significant Digit first): Reading from left to right (1, then 1, then 0). This is how we usually read.
  • The Difference: The authors explain that for simple math (addition), both directions are easy. But for skipping numbers in a sequence, reading left-to-right (MSD) is much harder for the robot. It's like trying to guess the end of a story by only reading the first few words of every paragraph; you need a much bigger memory to keep track of the context.

3. The "Walnut" Software and Construction Time

The authors also looked at how long it takes to build these robots using a popular software tool called Walnut. Walnut uses a type of logic (Büchi arithmetic) to automatically generate these robots.

  • The Metaphor: Imagine you want to build a custom robot. You have a blueprint (the math formula).
    • The Old Way: You might try to build the robot directly, which is efficient.
    • The Walnut Way: You ask a general-purpose construction crew to build it for you. They use a very flexible, powerful method that works for any math problem, not just this specific one.
  • The Result: The authors analyzed how long this takes. They found that while the final robot might be small and efficient, the construction process creates some very large, temporary "intermediate" robots before it shrinks them down to the final size.
  • The Cost: The time it takes to build these robots grows based on the size of the numbers involved. For example, if you are skipping every n-th number, the time to build the robot grows roughly like n · (log n)². It's not instant, but it's manageable for computers.

Summary of the "Big Picture"

This paper is like a manual for robot architects working with infinite number patterns.

  1. We now know exactly how big the robot needs to be when we skip numbers in a pattern, and it turns out the size depends on how many unique "chunks" of the pattern exist.
  2. We solved a mystery about whether reading numbers from left-to-right makes the robot's brain explode in size (it does, but in a predictable way).
  3. We measured the construction time, showing that while modern software tools are powerful, they can be a bit "bloated" during the building phase, creating temporary giants before settling into the final, efficient design.

In short, the authors took a complex, abstract problem in computer science and mapped out the exact "cost" (in memory and time) of manipulating these infinite digital patterns, providing a clear roadmap for anyone trying to build or analyze these systems.