Computational Complexity of Alignments

Imagine you are a detective trying to solve a mystery. You have two pieces of evidence:

The "Script" (The Model): A perfect, theoretical plan of how a business process should work (like a recipe for baking a cake).
The "Footage" (The Trace): A video recording of what actually happened in the real world (someone trying to bake the cake but maybe burning the toast or forgetting the sugar).

Conformance checking, a core task in Process Mining, is the job of comparing these two. Alignments are the standard technique for lining them up step by step, as closely as possible, to see exactly where the real footage deviated from the script.

This paper is essentially a mathematical investigation into how hard it is for a computer to do this detective work. The authors ask: "Is this task easy, moderately difficult, or impossibly hard?"

Here is the breakdown of their findings using simple analogies:

1. The General Rule: It's a Nightmare (PSPACE-Complete)

For most complex business processes (represented as "Safe Petri Nets"), finding the perfect alignment is PSPACE-complete.

The Analogy: Imagine trying to find the shortest path through a maze that is constantly changing shape, where the walls move as you walk. The number of possible paths is so vast that, in the worst case, even the most powerful supercomputers could not check every possibility in any reasonable amount of time.
The Result: If you have a complex, general business model, there is no known "magic button" that quickly finds the perfect alignment (and, under widely believed assumptions in complexity theory, there never will be). It is computationally expensive and slow.

2. The "Sound" Workflow: Still a Nightmare

The authors checked whether making the models "well-behaved" (called Sound Workflow Nets, which means the process can always reach a proper end without getting stuck) would make things easier.

The Analogy: It's like checking if the maze has a guaranteed exit. Even though we know the maze has an exit, finding the shortest path through the shifting walls is still just as hard as before.
The Result: Adding "soundness" doesn't help. It's still PSPACE-complete.

3. The "Free-Choice" Loophole: It Gets Easier (NP-Complete)

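The researchers then looked at a specific type of model called Live, Bounded, Free-Choice Systems. In these models, choices are simple: whenever two activities compete for the same input, they depend on exactly the same inputs, so a choice is never tangled up with what is happening in parallel.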

The Analogy: Imagine the maze is now static (the walls don't move) and you are allowed to "guess" a path. The authors show that the best path is never absurdly long, so if someone hands you a candidate, you can check it quickly. You still can't find the best path instantly, but the problem is believed to be a real step down from the shifting maze.
The Result: The problem drops from PSPACE-complete to NP-complete. It is still hard in the worst case, but a candidate solution can always be written down and checked quickly: the computer can "guess" a solution and verify it.

4. The "Simple" Traps: Still Harder Than You Think

The paper reveals some surprising traps. Even very simple-looking models can be hard to align.

Process Trees (The "Shuffle" Problem): These are models that look like simple family trees. However, if they have a "Parallel" operator (doing several things at once) and the same activity can appear in more than one place, alignment becomes NP-complete.
- The Analogy: Imagine several lists of chores (A, B, C, ...) that were done in an interleaved way, where some chores appear on more than one list. Looking at the combined schedule, you must work out which list each chore came from. The number of ways to mix the lists together is astronomical, so even though the tree looks simple, the "mixing" part makes it hard.
T-Systems (No Choices, Just Flow): These are models with no choices (nothing is ever either/or), but they can still split into parallel threads and join them again. Surprisingly, aligning these is also NP-complete.
- The Analogy: It's like watching several assembly lines that run side by side with no decisions to make. The recording shows one merged sequence of steps, and there are astronomically many ways the parallel steps could have been interleaved. Even without choices, finding the interleaving that best matches the recording is computationally heavy.
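
To put a number on "astronomical": the count of possible merged schedules is a multinomial coefficient. The short Python sketch below computes it (the function name `interleavings` is ours, purely for illustration). The count alone does not prove hardness, since checking whether a schedule mixes just two lists can be done efficiently with dynamic programming, but it shows why trying every schedule one by one is hopeless.

```python
from math import factorial, prod


def interleavings(*lengths: int) -> int:
    """Count the ways to merge lists of the given lengths into one sequence
    while keeping each list's internal order (a multinomial coefficient)."""
    return factorial(sum(lengths)) // prod(factorial(n) for n in lengths)


print(interleavings(2, 2))        # 6
print(interleavings(10, 10, 10))  # 5550996791340
```

Three lists of ten chores each already allow about 5.5 trillion different schedules.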

5. The One Exception: The "Single Token" S-System

Finally, the authors found one specific scenario where the problem becomes easy (Polynomial Time / P).

The Scenario: Live, Safe S-Systems. These are models where there is only one token (one "worker" or "item") moving through the system, and no two things can happen at the same time (no concurrency).
The Analogy: Imagine a single person walking through a building. There may be several doors to choose from in each room, but there is only one person and no moving walls, so at any moment you only need to know which room they are in. Comparing their path to the plan becomes a straightforward route-finding exercise.
The Catch: If you add a second token (two people) who can move independently at the same time, the complexity jumps back up to NP-complete. The "single worker" constraint is the key to making it easy.

Summary Table (The "Difficulty Meter")

Model Type	Difficulty Level	Everyday Meaning
General Safe Nets	PSPACE-Complete	Extremely Hard. Like solving a maze that changes shape.
Sound Workflow Nets	PSPACE-Complete	Very Hard. Even if the process is "well-behaved," it's still a nightmare.
Live, Bounded, Free-Choice Systems / Process Trees / T-Systems	NP-Complete	Hard but Doable. Like a puzzle where you can guess and check.
Live, Safe S-Systems	P (Polynomial)	Easy. Like tracking a single person through a building.

The Big Takeaway

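The authors conclude that high complexity is unavoidable for general business process models. Unless widely believed conjectures in complexity theory (such as P ≠ NP) turn out to be false, you cannot have a fast, perfect alignment algorithm for every type of model.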

However, if you restrict your models to be simpler (like ensuring only one "token" moves at a time, or avoiding complex parallel structures), you can make the computer's job much easier. This helps software developers know which types of business processes they can analyze quickly and which ones will require heavy computing power.

Here is a detailed technical summary of the paper "Computational Complexity of Alignments" by Schwanen, Pakusa, and van der Aalst.

1. Problem Statement

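In process mining, conformance checking is the task of comparing observed behavior (event logs) against a reference process model (typically a Petri net) to identify deviations. The state-of-the-art technique for this is alignments. An alignment pairs an observed trace with a complete firing sequence of the model (from the initial to the final marking) using three kinds of moves: synchronous moves (trace and model agree), model moves (the model performs an activity missing from the trace, i.e., an insertion) and log moves (the trace contains an activity the model cannot mimic, i.e., a deletion). An optimal alignment minimizes a cost function over these moves, typically charging for log and model moves but not for synchronous ones.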
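
To make the three kinds of moves concrete, here is a minimal Python sketch under stated assumptions: an alignment is a list of (log, model) pairs, the skip symbol ">>" marks "no move" on one side, activity labels stand in for model transitions, and the standard unit cost applies (0 for synchronous moves, 1 for log and model moves). The names `SKIP`, `alignment_cost`, `log_projection` and `model_projection` are illustrative, not taken from the paper.

```python
SKIP = ">>"  # "no move" on one side of the alignment


def alignment_cost(alignment: list[tuple[str, str]]) -> int:
    """Unit cost: synchronous moves are free, log moves and model moves cost 1."""
    cost = 0
    for log_side, model_side in alignment:
        if log_side == SKIP and model_side == SKIP:
            raise ValueError("(>>, >>) is not a legal move")
        if log_side != SKIP and model_side != SKIP:
            if log_side != model_side:
                raise ValueError(f"synchronous move with different labels: {log_side} vs {model_side}")
            continue  # synchronous move: trace and model agree
        cost += 1  # log move (model side skipped) or model move (log side skipped)
    return cost


def log_projection(alignment: list[tuple[str, str]]) -> list[str]:
    """Events on the log side, in order; must equal the observed trace."""
    return [l for l, _ in alignment if l != SKIP]


def model_projection(alignment: list[tuple[str, str]]) -> list[str]:
    """Activities on the model side; must form a complete run of the model."""
    return [m for _, m in alignment if m != SKIP]


# Trace <a, b, d> aligned with the model run <a, c, d>
example = [("a", "a"), ("b", SKIP), (SKIP, "c"), ("d", "d")]
print(alignment_cost(example))    # 2
print(log_projection(example))    # ['a', 'b', 'd']
print(model_projection(example))  # ['a', 'c', 'd']
```

Here "b" is a log move (observed but not allowed by this run) and "c" is a model move (required by the run but never observed), giving cost 2.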

While alignment algorithms (often based on $A^*$ search) are widely used, their computational complexity has historically been poorly understood. The paper addresses the fundamental question: What is the computational complexity of finding an optimal alignment across various classes of Petri nets? The authors aim to determine whether efficient (polynomial-time) algorithms exist for specific model classes or if the problem is inherently intractable (NP-hard, PSPACE-complete, etc.).

2. Methodology

The authors employ a rigorous theoretical computer science approach, utilizing reductions and structural properties of Petri nets:

Reductions to Known Problems:
- Lower Bounds: They establish that the Reachability Problem (determining if a marking is reachable) and the Membership Problem (determining if a trace belongs to the language of a net) are polynomial-time reducible to the Alignment problem. Since Reachability is known to be PSPACE-complete for safe nets and NP-complete for certain restricted classes (such as live, bounded, free-choice systems), these serve as lower bounds for Alignment.
- Upper Bounds: They construct algorithms (often non-deterministic guess-and-verify) to show that the problem lies within specific complexity classes (e.g., NP or PSPACE).
Synchronous Product: They utilize the standard construction where the alignment problem is reduced to finding a shortest path in the synchronous product of the trace system and the process model.
Structural Analysis: For specific classes (like Live, Bounded, Free-Choice systems), they leverage deep structural theorems from Petri net theory, specifically the Shortest Sequence Theorem, to bound the length of optimal firing sequences.
Complexity Classes Analyzed: The study covers a wide spectrum of Petri net classes, including:
- General, Bounded, and Safe Petri Nets.
- Workflow Nets (Sound and Safe).
- Free-Choice Systems (Live, Bounded, Free-Choice - LBFC).
- Process Trees.
- S-Systems (no concurrency) and T-Systems (no choice).
- Acyclic Systems.
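
The synchronous-product reduction can be sketched as a shortest-path search. The Python below is a minimal illustration, not the authors' construction: it assumes a safe net encoded as a list of hypothetical `(name, label, preset, postset)` tuples, markings as frozensets of places, unit costs (silent transitions, labelled `None`, are free), and plain Dijkstra search, whereas practical tools typically use A* with a heuristic. Each search state pairs a marking with a position in the trace, which is exactly a state of the synchronous product; on a general safe net the number of reachable markings can be exponential in the net size.

```python
import heapq
from itertools import count

SKIP = ">>"


def optimal_alignment(transitions, initial, final, trace):
    """Dijkstra over the synchronous product of a safe net and a trace.

    transitions: list of (name, label, preset, postset); label None = silent.
    initial, final: frozensets of marked places (safe net assumed).
    Returns (cost, moves) for an optimal alignment, or None if unreachable.
    """
    tie = count()  # tie-breaker so the heap never compares markings
    frontier = [(0, next(tie), initial, 0, [])]
    settled = set()
    while frontier:
        cost, _, marking, i, moves = heapq.heappop(frontier)
        if (marking, i) in settled:
            continue
        settled.add((marking, i))
        if marking == final and i == len(trace):
            return cost, moves
        if i < len(trace):  # log move: skip the next event, model stays put
            heapq.heappush(frontier, (cost + 1, next(tie), marking, i + 1,
                                      moves + [(trace[i], SKIP)]))
        for name, label, pre, post in transitions:
            if not pre <= marking:
                continue  # transition not enabled
            succ = (marking - pre) | post  # firing rule for safe nets
            # model move: fire without consuming an event (silent moves are free)
            step = 0 if label is None else 1
            heapq.heappush(frontier, (cost + step, next(tie), succ, i,
                                      moves + [(SKIP, name)]))
            if i < len(trace) and label == trace[i]:  # synchronous move
                heapq.heappush(frontier, (cost, next(tie), succ, i + 1,
                                          moves + [(trace[i], name)]))
    return None


# Example net: a, then a choice between b and c, then d
net = [("t_a", "a", frozenset({"p0"}), frozenset({"p1"})),
       ("t_b", "b", frozenset({"p1"}), frozenset({"p2"})),
       ("t_c", "c", frozenset({"p1"}), frozenset({"p2"})),
       ("t_d", "d", frozenset({"p2"}), frozenset({"p3"}))]
start, end = frozenset({"p0"}), frozenset({"p3"})
print(optimal_alignment(net, start, end, ["a", "b", "d"])[0])  # 0
print(optimal_alignment(net, start, end, ["a", "d"])[0])       # 1
```

Storing the move list in every heap entry keeps the sketch short; a real implementation would store predecessor pointers instead.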

3. Key Contributions and Results

The paper provides a comprehensive taxonomy of the complexity of the alignment problem. The results are summarized in Table 3 of the paper, but the key findings are:

A. High Complexity on General and Safe Nets

Safe Petri Nets & Sound Workflow Nets: The alignment problem is PSPACE-complete. This matches the complexity of the reachability problem for these classes.
- Significance: Even restricting models to "safe" (at most one token per place) and "sound" (well-behaved workflow) nets does not lower the complexity. The problem remains as hard as simulating a polynomial-space Turing machine.

B. The "Gap" in Free-Choice Systems

Live, Bounded, Free-Choice (LBFC) Systems: The complexity drops from PSPACE to NP-complete.
- Mechanism: The authors prove that for LBFC systems, there always exists an optimal alignment of polynomial length. This allows for a non-deterministic polynomial-time algorithm (guess the alignment, verify cost).
- Contrast: Reachability for LBFC systems is also NP-complete, but the authors note that for cyclic live, bounded, free-choice systems (where the initial marking can be reached again from every reachable marking), Reachability is in P, whereas Alignment remains NP-complete. Assuming P ≠ NP, this reveals a complexity gap in which alignment is strictly harder than reachability for this class.
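
The "verify" half of guess-and-verify is easy to make concrete. Below is a minimal Python sketch under assumptions of our own: the net is given as a hypothetical dictionary `{transition: (label, preset, postset)}`, it is treated as safe so markings can be frozensets (bounded nets such as general LBFC systems would need multisets), costs are unit costs, and the name `verify_alignment` is illustrative. It replays a guessed alignment with work polynomial in the net size and the alignment length, so once an optimal alignment of polynomial length is known to exist, checking a guess fits in polynomial time.

```python
SKIP = ">>"


def verify_alignment(net, initial, final, trace, alignment, k):
    """Return True iff `alignment` is a valid alignment of `trace` with cost <= k.

    net: {transition_name: (label, preset, postset)}; label None = silent.
    initial, final: frozensets of marked places (safe net assumed).
    """
    marking, replayed, cost = initial, [], 0
    for log_side, model_side in alignment:
        if log_side == SKIP and model_side == SKIP:
            return False  # (>>, >>) is not a legal move
        if model_side == SKIP:
            cost += 1  # log move
        else:
            if model_side not in net:
                return False  # unknown transition
            label, pre, post = net[model_side]
            if not pre <= marking:
                return False  # not enabled: model side is not a firing sequence
            marking = (marking - pre) | post
            if log_side == SKIP:
                cost += 0 if label is None else 1  # model move; silent ones are free
            elif log_side != label:
                return False  # synchronous move with mismatched labels
        if log_side != SKIP:
            replayed.append(log_side)
    # the model run must end in the final marking and the log side must equal the trace
    return marking == final and replayed == list(trace) and cost <= k


net = {"t_a": ("a", frozenset({"p0"}), frozenset({"p1"})),
       "t_b": ("b", frozenset({"p1"}), frozenset({"p2"})),
       "t_c": ("c", frozenset({"p1"}), frozenset({"p2"})),
       "t_d": ("d", frozenset({"p2"}), frozenset({"p3"}))}
guess = [("a", "t_a"), (SKIP, "t_b"), ("d", "t_d")]
print(verify_alignment(net, frozenset({"p0"}), frozenset({"p3"}), ["a", "d"], guess, 1))  # True
print(verify_alignment(net, frozenset({"p0"}), frozenset({"p3"}), ["a", "d"], guess, 0))  # False
```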

C. Hardness on Simple Subclasses

The paper demonstrates that alignment remains NP-complete (or NP-hard) even on very restricted subclasses where reachability is in P:

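Process Trees: Alignment is NP-complete. This is surprising because process trees are block-structured and widely used in practice. The hardness stems from the parallel operator, whose behavior is the shuffle (interleaving) of its branches' behaviors; this connects alignment to the NP-complete "Shuffle of Words" problem (deciding whether a word is an interleaving of a given set of words).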
- Exception: If process trees have unique labels (each activity appears once) or exclude the parallel operator, the problem becomes solvable in P.
T-Systems (Conflict-Free): Alignment is NP-complete, even though Reachability is in P.
Acyclic Systems: Alignment is NP-complete, despite Reachability being in P.
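
The link between parallelism and shuffling can be illustrated with a small membership check: a parallel operator accepts exactly the interleavings of its branches, so asking whether a recorded sequence fits is a shuffle question. The Python below is a minimal sketch of our own (the name `in_shuffle` is illustrative, and this is not the reduction used in the paper): a memoized search over one position per word. For a fixed number of words it runs in polynomial time, but its state space is the product of the word lengths, so it grows exponentially with the number of parallel branches, consistent with the general problem being NP-complete.

```python
from functools import lru_cache


def in_shuffle(target: str, words: list[str]) -> bool:
    """True iff `target` interleaves all `words`, keeping each word's order."""
    parts = tuple(words)
    if len(target) != sum(len(w) for w in parts):
        return False

    @lru_cache(maxsize=None)
    def search(positions: tuple[int, ...]) -> bool:
        done = sum(positions)  # number of target characters already matched
        if done == len(target):
            return True
        for j, (word, pos) in enumerate(zip(parts, positions)):
            # try to take the next character of target from word j
            if pos < len(word) and word[pos] == target[done]:
                if search(positions[:j] + (pos + 1,) + positions[j + 1:]):
                    return True
        return False

    return search(tuple(0 for _ in parts))


print(in_shuffle("abab", ["ab", "ab"]))  # True
print(in_shuffle("baab", ["ab", "ab"]))  # False
```

When every label is unique, at most one word can match the next character, so the search never branches; informally, this mirrors why unique labels make the problem tractable. The recursion depth equals the target length, so this sketch is meant for small inputs only.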

D. The Only Tractable Case: Live, Safe S-Systems

Live, Safe S-Systems: This is the only class identified where the alignment problem is solvable in Polynomial Time (P).
- Conditions: The system must be an S-net (each transition has at most one input and one output place), Live, and Safe (at most one token per place; together with liveness, this amounts to a single token in the system).
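- Reasoning: With a single token, there is no concurrency. The system behaves like a finite automaton whose states are the places: every reachable marking puts the token on one place, so the reachability graph has at most |P| states. The synchronous product with a trace of length n therefore has at most |P|·(n+1) states, and an optimal alignment can be found with standard shortest-path algorithms.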
- Crucial Insight: If Liveness or Safeness is dropped (e.g., allowing multiple tokens), the problem immediately jumps back to NP-complete. This highlights that concurrency (multiple tokens) is the primary driver of complexity.
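
A back-of-the-envelope count shows why the single-token case stays polynomial. This is our own sketch of the counting, not the authors' proof. Assume, as in the standard definition of S-nets, that every transition has exactly one input and one output place, and let $n = |\sigma|$ be the trace length. A state of the synchronous product pairs the place holding the token with a position in the trace; from each state there is at most one log move, one model move per outgoing transition of that place, and at most that many synchronous moves:

$$
|V| \le |P|\,(n+1), \qquad
|E| \le \underbrace{(n+1)\,|T|}_{\text{model moves}} + \underbrace{n\,|T|}_{\text{sync moves}} + \underbrace{n\,|P|}_{\text{log moves}} \le (n+1)\,\bigl(2|T| + |P|\bigr)
$$

Dijkstra's algorithm with a binary heap therefore finds an optimal alignment in $O\bigl((|V| + |E|)\log|V|\bigr)$ time, polynomial in $|P|$, $|T|$ and $n$. With $k$ tokens in a safe net, the number of markings can instead grow like $\binom{|P|}{k}$, which hints at where concurrency makes the state space explode.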

4. Significance and Implications

Theoretical Foundation: This is the first systematic analysis of the computational complexity of alignments. It clarifies that the "state explosion" problem in alignment is not just a practical heuristic issue but a fundamental theoretical barrier for many model classes.
Practical Guidance:
- Practitioners should not expect efficient exact alignment algorithms for general sound workflow nets or process trees with parallelism.
- For Process Trees, the results suggest that exact alignment is feasible only if the model is restricted (e.g., unique labels) or if approximation/heuristic methods are used.
- The identification of Live, Safe S-Systems as the only P-class provides a target for model simplification if polynomial-time conformance checking is required.
Complexity Gap: The paper reveals a counter-intuitive phenomenon: for several classes (cyclic LBFC systems, acyclic systems, T-systems), Reachability is in P, but Alignment is NP-complete. Assuming P ≠ NP, this means that deciding whether a run exists is easy, but finding the optimal run that aligns with a specific trace is significantly harder due to the combinatorial explosion of possible interleavings (concurrency).
Future Directions: The authors suggest that incorporating event attributes, translucent logs, or different distance metrics (like transpositions) are open research areas. They also call for empirical verification of these theoretical bounds and the development of parameterized complexity approaches to handle "hard" instances in practice.

Summary Table of Results (Selected)

Model Class	Reachability Complexity	Alignment Complexity
Safe Petri Nets	PSPACE-complete	PSPACE-complete
Sound Workflow Nets	PSPACE-complete	PSPACE-complete
LBFC Systems	NP-complete	NP-complete
Cyclic LBFC Systems	P	NP-complete
Process Trees	P	NP-complete
Process Trees (Unique Labels)	P	P
T-Systems	P	NP-complete
Acyclic Systems	P	NP-complete
Live, Safe S-Systems	P	P

In conclusion, the paper establishes that while alignment is a powerful technique, a high worst-case computational cost is unavoidable for most realistic process models (under standard complexity assumptions). Efficient exact solutions are only known for highly restricted models: those without concurrency (live, safe S-systems) or with strict structural constraints (such as process trees with unique labels).