pathsig: A GPU-Accelerated Library for Truncated and Projected Path Signatures

Imagine you are trying to describe a complex journey to a friend. You could just say, "We went from A to B," but that misses the story. Did you take a scenic route? Did you stop for coffee? Did you zigzag through traffic?

In the world of machine learning, Path Signatures are a mathematical tool designed to tell that full story. They turn a messy, winding line of data (like a stock price, a heartbeat, or a robot's movement) into a rich, detailed "fingerprint" that a computer can understand.

However, calculating these fingerprints is like counting every grain of sand on a beach. It's incredibly slow, and it eats up a lot of memory. Existing tools were like doing the job with a spoon.

Enter pathsig, a new library introduced by Tobias Nygaard. Think of pathsig as a high-speed, GPU-powered vacuum cleaner that sucks up all that data instantly, leaving you with a clean, compact summary.

Here is a breakdown of how it works, using everyday analogies:

1. The Problem: The "Library of Babel"

Imagine the signature of a path as a massive library containing every possible story you could tell about that journey.

The Old Way: To get the story, you had to walk through every single aisle of the library, read every book, and write down a summary. If you wanted to learn from this (like training a neural network), you had to walk back through the library in reverse to see what you missed. This was slow and exhausting.
The New Way (pathsig): Instead of walking, pathsig uses CUDA (NVIDIA's platform for running code on graphics cards) to send out thousands of tiny robots (threads) simultaneously. Each robot grabs a specific set of books, summarizes them, and hands the summary back.

2. The Secret Sauce: "Prefix-Closed" Groups

How does pathsig organize this chaos? It uses a clever trick called prefix-closed sets.

Imagine you are organizing a family tree.

The Old Way: You might try to organize by "Great-Grandparents," then "Grandparents," then "Parents." But to understand a parent, you need to know their parents. It gets messy.
The pathsig Way: It groups people by family branches. If you are looking at a specific branch (a "word"), pathsig automatically gathers everyone in that branch's history (the "prefixes") and processes them together.
The Analogy: It's like a construction crew. Instead of one person building a whole house from scratch, they assign a team to build the foundation, then the walls, then the roof, all in perfect sync. Because the GPU can do thousands of these teams at once, the house gets built in seconds.

3. The "Memory" Trick: Not Storing Everything

One of the biggest headaches in AI is running out of memory, especially the limited memory on a graphics card.

The Old Way: To calculate the journey, the computer would write down every single step of the path on a giant whiteboard. If the path was long, the whiteboard would overflow, and the computer would crash.
The pathsig Way: It uses a "magic eraser." It only keeps the final result of the journey. When it needs to figure out the past (for learning), it mathematically "rewinds" the tape using the final result and the rules of the journey, rather than looking at a stored list of every step.
The Result: You can analyze massive datasets on a single graphics card without the computer screaming for more memory.

4. Customizing the Lens: Projections

Sometimes, you don't need the whole library. You only need the chapters about "weather" or "traffic."

Truncation (The Old Standard): This is like saying, "I only want the first 5 chapters of every book." It's simple, but you might miss a crucial plot point in chapter 6.
Projections (The pathsig Superpower): This is like saying, "I only want the chapters about rain and cars, regardless of which chapter they are in."
- Anisotropic Truncation: Imagine some parts of your journey are smooth (like a highway) and some are bumpy (like a dirt road). pathsig lets you treat the smooth parts with a coarse summary and the bumpy parts with a detailed one, saving time without losing important details.

5. Real-World Impact: The "Lead-Lag" Example

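The paper shows a practical example: estimating how "rough" or smooth a path is (its Hurst parameter), the kind of question that comes up when modeling markets. The test paths are simulated fractional Brownian motion.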

They used a "Lead-Lag" transformation (pairing each data stream with a slightly delayed copy of itself, which helps reveal how jumpy the stream is).
The standard method was like taking a photo of the whole city and trying to find one specific car.
pathsig's "Sparse Projection" was like using a drone to zoom in only on the specific car and its immediate surroundings.
The Outcome: They got better accuracy (lower error) while using about 6 times fewer features and finishing the training about 2 times faster.

Summary

pathsig is a tool that makes the complex math of "Path Signatures" fast enough to use in modern AI.

It's Fast: It uses the power of graphics cards to do calculations in parallel, making it 10 to 30 times faster than previous tools.
It's Lean: It uses very little memory, allowing you to process huge datasets without crashing your computer.
It's Flexible: It lets you pick and choose exactly which parts of the data story you want to tell, rather than forcing you to read the whole book.

In short, pathsig turns a slow, heavy, manual process into a lightning-fast, automated assembly line, making it possible to teach AI to understand complex, moving data like never before.

1. Problem Statement

Path signatures provide a mathematically rigorous, universal feature representation for sequential data, offering invariance to time reparametrization and robustness to irregular sampling. However, their adoption in large-scale, gradient-based machine learning has been hindered by three main bottlenecks:

Scalability: Existing libraries (e.g., iisignature, esig, Signatory, keras_sig) often rely on CPU-based backends or inefficient GPU implementations that struggle with the combinatorial explosion of signature coefficients as depth ( $N$ ) and dimension ( $d$ ) increase.
Memory Constraints: Backpropagation through signatures typically requires storing intermediate signature values for every time step, so memory usage scales linearly with sequence length, as $O(M \cdot D_{sig})$ for $M$ time steps and $D_{sig}$ signature coefficients. This quickly causes Out-Of-Memory (OOM) errors on GPUs.
Rigidity: Standard truncation (keeping all words up to length $N$ ) often includes redundant or irrelevant features, leading to high dimensionality without guaranteed performance gains.
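
To make the combinatorial growth concrete, the sketch below counts the coefficients of a signature truncated at depth $N$ for a $d$-dimensional path, $D_{sig} = \sum_{k=1}^{N} d^k$ (one coefficient per non-empty word of length at most $N$), and estimates the memory needed to keep one such signature per time step. The function names, the float32 assumption, and the example sizes are illustrative choices, not values taken from the paper.

```python
def truncated_signature_dim(d: int, depth: int) -> int:
    """Number of words of length 1..depth over an alphabet of d letters."""
    return sum(d ** k for k in range(1, depth + 1))


def stored_intermediates_bytes(d: int, depth: int, steps: int,
                               bytes_per_value: int = 4) -> int:
    """Bytes needed to keep one truncated signature per time step.

    bytes_per_value=4 assumes float32 storage (an assumption of this sketch).
    """
    return steps * truncated_signature_dim(d, depth) * bytes_per_value


if __name__ == "__main__":
    # Illustrative sizes only: watch D_sig grow with dimension and depth.
    for d, depth in [(2, 3), (8, 4), (8, 6)]:
        dim = truncated_signature_dim(d, depth)
        mib = stored_intermediates_bytes(d, depth, steps=1000) / 2**20
        print(f"d={d}, N={depth}: D_sig={dim}, "
              f"1000 steps need about {mib:.1f} MiB per path")
```

The per-path figure is multiplied again by the batch size, which is why storing every intermediate signature becomes the limiting factor before compute does.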

2. Methodology

The paper introduces pathsig, a PyTorch-native library designed to compute path signatures directly on GPUs using CUDA. The core methodological innovations include:

A. Direct Word-Basis Computation with Prefix-Closed Sets

Instead of operating on graded tensor levels (which requires complex linear algebra abstractions), pathsig operates directly in the canonical word basis of the tensor algebra.

Prefix-Closed Decomposition: The library decomposes the set of words into "prefix-closed" sets. A set is prefix-closed if, for every word in the set, all its prefixes are also in the set.
Parallelism: Each CUDA thread is assigned a prefix-closed set generated by a single word. This allows independent updates of signature coefficients using Horner's method to evaluate the tensor exponential terms efficiently. This approach minimizes intermediate floating-point operations and improves memory locality compared to block-level assignments.
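
As a rough illustration of the idea, here is a minimal pure-Python sketch of a word-basis update for a piecewise linear path. It is not the library's CUDA kernel: the function names are hypothetical, letters are 0-based channel indices, and the "thread" is just a loop over words. For a word $w = i_1 \cdots i_n$ and a segment with increment $\Delta$, the new coefficient is $\sum_{k=0}^{n} S(i_1 \cdots i_k)\, \Delta^{i_{k+1}} \cdots \Delta^{i_n} / (n-k)!$, which the Horner-style loop evaluates using only the old coefficients of the prefixes of $w$.

```python
from itertools import product


def prefixes(word):
    """Prefix-closed set generated by one word: all of its non-empty prefixes."""
    return [word[:k] for k in range(1, len(word) + 1)]


def horner_update(sig, word, delta):
    """Coefficient of `word` after appending a linear segment with increment `delta`.

    Reads only the old coefficients of the prefixes of `word`.
    """
    n = len(word)
    acc = 1.0  # coefficient of the empty word
    for k in range(1, n + 1):
        acc = sig[word[:k]] + acc * delta[word[k - 1]] / (n - k + 1)
    return acc


def signature(path, words):
    """Signature coefficients of a piecewise linear path on the prefix closure of `words`."""
    closed = {p for w in words for p in prefixes(w)}
    sig = {w: 0.0 for w in closed}  # trivial path: every non-empty word is 0
    for a, b in zip(path, path[1:]):
        delta = [y - x for x, y in zip(a, b)]
        # Every new value is computed from the old dictionary before rebinding.
        sig = {w: horner_update(sig, w, delta) for w in closed}
    return sig


if __name__ == "__main__":
    path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]   # right, then up
    words = list(product(range(2), repeat=2))      # all words of length 2
    for word, value in sorted(signature(path, words).items()):
        print(word, value)
```

Because each word needs only its own prefixes, the updates for different generating words are independent of one another, which is what makes a one-word-per-thread assignment possible.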

B. Memory-Efficient Backpropagation

To enable training without storing all intermediate signatures:

Algebraic Reconstruction: The library exploits the group-like property of signatures. Instead of storing $S_{0, t_j}(X)$ for all $j$ , it reconstructs necessary intermediate values during the backward pass.
Reverse Recursion: It computes required terms by iterating backward in time using the inverse signature (signature of the time-reversed path) and the tensor exponential of increments. This reduces memory complexity from $O(M \cdot D_{sig})$ to nearly $O(D_{sig})$ , keeping peak memory usage close to the theoretical minimum required for the output.
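
The reconstruction idea can be sketched at depth 2, where the signature is a vector (level 1) plus a matrix (level 2). This is an illustration of the underlying algebra, not the library's backward pass, and all names are hypothetical. Appending a segment multiplies the signature by $\exp(\Delta)$; since $\exp(\Delta)$ has inverse $\exp(-\Delta)$, applying the same step with the negated increment removes the last segment again, so earlier signatures can be recovered from the final one and the increments alone.

```python
def chen_step(s1, s2, delta):
    """Right-multiply a depth-2 signature (s1, s2) by exp(delta)."""
    d = len(delta)
    new_s1 = [s1[i] + delta[i] for i in range(d)]
    new_s2 = [[s2[i][j] + s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
               for j in range(d)] for i in range(d)]
    return new_s1, new_s2


def forward(increments):
    """Final depth-2 signature; no intermediate signatures are kept."""
    d = len(increments[0])
    s1, s2 = [0.0] * d, [[0.0] * d for _ in range(d)]
    for delta in increments:
        s1, s2 = chen_step(s1, s2, delta)
    return s1, s2


def rewind(s1, s2, increments):
    """Yield the signatures at t_{M-1}, ..., t_0 starting from the final one."""
    for delta in reversed(increments):
        s1, s2 = chen_step(s1, s2, [-x for x in delta])
        yield s1, s2


if __name__ == "__main__":
    incs = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.25]]
    final_s1, final_s2 = forward(incs)
    for s1, s2 in rewind(final_s1, final_s2, incs):
        print(s1, s2)  # the last line printed is the all-zero trivial signature
```

The trade is extra arithmetic in the backward pass for memory that no longer grows with the number of time steps. In floating point the recovered values match the stored ones only up to rounding error.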

C. Generalized Projections

pathsig moves beyond standard truncation ( $W_{\leq N}$ ) to support arbitrary projections:

Word Projections: Users can specify arbitrary subsets of words ( $I \subset W$ ) to compute, enabling sparse feature selection based on domain knowledge (e.g., specific channel interactions).
Anisotropic Truncation: Instead of truncating by word length, the library supports truncation by weighted degree: for a word $w = i_1 \cdots i_n$ and channel weights $\gamma$, $|w|_\gamma = \sum_{k=1}^{n} \gamma_{i_k}$. This allows for inhomogeneous regularity assumptions across different channels of the input path.
Windowed Signatures: It supports computing signatures over arbitrary user-defined windows (sliding or expanding) in a single kernel launch, maximizing GPU utilization.
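
A small sketch can show what an anisotropic word set looks like. The helper below enumerates every word whose weighted degree stays within a threshold; the function names, the 0-based letters, and the example weights are assumptions made for illustration and do not reflect the library's interface. With strictly positive weights the resulting set is prefix-closed, because dropping the last letter of a word can only lower its weighted degree.

```python
def anisotropic_words(gamma, max_degree):
    """All non-empty words w with weighted degree sum(gamma[i] for i in w) <= max_degree."""
    if any(weight <= 0 for weight in gamma):
        raise ValueError("weights must be strictly positive")
    words = []

    def extend(word, degree):
        for letter, weight in enumerate(gamma):
            if degree + weight <= max_degree:
                longer = word + (letter,)
                words.append(longer)
                extend(longer, degree + weight)

    extend((), 0.0)
    return words


def is_prefix_closed(words):
    """True if every non-empty proper prefix of every word is also in the set."""
    present = set(words)
    return all(w[:k] in present for w in present for k in range(1, len(w)))


if __name__ == "__main__":
    # Channel 0 is cheap (weight 1), channel 1 is expensive (weight 2).
    selected = anisotropic_words([1.0, 2.0], max_degree=3.0)
    print(sorted(selected, key=lambda w: (len(w), w)))
    print("prefix-closed:", is_prefix_closed(selected))
```

Setting every weight to 1 recovers ordinary truncation by word length, so the isotropic case is a special case of this selection rule.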

D. Log-Signature Support

The library computes log-signatures in the Lyndon basis directly from the signature coefficients. Crucially, it avoids materializing all signature coefficients up to depth $N$ if they are not needed for the specific log-basis projection, further reducing computational cost.
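
To see how much smaller the Lyndon basis is, the following sketch enumerates Lyndon words with Duval's algorithm and compares their count to the number of signature coefficients at the same depth. The number of Lyndon words of length at most $N$ over $d$ letters equals the dimension of the depth-$N$ log-signature. The function name and the 0-based letters are illustrative, and this is not how the library builds its basis internally.

```python
def lyndon_words(d, max_len):
    """Yield all Lyndon words of length <= max_len over letters 0..d-1 (Duval's algorithm)."""
    word = [-1]
    while word:
        word[-1] += 1
        yield tuple(word)
        period = len(word)
        # Extend periodically up to max_len ...
        while len(word) < max_len:
            word.append(word[-period])
        # ... then strip trailing maximal letters before the next increment.
        while word and word[-1] == d - 1:
            word.pop()


if __name__ == "__main__":
    for d, depth in [(2, 3), (2, 4), (3, 4)]:
        log_dim = sum(1 for _ in lyndon_words(d, depth))
        sig_dim = sum(d ** k for k in range(1, depth + 1))
        print(f"d={d}, N={depth}: {log_dim} Lyndon words vs {sig_dim} signature words")
```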

3. Key Contributions

High-Performance GPU Implementation: A PyTorch-native library that achieves 10–30× speedups for truncated signature computation and 4–10× speedups for training (backpropagation) compared to state-of-the-art libraries (keras_sig, pySigLib).
Near-Minimal Memory Footprint: By reconstructing intermediates during backpropagation, pathsig achieves peak memory usage roughly 2× the size of the output, whereas competitors scale linearly with sequence length, often causing OOM errors on large sequences.
Flexible Projection Framework: It is the first library to natively support arbitrary word projections and anisotropic truncation, allowing for dimensionality reduction and the inclusion of specific higher-order terms without full truncation costs.
Windowed Computation: Efficient single-kernel evaluation of multiple signature windows, addressing a common need in time-series analysis that previous libraries handled inefficiently.

4. Experimental Results

Benchmarks were conducted on an NVIDIA H200 GPU (140 GB VRAM) against keras_sig and pySigLib.

Speed:
- Forward Pass: Median speedup of 12.44× over keras_sig and 40.11× over pySigLib.
- Training: Median speedup of 7.88× over keras_sig and 24.88× over pySigLib.
- Log-Signatures: Speedups were even higher, reaching up to 67× over pySigLib due to optimized projection strategies.
Memory:
- pathsig maintained stable memory usage even for long sequences (up to 1600 time steps) and large batches where keras_sig failed with OOM errors.
- Memory reduction factors ranged from 159× to 1,265× compared to keras_sig depending on sequence length.
Case Study (Hurst Parameter Estimation):
- In an experiment estimating the Hurst parameter of multivariate fractional Brownian motion, a sparse lead-lag word projection (excluding redundant cross-channel terms) achieved lower test error than full truncation.
- This projection reduced feature dimension by 6.25× and training time by 2.24× while improving learning curves.
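
For readers unfamiliar with the lead-lag transformation used in the case study, here is a minimal sketch of one common convention: every channel is duplicated into a "lead" copy that moves first and a "lag" copy that catches up one step later. The exact variant used in the paper is not specified here, and the function name and the channel ordering (lead channels first, then lag channels) are assumptions of this sketch.

```python
def lead_lag(series):
    """Lead-lag path of a sequence of points, each a tuple of channel values.

    Output points have twice as many channels: (lead channels, lag channels).
    """
    first = tuple(series[0])
    path = [first + first]
    for prev, nxt in zip(series, series[1:]):
        prev, nxt = tuple(prev), tuple(nxt)
        path.append(nxt + prev)  # lead copy moves to the next value first
        path.append(nxt + nxt)   # lag copy then catches up
    return path


if __name__ == "__main__":
    for point in lead_lag([(1.0,), (2.0,), (3.0,)]):
        print(point)
```

A path with $d$ channels becomes a path with $2d$ channels, so the full truncated signature grows quickly, which is the setting where selecting only some words pays off.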

5. Significance

The paper bridges the gap between the theoretical power of path signatures and practical, large-scale machine learning applications. By solving the scalability and memory bottlenecks, pathsig enables:

Deep Learning Integration: Seamless integration of signatures as trainable layers in PyTorch models without prohibitive computational costs.
Domain-Specific Feature Engineering: The ability to tailor signature representations (via projections and anisotropic truncation) to specific physical or financial constraints, reducing noise and redundancy.
Scalable Time-Series Modeling: Making signature-based methods viable for high-dimensional, long-sequence data on modern GPU hardware, opening new avenues for research in rough path theory applications within AI.

The library is open-source and available via PyPI, with documentation hosted at pathsig.readthedocs.io.
