Anomaly Detection from a Tensor Train Perspective

Original authors: Alejandro Mata Ali, Aitor Moreno Fdez. de Leceta, Jorge López Rubio

Published 2026-05-05

Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.

Imagine you have a giant library of books. Most of the books are copies of the same popular novel (the "normal" data), but a few are strange, handwritten scribbles or completely different genres (the "anomalies"). Your goal is to find those strange books without reading every single one.

This paper presents a new way to do that using a mathematical tool called Tensor Trains. Think of a Tensor Train as a highly efficient compression machine (like a super-advanced Zip file).

Here is the simple breakdown of how it works, the methods they tried, and what they found.

The Core Idea: The "Squeeze" Test

The authors' main idea is based on a simple principle: Normal things fit together; weird things don't.

The Setup: They take a dataset (like pictures of digits or computer network logs) and feed it into their compression machine.
The Squeeze: They tell the machine to "squish" the data down, throwing away the tiny, unimportant details to save space.
The Result:
- Normal Data: Because these items share common patterns (like how all the digit "1"s look similar), the machine can squish them down and then un-squish them back to almost their original shape. They fit the mold perfectly.
- Anomalous Data: Because these items are weird or unique, they don't fit the mold. When the machine tries to squish them, it throws away too much of their unique structure. When it tries to un-squish them, they look distorted or broken.

The Test: They compare the original item with the "un-squished" version. If they look very similar, it's normal. If they look very different, it's an anomaly.
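The squeeze test can be sketched in a few lines. The example below is only an illustration of the principle: it uses a plain truncated SVD as a stand-in for the tensor train compression, and the function names (`squeeze_and_restore`, `relative_error`) and the synthetic data are assumptions made for this sketch, not taken from the paper.

```python
import numpy as np

def squeeze_and_restore(reference, samples, rank):
    """Project samples onto the top-`rank` right singular vectors of `reference`."""
    # Rows of vt span the dominant patterns shared by the reference rows.
    _, _, vt = np.linalg.svd(reference, full_matrices=False)
    basis = vt[:rank]                       # shape (rank, n_features)
    return samples @ basis.T @ basis        # squeeze, then un-squeeze

def relative_error(original, restored):
    """Per-row distance between original and restored rows, relative to the original norm."""
    return np.linalg.norm(original - restored, axis=1) / np.linalg.norm(original, axis=1)

rng = np.random.default_rng(0)
pattern = rng.normal(size=16)
# "Normal" rows: scaled copies of one shared pattern plus a little noise.
normal = np.outer(rng.uniform(0.5, 1.5, size=50), pattern) + 0.01 * rng.normal(size=(50, 16))
# "Anomalous" row: an unrelated random vector.
odd = rng.normal(size=(1, 16))

normal_err = relative_error(normal, squeeze_and_restore(normal, normal, rank=1))
odd_err = relative_error(odd, squeeze_and_restore(normal, odd, rank=1))
print(normal_err.max(), odd_err[0])  # the odd row is restored far less faithfully
```

The normal rows come back almost unchanged because they share the pattern the compression kept, while the unrelated row loses most of its content.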

The Two Main Methods

The paper describes two ways to run this test, like two different strategies for organizing that library:

1. The "Global" Method (The Group Hug)

How it works: You feed the entire library (or a huge chunk of it) into the compression machine at once. The machine learns the "average" shape of the whole group.
The Analogy: Imagine taking a photo of the whole library, compressing that photo, and then seeing how well each individual book fits into that compressed photo.
Pros: It's fast and works well for big datasets.
Cons: It needs a lot of data to start.

2. The "Local" Method (The One-on-One)

How it works: You pick just one perfect example of a "normal" book (a training example). You build a mold based on that single book. Then, you test every other book against that specific mold.
The Analogy: You take one perfect "1" from the digit dataset, memorize its shape, and then check every other number to see if it fits that specific "1" mold.
Pros: It can be incredibly accurate (sometimes perfect).
Cons: It is extremely slow. The paper notes it is about 50 times slower than the global method.

What They Tested

The authors tested these methods on three different "libraries":

Handwritten Digits: Trying to spot a "7" when the library is mostly "1"s.
Faces: Trying to spot a different face in a room full of the same person.
Cybersecurity: Trying to spot a hacker attack in a stream of normal computer requests.

The Surprising Findings

The paper revealed a few counter-intuitive results:

Don't Over-Compress: You might think squeezing the data as much as possible would be best. However, the authors found that very light compression (just a tiny squeeze) often worked best. If you squeeze too hard, you start destroying the "normal" patterns too, making it hard to tell the difference.
The "Scaler" Trap: In data science, it's common to standardize data before processing, rescaling every feature to zero mean and unit variance (a "standard scaler"). The authors found that for their specific method, this step actually ruined the results, likely because it alters the structure the machine relies on to tell normal items from weird ones.
Speed vs. Accuracy: The "Local" method was the most accurate (getting perfect scores on digits), but it was about 50 times slower, which makes it impractical for many real-world uses. The "Global" method was a great balance: on the cybersecurity data it reached an AUROC of 0.98 and about 97.7% accuracy while being fast enough to use.

The Bottom Line

The authors created a new way to find "weird" data by seeing how well it survives a compression test. They showed that by keeping the "normal" structure intact and letting the "weird" structure fall apart, you can spot anomalies effectively.

Key Takeaway: Sometimes, the best way to find a needle in a haystack isn't to look harder, but to squish the whole stack and see what survives. The hay holds together; whatever comes out distorted is probably the needle.

Technical Summary: Anomaly Detection from a Tensor Train Perspective

Problem Statement
Anomaly detection is a critical task across domains such as industrial monitoring, medical diagnostics, fraud detection, and cybersecurity. The primary objective is to identify data points that deviate significantly from normal behavior. While traditional statistical methods, machine learning, and deep learning have achieved success, they often struggle with high-dimensional data, typically requiring dimensionality reduction techniques like Principal Component Analysis (PCA). The authors propose leveraging Tensor Networks (TN), specifically Tensor Trains (TT), to address high-dimensional data efficiently. The core hypothesis is that normal data shares common structural patterns, whereas anomalous data possesses distinct or infrequent structures. By compressing data into an approximate tensor representation, the method aims to preserve the structure of normal data while disrupting the structure of anomalous data, thereby allowing for their distinction.

Methodology
The paper presents a suite of eight algorithms based on two conceptually different compression strategies using the Tensor Train (TT) representation. The compression is controlled by a parameter $\tau$ (ranging from 0 to 1), which dictates the retention of singular values during the TT-SVD process.
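As a rough illustration of what TT-SVD with a truncation parameter looks like, here is a minimal NumPy sketch. The truncation rule is an assumption made for this example (keep singular values that are at least $\tau$ times the largest one, so a larger $\tau$ discards more in this sketch); the paper's exact rule and the direction of its $\tau$ convention may differ. The names `tt_svd` and `tt_reconstruct` are hypothetical.

```python
import numpy as np

def tt_svd(tensor, tau):
    """Split `tensor` into tensor-train cores via sequential truncated SVDs.

    Assumed rule (not from the paper): keep singular values s_i >= tau * s_max.
    """
    dims = tensor.shape
    cores = []
    rank = 1
    remainder = np.asarray(tensor, dtype=float)
    for n in dims[:-1]:
        # Fold the current bond and the next physical index into the rows.
        remainder = remainder.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(remainder, full_matrices=False)
        keep = max(1, int(np.sum(s >= tau * s[0])))
        cores.append(u[:, :keep].reshape(rank, n, keep))
        # Carry the kept singular values forward into the next step.
        remainder = s[:keep, None] * vt[:keep]
        rank = keep
    cores.append(remainder.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the cores back into a full tensor."""
    full = cores[0]                                   # shape (1, n1, r1)
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]                            # drop the dummy boundary bonds

rng = np.random.default_rng(0)
t = rng.normal(size=(4, 3, 5, 2))
exact = tt_reconstruct(tt_svd(t, tau=0.0))            # nothing discarded
lossy = tt_reconstruct(tt_svd(t, tau=0.5))            # small singular values dropped
print(np.allclose(exact, t), np.linalg.norm(t - lossy) / np.linalg.norm(t))
```

With `tau=0.0` every singular value is kept and the tensor is recovered exactly; raising `tau` shrinks the bond ranks and introduces reconstruction error, which is the quantity the detection scores build on.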

Global Compression Algorithms:
- Concept: The entire dataset is treated as a single high-order tensor. The algorithm compresses the global dataset, preserving the dominant structures shared by the majority of data points (normal data). Anomalous data, lacking these shared structures, is displaced more significantly during compression.
- Decision Functions:
  - Auto Comparative (ACGCTNAD): Calculates a "self-retention score" ($s_{self}$) by taking the scalar product of an original data point with its compressed reconstruction, normalized by the squared norm of the original. This score captures both directional alignment and magnitude retention.
  - Group Comparative (GCGCTNAD): Compares each data point against the compressed versions of all other data points in the set, using a cosine similarity metric to focus on geometric alignment rather than magnitude.
- Learning Modes: These methods can be applied in unsupervised (no prior knowledge), supervised (using labeled normal training data), or semi-supervised modes.
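The two decision functions described above can be written compactly. The sketch below follows the verbal definitions given here: the self-retention score is the scalar product of a point with its reconstruction divided by the squared norm of the point, and the group score uses cosine similarity against the compressed versions of the other points. Averaging those similarities is an assumption of this sketch, as are the function names.

```python
import numpy as np

def self_retention_score(x, x_hat):
    """<x, x_hat> / ||x||^2: a value of 1.0 means x survived compression unchanged."""
    x, x_hat = np.ravel(x), np.ravel(x_hat)
    return float(x @ x_hat / (x @ x))

def group_cosine_scores(originals, compressed):
    """Mean cosine similarity between each original row and the compressed
    versions of all *other* rows (mean aggregation is an assumption).
    Requires at least two rows."""
    a = originals / np.linalg.norm(originals, axis=1, keepdims=True)
    b = compressed / np.linalg.norm(compressed, axis=1, keepdims=True)
    sims = a @ b.T
    np.fill_diagonal(sims, 0.0)             # exclude each point's own reconstruction
    return sims.sum(axis=1) / (len(originals) - 1)

x = np.array([3.0, 4.0])
print(self_retention_score(x, x))           # 1.0: perfectly retained
print(self_retention_score(x, 0.5 * x))     # 0.5: same direction, half the magnitude

points = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(group_cosine_scores(points, points))  # the third row agrees with nobody else
```

The self-retention score drops both when the reconstruction rotates away from the original and when it shrinks, whereas the cosine-based score ignores magnitude, which matches the distinction drawn between the two functions.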
Local Compression Algorithms:
- Concept: Instead of compressing the whole dataset, this approach uses a representative normal data point (or set) to define a "normal" TT structure. The first $n-1$ nodes of the TT representation for a test data point are forced to match the training data's cores, leaving the final node to contain the unique information of the test point.
- Heuristic Alignment: The method employs a heuristic alignment step where the test data's truncated basis is aligned with the normal training cores.
- Decision Functions: Similar to the global methods, it uses self-comparative (ACLCTNAD) and group-comparative (GCLCTNAD) scoring.
- Projection-Based Variant: The authors propose a mathematically principled local variant based on orthogonal projection (minimizing least-squares error against a learned TT interface), though they note that the experimental results reported in the paper correspond to the original heuristic version.
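To give a feel for the projection-based idea, the sketch below projects a test vector onto the span of a small set of "normal" directions by least squares and measures what is left over. It is a flat-matrix simplification: the `interface` matrix stands in for the learned TT interface, and the function names are hypothetical, so this should not be read as the authors' algorithm.

```python
import numpy as np

def project_onto_interface(interface, x):
    """Least-squares projection of x onto the column span of `interface`."""
    coeffs, *_ = np.linalg.lstsq(interface, x, rcond=None)
    return interface @ coeffs

def residual_score(interface, x):
    """Relative size of what the projection cannot explain (0 = fits perfectly)."""
    x_hat = project_onto_interface(interface, x)
    return float(np.linalg.norm(x - x_hat) / np.linalg.norm(x))

# Hypothetical "normal" subspace: the first two coordinate axes of a 3-D space.
interface = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.0, 0.0]])

print(residual_score(interface, np.array([1.0, 2.0, 0.0])))  # lies in the span
print(residual_score(interface, np.array([3.0, 0.0, 4.0])))  # partly outside the span
```

A point inside the learned span has zero residual, while a point with a component outside it keeps a residual proportional to that component, which is what would flag it as anomalous.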

Key Contributions

Novel Framework: The introduction of anomaly detection algorithms based on the preservation and disruption of tensor network structures during compression.
Algorithmic Suite: Development of four primary algorithms (ACGCTNAD, GCGCTNAD, ACLCTNAD, GCLCTNAD) covering both global and local compression strategies, applicable to unsupervised, supervised, and semi-supervised scenarios.
Efficiency in High Dimensions: Demonstrating that TT representations can effectively handle high-dimensional data (e.g., images, network traffic logs) without the limitations of traditional dimensionality reduction.
Empirical Validation: Testing on three distinct datasets:
- Digits Dataset: Distinguishing one digit class from others.
- Olivetti Faces Dataset: Distinguishing face identities.
- Cybersecurity Dataset: Detecting cyber-attacks (brute force, scanning, slowloris) against normal network requests.

Results

Digits Dataset:
- ACGCTNAD (Global): Achieved maximum AUROC values ranging from 0.74 to 0.997. Performance often peaked at very low values of the compression parameter $\tau$, suggesting that aggressive compression removes anomalous structures while retaining normal ones.
- ACLCTNAD (Local): Achieved perfect AUROC (1.0) for all digit classes. However, the method was noted to be 50 times slower than the global method. Additionally, it exhibited a "score orientation reversal" at low compression values (AUROC dropping to 0), requiring post-hoc inversion of scores, which limits its unsupervised utility.
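The "score orientation reversal" is easier to see with a small worked example. AUROC is the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal point, so an AUROC of 0 means the scores separate the classes perfectly but point the wrong way. The helper `auroc` below is a minimal pairwise implementation written for this sketch, and the scores are made up.

```python
import numpy as np

def auroc(scores, is_anomaly):
    """Probability that a random anomaly outscores a random normal point
    (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    pos, neg = scores[is_anomaly], scores[~is_anomaly]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

labels = [0, 0, 0, 1, 1]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]           # anomalies score *lower*: reversed orientation
print(auroc(scores, labels))                 # 0.0
print(auroc([-s for s in scores], labels))   # 1.0 after flipping the sign
```

Flipping the sign turns an AUROC of 0 into 1, but deciding to flip requires knowing the labels, which is why the reversal limits use in a purely unsupervised setting.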
Olivetti Faces Dataset:
- The global method (ACGCTNAD) showed variable performance depending on the class, with AUROC values ranging from 0.69 to 1.0. The authors attribute lower performance in some cases to the small sample size (approx. 8-9 normal samples per class) or the specific nature of the data.
Cybersecurity Dataset:
- Without Scaler: The ACGCTNAD method achieved exceptional results with an AUROC of 0.98 and 97.72% accuracy at $\tau = 0.01$.
- With Standard Scaler: Performance degraded significantly. The authors observed that applying a standard scaler "ruins the results," likely because it alters the underlying structural norms that the tensor network relies on for detection.
- Unsupervised Mode: When tested without a training dataset (using only test data), the method maintained high performance (97.5% accuracy) without a scaler, but performance dropped to 64.7% with a scaler.
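One way to see how a standard scaler can change what a compression step keeps is to look at the singular value spectrum before and after scaling. The sketch below uses made-up two-feature data and a hand-written `standard_scale` helper; it only illustrates that standardization reshapes the spectrum that truncation acts on, and does not reproduce the paper's experiment or prove its explanation.

```python
import numpy as np

def standard_scale(data):
    """Column-wise standardization: subtract the mean, divide by the standard deviation."""
    return (data - data.mean(axis=0)) / data.std(axis=0)

def spectrum_share(data):
    """Fraction of the total squared singular values carried by the largest one."""
    s = np.linalg.svd(data, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))

rng = np.random.default_rng(0)
# Hypothetical features on very different scales (for example a byte count and a ratio).
raw = np.column_stack([rng.normal(1000.0, 50.0, size=200),
                       rng.normal(0.5, 0.1, size=200)])
scaled = standard_scale(raw)

# Raw data: one direction dominates. Scaled data: the weight is spread out.
print(spectrum_share(raw), spectrum_share(scaled))
```

In the raw data a single direction carries almost all of the weight, so a truncation keeps it and discards little else; after scaling, the weight is spread across directions and the same truncation removes a very different part of the data.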

Significance and Claims
The paper claims that the proposed tensor network approach offers a versatile and effective alternative for anomaly detection, particularly in high-dimensional settings. The authors highlight that:

Structure Preservation: The method's power arises from the ability of tensor networks to capture and preserve the structural relationships of normal data while discarding the diffuse structures of anomalies.
Counter-Intuitive Compression: Optimal detection often occurs at low values of $\tau$, where the representation deletes anomalous structures but retains normal ones, a phenomenon that may seem counter-intuitive compared to standard compression goals.
Sensitivity to Preprocessing: The results emphasize that data preprocessing, specifically standard scaling, can be detrimental to this specific approach, as it may destroy the structural features the algorithm is designed to detect.
Trade-offs: While local methods (ACLCTNAD) can achieve perfect separation, they are computationally expensive and rely on heuristic alignment. Global methods (ACGCTNAD) offer a better balance of speed and accuracy, making them more practical for many applications.

The authors conclude that while their results are promising, a more exhaustive evaluation involving comparisons with standard baselines (PCA, Isolation Forest, Autoencoders, etc.) and rigorous statistical reporting (random seeds, standard deviations) is necessary for future work. They also suggest future research directions including the use of other tensor network structures (like PEPS), application to text and video data, and the evaluation of the mathematically principled projection-based local variant.