S2S-FDD: Bridging Industrial Time Series and Natural Language for Explainable Zero-shot Fault Diagnosis

The paper proposes S2S-FDD, a framework that bridges the semantic gap between high-dimensional industrial time-series signals and natural language. It converts sensor data into descriptive summaries and applies a multi-turn, tree-structured reasoning process over historical documents to achieve explainable, zero-shot fault diagnosis.

Baoxue Li, Chunhui Zhao

Published 2026-03-10

Imagine you are a mechanic for a massive, complex factory. Usually, when something goes wrong, the factory's computer screams at you with a bunch of confusing numbers: "Pressure is 4.2, Temperature is 98, Flow is 0.5."

A traditional AI model is like a junior mechanic who sees those numbers, says, "Okay, that's a 'Type 4 Fault'," and stops there. It doesn't tell you why it's broken or how to fix it. It's like a doctor saying, "You have a disease," without explaining what's wrong with your body or what medicine to take.

On the other hand, you have a brilliant, well-read expert (a Large Language Model, or LLM) who knows everything about how machines should work. But this expert has a problem: they only speak "Human" (English, text, stories). They don't understand "Machine" (numbers, graphs, time-series data). If you hand them a spreadsheet of numbers, they get confused and might hallucinate a crazy answer.

This paper introduces a translator named "S2S-FDD" that bridges the gap between the two.

Here is how it works, using simple analogies:

1. The Translator (The "Signal-to-Semantics" Operator)

Imagine the factory sensors are like a heart monitor beeping in a rhythm. The numbers are the raw beeps.

  • The Problem: The expert doctor (the LLM) can't read the beeps directly.
  • The Solution: The S2S-FDD framework acts as a medical translator. It looks at the raw numbers, compares them to what a "healthy" heart looks like, and then writes a short, plain-English story for the doctor.
    • Instead of saying: "Sensor A dropped from 50 to 20 between seconds 10 and 15."
    • It says: "The pressure in the pipe suddenly dropped and stayed low, which is very different from the normal steady rhythm we usually see."

This translation captures the trends (is it going up or down?), the rhythm (is it steady or spiking?), and the deviations (is it acting weird?).
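The paper's actual operator works on real sensor streams, but the idea of turning trend, rhythm, and deviation into a sentence can be sketched in a few lines. This is a minimal illustration with made-up thresholds and a hypothetical `describe_signal` helper, not the authors' implementation:

```python
import statistics

def describe_signal(name, values, healthy_mean, healthy_std):
    """Turn a raw sensor series into a short plain-English description."""
    # Trend: compare the first and second halves of the window.
    half = len(values) // 2
    first = statistics.mean(values[:half])
    second = statistics.mean(values[half:])
    if second > first * 1.1:
        trend = "rising"
    elif second < first * 0.9:
        trend = "falling"
    else:
        trend = "steady"
    # Rhythm: how much the signal jitters compared to healthy behavior.
    rhythm = "spiky" if statistics.stdev(values) > 2 * healthy_std else "smooth"
    # Deviation: distance from the healthy baseline, in standard deviations.
    z = abs(statistics.mean(values) - healthy_mean) / healthy_std
    deviation = "far from normal" if z > 3 else "near normal"
    return f"{name} is {trend} and {rhythm}, currently {deviation}."

# A pressure reading that suddenly drops and stays low:
print(describe_signal("Pipe pressure", [50, 49, 48, 30, 22, 20], 50, 1.0))
```

The output is exactly the kind of sentence an LLM can reason about, whereas the raw list `[50, 49, 48, 30, 22, 20]` is not.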

2. The Detective Game (Multi-turn Tree Diagnosis)

Once the translator gives the expert the story, the expert (the LLM) starts playing detective.

  • Step 1: The Library. The expert looks at a giant library of old maintenance logs (historical data) to see if this story matches any past cases.
  • Step 2: The Interrogation. If the story isn't clear enough, the expert doesn't just guess. It acts like a detective asking for more clues. It can say, "I need to see the data for Valve B to be sure," and the system automatically fetches that specific data.
  • Step 3: The Tree. This creates a "tree" of reasoning.
    • Branch A: "Is it a leak?" -> Check data -> "No."
    • Branch B: "Is it a blockage?" -> Check data -> "Yes, the flow stopped."
    • Result: The expert concludes, "It's a blockage in the water line."
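The branch-check-conclude loop above can be sketched as a small routine. All names here (`fetch_evidence`, the sensor summaries, the candidate list) are hypothetical stand-ins for the framework's real retrieval step, shown only to make the control flow concrete:

```python
def fetch_evidence(sensor):
    # Stand-in for the system fetching a specific sensor's summary on demand.
    summaries = {
        "pressure_sensor": "pressure steady at normal level",
        "flow_sensor": "flow dropped to zero and stayed there",
    }
    return summaries[sensor]

def diagnose(candidates):
    """Walk the reasoning tree: check each branch's evidence until one matches."""
    trace = []
    for fault, sensor, is_match in candidates:
        verdict = is_match(fetch_evidence(sensor))
        trace.append(f"Is it {fault}? -> checked {sensor} -> {'Yes' if verdict else 'No'}")
        if verdict:
            return fault, trace
    return "unknown", trace

candidates = [
    ("a leak", "pressure_sensor", lambda e: "dropped" in e),
    ("a blockage", "flow_sensor", lambda e: "dropped to zero" in e),
]
fault, trace = diagnose(candidates)
```

The `trace` list is the explainability payoff: the final answer comes with the chain of checks that led to it, not just a label.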

3. The "Zero-Shot" Superpower

Usually, to teach a computer to recognize a specific broken part, you need to show it thousands of pictures of that broken part. But in a factory, you can't always wait for a machine to break 1,000 times to learn from it.

This system is "Zero-Shot": it can diagnose a brand-new fault without ever having seen an example of it. How? Because it understands the logic of the machine (via the text descriptions) and the principles of physics, rather than just memorizing patterns. It's like a mechanic who has never seen a specific new car model but can still diagnose a broken engine, because they understand how engines work in general.
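A crude way to see why text makes zero-shot possible: once the symptom is a sentence, it can be compared against textbook fault descriptions with no fault examples at all. The knowledge base and the word-overlap scoring below are deliberately simplistic, hypothetical stand-ins for the LLM's far richer semantic matching:

```python
# Hypothetical textbook descriptions -- written from domain knowledge,
# not learned from broken-machine data.
FAULT_KNOWLEDGE = {
    "leak": "pressure drops gradually while flow continues downstream",
    "blockage": "flow stops suddenly while upstream pressure rises or holds",
    "sensor drift": "one reading slowly departs from all correlated sensors",
}

def zero_shot_match(symptom):
    """Score each known fault by word overlap with the symptom description."""
    words = set(symptom.lower().split())
    scores = {fault: len(words & set(desc.lower().split()))
              for fault, desc in FAULT_KNOWLEDGE.items()}
    return max(scores, key=scores.get)

print(zero_shot_match("flow stops suddenly and pressure holds upstream"))
```

The point is that no fault was ever observed: the matcher runs entirely on descriptions of how faults should look.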

4. The Human in the Loop

Finally, the system doesn't just give an answer and walk away. It allows a human expert to step in. If the AI says, "I think it's a leak," the human can say, "No, that doesn't make sense because the valve is open." The AI learns from this feedback, refining its "tree" of reasoning for next time. It's a continuous learning partnership.

The Bottom Line

The researchers tested this on a complex system that mixes air, oil, and water (like a giant, pressurized blender).

  • Old AI: Got it right about 23% of the time.
  • New S2S-FDD: Got it right 77% of the time, using only data from when the machine was working perfectly (no broken examples needed).

In short: This paper teaches a super-smart AI how to "speak" the language of factory sensors, turning cold, hard numbers into a clear, logical story that explains exactly what is broken and why, all without needing to see the machine break first.