S2S-FDD: Bridging Industrial Time Series and Natural Language for Explainable Zero-shot Fault Diagnosis

The paper proposes S2S-FDD, a framework that bridges the semantic gap between high-dimensional industrial time-series signals and natural language. It converts sensor data into descriptive summaries and applies a multi-turn, tree-structured reasoning process over historical documents to achieve explainable, zero-shot fault diagnosis.

Baoxue Li, Chunhui Zhao

Published 2026-03-10

Imagine you are a mechanic for a massive, complex factory. Usually, when something goes wrong, the factory's computer screams at you with a bunch of confusing numbers: "Pressure is 4.2, Temperature is 98, Flow is 0.5."

A traditional AI model is like a junior mechanic who sees those numbers, says, "Okay, that's a 'Type 4 Fault'," and stops there. It doesn't tell you why it's broken or how to fix it. It's like a doctor saying, "You have a disease," without explaining what's wrong with your body or what medicine to take.

On the other hand, you have a brilliant, well-read expert (a Large Language Model, or LLM) who knows everything about how machines should work. But this expert has a problem: they only speak "Human" (English, text, stories). They don't understand "Machine" (numbers, graphs, time-series data). If you hand them a spreadsheet of numbers, they get confused and might hallucinate a crazy answer.

This paper introduces a translator named "S2S-FDD" that bridges the gap between the two.

Here is how it works, using simple analogies:

1. The Translator (The "Signal-to-Semantics" Operator)

Imagine the factory sensors are like a heart monitor beeping in a rhythm. The numbers are the raw beeps.

  • The Problem: The expert doctor (the LLM) can't read the beeps directly.
  • The Solution: The S2S-FDD framework acts as a medical translator. It looks at the raw numbers, compares them to what a "healthy" heart looks like, and then writes a short, plain-English story for the doctor.
    • Instead of saying: "Sensor A dropped from 50 to 20 between seconds 10 and 15."
    • It says: "The pressure in the pipe suddenly dropped and stayed low, which is very different from the normal steady rhythm we usually see."

This translation captures the trends (is it going up or down?), the rhythm (is it steady or spiking?), and the deviations (is it acting weird?).
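The paper's actual operator works on real sensor streams, but the idea of turning trend, rhythm, and deviation into a sentence can be sketched in a few lines. This is a minimal illustration with made-up thresholds and a hypothetical `describe_signal` helper, not the authors' implementation:

```python
import statistics

def describe_signal(name, values, healthy_mean, healthy_std):
    """Turn a raw sensor series into a short plain-English description."""
    # Trend: compare the first and second halves of the window.
    half = len(values) // 2
    first = statistics.mean(values[:half])
    second = statistics.mean(values[half:])
    if second > first * 1.1:
        trend = "rising"
    elif second < first * 0.9:
        trend = "falling"
    else:
        trend = "steady"
    # Rhythm: how much the signal jitters compared to healthy behavior.
    rhythm = "spiky" if statistics.stdev(values) > 2 * healthy_std else "smooth"
    # Deviation: distance from the healthy baseline, in standard deviations.
    z = abs(statistics.mean(values) - healthy_mean) / healthy_std
    deviation = "far from normal" if z > 3 else "near normal"
    return f"{name} is {trend} and {rhythm}, currently {deviation}."

# A pressure reading that suddenly drops and stays low:
print(describe_signal("Pipe pressure", [50, 49, 48, 30, 22, 20], 50, 1.0))
```

The output is exactly the kind of sentence an LLM can reason about, whereas the raw list `[50, 49, 48, 30, 22, 20]` is not.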

2. The Detective Game (Multi-turn Tree Diagnosis)

Once the translator gives the expert the story, the expert (the LLM) starts playing detective.

  • Step 1: The Library. The expert looks at a giant library of old maintenance logs (historical data) to see if this story matches any past cases.
  • Step 2: The Interrogation. If the story isn't clear enough, the expert doesn't just guess. It acts like a detective asking for more clues. It can say, "I need to see the data for Valve B to be sure," and the system automatically fetches that specific data.
  • Step 3: The Tree. This creates a "tree" of reasoning.
    • Branch A: "Is it a leak?" -> Check data -> "No."
    • Branch B: "Is it a blockage?" -> Check data -> "Yes, the flow stopped."
    • Result: The expert concludes, "It's a blockage in the water line."
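The branch-check-conclude loop above can be sketched as a small routine. All names here (`fetch_evidence`, the sensor summaries, the candidate list) are hypothetical stand-ins for the framework's real retrieval step, shown only to make the control flow concrete:

```python
def fetch_evidence(sensor):
    # Stand-in for the system fetching a specific sensor's summary on demand.
    summaries = {
        "pressure_sensor": "pressure steady at normal level",
        "flow_sensor": "flow dropped to zero and stayed there",
    }
    return summaries[sensor]

def diagnose(candidates):
    """Walk the reasoning tree: check each branch's evidence until one matches."""
    trace = []
    for fault, sensor, is_match in candidates:
        verdict = is_match(fetch_evidence(sensor))
        trace.append(f"Is it {fault}? -> checked {sensor} -> {'Yes' if verdict else 'No'}")
        if verdict:
            return fault, trace
    return "unknown", trace

candidates = [
    ("a leak", "pressure_sensor", lambda e: "dropped" in e),
    ("a blockage", "flow_sensor", lambda e: "dropped to zero" in e),
]
fault, trace = diagnose(candidates)
```

The `trace` list is the explainability payoff: the final answer comes with the chain of checks that led to it, not just a label.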

3. The "Zero-Shot" Superpower

Usually, to teach a computer to recognize a specific broken part, you need to show it thousands of pictures of that broken part. But in a factory, you can't always wait for a machine to break 1,000 times to learn from it.

This system is "Zero-Shot": it can diagnose a brand-new fault without ever having seen an example of it. How? Because it understands the logic of the machine (via the text descriptions) and the principles of physics, rather than just memorizing patterns. It's like a mechanic who has never seen a specific new car model but can still diagnose a broken engine, because they understand how engines work in general.
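A crude way to see why text makes zero-shot possible: once the symptom is a sentence, it can be compared against textbook fault descriptions with no fault examples at all. The knowledge base and the word-overlap scoring below are deliberately simplistic, hypothetical stand-ins for the LLM's far richer semantic matching:

```python
# Hypothetical textbook descriptions -- written from domain knowledge,
# not learned from broken-machine data.
FAULT_KNOWLEDGE = {
    "leak": "pressure drops gradually while flow continues downstream",
    "blockage": "flow stops suddenly while upstream pressure rises or holds",
    "sensor drift": "one reading slowly departs from all correlated sensors",
}

def zero_shot_match(symptom):
    """Score each known fault by word overlap with the symptom description."""
    words = set(symptom.lower().split())
    scores = {fault: len(words & set(desc.lower().split()))
              for fault, desc in FAULT_KNOWLEDGE.items()}
    return max(scores, key=scores.get)

print(zero_shot_match("flow stops suddenly and pressure holds upstream"))
```

The point is that no fault was ever observed: the matcher runs entirely on descriptions of how faults should look.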

4. The Human in the Loop

Finally, the system doesn't just give an answer and walk away. It allows a human expert to step in. If the AI says, "I think it's a leak," the human can say, "No, that doesn't make sense because the valve is open." The AI learns from this feedback, refining its "tree" of reasoning for next time. It's a continuous learning partnership.

The Bottom Line

The researchers tested this on a complex system that mixes air, oil, and water (like a giant, pressurized blender).

  • Old AI: Got it right about 23% of the time.
  • New S2S-FDD: Got it right 77% of the time, using only data from when the machine was working perfectly (no broken examples needed).

In short: This paper teaches a super-smart AI how to "speak" the language of factory sensors, turning cold, hard numbers into a clear, logical story that explains exactly what is broken and why, all without needing to see the machine break first.