Rule Extraction in Machine Learning: Chat Incremental Pattern Constructor

This paper introduces Chat Incremental Pattern Constructor (ChatIPC), a lightweight symbolic learning system that extracts ordered token-transition rules from text and generates responses through definition-based expansion and similarity-guided selection, offering a mathematically formalized, interpretable alternative to conventional opaque classifiers.

Caleb Princewill Nwokocha

Published 2026-03-20

Imagine you are teaching a robot to write stories, but with a very strict rule: you are not allowed to use a "black box" brain. You can't just feed it millions of books and hope it magically learns how to talk. Instead, you want to see exactly how it learns, step-by-step, like watching a child build with Lego bricks.

This paper introduces a system called ChatIPC (Chat Incremental Pattern Constructor). Think of it not as a super-intelligent AI, but as a very organized, rule-following librarian who builds sentences by looking at what words usually sit next to each other.

Here is the breakdown of how it works, using simple analogies:

1. The Core Idea: The "Word Train"

Most modern AIs are like a giant, blurry cloud of math. You put a question in, and a cloud of probability spits out an answer. You don't know why it chose those words.

ChatIPC is different. It treats language like a train track.

  • If it sees the word "The" followed by "cat," it lays down a track: The → cat.
  • If it sees "cat" followed by "sat," it adds another track: cat → sat.
  • Over time, it builds a giant, visible map of how words connect. It doesn't "guess" the next word; it looks at the map and asks, "What tracks are connected to the last word I said?"
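The "word train" map can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation; the function name and tokenization (lowercased whitespace split) are assumptions.

```python
from collections import defaultdict

def build_transitions(text):
    """Map each token to the set of tokens observed immediately after it."""
    tokens = text.lower().split()
    transitions = defaultdict(set)
    # Lay down one "track" per adjacent token pair.
    for prev, nxt in zip(tokens, tokens[1:]):
        transitions[prev].add(nxt)
    return transitions

rules = build_transitions("the cat sat on the mat")
print(sorted(rules["the"]))  # the tokens seen right after "the"
```

Every rule in the map is a plain `prev → next` pair that can be printed and inspected, which is exactly what makes the system traceable later on.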

2. The "Dictionary Superpower" (Definition Expansion)

Here is where it gets clever. A simple map can get stuck. If the robot sees "bank," it might know tracks to both "river" and "money," but how does it decide which meaning the user intended, say, "bank" as in a "river bank"?

ChatIPC has a magic dictionary attached to it.

  • When it sees the word "bank," it doesn't just look at the word itself. It opens the dictionary definition of "bank."
  • It finds words like "water," "shore," "edge," and "sand."
  • It treats these definition words as invisible neighbors.
  • The Analogy: Imagine you are at a party. You know a guy named "Bob." But ChatIPC doesn't just know Bob; it knows Bob's entire family tree and his hobbies. If the conversation is about "fishing," ChatIPC realizes that even though "Bob" wasn't mentioned, his "fishing hobby" makes him relevant. This helps the robot choose better words even if the exact word hasn't been seen before.
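Definition expansion can be sketched as a simple set union: a token plus the words of its dictionary definition. The toy glossary below is hypothetical; the paper attaches a real dictionary resource.

```python
# Hypothetical toy glossary standing in for a real dictionary.
TOY_GLOSSARY = {
    "bank": ["land", "alongside", "a", "river", "or", "lake"],
}

def expand(token, glossary):
    """Return the token together with its 'invisible neighbors':
    the words appearing in its dictionary definition, if any."""
    neighbors = {token}
    neighbors.update(glossary.get(token, []))
    return neighbors

print(sorted(expand("bank", TOY_GLOSSARY)))
```

After expansion, "bank" overlaps with a context about rivers even if "bank" and "river" never appeared side by side in the training text.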

3. The "Popularity Contest" (Similarity Scoring)

When the robot needs to pick the next word, it has a list of candidates (all the words connected to the last one on its map). How does it decide?

It plays a game of "How much do we have in common?"

  • It looks at the Context: What words are in the prompt? What words has it already said? (Plus all the "invisible neighbors" from the dictionary).
  • It looks at each Candidate: What words are connected to this new word? (Plus its dictionary neighbors).
  • It calculates a Jaccard Score: This is just a fancy way of saying, "How many items do these two lists share?"
  • The Winner: The word that shares the most "common ground" with the current conversation gets picked.
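The Jaccard score is the size of the intersection of two sets divided by the size of their union. A minimal sketch of the selection step, with made-up context and candidate sets:

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| -- the fraction of items the two sets share."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# Hypothetical example: a river-themed context vs. two candidates.
context = {"river", "water", "fishing", "shore"}
candidates = {
    "boat": {"water", "river", "sail"},
    "loan": {"money", "interest", "debt"},
}
best = max(candidates, key=lambda w: jaccard(context, candidates[w]))
print(best)  # prints "boat"
```

Here "boat" shares two words with the context (2 shared out of 5 total, a score of 0.4), while "loan" shares none, so "boat" wins.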

4. The "Don't Be Boring" Rule (Repetition Penalty)

Robots love to get stuck in loops. "The cat sat. The cat sat. The cat sat..."
ChatIPC has a simple rule to stop this: The Boring Tax.

  • Every time the robot uses a word, it puts a "tax" on it.
  • If it tries to pick that word again, its score goes down.
  • This forces the robot to look for a different word, keeping the conversation fresh.
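The "boring tax" is just a subtraction from a candidate's similarity score, scaled by how often the word has already been used. The penalty weight of 0.1 below is an arbitrary illustrative choice, not a value from the paper.

```python
from collections import Counter

def penalized_score(word, base_score, used, weight=0.1):
    """Subtract a penalty proportional to how often `word` was already used."""
    return base_score - weight * used[word]

used = Counter(["cat", "cat", "sat"])   # "cat" has been said twice
print(penalized_score("cat", 0.5, used))  # 0.5 minus two taxes of 0.1
```

A word that keeps winning keeps getting taxed, so eventually a fresher alternative overtakes it.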

5. Why is this paper important? (The "Glass Box")

In the world of AI, we usually have Black Boxes (we don't know how they think) or White Boxes (we know the rules, but they are too simple to be smart).

ChatIPC is a Glass Box.

  • Transparent: You can look at the map and see exactly why it chose "sat" after "cat." It's because the rule cat → sat exists.
  • Traceable: If the robot says something weird, you can trace it back to a specific rule or a specific dictionary definition.
  • Lightweight: It doesn't need a massive supercomputer. It runs on simple logic and a dictionary.

Summary in a Nutshell

Imagine a robot that learns to write by:

  1. Copying the order of words it sees (like a scribe).
  2. Reading the dictionary to understand the "vibe" of those words.
  3. Picking the next word based on which one fits the current vibe best.
  4. Refusing to repeat itself too much.
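The four steps above can be wired together into a single next-word pick. This is a toy end-to-end sketch under the same assumptions as before (tiny corpus, illustrative names, arbitrary penalty weight), not the paper's implementation.

```python
from collections import Counter, defaultdict

def build_transitions(text):
    """Step 1: copy the order of words into a transition map."""
    tokens = text.lower().split()
    transitions = defaultdict(set)
    for prev, nxt in zip(tokens, tokens[1:]):
        transitions[prev].add(nxt)
    return transitions

def jaccard(a, b):
    """Step 3's scoring: fraction of items two sets share."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def next_word(last, context, transitions, used, weight=0.1):
    """Pick the candidate reachable from `last` that best fits `context`,
    minus the repetition tax (step 4)."""
    candidates = transitions.get(last, set())
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda w: jaccard(context, transitions.get(w, set()) | {w})
                      - weight * used[w],
    )

rules = build_transitions("the cat sat on the mat the cat ran to the river")
used = Counter()
word = next_word("the", {"cat", "sat"}, rules, used)
print(word)  # prints "cat"
```

Every choice the loop makes can be replayed by hand: list the tracks out of the last word, score each against the context, subtract the taxes, take the maximum.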

The paper argues that we don't always need a giant, mysterious neural network to generate text. Sometimes, a clear, step-by-step set of rules that anyone can inspect is better, especially when we need to trust the machine or understand why it said what it said. It's a return to symbolic logic in an age of statistical guessing.