Self-Supervised Inductive Logic Programming

The Big Problem: The "Expert" Bottleneck

Imagine you want to teach a robot to understand a secret language (like a grammar for a specific type of code or a set of rules for drawing shapes).

In the old way of doing this (called Inductive Logic Programming or ILP), you act like a strict teacher. You have to provide:

Examples of what is correct (Positive examples).
Examples of what is wrong (Negative examples).
A rulebook (Background theory) that explains the basic vocabulary and how the robot should think.

The Catch: To make this work, you, the human, have to be an expert. You have to hand-pick the "wrong" examples so the robot doesn't get confused, and you have to write a custom rulebook for every single new problem. If you want to teach the robot a new language, you have to sit down and write a whole new manual. It's slow, tedious, and limits how useful these robots can be in the real world.

The Solution: The "Poker" System

The author, Stassa Patsantzis, introduces a new system called Poker. Think of Poker not as a card game, but as a detective who teaches itself.

Poker changes the game by asking: "What if I don't have a rulebook or a list of 'wrong' answers? Can I figure it out anyway?"

Here is how Poker works, using a simple analogy:

1. The "Blank Slate" Background

Instead of giving the robot a specific rulebook, Poker gives it a universal, generic toolkit. Imagine giving a chef a kitchen with every possible ingredient and every possible cooking tool, but no recipe.

Old Way: You give the chef a specific recipe for "Spaghetti Carbonara" and tell them, "Do not add chocolate."
Poker Way: You give the chef a kitchen full of ingredients and say, "Here are three examples of good pasta. Figure out the rest."

2. The "Self-Supervised" Detective Work

Poker starts with a few labeled examples (the "good" pasta) and a huge pile of unlabeled examples (a mix of good and bad pasta that the robot doesn't know yet).

Poker uses a clever trick called "Contradiction Detection":

Step 1: Poker makes a guess at a rule (a hypothesis) that fits the known "good" examples.
Step 2: It tests this rule against the unlabeled pile.
Step 3: Poker tentatively treats an unlabeled string as "bad" and throws out every candidate rule that accepts it. If the rules that are left can no longer explain the known "good" examples, Poker realizes: "Wait, that assumption was a mistake. This string must actually be 'good.'"
Step 4: If the remaining rules still explain all the known "good" examples, the string stays marked as "bad." This is how Poker creates its own negative examples: nobody hands it a list of wrong answers; it works them out by checking which assumptions lead to a contradiction.

It's like a student taking a practice test. If they get a question wrong, they don't just move on; they analyze why it was wrong, create a new "wrong" example to remember, and update their study guide.

3. The "SONF" (The Universal Rulebook)

To make this possible without a custom manual, the author invented something called a Second-Order Definite Normal Form (SONF).

Analogy: Imagine instead of writing a specific rule for "How to build a chair," you write a rule for "How to build any furniture."
This SONF is a super-general set of logic patterns. It's broad enough to express any grammar within a whole family (for example, context-free grammars, or the L-System grammars that draw fractal shapes) without needing to be tweaked for each specific task. It removes the need for the human to be a grammar expert.

The Results: Poker vs. Louise

The author tested Poker against a state-of-the-art system called Louise.

Louise (The Old Way): When given only "good" examples and no "bad" ones, Louise got confused. It started accepting everything as correct (over-generalizing). It was like a student who, having never been corrected, thinks "All words are spelled correctly."
Poker (The New Way): As Poker generated more and more of its own "bad" examples to test itself, it got smarter.
- More unlabeled data = Better performance.
- It learned to distinguish between "1010" (correct) and "1100" (incorrect) even without being told which was which initially.
- It successfully learned complex patterns like Context-Free Grammars (mathematical rules for language) and L-Systems (rules for drawing fractal plants and shapes).

Why This Matters

No More Manual Labor: You don't need to be an expert to teach the AI. You just need to give it a few examples and a pile of raw data.
Self-Correction: The system creates its own "negative feedback" loop. It learns by finding its own mistakes, just like a human does.
Versatility: Because it uses a universal "furniture-building" rulebook (SONF), it can switch from learning language rules to learning drawing rules without needing a new manual.

Summary in One Sentence

Poker is an AI that learns complex rules by teaching itself, using a universal toolkit and by inventing its own "wrong answers" to avoid getting confused, freeing humans from the burden of writing custom manuals for every new problem.

1. Problem Statement

Inductive Logic Programming (ILP), specifically Meta-Interpretive Learning (MIL), excels at learning recursive logic programs with invented predicates from few examples. However, standard ILP relies heavily on two manual, expert-driven inputs:

A Problem-Specific Background Theory ($B$): Tailored to the specific learning target to guide the search space.
Negative Examples ($E^-$): Carefully selected to prevent the system from over-generalizing (learning a hypothesis that accepts too many instances).

The paper identifies a critical bottleneck: in many real-world scenarios, expert knowledge to construct a tailored $B$ or to curate specific negative examples is unavailable. Without negative examples, standard ILP systems tend to over-generalize (e.g., learning that any string is valid rather than a specific pattern). The paper asks: Can ILP learn effectively without manually curated negative examples or a target-specific background theory?

2. Methodology: Self-Supervised ILP (SS-ILP)

The author proposes a new setting called Self-Supervised ILP (SS-ILP) and implements it via a new algorithm and system called Poker.

Core Concept: Contradiction Detection

Poker operates on the intuition that if a set of hypotheses $T$ accepts both a labelled positive example ($e^+$) and an unlabelled example ($e^?$), assuming $e^?$ is negative creates a contradiction.

Mechanism: Poker assumes unlabelled examples are negative. It removes any hypothesis from the current set $T$ that accepts these "negative" examples.
Conflict Resolution: If removing these hypotheses causes the remaining set $T$ to reject a labelled positive example in $E^+$, the assumption that $e^?$ is negative must be false. Poker then relabels $e^?$ as positive and adds it to $E^+$.
Iterative Specialization: This process iteratively specializes the hypothesis set and labels the unlabelled data until every unlabelled example has a label consistent with $E^+$.
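
The labelling loop described above can be sketched in a few lines of Python. This is only a toy illustration, not Poker's implementation (which is written in Prolog and works on logic programs): here the hypothesis set $T$ is assumed to be an explicit, finite dictionary of string predicates, "$T$ covers an example" is assumed to mean that at least one remaining predicate accepts it, and the names `label_by_contradiction`, `covers`, and `CANDIDATES` are hypothetical. The sketch also assumes that when a contradiction is found, the tentatively removed hypotheses are kept.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Callable[[str], bool]


def is_anbn(s: str) -> bool:
    """Accept a^n b^n for n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


# Hypothetical candidate set T: one target-like rule and two over-general ones.
CANDIDATES: Dict[str, Hypothesis] = {
    "anbn": is_anbn,
    "even_length": lambda s: len(s) % 2 == 0,
    "starts_with_a": lambda s: s.startswith("a"),
}


def covers(hypotheses: Dict[str, Hypothesis], example: str) -> bool:
    """Toy coverage test: some remaining hypothesis accepts the example."""
    return any(h(example) for h in hypotheses.values())


def label_by_contradiction(
    hypotheses: Dict[str, Hypothesis],
    positives: List[str],
    unlabelled: List[str],
) -> Tuple[Dict[str, Hypothesis], List[str], List[str]]:
    t = dict(hypotheses)
    pos = list(positives)
    neg: List[str] = []
    for e in unlabelled:
        # Assume e is negative: drop every hypothesis that accepts it.
        pruned = {name: h for name, h in t.items() if not h(e)}
        if all(covers(pruned, p) for p in pos):
            # No contradiction: keep the specialised set, e stays negative.
            t = pruned
            neg.append(e)
        else:
            # Contradiction: a known positive is now rejected, so e is
            # relabelled as positive and T is left unchanged (assumption).
            pos.append(e)
    return t, pos, neg


t, pos, neg = label_by_contradiction(
    CANDIDATES, ["ab", "aabb"], ["aaabbb", "ba", "abab", "aab"]
)
print(sorted(t), pos, neg)
```

In this toy run, "aaabbb" is accepted by every candidate, so treating it as negative would leave nothing to cover "ab" and "aabb"; it is relabelled positive. The strings "ba" and "abab" then eliminate the two over-general candidates without harming coverage, and are kept as negatives.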

Key Components

Maximally General Background Theory (SONF):
Instead of a tailored $B$, Poker uses a Second-Order Definite Normal Form (SONF). A SONF is a set of constrained metarules sufficiently general to express any program in a specific class (e.g., Context-Free Grammars or L-Systems). This removes the need for task-specific background theory design.
- Chomsky-Greibach Normal Form (C-GNF): For Context-Free Grammars (CFGs).
- Lindenmayer Normal Form (LNF): For L-System grammars.
Automatic Example Generation:
Poker executes the current hypothesis set as a generator to create new examples. These are added to the pool of unlabelled examples ($E^?$), increasing the data available for the contradiction detection process.
Algorithm Flow (Algorithm 1):
- Generalize: Construct an initial set of hypotheses $T$ that cover the labelled positive examples $E^+$ using the SONF.
- Generate: Execute $T$ to generate new examples, adding them to $E^?$.
- Label: Iterate through $E^?$. Assume an example is negative. If removing hypotheses that accept it causes $T$ to fail on $E^+$, re-label the example as positive.
- Output: A refined hypothesis $H$ and a complete labelling of all examples.
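
To make the Generate step more concrete, the sketch below shows one way a grammar-like hypothesis could be "executed as a generator". It is an illustration under stated assumptions rather than Poker's mechanism: the grammar is assumed to be a plain Python dictionary of productions with single-character terminals and no empty productions, strings are enumerated breadth-first up to a length bound, and the names `generate`, `Grammar`, and `ANBN` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Nonterminal -> list of production bodies (tuples of symbols).
Grammar = Dict[str, List[Tuple[str, ...]]]

# Toy grammar for a^n b^n (n >= 1): S -> a b | a S b
ANBN: Grammar = {"S": [("a", "b"), ("a", "S", "b")]}


def generate(grammar: Grammar, start: str, max_len: int) -> List[str]:
    """Enumerate all terminal strings of length <= max_len derivable from start.

    Assumes no empty productions, so a sentential form never shrinks and
    forms longer than max_len can be discarded safely.
    """
    results: Set[str] = set()
    first: Tuple[str, ...] = (start,)
    queue = deque([first])
    seen = {first}
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if there is one.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add("".join(form))  # only terminals left: a finished string
            continue
        for body in grammar[form[idx]]:
            new_form = form[:idx] + body + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda s: (len(s), s))


print(generate(ANBN, "S", 6))
```

Strings produced this way would play the role of the new unlabelled examples added to $E^?$. An over-general hypothesis would generate strings outside the target language, which is exactly what gives the Label step something to reject.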

3. Key Contributions

New Setting (SS-ILP): Formalizes a learning setting where only labelled positive examples and unlabelled examples are provided, with no negative examples and a maximally general background theory.
Poker System: A new MIL system implementing the SS-ILP algorithm in Prolog.
Second-Order Definite Normal Forms (SONFs): Introduces a principled method for selecting second-order background theories (metarules) that are general enough to learn entire classes of programs (CFGs and L-Systems) without task-specific tuning.
Theoretical Proof: Proves that the probability of Poker returning a correct hypothesis increases monotonically with the number of unlabelled examples (Theorem 1).
Empirical Validation: Demonstrates that Poker outperforms Louise, a state-of-the-art MIL system, when negative examples are absent.

4. Experimental Results

The author compared Poker against Louise (a state-of-the-art MIL system) on two tasks:

Learning Context-Free Grammars (CFGs) for Context-Free Languages (CFLs): Including languages like $a^n b^n$, palindromes, and balanced parentheses.
Learning L-System Grammars: Including fractals like the Dragon Curve and Sierpinski Triangle.
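
For readers unfamiliar with L-Systems, the sketch below shows the parallel rewriting they perform, using a common textbook formulation of the Dragon Curve (axiom `F`, rules `F -> F+G` and `G -> F-G`). The exact encodings used in the experiments are not given in this summary, so the rule set and the names `rewrite` and `DRAGON` are assumptions made for illustration.

```python
from typing import Dict

# A common textbook Dragon Curve L-System (assumed, not taken from the paper).
# F and G both mean "draw forward"; + and - mean "turn" and are left unchanged.
DRAGON: Dict[str, str] = {"F": "F+G", "G": "F-G"}


def rewrite(axiom: str, rules: Dict[str, str], steps: int) -> str:
    """Apply the rules to every symbol in parallel, `steps` times."""
    current = axiom
    for _ in range(steps):
        # Symbols without a rule (here + and -) are copied as they are.
        current = "".join(rules.get(sym, sym) for sym in current)
    return current


for n in range(4):
    print(n, rewrite("F", DRAGON, n))
```

Learning an L-System grammar in this setting means recovering rewriting rules such as those in `DRAGON` from example strings, rather than being given them.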

Key Findings:

Performance vs. Unlabelled Data: Poker's performance (measured by True Positive Rate and True Negative Rate for CFLs; Generative Accuracy for L-Systems) improves as the number of automatically generated examples increases.
Over-Generalization: Without negative examples, Louise consistently over-generalizes. As the number of labelled examples increases, Louise's hypothesis size grows, and its ability to reject invalid strings (TNR) drops significantly.
Robustness: Poker successfully learned complex recursive grammars using only a maximally general background theory (terminal vocabulary) and positive examples, without manual negative example curation.
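
The two CFL metrics mentioned above follow their standard definitions: True Positive Rate is the fraction of in-language strings a hypothesis accepts, and True Negative Rate is the fraction of out-of-language strings it rejects. The short sketch below computes both and shows why an over-general hypothesis scores a perfect TPR but a TNR of zero. The test strings and the names `rates` and `accept_everything` are made up for illustration and are not the paper's evaluation data.

```python
from typing import Callable, List, Tuple


def rates(
    accepts: Callable[[str], bool],
    positives: List[str],
    negatives: List[str],
) -> Tuple[float, float]:
    """Return (TPR, TNR) of a hypothesis on labelled test strings."""
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative string")
    true_pos = sum(1 for s in positives if accepts(s))
    true_neg = sum(1 for s in negatives if not accepts(s))
    return true_pos / len(positives), true_neg / len(negatives)


def is_anbn(s: str) -> bool:
    """Accept a^n b^n for n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def accept_everything(s: str) -> bool:
    """An over-general hypothesis: every string is 'in the language'."""
    return True


test_pos = ["ab", "aabb", "aaabbb"]
test_neg = ["ba", "abab", "aab", "b"]

print("over-general:", rates(accept_everything, test_pos, test_neg))
print("a^n b^n:     ", rates(is_anbn, test_pos, test_neg))
```

This is why TNR is the informative number when negative examples are missing during training: accepting everything costs nothing in TPR.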

5. Significance

This work represents a significant step toward making ILP more accessible and applicable to real-world problems where expert domain knowledge (for background theories) and curated negative datasets are scarce.

Reduced Human Burden: It eliminates the need for users to manually design background theories or curate negative examples for every new learning task.
Theoretical Rigor: By introducing SONFs, it provides a formal framework for "maximally general" background knowledge, bridging the gap between general logic and specific learning targets.
Self-Supervision in Logic: It successfully adapts the concept of self-supervised learning (common in deep learning) to symbolic logic programming, demonstrating that logic programs can be learned and refined through internal consistency checks rather than external supervision alone.

In conclusion, Poker demonstrates that with a sufficiently general background theory and an algorithm capable of detecting contradictions via unlabelled data, ILP systems can learn complex recursive structures without the traditional burden of manual negative example selection.