A comprehensive benchmark of publicly available image foundation models for their usability to predict gene expression from whole slide images

This study benchmarks five publicly available image foundation models for predicting gene expression from whole-slide images in breast cancer, demonstrating that histopathology-specific models, particularly Phikon, significantly outperform general-purpose encoders in morphology-to-transcriptome inference.

Original authors: Jabin, A., Ahmad, S.

Published 2026-03-03

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you have a massive library of microscopic photographs of human tissue (called Whole Slide Images or WSIs). These photos are so detailed they look like high-resolution maps of a city, showing individual cells and structures.

Now, imagine you want to know the "secret recipe" (the gene expression) of the cancer in that tissue just by looking at the map, without needing to run expensive chemical tests. This is the challenge the paper tackles: Can an AI look at a picture of tissue and accurately guess the genetic activity inside it?

To solve this, the authors set up a taste test: they judged five different "super-eyes" (AI models) to see which one is best at this task.

The Contestants: Five Different "Super-Eyes"

The researchers tested five different AI models, each trained differently. Think of them as different types of students taking a final exam (a code sketch after the list shows how such an encoder is used in practice):

  1. DINOv2 (The Generalist): This AI was trained on millions of pictures of cats, cars, and landscapes. It's great at recognizing general shapes but has never seen a microscope slide before.
    • Analogy: Like a brilliant art student who has studied every painting in the Louvre but has never set foot in a hospital.
  2. MedSigLIP (The Medical Generalist): This AI learned from a mix of medical images and text descriptions. It knows some medical stuff but isn't a specialist in tissue slides.
    • Analogy: A medical student who has read the textbooks but hasn't done many internships in the pathology lab.
  3. UNI, H-Optimus-0, and Phikon (The Pathology Specialists): These three AIs were trained specifically on millions of images of human tissue slides. They have "seen" billions of cells and know exactly what healthy and cancerous tissue looks like.
    • Analogy: These are veteran pathologists who have spent decades staring at microscope slides. They know the difference between a normal cell and a cancer cell just by a glance.
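
In practice, all five "super-eyes" are used the same way: a small tile (patch) is cut out of the whole-slide image and pushed through the pretrained encoder, which returns a compact feature vector describing what it "sees." Below is a minimal sketch of that step, assuming a ViT-style pathology encoder available on the Hugging Face Hub; the model ID, file path, and preprocessing are illustrative placeholders rather than the exact settings used in the paper.

```python
# Minimal sketch: turn one tissue patch into an embedding with a pretrained
# image encoder. The model ID is an assumption (Phikon's public checkpoint);
# the other encoders (UNI, H-Optimus-0, DINOv2, MedSigLIP) are loaded
# analogously but may need their own preprocessing and access terms.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

MODEL_ID = "owkin/phikon"  # assumed Hub ID for the Phikon encoder

processor = AutoImageProcessor.from_pretrained(MODEL_ID)
encoder = AutoModel.from_pretrained(MODEL_ID)
encoder.eval()

# One small tile cropped from a whole-slide image (path is a placeholder).
patch = Image.open("tissue_patch.png").convert("RGB")

with torch.no_grad():
    inputs = processor(images=patch, return_tensors="pt")
    outputs = encoder(**inputs)
    # Take the [CLS] token as the patch-level feature vector.
    embedding = outputs.last_hidden_state[:, 0, :]  # shape: (1, hidden_dim)

print(embedding.shape)
```

Repeating this over thousands of tiles gives each slide a bag of feature vectors, which is the raw material for the prediction step described next.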

The Test: The "Morphology-to-Genome" Challenge

The researchers took a specific set of breast cancer cases (from the TCGA-BRCA dataset). For each patient, they had:

  • The Picture: The high-res tissue slide.
  • The Answer Key: The actual genetic data (RNA-seq) from that patient.

They fed the pictures into the five AI models. Each AI tried to predict the genetic data based only on the visual patterns in the image. The researchers then compared each AI's predictions to the real answer key using a score called Spearman correlation, a number between -1 and 1 where 1 is a perfect match.
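
Concretely, "tried to predict the genetic data" usually means compressing each slide's many patch features into one vector per patient, fitting a simple prediction model per gene on some patients, and checking the held-out predictions against the real RNA-seq values with Spearman correlation. The sketch below illustrates that evaluation loop with synthetic numbers; the ridge regression model, the train/test split, and the data shapes are assumptions made for illustration, not the paper's exact protocol.

```python
# Illustrative evaluation loop: frozen-encoder slide features -> gene
# expression, scored per gene with Spearman correlation. All numbers here
# are synthetic; the shapes and the regression model are assumptions.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend data: one slide-level embedding per patient (e.g. the average of
# its patch embeddings) and one measured expression value per gene.
n_patients, embed_dim, n_genes = 200, 768, 50
X = rng.normal(size=(n_patients, embed_dim))  # slide embeddings ("the picture")
Y = rng.normal(size=(n_patients, n_genes))    # RNA-seq values ("the answer key")

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

scores = []
for g in range(n_genes):
    model = Ridge(alpha=1.0).fit(X_train, Y_train[:, g])  # one regressor per gene
    pred = model.predict(X_test)
    rho, _ = spearmanr(pred, Y_test[:, g])  # -1 to 1; 1 = ranks match perfectly
    scores.append(rho)

print(f"Median Spearman correlation across genes: {np.median(scores):.3f}")
```

A better encoder produces embeddings from which even a simple regressor like this can recover more of the real expression ranking, so its median Spearman score across genes ends up higher.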

The Results: Who Won?

The results were clear and followed a strict hierarchy:

  • 🏆 The Winner: Phikon
    This model was the clear champion. It predicted the genetic activity with the highest accuracy and consistency.

    • Why? Because it was trained specifically on the "language" of tissue slides. It learned that specific patterns in the tissue (like how crowded the cells are or how they are arranged) directly correlate to specific genetic switches being turned on or off.
  • 🥈 The Runners-Up: UNI and H-Optimus-0
    These two also performed very well, significantly better than the general models, but they didn't quite reach Phikon's level of precision. They are still excellent "specialists."

  • 🥉 The Middle Pack: MedSigLIP
    It did okay, better than the generalist, but not as good as the tissue specialists. It had some medical knowledge but lacked the deep, specific training on tissue structure.

  • 📉 The Loser: DINOv2
    The generalist model struggled the most. While it could recognize that "this is a picture of cells," it couldn't decode the subtle biological secrets hidden in the arrangement of those cells.

    • Why? It was like asking someone who only knows how to drive a car to perform heart surgery. They have a genuinely useful general skill, but they lack the specific domain knowledge the task requires.

The Big Takeaway

The paper demonstrates a simple but powerful rule: Specialization wins.

If you want an AI to understand the complex relationship between what a tissue looks like and what its genes are doing, you shouldn't just give it a general education (like DINOv2). You need to give it a specialized medical degree (like Phikon).

In everyday terms:
If you want to guess a person's personality just by looking at their messy desk, you'd want someone who has studied psychology and office habits (the specialist), not someone who just knows how to organize a bookshelf (the generalist). The "specialist" AIs learned that the "mess" on the tissue slide (the morphology) is actually a direct map to the genetic instructions inside.

This study provides a "menu" for doctors and scientists: if you are building tools to predict cancer genetics from images, choose the specialist models (Phikon, UNI, H-Optimus-0) over the general ones. That choice saves time and money and leads to more accurate medical insights.
