DBT-2026, a de-identified publicly available dataset of digital breast tomosynthesis exams with ground truth biopsies

This paper introduces DBT-2026, a freely available, de-identified dataset comprising 558 digital breast tomosynthesis exams with expert annotations and clinical reports from patients with BI-RADS scores of 0, 1, or 2, designed to facilitate non-commercial research in breast cancer imaging.

Wu, J., Perandini, L., Batra, T., Igoshin, S., Bari, S., de Araujo, A. L., Willemink, M. J.

Published 2026-03-04

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you are trying to find a specific needle in a haystack, but the hay is so thick and tangled that the needles hide behind it. This is the challenge radiologists face when looking at standard 2D mammograms of women with dense breast tissue. The layers of tissue overlap, making it hard to tell whether a "needle" (a potential cancer) is actually there or is just an illusion created by overlapping tissue.

Enter DBT-2026, a new public dataset for researchers, described in this paper. Here is the breakdown in simple terms:

1. The Problem: The "Flat" Picture vs. The "3D" Slice

Traditional mammograms are like taking a flat photograph of a sandwich. If you have too many ingredients (dense tissue), you can't tell if there's a hidden pickle (a tumor) or just a slice of tomato overlapping it.

Digital Breast Tomosynthesis (DBT) is like taking that sandwich and slicing it into thin layers, then looking at each slice individually. It's a 3D view that separates the overlapping tissue, making it much easier to spot the "pickles." Doctors are getting better at using this 3D tech, but researchers who want to teach computers (Artificial Intelligence) to spot these issues automatically need a large library of labeled examples.
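For readers who like to see the idea in code, here is a toy sketch of why a flat picture can hide what slices reveal. It is only an illustration: the pixel values are made up, `project` is a hypothetical helper, and real DBT images are reconstructed from X-ray views taken at several angles, not stacked like this.

```python
def project(volume):
    """Flatten a stack of 2D slices into one 2D image by summing along depth."""
    rows, cols = len(volume[0]), len(volume[0][0])
    return [[sum(s[r][c] for s in volume) for c in range(cols)]
            for r in range(rows)]

# Toy volume: three slices, each a single row of three pixels.
# 2 = ordinary dense tissue, 4 = a small lesion (values are made up).
volume = [
    [[2, 0, 0]],  # slice 0: dense tissue in column 0
    [[2, 0, 4]],  # slice 1: more dense tissue in column 0, lesion in column 2
    [[0, 0, 0]],  # slice 2: empty
]

flat = project(volume)
print(flat)  # [[4, 0, 4]]: stacked tissue and the lesion look identical

# Looking slice by slice, only column 2 ever reaches the lesion value.
brightest = [max(s[0][c] for s in volume) for c in range(3)]
print(brightest)  # [2, 0, 4]
```

In the flattened image, two layers of ordinary tissue add up to the same brightness as the lesion. In the individual slices, the two are easy to tell apart.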

2. The Solution: An Annotated "Training Library"

The authors created DBT-2026, which is essentially a freely available library of 558 real-world 3D breast exams.

  • The Collection: They gathered 558 DBT exams, each paired with expert annotations and a clinical report.
  • The "Answer Key": What makes this special is the "answer key." Many of these women had biopsies (where a tiny piece of tissue was tested in a lab), so for those cases researchers know for a fact whether a finding was cancer or a benign lump, and do not have to rely on the image alone.
  • The Privacy Shield: Just like you wouldn't want your medical records posted on a billboard, the researchers scrubbed every single piece of personal information (names, dates, locations) from the images and reports. They used advanced computer tools (like a digital eraser) to ensure no one could figure out who the patients were.
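To give a feel for what the "digital eraser" does to a text report, here is a minimal sketch. It is not the authors' tool: the field names, the date format, and the `redact` function are all assumptions made for illustration, and real de-identification also has to handle image headers and text burned into the pixels.

```python
import re

# Hypothetical patterns: labeled header fields and ISO-style dates.
LABELED_FIELD = re.compile(r"^(patient name|mrn):[ \t]*.+$",
                           re.IGNORECASE | re.MULTILINE)
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def redact(report: str) -> str:
    """Blank out labeled identity fields and dates, keep the clinical text."""
    report = LABELED_FIELD.sub(lambda m: f"{m.group(1)}: [REDACTED]", report)
    return DATE.sub("[DATE]", report)

sample = "Patient Name: Jane Doe\nExam date 2024-05-01. No suspicious mass."
cleaned = redact(sample)
print(cleaned)
```

The clinical finding survives, while the name and the date are replaced with placeholders. Simple patterns like these miss a lot, which is why real pipelines combine several methods and human review.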

3. The "Team of Detectives"

You can't just dump a pile of photos on a computer and expect it to learn. Humans had to label them first.

  • A team of highly trained breast imaging experts (like master detectives) looked at every single scan.
  • They drew circles around suspicious spots, measured them, and wrote down exactly what they saw.
  • They used a "Doer-Checker" system: One expert labeled the image, and a second expert double-checked the work to catch mistakes.
  • Finally, a senior US-based radiologist gave the final "stamp of approval" to ensure the labels were accurate.
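The review chain above can be pictured as a small checklist that must be signed in order. The sketch below is one hypothetical way to model it. The stage names, the `Annotation` class, and the rule that the checker must be a different person from the doer are assumptions for illustration, not details taken from the paper.

```python
from dataclasses import dataclass, field

# Assumed order: doer labels, checker reviews, senior radiologist approves.
STAGES = ["labeled", "checked", "approved"]

@dataclass
class Annotation:
    exam_id: str
    finding: str
    sign_offs: list = field(default_factory=list)  # (stage, reviewer) pairs

    def sign(self, stage: str, reviewer: str) -> None:
        """Record a sign-off, enforcing stage order and a distinct checker."""
        if self.released:
            raise ValueError("annotation is already fully approved")
        expected = STAGES[len(self.sign_offs)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        if stage == "checked" and reviewer == self.sign_offs[0][1]:
            raise ValueError("the checker must differ from the doer")
        self.sign_offs.append((stage, reviewer))

    @property
    def released(self) -> bool:
        """True once every stage has been signed."""
        return len(self.sign_offs) == len(STAGES)

ann = Annotation("exam-001", "mass, upper outer quadrant")
ann.sign("labeled", "reader_a")
ann.sign("checked", "reader_b")
ann.sign("approved", "senior_radiologist")
print(ann.released)  # True
```

Skipping a stage, or letting the same person label and check, raises an error, so a label only counts as released once it has passed all three sets of eyes.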

4. Who Can Use It? (The Rules of the Game)

The authors are making this library free for researchers, but with strict rules:

  • Non-Commercial Only: You can use it to learn, teach, and invent new ways to detect cancer. You cannot sell it or use it to make a profit.
  • No Clinical Use: You cannot use this specific dataset to diagnose a real patient in a hospital. It is for research and training AI models only.
  • No Re-identification: You are strictly forbidden from trying to figure out who the patients were.

5. Why Does This Matter?

Think of AI as a student. To become a doctor, a student needs to study thousands of textbooks and practice cases. Before this dataset, AI students trying to learn 3D breast imaging had very few textbooks to study, and many of them were missing the "answer key" (biopsy results).

DBT-2026 is like handing that student a complete, high-quality textbook with the answers in the back. By giving researchers this data, the authors hope to speed up the development of AI tools that can help doctors find breast cancer earlier and more accurately, especially in women with dense breasts where it's hardest to see.

In a nutshell: They built a secure, 3D "training gym" for AI, filled with real cases and expert labels, so that future computer programs can become better at spotting breast cancer and saving lives.
