Rotterdam artery-vein segmentation (RAV) dataset

This paper introduces the Rotterdam Artery-Vein (RAV) dataset, a diverse collection of high-quality color fundus images with connectivity-validated artery-vein segmentation annotations derived from the Rotterdam Study, designed to support the development and benchmarking of robust machine learning models for retinal vascular analysis under real-world imaging conditions.

Jose Vargas Quiros, Bart Liefers, Karin van Garderen, Jeroen Vermeulen, Eyened Reading Center, Caroline Klaver

Published 2026-02-19

Imagine your retina (the back of your eye) as a bustling city map. The blood vessels are the roads, and the doctors need to know exactly which roads are "arteries" (delivering fresh oxygen) and which are "veins" (taking away used blood). Why? Because the condition of these tiny roads tells a huge story about your overall health, from your heart to your brain.

For a long time, drawing these roads on a map was like doing it by hand with a pencil—slow, tiring, and hard to do for thousands of people. Then, computers got smart enough to do it automatically. But here's the problem: computers are only as smart as the examples they learn from. If you only show a computer maps of sunny, perfect cities, it will get confused when it sees a rainy, foggy city.

This paper introduces a new, massive "training school" for computers called the Rotterdam Artery-Vein (RAV) dataset. Here is the simple breakdown of what they did and why it matters:

1. The Problem: The "Perfect City" Bias

Most existing training data for eye scans is like a collection of photos taken only on perfect, sunny days with high-end cameras. They are clean, clear, and often focus on just one part of the eye (like the center of the city).

  • The Reality: Real life is messy. People have cataracts, the cameras vary from old film models to new digital ones, and the lighting isn't always perfect.
  • The Gap: If we train our AI only on "perfect" data, it fails when it meets a real patient with a slightly blurry photo or an older camera.

2. The Solution: The "Real-World" Training Ground

The researchers gathered 206 eye images from the famous "Rotterdam Study" (a long-term study of people in the Netherlands).

  • The Mix: They didn't just pick the best photos. They intentionally included "messy" ones—images that other computer programs would usually throw away because they looked too blurry or dark.
  • The Analogy: Think of it like teaching a driver. Instead of only letting them practice on an empty, sunny highway, this dataset throws them into rush hour, rain, fog, and different types of cars. This ensures the AI learns to drive in any condition.

3. The Secret Sauce: The "Co-Pilot" System

Drawing the lines for arteries and veins is incredibly hard. It's like trying to untangle a knot of red and blue yarn while blindfolded.

  • The Old Way: Humans would stare at a photo and try to draw every single line from scratch. This takes forever.
  • The New Way (RAV Method): The researchers used a smart computer to draw a rough outline of all the roads first. Then, human experts acted as "editors." They didn't start from zero; they just took that rough outline and colored the arteries red and the veins blue.
  • The Result: This made the job much faster and allowed them to create a high-quality "answer key" for the computer to learn from. They even built a special tool that let the editors zoom in and fix any "broken roads" (gaps in the lines) to make sure the map was perfect.
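The "broken roads" check above boils down to a classic connected-components test: a vessel with a gap in it shows up as two separate pixel blobs instead of one. As a minimal sketch (not the authors' actual tool, and using a toy boolean grid rather than a real fundus mask), counting 8-connected components makes a gap easy to detect:

```python
from collections import deque

def count_components(mask):
    """Count 8-connected components of True pixels in a 2D boolean grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                components += 1
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill of one blob
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return components

# A tiny "vessel": one horizontal line with a one-pixel gap at column 3.
broken = [[c != 3 for c in range(7)]]
assert count_components(broken) == 2    # the gap splits the vessel in two

repaired = [[True] * 7]                 # filling the gap reconnects it
assert count_components(repaired) == 1
```

In practice a real annotation tool would run this kind of check per vessel tree and flag any segment that splits into extra components, so an editor can zoom in and repair the gap.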

4. The Treasure Chest

The result is a free, public library containing:

  • The Photos: High-quality images of eyes (some clear, some challenging).
  • The Answer Key: Digital masks that show exactly where the arteries and veins are, color-coded for the computer to read.
  • The Metadata: Details about the patient's age, sex, and what kind of camera took the picture.
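A color-coded "answer key" like this is typically consumed by splitting it into one binary map per vessel type. The sketch below assumes a simple hypothetical encoding (pure red pixels = artery, pure blue = vein, black = background); the RAV dataset's actual file format and color scheme may differ, so treat the constants as placeholders:

```python
# Hypothetical color code -- check the dataset's documentation for the real one.
ARTERY = (255, 0, 0)   # red pixels mark arteries
VEIN = (0, 0, 255)     # blue pixels mark veins

def split_mask(rgb_mask):
    """Split a color-coded RGB mask (list of rows of (r, g, b) tuples)
    into separate binary artery and vein maps."""
    arteries = [[px == ARTERY for px in row] for row in rgb_mask]
    veins = [[px == VEIN for px in row] for row in rgb_mask]
    return arteries, veins

# Toy 1x4 mask: artery, vein, background, artery.
mask = [[(255, 0, 0), (0, 0, 255), (0, 0, 0), (255, 0, 0)]]
arteries, veins = split_mask(mask)
assert arteries == [[True, False, False, True]]
assert veins == [[False, True, False, False]]
```

A training pipeline would then feed these per-class maps to the model as segmentation targets, alongside the fundus photo itself.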

Why Should You Care?

This isn't just about better eye scans. It's about early warning systems.

  • If a computer can accurately read these "road maps," it can spot tiny changes that signal high blood pressure, diabetes, or even early signs of Alzheimer's disease before a human doctor might notice.
  • Because this dataset is so diverse (different ages, different cameras, different health issues), the AI trained on it will be more reliable when used in real hospitals around the world, not just in research labs.

In a nutshell: The authors built a massive, diverse, and "messy" library of eye maps with perfect labels. They did this to teach computers how to be better doctors, ensuring that when AI looks at your eyes, it understands the whole picture, not just the perfect parts.
