User-driven development and evaluation of an agentic framework for analysis of large pathway diagrams

This paper presents the user-driven development and evaluation of Llemy, an LLM-based agentic framework designed to assist domain experts in navigating and summarizing complex molecular interaction maps through iterative prototyping and feedback.

Original authors: Corradi, M., Djidrovski, I., Ladeira, L., Staumont, B., Verhoeven, A., Sanz Serrano, J., Rougny, A., Vaez, A., Hemedan, A., Mazein, A., Niarakis, A., de Carvalho e Silva, A., Auffray, C., Wilighagen
Published 2026-03-12

This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.

Imagine you have a massive, ancient library filled with millions of books about how the human body works. These aren't just normal books; they are intricate, hand-drawn maps showing how every single cell, chemical, and organ talks to one another. Scientists call these "molecular interaction maps."

The problem? These maps are so huge and complicated that even experts get lost in them. It's like trying to find a specific street in a city the size of a continent without a GPS.

This paper is about building a smart, talking GPS for these biological maps. The authors call it "Llemy."

Here is the story of how they built it, how they tested it, and what they learned, explained simply:

1. The Problem: Getting Lost in the Jungle

Scientists have created beautiful, detailed maps of how our bodies fight disease, process food, or react to drugs. But these maps are hard to read. If you want to know, "What happens to my liver if I take this specific pill?" you might have to spend hours hunting through thousands of lines of data.

2. The Solution: A "Digital Librarian" (Llemy)

The team decided to use Large Language Models (LLMs)—the same kind of AI technology that powers chatbots—to act as a guide. They built a system called Llemy.

Think of Llemy as a super-smart tour guide who has memorized every single one of these biological maps.

  • You ask a question: "Show me how bile acids are processed in the liver."
  • Llemy looks at the map: It doesn't just guess; it actually reads the specific map you selected.
  • Llemy answers: It gives you a summary, points out the exact parts of the map you need to look at, and even tells you which scientific papers back up the information.
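To make the three steps above concrete, here is a toy sketch in Python. It is not Llemy's code: the paper summary does not describe the implementation, so the `Interaction` record, the three-entry `TOY_MAP`, the placeholder reference labels, and the simple keyword matching are all assumptions made for illustration. A real system would hand the matching entries to an LLM to write the summary; this sketch only shows the idea of answering from the selected map and keeping the source attached to each statement.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One arrow on a (hypothetical) map: source acts on target."""
    source: str
    target: str
    description: str
    reference: str  # placeholder label, not a real citation


# Hypothetical miniature map; real maps hold thousands of entries.
TOY_MAP = [
    Interaction("cholesterol", "bile acid", "is converted in the liver into", "REF-1"),
    Interaction("bile acid", "receptor x", "activates", "REF-2"),
    Interaction("glucose", "glycogen", "is stored as", "REF-3"),
]


def find_relevant(question: str, interactions: list[Interaction]) -> list[Interaction]:
    """Step 2: look at the map and keep entries named in the question."""
    q = question.lower()
    return [i for i in interactions if i.source in q or i.target in q]


def answer(question: str, interactions: list[Interaction]) -> str:
    """Step 3: report what was found, with the reference for each line."""
    hits = find_relevant(question, interactions)
    if not hits:
        return "Nothing in the selected map matches this question."
    return "\n".join(
        f"{i.source} {i.description} {i.target} [{i.reference}]" for i in hits
    )


if __name__ == "__main__":
    print(answer("Show me how bile acids are processed in the liver.", TOY_MAP))
```

The point of the design is that every line of the answer comes from an entry in the map and carries its reference, so a reader can check the work instead of trusting a guess.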

3. How They Built It: The "Hackathon" Kitchen

Instead of building this in a quiet office and hoping it works, the team did something different. They held a two-day "hackathon" (a coding marathon) with the actual people who use these maps: doctors, biologists, and map-makers.

  • The Analogy: Imagine a chef designing a new kitchen tool. Instead of working alone, the chef invites the people who actually cook to come in, try the tool on real ingredients, and say, "This is too hard to chop with," or "I need a sharper knife."
  • The team built a rough prototype, let the experts try to break it, and then fixed it immediately based on what the experts said.

4. The Test Drive: 25 Users Take the Wheel

Once the team had a working version, they invited 25 experts to "test drive" Llemy. The experts asked the system all sorts of questions, from simple ones ("What is this protein?") to complex ones ("What happens if we block this pathway?").

After every answer, the users gave the system a report card with three grades:

  1. Accuracy: Was the information correct?
  2. Conciseness: Was it short and to the point?
  3. Reliability: Did it provide good links to the source so you could check the work?
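A report card like this is easy to tally. The sketch below averages each of the three grades across answers. The 1-to-5 scale, the field names, and the example scores are assumptions for illustration only; this summary does not state which scale the study used, and the numbers are not results from the paper.

```python
from statistics import mean

CRITERIA = ("accuracy", "conciseness", "reliability")

# Made-up example scores on an assumed 1-5 scale (not data from the study).
example_ratings = [
    {"accuracy": 4, "conciseness": 5, "reliability": 3},
    {"accuracy": 2, "conciseness": 4, "reliability": 3},
]


def summarize(ratings: list[dict]) -> dict:
    """Average each criterion over all rated answers."""
    return {c: mean(r[c] for r in ratings) for c in CRITERIA}


if __name__ == "__main__":
    for criterion, score in summarize(example_ratings).items():
        print(f"{criterion}: {score}")
```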

5. What They Found: The Good, The Bad, and The Slow

The results were a mix of excitement and caution:

  • The Good: The system was great at summarizing complex maps. It acted like a helpful translator, turning confusing diagrams into plain English. Users felt it saved them time.
  • The Bad:
    • Speed: When the system took too long to think, users got annoyed and gave it lower grades. It's like waiting too long for a waiter to bring your menu.
    • Hallucinations: Sometimes, the AI made things up or got confused when the same thing appeared under different names (like not realizing that "vitamin C" and "ascorbic acid" are the same substance).
    • Inconsistency: If you asked the exact same question twice, you might get two slightly different answers. This is a common quirk of AI, but it makes scientists nervous.
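One simple way to put a number on inconsistency is to ask the same question twice and measure how many words the two answers share. The sketch below uses Jaccard similarity on word sets. This is an illustrative choice, not the method used in the paper, and the two example answers are invented.

```python
def jaccard(answer_a: str, answer_b: str) -> float:
    """Share of distinct words that appear in both answers (0.0 to 1.0)."""
    words_a = set(answer_a.lower().split())
    words_b = set(answer_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty answers are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)


if __name__ == "__main__":
    first = "bile acids activate the receptor"
    second = "bile acids inhibit the receptor"
    print(jaccard(first, second))
```

A score of 1.0 means the two answers use exactly the same words, and lower scores mean more drift. Word overlap is a crude measure, though: as the example shows, two answers can share most of their words and still say opposite things, so a low score is a warning sign while a high score is not proof of agreement.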

6. The Future: Making the GPS Better

The paper concludes that while Llemy is a promising start, it needs more work.

  • The Roadmap: They plan to make it faster, fix the "made-up facts," and make sure it always points to the right source material.
  • The Big Picture: They want to move away from expensive, closed AI systems to open-source AI (free, community-built models) so that scientists everywhere can use and improve this tool without paying huge fees.

The Bottom Line

This paper isn't just about a new software tool; it's about a new way of building science tools. Instead of scientists building tools in isolation and hoping users like them, they built this tool with the users, testing it constantly.

Llemy is like a prototype for a future where complex biological data isn't locked behind a wall of jargon, but is accessible to anyone with a simple question, guided by a smart, honest, and helpful AI companion.
