Engineering Systems for Data Analysis Using Interactive Structured Inductive Programming

The paper introduces iProg, an interactive tool that leverages a structured communication protocol between humans and large language models to decompose scientific data analysis tasks into declarative Data Flow Diagrams and generate corresponding code, thereby achieving significantly faster development, higher code quality, and better performance than traditional Low Code/No Code alternatives.

Shraddha Surana, Ashwin Srinivasan, Michael Bain

Published Tue, 10 Ma

Imagine you are a master architect (the human engineer) who needs to build a complex, custom house (a scientific data analysis system). You have a very talented but sometimes scatterbrained assistant (the Large Language Model, or LLM) who can write blueprints and lay bricks incredibly fast.

The problem? If you just tell the assistant, "Build me a house," they might give you a castle with a swimming pool on the roof, or a shed with no doors. If you try to fix it by just chatting back and forth, you might end up with a Frankenstein's monster of a building that looks okay from the outside but collapses when you turn on the lights.

This paper introduces iProg, a new way to work with AI assistants that solves this problem. Think of iProg not as a "magic button" that builds the house for you, but as a strict, structured communication protocol that forces the human and the AI to work together like a perfect team.

Here is how it works, broken down into simple steps:

1. The Two-Step Dance: Structure First, Bricks Second

Most "No-Code" tools try to build the whole house in one giant leap. iProg says, "No way. Let's do this in two distinct phases."

  • Phase 1: The Blueprint (The DFD)
    Before laying a single brick, the human and the AI sit down to draw a Data Flow Diagram (DFD). Imagine this as a flowchart or a subway map.

    • The AI says: "Here is my idea: We need a station for 'Data Loading,' a tunnel for 'Cleaning,' and a terminal for 'Analysis'."
    • The Human says: "Wait, that tunnel is too short. Let's add a stop for 'Validation' here."
    • They keep swapping ideas until they both Ratify (agree) on the map.
    • Why this matters: If you get the map wrong, the whole house is wrong. iProg requires both parties to agree on the map before any code is written.
  • Phase 2: Laying the Bricks (Code Generation)
    Once the map is approved, the AI starts building one room at a time.

    • The Human points to the "Data Loading" room on the map and says, "Okay, build this. Here are the rules: It must start with a CSV file and end with a clean table."
    • The AI builds the room.
    • The Human inspects it. "Good, but the door is the wrong color." -> Revise.
    • The AI fixes it, and the Human says, "Perfect." -> Ratify.
    • They move to the next room.
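
To make the two phases concrete, here is a minimal Python sketch. It assumes the ratified DFD can be represented as a plain graph of named processing steps, each carrying the entry and exit conditions the human supplies in Phase 2. The step names come from the subway-map analogy above; the `Step` class, its `pre`/`post` fields, and the `build_order` and `phase2_prompts` functions are hypothetical illustrations, not iProg's actual representation. The steps are handed to the LLM one at a time, in an order where every step comes after the steps it depends on.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Step:
    """One 'room' in a ratified DFD (hypothetical representation)."""
    name: str
    pre: str   # what the step may assume about its input
    post: str  # what the step must guarantee about its output
    inputs: list[str] = field(default_factory=list)  # upstream step names


# An illustrative ratified DFD, after the human added the 'Validation' stop.
DFD = [
    Step("Data Loading", pre="a CSV file", post="a raw table"),
    Step("Validation", pre="a raw table", post="a table with checked columns",
         inputs=["Data Loading"]),
    Step("Cleaning", pre="a table with checked columns", post="a clean table",
         inputs=["Validation"]),
    Step("Analysis", pre="a clean table", post="summary statistics",
         inputs=["Cleaning"]),
]


def build_order(steps: list[Step]) -> list[str]:
    """Return step names so that every step appears after its inputs."""
    graph = {s.name: set(s.inputs) for s in steps}
    return list(TopologicalSorter(graph).static_order())


def phase2_prompts(steps: list[Step]) -> list[str]:
    """One code-generation request per step, in dependency order."""
    by_name = {s.name: s for s in steps}
    return [
        f"Build '{n}': assume {by_name[n].pre}; guarantee {by_name[n].post}"
        for n in build_order(steps)
    ]


for prompt in phase2_prompts(DFD):
    print(prompt)
```

Each printed request would then be followed by a round of human review using the protocol tags before the next step is built.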

2. The "Traffic Light" System (The Protocol)

The secret sauce of iProg is a strict set of rules for how the human and AI talk. They don't just chat; they use specific "tags" like traffic lights:

  • 🟢 RATIFY: "This is good. Move on."
  • 🔴 REFUTE: "This is wrong. Explain why or try again."
  • 🟡 REVISE: "I see what you tried, but change this specific part."
  • ⛔ REJECT: "Stop. This entire approach is broken."

This keeps the AI from rambling and the human from getting frustrated: nothing moves forward, whether a piece of the map or a line of code, until it has been explicitly ratified.
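
The four tags can be read as a small review loop around each proposal. The Python sketch below is one assumed interpretation, not iProg's published implementation. `Tag`, `review_loop`, and the `max_rounds` cutoff are hypothetical, and the `propose` and `review` callables stand in for the LLM and the human. RATIFY accepts the proposal, REFUTE and REVISE send feedback back for another attempt, and REJECT abandons the current approach.

```python
from enum import Enum
from typing import Callable, Optional


class Tag(Enum):
    RATIFY = "ratify"  # accept and move on
    REFUTE = "refute"  # wrong: explain or try again
    REVISE = "revise"  # change one specific part
    REJECT = "reject"  # abandon this approach entirely


def review_loop(
    propose: Callable[[Optional[str]], str],
    review: Callable[[str], tuple[Tag, str]],
    max_rounds: int = 5,  # hypothetical cap on back-and-forth rounds
) -> Optional[str]:
    """Drive one proposal (a DFD or one step's code) to ratification.

    Returns the ratified proposal, or None if it was rejected or no
    agreement was reached within max_rounds.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        proposal = propose(feedback)      # the LLM's (re)attempt
        tag, feedback = review(proposal)  # the human's tagged reply
        if tag is Tag.RATIFY:
            return proposal
        if tag is Tag.REJECT:
            return None
        # REFUTE or REVISE: loop again, passing the feedback to the LLM.
    return None


# Stand-ins for the LLM and the human reviewer (illustration only).
def fake_llm(feedback: Optional[str]) -> str:
    return "door: blue" if feedback else "door: red"


def fake_human(proposal: str) -> tuple[Tag, str]:
    if proposal == "door: red":
        return Tag.REVISE, "make the door blue"
    return Tag.RATIFY, ""


print(review_loop(fake_llm, fake_human))  # door: blue
```

The stub run mirrors the "wrong door color" exchange from Phase 2: one REVISE round, then RATIFY.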

3. The Results: Faster, Better, and Safer

The authors tested this on two real-world scientific projects: one about stars (Astrophysics) and one about proteins (Biochemistry).

  • The "No-Code" AI (The Wild West): When they let the AI just try to build the whole system without the map or the traffic lights, it failed. It produced code that didn't run, or systems that gave wrong answers. It was like trying to build a house by throwing bricks at a wall and hoping they stick.
  • The "Low-Code" AI (The Unstructured Helper): When they let the AI build it with some human help but no strict rules, it was okay for simple tasks but failed on complex ones.
  • The iProg Approach (The Structured Team):
    • Speed: It was 10 times faster than hiring a team of humans to do it manually.
    • Quality: The code was cleaner, easier to read, and had fewer errors than the "No-Code" attempts.
    • Accuracy: The final results were actually better than the original systems built by human teams over months.

The Big Picture Takeaway

This paper argues that we shouldn't replace humans with AI, nor should we let AI run wild.

Instead, we should use AI as a powerful junior engineer who needs a senior architect (the human) to provide the structure. By forcing the AI to draw the map first and then build room-by-room with strict approval checks, we get the speed of AI with the reliability of human engineering.

In short: iProg turns a chaotic conversation with a super-smart but confused robot into a disciplined, step-by-step construction project that builds high-quality scientific tools in record time.