Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale

The paper introduces Sandpiper, a mixed-initiative system that pairs interactive researcher dashboards with agentic LLMs for scalable, privacy-preserving, and rigorous qualitative analysis of large-scale educational discourse, while mitigating hallucinations and ensuring methodological consistency.

Daryl Hedley, Doug Pietrzak, Jorge Dias, Ian Burden, Bakhtawar Ahtisham, Zhuqian Zhou, Kirk Vanacore, Josh Marland, Rachel Slama, Justin Reich, Kenneth Koedinger, René Kizilcec

Published Tue, 10 Ma

Imagine you are a detective trying to solve a massive mystery, but instead of a few clues, you have millions of pages of conversation logs from students and teachers talking in online classrooms. You need to read every single word to understand how people learn, but your brain can only handle so much before you get tired, and reading it all by hand would take a lifetime.

This is the problem researchers face today. They have too much data and not enough time.

Enter Sandpiper. Think of Sandpiper not as a robot that replaces the detective, but as a super-smart, tireless assistant who helps the detective do their job faster and more accurately.

Here is how Sandpiper works, broken down into simple concepts:

1. The "Privacy Shield" (DG1)

Before the assistant even looks at the messy pile of papers, it puts on a pair of magic goggles. These goggles instantly blur out names, faces, and any personal secrets (like "I'm John from 3rd grade") so that the data is anonymous.

  • The Analogy: Imagine a librarian who takes every book you bring in, covers the author's name and your address with a sticker, and then lets you read it. This ensures that even if the assistant makes a mistake, no one's privacy is ever compromised. The whole operation happens inside a secure, university-owned "fortress" so no outside hackers can get in.
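For the technically curious, here is a tiny Python sketch of what the "privacy shield" idea could look like. The regex rules and placeholder tokens below are our own illustration, not Sandpiper's actual de-identification pipeline (which the paper does not spell out in this summary):

```python
import re

# Illustrative stand-in for the "privacy shield": replace PII spans with
# placeholder tokens before any analysis sees the text. A real system would
# use far more robust detection than these toy regex rules.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),            # email addresses
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),  # US-style phone numbers
    (re.compile(r"\bI'?m\s+[A-Z][a-z]+\b"), "I'm [NAME]"),          # "I'm John" self-introductions
]

def redact(text: str) -> str:
    """Blur out names and contact details, like the magic goggles."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("I'm John from 3rd grade, email me at john@school.edu"))
# → I'm [NAME] from 3rd grade, email me at [EMAIL]
```

The key point is the ordering: redaction happens first, inside the secure environment, so everything downstream only ever sees the stickered-over version.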

2. The "Strict Rulebook" (DG2)

Usually, when you ask a smart AI to summarize things, it might make things up (called "hallucinations") or get confused. Sandpiper solves this by giving the AI a strict rulebook (a codebook) and a quality control inspector.

  • The Analogy: Imagine you are hiring a chef to make a specific cake. Instead of just saying, "Make a cake," you give them a recipe with exact measurements. If the chef tries to put salt in the cake or forgets the eggs, an inspector (the Orchestrator) stops them immediately, says, "No, that's wrong," and sends them back to try again until the cake is right. Sandpiper forces the AI to follow the researcher's specific rules, which makes it far less likely to invent facts.
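In code, the "inspector" pattern can be sketched as a validate-and-retry loop. The codebook labels and the annotate() stub below are hypothetical placeholders, not Sandpiper's actual interface:

```python
# Illustrative sketch of the "strict rulebook": the orchestrator rejects any
# label outside the codebook and asks again. Labels are made-up examples.
CODEBOOK = {"on_task", "off_task", "help_seeking"}  # the allowed labels

def annotate(utterance: str, attempt: int) -> str:
    """Stand-in for an LLM call; a real system would prompt a model here."""
    # Simulate a model that first returns an invalid label, then a valid one.
    return "chit_chat" if attempt == 0 else "off_task"

def orchestrate(utterance: str, max_retries: int = 3) -> str:
    """The inspector: only codebook-compliant labels pass quality control."""
    for attempt in range(max_retries):
        label = annotate(utterance, attempt)
        if label in CODEBOOK:
            return label  # the recipe was followed
    raise ValueError("model never produced a codebook-compliant label")

print(orchestrate("lol did you see the game last night"))
# → off_task
```

Because the loop only ever returns labels from the researcher's codebook, a hallucinated category can never silently enter the dataset; at worst the orchestrator gives up and flags the item.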

3. The "Team Huddle" (Mixed-Initiative)

Sandpiper doesn't just let the AI do everything and walk away. It keeps the human researcher in the loop.

  • The Analogy: Think of it like a co-pilot system in a plane. The AI (the co-pilot) does the heavy lifting, scanning thousands of pages and highlighting the important parts. But the Human (the pilot) is still in the seat, looking at the screen, checking the AI's work, and saying, "Yes, that's right," or "No, look at this part again." The human and the AI work together as a team.
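One common way to implement this co-pilot pattern is confidence-based triage: the AI labels everything, but anything it is unsure about is queued for the human. The threshold and data below are hypothetical, not from the paper:

```python
# Hypothetical sketch of mixed-initiative triage: high-confidence AI labels
# are accepted automatically; low-confidence ones go to the human pilot.
def triage(items, confidence_threshold=0.8):
    """Split (text, label, confidence) triples into auto-accepted vs review."""
    auto_accepted, needs_review = [], []
    for text, label, confidence in items:
        if confidence >= confidence_threshold:
            auto_accepted.append((text, label))
        else:
            needs_review.append((text, label))  # human decides this one
    return auto_accepted, needs_review

items = [("great question!", "praise", 0.95),
         ("hmm idk", "confusion", 0.55)]
accepted, review = triage(items)
```

The AI still does the heavy lifting across thousands of items, but the pilot's attention is spent exactly where the co-pilot is least sure.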

4. The "Report Card" (DG3)

How do we know the AI is actually doing a good job? Sandpiper has a built-in scorecard.

  • The Analogy: Every time the AI labels a piece of conversation, Sandpiper compares it to what a human expert would have said. It calculates a grade (like a test score) to see how often they agree. If the AI starts getting sloppy, the researcher sees the grade drop and can tweak the instructions (the "prompt") to get better results. It turns AI coding into a science experiment where you constantly improve the tool.
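The standard way to compute such a "grade" is a chance-corrected agreement statistic like Cohen's kappa, which compares the AI's labels against a human expert's. The labels below are made-up examples, and we are not claiming this is the exact metric Sandpiper reports:

```python
from collections import Counter

# Sketch of the "report card": Cohen's kappa measures how often two coders
# agree beyond what chance alone would predict (1.0 = perfect agreement).
def cohens_kappa(ai_labels, human_labels):
    n = len(ai_labels)
    observed = sum(a == h for a, h in zip(ai_labels, human_labels)) / n
    ai_counts, human_counts = Counter(ai_labels), Counter(human_labels)
    expected = sum(ai_counts[c] * human_counts[c] for c in ai_counts) / n**2
    return (observed - expected) / (1 - expected)

ai    = ["on_task", "off_task", "on_task", "on_task"]
human = ["on_task", "off_task", "off_task", "on_task"]
print(round(cohens_kappa(ai, human), 2))
# → 0.5
```

If this number starts dropping across batches, that is the researcher's cue to tweak the prompt and re-run, which is what turns the AI coding into a measurable, improvable process.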

Why Does This Matter?

Before Sandpiper, researchers had to choose between quality (reading a few conversations carefully) and quantity (using a computer to read everything, but with many mistakes).

Sandpiper gives them both. It allows researchers to analyze massive amounts of educational data with the speed of a computer but the careful, nuanced understanding of a human expert. It turns the impossible task of reading a library's worth of conversations into a manageable, trustworthy, and efficient process.

In short: Sandpiper is the bridge that lets us use powerful AI to understand human learning without losing our privacy, our rules, or our human touch.