Imagine you are the captain of a ship (a software team) preparing for a voyage (a sprint). Before you set sail, you need to know how much fuel (effort) each task will take. In the world of software, developers don't measure this in "hours" because that's too rigid. Instead, they use Story Points, which are like a fuzzy, relative measure of "how hard this feels compared to that."
Usually, the whole team sits in a circle, plays a game called "Planning Poker," and argues until they agree on a number for every task. It's fun, but it takes a long time and depends heavily on who is in the room.
This paper asks a big question: Can a super-smart AI (a Large Language Model or LLM) look at a task description and guess the "Story Points" for us, saving us all that time?
Here is the story of what they found, explained with some everyday analogies.
1. The "Zero-Shot" Test: The Expert Who Never Met You
The Question: Can an AI guess the effort for your specific project without ever seeing your project's history?
The Analogy: Imagine hiring a world-famous chef who has never cooked in your kitchen. You hand them a recipe for "Spicy Tacos" and ask, "How hard is this to make?"
- The Old Way (Machine Learning): You'd have to show the chef 1,000 photos of your tacos and how long you took to make them before they could guess correctly.
- The New Way (LLM Zero-Shot): You just ask the chef. Surprisingly, the AI chef's guesses were often better than those of a chef who had studied 80% of your past recipes (that is, a traditional model trained on most of your project's history)!
The Result: The AI models (like Kimi and DeepSeek) were surprisingly good at guessing the difficulty just by reading the description, even without any training data. They understood the "vibe" of the task.
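In code, the zero-shot setup is nothing more than a prompt with no project history in it. Here is a minimal sketch; the wording and the Fibonacci scale are illustrative assumptions, not the paper's exact prompt, and the resulting string would be sent to whichever chat model you use:

```python
# Minimal sketch of a zero-shot story-point prompt.
# The scale and wording are illustrative; feed the string to
# whatever chat model API you have available.

FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]

def build_zero_shot_prompt(title: str, description: str) -> str:
    """Ask for a single story-point value with no project history."""
    return (
        "You are an experienced agile developer.\n"
        f"Estimate the story points for this issue on the scale {FIBONACCI_SCALE}.\n"
        f"Title: {title}\n"
        f"Description: {description}\n"
        "Answer with a single number from the scale and nothing else."
    )

prompt = build_zero_shot_prompt(
    "Add login rate limiting",
    "Block more than 5 failed login attempts per minute per IP.",
)
print(prompt)
```

Note that the model never sees any of your team's past estimates: the "chef" is working from the recipe alone.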
2. The "Few-Shot" Test: Giving the AI a Cheat Sheet
The Question: What if we give the AI just five examples of tasks you've already finished, along with the points you assigned them?
The Analogy: You tell the chef, "Hey, remember that 'Spicy Taco' we made? It took 5 points. And that 'Giant Burrito'? That was 8 points. Now, look at this new 'Enchilada'—how many points?"
The Result:
- Magic Happens: Giving the AI just five examples made it much smarter. It learned your team's specific "scale."
- The Strategy Matters: The researchers tried two ways to pick those five examples:
  - The "Most Common" Strategy: Pick the five most typical tasks, which skew easy because your team usually does easy stuff. (Bad idea: the AI never sees a hard example, so it gets confused by hard tasks.)
  - The "Full Range" Strategy: Pick one easy, one medium, one hard, one very hard, and one super-hard task. (Good idea: this gave the AI a ruler to measure against.)
- Winner: The "Full Range" strategy worked best. It's like giving the AI a ruler with marks for 1, 5, and 10, rather than just showing it a pile of 1s.
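The two selection strategies above can be sketched in a few lines of Python. The task list and helper names here are illustrative, not the paper's code:

```python
# Sketch of the two few-shot selection strategies: "most common"
# (tasks whose point values appear most often in the history) vs.
# "full range" (one task per difficulty band across the scale).
# The history data is made up for illustration.
from collections import Counter

history = [
    ("Fix a typo in the footer", 1),
    ("Correct a broken link", 1),
    ("Restyle the login button", 2),
    ("Add a validation rule to the signup form", 3),
    ("Expose a new REST endpoint", 5),
    ("Migrate the user table schema", 8),
    ("Rewrite the payment integration", 13),
]

def most_common(tasks, k=5):
    """Pick k tasks whose point values occur most often in the history."""
    freq = Counter(points for _, points in tasks)
    return sorted(tasks, key=lambda t: -freq[t[1]])[:k]

def full_range(tasks, k=5):
    """Spread k examples evenly from the easiest task to the hardest."""
    ordered = sorted(tasks, key=lambda t: t[1])
    idx = [round(i * (len(ordered) - 1) / (k - 1)) for i in range(k)]
    return [ordered[i] for i in idx]

print([p for _, p in most_common(history)])  # skews toward easy tasks
print([p for _, p in full_range(history)])   # spans the whole scale
```

With this history, `most_common` returns only small tasks, while `full_range` always includes both the easiest and the hardest: the "ruler" the analogy describes.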
3. The "Comparison" Test: Which is Harder?
The Question: Humans find it easier to say "Task A is harder than Task B" than to say "Task A is 5 points." Can the AI do the same?
The Analogy: Imagine asking the chef, "Is the Taco harder than the Burrito?" vs. "How many points is the Taco?"
- Human Intuition: Humans usually say, "Comparing is easier! I don't need to count, I just know which is bigger."
- The AI Reality: The AI did not find comparing easier. In fact, it was worse at saying "A is harder than B" than it was at just guessing the number directly.
- Why? The AI seems to have a hidden "number brain." Even when you ask it only to compare, it appears to estimate a number for each task internally and then convert the two numbers into a "Yes/No." It's like asking someone which of two grocery bills is bigger: instead of eyeballing them, they add up both totals and then compare. Since the comparison is built on those internal estimates, it can't be more accurate than the numbers themselves.
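The pairwise framing the researchers tested amounts to a prompt like the one below. The wording is an illustrative guess, not the paper's exact prompt:

```python
# Sketch of the pairwise framing: ask only which of two tasks is
# harder, with no numbers involved. Prompt wording is illustrative.

def build_pairwise_prompt(task_a: str, task_b: str) -> str:
    """Ask the model for an ordinal judgment instead of a point value."""
    return (
        "You are an experienced agile developer.\n"
        f'Task A: "{task_a}"\n'
        f'Task B: "{task_b}"\n'
        "Which task requires more effort? Answer exactly 'A' or 'B'."
    )

print(build_pairwise_prompt("Fix a typo", "Migrate the database"))
```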
4. The "Comparison Cheat Sheet" Test
The Question: If the AI is bad at comparing, can we still use those comparisons as a "cheat sheet" to help it guess the numbers?
The Analogy: You tell the chef, "I know you're bad at comparing, but here are five pairs of dishes where I told you which was harder. Now, guess the points for this new dish."
The Result:
- Surprise! Even though the AI wasn't great at predicting the comparisons, using those comparisons as examples still helped it guess the numbers better.
- The Special Case: For the smaller, lighter AI models (like Gemini), using "comparisons" as examples actually worked better than giving them direct numbers. It was like a set of training wheels that helped the smaller bike stay upright.
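Using comparisons as a cheat sheet just means putting labeled pairs into the prompt instead of labeled numbers. A sketch, with pairs and wording invented for illustration:

```python
# Sketch of comparison-based few-shot context: the examples are
# ordered pairs of finished tasks rather than (task, points) pairs.
# The pairs and wording are illustrative, not the paper's prompt.

def build_comparison_context_prompt(pairs, new_task):
    """pairs: list of (easier_task, harder_task) descriptions."""
    lines = ["Here are pairs of finished tasks, easier one listed first:"]
    for easier, harder in pairs:
        lines.append(f'- "{easier}" took less effort than "{harder}"')
    lines.append(f'Estimate the story points for: "{new_task}"')
    lines.append("Answer with a single number.")
    return "\n".join(lines)

prompt = build_comparison_context_prompt(
    [("Fix a typo", "Migrate the database"),
     ("Update a label", "Rewrite the auth flow")],
    "Add pagination to the search results",
)
print(prompt)
```

The model still answers with a number; only the examples change from "task: 5 points" to "task X was easier than task Y."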
The Big Takeaways (The "So What?")
- AI is Ready to Help: You don't need years of data to get a good estimate. A smart AI can guess the effort of a new task just by reading its description.
- A Little Help Goes a Long Way: If you have just five past examples, show the AI a mix of easy and hard tasks. This calibrates the AI to your team's specific style.
- AI Thinks Differently Than Us: Humans love comparing things ("This is harder than that"). AI prefers to just guess the number directly. Don't try to force the AI to be human; let it be an AI.
- Not All AI is the Same: Big, powerful AI models love seeing direct numbers. Smaller, cheaper AI models might actually learn better if you show them comparisons instead.
In short: This paper shows that we can use AI to speed up software planning. We don't need to train it for months; we just need to give it a tiny "cheat sheet" of five examples, and it can save the team hours of meeting time!