This is an AI-generated explanation of a preprint that has not been peer-reviewed. It is not medical advice. Do not make health decisions based on this content.
Imagine you walk into a gym and ask a super-smart, infinitely knowledgeable robot coach to create a workout plan for you. You ask it, "What should I do?" and it gives you a perfect plan. But then, you ask the exact same question again, word for word, and it gives you a slightly different plan. Then you ask a third time, and it's different again.
Would you trust that robot with your health?
This paper, titled "Consistency of AI-Generated Exercise Prescriptions," is basically a "reality check" for that robot coach. The author, Kihyuk Lee, wanted to see if an AI (specifically Google's Gemini 2.5 Flash) could give the same answer every time you asked it the same question, or if it was just guessing randomly like a dice roll.
Here is the breakdown of what they found, using some everyday analogies:
1. The Experiment: The "20 Times" Test
The author didn't just ask the AI once. They created 6 different patient profiles (ranging from a healthy 30-year-old wanting to get buff to a 70-year-old with knee pain and a history of falls).
For each profile, they asked the AI to create an exercise plan 20 times in a row, using the exact same words. That made 120 workout plans in total (6 profiles × 20 runs). They then compared the plans to see how much the AI "wobbled" in its answers.
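The sampling protocol above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the profile labels and the `generate_plan` stub are hypothetical stand-ins for the real prompts and the actual Gemini 2.5 Flash API call.

```python
# Sketch of the repeated-sampling protocol: 6 profiles x 20 identical prompts.
# `generate_plan` is a placeholder; the study sent each prompt to Gemini 2.5 Flash.

PROFILES = [f"profile_{i}" for i in range(1, 7)]  # 6 patient profiles in the study
N_REPEATS = 20  # the same prompt, sent 20 times per profile

def generate_plan(prompt: str) -> str:
    """Placeholder for the model call; a real version would hit the Gemini API."""
    return f"Exercise plan for: {prompt}"

def collect_plans(profiles, n_repeats=N_REPEATS):
    plans = {}
    for profile in profiles:
        prompt = f"Create an exercise prescription for: {profile}"
        # Identical wording every run -- any variation in output comes from the model.
        plans[profile] = [generate_plan(prompt) for _ in range(n_repeats)]
    return plans

plans = collect_plans(PROFILES)
total_plans = sum(len(runs) for runs in plans.values())  # 6 x 20 = 120 plans
```

With a real model behind `generate_plan`, the 20 outputs per profile would differ from run to run; comparing them is what the rest of the study does.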
2. The Three Things They Checked
They looked at the AI's answers through three different lenses:
- The "Vibe" Check (Semantic Consistency):
- The Metaphor: Imagine asking a friend to describe a movie. If you ask them 20 times, will they tell you the same story?
- The Result: Yes, mostly. The AI was very good at telling the same "story." The words and general tone were almost identical every time (90% similar). It didn't get confused about who the patient was.
- The "Recipe" Check (Structural Consistency):
- The Metaphor: Imagine a recipe for chocolate cake. If you ask a chef 20 times, they should always say "2 cups of flour, 3 eggs." If one time they say "2 cups" and the next time they say "a handful," the cake might fail.
- The Result: Here's the problem. While the AI knew what to do, it couldn't agree on the numbers.
- Frequency: It was pretty good at saying "do this 3 times a week."
- Intensity (The Big Issue): This was the messiest part. For resistance training (lifting weights), the AI couldn't decide on the weight. In 10% to 25% of the plans, it gave vague answers like "lift heavy" without saying how heavy, or it gave numbers that didn't make sense. It was like a chef saying, "Add a pinch of salt" one time and "Add a cup of salt" the next.
- The "Safety Net" Check (Safety Consistency):
- The Metaphor: Does the robot always remind you to wear a helmet?
- The Result: Yes, but with a twist. The AI always included safety warnings (100% of the time). However, the amount of warning varied wildly. For a healthy young person, it gave a short safety note. For a sick, older patient, it wrote a whole novel of warnings. This is actually good! It shows the AI knows that sicker people need more caution.
3. The Big Takeaway: "Strict Rules = Better Answers"
The study found something interesting about constraints.
- When the patient had a very specific, strict medical condition (like "I have knee pain and can't walk far"), the AI gave very consistent answers. It was like a student taking a strict exam with only one right answer.
- When the patient was healthy and just wanted to "get strong," the AI had more freedom to guess. It gave different answers every time because there were many "right" ways to get strong.
4. Why This Matters (The "So What?")
The author concludes that AI is great at writing the story of a workout, but it's still shaky at doing the math.
If you are a doctor or a trainer using AI to help patients:
- Don't trust the numbers blindly. The AI might say "run at 70% speed" today and "run at 60% speed" tomorrow for the same person.
- The AI is a Draftsman, not the Architect. It can generate a great-looking plan, but a human expert needs to double-check the specific numbers (intensity, weight, time) to make sure they are safe and consistent.
In a nutshell: The AI is a very polite, well-read assistant who remembers the rules of exercise perfectly. But if you ask it to do the math on how heavy a weight should be, it might give you a different answer every time you blink. Until we fix that, we need a human to hold the clipboard and double-check the work.