Original paper licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an AI-generated explanation of the paper below. It is not written by the authors. For technical accuracy, refer to the original paper.
Imagine you are trying to give a complex instruction to a very smart, but slightly literal, robot assistant.
The Old Way (The "CRUD" Problem):
Right now, most enterprise software (like the systems banks or stores use) is built for humans. If you want a human to "find the downtown branch that opened last month," they can look at a map, read a sign, and figure it out.
But if you ask a robot to do this using today's standard software interfaces, it's like asking the robot to fill out a tax form where it must know the exact 10-digit ID number of the branch before it can even start. If the robot guesses the ID wrong, the system just says "Error 404" and stops. The robot has to guess again, get another error, and eventually give up or ask a human for help. This is what the paper calls the "CRUD" mismatch: the software expects exact IDs and precise data, but the AI starts with a vague, natural-language goal.
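The mismatch can be sketched in a few lines. This is only an illustration under stated assumptions: the `BRANCHES` table, the 10-digit IDs, and the `get_branch` function are all hypothetical and do not come from the paper.

```python
# Hypothetical in-memory "branch" table; the IDs and fields are made up.
BRANCHES = {
    "4810027713": {"name": "Downtown", "opened": "2024-05-02"},
    "4810027714": {"name": "Airport", "opened": "2021-11-15"},
}


def get_branch(branch_id: str) -> dict:
    """CRUD-style read: succeeds only for an exact, already-known ID."""
    if branch_id not in BRANCHES:
        # No hint about near matches; the caller just gets a 404.
        return {"status": 404, "error": "Not Found"}
    return {"status": 200, "branch": BRANCHES[branch_id]}


# An agent that starts from "the downtown branch" has no ID, so it can only guess.
guess = get_branch("4810027700")   # wrong guess, nothing useful comes back
exact = get_branch("4810027713")   # works, but only if the ID is already known
```

The interface gives the agent no way to turn "the downtown branch that opened last month" into an ID, which is the gap the rest of the design tries to close.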
The New Way (Agent-First Tool APIs):
The authors propose a new way of designing these tools specifically for AI agents. Instead of a rigid form, they treat the tool like a helpful human assistant who knows how to handle ambiguity.
Here is how their "Six-Verb" system works, using the analogy of a Travel Agent:
- Semantic Search (The "What do you mean?" phase):
  - Old Way: You must say "Book flight to JFK."
  - New Way: You say, "Book a flight to the airport near Times Square." The tool doesn't panic; it searches its database, finds three airports near Times Square, and says, "I found JFK, LaGuardia, and Newark. Which one did you mean?"
- Resolve Candidates (The "Clarification" phase):
  - The AI picks the right one (JFK) from the list. The tool confirms, "Got it, JFK."
- Preview Action (The "Dry Run" phase):
  - Before actually booking the ticket (which costs money), the tool shows a draft: "Here is what I'm about to do: Book a flight to JFK for $500. Is this okay?" This prevents mistakes before they happen.
- Execute Action (The "Do it" phase):
  - Once the AI (or a human manager) says "Yes," the tool actually books the ticket.
- Verify Result (The "Did it work?" phase):
  - The tool immediately checks its own work: "I just booked the ticket. Let me double-check the database to make sure the confirmation number is real."
- Recover from Error (The "Plan B" phase):
  - If something goes wrong (e.g., the flight is sold out), the tool doesn't just crash. It says, "That flight is full, but here are three other flights that work. Which one should we try?"
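The six verbs can be sketched as six small functions. This is a minimal toy version of the travel-agent analogy, not the paper's implementation: the function names simply mirror the verb names, and the airport data, prices, seat counts, and confirmation-number format are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical data for the travel-agent analogy; none of it is from the paper.
NEAR = {"times square": ["JFK", "LGA", "EWR"]}   # landmark -> nearby airports
PRICES = {"JFK": 500, "LGA": 450, "EWR": 420}    # price in USD
SEATS = {"JFK": 0, "LGA": 2, "EWR": 5}           # seats left (JFK is sold out)
BOOKINGS = {}                                    # confirmation number -> airport


@dataclass
class Preview:
    airport: str
    price_usd: int


def semantic_search(query: str) -> list[str]:
    """Return candidate airports instead of demanding an exact code."""
    q = query.lower()
    return [c for landmark, codes in NEAR.items() if landmark in q for c in codes]


def resolve_candidates(candidates: list[str], choice: str) -> str:
    """Confirm that the chosen item really is one of the offered candidates."""
    if choice not in candidates:
        raise ValueError(f"{choice} is not among {candidates}")
    return choice


def preview_action(airport: str) -> Preview:
    """Dry run: describe the booking without changing any state."""
    return Preview(airport, PRICES[airport])


def execute_action(preview: Preview) -> str:
    """Actually book; raises if the flight is sold out."""
    if SEATS[preview.airport] == 0:
        raise RuntimeError("sold out")
    SEATS[preview.airport] -= 1
    confirmation = f"CONF-{len(BOOKINGS) + 1}"
    BOOKINGS[confirmation] = preview.airport
    return confirmation


def verify_result(confirmation: str, airport: str) -> bool:
    """Re-read the stored booking to check that it matches what was intended."""
    return BOOKINGS.get(confirmation) == airport


def recover_from_error(candidates: list[str], failed: str) -> list[str]:
    """Offer alternatives that still have seats instead of just failing."""
    return [c for c in candidates if c != failed and SEATS[c] > 0]


candidates = semantic_search("Book a flight to the airport near Times Square")
airport = resolve_candidates(candidates, "JFK")
draft = preview_action(airport)
try:
    confirmation = execute_action(draft)
except RuntimeError:
    alternatives = recover_from_error(candidates, airport)
    airport = resolve_candidates(alternatives, alternatives[0])
    confirmation = execute_action(preview_action(airport))
ok = verify_result(confirmation, airport)
```

In this toy run the JFK flight is sold out, so the recovery step offers LaGuardia and Newark, and the booking goes through on LaGuardia before being verified. A real system would put an approval step between the preview and the execution.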
The Safety Net (Governance):
The paper also introduces a strict "security guard" system.
- Dual-Layer Permissions: It checks two things: "Does this AI have the job title to do this?" (Capability) AND "Is this AI allowed to touch this specific store's data?" (Scope).
- Dynamic Risk: If the AI tries to do something small (like checking a ticket), it goes right through. If it tries to do something big (like deleting 500 records or changing prices for a whole brand), the system automatically pauses and asks a human manager for approval before proceeding.
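The two checks and the risk gate can be sketched as a single authorization function. This is a sketch under stated assumptions: the `Agent` class, the action and store names, and the `BULK_THRESHOLD` of 100 records are hypothetical, and the paper's actual risk rules may differ.

```python
from dataclasses import dataclass

# Hypothetical cutoff: actions touching this many records or more need a human.
BULK_THRESHOLD = 100


@dataclass
class Agent:
    capabilities: set[str]   # what kinds of action the agent may perform
    scopes: set[str]         # which stores' data the agent may touch


def authorize(agent: Agent, action: str, store_id: str, record_count: int) -> str:
    """Apply the capability check, then the scope check, then the risk gate."""
    if action not in agent.capabilities:
        return "deny: missing capability"
    if store_id not in agent.scopes:
        return "deny: out of scope"
    if record_count >= BULK_THRESHOLD:
        return "needs_human_approval"
    return "allow"


agent = Agent(capabilities={"read_ticket", "delete_records"}, scopes={"store-12"})

small = authorize(agent, "read_ticket", "store-12", 1)
bulk = authorize(agent, "delete_records", "store-12", 500)
wrong_store = authorize(agent, "read_ticket", "store-99", 1)
wrong_job = authorize(agent, "change_price", "store-12", 1)
```

Here the small read is allowed, the 500-record delete is paused for human approval, and the other two requests are denied by the scope and capability checks respectively.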
The Results:
The authors tested this in a real-world system with 85 different tools (covering tasks like managing work orders, training staff, or fixing equipment).
- Success Rate: The new system solved 88% of tasks, while the old system only solved 64%.
- Less Human Help: The new system needed human intervention only 6% of the time, compared to 22% for the old system.
- Fewer Mistakes: The AI made far fewer "hallucinations" (guessing wrong IDs) because the tool helped it find the right ID first.
The Trade-off:
The new system takes a little more time and uses more tokens (a rough measure of how much text the AI has to process) for each individual step, because it does all these extra checks (searching, previewing, verifying). However, because it fails less often and doesn't get stuck in loops of guessing, a whole job actually finishes sooner and far more reliably.
In Summary:
The paper argues that to make AI agents truly useful in businesses, we can't just give them the same tools we use for humans. We need to redesign the tools to be conversational, self-correcting, and safety-conscious, turning the AI from a "blind guesser" into a "supervised professional."