Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models

This paper introduces a formal framework for "informativeness" and a corresponding hospitality-specific VQA dataset to evaluate Vision-Language Models, revealing that while current models struggle with decision-oriented reasoning, their performance significantly improves with modest domain-specific finetuning.

Jeongwoo Lee, Baek Duhyeong, Eungyeol Han, Soyeon Shin, Gukin Han, Seungduk Kim, Jaehyun Jeon, Taewoo Jeong

Published 2026-03-10

Imagine you are trying to buy a house or book a vacation rental. You don't just read the text description; you stare at the photos. You ask yourself: "Is the room big enough for my bed?" "Can I actually see the ocean from the window?" "Is there a place to put my luggage?"

You aren't looking for a robot to tell you, "This is a chair." You are looking for a robot that understands how useful that chair is for your vacation.

This paper, "Hospitality-VQA," is about teaching AI to stop being a simple photo describer and start being a helpful travel advisor. Here is the breakdown in plain English:

1. The Problem: The "Tourist" vs. The "Traveler"

Current AI models (Vision-Language Models) are like tourists who just take snapshots. If you show them a hotel room, they say, "I see a bed, a lamp, and a window." That's factually correct, but it doesn't help you decide if you should book the room.

You, the traveler, need to know:

  • Is the room layout clear enough to move around? (Spatial Legibility)
  • Does the room actually support activities like working or sleeping? (Activity Affordance)
  • Is the view blocked by a wall, or is it open and airy? (Contextual Openness)
  • Is the room or building shown in full, or only as an awkward, cropped close-up? (Geometric Completeness)

The authors realized that existing AI benchmarks only test whether the AI recognizes the objects in a picture, not whether the picture carries enough information to support a decision.

2. The Solution: A New "Ruler" for Photos

To fix this, the team created a new framework called Hospitality Informativeness. Think of it as a ruler with four scales, one for each of the dimensions above, used to grade how helpful a hotel photo is for a guest.

They built a new dataset called Hospitality-VQA containing 5,000 real hotel photos. Instead of asking the AI, "What color is the sofa?", they asked decision-oriented questions like:

  • "Can you see the whole room, or is it cut off?"
  • "Is there a desk where I could work?"
  • "How much of the outside view is visible?"
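To make the "ruler" concrete, here is a minimal Python sketch of how a VQA item could tag each question with one of the four dimensions so that each dimension can be scored separately. The names `Dimension`, `VQAItem`, and `group_by_dimension`, the image ID, the answers, and the question-to-dimension mapping are all illustrative assumptions, not the dataset's actual format.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The four informativeness dimensions named in the paper."""
    SPATIAL_LEGIBILITY = "spatial legibility"
    ACTIVITY_AFFORDANCE = "activity affordance"
    CONTEXTUAL_OPENNESS = "contextual openness"
    GEOMETRIC_COMPLETENESS = "geometric completeness"


@dataclass(frozen=True)
class VQAItem:
    """One hypothetical question-answer pair attached to a hotel photo."""
    image_id: str          # hypothetical photo identifier
    question: str
    dimension: Dimension   # which scale of the "ruler" this question measures
    answer: str            # hypothetical reference answer


# Illustrative items built from the example questions above. The image ID,
# the answers, and the dimension assigned to each question are assumptions.
ITEMS = [
    VQAItem("room_001", "Can you see the whole room, or is it cut off?",
            Dimension.GEOMETRIC_COMPLETENESS, "cut off"),
    VQAItem("room_001", "Is there a desk where I could work?",
            Dimension.ACTIVITY_AFFORDANCE, "yes"),
    VQAItem("room_001", "How much of the outside view is visible?",
            Dimension.CONTEXTUAL_OPENNESS, "partially"),
]


def group_by_dimension(items):
    """Bucket items by dimension (every dimension gets a list, possibly empty)."""
    groups = {d: [] for d in Dimension}
    for item in items:
        groups[item.dimension].append(item)
    return groups


groups = group_by_dimension(ITEMS)
for dim, bucket in groups.items():
    print(f"{dim.value}: {len(bucket)} question(s)")
```

Keeping an empty bucket for every dimension makes gaps visible: in this toy example, no question yet probes spatial legibility.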

3. The Experiment: Testing the AI

The researchers took eight state-of-the-art AI models (such as GPT-4o, Gemini, and others) and tested them on this new dataset.

The results were surprising:

  • The "General Knowledge" Gap: The AI models were great at identifying the room type (e.g., "This is a bedroom"). But when asked about the usefulness of the room (e.g., "Is the layout clear enough to navigate?"), they struggled badly. It's like a student who can memorize the periodic table but can't do basic chemistry.
  • The "Fine-Tuning" Fix: The AI wasn't "stupid"; it just hadn't been trained for this specific job. When the researchers gave the models a "crash course" (a modest amount of extra training, known as fine-tuning) on these hotel questions, their performance improved significantly, bringing them much closer to being reliable travel assistants.
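The "general knowledge" gap only shows up when scores are broken down by question type instead of averaged together. Below is a minimal Python sketch of that kind of breakdown. It assumes simple exact-match scoring and uses a hypothetical `per_category_accuracy` helper with made-up records, so it illustrates the idea rather than the paper's actual evaluation protocol or results.

```python
from collections import defaultdict


def per_category_accuracy(records):
    """Exact-match accuracy per question category.

    `records` is a list of (category, prediction, reference) tuples.
    Case- and whitespace-insensitive matching is an assumption made for
    this sketch, not the paper's scoring rule.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, pred, ref in records:
        total[category] += 1
        if pred.strip().lower() == ref.strip().lower():
            correct[category] += 1
    return {c: correct[c] / total[c] for c in total}


# Toy, made-up records: these are NOT results from the paper.
records = [
    ("room type", "bedroom", "bedroom"),
    ("room type", "bathroom", "Bathroom"),
    ("spatial legibility", "yes", "no"),
    ("spatial legibility", "no", "no"),
]
print(per_category_accuracy(records))
# {'room type': 1.0, 'spatial legibility': 0.5}
```

A single averaged score over these four records (0.75) would hide the fact that the toy model got every room-type question right but only half of the informativeness ones.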

4. Why This Matters

This paper is a wake-up call for the AI industry. Just because an AI can describe a picture doesn't mean it understands why humans care about that picture.

  • For Hotels: They can use this to automatically pick the best photos to show customers, ensuring the images actually help people feel confident about booking.
  • For Travelers: It paves the way for AI assistants that can say, "Don't book this room; the photo shows the bed is too close to the bathroom door, and the view is blocked," rather than just saying, "Here is a photo of a bed."

The Bottom Line

The authors built a new test to see if AI can judge the quality of information in a photo, not just the quantity of objects. They found that current AI needs a little bit of specialized training to stop being a "tourist" and start being a "travel agent."