SmartBench: Evaluating LLMs in Smart Homes with Anomalous Device States and Behavioral Contexts
This paper introduces SmartBench, the first dataset designed to evaluate LLMs on detecting anomalous device states and behavioral contexts in smart homes. It reveals that current state-of-the-art models struggle significantly with this critical task.