Lost in Stories: Consistency Bugs in Long Story Generation by LLMs
This paper introduces ConStory-Bench, a benchmark of 2,000 prompts with a detailed error taxonomy, and ConStory-Checker, an automated pipeline. Together they are used to systematically evaluate and analyze the prevalence and patterns of consistency errors in long-form stories generated by Large Language Models.