Probing the Limits of the Lie Detector Approach to LLM Deception
This paper challenges the assumption that LLM deception is coextensive with lying. It demonstrates that models can deceive through misleading non-falsities, statements that are not false yet still mislead, and that current truth probes fail to detect them, revealing a critical blind spot in mechanistic deception detection.