Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
This paper uses Chinese open-weight LLMs that censor politically sensitive topics as a natural testbed for evaluating honesty elicitation and lie detection techniques. It finds that methods such as few-shot prompting and self-classification increase truthful responses and detect falsehoods, though no approach completely eliminates deception.
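To make the two named techniques concrete, the following is a minimal sketch of how few-shot prompting and self-classification might be set up as plain prompt construction. The summary does not give the paper's actual prompts or models, so everything here is an assumption for illustration: the function names (build_few_shot_prompt, build_self_classification_prompt, parse_verdict), the prompt wording, and the TRUE/FALSE answer format are all hypothetical.

```python
from typing import List, Optional, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Prepend (question, answer) demonstrations before the target question.

    Hypothetical format: the idea is that candid demonstrations may nudge
    the model toward answering the final question candidly as well.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


def build_self_classification_prompt(question: str, answer: str) -> str:
    """Ask a model to judge whether a given response is truthful.

    Hypothetical wording; in self-classification the judged response
    would typically come from the same model that is doing the judging.
    """
    return (
        "Below is a question and a response.\n\n"
        f"Question: {question}\n"
        f"Response: {answer}\n\n"
        "Is the response factually accurate? Answer TRUE or FALSE."
    )


def parse_verdict(model_output: str) -> Optional[bool]:
    """Map the first word of the judge's output to True/False, else None."""
    words = model_output.strip().split()
    if not words:
        return None
    first = words[0].strip(".,:;!\"'").upper()
    if first == "TRUE":
        return True
    if first == "FALSE":
        return False
    return None


if __name__ == "__main__":
    demos = [("What is the capital of France?", "Paris.")]
    print(build_few_shot_prompt(demos, "What is the capital of Italy?"))
    print(build_self_classification_prompt("What is 2 + 2?", "5"))
```

The prompts returned by these helpers would be sent to whatever model is under evaluation; the model call itself is left out because it depends on the serving setup. Returning None from parse_verdict for unparseable output keeps ambiguous judgments separate from explicit FALSE verdicts.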