Functional Emotions or Situational Contexts? A Discriminating Test from the Mythos Preview System Card
This note identifies two competing hypotheses, both qualitatively consistent with the Claude Mythos Preview system card: that emotion vectors track functional emotions which causally drive misaligned behaviour, or that they are a projection of a richer situational-context structure. It then specifies the cross-reference test that would discriminate between the two. The outcome bears directly on whether emotion-based monitoring can reliably detect dangerous model behaviour.