Understanding the Dynamics of Demonstration Conflict in In-Context Learning
This paper investigates how large language models process conflicting demonstrations in in-context learning. It reveals a two-phase computational structure: early layers encode both the correct and the incorrect rule, while late layers commit to a prediction. The paper further identifies specific attention heads responsible for this vulnerability and shows that targeted ablation of those heads significantly improves performance.
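As a minimal sketch of the kind of targeted head ablation described above, the snippet below zeroes out chosen attention heads in a Hugging Face GPT-2 model by hooking the attention output projection; the `(layer, head)` pairs in `HEADS_TO_ABLATE`, the prompt, and the checkpoint are hypothetical illustrations, not the heads or setup identified in the paper.

```python
# Sketch: ablate specific attention heads in GPT-2 via forward pre-hooks.
# Assumes HF transformers' GPT-2, where head outputs are concatenated
# before the attention block's output projection (attn.c_proj).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

# Hypothetical (layer, head) pairs standing in for the heads the paper
# identifies as driving sensitivity to conflicting demonstrations.
HEADS_TO_ABLATE = [(9, 6), (10, 2)]

n_heads = model.config.n_head
head_dim = model.config.n_embd // n_heads

def make_ablation_hook(head_idx):
    # Zero one head's slice of the concatenated head outputs *before*
    # the output projection, which removes that head's contribution
    # to the residual stream.
    def hook(module, inputs):
        hidden = inputs[0].clone()
        hidden[..., head_idx * head_dim : (head_idx + 1) * head_dim] = 0.0
        return (hidden,) + inputs[1:]
    return hook

handles = [
    model.transformer.h[layer].attn.c_proj.register_forward_pre_hook(
        make_ablation_hook(head)
    )
    for layer, head in HEADS_TO_ABLATE
]

# Run the model with the chosen heads silenced; the demonstrations
# here are a toy example of a conflicting in-context prompt.
prompt = "fruit -> sweet\nlemon -> sour\napple ->"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(tokenizer.decode(logits[0, -1].argmax().item()))

# Remove the hooks to restore the unmodified model.
for h in handles:
    h.remove()
```

Comparing the model's predictions with and without the hooks installed is one way to measure how much the ablated heads contribute to the conflict-induced errors.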