Delta-Crosscoder: Robust Crosscoder Model Diffing in Narrow Fine-Tuning Regimes
The paper introduces Delta-Crosscoder, a robust crosscoder method for model diffing that combines BatchTopK sparsity with a delta-based loss to identify and mitigate localized behavioral changes in narrowly fine-tuned models. The authors report that it outperforms existing sparse autoencoder (SAE) baselines across diverse model architectures and tasks.
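
To make the BatchTopK ingredient concrete, here is a minimal NumPy sketch of the general BatchTopK idea as it is commonly described: instead of keeping the top k latent activations per sample, keep the k × batch_size largest activations across the whole batch. This is an illustration under that assumption, not the paper's implementation; the function name `batch_topk` and the toy array are hypothetical, and the delta-based loss is not sketched because the summary does not define it.

```python
import numpy as np


def batch_topk(pre_acts: np.ndarray, k: int) -> np.ndarray:
    """Keep the k * batch_size largest activations across the whole batch.

    pre_acts: array of shape (batch_size, n_latents) with encoder pre-activations.
    k: average number of active latents allowed per sample (assumed >= 0).
    Returns an array of the same shape with all other entries set to zero.
    """
    acts = np.maximum(pre_acts, 0.0)  # ReLU, so only non-negative activations compete
    batch_size = acts.shape[0]
    n_keep = min(k * batch_size, acts.size)
    out = np.zeros_like(acts)
    if n_keep == 0:
        return out

    flat = acts.ravel()
    # argpartition places the n_keep largest entries in the last n_keep slots
    # (unordered), which is all we need to build the mask.
    kth = flat.size - n_keep
    keep = np.argpartition(flat, kth)[kth:]
    out.ravel()[keep] = flat[keep]  # out is contiguous, so ravel() is a view
    return out


# Toy batch: 2 samples, 3 latents (hypothetical values).
toy = np.array([[5.0, 1.0, 0.5],
                [0.2, 0.1, 0.3]])
sparse = batch_topk(toy, k=1)
```

With `k=1` and two samples, the budget is two active latents for the whole batch. Both of them land in the first row here, so the second row is zeroed entirely. A per-sample TopK would instead have kept one latent in each row; the batch-level budget lets the number of active latents vary per sample while fixing the average.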