Aligned explanations in neural networks
This paper introduces Pointwise-interpretable Networks (PiNets), a modeling framework designed for "explanatory alignment". By combining statistical intelligence with a pseudo-linear structure, PiNets produce explanations that directly underlie the network's predictions rather than merely rationalizing them after the fact.
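As a rough illustration of what a pseudo-linear, explanation-aligned predictor could look like, the sketch below uses a small network to produce input-dependent coefficients θ(x), and forms the prediction as θ(x)·x. The per-feature contributions θ(x)ᵢ·xᵢ then sum exactly to the prediction, so the explanation underlies the output by construction. All names (`coeff_net`, the toy dimensions and weights) are hypothetical assumptions for this sketch, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def coeff_net(x, W1, W2):
    # Hypothetical small network producing input-dependent coefficients theta(x).
    h = np.tanh(x @ W1)
    return h @ W2  # one coefficient per input feature

# Toy dimensions and random weights (illustrative only).
d, hdim = 4, 8
W1 = rng.normal(size=(d, hdim))
W2 = rng.normal(size=(hdim, d))

x = rng.normal(size=d)
theta = coeff_net(x, W1, W2)

# Pseudo-linear prediction: theta(x) . x
prediction = float(theta @ x)

# The explanation is the vector of per-feature contributions; by construction
# they sum exactly to the prediction, so explanation and prediction are aligned.
contributions = theta * x
print(np.isclose(contributions.sum(), prediction))  # True
```

The key design point is that the coefficients may vary with the input (hence "pseudo-linear"), yet each individual prediction decomposes exactly into feature contributions, unlike post-hoc attribution methods applied to an arbitrary network.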