A Mutual Information-based Metric for Temporal Expressivity and Trainability Estimation in Quantum Policy Gradient Pipelines
This paper proposes MI-TET, a mutual information-based metric that quantifies temporal expressivity and trainability in quantum policy gradient pipelines. It demonstrates that the mutual information between action distributions and discretized rewards provides an upper bound on gradient norms and enables a prescreening criterion for gradient fragility at initialization time.
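To make the central quantity concrete, the following is a minimal sketch of how the mutual information between sampled actions and discretized rewards could be estimated from rollout data. It is not the paper's MI-TET definition, which is not reproduced here. The function names `discretize_rewards` and `mutual_information`, the equal-width binning, and the plug-in (empirical frequency) estimator are all assumptions made for illustration.

```python
import numpy as np


def discretize_rewards(rewards, n_bins=4):
    """Map continuous rewards to bin indices 0..n_bins-1 (equal-width bins; an assumption)."""
    rewards = np.asarray(rewards, dtype=float).ravel()
    lo, hi = rewards.min(), rewards.max()
    if hi == lo:
        # All rewards identical: a single bin carries no information.
        return np.zeros(rewards.shape, dtype=int)
    edges = np.linspace(lo, hi, n_bins + 1)
    # Use interior edges only so that the maximum reward falls in the last bin.
    return np.digitize(rewards, edges[1:-1])


def mutual_information(actions, reward_bins):
    """Plug-in estimate of I(A; R) in nats from paired discrete samples."""
    actions = np.asarray(actions).ravel()
    reward_bins = np.asarray(reward_bins).ravel()
    # Re-index both variables to contiguous integers 0..K-1.
    _, a_idx = np.unique(actions, return_inverse=True)
    _, r_idx = np.unique(reward_bins, return_inverse=True)
    # Empirical joint distribution from co-occurrence counts.
    joint = np.zeros((a_idx.max() + 1, r_idx.max() + 1))
    np.add.at(joint, (a_idx, r_idx), 1.0)
    joint /= joint.sum()
    # Marginals and the product-of-marginals reference distribution.
    p_a = joint.sum(axis=1, keepdims=True)
    p_r = joint.sum(axis=0, keepdims=True)
    product = p_a @ p_r
    mask = joint > 0  # skip empty cells, using the convention 0 * log 0 = 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))


# Toy data: rewards fully determined by the action vs. unrelated to it.
actions = [0, 0, 1, 1]
mi_dependent = mutual_information(actions, discretize_rewards([0.0, 0.0, 1.0, 1.0], n_bins=2))
mi_independent = mutual_information(actions, discretize_rewards([0.0, 1.0, 0.0, 1.0], n_bins=2))
print(mi_dependent, mi_independent)
```

In this toy example the fully dependent case gives log 2 nats and the independent case gives 0. With real rollouts the plug-in estimator is biased upward for small sample sizes, so the number of reward bins and the sample count would need to be chosen with care.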