ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos
ViterbiPlanNet introduces a principled framework that injects procedural knowledge into instructional video planning via a Differentiable Viterbi Layer, achieving state-of-the-art performance with significantly fewer parameters and improved sample efficiency compared to existing large-scale models.
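The summary does not spell out how the Differentiable Viterbi Layer is built. One standard way to make Viterbi decoding differentiable is to replace each `max` in the recursion with a temperature-scaled log-sum-exp. The sketch below shows that smoothing trick; it is not the authors' implementation. The names are hypothetical: `theta` stands for per-step action scores (for example, from a visual encoder), `trans` for a transition-score matrix assumed to encode procedural knowledge about which step may follow which, and `gamma` for the smoothing temperature. As `gamma` approaches 0 the smoothed score approaches the hard Viterbi score. Its gradient with respect to `theta` is the matrix of per-step soft state marginals, computed here explicitly with a forward-backward pass.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def hard_viterbi(theta: Matrix, trans: Matrix) -> Tuple[float, List[int]]:
    """Classic Viterbi: best path score and the argmax state sequence."""
    T, K = len(theta), len(theta[0])
    score = list(theta[0])
    back: List[List[int]] = []  # back[t-1][j] = best predecessor of state j at step t
    for t in range(1, T):
        new_score, ptr = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + trans[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + trans[best_i][j] + theta[t][j])
        score = new_score
        back.append(ptr)
    last = max(range(K), key=lambda k: score[k])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the last step
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path


def smoothed_viterbi(theta: Matrix, trans: Matrix, gamma: float = 1.0) -> Tuple[float, Matrix]:
    """Viterbi with max replaced by gamma * logsumexp(. / gamma).

    Returns the smoothed best-path score and its gradient w.r.t. theta,
    i.e. per-step soft state marginals (each row sums to 1).
    """
    T, K = len(theta), len(theta[0])
    s = [[v / gamma for v in row] for row in theta]
    b = [[v / gamma for v in row] for row in trans]
    # Forward pass in log space: alpha[t][j] scores all prefixes ending in state j.
    alpha = [s[0][:]]
    for t in range(1, T):
        alpha.append([s[t][j] + logsumexp([alpha[t - 1][i] + b[i][j] for i in range(K)])
                      for j in range(K)])
    # Backward pass: beta[t][i] scores all suffixes after state i at step t.
    beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [logsumexp([b[i][j] + s[t + 1][j] + beta[t + 1][j] for j in range(K)])
                   for i in range(K)]
    log_z = logsumexp(alpha[-1])
    marginals = [[math.exp(alpha[t][k] + beta[t][k] - log_z) for k in range(K)]
                 for t in range(T)]
    return gamma * log_z, marginals


# Toy example: 3 steps, 3 actions; trans favours the order 0 -> 1 -> 2.
theta = [[2.0, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.0]]
trans = [[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [-1.0, -1.0, 0.0]]
best, path = hard_viterbi(theta, trans)
print(best, path)  # 6.5 [0, 1, 2]
soft_score, soft_marginals = smoothed_viterbi(theta, trans, gamma=0.01)
print(soft_score)
```

The temperature sets the trade-off. A small `gamma` keeps the decoded plan close to the hard Viterbi path but yields sharp, nearly piecewise-constant gradients. A larger `gamma` spreads probability mass over alternative step orders and gives smoother training signals.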