Tokenizing Semantic Segmentation with RLE
This paper introduces a unified language modeling approach for semantic and panoptic segmentation in images and videos that discretizes masks into run-length encoded tokens, employing novel compression strategies to enable autoregressive generation despite computational constraints.