Abstract: In this paper, we propose a novel Transformer-based approach, the Cross-modal Contrastive Masked AutoEncoder (C2MAE), for Self-Supervised Learning (SSL) on compressed videos. A unified ...