The document discusses advances in building interpretable world models for visual trajectory prediction through weakly supervised learning. It introduces a novel architecture that uses weak supervision to make the latent representations physically interpretable while also improving prediction accuracy over conventional world models. Experimental results indicate that the proposed physically interpretable world model (PIWM) outperforms baseline models in both predictive accuracy and the physical meaningfulness of its latent structure, particularly on long-horizon prediction tasks.
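To make the core idea concrete, below is a minimal sketch of how weak supervision can be attached to a latent world model: a small number of latent dimensions are softly tied to coarse physical labels (e.g., approximate object position and velocity), while a latent dynamics loss drives trajectory prediction. This is not the paper's implementation; the architecture, label format, and loss weighting are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the PIWM authors' code): a world model whose
# first few latent dimensions are weakly supervised to track coarse physical
# state. All names, shapes, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeaklySupervisedWorldModel(nn.Module):
    def __init__(self, latent_dim=16, phys_dim=4):
        super().__init__()
        self.phys_dim = phys_dim  # latent dims reserved for physical state
        # Encoder: 64x64 RGB frame -> latent vector z_t
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        # Latent dynamics: z_t -> predicted z_{t+1}
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, frame):
        return self.encoder(frame)

def training_step(model, frame_t, frame_t1, weak_phys_t):
    """One training step. weak_phys_t holds coarse physical labels
    (e.g., approximate position/velocity) used as weak supervision."""
    z_t = model(frame_t)
    z_t1_pred = model.dynamics(z_t)
    z_t1_target = model(frame_t1).detach()  # encoding of the next frame

    # Dynamics loss: predicted latent must match the next frame's encoding,
    # which is what drives trajectory prediction.
    loss_dyn = F.mse_loss(z_t1_pred, z_t1_target)

    # Weak supervision loss: tie the first phys_dim latent dimensions to the
    # coarse labels, giving those dimensions a physical interpretation.
    loss_weak = F.mse_loss(z_t[:, :model.phys_dim], weak_phys_t)

    return loss_dyn + 0.5 * loss_weak  # 0.5 is an assumed weighting

# Usage with random tensors standing in for real data:
model = WeaklySupervisedWorldModel()
frames_t = torch.randn(8, 3, 64, 64)
frames_t1 = torch.randn(8, 3, 64, 64)
weak_labels = torch.randn(8, 4)  # e.g., coarse (x, y, vx, vy); hypothetical
loss = training_step(model, frames_t, frames_t1, weak_labels)
loss.backward()
```

The design point the sketch illustrates is that the weak labels need only be coarse and partial: they regularize a slice of the latent space toward physically meaningful coordinates without requiring dense ground-truth state, while the dynamics loss remains the primary training signal for prediction.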