This paper introduces a saliency-based video object extraction (VOE) framework that automatically identifies and extracts foreground objects from videos without user intervention or prior training data. The method uses both visual and motion saliency information to distinguish foreground from background across video frames, and applies a conditional random field (CRF) to label pixels. Experimental results show that the framework preserves spatial continuity and temporal consistency and outperforms existing unsupervised VOE approaches.
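To make the labeling idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the visual and motion saliency maps are already available as values in [0, 1], blends them with a hypothetical weight `alpha`, and labels pixels by minimizing a simple CRF-style energy (a negative log-likelihood unary term plus a Potts smoothness term over 4-connected neighbors). Iterated conditional modes (ICM) is used here only as a simple stand-in for inference; the function names, the blending rule, and all parameter values are illustrative assumptions.

```python
import math


def combine_saliency(visual, motion, alpha=0.5):
    """Blend two saliency maps (values in [0, 1]) pixel by pixel.

    The linear blend and the weight alpha are assumptions for illustration.
    """
    return [[alpha * v + (1 - alpha) * m for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(visual, motion)]


def unary_cost(p, label, eps=1e-6):
    """Negative log-likelihood of a label given foreground probability p."""
    p = min(max(p, eps), 1 - eps)
    return -math.log(p) if label == 1 else -math.log(1 - p)


def label_pixels(saliency, smoothness=0.5, sweeps=5):
    """Binary labeling (1 = foreground) by ICM on a CRF-style energy."""
    h, w = len(saliency), len(saliency[0])
    # Start from a plain threshold of the combined saliency.
    labels = [[1 if saliency[y][x] > 0.5 else 0 for x in range(w)]
              for y in range(h)]
    for _ in range(sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                best_label, best_cost = labels[y][x], float("inf")
                for cand in (0, 1):
                    cost = unary_cost(saliency[y][x], cand)
                    # Potts term: pay a penalty for each disagreeing neighbor.
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != cand:
                            cost += smoothness
                    if cost < best_cost:
                        best_label, best_cost = cand, cost
                if best_label != labels[y][x]:
                    labels[y][x] = best_label
                    changed = True
        if not changed:
            break
    return labels


# Toy 3x3 frame: the center pixel is weakly salient, its neighbors strongly so.
visual = [[0.9, 0.9, 0.9], [0.9, 0.5, 0.9], [0.9, 0.9, 0.9]]
motion = [[0.9, 0.9, 0.9], [0.9, 0.4, 0.9], [0.9, 0.9, 0.9]]
combined = combine_saliency(visual, motion)
smoothed = label_pixels(combined, smoothness=0.5)
unsmoothed = label_pixels(combined, smoothness=0.0)
```

In this toy frame the center pixel falls just below the threshold on saliency alone, so it stays background when `smoothness` is 0, while the pairwise term pulls it into the foreground when its neighbors agree. This is the kind of spatial continuity a CRF is meant to encourage; temporal consistency would need additional terms linking adjacent frames, which this sketch omits.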