The document proposes a method for action localization in videos that models the video as a graph over supervoxel segments. The video is segmented into irregularly shaped regions, which become the graph's vertices, with edges capturing relationships between regions. Each vertex is represented by a histogram of local features. A classifier trained on these vertex representations labels actions, and an MRF model is optimized to jointly infer region labels for localization. Preliminary results on the UCF Sports dataset achieve 77.5% mAP for action localization.
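To make the pipeline concrete, the sketch below shows one plausible way to combine per-region classifier scores with pairwise MRF smoothing over a supervoxel adjacency graph. The function names (`mrf_action_labels`, `unary_scorer`), the Potts smoothness term, and the iterated-conditional-modes inference are illustrative assumptions; the document does not specify the exact energy or optimizer used.

```python
import numpy as np

def mrf_action_labels(vertex_hists, edges, unary_scorer, n_labels,
                      smoothness=1.0, n_iters=10):
    """Jointly infer an action label for each supervoxel region (sketch).

    vertex_hists : (V, D) array of per-region local-feature histograms
    edges        : list of (i, j) pairs linking related regions
    unary_scorer : callable mapping (V, D) histograms -> (V, L) class scores
    """
    scores = unary_scorer(vertex_hists)            # classifier scores per vertex
    unary = -np.log(np.clip(scores, 1e-8, None))   # convert scores to unary costs
    labels = unary.argmin(axis=1)                  # independent initial labeling

    # Adjacency lists for the pairwise (Potts-style) smoothness term.
    neighbors = [[] for _ in range(len(vertex_hists))]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iterated conditional modes: greedily reduce the MRF energy (assumed inference).
    for _ in range(n_iters):
        changed = False
        for v in range(len(vertex_hists)):
            cost = unary[v].copy()
            for u in neighbors[v]:
                # Penalty for disagreeing with each neighbor's current label.
                cost += smoothness * (np.arange(n_labels) != labels[u])
            best = cost.argmin()
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


# Toy usage: random histograms, a chain-structured graph, and a stand-in classifier.
rng = np.random.default_rng(0)
V, D, L = 6, 16, 3
hists = rng.random((V, D))
chain_edges = [(i, i + 1) for i in range(V - 1)]

def toy_scorer(h, W=rng.random((D, L))):
    z = h @ W
    return np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # softmax scores

print(mrf_action_labels(hists, chain_edges, toy_scorer, n_labels=L))
```

In this reading, the classifier supplies the unary term and the graph edges supply the pairwise term, so the joint labeling smooths noisy per-region predictions into spatially coherent action regions.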