This document describes CERTH's approach to the MediaEval 2012 Social Event Detection task. The approach builds a graph of images from their visual, textual and temporal similarities, clusters the graph to detect candidate events, and filters those candidates by geolocation and tags. CERTH evaluated three runs of the system and found that a larger training dataset did not improve performance, and that the method failed on challenge 1 because the challenge data differed from the training data. Future work could involve training on more data and exploring alternative graph and clustering methods.
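
The three stages named above (similarity graph, clustering, filtering) can be illustrated with a minimal Python sketch. This is not CERTH's implementation: the `Photo` record, the hand-set thresholds in `same_event`, the use of connected components as the clustering step, and the "any photo matches" filter rule are all assumptions chosen to keep the example short and dependency-free.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional
import math


@dataclass(frozen=True)
class Photo:
    """Hypothetical photo record; field names are illustrative only."""
    id: str
    tags: frozenset
    timestamp: float              # capture time in seconds
    visual: tuple                 # toy visual feature vector
    lat: Optional[float] = None   # geolocation may be missing
    lon: Optional[float] = None


def jaccard(a, b):
    """Textual similarity: overlap of two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Visual similarity: cosine of two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def same_event(a, b, max_gap=6 * 3600, min_tag_sim=0.3, min_visual_sim=0.9):
    """Assumed rule: close in time AND similar in tags or appearance."""
    if abs(a.timestamp - b.timestamp) > max_gap:
        return False
    return (jaccard(a.tags, b.tags) >= min_tag_sim
            or cosine(a.visual, b.visual) >= min_visual_sim)


def build_graph(photos):
    """Adjacency sets with an edge for every pair judged to share an event."""
    graph = {p.id: set() for p in photos}
    for a, b in combinations(photos, 2):
        if same_event(a, b):
            graph[a.id].add(b.id)
            graph[b.id].add(a.id)
    return graph


def candidate_events(graph):
    """Stand-in clustering step: each connected component is one candidate."""
    seen, events = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        events.append(component)
    return events


def filter_events(events, photos, topic_tags, bbox):
    """Keep candidates where any photo lies in bbox or carries a topic tag."""
    by_id = {p.id: p for p in photos}
    lat_min, lat_max, lon_min, lon_max = bbox

    def matches(p):
        in_box = (p.lat is not None and p.lon is not None
                  and lat_min <= p.lat <= lat_max
                  and lon_min <= p.lon <= lon_max)
        return in_box or bool(p.tags & topic_tags)

    return [e for e in events if any(matches(by_id[i]) for i in e)]
```

A call such as `filter_events(candidate_events(build_graph(photos)), photos, topic_tags, bbox)` chains the three stages. Connected components are the simplest possible clustering and will merge events linked by a single spurious edge, so a community detection method would be a natural replacement in practice.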