The document proposes a method for enhancing existing how-to videos with crowdsourced step-by-step annotations. It presents a multi-stage crowdsourcing workflow that extracts the timing, a label, and before/after images for each step. An evaluation on 75 videos across multiple domains found that the workflow's output was 80% accurate relative to expert annotations. A preliminary user study also found that a step-by-step video player improved task performance, self-efficacy, and design quality compared with a regular video player.
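
To make the shape of these annotations concrete, the following is a minimal sketch of how one step record and a comparison against expert annotations might look. It is an illustration only. The `StepAnnotation` schema, the `fraction_matched` function, the greedy timestamp matching, and the 10-second tolerance are all assumptions introduced here; the summary does not say how the 80% figure was computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepAnnotation:
    """One step in a how-to video (hypothetical schema, not from the source)."""
    time_sec: float                      # when the step occurs in the video
    label: str                           # short text description of the step
    before_image: Optional[str] = None   # path or URL of the "before" frame
    after_image: Optional[str] = None    # path or URL of the "after" frame


def fraction_matched(crowd: List[StepAnnotation],
                     expert: List[StepAnnotation],
                     tolerance_sec: float = 10.0) -> float:
    """Fraction of crowd steps within tolerance_sec of an unclaimed expert step.

    Assumed metric for illustration: each expert step can be matched at most
    once, and crowd steps are matched greedily in time order.
    """
    unused = sorted(s.time_sec for s in expert)
    matched = 0
    for step in sorted(crowd, key=lambda s: s.time_sec):
        # Closest expert timestamp not yet claimed by another crowd step.
        best = min(unused, key=lambda t: abs(t - step.time_sec), default=None)
        if best is not None and abs(best - step.time_sec) <= tolerance_sec:
            matched += 1
            unused.remove(best)
    return matched / len(crowd) if crowd else 0.0


# Made-up example data, not taken from the evaluation.
crowd_steps = [
    StepAnnotation(12.0, "Chop the onions"),
    StepAnnotation(47.0, "Heat the pan"),
    StepAnnotation(90.0, "Add the sauce"),
]
expert_steps = [
    StepAnnotation(10.0, "Chop onions"),
    StepAnnotation(50.0, "Heat pan"),
    StepAnnotation(130.0, "Add sauce"),
]
score = fraction_matched(crowd_steps, expert_steps)
```

Here two of the three crowd steps fall within 10 seconds of an expert step, so `score` is 2/3. A real evaluation would probably also compare the labels and images, which this sketch ignores.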