This document describes a semi-automatic annotation tool for cooking videos that combines computer vision techniques with user supervision. The tool aims to achieve higher annotation accuracy than fully automatic tools while requiring less human effort than fully manual annotation. It includes modules for object detection and tracking, both embedded in an incremental learning framework. The tool's performance and usability were evaluated by the time and effort users needed to annotate video sequences.