This document discusses a novel deep-learning-based approach for building occupancy detection using CCTV cameras, addressing the limitations of existing camera-based methods. It proposes a model consisting of two modules: one for feature extraction using a convolutional neural network and another for a three-stage detection procedure. Empirical experiments demonstrated that this approach outperformed baseline models in detecting the number and locations of occupants in university building videos.