This document describes the development of a systematic data collection procedure for software defect prediction (SDP) datasets, emphasizing the need for standardized practices to reduce bias in existing empirical studies. The research aims to identify and quantify issues that affect data collection and the linking of bug reports to commit messages, and to improve the comparability of studies. It concludes by highlighting the importance of rigorous methods and of evaluating software metrics tools to improve future data collection.