This document describes building a machine learning model that predicts Parkinson's disease from voice recordings. The dataset contains recordings from 80 subjects, 40 of whom have Parkinson's disease, and 44 acoustic features were extracted from each recording. The model faces two main challenges: multicollinearity among the correlated acoustic features, and a small dataset made up of replicated recordings. Several techniques are proposed to address these issues: feature selection and feature engineering, dimensionality reduction, modeling approaches such as neural networks, and constraining the model with a causal diagram. Evaluation on a separate test set aims for high sensitivity, so that few true Parkinson's cases are missed, while keeping specificity and overall accuracy in balance.
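As a rough illustration of the evaluation setup described above, the following sketch shows one way to hold out a test set by subject and to compute sensitivity, specificity, and accuracy. It is a minimal sketch under stated assumptions, not the document's actual pipeline: the function names `split_by_subject` and `screening_metrics`, the 25% test fraction, and the three-recordings-per-subject layout in the demo are all hypothetical. It also assumes the replicated recordings are grouped by subject, in which case splitting by subject rather than by recording keeps the same person out of both sets.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_by_subject(subject_ids: Sequence[int], test_fraction: float = 0.25,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_rows, test_rows) such that no subject appears in both.

    subject_ids[i] is the subject that produced recording i. The test
    fraction and seed are illustrative defaults, not values from the source.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_rows = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_rows = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_rows, test_rows


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy for binary labels.

    Label 1 means Parkinson's disease, label 0 means healthy control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),   # share of true cases detected
        "specificity": _ratio(tn, tn + fp),   # share of controls correctly cleared
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


if __name__ == "__main__":
    # Hypothetical layout: 80 subjects with 3 recordings each (assumed count).
    subject_ids = [s for s in range(80) for _ in range(3)]
    train_rows, test_rows = split_by_subject(subject_ids)
    print(len(train_rows), len(test_rows))

    # Made-up labels and predictions, only to exercise the metric function.
    print(screening_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1]))
```

Reporting the three numbers side by side makes the trade-off explicit: lowering a classifier's decision threshold typically raises sensitivity at the cost of specificity, so accuracy alone can hide a model that misses true cases.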