A distributed computing process
has been performed on a classification dataset using logistic regression as a machine learning
model. We also examined how distributed computing for large dataset is
useful in terms of time complexity .
Keywords: Logistic regression · Binary Classification · Distributed computing · Hadoop · MapReduce
9. Analysis
The dataset is for those women who are disgnosed for diabetes. Each row is an input and each column is
the attribute for the input except the last column is the output. The output is in binary, where 0 means
tested negative for diabetes and 1 for positive.
1 85 66 29 0 26.6 0.351 31 0
8 183 64 0 0 23.3 0.672 32 1