Deep Learning Malaria Detection Model by Emmanuel Baisire
Malaria Parasite Detection Model Using Convolutional Neural Network
By Emmanuel Baisire
Executive Summary
Malaria is a widely prevalent disease in many parts of the world. It is transmitted to humans through mosquito bites and, when not detected in time, it may lead to death. The current diagnostic method requires a specialized nurse practitioner to manually read blood specimens.
The proposed deep learning model automatically detects malaria from cell images using a Convolutional Neural Network (CNN) with various layers, iterations and tunable parameters. The proposed technique is expected to reduce diagnostic costs and improve the diagnostic accuracy rate.
This is a binary classification model based on the publicly available Malaria cell images dataset, containing 24,958 training and 2,600 test images. The dataset is well balanced and labelled as Parasitized or Uninfected.
To build a reliable model, the dataset was split into training and test data. The images were then rescaled so that they could be fed into a Convolutional Neural Network. To further enhance the cell images for parasite detection, data augmentation was applied: images were rotated to different angles, flipped horizontally and zoomed out, helping the model learn the true markers of infected and uninfected cells.
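The flip and rotation steps above can be shown with a minimal, dependency-free sketch on a small 2D pixel grid. This is an illustration under stated assumptions and not the author's pipeline. It covers only horizontal flips and rotations in 90-degree steps, because arbitrary angles and zoom require pixel interpolation. In Keras, comparable transforms are often configured through `ImageDataGenerator` (for example `rotation_range`, `horizontal_flip` and `zoom_range`), but the original does not say which tool was used. The helper names are hypothetical.

```python
# Hypothetical helpers: flips and 90-degree rotations only; arbitrary-angle
# rotation and zoom would need pixel interpolation and are omitted here.

def horizontal_flip(image):
    """Mirror each row of a 2D pixel grid left to right."""
    return [list(reversed(row)) for row in image]


def rotate_90(image):
    """Rotate a 2D pixel grid 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise turn.
    return [list(col) for col in zip(*image[::-1])]


def augment(image):
    """Return a flipped copy plus the 90, 180 and 270 degree rotations."""
    variants = [horizontal_flip(image)]
    rotated = image
    for _ in range(3):
        rotated = rotate_90(rotated)
        variants.append(rotated)
    return variants


cell = [[1, 2],
        [3, 4]]
for variant in augment(cell):
    print(variant)
```

Each variant keeps the cell's label, so a single labelled image yields several training examples with the same target.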
The model was trained with a binary cross-entropy loss function and the Adam optimizer, and its performance was assessed with an accuracy metric.
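For reference, binary cross-entropy averages -(y·log(p) + (1 - y)·log(1 - p)) over the samples. Here y is the 0/1 label and p is the predicted probability of Parasitized. The plain-Python sketch below follows that formula. The epsilon clip is an assumption, similar in spirit to the clipping Keras applies internally so that log() never receives exactly 0.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip p away from exactly 0 and 1 so log() stays finite (eps is an assumption).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Labels: 1 = Parasitized, 0 = Uninfected.
print(binary_crossentropy([1, 0, 1], [0.9, 0.2, 0.6]))
```

A prediction of 0.5 for every sample gives a loss of ln 2 ≈ 0.693, which is the loss of an uninformative classifier. Confident wrong predictions are penalized far more heavily than that.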
Based on these techniques, the highest-performing model achieved an accuracy rate of 98%. This suggests that the proposed binary image classification model could match or outperform a skilled microscopist.
For the model to be used around the globe, a phone app can be developed to detect malaria in remote places that lack adequate infrastructure and resources.
1. Introduction
The main purpose of this project is to develop a binary image classification model that helps detect the malaria Plasmodium parasite in humans. The model is intended to help reduce human deaths due to malaria. When Plasmodium parasites are not detected in time, the infection often leads to serious health risks, including death.
According to the World Health Organization (WHO), malaria infections can be fatal and lead to hundreds of thousands of deaths annually. In its 2021 World Malaria Report, WHO estimated about 241 million cases and 627,000 deaths worldwide in 2020.
Malaria is a widely prevalent disease in many parts of the world. It is transmitted through mosquito bites. An infected person suffers from high fever, chills and general fatigue. When the infection is not detected and treated early enough, it may cause liver damage or other serious health problems, including death.
Malaria diagnostic tools include microscopy, in which experienced nurses painstakingly look for signs of parasites in blood smear images. Alternative methods include rapid diagnostic tests and polymerase chain reaction (PCR). All these methods take time and require trained personnel. Microscopy in particular depends on experts reading blood smear images for possible malaria cases.
The objective of this project is to build a deep learning model that can detect the malaria parasite using blood smear images. The model can then be incorporated into a mobile phone application to help fight malaria deaths and improve diagnostic accuracy and efficiency.
2. Model Architecture and Data Processing
The model is based on the publicly available Malaria cell images dataset, containing 24,958 training and 2,600 test images. The images are labelled as Parasitized (cells with Plasmodium) or Uninfected (cells without Plasmodium).
Cell images in the training and test datasets were preprocessed and resized to 64 × 64 pixels (width and height). Each dataset could then be stored as a 4D array of shape (images, height, width, channels).
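As a quick check on what resizing to 64 × 64 implies, the sketch below computes the 4D array shape and the float32 memory footprint of each dataset. It assumes 3 colour channels (RGB), which the original does not state. The function names are illustrative.

```python
def dataset_array_shape(n_images, size=64, channels=3):
    """4D array shape (images, height, width, channels) after resizing to size x size."""
    # channels=3 assumes RGB images; the original text does not state this.
    return (n_images, size, size, channels)


def array_bytes(shape, bytes_per_value=4):
    """Bytes needed to store an array of this shape; float32 uses 4 bytes per value."""
    total = 1
    for dim in shape:
        total *= dim
    return total * bytes_per_value


train_shape = dataset_array_shape(24958)
test_shape = dataset_array_shape(2600)
print(train_shape, f"{array_bytes(train_shape) / 2**30:.2f} GiB")
print(test_shape, f"{array_bytes(test_shape) / 2**30:.2f} GiB")
```

Under that assumption, the training array alone needs roughly 1.14 GiB as float32. That is worth keeping in mind on modest hardware.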
For label identification, classes were encoded as 1 for Parasitized (parasite image markers present) and 0 for Uninfected. This binary encoding matches the model's single sigmoid output unit and the binary cross-entropy loss.
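Below is a sketch of the 1/0 label mapping described above. The two-column one-hot form is shown only for contrast. A scalar 0/1 target is what a single sigmoid output with binary cross-entropy expects. One-hot vectors would instead pair with a two-unit softmax output. The dictionary and function names are hypothetical.

```python
# Hypothetical mapping that follows the encoding described in the text.
LABELS = {"Uninfected": 0, "Parasitized": 1}


def encode_labels(class_names):
    """Scalar 0/1 targets, as used with one sigmoid output unit."""
    return [LABELS[name] for name in class_names]


def one_hot(class_names):
    """Two-column one-hot rows [uninfected, parasitized], shown only for contrast."""
    return [[1 - LABELS[name], LABELS[name]] for name in class_names]


names = ["Parasitized", "Uninfected", "Parasitized"]
print(encode_labels(names))  # [1, 0, 1]
print(one_hot(names))        # [[0, 1], [1, 0], [0, 1]]
```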
The model was trained and tested on a balanced dataset that was fed into the CNN. Training and test images were normalized by dividing pixel values by 255 and then converted to the float32 data type.
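The normalization step maps 8-bit pixel values from 0 to 255 onto the range 0 to 1. A minimal standard-library sketch follows. It uses Python's `array` module with typecode "f" as a stand-in for a float32 array. The function name is illustrative.

```python
from array import array


def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0-255) into [0, 1] and store them as 32-bit floats."""
    # Typecode "f" is a C float, which is 32 bits wide on mainstream platforms.
    return array("f", (p / 255 for p in pixels))


scaled = normalize_pixels([0, 64, 128, 255])
print([round(v, 4) for v in scaled])
```

Keeping inputs in a small, fixed range helps gradient-based training behave consistently across images with different brightness.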
A Convolutional Neural Network architecture was preferred because of its strengths in binary image classification and prediction. A total of 6 models were trained and tested to choose the best one: 4 distinct CNNs and 2 VGG-type neural networks.
After multiple iterations, it became evident that a CNN model with 3 initial convolutional layers outperformed the other models. Each of these layers has 32 filters and is combined with 2 × 2 max pooling and batch normalization with default settings.
These layers are followed by a dense layer of 512 units with a LeakyReLU activation (alpha = 0.1).
Finally, a single output unit with a sigmoid activation produces the probability that a cell image is parasitized, which suits this binary classification problem.
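To make the architecture concrete, the sketch below traces feature-map sizes and counts parameters for the layer stack described above. Several details are assumptions not given in the original:

- inputs are 64 × 64 × 3;
- kernels are 3 × 3 with 'valid' padding and stride 1 (the Keras `Conv2D` defaults for padding and stride);
- each block is ordered as convolution, 2 × 2 max pooling, then batch normalization.

The counts use the standard per-layer formulas that Keras' `model.summary()` reports for these layer types.

```python
def conv_output(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after non-overlapping max pooling ('valid' padding floors odd sizes)."""
    return size // pool


def trace_model(input_size=64, channels=3, filters=32, blocks=3, dense_units=512, kernel=3):
    """Return (final feature-map size, flattened length, total parameter count)."""
    size, in_ch, total = input_size, channels, 0
    for block in range(1, blocks + 1):
        total += (kernel * kernel * in_ch + 1) * filters  # conv weights + one bias per filter
        size = pool_output(conv_output(size, kernel))
        total += 4 * filters  # batch norm: gamma, beta, moving mean, moving variance
        in_ch = filters
        print(f"block {block}: {size}x{size}x{in_ch}")
    flat = size * size * in_ch
    total += (flat + 1) * dense_units  # dense layer; LeakyReLU adds no weights
    total += dense_units + 1           # single sigmoid output unit
    return size, flat, total


print(trace_model())
```

Under these assumptions, the 512-unit dense layer holds about 97% of the 610,625 parameters. Of the total, 192 are non-trainable batch-normalization statistics.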
The model was then fit on the training images and labels using a batch size of 32, a validation split of 20% and 20 epochs. The resulting model achieved an accuracy rate of 98.5%.
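The training settings translate into concrete split sizes and step counts. The sketch assumes the training arrays were passed directly to `fit` with `validation_split=0.2`. In that case Keras holds out the last 20% of the samples, before any shuffling, for validation. The floor-based split below mirrors how we understand Keras to compute it. The function name is illustrative.

```python
import math


def training_schedule(n_samples, validation_split=0.2, batch_size=32, epochs=20):
    """Split sizes and gradient-step counts for fit() on in-memory arrays."""
    # Keras keeps the first part for training and the last fraction for validation.
    n_train = math.floor(n_samples * (1 - validation_split))
    n_val = n_samples - n_train
    steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be partial
    return n_train, n_val, steps_per_epoch, steps_per_epoch * epochs


n_train, n_val, steps, total_steps = training_schedule(24958)
print(f"train={n_train} val={n_val} steps/epoch={steps} total steps={total_steps}")
```

Because `validation_split` takes the tail of the arrays, the arrays should be shuffled beforehand if they are ordered by class. Otherwise the validation set could contain only one class.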
The proposed model correctly predicted 1,278 images as parasitized and falsely predicted (mislabelled) only 16 images as parasitized when they were actually uninfected cells. It also correctly identified 1,284 cell images as uninfected and falsely predicted (mislabelled) 22 cell images as uninfected when they actually contained parasites.
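A summary of these results is shown in the confusion matrix in Table 1.
Table 1 - Confusion Matrix
                        Predicted Parasitized    Predicted Uninfected
Actual Parasitized      1,278                    22
Actual Uninfected       16                       1,284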
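The confusion-matrix counts reported above also yield metrics that matter for diagnosis, such as sensitivity (recall) and specificity. The short sketch below treats Parasitized as the positive class. The function name is illustrative.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity: share of infected cells that are caught
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # share of uninfected cells correctly cleared
        "f1": 2 * precision * recall / (precision + recall),
    }


# Parasitized = positive class; counts taken from the results reported above.
metrics = confusion_metrics(tp=1278, fn=22, fp=16, tn=1284)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 2,562 / 2,600 ≈ 98.5%, which matches the reported figure. Sensitivity (about 98.3%) is slightly lower than specificity (about 98.8%), so missed infections are a little more common than false alarms.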
Figure - Train and Validation Accuracy
Conclusions
A CNN-based deep learning model performed better than the other experimental models. It was able to identify malaria parasites with an accuracy rate of 98% using relatively few cell images. The added layers, data preprocessing and a 512-unit dense layer appear to make a substantial difference to model performance. The dataset was split into training and validation datasets and used to train 5 different CNN models over multiple iterations.
The solution design included several preprocessing steps, such as cell image resizing, data augmentation and rescaling, which led to improved predictions. To identify malaria parasites successfully, several techniques were used, including dropout and ReLU activation functions. The model was trained with a binary_crossentropy loss function and the Adam optimizer, and evaluated by accuracy.
Inspection of the augmented images shows that most of the images misclassified by the model are not easily identifiable even by a human eye under the microscope. Further data augmentation is recommended for future model building to improve predictions.
After running different models with varying parameters, optimizations and regularization functions, the recommendation is to simplify the models further, avoid excessive layers and preferably use a higher dropout rate to minimize overfitting.
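Since a higher dropout rate is recommended, here is a minimal sketch of inverted dropout, the scheme Keras' `Dropout` layer uses during training. Each activation is zeroed with probability `rate`, and the survivors are scaled by 1 / (1 - rate) so the expected activation is unchanged. At inference time the layer passes inputs through unchanged. The function name and seeded random generator are illustrative choices.

```python
import random


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    # rng.random() >= rate happens with probability 1 - rate, i.e. the unit is kept.
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


out = inverted_dropout([1.0] * 10, rate=0.5, rng=random.Random(42))
print(out)
```

Raising `rate` drops more units per step. This discourages the dense layer from relying on any single feature, which is the overfitting effect the recommendation targets.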
Another recommendation is to review further the cell images that seem to have been wrongly labelled during data collection. Alternatively, a larger cell image dataset with more varied parasite markers should be collected. The downside of this approach is that data collection is costly for stakeholders and labour-intensive.
To improve labelling precision, a skilled microscopist should be consulted to review the images misclassified by the model. The expert would provide additional annotations so that the model can learn to identify true positives and true negatives.
Additional data collection and expert input may slow down project implementation, since it takes time to mobilize the financial and human resources needed for data collection. Cell image quality is also at risk if special care is not taken during data collection and storage.
For model improvement and reliability, the image dataset should be split into 3 sets: training, validation and test. Compared with using only 2 datasets, the added validation set ensures that the model's hyperparameters are finalized before the model is evaluated on the test dataset.
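A minimal sketch of a shuffled three-way split over sample indices follows. The 70/15/15 proportions and the fixed seed are assumptions for illustration. For this balanced dataset, a plain shuffle keeps the classes roughly balanced, while a stratified split would guarantee it. The function name is illustrative.

```python
import random


def three_way_split(n_samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle sample indices once and cut them into train / validation / test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split reproducible
    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# All 27,558 images (24,958 + 2,600) pooled before splitting.
train, val, test = three_way_split(27558)
print(len(train), len(val), len(test))
```

Hyperparameters are tuned against `val` only, and `test` is used once at the end.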
For validation purposes, I would also recommend K-fold cross-validation. The data is split into 3 partitions (folds); in turn, the model is trained on 2 folds and validated on the remaining one, and the average of the 3 validation scores is used as the model evaluation score.
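The 3-fold scheme can be sketched as follows. The fold layout mirrors scikit-learn's unshuffled `KFold`, where the first folds absorb any remainder. `evaluate` is a hypothetical callback that would train a fresh model on the training indices and return its accuracy on the validation indices. The data is assumed to be shuffled beforehand.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train, validation) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_validate(n_samples, evaluate, k=3):
    """Average the score returned by evaluate(train, val) over the k folds."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores), scores


# Hypothetical stand-in for "train a model and return validation accuracy".
mean_score, scores = cross_validate(10, lambda train, val: len(val) / 10)
print(mean_score, scores)
```

Each sample is used for validation exactly once. The averaged score is therefore less sensitive to a lucky or unlucky single split.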
In conclusion, the overall architecture of this solution involves data preprocessing to ensure that the input can be fed into the model, which then classifies cell images as parasitized or uninfected.
This model is well suited to deployment because it does not need heavy computational resources, and it can be used in a phone app to provide diagnostic results. A user will scan a cell image into the mobile app and receive the patient's positive or negative result. The model will be deployed using a TensorFlow REST API with low infrastructure and network maintenance costs.
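If the TensorFlow REST API is provided by TensorFlow Serving, a client request has the shape sketched below:

- a JSON body with an `instances` list is posted to `/v1/models/<name>:predict` (port 8501 is TensorFlow Serving's default REST port);
- the reply carries a `predictions` list.

The model name, host and 0.5 decision threshold are hypothetical. The sketch only builds the request; sending it would use `urllib.request.urlopen(request)`.

```python
import json
import urllib.request


def build_predict_request(image, host="localhost", port=8501, model_name="malaria_cnn"):
    """Build (but do not send) a TensorFlow Serving REST predict request for one image."""
    # `image` is a 64x64x3 nested list already scaled to [0, 1], like the training data.
    # host, port and model_name are hypothetical deployment values.
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": [image]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_prediction(response_body, threshold=0.5):
    """Turn a {"predictions": [[p]]} reply into a label, using 1 = Parasitized."""
    p = json.loads(response_body)["predictions"][0][0]
    return ("Parasitized" if p >= threshold else "Uninfected"), p


request = build_predict_request([[[0.0, 0.0, 0.0]]])  # tiny placeholder image
print(request.full_url)
print(parse_prediction('{"predictions": [[0.97]]}'))
```

Keeping preprocessing (resizing and dividing by 255) identical on the phone and during training matters more than the transport details.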