Extract Features and Classification of Diseases from X-Ray Images Using Viola-Jones Algorithm

Extracting Features and Classification
of Diseases in X-Ray Images Using
[Viola-Jones Algorithm]
Medical Image Processing
Syed Ebraiz Ali Chishti
P177001@nu.edu.pk

05/23/18 2
Contents
This Presentation Will Cover:
- What is Viola-Jones Algorithm?
- Haar Like Features Selection
- Creating Integral Images
- Adaboost Training
- Cascading Classifier
- Implementation With Python OpenCV?
- Results Statistics of Individual Diseases
- Results Statistics of All Diseases

Viola-Jones Algorithm
The Viola–Jones algorithm, proposed in 2001 by
Paul Viola and Michael Jones, was the first object
detection framework to provide competitive
detection rates in real time.
It is a well-known method for detecting faces, but
it can also be trained to detect various other
types of objects.

Four Stages
1) Haar Feature Selection
2) Creating Integral Image
3) Adaboost Training
4) Cascading Classifier

Haar Feature For Face Detection

Haar Feature For Chest X-Ray
Right LZ Pneumonia Disease

Haar Feature Selection
Haar features are digital image features used in object
recognition.
Each feature yields a single value, calculated by
subtracting the sum of the pixels under the white
rectangle from the sum of the pixels under the black
rectangle.
Value = Σ (black area pixels) - Σ (white area pixels)
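
The formula above can be sketched in a few lines of NumPy. This is only an illustration: the function name `two_rect_feature` and the layout (white rectangle on the left, black rectangle on the right) are assumptions made for the example, not something taken from the slides.

```python
import numpy as np

def two_rect_feature(window, x, y, w, h):
    """Value of a horizontal two-rectangle Haar-like feature.

    Assumed layout: the left half of the (w x h) region is the white
    rectangle and the right half is the black rectangle.
    """
    half = w // 2
    white = int(window[y:y + h, x:x + half].sum())
    black = int(window[y:y + h, x + half:x + 2 * half].sum())
    # Value = sum(black area pixels) - sum(white area pixels)
    return black - white

# Tiny patch: dark on the left (1s), bright on the right (5s).
patch = np.array([[1, 1, 5, 5],
                  [1, 1, 5, 5]], dtype=np.uint8)
value = two_rect_feature(patch, x=0, y=0, w=4, h=2)
print(value)  # black sum 20 minus white sum 4
```

A large absolute value means the two halves differ strongly in brightness, which is the kind of contrast pattern the feature responds to.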

Integral Image Cont.
The integral image is a quick and effective way of
calculating the sum of the pixel values in a given
rectangular subset of a grid. In an integral image, the value at
pixel (x,y) is the sum of all pixels above and to the left of (x,y),
including (x,y) itself.

Integral Image
The integral image allows the sum of all the pixels inside
any given rectangle to be calculated using only the four values
at the corners of the rectangle. It is built in a single pass,
where i(X, Y) is the pixel value at (X, Y):
S(X, Y) = i(X, Y) + S(X-1, Y) + S(X, Y-1) - S(X-1, Y-1)
Example: S(X, Y) = 6 + 8 + 7 - 5 = 16
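
As a rough sketch, the recurrence and the four-corner lookup can be written as follows. The helper names `integral_image` and `rect_sum`, and the extra zero row and column used to avoid boundary checks, are choices made for this example only.

```python
import numpy as np

def integral_image(img):
    """Build S with one extra zero row/column so S[y+1, x+1] covers img[:y+1, :x+1]."""
    h, w = img.shape
    S = np.zeros((h + 1, w + 1), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            # S(X, Y) = i(X, Y) + S(X-1, Y) + S(X, Y-1) - S(X-1, Y-1)
            S[y + 1, x + 1] = int(img[y, x]) + S[y + 1, x] + S[y, x + 1] - S[y, x]
    return S

def rect_sum(S, x, y, w, h):
    """Sum of the w x h rectangle with top-left pixel (x, y), using four corner values."""
    return int(S[y + h, x + w] - S[y, x + w] - S[y + h, x] + S[y, x])

img = np.arange(1, 10).reshape(3, 3)   # pixel values 1..9
S = integral_image(img)
bottom_right_2x2 = rect_sum(S, x=1, y=1, w=2, h=2)   # pixels 5, 6, 8, 9
```

Once `S` is built, every rectangle sum costs the same four lookups regardless of the rectangle size, which is what makes evaluating many Haar features per window affordable.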

Adaboost Training Cont.
Adaboost is a machine learning algorithm which helps in finding
only the best features among the 160,000+ possible features at
the 24x24 base resolution. Each selected feature only needs to
perform better than random guessing, so these features are
also called weak classifiers. Adaboost constructs a strong
classifier as a linear combination of the weak classifiers.
[Figure: Weak Classifier, Strong Classifier]
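
To make the idea of a linear combination of weak classifiers concrete, here is a minimal sketch of boosting with threshold classifiers over precomputed feature values. It is not the OpenCV implementation: the function names, the tiny data set, and the exhaustive threshold search are all assumptions for illustration, and the weight update follows the commonly described Viola-Jones variant of AdaBoost.

```python
import numpy as np

def best_stump(F, y, w):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = (None, None, None, np.inf)
    for j in range(F.shape[1]):
        for theta in np.unique(F[:, j]):
            for p in (1, -1):
                pred = (p * F[:, j] < p * theta).astype(int)
                err = float(np.sum(w * (pred != y)))
                if err < best[3]:
                    best = (j, theta, p, err)
    return best

def adaboost(F, y, rounds):
    """Return a list of weak classifiers (feature, threshold, polarity, alpha)."""
    w = np.full(len(y), 1.0 / len(y))
    stumps = []
    for _ in range(rounds):
        w = w / w.sum()                            # normalise the weights
        j, theta, p, err = best_stump(F, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid division by zero
        beta = err / (1 - err)
        pred = (p * F[:, j] < p * theta).astype(int)
        w = w * np.where(pred == y, beta, 1.0)     # down-weight correct examples
        stumps.append((j, theta, p, float(np.log(1 / beta))))
    return stumps

def strong_classify(stumps, f):
    """Weighted vote of the weak classifiers, thresholded at half the total weight."""
    total = sum(a * float(p * f[j] < p * theta) for j, theta, p, a in stumps)
    return int(total >= 0.5 * sum(a for _, _, _, a in stumps))

# Hypothetical feature values: rows are samples, columns are two Haar features.
F = np.array([[1, 5], [2, 1], [8, 4], [9, 2]])
y = np.array([1, 1, 0, 0])          # 1 = disease, 0 = no disease
stumps = adaboost(F, y, rounds=2)
predictions = [strong_classify(stumps, f) for f in F]
```

In this toy data the first feature alone separates the two classes, so boosting selects it and ignores the second, which mirrors how only a few of the many candidate features end up in the final classifier.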

Adaboost Training
Out of the 160,000+ possible features, only a small set
will be useful for identifying a disease. Adaboost
helps us find the relevant features and discard the
irrelevant ones.
[Figure: All Features, Relevant Feature, Irrelevant Feature]
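
The "160,000+" figure can be checked by brute-force counting. The sketch below assumes the five commonly cited Haar-like shapes (two horizontal/vertical two-rectangle shapes, two three-rectangle shapes, and one four-rectangle shape); the slides do not say which shapes were counted, so treat that set as an assumption.

```python
def count_haar_features(window=24,
                        shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count every position and scale of each base shape inside a square window.

    Each shape is (base_width, base_height) in cells; a feature may be scaled
    to any integer multiple of its base size that still fits in the window.
    """
    total = 0
    for base_w, base_h in shapes:
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                # Number of top-left positions where a w x h feature fits.
                total += (window - w + 1) * (window - h + 1)
    return total

n_features = count_haar_features()
print(n_features)
```

Under these assumptions the count for a 24x24 window is 162,336, consistent with the "160,000+" quoted above.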

Cascading Classifier Cont.
Cascading is a particular case of ensemble
learning based on the concatenation of several
classifiers, using all the information collected from
the output of a given classifier as additional
information for the next classifier in the cascade.
A single strong classifier formed from a linear
combination of all the best features is not good to
evaluate on every window because of its
computation cost.

Cascading Classifier Cont.
Therefore, a cascade classifier is composed
of stages, each containing a strong classifier.
All the features are grouped into several
stages, where each stage has a certain number of
features.
The job of each stage is to determine
whether a given sub-window contains a disease or
not. If it does not, the sub-window is discarded;
otherwise it moves on to the next stage.
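
The stage-by-stage rejection described above can be sketched as a short loop. The stage functions here are hypothetical stand-ins (simple tests on a list of numbers), not trained classifiers; the point is only to show that a rejected sub-window never reaches the later, more expensive stages.

```python
def run_cascade(stages, window):
    """Return (accepted, number_of_stages_evaluated) for one sub-window."""
    for i, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, i      # rejected: later stages are skipped
    return True, len(stages)     # passed every stage

# Hypothetical stages, ordered from cheapest to most selective.
stages = [
    lambda w: sum(w) > 10,           # stage 1: enough overall brightness
    lambda w: max(w) - min(w) > 3,   # stage 2: enough contrast
    lambda w: w[0] < w[-1],          # stage 3: brighter towards the end
]

rejected_early = run_cascade(stages, [1, 1, 1])   # fails stage 1
rejected_mid = run_cascade(stages, [5, 5, 5])     # fails stage 2
accepted = run_cascade(stages, [2, 5, 9])         # passes all stages
```

Because most sub-windows in an image contain no object, most of them exit after one or two stages, which is where the cascade saves its computation.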

Cascading Classifier

Implementation With Python
(OpenCV)

Getting Started? Cont.
1) Collect "negative" or "background" images.
Any image will do, just make sure your object is not present in it.
2) Collect or create "positive" images.
Thousands of images of your object.
→ The Viola-Jones algorithm is very slow to train: with only 2000 images over 10-12 stages,
training takes at least 2-4 hours. That is why I trained the cascade
with only 320 negative and 320 positive images for each individual disease.
3) Try to use small images (e.g. 100x100).
4) Have the same number of positive and negative images for
training.

Getting Started?
5) Negative and positive images need description files:
for negatives it should be bg.txt, and for positives it should
be info, pos.txt or info.dat.
6) Create a positive vector file by stitching together all positive
images. This is done with an OpenCV command.
7) The number of images given for training should be less than the number
used while creating samples: if 110 samples were created, give 100 for training.
8) Train the cascade.
This is done with an OpenCV command.

OpenCV Commands For Cascade Training?
1) Create Description Files: python pos-neg-description-file.py
2) Create Positive Samples: opencv_createsamples -info info.dat -num 110 -w 24 -h 24 -vec positives.vec
3) Train Cascade: opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 100 -numNeg 100 -numStages 10 -w 24 -h 24
4) Detector: python detector.py

Code For Creating Description File
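import os

def create_pos_n_neg():
    # Write one description line per image:
    # positives go to info.dat, negatives go to bg.txt.
    for file_type in ['neg', 'pos']:
        for img in os.listdir(file_type):
            if file_type == 'pos':
                # Format: path, object count, then x y width height.
                line = file_type + '/' + img + ' 1 0 0 100 100\n'
                with open('info.dat', 'a') as f:
                    f.write(line)
            elif file_type == 'neg':
                line = file_type + '/' + img + '\n'
                with open('bg.txt', 'a') as f:
                    f.write(line)

create_pos_n_neg()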

Code For Detector (detector.py)
import cv2
import glob
from skimage.io import imread

# Load the trained cascade produced by opencv_traincascade.
diseases_cascade = cv2.CascadeClassifier('cascade.xml')

for file in glob.iglob("xyz.jpg"):
    img = gray = imread(file)
    # Positional arguments here are scaleFactor=30 and minNeighbors=30.
    diseases = diseases_cascade.detectMultiScale(gray, 30, 30)
    for (x, y, w, h) in diseases:
        # Label each detection at its bottom-right corner.
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.putText(img, 'Disease', (x+w, y+h), font, 0.5, (255,255,0), 2, cv2.LINE_AA)
    cv2.imshow('img', img)
    # Show each image for 500 ms; pressing Esc stops early.
    k = cv2.waitKey(500) & 0xff
    if k == 27:
        break

cv2.destroyAllWindows()

Background Description File (bg.txt)
neg/00000201_000.jpg
neg/00000218_008.jpg
neg/00000172_002.jpg
neg/00000141_000.jpg
neg/00000193_010.jpg
neg/00000167_000.jpg
neg/00000197_003.jpg
neg/00000239_000.jpg
neg/00000250_002.jpg
neg/00000170_000.jpg

Positives Description File (info.dat)
pos/00000042_006.jpg 1 0 0 100 100
pos/00000502_008.jpg 1 0 0 100 100
pos/00000877_014.jpg 1 0 0 100 100
pos/00000215_000.jpg 1 0 0 100 100
pos/00000499_006.jpg 1 0 0 100 100
pos/00000487_000.jpg 1 0 0 100 100
pos/00000444_000.jpg 1 0 0 100 100
pos/00000627_034.jpg 1 0 0 100 100
pos/00000181_035.jpg 1 0 0 100 100
pos/00000348_008.jpg 1 0 0 100 100
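
Each line of info.dat follows the OpenCV annotation layout: image path, number of objects, then x, y, width and height for each object. As a small sanity check before running opencv_createsamples, a line can be parsed like this (the helper name `parse_info_line` is made up for this sketch):

```python
def parse_info_line(line):
    """Split one info.dat line into (image_path, [(x, y, w, h), ...])."""
    parts = line.split()
    path, count = parts[0], int(parts[1])
    numbers = [int(v) for v in parts[2:]]
    if len(numbers) != 4 * count:
        raise ValueError(f"expected {count} boxes, got {len(numbers)} numbers")
    boxes = [tuple(numbers[i:i + 4]) for i in range(0, len(numbers), 4)]
    return path, boxes

path, boxes = parse_info_line("pos/00000042_006.jpg 1 0 0 100 100")
```

Here every positive image declares a single object covering the whole 100x100 image, which matches the lines written by the description-file script.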

Results Statistics of Accuracy
Disease              With (Neg+Pos)   With Positives
Hernia               71.1%            32.1%
Pneumonia            60.7%            56.5%
Atelectasis          57.9%            51.5%
Cardiomegaly         70.1%            69.8%
Effusion             61.6%            50.3%

Results Statistics of Accuracy
Disease              With (Neg+Pos)   With Positives
Emphysema            56.5%            45.0%
Infiltration         59.4%            49.3%
Mass                 55.7%            43.1%
Nodule               55.7%            38.8%
Pleural Thickening   55.5%            48.4%
Pneumothorax         62.1%            58.3%

Overall Accuracy Statistics
Metric               Value    Type
Total Images         6660     Input
With Diseases        3330     Input
Without Diseases     3330     Input
Neg True             1075     Wrong
Neg False            2255     Correct
Pos True             1229     Correct
Pos False            2101     Wrong
Total True Images    3484     Output
Total False Images   3176     Output
Total Accuracy       52.31%   Output
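
The totals in this table follow directly from the four counts, as the short check below shows. The reading of the labels is an assumption: "Neg True" is taken to mean disease-free images flagged as diseased, and "Pos False" diseased images that were missed. The sensitivity and specificity lines are not in the slides; they are simply derived from the same counts.

```python
# Counts copied from the table above.
neg_true = 1075    # without disease, flagged as diseased (wrong)
neg_false = 2255   # without disease, not flagged (correct)
pos_true = 1229    # with disease, flagged (correct)
pos_false = 2101   # with disease, missed (wrong)

total = neg_true + neg_false + pos_true + pos_false
correct = neg_false + pos_true
wrong = neg_true + pos_false
accuracy = 100.0 * correct / total

# Not in the slides: per-class rates derived from the same counts.
sensitivity = 100.0 * pos_true / (pos_true + pos_false)
specificity = 100.0 * neg_false / (neg_false + neg_true)

print(f"accuracy={accuracy:.2f}%")
```

Looking at the two per-class rates separately shows whether the errors come mainly from missed diseases or from false alarms, which a single accuracy figure hides.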

References
Pythonprogramming.net (2018). Python Programming Tutorials.
[online] Available at: https://pythonprogramming.net/haar-cascade-object-detection-python-opencv-tutorial/ [Accessed 5 May 2018]
Docs.opencv.org (2018). Cascade Classifier Training - OpenCV
2.4.13.0 documentation. [online] Available at: https://docs.opencv.org/2.4.13/doc/user_guide/ug_traincascade.html/ [Accessed 5 May 2018]
Youtube.com (2018). VIOLA JONES FACE DETECTION
EXPLAINED. [online] Available at: https://www.youtube.com/watch?v=_QZLbR67fUU/ [Accessed 22 May 2018]

Summary
We have briefly explained the concept of the
Viola-Jones algorithm and its four major stages:
Haar-like features, creating integral images,
Adaboost training and the cascade classifier.
We have also implemented this algorithm to extract
features and classify 14 different diseases from
X-ray images using Python (OpenCV).
