Protein crystallization image analysis ICCBM-2013
1. Madhav Sigdel
Computer Science PhD Student
University of Alabama in Huntsville
14th International Conference on the
Crystallization of Biological Macromolecules
9/27/2012
4. Protein Crystallization Phases (Hampton Research)
1. Clear Drop
2. Phase Separation
3. Regular Granular Precipitate
4. Birefringent Precipitate or Microcrystals
5. Posettes and Spherulites
6. Needle Crystals (1D Growth)
7. Plate Crystals (2D Growth)
8. Single Crystals (3D Growth < 0.2 mm)
9. Single Crystals (3D Growth > 0.2 mm)
5. General Approach
Apply image processing techniques to extract features
Apply data mining techniques for classification
Image processing
Region of Interest (drop boundary) detection
Implementation of complex algorithms for edge detection
Hough transform
Canny edge detection
Geometric and texture features
Distributed computing to speed up the process
Feature extraction computationally expensive
6. Related Works
By number of categories
Binary classification - [Xioqung 2004], [Takahashi 2005], [Ming 2008], [Roy Liu 2008]
Distinguishes between crystal and non-crystal classes only
Multiclass classification – [Kanako Saitoh 2006], [Christian A 2010]
Reported accuracy is very low for some classes
A variety of classification methods has been applied
7. Our Approach
Low cost/in-house assembled system for image acquisition
Trace fluorescent labeling of protein
Use of intensity and simple geometric features for image processing
Classification into 3 categories
Non-crystals
Likely leads
Crystals
13. Image Preprocessing
Image size reduction
Median filter
Thresholding techniques
Otsu threshold – select the threshold intensity that maximizes inter-class variance and minimizes intra-class variance
Dynamic thresholding I – select the 90th percentile intensity of the green component as the threshold
Dynamic thresholding II – select the maximum intensity of the green component as the threshold
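
The three thresholding rules above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the green component is given as a flat list of 8-bit integers, uses a nearest-rank percentile, and treats a pixel as bright when its intensity is greater than or equal to the threshold. The function names are hypothetical.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return t maximizing between-class variance.

    Class 0 holds intensities < t, class 1 holds intensities >= t.
    Maximizing between-class variance is equivalent to minimizing
    within-class variance, since their sum is the fixed total variance.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count in class 0
    sum0 = 0  # intensity sum in class 0
    for t in range(1, levels):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def percentile_threshold(pixels: Sequence[int], q: int = 90) -> int:
    """Dynamic thresholding I: nearest-rank q-th percentile (assumed definition)."""
    ordered = sorted(pixels)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceil(q * n / 100)
    return ordered[rank - 1]


def max_threshold(pixels: Sequence[int]) -> int:
    """Dynamic thresholding II: the maximum intensity."""
    return max(pixels)


def binarize(pixels: Sequence[int], tau: int) -> List[int]:
    """Mark pixels with intensity >= tau as bright (1), others as dark (0)."""
    return [1 if p >= tau else 0 for p in pixels]
```

With the maximum-intensity rule only the brightest pixels survive binarization, which is why the comparison here is `>=` rather than `>`.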
16. Intensity Features
Image 1: Original image resized (Img1)
Image 2: Thresholded image (Img2)
Image 3: Img1 AND Img2
Image 4: Img1 AND (complement of Img2), the background region in the original image
17. Intensity Features
Threshold intensity (τ)
Bright pixel count (n)
Average intensity in bright region (µf)
Standard deviation of intensity in bright region (σf)
Average intensity in dark region (µb)
Standard deviation of intensity in dark region (σb)
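
As a rough sketch of how the six intensity features listed above could be computed, the following assumes a flat list of grayscale intensities and a threshold τ already chosen by one of the thresholding methods. The function name, the `>=` comparison and the use of the population standard deviation are assumptions made for illustration; the slides do not specify them.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def intensity_features(gray: Sequence[int], tau: int) -> Dict[str, float]:
    """Compute tau, n, mu_f, sigma_f, mu_b, sigma_b for one image."""
    # Bright (foreground) region: Img1 AND Img2.
    bright = [g for g in gray if g >= tau]
    # Dark (background) region: Img1 AND complement of Img2.
    dark = [g for g in gray if g < tau]
    return {
        "tau": tau,
        "n": len(bright),
        "mu_f": mean(bright) if bright else 0.0,
        "sigma_f": pstdev(bright) if bright else 0.0,
        "mu_b": mean(dark) if dark else 0.0,
        "sigma_b": pstdev(dark) if dark else 0.0,
    }


features = intensity_features([10, 20, 200, 220], tau=100)
```

Empty regions are given zero mean and zero deviation here so that every image yields a feature vector of the same length.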
18. Region/Blob Features
Image 1: Original image
Image 2: Binary(Image 1)
Image 3: Skeleton(Image 2)
Image 4: Connected regions shown in different colors
Extracted blobs: largest blob (R1), R2, R3, R4
19. Region/Blob Features
Number of blobs
Let R1 denote the largest blob
Area(R1)
Boundary pixel count in R1
Fullness – number of white pixels in R1 / Area(R1)
Measure of boundary smoothness of R1
Variance of boundary smoothness of R1
Measure of symmetry of R1 along X and Y-axis
Let R2, R3, R4 and R5 be the 4 largest blobs excluding R1
Average area
Average fullness
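
Some of the blob features above can be illustrated with a small connected-component sketch. Several details are assumptions, since the slides do not define them: blobs are found with 8-connectivity, Area(R1) is taken to be the bounding-box area (so that fullness is at most 1), and a boundary pixel is a blob pixel with at least one 4-neighbour outside the blob. Smoothness and symmetry are omitted because their definitions are not given. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]


def label_blobs(binary: Sequence[Sequence[int]]) -> List[List[Pixel]]:
    """Return connected regions of nonzero pixels, largest first."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            blob: List[Pixel] = []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    blobs.sort(key=len, reverse=True)
    return blobs


def blob_features(blob: Sequence[Pixel]) -> Dict[str, float]:
    """White pixel count, bounding-box area, fullness, boundary pixel count."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    bbox_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    members = set(blob)
    four_neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    boundary = sum(
        1 for (y, x) in blob
        if any((y + dy, x + dx) not in members for dy, dx in four_neighbours)
    )
    return {
        "white_pixels": len(blob),
        "bbox_area": bbox_area,
        "fullness": len(blob) / bbox_area,
        "boundary_pixels": boundary,
    }


binary = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
blobs = label_blobs(binary)
r1 = blob_features(blobs[0])  # features of the largest blob
```

Averages over R2 to R5 follow by applying `blob_features` to `blobs[1:5]` and averaging the relevant entries.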
20. Dataset
Category        No. of images   Percentage
Non-crystals    1514            67.3%
Likely leads    404             18.0%
Crystals        332             14.8%
Total images    2250
22. Other Classification Techniques
Max class ensemble method
Uses multiple classifiers with different feature combinations
Assigned class is the maximum predicted class over all the classifiers
Decreases false negatives but increases false positives
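
One way to read the max class rule is sketched below. It assumes the three categories are ordered from non-crystal to crystal and that "maximum predicted class" means the highest-ranked class predicted by any classifier, which is an interpretation rather than something the slides state. The class labels and function name are illustrative.

```python
from typing import Sequence

# Assumed ordering, lowest to highest.
CLASSES = ("non-crystal", "likely lead", "crystal")


def max_class(predictions: Sequence[str]) -> str:
    """Return the highest-ranked class among the classifiers' predictions."""
    return max(predictions, key=CLASSES.index)


# Three classifiers trained on different feature combinations disagree:
assigned = max_class(["non-crystal", "likely lead", "non-crystal"])
```

Under this reading a single classifier voting for a higher class is enough to assign it, which is consistent with fewer missed crystals at the cost of more false alarms.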
Exhaustive binary classifiers
Solves the multiclass problem using all possible binary classifiers
For 3 classes – number of binary classifiers = 6
Overall accuracy around 82%
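
The slides do not say how the six binary classifiers are enumerated. One enumeration that reproduces the count for three classes, offered here only as an assumption, is the three one-vs-one pairs plus the three one-vs-rest splits. The function name and class labels are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Task = Tuple[Tuple[str, ...], Tuple[str, ...]]


def binary_tasks(classes: Sequence[str]) -> List[Task]:
    """List (positive classes, negative classes) for each binary problem."""
    # One-vs-one: every unordered pair of classes.
    one_vs_one = [((a,), (b,)) for a, b in combinations(classes, 2)]
    # One-vs-rest: each class against all the others.
    one_vs_rest = [
        ((c,), tuple(o for o in classes if o != c)) for c in classes
    ]
    return one_vs_one + one_vs_rest


tasks = binary_tasks(["non-crystal", "likely lead", "crystal"])
print(len(tasks))  # 6
```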
23. Future Work
Classify the crystals according to crystal morphology
Track temporal evolution of the crystals
Extract other relevant image features and improve accuracy
24. Summary
Intensity is shown to be a simple but useful search parameter to identify crystals
Efficient image processing (3 sec/image)
Classification into 3 categories – non-crystals, likely leads and crystals
Comparable accuracy with other systems