Good afternoon. My name is Robin Langerak and I am a postdoctoral researcher at the Image Sciences Institute at the University Medical Center Utrecht. My talk today is titled “Atlas-based segmentation using iterative atlas selection”, but that is quite a mouthful, so I could also have titled it “How to select patients for virtual organ donation”. During my presentation I hope to explain what this means.
Cancer patients are often treated with radiotherapy. Modern treatment planning systems can deliver radiation very precisely, but they require experts to delineate the target tissue. This delineation determines which tissue receives a certain dose and which does not. The problem with this is twofold: first, it requires a lot of time, and second, it is very subjective. Different human experts may create different delineations, depending on factors such as proficiency and available time. As a result, the quality of a delineation cannot be determined objectively. For these reasons we would like to automate delineation. During this talk I will use delineation of the prostate as an example, because that is what I work on myself, but the method is usable for any application, because it makes no assumptions about the specific shape of the tissue to be delineated.
One method to automate delineation is atlas-based segmentation. The idea behind this method is that past patients ‘donate’ their delineations. Each delineation of a past patient can be translated to a delineation of a new patient. By combining a large number of ‘donor’ delineations, we can quite accurately construct a delineation for a new patient. Atlas-based segmentation is reasonably fast and eliminates the human factor without eliminating the human factor, because although the delineation is constructed automatically, the ‘donor’ patients have been delineated manually. Now let me try to display this graphically.
Atlas-based segmentation starts with a registration step, in which the ‘donor’ image is registered to an image taken from the new patient. I will not go into the registration part here. What is important here is that it results in a deformation field T that represents the deformation that is needed to register the donor image to the target image.
This same deformation field can then be applied to the ‘donor’ delineations in a step called ‘label propagation’. In practice, delineations are called ‘labels’, because they label a target area in an image. This results in a number of ‘donor’ delineations…
… which must then be combined to a single delineation which, we hope, is a reasonable approximation of the delineation of the new patient. Recent studies have shown that this delineation is almost as good as that of a human expert. We’re not there yet, and that is a good thing because there would not be scientists without something to investigate.
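As a rough illustration of the label propagation step, the following Python sketch warps a small 2-D binary label with a displacement field. It is not the implementation used in this work: the function name `propagate_label`, the per-pixel `(dr, dc)` displacement convention, and the nearest-neighbour lookup are assumptions made for the example, whereas the work itself relies on B-spline deformations computed by Elastix and DROP.

```python
def propagate_label(donor_label, displacement):
    """Warp a 2-D binary donor label onto the target grid.

    donor_label:  list of rows containing 0/1 values.
    displacement: for every target pixel (r, c), a (dr, dc) offset that
                  points to the donor pixel to read from. This convention
                  is an assumption chosen for the sketch.
    """
    rows, cols = len(donor_label), len(donor_label[0])
    warped = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dr, dc = displacement[r][c]
            # Nearest-neighbour lookup keeps the propagated label binary.
            src_r, src_c = round(r + dr), round(c + dc)
            # Pixels that map outside the donor image stay background.
            if 0 <= src_r < rows and 0 <= src_c < cols:
                warped[r][c] = donor_label[src_r][src_c]
    return warped


# Toy example: every target pixel reads from its left-hand neighbour,
# so the donor label shifts one column to the right.
donor = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
shift_right = [[(0, -1)] * 3 for _ in range(3)]
propagated = propagate_label(donor, shift_right)
```

Nearest-neighbour sampling is used here because interpolating a binary label linearly would produce fractional values that then need thresholding.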
Combining delineations is not a straightforward process. The simplest method is a majority vote rule, in which a voxel is included in the target delineation if most labels agree that it should be. But of course not every patient is a viable donor. For example, if a patient has a very large prostate, then we prefer not to use the delineation of this patient’s prostate for the atlas-based segmentation of someone with a small prostate. Or, if we were to apply this to the brain, there would be no point in using the delineation of a four-year-old to delineate the brain of an adult. By first discarding these ‘unfit’ delineations, we hope to improve the result of the atlas-based segmentation. Therefore we would like to select the ‘best’ labels. Unfortunately, the performance of individual labels is unknown, which makes the selection rather difficult.
Existing methods use the image similarity between the deformed donor image and the image of the target patient to select delineations before combining them into a single delineation. However, the correlation between image similarity and label performance is significant, but small. Therefore, the selection of images is far from perfect and, to make matters worse, we do not know exactly how many labels to select. Existing research indicates that around 20 images is optimal, but this can vary greatly per image.
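The majority vote rule mentioned above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each propagated label is represented as a flat list of 0/1 voxel values of equal length, and the function name `majority_vote` is made up for the example.

```python
def majority_vote(labels):
    """Fuse binary labels voxel by voxel.

    labels: list of propagated labels, each a flat list of 0/1 values.
    A voxel becomes foreground only if more than half of the labels
    mark it as foreground.
    """
    n = len(labels)
    # zip(*labels) yields, for every voxel, the votes of all labels.
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*labels)]


# Three toy 'donor' labels over four voxels.
fused = majority_vote([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
])
```

With an even number of labels, a tie counts as background in this sketch; other tie-breaking rules are equally possible.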
Therefore we need an alternative label selection method. Let’s forget about using image similarity and instead look at label similarity.
The idea is very simple. At first we combine all labels, without performing any selection. This gives us a result that is a reasonable estimate of the delineation of the target image. Then we use the similarity between each ‘donor’ delineation and the result of the initial label combination to estimate that delineation’s performance, the reasoning being that if this result is already very similar to the target delineation, it can be used as a substitute for the unknown ideal delineation. On the basis of this estimate, we can make a selection and combine only the selected delineations, leading to a better result for the atlas-based segmentation. But the story does not end here. If the result is improved after a selection step, then this enables us to better estimate the performance of labels. As a result, we can propose an iterative scheme, in which an improved label selection leads to a more accurate label performance estimation, which in turn leads to a better selection. As an added benefit, the number of selected labels is determined automatically and can differ greatly between cases. In our case, it varied from 17 to 52 images, compared to the number of 20 used in existing literature.
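The iterative scheme can be sketched as follows. This is a minimal illustration, not the method exactly as used in this work: the talk does not state which label similarity measure or selection rule is applied, so the Dice overlap, the `keep_ratio` threshold, the iteration limit, and all function names here are assumptions. Labels are again flat lists of 0/1 values.

```python
def dice(a, b):
    """Dice overlap between two binary labels (assumed similarity measure)."""
    overlap = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * overlap / total if total else 1.0


def majority_vote(labels):
    """A voxel is foreground if more than half of the labels agree."""
    n = len(labels)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*labels)]


def iterative_selection(labels, keep_ratio=0.75, max_iter=20):
    """Alternate between fusing labels and re-selecting them.

    keep_ratio is a hypothetical rule: a label stays selected if its Dice
    overlap with the current fused estimate is at least keep_ratio times
    the best overlap among the selected labels.
    """
    selected = list(range(len(labels)))
    for _ in range(max_iter):
        # Fuse the current selection into the best estimate so far.
        estimate = majority_vote([labels[i] for i in selected])
        # Estimate each label's performance against that estimate.
        scores = {i: dice(labels[i], estimate) for i in selected}
        best = max(scores.values())
        kept = [i for i in selected if scores[i] >= keep_ratio * best]
        if kept == selected:  # selection is stable, so stop iterating
            break
        selected = kept
    return majority_vote([labels[i] for i in selected]), selected


# Toy example: three broadly consistent labels and one outlier.
toy_labels = [
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
]
fused, selected = iterative_selection(toy_labels)
```

Note that the number of selected labels is never passed in: it falls out of the threshold and the data, which mirrors the property described above that the selection size is determined automatically.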
If we look at some results, the improvement becomes obvious. We used a dataset of 100 images of the prostate and performed leave-one-out experiments on all of these images. That means that we performed almost 10,000 registrations (100 × 99 = 9,900), and for each of the hundred images in our dataset we had to combine ninety-nine labels into a single label. Here you see the results: I compare our new method, on the left, to selection using image similarity, in the middle, and the selection that we would be able to make if the label performance were known, on the right. The latter is a theoretical exercise, which indicates the maximum achievable result. To better validate our method and to ensure that it is not specific to one registration procedure, we used two different registration methods: Elastix, which uses a gradient descent-based method, and DROP, which uses Markov Random Fields. Both use B-spline-based deformation. The top row shows the average number of labels selected, and you can see that in our application more labels were selected than the commonly used number. The second row shows how accurate the combined delineation is. You can see that our method is much better than using image similarity, and that our results approach the best achievable result. In the bottom row you can see that the label performance estimation, the improvement of which is the main target of our method, is much better than it is when using image similarity and, in fact, better than any other method for label performance estimation.
Then, finally, some comparison to the performance of human experts. This is perhaps even more important than the previous slide, because after all our goal is to automate a human task. You can see that the result of our method is still worse than that of a human expert on average, but the distributions partly overlap and in a significant number of cases our automatic delineation cannot be distinguished from that of a human expert.
Then, the conclusions: the main contribution of our work is that label selection using an iterative scheme that compares the individual labels to the best estimate so far outperforms the current methods that use image similarity. As an added benefit, the number of labels to be selected is determined automatically. In addition, the results of our method are almost as good as those of human experts, and finally, we concluded that there is very little room for further improvement in the selection of atlases: further improvement will have to come from better registration techniques. The only downside of our method is that it requires all images to be registered before selection can take place. Of course this is time-consuming, and if a label is not used in the end, then we would rather not bother to register it in the first place. For this reason, future work will target the selection of labels prior to the registration phase.
That brings me to the end of my presentation. I would like to thank the Dutch Cancer Society for funding and mention that we made use of the ITK library and the registration packages Elastix and DROP.
Atlas-based segmentation using iterative atlas selection, or: How to select patients for virtual organ donation
Robin Langerak (1), Alexis Kotte (2), Uulke van der Heide (2), Josien Pluim (1)
(1) Image Sciences Institute, (2) Department of Radiotherapy, University Medical Center Utrecht
September 10, NFBI Symposium
Introduction <ul><li>Radiotherapy treatment requires the delineation of target organs in medical images </li></ul><ul><li>Delineation is an expert task, that is time-consuming and subjective. </li></ul><ul><li>How to automate delineation? </li></ul>
Atlas-based segmentation <ul><li>Atlas-based segmentation is a method that derives a delineation from similar delineations of past patients. </li></ul><ul><li>Atlas-based segmentation is fast and eliminates the human factor without eliminating the human factor. </li></ul>
What is atlas-based segmentation? [Diagram, built up over three slides: (1) Registration: the atlas images I_1 … I_n are registered to the target image I_t, giving the deformations T_1 … T_n. (2) Propagation: T_1 … T_n are applied to the atlas labels L_1 … L_n. (3) Label fusion: the propagated labels are combined into L_t, an approximation of the unknown target delineation.]
Label fusion: selecting the best labels <ul><li>Labels can be fused by a simple majority vote rule </li></ul><ul><li>The result of atlas-based segmentation can be improved by selecting only the ‘best’ labels </li></ul><ul><li>Problem: the performance of individual labels is unknown. </li></ul>
A smart selection of labels [Diagram, built up over three slides: image similarity compares the deformed atlas images I_1’ … I_n’ with the target image I_t; label similarity compares the propagated labels L_1 … L_n with the fused label L_t; repeated label fusion leads to increased performance estimation.]
Results
<table>
<tr><th></th><th>Iterative label selection</th><th>Image similarity</th><th>Known label performance</th></tr>
<tr><td>Number of labels selected</td><td>32 ± 8</td><td>20</td><td>20</td></tr>
<tr><td>Similarity with ground truth segmentation</td><td>0.85 ± 0.06</td><td>0.77 ± 0.09</td><td>0.87 ± 0.05</td></tr>
<tr><td>Correlation with ground truth label performance</td><td>0.97 ± 0.01</td><td>0.57 ± 0.12</td><td>1</td></tr>
</table>
Results
<table>
<tr><th></th><th>Atlas-based segmentation</th><th>Human experts</th></tr>
<tr><td>Accuracy compared to gold standard</td><td>0.84 ± 0.07</td><td>0.88 ± 0.06</td></tr>
</table>
Conclusions <ul><li>Atlas-based segmentation with iterative label selection outperforms selection based on image similarity. </li></ul><ul><li>The results are almost as good as those of human experts. </li></ul><ul><li>There is very little room for improvement of atlas selection. </li></ul><ul><li>Future work will be on the selection of labels prior to registration. </li></ul>
Acknowledgements <ul><li>This work was supported by the Dutch Cancer Society. </li></ul><ul><li>Implementations of B-spline deformation in the ITK library were used. Registration was done using Elastix and DROP. </li></ul><ul><li>Thank you for your attention! </li></ul><ul><li>Questions? </li></ul>