(Sunday, 7/14/2019) 4:00 PM - 5:00 PM
Room: 225BCD
Purpose: To simulate realistic manual delineation (MD) organ-at-risk (OAR) delineation variability (DV) for the purpose of quantifying DV’s dosimetric impact.
Methods: Fourteen independent MD head-and-neck OAR structure sets (SS) were obtained from the ESTRO Falcon group. Seven OARs were available (BrainStem, Esophagus, OralCavity, Parotid_L, Parotid_R, SpinalCord, and Thyroid). A consensus MD SS was generated by the simultaneous truth and performance level estimation (STAPLE) method. MD DV was evaluated with respect to the STAPLE SS using the Dice coefficient and Hausdorff distance (HD) geometric similarity metrics. DVs were simulated using three auto-delineation (AD) methods: an average surface of standard deviation (ASSD) method, GrowCut segmentation, and random walker (RW) segmentation. Each OAR AD was repeated five times with a different seed or variability level. Dice and HD were computed for each OAR AD with respect to the STAPLE SS. Dosimetric analysis was performed by comparing dose-volume histograms (DVHs) from a plan developed with a reference MD SS against the DVHs for each MD and AD. DVH confidence bands are reported for MD and each AD method.
Results: The MD Dice was 0.7±0.2 (μ±σ). AD Dice values (ASSD, GrowCut, and RW) were 0.5±0.2, 0.7±0.2, and 0.8±0.1, respectively. HDs (MD, ASSD, GrowCut, and RW) were 35.4±45.2, 27.3±19.1, 29.3±19.9, and 14.6±10.3, respectively. The simulated DV increased with increasing seed standard deviation or variability level. The dosimetric effect was largest for MD DVs (larger OAR DVH confidence intervals and larger HD), even though the MD Dice was greater than or equal to the ASSD and GrowCut Dice values. GrowCut DV resulted in less dosimetric variation than RW, in contrast to what the geometric indices suggest.
Conclusion: We developed a framework to simulate DVs and demonstrated its feasibility. ADs were able to simulate different magnitudes of DVs, but did not replicate the dosimetric consequences of human delineation variability. The correlation between geometric similarity metrics and dosimetric consequences of DV is poor.
3. • I would like to thank the ESTRO Falcon project team, Scott Kaylor of EduCase, and Benjamin Nelms of ProKnow for the multi-delineator contour data presented in this work
• This work was supported by NIH R01CA222216
ACKNOWLEDGEMENTS
4. To simulate realistic organ-at-risk (OAR) delineation variability (DV) in head and neck
cancer (HNC) radiation therapy
PURPOSE
Figures: OAR occupancy probability colorwash; DVHs from various alternative OARs
5. • ESTRO Falcon contour workshop (EduCase)
– One HNC case (larynx), 70 Gy in 35 fractions
– 14 independent manually delineated (MD) OAR structure sets (SS)
– BrainStem, Esophagus, OralCavity, Parotid_L, Parotid_R, SpinalCord, and Thyroid
• Consensus MD SS
– Generated by simultaneous truth and performance level estimation (STAPLE)
DATASET
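As a rough illustration of how a STAPLE-style consensus can be formed, the sketch below runs the standard expectation-maximization updates for binary masks with a single global foreground prior. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function name `staple_binary`, the initial sensitivity and specificity of 0.9, and the toy 1D data are all hypothetical.

```python
import numpy as np

def staple_binary(decisions, n_iter=50, tol=1e-6):
    """Binary STAPLE-style EM (illustrative sketch, global prior only).

    decisions: array of shape (raters, voxels) with 0/1 entries.
    Returns per-voxel foreground probability and per-rater
    sensitivity p and specificity q.
    """
    D = np.asarray(decisions, dtype=float)
    prior = D.mean()                      # assumed global foreground prior
    p = np.full(D.shape[0], 0.9)          # assumed initial sensitivities
    q = np.full(D.shape[0], 0.9)          # assumed initial specificities
    eps = 1e-6                            # keeps p and q away from 0 and 1
    w = np.full(D.shape[1], prior)
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is foreground.
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 1, 1 - q[:, None], q[:, None]), axis=0)
        w_new = a / (a + b)
        # M-step: re-estimate each rater's performance parameters.
        p = np.clip((D @ w_new) / w_new.sum(), eps, 1 - eps)
        q = np.clip(((1 - D) @ (1 - w_new)) / (1 - w_new).sum(), eps, 1 - eps)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w, p, q

# Toy 1D example: two delineators agree, a third over-contours.
truth = np.zeros(40, dtype=int)
truth[10:30] = 1
over = np.zeros(40, dtype=int)
over[5:35] = 1
w, p, q = staple_binary(np.stack([truth, truth, over]))
consensus = w >= 0.5
```

In this toy case the over-contouring delineator is assigned a lower estimated specificity than the other two, so its extra voxels are left out of the consensus.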
6. • DV Simulation using auto-delineation (AD) methods
– Average surface of standard deviation (ASSD): random perturbation, σ=2, 5, 10 mm
– GrowCut: cellular automata region growing
– Random walker (RW): probabilistic segmentation
METHODS
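To make the GrowCut idea concrete, the following is a minimal 2D sketch of the cellular-automaton update: each labeled cell attacks its four neighbors, and an attack succeeds when its strength, attenuated by the intensity difference, exceeds the defender's strength. It is only an illustration under stated assumptions. The helper names (`_shift`, `growcut`), the 4-neighborhood, the linear attenuation function, and the synthetic image are hypothetical and are not taken from the implementation used in this work.

```python
import numpy as np

def _shift(arr, dy, dx, fill):
    """Return out with out[i, j] = arr[i + dy, j + dx]; fill outside the grid."""
    h, w = arr.shape
    padded = np.pad(arr, 1, mode="constant", constant_values=fill)
    return padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

def growcut(image, seeds, max_iter=500):
    """GrowCut-style label propagation (illustrative sketch).

    seeds: integer array, 0 = unlabeled, >0 = seed label.
    """
    img = np.asarray(image, dtype=float)
    labels = np.array(seeds, dtype=int)
    strength = (labels > 0).astype(float)   # seeds start at full strength
    span = img.max() - img.min()
    span = span if span > 0 else 1.0
    for _ in range(max_iter):
        new_labels, new_strength = labels.copy(), strength.copy()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nb_img = _shift(img, dy, dx, 0.0)
            nb_lab = _shift(labels, dy, dx, 0)
            nb_str = _shift(strength, dy, dx, 0.0)
            # Attack strength decreases linearly with intensity difference.
            attack = (1.0 - np.abs(img - nb_img) / span) * nb_str
            wins = attack > new_strength
            new_labels[wins] = nb_lab[wins]
            new_strength[wins] = attack[wins]
        if np.array_equal(new_labels, labels) and np.array_equal(new_strength, strength):
            break                            # automaton has converged
        labels, strength = new_labels, new_strength
    return labels

# Synthetic "organ": a bright square on a dark background.
image = np.zeros((20, 20))
image[6:14, 6:14] = 1.0
seeds = np.zeros((20, 20), dtype=int)
seeds[10, 10] = 1    # one seed inside the organ
seeds[1, 1] = 2      # one seed in the background
labels = growcut(image, seeds)
organ = labels == 1
```

On a noise-free image like this one the result does not depend on where the seeds fall; on real images with gradual intensity transitions, repeating the run with different seed placements is what would be expected to produce different delineations.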
Seed point generation (figure): σseed = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
7. • Geometric analysis
– Similarity: Dice coefficient (Volumetric, Surface)
– Distance: Hausdorff distance (HD), Actual Average Surface Distance (AASD)
– Reference: STAPLE SS
• Dosimetric analysis
– Single dose distribution planned from a human SS
– DVH confidence bands (90th percentile)
– Dmean, Dmax, Dmin, D50
METHODS
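The geometric metrics listed above can be sketched for binary masks as follows. This is a minimal illustration, assuming masks on a common grid: the function names and the toy masks are hypothetical, and the Hausdorff distance here is a brute-force computation over all voxels of each mask rather than over extracted surfaces, which is only practical for small examples.

```python
import numpy as np

def dice(a, b):
    """Volumetric Dice coefficient of two boolean masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0   # convention assumed here: two empty masks agree fully
    return 2.0 * np.logical_and(a, b).sum() / total

def hausdorff(a, b, spacing=(1.0, 1.0)):
    """Symmetric Hausdorff distance between the voxel sets of two masks."""
    pa = np.argwhere(a) * np.asarray(spacing)
    pb = np.argwhere(b) * np.asarray(spacing)
    # Pairwise distances between every voxel of a and every voxel of b.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Toy example: a reference square and the same square shifted by 2 voxels.
reference = np.zeros((20, 20), dtype=bool)
reference[5:15, 5:15] = True
shifted = np.zeros((20, 20), dtype=bool)
shifted[5:15, 7:17] = True

dice_value = dice(reference, shifted)      # 80 shared voxels out of 100 + 100
hd_value = hausdorff(reference, shifted)   # in units of the voxel spacing
```

The shifted square keeps a Dice of 0.8 while its Hausdorff distance equals the 2-voxel shift, which shows that the two metrics respond to different aspects of a delineation difference.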
14. • We developed a framework to simulate DVs and demonstrated
its feasibility.
• ADs were able to simulate different magnitudes of DVs
– But did not replicate the dosimetric consequences of human DVs
– Poor correlation between geometric measures and dosimetric consequences of DV
• Future work
– Setup and plan variabilities
– More human delineations (e.g. ProKnow dataset) per case and more cases
CONCLUSION
Over half a million patients are diagnosed with HNC each year worldwide. Radiation therapy (RT) is an important treatment, but it requires labor-intensive manual delineation of radiosensitive OARs.
The OAR DV is hypothesized to impact clinical outcomes.
Testing this hypothesis requires multi-delineated image sets for multiple patient cases.
Obtaining such datasets is difficult; therefore, we developed a method to automatically assess the consequences of DV by simulating OAR DV.
This work describes a framework to simulate DV, which can generate specific levels of DV and quantify the resulting geometric and dosimetric variations.
We evaluated our framework on the ESTRO Falcon dataset.
Fourteen independent MD head-and-neck OAR structure sets (SS) were obtained from the ESTRO Falcon group.
Seven OARs were available (BrainStem, Esophagus, OralCavity, Parotid_L, Parotid_R, SpinalCord, and Thyroid).
A consensus MD SS was generated by the simultaneous truth and performance level estimation (STAPLE) method.
MD DV was evaluated with respect to the STAPLE SS using the Dice coefficient and Hausdorff distance (HD) geometric similarity metrics.
Each OAR AD was repeated five times with a different seed or variability level.
Seed point generation for GrowCut and RW
Gaussian Smooth – PSF
DV was evaluated with respect to the STAPLE SS
Dice and HD were computed for each OAR AD with respect to the STAPLE SS.
Dosimetric analysis was performed by comparing dose-volume histograms (DVHs) from a plan developed with a reference MD SS against the DVHs for each MD and AD.
DVH confidence bands are reported for MD and each AD method.
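A DVH confidence band of this kind can be sketched by evaluating one fixed dose distribution on every alternative delineation and taking pointwise percentiles across the resulting curves. The code below is a minimal sketch under stated assumptions: the function names, the linear toy dose gradient, and the reading of the band as the central 90% (5th to 95th percentile) are hypothetical and may differ from how the bands in this work were constructed.

```python
import numpy as np

def cumulative_dvh(dose, mask, dose_levels):
    """Fraction of the structure volume receiving at least each dose level."""
    d = dose[mask]
    return np.array([(d >= level).mean() for level in dose_levels])

def dvh_band(dose, masks, dose_levels, coverage=90.0):
    """Pointwise percentile band over the DVHs of several delineations."""
    curves = np.stack([cumulative_dvh(dose, m, dose_levels) for m in masks])
    tail = (100.0 - coverage) / 2.0        # assumed: central band
    lower = np.percentile(curves, tail, axis=0)
    upper = np.percentile(curves, 100.0 - tail, axis=0)
    return lower, upper, curves

# Toy dose grid: dose rises by 1 Gy per column (hypothetical values).
dose = np.tile(np.arange(20.0), (20, 1))

# Three alternative delineations of the same structure, shifted column-wise.
masks = []
for start in (5, 6, 7):
    m = np.zeros((20, 20), dtype=bool)
    m[5:15, start:start + 5] = True
    masks.append(m)

dose_levels = np.arange(0.0, 21.0)
lower, upper, curves = dvh_band(dose, masks, dose_levels)
mean_doses = [float(dose[m].mean()) for m in masks]   # Dmean per delineation
```

Because the dose is held fixed, the width of the band reflects only the delineation differences; here a one-column shift changes Dmean by 1 Gy.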
The MD Dice was 0.7±0.2 (μ±σ).
AD Dice values (ASSD, GrowCut, and RW) were 0.5±0.2, 0.7±0.2, and 0.8±0.1, respectively.
HDs (MD, ASSD, GrowCut, and RW) were 35.4±45.2, 27.3±19.1, 29.3±19.9, and 14.6±10.3, respectively.
The simulated DV increased with increasing seed standard deviation or variability level.
The dosimetric effect was largest for MD DVs (larger OAR DVH confidence intervals and larger HD), even though the MD Dice was greater than the ASSD and GrowCut Dice values.
GrowCut DV resulted in less dosimetric variation than RW, unlike the geometric indices.
GrowCut showed large geometric variability, but its dosimetric consequence was small.
We developed a framework to simulate DVs and demonstrated its feasibility.
ADs were able to simulate different magnitudes of DVs, but did not replicate the dosimetric consequences of human delineation variability.
The correlation between geometric measures and dosimetric consequences of DV is poor.
Sensitivity to DV was affected by the OAR objective type (i.e., Dmean versus Dmax objectives) as well as by the distance from the target volume.