This document discusses harmonizing clinical data across cancer imaging datasets stored in The Cancer Imaging Archive (TCIA). Key points:
- Fields from 9 lung and brain cancer clinical datasets were mapped to 108 clinical data elements from the Genomic Data Commons (GDC)
- 37 fields appeared in 3 or more datasets
- 23 of these 37 common fields matched GDC fields after transforming data values
- Challenges included different data encodings, lack of documentation, and semantic differences between fields
- Proposed next steps include collecting data prospectively using standards such as the GDC data elements, and providing tools to support harmonization
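
To make the two core steps concrete (finding fields shared by 3 or more datasets, then transforming their values to match a target field), here is a minimal Python sketch. It is not the tooling used in the work summarized above. The source field names (`sex`, `Gender`, `patient_status`), the value encodings, and the helper names `common_fields` and `harmonize` are all hypothetical, and the target names and values are only meant to resemble GDC-style elements.

```python
from collections import Counter

# Hypothetical mapping from source field names to target field names.
FIELD_MAP = {
    "sex": "gender",
    "Gender": "gender",
    "patient_status": "vital_status",
}

# Hypothetical value encodings seen in source datasets, keyed by target field.
VALUE_MAP = {
    "gender": {"m": "male", "f": "female", "male": "male", "female": "female"},
    "vital_status": {"0": "Alive", "1": "Dead", "alive": "Alive", "deceased": "Dead"},
}


def common_fields(datasets, min_count=3):
    """Return the fields that occur in at least `min_count` datasets."""
    # set() ensures a field is counted once per dataset.
    counts = Counter(f for fields in datasets.values() for f in set(fields))
    return sorted(f for f, n in counts.items() if n >= min_count)


def harmonize(record):
    """Rename mapped fields and translate their values; drop unmapped fields."""
    out = {}
    for field, value in record.items():
        target = FIELD_MAP.get(field)
        if target is None:
            continue  # no target element for this field
        key = str(value).strip().lower()
        # None marks a value with no known translation, for manual review.
        out[target] = VALUE_MAP[target].get(key)
    return out


if __name__ == "__main__":
    datasets = {
        "a": ["age", "sex"],
        "b": ["age"],
        "c": ["age", "sex"],
        "d": ["sex", "stage"],
    }
    print(common_fields(datasets))
    print(harmonize({"sex": "M", "patient_status": 1, "scanner": "GE"}))
```

Returning `None` for an unrecognized value, rather than guessing, keeps undocumented encodings visible so they can be resolved by hand, which reflects the documentation and semantic challenges noted above.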