Integrative Everything, Deep Learning and
Streaming Data
Joel Saltz MD, PhD
Chair and Professor Department of Biomedical Informatics
Professor Department of Pathology
Cherith Endowed Chair
Stony Brook University
Workshop on Clusters, Clouds, and Data for Scientific
Computing, September 6, 2018
Integrative Everything
For each of a tumor’s roughly 100 Gigacells
• Structure of cell
• Composition of cell
– Proteins
– DNA
– RNA
– Epigenetics
– Lipids
– Carbohydrates
– Etc
Spatial/Molecular
Structure, composition of tumors
Global Cancer Facts & Figures, 3rd Edition, produced by the
American Cancer Society in partnership with the International
Agency for Research on Cancer (IARC)
• 14.1 million: The number of new cancer cases diagnosed in 2012
worldwide. More than half of these – 8 million – occurred in economically
developing countries.
• 8.2 million: The number of cancer deaths in 2012 worldwide.
• 21.7 million: The number of new cancer cases expected to be diagnosed
in 2030
TCGA
TCGA Analysis and Organization
• Biospecimen Core Resource (BCR)
• Genome Characterization Centers (GCCs)
• Genome Sequencing Centers (GSCs)
• Proteome Characterization Centers (PCCs)
• Data Coordinating Center (DCC) and Cancer Genomics Hub
(CGHub)
• Genome Data Analysis Centers (GDACs)
• Analysis Working Groups (AWGs)
Digital Pathology
Cancer Imaging
TCGA Network
TCIA encourages and supports the cancer
imaging open science community by hosting and
managing Findable,
Accessible, Interoperable, and Reusable (FAIR)
images and related data.
http://www.cancerimagingarchive.net/
Clark, et al. J Digital Imag 26.6 (2013): 1045-1057.
Cancer Imaging Archive – Integration of Pathology and
Radiology for Community Clinical Studies
SEER coverage includes 31.9 percent of Whites, 30.0
percent of African Americans, 44.0 percent of Hispanics,
49.3 percent of American Indians and Alaska Natives,
57.5 percent of Asians, and 68.5 percent of
Hawaiian/Pacific Islanders.
Consortia
• NCI Quantitative Imaging for Pathology (QuIP): Stony Brook, Emory,
MD Anderson, Institute for Systems Biology, Oak Ridge
• NCI SEER Pathology: Stony Brook, Emory, Rutgers, University of
Kentucky (three Cancer registries)
• Cancer Imaging Archive: Arkansas, Stony Brook, Emory (Stony Brook
leads Pathology component)
• Virtual Tissue Repository: Led by NCI SEER; Stony Brook, Emory
• TIES Research Network - Integrated Pathology text and imaging:
Pittsburgh, Stony Brook main sites, 6+ other sites (Stony Brook leads
digital Pathology)
TIES Cancer Research Network (TCRN)
Deep Learning aka The Magic Box
Roles for the “Magic Box”
1. Amazing predictions in a single leap
• Clinical data, genomics, imaging -> outcomes arising from different
treatments
2. Physically/biomedically meaningful predictions
• Distribution of different cell types in a tumor
• Correlation of Radiology to Pathology
• Structure of a protein
3. AI observing AI
• Curation of segmentations and labels (see 2 above)
4. Engineering AI systems used to make predictions
• Radiology, Pathology, “omics” characterized by (2), curated by (3)
Geoffrey’s Diagram/Deep Learning Team Roles
• Task Specific Domain Experts
– Pathologists – Alex Lazar, Raj Gupta
• “How do you know it works” expert
– Biostatistician – Arvind Rao
• Larger Context Domain Experts
– Expert in cancer biology, cancer genetics – Vesteinn Thorsson, Ilya Shmulevich
• Deep learning/machine learning experts
– Computer Vision collaborator, students – Dimitris Samaras, Le Hou, Vu Nguyen,
Han Le
• High end computing group – Tahsin Kurc, Scott Oster, Jeremy Logan
• Developers – Erich Bremer, Tammy DiPrima, Bridge Wang, Joe Balsamo
Brain
Le Hou, Dimitris Samaras, Tahsin Kurc, Yi Gao, Liz Vanner, James
Davis, Joel Saltz
Digital Pathology and Integrative Everything
• Statistical analyses and machine learning to link Radiology/Pathology
features to “omics” and outcome biological phenomena
• Image analysis and deep learning methods to extract features from
images
• Support queries against ensembles of features extracted from multiple
datasets
• Identify and segment trillions of objects – nuclei, glands, ducts, nodules,
tumor niches
• Analysis of integrated spatially mapped structural/“omic” information to
gain insight into cancer mechanism and to choose best intervention
• Stony Brook, Institute for Systems Biology, MD Anderson, Emory group
• TCGA Pan Cancer Immune Group – led by ISB researchers
• Deep dive into linked molecular and image-based characterization of
cancer-related immune response
http://www.cell.com/cell-reports/pdf/S2211-1247(18)30447-9.pdf
● Deep learning based
computational stain for staining
tumor infiltrating lymphocytes
(TILs)
● TIL patterns generated from
4,759 TCGA subjects (5,202 H&E
slides), 13 cancer types
● Computationally stained TILs
correlate with pathologist eye and
molecular estimates
● TIL patterns linked to tumor and
immune molecular features, cancer
type, and outcome
Le Hou – Graduate Student
Computer Science
Vu Nguyen – Graduate Student
Computer Science
Anne Zhao – Pathology Informatics
Biomedical Informatics, Pathology
(now Surg Path Fellow SBM)
Raj Gupta – Pathology Informatics
Biomedical Informatics, Pathology
Deep Learning
and Lymphocytes:
Stony Brook
Digital Pathology
Trainee Team
The future of Digital Pathology
Training, Model Creation
• Algorithm first trained on image patches
• Several cooperating deep learning algorithms generate heat
maps
• Heat maps used to generate new predictions
• Companion molecular statistical data analysis pipelines
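The patch-to-heat-map step above can be illustrated with a toy sketch. It is not the team's pipeline: `mean_intensity` is a hypothetical stand-in for a trained patch classifier, and the image, patch size, and cutoff are made-up values chosen only to show the data flow from patches to a thresholded map.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


def mean_intensity(patch: Image) -> float:
    """Hypothetical stand-in for a trained patch classifier (score in [0, 1])."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def heat_map(image: Image, patch: int,
             score: Callable[[Image], float]) -> List[List[float]]:
    """Tile the image into non-overlapping patch x patch squares and score each."""
    rows, cols = len(image) // patch, len(image[0]) // patch
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            tile = [row[c * patch:(c + 1) * patch]
                    for row in image[r * patch:(r + 1) * patch]]
            out_row.append(score(tile))
        result.append(out_row)
    return result


def threshold(heat: List[List[float]], cutoff: float) -> List[List[int]]:
    """Binarize the heat map: 1 where the patch score reaches the cutoff."""
    return [[1 if v >= cutoff else 0 for v in row] for row in heat]


# 4x4 toy image with a bright upper-left quadrant (illustrative values only).
toy = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
heat = heat_map(toy, patch=2, score=mean_intensity)
mask = threshold(heat, cutoff=0.5)
```

In this sketch only the upper-left patch exceeds the cutoff; in practice the cutoff would be tuned per slide or per cancer type during quality control.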
Training, threshold adjustment, quality control
Validation – Stratified sampling from 5K whole slide images
Arvind Rao, expert in spatial biostatistics (U Michigan)
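Stratified sampling for validation can be sketched as follows. The slide does not say which variable defined the strata; this sketch assumes cancer type, and the slide IDs and the `stratified_sample` helper are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_sample(items: List[Tuple[str, str]], per_stratum: int,
                      seed: int = 0) -> Dict[str, List[str]]:
    """Draw up to per_stratum slide IDs from each stratum without replacement.

    items is a list of (slide_id, stratum) pairs; using cancer type as the
    stratum is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_stratum: Dict[str, List[str]] = defaultdict(list)
    for slide_id, stratum in items:
        by_stratum[stratum].append(slide_id)
    return {
        stratum: rng.sample(ids, min(per_stratum, len(ids)))
        for stratum, ids in sorted(by_stratum.items())
    }


# Hypothetical slide IDs and strata, for illustration only.
slides = [(f"slide-{i:03d}", "LUAD" if i % 3 else "SKCM") for i in range(30)]
picked = stratified_sample(slides, per_stratum=5)
```

Sampling a fixed number per stratum keeps rare strata represented in the validation set, which a simple random draw from all slides would not guarantee.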
SKCM TCGA-D3-A2JF-06Z-00-DX1
TIL Pattern Descriptions
Qualitative (Alex Lazar, Raj Gupta)
• “Brisk, diffuse”: diffusely infiltrative TILs
scattered throughout at least 30% of the
area of the tumor (1,856 cases);
• “Brisk, band-like”: band-like
boundaries bordering the tumor at its
periphery (1,185);
• “Non-brisk, multi-focal”: loosely
scattered TILs present in less
than 30% but more than 5% of the area
of the tumor (1,083);
• “Non-brisk, focal”: TILs scattered
throughout less than 5% but greater than
1% of the area of the tumor (874);
• “None”: < 1% TILs (143 cases)
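The area thresholds in these descriptions can be written down as a simple rule, shown here only as an illustration. The categories on the slide were assigned qualitatively by pathologists, and “Brisk, band-like” is a spatial pattern that no area threshold captures, so it is left out. The function name and the handling of boundary values are assumptions.

```python
def til_area_category(til_fraction: float) -> str:
    """Map the fraction of tumor area containing TILs to an area-based category.

    Assumption: boundary values (exactly 5% and exactly 1%) go to the higher
    category; the slide does not say how ties are handled.
    """
    if not 0.0 <= til_fraction <= 1.0:
        raise ValueError("til_fraction must be between 0 and 1")
    if til_fraction >= 0.30:
        return "Brisk, diffuse"
    if til_fraction >= 0.05:
        return "Non-brisk, multi-focal"
    if til_fraction >= 0.01:
        return "Non-brisk, focal"
    return "None"
```

For example, `til_area_category(0.12)` falls in the 5% to 30% band and returns "Non-brisk, multi-focal".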
Quantitative – Arvind Rao
• Agglomerative clustering
• Cluster indices representing
cluster number, density, cluster
size, distance between clusters
• Traditional spatial statistics
measures
• R package clusterCrit by
Bernard Desgraupes: Ball-Hall,
Banfield-Raftery, C Index,
and Determinant Ratio indices
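A minimal sketch of the quantitative approach, assuming TIL-positive patches are given as 2D coordinates. The slide does not state the linkage or cutoff used, so single linkage and a distance cutoff are assumptions. The analysis itself used the R package clusterCrit; the `ball_hall` function below is an independent Python reading of the Ball-Hall definition (mean over clusters of the mean squared point-to-centroid distance), not that package's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def single_linkage(points: Sequence[Point], cutoff: float) -> List[List[Point]]:
    """Naive agglomerative clustering: repeatedly merge the two closest
    clusters until the closest pair is farther apart than cutoff."""
    clusters = [[p] for p in points]

    def gap(a: List[Point], b: List[Point]) -> float:
        # Single linkage: distance between the nearest members of two clusters.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min(
            (gap(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def centroid(cluster: List[Point]) -> Point:
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def ball_hall(clusters: List[List[Point]]) -> float:
    """Ball-Hall index: mean over clusters of the mean squared distance
    from each point to its cluster centroid."""
    per_cluster = []
    for cluster in clusters:
        g = centroid(cluster)
        per_cluster.append(
            sum(math.dist(p, g) ** 2 for p in cluster) / len(cluster)
        )
    return sum(per_cluster) / len(per_cluster)


# Hypothetical TIL patch coordinates forming two tight groups.
til_patches = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
clusters = single_linkage(til_patches, cutoff=2.0)
sizes = sorted(len(c) for c in clusters)
index = ball_hall(clusters)
```

With these toy coordinates the procedure finds two clusters of four patches each. Cluster count and cluster size come directly from `clusters`, and a lower Ball-Hall value indicates more compact clusters.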
TCGA Pan Cancer Atlas – Immune Landscape of Cancer
• Six identified immune subtypes span
cancer tissue types and molecular
subtypes
• Immune subtypes differ by somatic
aberrations, microenvironment, and
survival
• Multiple control modalities of
molecular networks affect
tumor-immune interactions
• These analyses serve as a resource
for exploring immunogenicity across
cancer types
http://www.cell.com/immunity/fulltext/S1074-7613(18)30121-3
Integrated Radiomics/Pathomics/cancer mutations:
TCGA NSCLC Adenocarcinoma
Integrative Radiology, Pathology, “omics”, outcome
Mary Saltz, Mark Schweitzer SBU Radiology
AMIA Marconi
Award Paper
Thanks!
ITCR Team
Stony Brook University
Joel Saltz
Tahsin Kurc
Yi Gao
Allen Tannenbaum
Erich Bremer
Jonas Almeida
Alina Jasniewski
Fusheng Wang
Tammy DiPrima
Andrew White
Le Hou
Furqan Baig
Mary Saltz
Raj Gupta
Emory University
Ashish Sharma
Adam Marcus
Oak Ridge National
Laboratory
Scott Klasky
Dave Pugmire
Jeremy Logan
Yale University
Michael Krauthammer
Harvard University
Rick Cummings
Funding – Thanks!
• This work was supported in part by U24CA180924,
U24CA215109, NCIP/Leidos 14X138 and
HHSN261200800001E, UG3CA225021-01 from the
NCI; R01LM011119-01 and R01LM009239 from the
NLM
• This research used resources provided by the
National Science Foundation XSEDE Science
Gateways program under grant TG-ASC130023 and
the Keeneland Computing Facility at the Georgia
Institute of Technology, which is supported by the
NSF under Contract OCI-0910735.
