Facilitating target candidate prioritization via integrated,
interactive visualizations of molecular profiling data

Wolfgang Hoeck, Ph.D., Research Informatics, Amgen Inc.
Topics for today’s presentation
 • What is Molecular Profiling Data?
 • The problem of sharing large volume data
           • Sending files isn’t working well
 • Public molecular profiling efforts
           • The Cancer Genome Atlas
           • Sanger COSMIC
           • Broad CCLE
 • TARO - an integrated database plus interactive visualizations
 • Identities and Standard Terminologies (Taxonomies)
 • Commercial molecular profiling data repositories
 • Leveraging internal and external efforts
 • Pulling everything together
 • Closing thoughts
2/5/2012                                    Wolfgang Hoeck         2
Molecular Profiling Data as a source of potential Targets
 • What is Molecular Profiling Data?
 • High-volume data (millions of data points) measuring genomic or
 transcriptomic endpoints
           • Gene Expression: How much of my gene is expressed under a certain
           condition?
               • Comparing gene expression of two groups – Normal/Tumor or Tumor/Tumor
               • Surveying a panel of normal tissues
           • Gene Copy Number: How many copies of my gene are present in the
           genome?
               • Which genes are contained in an amplified region of a chromosome?
               • Is a gene or gene family amplified or deleted in a given tumor setting?
               • Can we validate the copy number status in an independent dataset?
           • Somatic Mutations: Is my gene normal or mutated?
               • Is the gene clearly mutated or is there conflicting evidence?
               • Are mutations affecting genes in the same pathway?
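The Normal/Tumor comparison mentioned above is commonly summarized as a log2 fold change per gene. The following is a minimal sketch of that idea, not the method used in TARO; the gene names, expression values, pseudocount, and the 4-fold cutoff are all illustrative assumptions.

```python
import math
from statistics import mean

def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))

# Hypothetical linear-scale expression values per gene and sample group.
expression = {
    "GENE_A": {"tumor": [63.0, 63.0, 63.0], "normal": [7.0, 7.0, 7.0]},
    "GENE_B": {"tumor": [15.0, 15.0], "normal": [15.0, 15.0]},
}

fold_changes = {
    gene: log2_fold_change(groups["tumor"], groups["normal"])
    for gene, groups in expression.items()
}

# Genes at least 4-fold higher in tumor (log2 fold change >= 2); the cutoff is illustrative.
up_in_tumor = sorted(gene for gene, lfc in fold_changes.items() if lfc >= 2.0)

print(fold_changes)
print(up_in_tumor)
```

A real comparison would also need a statistical test and multiple-testing correction; the sketch only shows the group comparison itself.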
Multiple Genomic Data Types lead to a list of possible targets
 • List of Targets (Target Classes)
 • Technologies feeding each data type:
            • Gene Expression: Microarray, RNA-seq
            • Gene Copy Number: CGH Array, SNP Array
            • Gene Mutation: Exome Sequencing
            • Gene Fusion: RNA-seq
            • Methylation: ChIP-seq, ChIP-chip
 • Each data type yields Scores
 • Scores feed Prioritized Target List #1 and Prioritized Target List #2
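One way to picture how per-data-type scores could turn into more than one prioritized list is a weighted sum, where each weighting scheme produces its own ranking. This is a sketch under stated assumptions: the slide does not say how scores are combined, and the gene names, score values, and weights below are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-data-type scores in [0, 1]; a missing gene means "no evidence".
scores = {
    "expression":  {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.6},
    "copy_number": {"GENE_A": 0.8, "GENE_C": 0.1},
    "mutation":    {"GENE_B": 0.7, "GENE_C": 0.5},
}

def prioritize(scores, weights):
    """Rank genes by the weighted sum of their scores, highest first."""
    totals = defaultdict(float)
    for data_type, per_gene in scores.items():
        for gene, score in per_gene.items():
            totals[gene] += weights[data_type] * score
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Two illustrative weighting schemes give two different prioritized lists.
list_1 = prioritize(scores, {"expression": 0.5, "copy_number": 0.25, "mutation": 0.25})
list_2 = prioritize(scores, {"expression": 0.2, "copy_number": 0.2, "mutation": 0.6})

print([gene for gene, _ in list_1])
print([gene for gene, _ in list_2])
```

Treating a missing score as zero is a design choice made here for brevity; it penalizes genes that were simply not measured.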

Public and Commercial Molecular Profiling Efforts

Source | Name | Content | Data Type | Value
NCI | The Cancer Genome Atlas (TCGA) | 20+ tumor types, 500+ samples each | Gene Expression (microarray, NGS), Copy Number, Clinical Data | Target Identification & Validation, Patient Stratification
Wellcome Trust Sanger Institute | Cancer Genome Project (CGP) | COSMIC | Somatic Mutation Data | Target Identification & Validation, Patient Stratification, Model Selection
Broad Institute | Cancer Cell Line Encyclopedia (CCLE) | 800+ Cancer Cell Lines | Gene Expression (microarray), Copy Number (microarray) | Target Identification & Validation, Model Selection
GSK-caBIG | Wooster Cell Line Panel | 300+ Cancer Cell Lines | Gene Expression (microarray), Copy Number (microarray) | Target Identification & Validation, Model Selection
RICERCA | OncoPanel | 240 Cancer Cell Lines | Gene Expression (microarray) | Target Identification & Validation, Model Selection
TARO Data Sharing Solution Strategy
 • Data type focused
           • Gene Expression, Copy Number and Somatic Mutations
 • Technology Independent
           • Data from Microarray, NextGen Sequencing, Sanger Sequencing, etc.
 • Source Independent
           • Data comes from multiple sources: Amgen, TCGA, Broad, Sanger, Publications
 • Data standardization enables integration at multiple levels
           • Gene, Tissue, Disease, Sample (Tissue Sample / Cell Line Sample)
 • Modular Development
           • Independent Database
           • Support Multiple User Interfaces
               • Visualization UI
               • Central Research Discovery Tool
               • Web Services
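The source-independent, standardized integration described above can be illustrated by mapping each source's spelling of a cell line to one canonical name before merging records. This is only a sketch: the synonym table, source names, and measurement values are hypothetical, and the actual TARO schema is not described in the slides.

```python
import re

# Hypothetical synonym table: normalized spelling -> canonical cell line name.
CANONICAL_CELL_LINES = {
    "a549": "A-549",
    "ncih1975": "NCI-H1975",
    "h1975": "NCI-H1975",
}

def canonical_cell_line(name):
    """Lowercase, drop punctuation and spaces, then look up the canonical name."""
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    return CANONICAL_CELL_LINES.get(key)  # None means "needs curation"

# Illustrative records: (source, cell line as spelled by the source, data type, value).
records = [
    ("SourceA", "A549", "expression", 8.2),
    ("SourceB", "A-549", "copy_number", 2.0),
    ("SourceC", "H1975", "mutation", "EGFR p.T790M"),
    ("SourceB", "NCI-H1975", "expression", 6.1),
    ("SourceA", "XYZ-1", "expression", 3.3),
]

def integrate(records):
    """Group values by canonical cell line; collect names that could not be mapped."""
    integrated, unmapped = {}, []
    for _source, raw_name, data_type, value in records:
        canonical = canonical_cell_line(raw_name)
        if canonical is None:
            unmapped.append(raw_name)
            continue
        integrated.setdefault(canonical, {})[data_type] = value
    return integrated, unmapped

integrated, unmapped = integrate(records)
print(integrated)
print(unmapped)
```

Names that fail the lookup are set aside rather than guessed, which mirrors the point that curation takes time.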
TARO Use Cases

 • Target Identification:
           • Systematically identify targets via differential expression and/or copy number in
           one or more tissue datasets
 • Target Validation:
           • Validate target expression in independent tissue data sets
           • Verify target expression across many normal and diseased tissue types to
           determine tissue specificity and potential off-target effects
 • Model Selection:
           • Identify cell line models with high or low expression of the target of interest
           • Identify cell line models that carry an amplification of the target gene
           • Provide mutation data on typical genes within selected cell lines to highlight
           mutational background
           • Identify cell lines with a specific mutation pattern (e.g., EGFR mutant and KRAS wild-type)
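The mutation-pattern query from the Model Selection use case (EGFR mutant and KRAS wild-type) can be sketched as a simple filter. The cell line identifiers and status calls below are hypothetical placeholders, and the "mut"/"wt" encoding is an assumption made for illustration.

```python
# Hypothetical mutation calls: cell line -> {gene: "mut" or "wt"}.
# A gene missing from the inner dict means it was not profiled.
mutation_status = {
    "CELL_LINE_1": {"EGFR": "mut", "KRAS": "wt"},
    "CELL_LINE_2": {"EGFR": "wt", "KRAS": "mut"},
    "CELL_LINE_3": {"EGFR": "mut", "KRAS": "mut"},
    "CELL_LINE_4": {"EGFR": "mut"},
}

def select_cell_lines(status_by_line, pattern):
    """Return cell lines whose status matches every gene in the pattern."""
    return sorted(
        line
        for line, genes in status_by_line.items()
        if all(genes.get(gene) == wanted for gene, wanted in pattern.items())
    )

matches = select_cell_lines(mutation_status, {"EGFR": "mut", "KRAS": "wt"})
print(matches)
```

Cell lines with an unprofiled gene are excluded rather than assumed wild-type, which is the conservative choice when evidence is missing or conflicting.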
Layering the Information Landscape
 • Decision Support: Research Gateway, TARO-Guides
           • Query tools for Amgen scientists to search across internal and external
           data repositories
 • Convergence: Omics Repository, TARO Data Mart
           • Centralizes and organizes the storage of ‘Omics data for bioinformaticists
           and biologists alike
 • Transactional: Experiments, Omics Analysis (flow labels: Normalize, Aggregate, Summarize)
           • Operational systems to handle the day-to-day execution of ‘Omics experiments
           and their initial analysis
 • Reference Data: Gene Index, Research Taxonomy Foundation (RTF) covering Tissue,
 Disease, Organism, Cell Line
           • Fulfills the baseline requirements for biology identity / reference data
           systems. Without these systems none of the above is possible.
Take it apart, standardize, then connect and integrate …
TARO Guide Collection – covering the spectrum from summaries to details
Interactive Visualizations in Spotfire Client or Webplayer

• Gene Expression
   – Gene-level or probe-set level
   – Panels or Comparisons
• Copy Number
   – Whole chromosome view
   – Detail per sample
• Somatic Mutations
   – 1700+ cancer cell lines
   – COSMIC and other mutation data
Surveying the mutation landscape in Cancer Cell Lines
[Figure callouts: Standard Gene Symbols, Standard Canonical Cell Line Name, Standard Mutation Nomenclature]
Integrating Expression and Mutation Data
Successes and Shortcomings of TARO
 • Ideal for pointed questions
           • Show me the expression, copy number and mutation status of Gene X
           • Generate a list of differentially expressed genes for upload into NextBio
           • Identify cell lines with a particular mutation profile
           • Great for data important to Amgen
           • Provides a foundation for accumulating knowledge


 • Shortcomings
           • Breadth of data is resource-limited
           • Data isn’t always available immediately; curation takes time
           • Complexity of the data space: capability has to be balanced against simplicity
           • There is still some learning involved for scientists
           • Chosen technology doesn’t always allow the desired User Interface

Commercial Molecular Profiling Data Repositories
 • Oncomine and Oncomine Power Tools (OPT)
           • Organizing and annotating oncology data in a consistent fashion
           • Oncomine Enterprise: Web user interface, enabling customer data uploads
           • OPT: Integrated Gene Browser - Bringing multiple data-types together in a
           summary view



 • NextBio
           • NextBio Enterprise: Web user interface, enabling customer data uploads
           • Multiple Apps for variety of profiling data
           • Includes literature data
           • Provides Meta Analysis: Surveying studies across multiple sources



Integrated Gene Browser – Oncomine Power Tools
BodyAtlas Cell Lines – NextBio
Where do we go from here?
 • Why do this in the first place?
           • Better informed decisions
           • Achieve higher throughput, consider more targets
           • Help in understanding the complexity of the landscape
 • We are starting to see the fruits of “semantic integration efforts”
           • Ad-hoc integration with stand-alone profiling data of different data types
           becomes much easier (e.g., Phosphoprotein Arrays)
           • Utilization of other public profiling datasets is easier (e.g., from publications)
           • Migrating into the “screening data” space (e.g., compound-treated cell line
           panels) now becomes possible
 • In-House Challenges: Domain knowledge for curation, presentation of
 complex data in limited space, database performance – can we make it
 good enough?
 • Vendor Challenges: Interfaces for integration
 • Balance knowledge management efforts: Are we just data collectors? But
 wait, there is more …
Acknowledgements
 • Interdisciplinary team work
           • Database Designers
           • Database Administrators
           • System Administrators
           • Business Analysts
           • Scientists
           • Bioinformaticists
           • Support Analysts
           • Project Manager


   NONE OF THIS WOULD BE POSSIBLE
   WITHOUT TEAM WORK
THANK YOU FOR YOUR TIME
Pathema: A Bioinformatics Resource CenterPathema: A Bioinformatics Resource Center
Pathema: A Bioinformatics Resource Center
 
Dr. David Gutman: Development and Validation of Radiology Descriptors in Gliomas
Dr. David Gutman: Development and Validation of Radiology Descriptors in GliomasDr. David Gutman: Development and Validation of Radiology Descriptors in Gliomas
Dr. David Gutman: Development and Validation of Radiology Descriptors in Gliomas
 
Final Acb All Hands 26 11 07.Key
Final Acb All Hands 26 11 07.KeyFinal Acb All Hands 26 11 07.Key
Final Acb All Hands 26 11 07.Key
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
GIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM ForumGIAB for AMP GeT-RM Forum
GIAB for AMP GeT-RM Forum
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030Genome in a bottle for amp GeT-RM 181030
Genome in a bottle for amp GeT-RM 181030
 
Multi-scale network biology model & the model library
Multi-scale network biology model & the model libraryMulti-scale network biology model & the model library
Multi-scale network biology model & the model library
 
160628 giab for festival of genomics
160628 giab for festival of genomics160628 giab for festival of genomics
160628 giab for festival of genomics
 
Next generation sequencing in pharmacogenomics
Next generation sequencing in pharmacogenomicsNext generation sequencing in pharmacogenomics
Next generation sequencing in pharmacogenomics
 

Slas2012 Whoeck

  • 1. Facilitating target candidate prioritization via integrated, interactive visualizations of molecular profiling data Wolfgang Hoeck, Ph.D., Research Informatics, Amgen Inc.
  • 2. Topics for today’s presentation
      • What is Molecular Profiling Data?
      • The problem of sharing large volume data
          • Sending files isn’t working well
      • Public molecular profiling efforts
          • The Cancer Genome Atlas
          • Sanger COSMIC
          • Broad CCLE
      • TARO - an integrated database plus interactive visualizations
      • Identities and Standard Terminologies (Taxonomies)
      • Commercial molecular profiling data repositories
      • Leveraging internal and external efforts
      • Pulling everything together
      • Closing thoughts
  • 3. Molecular Profiling Data as a source of potential Targets
      • What is Molecular Profiling Data?
      • High volume data (millions of data points) measuring genomic or transcriptomic end points
          • Gene Expression: How much of my gene is expressed under a certain condition?
              • Comparing gene expression of two groups – Normal/Tumor or Tumor/Tumor
              • Surveying a panel of normal tissues
          • Gene Copy Number: How many copies of my gene are present in the genome?
              • Which genes are contained in an amplified region of a chromosome?
              • Is a gene or gene family amplified or deleted in a given tumor setting?
              • Can we validate the copy number status in an independent dataset?
          • Somatic Mutations: Is my gene normal or mutated?
              • Is the gene clearly mutated or is there conflicting evidence?
              • Are mutations affecting genes in the same pathway?
  • 4. Multiple Genomic Data Types lead to a list of possible targets
      [Diagram: List of Targets (Target Classes)]
      • Technologies: Microarray, RNA-seq, CGH Array, SNP Array, Exome Sequencing, RNA-seq, ChIP-seq, ChIP-chip
      • Scores: Gene Expression Scores, Gene Copy Number Scores, Gene Mutation Scores, Gene Fusion Scores, Methylation Scores
      • Result: Prioritized Target List #1, Prioritized Target List #2
  • 5. Public and Commercial Molecular Profiling Efforts
      Source | Name | Content | Data Type | Value
      NCI | The Cancer Genome Atlas (TCGA) | 20+ tumor types, 500+ samples each | Gene Expression (uA, NGS), Copy Number, Clinical Data | Target Identification & Validation, Patient Stratification
      Sanger Wellcome Trust | Cancer Genome Project (CGP) | COSMIC | Somatic Mutation Data | Target Identification & Validation, Patient Stratification, Model Selection
      Broad Institute | Cancer Cell Line Encyclopedia (CCLE) | 800+ Cancer Cell Lines | Gene Expression (uA), Copy Number (uA) | Target Identification & Validation, Model Selection
      GSK-caBIG | Wooster Cell Line Panel | 300+ Cancer Cell Lines | Gene Expression (uA), Copy Number (uA) | Target Identification & Validation, Model Selection
      RICERCA | OncoPanel | 240 Cancer Cell Lines | Gene Expression (uA) | Target Identification & Validation, Model Selection
  • 6. TARO Data Sharing Solution Strategy
      • Data type focused
          • Gene Expression, Copy Number and Somatic Mutations
      • Technology Independent
          • Data from Microarray, NextGen Sequencing, Sanger Sequencing, etc.
      • Source Independent
          • Data comes from multiple sources: Amgen, TCGA, Broad, Sanger, Publications
      • Data Standardization enables integration at multiple Levels
          • Gene, Tissue, Disease, Sample (Tissue Sample / Cell Line Sample)
      • Modular Development
          • Independent Database
          • Support Multiple User Interfaces
              • Visualization UI
              • Central Research Discovery Tool
              • Web Services
  • 7. TARO Use Cases: Target Identification, Target Validation, Model Selection
      • Target Identification:
          • Systematically identify targets via differential expression and/or copy number in one or multiple tissue datasets
      • Target Validation:
          • Validate target expression in independent tissue data sets
          • Verify target expression across many normal and diseased tissue types to determine tissue specificity and potential off-target effects
      • Model Selection:
          • Identify cell line model that highly or lowly expresses target of interest
          • Identify cell line model that contains target gene amplification
          • Provide mutation data on typical genes within selected cell lines to highlight mutational background
          • Identify cell lines with a specific mutation pattern (e.g.: EGFR mut and KRAS wt)
  • 8. Layering the Information Landscape
      [Layer diagram, top to bottom]
      • Decision Support (Research Gateway, TARO-Guides): Query tools for Amgen scientists to search across internal and external data repositories
      • Convergence (Omics Repository, TARO Data Mart): Centralizes and organizes the storage of ‘Omics data for bioinformaticists and biologists alike
      • Transactional (Experiments, Omics Analysis): Operational systems to handle the day-to-day execution of ‘Omics experiments and their initial analysis
      • Reference Data (Tissue, Disease, Organism, Cell Line, Gene Index; Research Taxonomy Foundation (RTF)): Fulfills the baseline requirements for biology identity / reference data systems. W/o these systems none of the above is possible.
      • Processing steps shown in the diagram: Normalize, Aggregate, Summarize
  • 9. Take it apart, standardize, then connect and integrate …
  • 10. TARO Guide Collection – covering the spectrum from summaries to details
      Interactive Visualizations in Spotfire Client or Webplayer
      • Gene Expression
          – Gene-level or probe-set level
          – Panels or Comparisons
      • Copy Number
          – Whole chromosome view
          – Detail per sample
      • Somatic Mutations
          – 1700+ cancer cell lines
          – COSMIC and other mutation data
  • 11. Surveying the mutation landscape in Cancer Cell Lines
      [Figure callouts]
      • Standard Gene Symbols
      • Standard Canonical Cell Line Name
      • Standard Mutation Nomenclature
  • 12. Integrating Expression and Mutation Data
  • 13. Successes and Shortcomings of TARO
      • Ideal for pointed questions
          • Show me the expression, copy number and mutation status of Gene X
          • Generate a list of differentially expressed genes for upload into NextBio
          • Identify cell lines with a particular mutation profile
      • Great for data important to Amgen
      • Provides a foundation for accumulating knowledge
      • Shortcomings
          • Breadth of data is resource-limited
          • Data isn’t always available immediately; curation takes time
          • Complexity of data space, capability vs. simplicity
          • There is still some learning involved for scientists
          • Chosen technology doesn’t always allow the desired User Interface
  • 14. Commercial Molecular Profiling Data Repositories
      • Oncomine and Oncomine Power Tools (OPT)
          • Organizing and annotating oncology data in a consistent fashion
          • Oncomine Enterprise: Web user interface, enabling customer data uploads
          • OPT: Integrated Gene Browser - Bringing multiple data-types together in a summary view
      • NextBio
          • NextBio Enterprise: Web user interface, enabling customer data uploads
          • Multiple Apps for variety of profiling data
          • Includes literature data
          • Provides Meta Analysis: Surveying studies across multiple sources
  • 15. Integrated Gene Browser – Oncomine Power Tools
  • 16. BodyAtlas Cell Lines – NextBio
  • 17. Where do we go from here?
      • Why do this in the first place?
          • Better informed decisions
          • Achieve higher throughput, consider more targets
          • Help in understanding the complexity of the landscape
      • We are starting to see the fruits of “semantic integration efforts”
          • Ad-hoc integration with stand-alone profiling data of different data types becomes much easier (e.g.: Phosphoprotein Arrays)
          • Utilization of other public profiling datasets is easier (e.g.: from publications)
          • Migrating into the “screening data” space (e.g.: compound-treated cell line panels) now becomes possible
      • In-House Challenges: Domain knowledge for curation, presentation of complex data in limited space, Database Performance – can we make it good enough?
      • Vendor Challenges: Interfaces for Integration,
      • Balance knowledge management efforts: Are we just data collectors?
      But wait, there is more ….
  • 18. Acknowledgements
      • Interdisciplinary team work
          • Database Designers
          • Database Administrators
          • System Administrators
          • Business Analysts
          • Scientists
          • Bioinformaticists
          • Support Analysts
          • Project Manager
      NONE OF THIS WOULD BE POSSIBLE WITHOUT TEAM WORK
  • 19. THANK YOU FOR YOUR TIME
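
The model-selection use case on slide 7 (find cell lines with a pattern such as EGFR mut and KRAS wt) depends on the standardization shown on slide 11: standard gene symbols and canonical cell line names. The Python sketch below illustrates that idea only; it is not TARO's implementation. The synonym table, the name-canonicalization rule, the cell line names (Line-A to Line-D) and the mutation records are all hypothetical. It also assumes that a profiled cell line with no recorded mutation in a gene can be treated as wild type, which real data may not justify.

```python
from collections import defaultdict

# Hypothetical synonym table mapping source-specific gene names
# to one standard symbol (assumption, for illustration only).
GENE_SYNONYMS = {"ERBB1": "EGFR", "HER1": "EGFR", "KRAS2": "KRAS"}


def standard_symbol(name):
    """Return the standard gene symbol for a source-specific name."""
    key = name.strip().upper()
    return GENE_SYNONYMS.get(key, key)


def canonical_cell_line(name):
    """Collapse spelling variants ('Line-A', 'LINE A') into one key ('LINEA')."""
    return "".join(ch for ch in name.upper() if ch.isalnum())


def mutated_genes_by_cell_line(records):
    """Build {canonical cell line: set of mutated standard gene symbols}."""
    index = defaultdict(set)
    for cell_line, gene in records:
        index[canonical_cell_line(cell_line)].add(standard_symbol(gene))
    return index


def select_cell_lines(index, profiled, mutated, wild_type):
    """Return profiled cell lines mutated in every gene of `mutated`
    and in none of the genes of `wild_type`."""
    must_have = {standard_symbol(g) for g in mutated}
    must_lack = {standard_symbol(g) for g in wild_type}
    hits = set()
    for name in profiled:
        line = canonical_cell_line(name)
        genes = index.get(line, set())
        if must_have <= genes and not (must_lack & genes):
            hits.add(line)
    return sorted(hits)


# Made-up records from two imaginary sources with inconsistent naming.
records = [
    ("Line-A", "ERBB1"),
    ("LINE A", "TP53"),
    ("Line-B", "EGFR"),
    ("line b", "KRAS2"),
    ("Line-C", "KRAS"),
]
profiled = ["Line-A", "Line-B", "Line-C", "Line-D"]

index = mutated_genes_by_cell_line(records)
hits = select_cell_lines(index, profiled, mutated=["EGFR"], wild_type=["KRAS"])
print(hits)  # prints ['LINEA']
```

Without the two normalization functions, "Line-A"/"LINE A" and "ERBB1"/"EGFR" would be counted as different entities and the query would miss the match, which is the point of standardizing before integrating.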