Computational and informatics challenges in providing clinically-relevant genome interpretation from high-throughput sequencing data.
Reece Hart; InVitae Team, San Francisco, CA, 94107

InVitae provides sequencing and clinically-relevant genome interpretation services to physicians from patient blood samples. Our value is based on three essential components: a database of high-quality associations of variants and conditions, carefully designed targeted sequencing assays, and a sophisticated analysis pipeline for interpreting variants. The current process requires less than two weeks from the arrival of blood to the delivery of a clinical report covering over 10,000 curated variants in 250 genes for up to 150 conditions (subject to physician's requisition). This poster summarizes the computational and informatics tools that enable this process.

Report Excerpts
  InVitae reports findings only for requisitioned conditions. A report for the 150 conditions currently offered is 250 pages. Online reporting and organization make the report easily navigable. A few features of our reports are shown below.
  [Report excerpt callout: condition groups sorted by risk level and evidence]

Clinician's view of InVitae
  ● one sample, one requisition
  ● two weeks; one lab, one price
  ● one report, up to 150 conditions

  InVitae's process features online requisitioning and reporting, CLIA-certified sequencing, and HIPAA-compliant information management.

  Process stages: intake, sequencing, interpretation, director review
  Requisitioning and Laboratory Information Management System

  [Report excerpt callouts: similar conditions grouped together; carriers of known pathogenic variants; ancestry-dependent quantitative risks]

Sequence Analysis and Variant Interpretation
  The InVitae pipeline is designed to provide at least 50x depth across all targeted regions for all covered genes/conditions. Samples that do not meet stringent criteria for sequence depth, sequence coverage, and coverage of known pathogenic variants for requisitioned conditions are rerun or failed. Personal Health Information remains on premises; the rest of the pipeline (reads through anonymized report) executes on the Amazon Web Services platform.
  ☞ See also: 3692W, lab process (Session I)

  [Report excerpt callouts: known pathogenic variants have strongest evidence of association; predicted effect(s); frequency in 1000 Genomes Project; supporting publications]
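
The depth criterion described in this section can be illustrated with a minimal sketch. It assumes that per-base read depth is already available for each targeted region. Only the 50x threshold comes from the text; the function names, the region names, and the rule that any under-covered base flags a sample for rerun are hypothetical and are not InVitae's actual criteria.

```python
from typing import Dict, List

MIN_DEPTH = 50  # "at least 50x depth" is the only value taken from the text


def regions_below_depth(depths: Dict[str, List[int]],
                        min_depth: int = MIN_DEPTH) -> Dict[str, int]:
    """Map each failing region to its number of bases below min_depth."""
    failing = {}
    for region, per_base in depths.items():
        low = sum(1 for d in per_base if d < min_depth)
        if low:
            failing[region] = low
    return failing


def sample_status(depths: Dict[str, List[int]],
                  min_depth: int = MIN_DEPTH) -> str:
    """Hypothetical rule: any under-covered base flags the sample for rerun."""
    return "rerun" if regions_below_depth(depths, min_depth) else "pass"


# Made-up per-base depths for two hypothetical targeted regions.
example = {
    "GENE_A:exon1": [62, 58, 71, 55],
    "GENE_A:exon2": [80, 49, 47, 90],
}
print(regions_below_depth(example))
print(sample_status(example))
```

A production check would also need the sequence-coverage and known-pathogenic-site criteria mentioned above, whose exact definitions the poster does not give.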

  [Figure: pipeline data flow: blood, reads, alignments, variants (known pathogenic, inferred pathogenic, VUS), report]

  Computing challenges and process, by pipeline stage:
    sample intake
      ● online requisitioning
      ● barcoding
      ● information security
    sequencing assay
      ● assay design
      ● PCR fill-in
      ● multiplexing
      ● automation
      ● LIMS
    alignment
      ● bwa
      ● base quality recalibration
      ● automated
      ● coverage analysis
    variant calling
      ● GATK
      ● polyMNP caller
      ● variant phasing
      ● haplotype calling
    variant annotation
      ● classification
      ● variant effect/VUS pipeline
      ● quantitative risk modeling
    reporting
      ● overall pipeline versioning
      ● lab director oversight

  [Report excerpt callouts: known pathogenic, novel pathogenic, and VUS variants appear in distinct sections of the report; haplotype alleles, inferred haplotypes, and risk association; absence of known pathogenic variants (covered regions and qualities shown at end of report)]

Curation Database
  The heart of InVitae is the curation database, a manually curated compendium of associations of genomic variants and clinical conditions derived from literature and public sources. The curation database informs assay design and variant interpretation.

  [Figure: curated genomic variants and clinical findings derived from literature and public databases; curation database]
  ☞ See also: 1766W curation process; 1771W variant classification (both Session I)

  [Report excerpt callouts: pathogenic variants inferred from condition-specific rules for the interpretation of novel variants; variants of unknown significance, with and without prior observations; ancestry-aware inference of risk from combination of odds ratios]
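
The poster does not describe how odds ratios are combined, so the following is only a sketch of one common approach: convert an ancestry-specific baseline risk to odds, multiply by the odds ratio of each observed variant, and convert back to a probability. It assumes the variant effects are independent and multiplicative on the odds scale. The function names, ancestry labels, and numeric values are hypothetical placeholders, not InVitae's model or data.

```python
from math import prod
from typing import Iterable

# Placeholder baseline risks per ancestry group (made-up values).
BASELINE_RISK = {"ancestry_A": 0.01, "ancestry_B": 0.02}


def combined_risk(baseline_risk: float, odds_ratios: Iterable[float]) -> float:
    """Combine odds ratios with a baseline risk, assuming independent effects."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    odds = baseline_odds * prod(odds_ratios)  # prod of no values is 1
    return odds / (1.0 + odds)


def risk_for(ancestry: str, odds_ratios: Iterable[float]) -> float:
    """Look up the ancestry-specific baseline, then apply the odds ratios."""
    return combined_risk(BASELINE_RISK[ancestry], odds_ratios)


observed = [1.5, 2.0]  # hypothetical odds ratios for two observed variants
for ancestry in BASELINE_RISK:
    print(ancestry, round(risk_for(ancestry, observed), 4))
```

Working on the odds scale keeps the result a valid probability even when several odds ratios are combined, which a direct multiplication of risks would not guarantee.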
The Trouble with Transcripts
  A consistent definition of a transcript's exon structure is essential to reliably mapping and interpreting variants. Inconsistencies lead to incorrect translations of research findings to clinical settings. We account for the following challenges:
  ● exon structure changes for a single RefSeq accession, e.g., NM_001035.2 (RYR2)
  ● disagreement between reference genome and transcript (3514/33165 transcripts)
  ● suboptimal alignments to the reference genome, e.g., ALMS1
  ● structure and CDS equivalence of RefSeq and Ensembl transcripts
  ● transcript records with atypical record formats (all 18 DMD transcripts)

  [Figure: transcript panels labeled NM_012345.6, NM_012345.6, ENST987654, NM_123456.7]
  [Report excerpt callout: regions where transcript sequences differ from the reference genome are not interpretable]

Variant Simulation and Report Testing
  Curators and developers may easily generate reports for simulated samples with arbitrary collections of curated and novel variants across multiple conditions. Tests may be saved for future execution and regression testing.
  ● simulate variants for specified genders and ancestry
  ● simulate new variants for VUS analysis
  ● select curated variants; create homozygous, heterozygous, and no-data loci
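
The simulation idea described under Variant Simulation and Report Testing can be sketched as follows. This is a minimal illustration, not InVitae's tool: it assumes a simulated sample is a list of loci, each pairing a curated variant identifier with one of the three states named above. The class name, function name, and variant identifiers are hypothetical. A fixed seed makes each simulated sample reproducible, which is what allows a saved test to be replayed for regression testing.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence

ZYGOSITIES = ("homozygous", "heterozygous", "no-data")


@dataclass(frozen=True)
class SimulatedLocus:
    variant_id: str  # placeholder identifier, not a real variant name
    zygosity: str    # one of ZYGOSITIES


def simulate_sample(curated_variants: Sequence[str],
                    n_variants: int,
                    seed: int) -> List[SimulatedLocus]:
    """Pick n_variants distinct curated variants and assign each a state."""
    rng = random.Random(seed)  # private generator, so results depend only on seed
    chosen = rng.sample(list(curated_variants), n_variants)
    return [SimulatedLocus(v, rng.choice(ZYGOSITIES)) for v in chosen]


# Hypothetical curated variant identifiers.
curated = [f"VAR_{i:04d}" for i in range(1, 21)]
sample = simulate_sample(curated, n_variants=5, seed=2012)
for locus in sample:
    print(locus.variant_id, locus.zygosity)
```

Storing only the seed, the variant list, and the requested count is enough to regenerate the same simulated sample later, under the assumption that the selection code itself is unchanged.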

ASHG 2012 Poster
