The Pistoia Alliance is examining the challenges of delivering faster, safe companion diagnostics (CDx) by aligning discovery and clinical data in the regulatory domain.
The slides discuss whether the data standards used in the research environment can be better aligned with those used in the regulated environment. If so, the time and cost of developing NGS-based CDx could be reduced.
It’s likely that most of you know this, but for the sake of completeness:
Note that the term ‘next-generation’ is relative to ‘traditional’ Sanger sequencing and is now a bit stale; NGS is the current generation of sequencing technology.
NGS approaches can range from the sequencing of targeted ‘panels’ of genes implicated in a particular disease, to whole-genomes, with whole-exome an intermediate approach between the two. Exome sequencing targets the 2% or so of the genome that actually codes for proteins.
This chart shows the dramatic decline in the cost of sequencing (2001-2017), which has fuelled the growth of both population-level efforts and clinical applications. Around 2016 we reached the milestone of the $1000 whole genome.
This cost refers to the basic data generation and analysis process, but does not include costs associated with interpreting the data and making decisions based on it.
There is an ever-increasing number of genomics projects, and these are continually increasing in size.
As examples, I have shown many of the projects that look at over 100,000 individuals.
Large genomics projects are a product of both increases in the speed and reductions in the cost of the technology used to generate the data and to analyse it.
Whilst all the initial large-scale projects were research-based, there has been rapid growth in new large-scale projects that apply NGS to clinical samples to support medical diagnostics, to better understand drug actions, and to help generate so-called ‘personalised medicines’ targeted to patients with specific phenotypes or genotypes.
NGS analyses need to satisfy criteria related to validity and utility in order to move from research to clinical use.
Welcome to precisionFDA, the community platform for NGS assay evaluation and regulatory science exploration.
One of the major challenges posed by NGS technology, as compared to some other assay technologies, is the reproducibility of the analysis methods and pipelines that go from raw reads to variants and genotypes. NGS results depend as much on computation as on chemistry.
The precisionFDA Consistency Challenge was an attempt to evaluate various analysis pipelines for concordance with a well-established reference data set (GIAB), as well as the reproducibility of the results from each pipeline. In a minute we’ll see the reproducibility results.
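To make ‘concordance with a reference data set’ concrete, here is a minimal sketch in Python. It is not necessarily how the challenge scored entries: it assumes variants can be compared by exact match on chromosome, position and alleles, whereas real comparisons also have to deal with variant normalization, genotypes and confident regions. The function name and all variant values are made up for illustration.

```python
# Variants are modelled as (chromosome, position, ref allele, alt allele) tuples.
# This exact-match comparison is a simplifying assumption, and all data is invented.

def concordance(called, truth):
    """Return (precision, recall, F1) of a call set against a truth set."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)  # variants found in both sets
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


truth_set = {
    ("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"), ("chr2", 900, "T", "C"),
}
pipeline_calls = {
    ("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"), ("chr3", 10, "A", "T"),  # last call is not in the truth set
}

precision, recall, f1 = concordance(pipeline_calls, truth_set)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the pipeline finds three of the four truth variants and makes one extra call, so precision and recall are both 0.75.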
Each step in this pipeline can be carried out by many possible algorithms, and each adds variability to the end results.
Navigating from ‘Raw Unmapped Reads’ to ‘Use in project’ is an exercise in routefinding.
There is a lot of detail here, but focus on the columns at the right: when running the exact same data through the exact same analysis pipeline twice in succession, only 8 of 18 (accepted) challenge entries produced the same results (Deterministic). This does not include variability introduced by different sequencing platforms, laboratory procedures, personnel, etc.
The last entry in the table represents a difference of about 2.6% of variants detected.
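As a rough illustration of what a reproducibility check like this involves, the sketch below compares the variant sets from two runs of the same pipeline on the same input. The slides do not say how the percentage in the table was computed, so the definition used here (variants present in only one run, divided by all variants seen in either run) is an assumption, as are the function name and the toy data.

```python
# Compare two runs of the same pipeline on the same input.
# The 'fraction differing' definition below is an assumption, not the challenge's metric.

def run_difference(run_a, run_b):
    """Return (is_deterministic, fraction of variants not shared by both runs)."""
    a, b = set(run_a), set(run_b)
    union = a | b
    differing = a ^ b  # variants reported by exactly one of the two runs
    fraction = len(differing) / len(union) if union else 0.0
    return a == b, fraction


# Invented variant calls: (chromosome, position, ref allele, alt allele).
first_run = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 75, "G", "A"), ("chr2", 900, "T", "C")}
second_run = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
              ("chr2", 75, "G", "A"), ("chr3", 10, "A", "T")}

deterministic, fraction = run_difference(first_run, second_run)
print(f"deterministic={deterministic} differing={fraction:.1%}")
```

A deterministic pipeline would give `True` and a fraction of zero; anything else means the two runs disagree on at least one variant.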
A couple of examples from the literature of efforts to standardize parts of the analysis and interpretation process; the first concerns the interpretation of variants (rather than the detection of variants from raw reads, which is more of a technical issue).
There are also technologies that are being developed to help with challenges of reproducibility and data provenance and integrity.
One such effort is the BioCompute Object or BCO, being developed by the FDA and academic and industry partners. BCO builds on other established standards such as the Common Workflow Language and container technologies, to define a single referenceable object that contains the data, processing pipeline and parameters, and results of any analysis. These objects can then be submitted to a regulator and/or databased to be searched and reused by others.
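The idea of a single referenceable object can be sketched in a few lines of Python. This is only an illustration of the concept and not the BioCompute Object schema: the field names, the helper function and the example values are all hypothetical, and a real BCO follows its own specification.

```python
import hashlib
import json

# Illustrative only: these field names are hypothetical and are NOT the BCO schema.

def make_analysis_record(inputs, pipeline_steps, parameters, results):
    """Bundle an analysis into one record with a content-derived identifier."""
    record = {
        "inputs": inputs,            # e.g. names or checksums of the input files
        "pipeline": pipeline_steps,  # ordered list of processing steps
        "parameters": parameters,    # settings used for each step
        "results": results,          # summary of the outputs
    }
    # Serialize with sorted keys so the same content always yields the same hash.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"id": record_id, **record}


record = make_analysis_record(
    inputs=["sample_reads.fastq"],                   # made-up file name
    pipeline_steps=["align", "call_variants"],       # made-up step names
    parameters={"min_base_quality": 20},             # made-up parameter
    results={"variant_count": 4},                    # made-up result
)
print(record["id"])
```

Because the identifier is derived from the content, changing any input, step or parameter produces a different identifier, which is one simple way to make an analysis referenceable and to detect when two submissions describe different analyses.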
The Global Alliance for Genomics and Health (GA4GH) is focused on technologies to support a set of real-world Driver Projects including for example the Clinical Genome Resource (ClinGen), Genomics England, and the NCI Genomic Data Commons.
This is a (not exhaustive) list of some of the data and technology-related efforts that try to address issues of standardization, reproducibility, and sharing of genomic and biomedical data.
There is no shortage of efforts in this area, but that can make it difficult to know which one(s) to follow.
Clarify/recap definitions of CDx
Growth in use of biomarkers
Growth in use of CDx
Add in slide with example of complementary diagnostic