  51. Filtering: Step I
  52. Filtering: Step I
  53. Filtering: Step I
  54. Figure: KS-tests for runs of 1s against the gamma distribution.
  55. Filtering: Step II
  56. Filtering: Step II
  57. An important covariate for VQSR is strand bias.
  58. Quantifying Strand Bias
  59. Filtering: Step II
  60. Filtering: Step II
  61. Filtering: Step III
  62. Filtering: Step III
  63. Features of the Data
  64. Features of the Data
  65. Sequencing Quality
  66. Sequencing Quality
  67. Logi...
  68. Filtering: Step III
  69. Filtering: Step IV
  70. Filtering: Step IV
  71. Filtering: Step IV
  72. Filtering: Step IV
  73. (Future Work) Filtering: S...
  74. Results
  75. Results
  76. Conclusions
09 apr2012 presentation


  1. Copy-number Variations in Lymphoblastoid Cell Lines
      Fei Yu, Carnegie Mellon University
      April 4, 2012
      Advisors: Bernie Devlin, Kathryn Roeder, Chad Schafer
  2. Motivation
      • Advances in DNA sequencing technology and in the study of rare genetic diseases such as autism.
      • Data collection rush: 100,000 samples in 15 years.
      • Money. Time. Logistics.
  3. Motivation
      A decade ago, researchers had few successes in finding genetic variants that cause rare diseases. One of the challenges was that they could only afford to look at small regions of the genome that they thought were linked to the disease. Today, as DNA sequencing technology develops, cheap and fast whole-genome sequencing has become available. Now researchers can look at all the genes.
  5. Motivation
      The graph shows the cost of sequencing a genome over the past decade. In 2001, the cost was 100 million dollars, which was prohibitively high. Today, a company called Illumina offers the service at $5,000 per genome, and even gives a 20% discount on orders of 50 genomes or more. The drastic drop in cost triggered a rush to collect as many DNA samples as possible: it is projected that in 15 years we will have over 100,000 samples.
  7. Motivation
      Despite the relatively low cost per genome, gathering so many samples still costs hundreds of millions of dollars. Building the infrastructure to store, maintain, and distribute the data can cost as much as the sequencing itself. Furthermore, because these experiments involve human subjects, researchers must also obtain permission from the patients and safeguard their privacy. All in all, it is a huge investment of our society's resources.
  8. Motivation
      But there is one problem: most DNA sequencing projects use lymphoblastoid cell lines instead of peripheral blood.
      Cell line
        - Immortal(!)
        - Cultivated from peripheral blood
      Blood
        - Obtained from peripheral blood cells, consisting of red blood cells, white blood cells, and platelets
        - Best source of the DNA
        - Mortal
  9. Motivation
      But there is one problem: most DNA sequencing projects use lymphoblastoid cell lines instead of peripheral blood. Cell lines are immortal, so they are suitable for permanent storage, but they are products of cultivating peripheral blood. Blood data are obtained directly from peripheral blood cells, consisting of red blood cells, white blood cells, and platelets; they are the best source of the DNA. However, because blood cells are mortal, it is not practical to store them for later use. That is why people use cell lines for sequencing.
  10. Motivation
      Are cell line data truthful representations of the DNA?
      In other words, how close are cell line data to blood data?
  11. Motivation
      Our concern is whether cell line data are truthful representations of the DNA. In other words, we want to know how close cell line data are to blood data. If the cell lines are corrupted, any subsequent analyses will lose their basis, and all the time, money, and effort invested in collecting these DNA samples will have been wasted.
  12. Outline
      1 Motivation
          A Strange Scenario
      2 Data
          Pipeline
      3 How to detect CNVs
          Filtering: Step I
          Filtering: Step II
          Filtering: Step III
          Filtering: Step IV
          Results
      4 Conclusions
  13. Inference from Blood and Cell: A Strange Scenario
      For a diploid organism (human):
                                Chromosome p1
                                A     B
          Chromosome p2    A    AA    AB
                           B    BA    BB
      Homozygous if AA or BB. Heterozygous if AB or BA.
  14. Inference from Blood and Cell: A Strange Scenario
      For diploid organisms such as humans, chromosomes come in pairs. Each chromosome contains one copy of a gene. An allele is one of two or more forms of a gene. If both alleles on a pair of chromosomes are the same, we call the genetic locus homozygous; if the alleles differ, we call the genetic locus heterozygous.
  15. Inference from Blood and Cell: A Strange Scenario
      1 = Heterozygous, 0 = Homozygous
      Figure: zygosity of the blood and of the cell line along genomic locations; at location 150, the blood is 1 and the cell line is 0.
  16. Inference from Blood and Cell: A Strange Scenario
      Denote a heterozygous locus by 1 and a homozygous locus by 0. The picture shows that at location 150, the blood is heterozygous and the cell line is homozygous.
  17. Inference from Blood and Cell: A Strange Scenario
  18. Inference from Blood and Cell: A Strange Scenario
      If we look only at loci at which the blood is heterozygous, we may encounter the situation depicted in this picture: there are consecutive loci that are homozygous in the cell line but heterozygous in the blood. This looks suspicious.
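The suspicious pattern just described can be flagged mechanically. The sketch below is a minimal illustration, not the author's code: it keeps only loci where the blood is called heterozygous (1) and reports maximal runs of consecutive such loci at which the cell line is called homozygous (0). The function name `suspicious_runs` and the `min_length` cutoff are hypothetical.

```python
from typing import List, Tuple

def suspicious_runs(blood: List[int], cell: List[int],
                    min_length: int = 2) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) of each maximal run of consecutive
    blood-heterozygous loci at which the cell line is homozygous.

    blood, cell: per-locus zygosity (1 = heterozygous, 0 = homozygous).
    Loci where the blood is homozygous are skipped, not treated as breaks.
    """
    if len(blood) != len(cell):
        raise ValueError("blood and cell must have the same length")
    het = [i for i, b in enumerate(blood) if b == 1]  # positions where blood = 1
    runs = []
    start = None  # index into `het` where the current run began
    for k, pos in enumerate(het + [None]):  # trailing sentinel closes the last run
        is_suspicious = pos is not None and cell[pos] == 0
        if is_suspicious and start is None:
            start = k
        elif not is_suspicious and start is not None:
            if k - start >= min_length:
                runs.append((het[start], het[k - 1]))
            start = None
    return runs

print(suspicious_runs([1, 1, 1, 0, 1, 1, 1], [1, 0, 0, 0, 0, 1, 0]))  # [(1, 4)]
```

In the example, index 3 is skipped because the blood is homozygous there, so indices 1, 2 and 4 form a single run of length 3; the lone homozygous call at index 6 is shorter than `min_length` and is not reported.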
  19. Detour: What is Copy-number Variation?
      Copy-number variations (CNVs) correspond to relatively large regions of the genome that have been deleted on a chromosome.
  20. Detour: What is Copy-number Variation?
      Now we take a detour and define copy-number variation. Copy-number variations (CNVs) correspond to relatively large regions of the genome that have been deleted on a chromosome. The picture shows the black region deleted from the chromosome.
  21. Inference from Blood and Cell: A Strange Scenario
      What a CNV in a cell line looks like:
      Figure panels: Blood, Cell
  22. Inference from Blood and Cell: A Strange Scenario
      In this picture, the blood, which can be thought of as a representation of the DNA, is heterozygous. The cell line, on the other hand, has the red region deleted. When we sequence the samples, we look at both chromosomes, but because the red region in the cell line is deleted, we can only sequence the remaining chromosome. As a result of the deletion, the cell line will always tell us that this genetic locus is homozygous even though the DNA is heterozygous.
  23. Inference from Blood and Cell: A Strange Scenario
      This could be a CNV!
  24. Inference from Blood and Cell: A Strange Scenario
      Let's go back to this picture. This scenario fits the profile of a CNV. If this indeed happens in the cell line, we know the cell line is corrupted in that region.
  25. Goal
      Having CNVs in the cell line means the cell line is locally corrupted. The goal of this project is to use the amount of CNVs to quantify how reliable the cell line is as a source of DNA.
  26. Data
      The data we have:
      • 16 individuals' entire exomes sequenced by next-generation sequencing (NGS) technology.
      • Each individual is sequenced twice: once using blood samples and once using cell line samples.
  27. Data
      These data allow us to compare cell line data with blood data and answer the question of whether they are the same.
  28. Pipeline
      blood and cell line samples --(NGS)--> BAM files
      BAM files --(GATK)--> VCF files
      BAM files --(Samtools)--> additional locus-specific information
      VCF files + additional locus-specific information --(Python scripts)--> data ready for analysis
  29. Pipeline: NGS
      Figures: the pipeline diagram and an illustration of next-generation sequencing.
  30. Pipeline: NGS
      Next-generation sequencing (NGS) technology
      Advantages:
      • Fast
      • Cost-effective
      Disadvantages:
      • Short DNA reads; the fragments are randomly located ⇒ a great challenge for fragment assembly and mapping
  31. Pipeline: BAM files
      Our raw data are BAM files. Their sizes are huge:
      • they encode the whole genome's nucleotide alignments;
      • they also encode the quality of each read at a given locus (a locus can be covered by as many as 1,000 reads).

                                               Mt. Sinai    Vanderbilt
      # of subjects                            7            12
      # of subjects that have corrupted data   1            2
      Average file size                        7.4 GiB      17 GiB
      Total size                               ≈ 85 GiB     ≈ 340 GiB
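The numbers in the table are consistent with one reading, which is an assumption rather than something the slide states: subjects with corrupted data are excluded, and each remaining subject contributes two BAM files (blood and cell line). Under that reading, the usable subjects add up to the 16 individuals on the Data slide, the Vanderbilt total is exactly 20 × 17 GiB = 340 GiB, and the Mt. Sinai product (12 × 7.4 GiB ≈ 89 GiB) is close to, though not exactly, the reported ≈ 85 GiB. A small arithmetic sketch (names are hypothetical):

```python
# Hypothetical reading of the BAM-file table: corrupted subjects are dropped and
# every remaining subject has two BAM files (one blood, one cell line).
SITES = {
    # site: (subjects, subjects with corrupted data, average BAM size in GiB)
    "Mt. Sinai": (7, 1, 7.4),
    "Vanderbilt": (12, 2, 17.0),
}

def usable_subjects(subjects: int, corrupted: int) -> int:
    return subjects - corrupted

def estimated_total_gib(subjects: int, corrupted: int, avg_gib: float,
                        files_per_subject: int = 2) -> float:
    return usable_subjects(subjects, corrupted) * files_per_subject * avg_gib

for site, (n, bad, avg) in SITES.items():
    print(f"{site}: {usable_subjects(n, bad)} usable subjects, "
          f"~{estimated_total_gib(n, bad, avg):.1f} GiB")

total_usable = sum(usable_subjects(n, bad) for n, bad, _ in SITES.values())
print("usable subjects overall:", total_usable)
```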
  33. Pipeline: BAM files
  34. Pipeline: GATK, Samtools
      • Genome Analysis Toolkit (GATK):
        - makes inferences from the BAM files and determines whether a locus is homozygous or heterozygous;
        - applies different filters to obtain the desired results.
      • Samtools: extracts read-level information such as sequencing quality, alignment quality, and read direction.
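One way Samtools exposes read direction is the read-bases column of its `mpileup` text output: `.` and `,` mark reference matches on the forward and reverse strand, upper- and lower-case letters mark mismatches on the forward and reverse strand, `^` (followed by a mapping-quality character) and `$` mark read starts and ends, and `+N`/`-N` introduce an indel of N bases. The following is a minimal sketch, assuming that format, of tallying per-strand allele counts at one locus; the function name is hypothetical and this is not the author's script.

```python
from collections import Counter

def strand_counts(ref_base: str, bases: str) -> Counter:
    """Tally (allele, strand) pairs from one `samtools mpileup` read-bases column.

    Keys are (allele, "+") for forward-strand reads and (allele, "-") for
    reverse-strand reads. Indel annotations, read start/end markers and
    deletion or reference-skip placeholders are skipped.
    """
    ref = ref_base.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":            # start of a read; next character is its mapping quality
            i += 2
        elif c == "$":          # end of a read
            i += 1
        elif c in "+-":         # indel: sign, length digits, then that many bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        else:
            if c == ".":
                counts[(ref, "+")] += 1
            elif c == ",":
                counts[(ref, "-")] += 1
            elif c in "ACGTN":
                counts[(c, "+")] += 1
            elif c in "acgtn":
                counts[(c.upper(), "-")] += 1
            # any other symbol ('*', '#', '>', '<') is a deletion or skip placeholder
            i += 1
    return counts

print(strand_counts("T", ".,.,GgG^].$+2AC,"))
```

Comparing the forward and reverse counts of the non-reference allele is one simple view of strand bias; it is only an illustration of the kind of read-level information mentioned on the slide.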
  36. Pipeline: GATK, Samtools
      Processing time: ∼1 day.
      GATK outputs:
          [HEADER LINES]
          #CHROM  POS     ID          REF  ALT  QUAL     FILTER   INFO           FORMAT          NA12878
          chr1    873762  .           T    G    5231.78  PASS     [ANNOTATIONS]  GT:AD:DP:GQ:PL  0/1:173,141:282:99
          chr1    877664  rs3828047   A    G    3931.66  PASS     [ANNOTATIONS]  GT:AD:DP:GQ:PL  1/1:0,105:94:99:25
          chr1    899282  rs28548431  C    T    71.77    PASS     [ANNOTATIONS]  GT:AD:DP:GQ:PL  0/1:1,3:4:25.92:10
          chr1    974165  rs9442391   T    C    29.84    LowQual  [ANNOTATIONS]  GT:AD:DP:GQ:PL  0/1:14,4:14:60.91:
  37. Pipeline: tidy up
      Python scripts:
      • extract useful information from GATK's and Samtools' outputs;
      • prepare data for analysis in R.
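As an illustration of the tidy-up step, here is a minimal sketch (not the author's script) that turns one VCF data line, like those in the GATK output above, into the zygosity indicator used later ($G = 1$ for a heterozygous call, $0$ for homozygous). It assumes tab-separated fields as in the VCF specification, finds `GT` by its position in the `FORMAT` column, and treats phased (`|`) and unphased (`/`) genotypes alike; the function name is hypothetical, and the example replaces the slide's elided `[ANNOTATIONS]` with `.`.

```python
from typing import Optional, Tuple

def zygosity_from_vcf_line(line: str,
                           sample_index: int = 0) -> Optional[Tuple[str, int, str, int]]:
    """Parse one VCF data line and return (CHROM, POS, FILTER, G), where
    G = 1 if the sample's GT is heterozygous and 0 if homozygous.
    Returns None for header lines and for missing genotypes."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    chrom, pos, filt = fields[0], int(fields[1]), fields[6]
    keys = fields[8].split(":")                   # FORMAT column, e.g. GT:AD:DP:GQ:PL
    values = fields[9 + sample_index].split(":")  # matching per-sample values
    gt = values[keys.index("GT")]
    alleles = gt.replace("|", "/").split("/")     # phased and unphased treated alike
    if "." in alleles:
        return None
    return chrom, pos, filt, int(len(set(alleles)) > 1)

example = "\t".join(["chr1", "873762", ".", "T", "G", "5231.78", "PASS", ".",
                     "GT:AD:DP:GQ:PL", "0/1:173,141:282:99"])
print(zygosity_from_vcf_line(example))  # ('chr1', 873762, 'PASS', 1)
```

Keeping the `FILTER` value lets later steps drop calls such as `LowQual` without re-reading the VCF.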
  39. Notations
      Let $T$ denote the zygosity of a genetic locus:
      $$T = \begin{cases} 1 & \text{if the locus is heterozygous} \\ 0 & \text{if the locus is homozygous} \end{cases}$$
      Let $G$ denote the zygosity called by GATK:
      $$G = \begin{cases} 1 & \text{if the call is heterozygous} \\ 0 & \text{if the call is homozygous} \end{cases}$$
      Let
      $$f_+ = P(G = 1 \mid T = 0) \quad \text{[false positive]}, \qquad f_- = P(G = 0 \mid T = 1) \quad \text{[false negative]}$$
  42. Distribution of $(G_B, G_C)$ I
      Here the subscripts $B$ and $C$ refer to the blood and the cell line. We can describe the distribution of the observations $(G_B, G_C)$ in four cases (rows: blood call $G_B$; columns: cell call $G_C$):

      (I) $T_B = T_C = 0$
      $$\begin{array}{c|cc} G_B \backslash G_C & 0 & 1 \\ \hline 0 & (1-f_+)^2 & (1-f_+)f_+ \\ 1 & f_+(1-f_+) & f_+^2 \end{array}$$

      (II) $T_B = 0,\ T_C = 1$ (i.e., a mutation)
      $$\begin{array}{c|cc} G_B \backslash G_C & 0 & 1 \\ \hline 0 & (1-f_+)f_- & (1-f_+)(1-f_-) \\ 1 & f_+ f_- & f_+(1-f_-) \end{array}$$
  43. Distribution of $(G_B, G_C)$ II
      (III) $T_B = 1,\ T_C = 0$ (i.e., a deletion)
      $$\begin{array}{c|cc} G_B \backslash G_C & 0 & 1 \\ \hline 0 & f_-(1-f_+) & f_- f_+ \\ 1 & (1-f_-)(1-f_+) & (1-f_-)f_+ \end{array}$$

      (IV) $T_B = T_C = 1$ (i.e., not a deletion)
      $$\begin{array}{c|cc} G_B \backslash G_C & 0 & 1 \\ \hline 0 & f_-^2 & f_-(1-f_-) \\ 1 & (1-f_-)f_- & (1-f_-)^2 \end{array}$$
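All four tables follow from one assumption that is implicit in them: given the true zygosities, the blood call and the cell-line call err independently, each with the rates $f_+$ and $f_-$ from the Notations slide. A small sketch that builds the tables this way (function names and the rates 0.05 and 0.01 are illustrative placeholders, not estimates from the data):

```python
from itertools import product
from typing import Dict, Tuple

def p_call(g: int, t: int, f_pos: float, f_neg: float) -> float:
    """P(G = g | T = t) given the false-positive rate f_+ and false-negative rate f_-."""
    if t == 0:
        return f_pos if g == 1 else 1 - f_pos
    return 1 - f_neg if g == 1 else f_neg

def joint_table(t_blood: int, t_cell: int,
                f_pos: float, f_neg: float) -> Dict[Tuple[int, int], float]:
    """P(G_B = gb, G_C = gc | T_B = t_blood, T_C = t_cell), assuming the two
    calls err independently given the true zygosities."""
    return {(gb, gc): p_call(gb, t_blood, f_pos, f_neg) * p_call(gc, t_cell, f_pos, f_neg)
            for gb, gc in product((0, 1), repeat=2)}

CASES = {"I": (0, 0), "II": (0, 1), "III": (1, 0), "IV": (1, 1)}

for name, (tb, tc) in CASES.items():
    table = joint_table(tb, tc, f_pos=0.05, f_neg=0.01)  # illustrative rates only
    print(name, {cell: round(p, 4) for cell, p in table.items()})
```

Each table sums to 1, and every entry is the product of a row factor and a column factor, which is exactly the structure of the four tables above.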
  44. Probability of observing $(G_B = 1, G_C = 0)$ in each of the four possible cases.
      Figure: a tree that splits first on $T_B$ ($T_B = 0$ or $T_B = 1$) and then on $T_C$; under $T_B = 1$ the branches are "Deletion ($T_C = 0$)" and "No deletion ($T_C = 1$)". Leaves: Case I, Case II, Case III, Case IV.
  45. Probability of observing $(G_B = 1, G_C = 0)$ in each of the four possible cases.
      Let's focus on the $(G_B = 1, G_C = 0)$ observations and find out which of them actually come from CNVs.
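Reading the $(G_B = 1, G_C = 0)$ cell off each of the four tables gives $f_+(1-f_+)$ for Case I, $f_+ f_-$ for Case II, $(1-f_-)(1-f_+)$ for Case III, and $(1-f_-)f_-$ for Case IV. The sketch below evaluates them at illustrative rates near the Step I values ($f_- \approx 0$, $\hat f_+ \approx 0.05$); the function name is hypothetical. These are probabilities *conditional on* each case, not the probability that a given observation came from that case: Cases II and IV vanish when $f_- = 0$, but Case I can still contribute many $(1, 0)$ observations if Case I loci are common, which is what the later filtering steps address.

```python
def prob_blood1_cell0(f_pos: float, f_neg: float) -> dict:
    """P(G_B = 1, G_C = 0 | case), read off the four (G_B, G_C) tables."""
    return {
        "I": f_pos * (1 - f_pos),          # T_B = 0, T_C = 0
        "II": f_pos * f_neg,               # T_B = 0, T_C = 1 (mutation)
        "III": (1 - f_neg) * (1 - f_pos),  # T_B = 1, T_C = 0 (deletion)
        "IV": (1 - f_neg) * f_neg,         # T_B = 1, T_C = 1 (no deletion)
    }

# Illustrative rates near the Step I values (f_- close to 0, f_+ close to 0.05).
probs = prob_blood1_cell0(f_pos=0.05, f_neg=0.0)
for case, p in probs.items():
    print(f"Case {case}: {p:.4f}")
```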
  46. More on GATK
      GATK takes into account the count of each nucleotide type, the read quality, and the mapping quality at a genetic locus to make an inference about its true zygosity.
      But the inference is not always accurate. Luckily, we can control how GATK makes mistakes, which I will explain in a moment.
  48. Filtering
Cases: I ($T_B = 0, T_C = 0$), II ($T_B = 0, T_C = 1$), III ($T_B = 1, T_C = 0$), IV ($T_B = 1, T_C = 1$).
Outline:
1 Use GATK to minimize Case II and Case IV by controlling threshold parameters that reduce $f_-$ at the expense of allowing a larger $f_+$.
2 Filter the variants called in the previous step and eliminate calls with poorer quality metrics. By reducing $f_+$, we can eliminate many variants in Case I.
3 Use hypothesis tests to pick out Case III candidate loci.
4 Fit the candidate loci to a hidden Markov model to pick out the most likely candidate loci.
  49. Filtering: Step I
Cases: I ($T_B = 0, T_C = 0$), II ($T_B = 0, T_C = 1$), III ($T_B = 1, T_C = 0$), IV ($T_B = 1, T_C = 1$).
• Run GATK with low threshold parameters to obtain a crude set of loci.
• Effects: $f_- \approx 0$, increase $f_+$.
• $f_- \approx 0 \implies$ minimize Case II and Case IV.
• $f_+$ is bounded above by a small number:
  $\hat{f}_+ = \dfrac{\#(1,0) + \#(0,1)}{\#(1,0) + \#(0,1) + \#(G_B = 0, G_C = 0)} \approx 0.05$
• Minimize Case II and Case IV. Retain Case I and Case III. Number of loci retained = 15,971.
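The Step I bound is simple arithmetic on the counts of concordant and discordant call pairs. A minimal sketch; the function name and the counts below are made up for illustration and are not the study's data.

```python
def f_plus_hat(counts):
    """Step I bound on f_+:
    (#(1,0) + #(0,1)) / (#(1,0) + #(0,1) + #(G_B = 0, G_C = 0)).

    `counts` maps a (G_B, G_C) pair to the number of loci with that pair;
    concordant (1, 1) calls do not enter the formula.
    """
    discordant = counts.get((1, 0), 0) + counts.get((0, 1), 0)
    return discordant / (discordant + counts.get((0, 0), 0))


# Hypothetical counts for illustration only, not the study's data.
example = {(0, 0): 900, (1, 0): 30, (0, 1): 20, (1, 1): 5000}
print(round(f_plus_hat(example), 4))  # prints 0.0526
```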
  54. Figure: KS tests for runs of 1s against the gamma distribution. Shape and scale parameters for the gamma are estimated for each chromosome and for each individual. Cells with fewer than 20 runs are indicated by "-". Cells with p-value > 0.05 are colored grey.
• Runs of 1s are interrupted randomly by short runs of 0s.
• Many of the 0 calls are just random noise.
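A standard-library sketch of the per-chromosome, per-individual check behind this figure. Assumptions that are ours, not the slide's: the gamma parameters are fitted by the method of moments (the slide does not say how they were estimated), the p-value uses the asymptotic Kolmogorov distribution with a common small-sample correction, and `min_runs=20` mirrors the "-" cells. Because the parameters come from the same runs and run lengths are integers, the p-value is only approximate and tends to be conservative. All function names and the simulated calls are hypothetical.

```python
import math
import random
from itertools import groupby


def run_lengths(calls, value=1):
    """Lengths of the maximal runs of `value` in a 0/1 call sequence."""
    return [len(list(grp)) for key, grp in groupby(calls) if key == value]


def gamma_mom(xs):
    """Method-of-moments (shape, scale) estimates for a gamma distribution."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    if var == 0:
        raise ValueError("all runs have the same length; gamma fit undefined")
    return mean * mean / var, var / mean


def gamma_cdf(x, shape, scale, eps=1e-14, max_iter=1000):
    """Gamma CDF = regularized lower incomplete gamma P(shape, x / scale).
    Power series below shape + 1, Lentz continued fraction above."""
    z = x / scale
    if z <= 0:
        return 0.0
    log_pre = shape * math.log(z) - z - math.lgamma(shape)
    if z < shape + 1:
        term = total = 1.0 / shape
        for n in range(1, max_iter):
            term *= z / (shape + n)
            total += term
            if term < eps * total:
                break
        return math.exp(log_pre) * total
    tiny = 1e-300
    b = z + 1 - shape
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - shape)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        h *= d * c
        if abs(d * c - 1) < eps:
            break
    return 1.0 - math.exp(log_pre) * h


def ks_statistic(xs, cdf):
    """One-sample Kolmogorov-Smirnov distance between the sample and `cdf`."""
    xs = sorted(xs)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


def ks_pvalue(d, n):
    """Asymptotic Kolmogorov p-value with the usual small-sample correction."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:  # the series equals ~1 here and converges slowly
        return 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return max(0.0, min(1.0, p))


def gamma_ks_test(calls, min_runs=20):
    """KS test of the runs of 1s against a gamma fitted to those runs.
    Returns None (a "-" cell) when there are fewer than `min_runs` runs."""
    runs = run_lengths(calls, 1)
    if len(runs) < min_runs:
        return None
    shape, scale = gamma_mom(runs)
    d = ks_statistic(runs, lambda x: gamma_cdf(x, shape, scale))
    return d, ks_pvalue(d, len(runs))


# Hypothetical calls: 40 runs of 1s broken by short runs of 0s.
random.seed(0)
calls = []
for _ in range(40):
    calls += [1] * (1 + int(random.expovariate(1 / 15))) + [0] * random.randint(1, 2)
print(gamma_ks_test(calls))
```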
  55. Filtering: Step II
Cases: I ($T_B = 0, T_C = 0$), III ($T_B = 1, T_C = 0$).
• Run GATK's Variant Quality Score Recalibration (VQSR) to filter out the false positive calls (loci in Case I).
• VQSR: fit a Gaussian mixture model to known variants and novel variants; filter based on the score of the variants.
• Effect: decrease $f_+$.
• Eliminate most of Case I. Retain Case III. Number of loci retained = 380.
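VQSR itself models several variant annotations jointly and has its own training and scoring scheme; the sketch below is not VQSR and only illustrates the Gaussian mixture idea in one dimension. It fits a mixture to hypothetical annotation values at known variant sites by expectation-maximization (EM), then scores a site by the log of the fitted density, so that low scores would be the ones filtered. Function names, data, and the variance floor are our assumptions.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, init_means, iters=200):
    """Fit a 1-D Gaussian mixture by expectation-maximization (EM).
    Returns (weights, means, variances)."""
    n, k = len(xs), len(init_means)
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    weights, mus, vars_ = [1.0 / k] * k, list(init_means), [overall_var] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, vars_)]
            total = sum(dens) or 1e-300
            resp.append([p / total for p in dens])
        # M-step: re-estimate weights, means, and variances (variance floored).
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, mus, vars_


def log_density(x, weights, mus, vars_):
    """Log of the fitted mixture density at x; low values suggest an outlying site."""
    dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, vars_))
    return math.log(max(dens, 1e-300))  # floor avoids log(0) far from every component


# Hypothetical values of one annotation at known (training) variant sites.
known = [0.0, 0.1, -0.1, 0.05, -0.05, 5.0, 5.1, 4.9, 5.05, 4.95]
weights, mus, vars_ = fit_gmm_1d(known, init_means=[1.0, 4.0])
for site in (0.02, 1.0):
    print(site, round(log_density(site, weights, mus, vars_), 2))
```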
  57. An important covariate for VQSR is strand bias.
DNA's double helix structure: forward and backward strands.
Definition: Strand bias is the tendency to make more variant calls on one strand than on the other.
  58. Quantifying Strand Bias
• Fisher's exact test: given the margins, the observed table has probability
  $\binom{n_{1\cdot}}{n_{11}} \binom{n_{2\cdot}}{n_{21}} \Big/ \binom{n_{\cdot\cdot}}{n_{\cdot 1}}$,
  and the p-value sums this probability over all tables with the same margins that are at least as extreme.
                 Forward          Backward
  Reference      $n_{11}$         $n_{12}$         $n_{1\cdot}$
  Alternative    $n_{21}$         $n_{22}$         $n_{2\cdot}$
                 $n_{\cdot 1}$    $n_{\cdot 2}$    $n_{\cdot\cdot}$
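A standard-library sketch of Fisher's exact test for the strand table: `table_prob` computes the hypergeometric probability of a 2×2 table from its margins, and `fisher_exact_two_sided` adds up the probabilities of every same-margin table that is no more likely than the observed one (the usual two-sided rule; the small relative tolerance guards against floating-point ties). Function names and the example counts are ours; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def table_prob(n11, n12, n21, n22):
    """Hypergeometric probability of a 2x2 table given its margins."""
    row1, row2 = n11 + n12, n21 + n22
    col1 = n11 + n21
    return comb(row1, n11) * comb(row2, n21) / comb(row1 + row2, col1)


def fisher_exact_two_sided(n11, n12, n21, n22):
    """Two-sided p-value: sum over same-margin tables no more likely than the observed one."""
    row1, row2 = n11 + n12, n21 + n22
    col1 = n11 + n21
    p_obs = table_prob(n11, n12, n21, n22)
    p = 0.0
    # a = n11 ranges over every value compatible with the fixed margins.
    for a in range(max(0, col1 - row2), min(row1, col1) + 1):
        p_a = table_prob(a, row1 - a, col1 - a, row2 - col1 + a)
        if p_a <= p_obs * (1 + 1e-7):
            p += p_a
    return min(1.0, p)


# Hypothetical read counts at one locus
# (rows: reference / alternative; columns: forward / backward).
print(fisher_exact_two_sided(3, 1, 1, 3))  # about 0.4857
```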
  61. Filtering: Step III
Cases: I ($T_B = 0, T_C = 0$), III ($T_B = 1, T_C = 0$).
• For each locus, do a hypothesis test:
  $H_0: T_B = T_C$ versus $H_1: T_B \neq T_C$
• Logistic regression:
  $I_{G=1} \sim I_{\text{isblood}} + \text{strand direction} + \text{base quality} + \text{mapping quality}$
• Find loci for which $I_{\text{isblood}}$ is significant at the 10% level. Number of Case III candidates = 126.
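A simplified sketch of the test on $I_{\text{isblood}}$, under assumptions that are ours: each observation at a locus is a read whose response is 1 when it carries the alternative allele (one reading of $I_{G=1}$), and strand direction and the quality covariates are dropped. With a single binary covariate, the logistic-regression slope has a closed form, the log odds ratio of the 2×2 read table, with standard error $\sqrt{1/a + 1/b + 1/c + 1/d}$, so the Wald test needs no iterative fit. The function name, the 0.5 Haldane-Anscombe correction for empty cells, and the counts are hypothetical.

```python
import math


def blood_effect_test(blood_alt, blood_ref, cell_alt, cell_ref, correction=0.5):
    """Wald test of the blood indicator in a logistic regression whose only
    covariate is I_isblood. The MLE slope is the log odds ratio of the 2x2
    table; `correction` is added to every cell so empty cells do not break
    the logarithm (an assumption, not part of the slide's model)."""
    a, b, c, d = (x + correction for x in (blood_alt, blood_ref, cell_alt, cell_ref))
    slope = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = slope / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return slope, z, p_value


# Hypothetical read counts at one locus.
slope, z, p = blood_effect_test(blood_alt=12, blood_ref=10, cell_alt=1, cell_ref=20)
print(p < 0.10)  # True: a Case III candidate at the 10% level
```

The full model on the slide, with strand direction and quality covariates, would need an iterative fit such as iteratively reweighted least squares.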
  63. Features of the Data
• from blood or cell line
• strand direction (forward or backward)
• sequencing quality
  65. Sequencing Quality
Quality is inversely related to P(error).
• base quality: quality of a read at a genetic locus; determined by the sequencing equipment.
• mapping quality: alignment quality of a read; calculated from base qualities and the reference sequence.
base quality + mapping quality $\implies$ genotype likelihood: the likelihood of a locus being homozygous or heterozygous.
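A simplified sketch of how base and mapping qualities can feed a genotype likelihood. It is not GATK's actual model: it assumes a diploid site with one reference and one alternative allele, treats reads as independent, and combines the two Phred-scaled qualities by requiring both the base call and the alignment to be correct, which is our assumption. Function names and the pileup are hypothetical.

```python
import math


def phred_to_error(q):
    """Phred scale: Q = -10 * log10(P(error)), so P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def read_error(base_q, map_q):
    """ASSUMPTION (not GATK's formula): a base is trusted only if both the
    base call and the read's alignment are correct."""
    return 1 - (1 - phred_to_error(base_q)) * (1 - phred_to_error(map_q))


def genotype_log10_likelihoods(bases, errors, ref, alt):
    """log10 P(observed bases | genotype) for hom-ref, het, and hom-alt.
    Each read comes from either chromosome copy with probability 1/2; a base
    is read correctly with probability 1 - e, otherwise as one of the three
    other bases with probability e/3 each. Reads are treated as independent."""
    genotypes = {"hom_ref": (ref, ref), "het": (ref, alt), "hom_alt": (alt, alt)}
    result = {}
    for name, alleles in genotypes.items():
        total = 0.0
        for base, e in zip(bases, errors):
            p = sum((1 - e) if base == allele else e / 3 for allele in alleles) / 2
            total += math.log10(p)
        result[name] = total
    return result


bases = "AAAAAGGGGG"  # hypothetical pileup at one locus
errors = [read_error(30, 40)] * len(bases)
ll = genotype_log10_likelihoods(bases, errors, ref="A", alt="G")
print(max(ll, key=ll.get))  # het
```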
  67. Logistic Regression
$I_{G=1} \sim I_{\text{isblood}} + \text{strand direction} + \text{base quality} + \text{mapping quality}$
• Each locus is fit to its own logistic regression model.
• We perform the deviance $\chi^2$ goodness-of-fit test for each model; only 2.4% of the tests are significant at the 5% level.
[Figure: histogram of p-values from the $\chi^2$ tests of the residual deviance; x-axis: p-values (0 to 1), y-axis: frequency.]
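The deviance goodness-of-fit check compares each model's residual deviance with a $\chi^2$ distribution on the residual degrees of freedom (observations minus fitted coefficients). A standard-library sketch: in practice the fitted probabilities would come from the per-locus logistic fit, while here they and the responses are hypothetical, and counting five coefficients (an intercept plus the four terms of the model, each assumed to be a single column) is our assumption.

```python
import math


def residual_deviance(y, p_hat):
    """D = -2 * sum(y log p + (1 - y) log(1 - p)) for a binary response."""
    return -2 * sum(math.log(p) if yi == 1 else math.log(1 - p) for yi, p in zip(y, p_hat))


def chi2_sf(x, df, eps=1e-14, max_iter=1000):
    """P(X > x) for X ~ chi-square(df), i.e. the regularized upper incomplete
    gamma Q(df/2, x/2): power series for small x/2, continued fraction otherwise."""
    a, z = df / 2, x / 2
    if z <= 0:
        return 1.0
    log_pre = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1:
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= z / (a + n)
            total += term
            if term < eps * total:
                break
        return 1.0 - math.exp(log_pre) * total
    tiny = 1e-300
    b = z + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        h *= d * c
        if abs(d * c - 1) < eps:
            break
    return math.exp(log_pre) * h


def deviance_gof_pvalue(y, p_hat, n_coef):
    """Compare the residual deviance with chi-square(n - n_coef)."""
    return chi2_sf(residual_deviance(y, p_hat), len(y) - n_coef)


y = [1, 0, 1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.3, 0.6, 0.7, 0.2, 0.4, 0.9, 0.1]  # hypothetical fitted values
print(deviance_gof_pvalue(y, p_hat, n_coef=5))
```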
  69. Filtering: Step IV
Did the Case III candidates come from CNVs?
Define the length of a run of 0s as the number of consecutive $(G_B, G_C) = (1, 0)$ calls.
  71. Filtering: Step IV
[Figure: density estimate of the lengths of runs of $(G_B, G_C) = (1, 0)$ calls (N = 3286, bandwidth = 0.127), with runs beyond the 95% quantile marked.]
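A minimal sketch of the run-length definition and the 95% cut. It assumes the calls are ordered along the chromosome and uses linear interpolation between order statistics for the quantile (R's default, type 7); the slide does not say which quantile rule was used, and the function names and calls are hypothetical.

```python
import math
from itertools import groupby


def mismatch_runs(pairs):
    """Lengths of runs of consecutive (G_B, G_C) = (1, 0) calls, in locus order."""
    flags = (pair == (1, 0) for pair in pairs)
    return [len(list(grp)) for is_mismatch, grp in groupby(flags) if is_mismatch]


def empirical_quantile(xs, q):
    """Quantile with linear interpolation between order statistics
    (the rule used by R's default type 7)."""
    xs = sorted(xs)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


# Hypothetical (G_B, G_C) calls along one chromosome.
pairs = [(1, 1), (1, 0), (1, 0), (1, 1), (1, 0), (1, 0), (1, 0), (0, 0)]
runs = mismatch_runs(pairs)  # [2, 3]
cutoff = empirical_quantile(runs, 0.95)
long_runs = [r for r in runs if r > cutoff]
print(runs, cutoff, long_runs)
```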
  72. Filtering: Step IV
10 loci come from runs of 0s of length at least 3:
  1101111111|000|1111111111
  1010111011|000|1111111111
  1111111011|000|1111110111
  0011111011|000|1101111111
  1111111111|000|1111111111
  1101110111|000|1111111111
  1011111001|000|1111111010
  1111111111|000|1111011110
  1111111111|000|1111111111
  1111111111|000|1101111011
Notice the short runs of 1s. Are they errors?
  73. (Future Work) Filtering: Step IV
Find the probability of <1011111001|000|1111111010> using a hidden Markov model.
[Figure: hidden states CNV and not CNV, each emitting mismatched (0) or matched (1) calls.]
Transition probabilities between adjacent loci (rows and columns ordered CNV, not CNV):
  $P_{i,i+1} = \begin{pmatrix} \gamma & 1-\gamma \\ 1-\lambda & \lambda \end{pmatrix}$
where $\gamma$ and $\lambda$ are large (close to 1).
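The probability of a call sequence under this two-state model can be computed with the standard forward algorithm. A minimal sketch; the values of $\gamma$ and $\lambda$, the emission probabilities, and the initial state distribution are all hypothetical, since the slide leaves them unspecified. For much longer sequences one would rescale the forward variables or work in log space to avoid underflow.

```python
def forward_probability(obs, trans, emit, init):
    """P(observation sequence) under a 2-state HMM via the forward algorithm.
    States: 0 = CNV, 1 = not CNV. Observations: 0 = mismatched, 1 = matched.
    trans[i][j] = P(state j at the next locus | state i at this locus),
    emit[i][o] = P(observation o | state i), init[i] = P(first state is i)."""
    alpha = [init[s] * emit[s][obs[0]] for s in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(2)) * emit[j][o] for j in range(2)]
    return sum(alpha)


seq = [int(ch) for ch in "1011111001" "000" "1111111010"]
gamma, lam = 0.9, 0.99  # hypothetical "large" values
trans = [[gamma, 1 - gamma], [1 - lam, lam]]
emit = [[0.9, 0.1], [0.05, 0.95]]  # hypothetical: a CNV mostly yields mismatched calls
init = [0.01, 0.99]  # hypothetical
print(forward_probability(seq, trans, emit, init))
```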
  74. Results
• After the series of filtering steps, only 10 loci in the pool of 16 individuals are found to be CNV candidates.
• Those 10 loci fall into short runs of 0s. They are unlikely to be CNVs.
• We will fit the HMM when there are more reliable signals.
  76. Conclusions
• No CNV is good news. We now know that a great amount of time, money, and effort has not gone to waste.
• The procedure is a useful assessment when labs create cell lines.
• In separate work, we extended this procedure to finding mutations in cell lines, i.e., $T_B = 0$, $T_C = 1$.
