
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach


Published in: Software

  1. Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
CB Hong*, KJ Kim
4-5 February 2015
Contents
1 TCGA Benchmark 4 Data Set
1.1 Downloading TCGA data with GeneTorrent
1.2 Preparing the sample data set
1.3 Checking the extracted regions
1.4 Checking the practice data
1.5 Summary
2 Somatic Mutation Prediction
2.1 Running SomaticSniper and applying the pre-filter (164 s)
2.2 Running VarScan2 and applying the pre-filter (10 min)
2.3 Running MuTect and applying the pre-filter (18 min)
2.4 Summary
3 Obtaining Full Consensus / Partial Consensus sSNVs
3.1 Extracting only bi-allelic SNPs
3.2 Obtaining the full consensus / partial consensus
3.3 Counting the full consensus / partial consensus calls
3.4 Summary
4 Applying an additional filter
4.1 Calling normal and tumor variants with UnifiedGenotyper (8 min)
4.2 Filtering SNVs - full consensus (optional)
4.3 Filtering SNVs - partial consensus (SomaticSniper/MuTect)
4.4 Counting the full consensus / partial consensus calls after the GATK filter
4.5 Summary
5 Validation
5.1 Preparing the COSMIC and CCLE data
5.2 Running the validation - consensus / partial consensus
5.3 Summary
6 Other Somatic Mutation Callers - Strelka, Virmid
6.1 Strelka (1 min 38 s)
6.2 Virmid (33 min)
* KT GenomeCloud, hongiiv@gmail.com
  2. 7 Linux for genomics research
7.1 Practice Linux server
7.2 Connecting to the practice Linux server - Windows users
7.3 Connecting to the practice Linux server - Mac or Linux users
7.4 Getting Linux system information
7.5 The Linux file system
7.6 Adding a hard disk in Linux
7.7 File-related commands
7.8 Linux network information
7.9 Understanding compression in Linux
7.10 Installing software in Linux
7.10.1 Installing software with APT
7.10.2 Installing software by compiling source code
  3. 1 TCGA Benchmark 4 Data Set
In this practical we use the TCGA mutation calling benchmark 4 datasets to learn how to find somatic mutations. The genome sequencing benchmark dataset was generated by artificially mixing a fixed proportion (5%-95%) of normal sample into a tumor sample. Of these, we use n40t60 (a mix of 60% tumor and 40% normal) and the matching normal sample. The data are in BAM format and can be downloaded from the TCGA Benchmark homepage.
1.1 Downloading TCGA data with GeneTorrent
• Workflow: install the download software - download the key and UUID files - download the sample
• Example: download the public key for the TCGA Benchmark data set
• https://cghub.ucsc.edu/datasets/benchmark_download.html
$ cd
$ wget https://cghub.ucsc.edu/software/downloads/cghub_public.key
• A UUID (universally unique identifier) file holds the download information for one specific file
• TCGA Benchmark cell line: HCC1143 tumor 50x
$ curl "https://cghub.ucsc.edu/cghub/metadata/analysisAttributes?analysis_id=ad3d4757-f358-40a3-9d92-742463a95e88" -o uuid.txt
$ more uuid.txt
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<center_name>UCSC</center_name>
<study>TCGA_MUT_BENCHMARK_4</study>
<files>
<file>
<filename>G15511.HCC1143.1.bam</filename>
<filesize>255795959440</filesize>
</file>
• Download the data with gtdownload
$ cd
$ gtdownload -c cghub_public.key -vv -d uuid.txt
1.2 Preparing the sample data set
• Extract part of the BAM file - sort - index
• Extract by chromosome (-b: write the output in BAM format)
$ cd
$ samtools view -b in.bam 1 > chr1.bam
$ samtools sort chr1.bam chr1_sorted
$ samtools index chr1_sorted.bam
• Extract specific regions (using a BED file)
  4. $ cd
$ cat chr17.bed
17:5967-6207
17:11197-11389
17:11806-12018
17:13897-14017
17:22307-22427
17:30843-30963
17:31151-31279
17:63618-63738
17:65398-65638
17:69410-69530
17:96838-97108
17:131511-131661
17:169155-169395
17:170984-171254
17:177205-177355
17:260100-260308
17:262897-263257
17:263317-263947
$ cat chr17.bed | xargs samtools view -b in.bam > exome.bam
$ samtools sort exome.bam exome_sorted
$ samtools index exome_sorted.bam
1.3 Checking the extracted regions
• Write the position of each read in BED format. Adding the file as a custom track in the UCSC Genome Browser is a simple way to inspect the aligned reads.
$ cd
$ bamToBed -i exome_sorted.bam > cov_1.bed
• Write the coverage of the BAM file to a BED file. The read depth information can be used to draw a histogram.
$ cd
$ samtools view -b exome_sorted.bam | genomeCoverageBed -ibam stdin > cov_2.bed
1.4 Checking the practice data
• List of the sample, program, and reference data
$ cd /somatic_bench
$ pwd
/somatic_bench
$ ls -al
total 176
drwxr-xr-x  7 root root   4096 Jan 21 15:25 .
drwxr-xr-x 25 root root   4096 Jan 20 08:53 ..
  5. drwxr-xr-x  9 root root   4096 Jan 21 08:15 app
drwxr-xr-x  2 root root   4096 Jan 21 14:38 bam
drwxr-xr-x  2 root root   4096 Jan 19 11:43 reference
drwxr-xr-x  2 root root   4096 Jan 21 15:24 script
drwxr-xr-x  2 root root 151552 Jan 21 12:59 tmp
$ more /somatic_bench/script/somatic_call_bench.sh
input_bam1="/somatic_bench/bam/hcc1143.ccle.n40t60.sorted.bam"
input_bam2="/somatic_bench/bam/hcc1143.ccle.b.sorted.bam"
gatk_b37="/somatic_bench/reference/human_g1k_v37_decoy.fasta"
temp_dir="/somatic_bench/tmp/"
$ cd
$ ln -s /somatic_bench/bam/hcc1143.ccle.n40t60.sorted.bam tumor.bam
$ ln -s /somatic_bench/bam/hcc1143.ccle.b.sorted.bam normal.bam
1.5 Summary
• Programs used: wget, curl, gtdownload, samtools, bedtools (bamToBed, genomeCoverageBed)
• Results: a .bam file containing only the regions of interest, and a .bed file showing the coverage of that .bam
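The chr17.bed file used above holds samtools-style region strings (chrom:start-end) rather than tab-separated BED columns. If the targets arrive as a regular three-column BED file, they could be converted with something like the minimal sketch below. The function name bed_to_regions and the demo coordinates are assumptions made for illustration and are not part of the original workflow; BED starts are 0-based, so 1 is added to each start.

```shell
#!/usr/bin/env bash
# Convert 3-column BED intervals (0-based start, end-exclusive) into
# samtools-style region strings (1-based, inclusive): chrom:start+1-end.
# Header lines (#, track, browser) are skipped.
bed_to_regions() {
    awk 'BEGIN { FS = "\t" }
         !/^(#|track|browser)/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$1"
}

# Hypothetical demo input; the coordinates are made up for illustration.
demo_bed=$(mktemp)
printf '17\t5966\t6207\n17\t11196\t11389\n' > "$demo_bed"
bed_to_regions "$demo_bed"
rm -f "$demo_bed"
```

The resulting lines could be passed to samtools view through xargs in the same way as the chr17.bed example.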
  6. 2 Somatic Mutation Prediction
Using SomaticSniper, VarScan2, and MuTect, we look for somatic mutations in the sample data set (tumor and matched normal BAM).
• Full command listing: https://gist.github.com/hongiiv/06611f189f4c8158edb0
• SAMtools: v0.1.19
• GATK: v2.8.1
• MuTect: v1.1.4
• SomaticSniper: v1.0.4
• Strelka: v1.0.14
• Virmid: v1.1.1
2.1 Running SomaticSniper and applying the pre-filter (164 s)
SomaticSniper was published in 2011 by Li Ding of Washington University, who also created VarScan2. It uses Bayesian probability and posterior filtering. Its main strength is high computational efficiency.
• -J: joint genotyping mode with default prior probability of a somatic mutation (0.01)
• -n, -t: normal/tumor sample id (for the VCF header)
• -F: output format (classic, vcf, bed)
• -f: path to the ref.fasta file
$ cd
$ bam-somaticsniper -J -F vcf -n HCC1143_Normal -t HCC1143_Tumor \
    -f /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    tumor.bam normal.bam HCC1143_somaticsniper.vcf
• (Filter options) Reads with a mapping quality of 0 were filtered prior to somatic mutation identification. Predictions with a 'somatic score' of 40 or greater were considered for subsequent downstream validation and analysis steps.
• GATK SelectVariants can extract only the variants of interest.
• Use the SSC (somatic score) and MQ (mapping quality) values in the FORMAT field of the VCF file.
$ cd
$ ln -s /somatic_bench/app/GenomeAnalysisTK-2.8-1/GenomeAnalysisTK.jar ./
$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
  Selection  Path                                     Priority
------------------------------------------------------------
  0          /usr/lib/jvm/java-7-oracle/jre/bin/java  2
  1          /usr/lib/jvm/java-6-oracle/jre/bin/java  1
* 2          /usr/lib/jvm/java-7-oracle/jre/bin/java  2
Press enter to keep the current choice[*], or type selection number: 2
update-alternatives: using /usr/lib/jvm/java-7-oracle/jre/bin/java
  7. $ java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_somaticsniper.vcf \
    -o HCC1143_somaticsniper_filter.vcf \
    -sn HCC1143_Tumor -sn HCC1143_Normal \
    -select 'vc.getGenotype("HCC1143_Tumor").getExtendedAttribute("SSC") >= 40 && (vc.getGenotype("HCC1143_Tumor").getExtendedAttribute("MQ") > 0 || vc.getGenotype("HCC1143_Normal").getExtendedAttribute("MQ") > 0)'
• Compare the number of mutations before and after filtering
$ cd
$ grep -v "#" HCC1143_somaticsniper.vcf | wc -l
583
$ grep -v "#" HCC1143_somaticsniper_filter.vcf | wc -l
161
2.2 Running VarScan2 and applying the pre-filter (10 min)
VarScan2 was developed by Li Ding of Washington University and published in 2012, one year after SomaticSniper. Unlike the other tools, it uses Fisher's exact test with filtering and FDR correction. Its main strength is sensitive detection of high-quality sSNVs. Also unlike the other tools, its input is not a .bam file but a pileup or mpileup file.
• Use samtools mpileup to generate pileup/mpileup output for the normal and the tumor sample.
• At the mpileup step, filter with the options -q 1 (skip alignments with mapQ smaller than INT) and -B (disable BAQ computation).
• When VarScan is given a combined mpileup[1] file as input, add the option '--mpileup 1'.
$ cd
$ samtools mpileup -f /somatic_bench/reference/human_g1k_v37_decoy.fasta -q 1 -B normal.bam > HCC1143_n.pileup
$ samtools mpileup -f /somatic_bench/reference/human_g1k_v37_decoy.fasta -q 1 -B tumor.bam > HCC1143_t.pileup
$ ln -s /somatic_bench/app/VarScan/VarScan.v2.3.3.jar ./
$ java -jar VarScan.v2.3.3.jar somatic HCC1143_n.pileup HCC1143_t.pileup HCC1143_varscan --output-vcf 1
14617150 positions in tumor
14616970 positions shared in normal
13721478 had sufficient coverage for comparison
[1] Older documentation assumes samtools pileup, but pileup was removed in later samtools releases and replaced by mpileup. mpileup can still produce a pileup for a single sample. VarScan also accepts one mpileup file that contains both the normal and the tumor sample.
  8. 13700958 were called Reference
0 were mixed SNP-indel calls and filtered
18427 were called Germline
1562 were called LOH
450 were called Somatic
81 were called Unknown
0 were called Variant
• As shown below, VarScan2 writes its INDEL and SNP results as separate VCF files (HCC1143_varscan.indel.vcf, HCC1143_varscan.snp.vcf).
drwxr-xr-x 2 root root    4096 Jan 30 09:52 ./
drwxr-xr-x 5 root root    8192 Jan 30 09:35 ../
-rw-r--r-- 1 root root  402354 Jan 30 09:47 HCC1143_varscan.indel.vcf
-rw-r--r-- 1 root root 2691462 Jan 30 09:47 HCC1143_varscan.snp.vcf
• Of the VarScan2 results, HCC1143_varscan.snp.vcf is filtered with processSomatic and somaticFilter.
• processSomatic: separates high-confidence[2] and low-confidence somatic mutations.
• somaticFilter: applies filter options of your own choice, such as --min-coverage, --p-value, and --indel-file.
$ cd
$ java -jar VarScan.v2.3.3.jar processSomatic -help
USAGE: java -jar VarScan.jar process [status-file] OPTIONS
    status-file - The VarScan output file for SNPs or Indels
OPTIONS
    --min-tumor-freq - Minimum variant allele frequency in tumor [0.10]
    --max-normal-freq - Maximum variant allele frequency in normal [0.05]
    --p-value - P-value for high-confidence calling [0.07]
$ java -jar VarScan.v2.3.3.jar processSomatic HCC1143_varscan.snp.vcf
Reading input from HCC1143_varscan.snp.vcf
Opening output files:
17914 VarScan calls processed
382 were Somatic (102 high confidence)
16048 were Germline (15431 high confidence)
1451 were LOH (1447 high confidence)
• processSomatic writes one result file per class (Germline, LOH, Somatic), each with an additional high-confidence (.hc) subset.
$ ls
-rw-r--r-- 1 2413169 Jan 30 09:52 HCC1143_varscan.snp.vcf.Germline
-rw-r--r-- 1 2320566 Jan 30 09:52 HCC1143_varscan.snp.vcf.Germline.hc
-rw-r--r-- 1  216574 Jan 30 09:52 HCC1143_varscan.snp.vcf.LOH
-rw-r--r-- 1  215997 Jan 30 09:52 HCC1143_varscan.snp.vcf.LOH.hc
-rw-r--r-- 1   59990 Jan 30 09:52 HCC1143_varscan.snp.vcf.Somatic
-rw-r--r-- 1   17055 Jan 30 09:52 HCC1143_varscan.snp.vcf.Somatic.hc
• In the VarScan2 VCF output, multiple ALT alleles are written as 'G/T', which causes errors in later analysis steps. They are therefore converted to the standard 'G,T' form.
[2] Minimum variant allele frequency of 0.10 in the tumor and maximum variant allele frequency of 0.05 in the normal.
  9. $ cd
$ perl -pe 's/\tA\//\tA,/' HCC1143_varscan.snp.vcf.Somatic.hc | perl -pe 's/\tT\//\tT,/' | perl -pe 's/\tG\//\tG,/' | perl -pe 's/\tC\//\tC,/' > HCC1143_varscan_filter.vcf
• Number of mutations after filtering
$ cd
$ grep -v "#" HCC1143_varscan_filter.vcf | wc -l
102
2.3 Running MuTect and applying the pre-filter (18 min)
MuTect is a tool developed at the Broad Institute. It combines Bayesian probability with pre- and post-filtering, and is particularly sensitive for sSNVs at low allelic fraction.
• MuTect runs only on Java 1.6, so check the current Java version and, if necessary, switch versions with update-alternatives.
$ cd
$ ln -s /somatic_bench/app/mutect/muTect-1.1.4.jar ./
$ samtools index normal.bam
$ samtools index tumor.bam
$ cp /somatic_bench/reference/ccle.gatk.bed ./
$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
  Selection  Path                                     Priority
------------------------------------------------------------
  0          /usr/lib/jvm/java-7-oracle/jre/bin/java  2
  1          /usr/lib/jvm/java-6-oracle/jre/bin/java  1
* 2          /usr/lib/jvm/java-7-oracle/jre/bin/java  2
Press enter to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/lib/jvm/java-6-oracle/jre/bin/java
$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
$ java -jar muTect-1.1.4.jar --analysis_type MuTect \
    --reference_sequence /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --cosmic /somatic_bench/reference/b37_cosmic_v54_120711.vcf \
    --dbsnp /somatic_bench/reference/dbsnp_132_b37.leftAligned.vcf \
    --input_file:normal normal.bam \
    --input_file:tumor tumor.bam \
    --out HCC1143_mutect.out \
    --vcf HCC1143_mutect.vcf \
    --coverage_file HCC1143.mutect.cov.wig.txt \
    --normal_sample_name HCC1143_Normal \
    --tumor_sample_name HCC1143_Tumor \
    -L ccle.gatk.bed
  10. • (Filter options) Predictions not labeled as 'REJECT' were accepted as confident somatic mutation predictions and passed on to the subsequent downstream validation and analysis steps.
• The GATK version used for filtering requires Java 1.7, so switch the Java version again with update-alternatives.
• Use GATK SelectVariants to keep only the variants whose FILTER field in the VCF is PASS (that is, excluding REJECT).
$ cd
$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
  Selection  Path                                     Priority
------------------------------------------------------------
  0          /usr/lib/jvm/java-7-oracle/jre/bin/java  2
  1          /usr/lib/jvm/java-6-oracle/jre/bin/java  1
* 2          /usr/lib/jvm/java-7-oracle/jre/bin/java  2
Press enter to keep the current choice[*], or type selection number: 2
update-alternatives: using /usr/lib/jvm/java-7-oracle/jre/bin/java
$ java -version
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_mutect.vcf \
    -o HCC1143_mutect_filter.vcf \
    -sn HCC1143_Tumor -sn HCC1143_Normal \
    -select 'vc.isNotFiltered()'
• The same selection (FILTER is PASS, excluding REJECT) can also be made with the --excludeFiltered option of SelectVariants.
$ cd
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_mutect.vcf \
    -o HCC1143_mutect_filter.vcf \
    -sn HCC1143_Tumor -sn HCC1143_Normal \
    --excludeFiltered
• Number of mutations after filtering
$ cd
$ grep -v "#" HCC1143_mutect_filter.vcf | wc -l
109
2.4 Summary
• Programs used: VarScan2, SomaticSniper, MuTect, GATK
• Results: somatic mutations remaining after the first-pass filter (161, 102, 112)
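Before filtering a caller's VCF on its FILTER column, it can help to see which values that column actually contains. The following is a minimal sketch of such a tally using only awk and sort; the function name filter_tally and the three demo records are assumptions made for illustration, not output from the callers above.

```shell
#!/usr/bin/env bash
# filter_tally VCF: count data lines per FILTER value (column 7).
# Header lines (starting with '#') are skipped.
filter_tally() {
    awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | LC_ALL=C sort
}

# Hypothetical three-record VCF, made up for illustration.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo_vcf"
printf '1\t100\t.\tA\tT\t.\tPASS\t.\n' >> "$demo_vcf"
printf '1\t200\t.\tG\tC\t.\tREJECT\t.\n' >> "$demo_vcf"
printf '2\t50\t.\tC\tT\t.\tPASS\t.\n' >> "$demo_vcf"
filter_tally "$demo_vcf"
rm -f "$demo_vcf"
```

Run on a MuTect VCF, a tally of this kind would show how many records are marked REJECT before SelectVariants removes them.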
  11. 3 Obtaining Full Consensus / Partial Consensus sSNVs
Find the full-consensus calls of the three SNV detection tools SomaticSniper, VarScan2, and MuTect. Multi-allelic sites and indels are removed first.
3.1 Extracting only bi-allelic SNPs
• From the pre-filtered results, remove multi-allelic sites and keep only SNPs.
• With GATK SelectVariants, set -selectType to SNP (possible values: INDEL, SNP, MIXED, MNP, SYMBOLIC, NO_VARIATION) and -restrictAllelesTo to BIALLELIC (MULTIALLELIC or BIALLELIC).
$ cd
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_mutect_filter.vcf \
    -o HCC1143_mutect_1.vcf \
    -selectType SNP -restrictAllelesTo BIALLELIC
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_somaticsniper_filter.vcf \
    -o HCC1143_somaticsniper_1.vcf \
    -selectType SNP -restrictAllelesTo BIALLELIC
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_varscan_filter.vcf \
    -o HCC1143_varscan_1.vcf \
    -selectType SNP -restrictAllelesTo BIALLELIC
3.2 Obtaining the full consensus / partial consensus
• Compute the partial consensus for each pair of callers (SomaticSniper/MuTect, MuTect/VarScan2, VarScan2/SomaticSniper) and the full consensus across all three somatic callers.
$ cd
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_somaticsniper_1.vcf \
    --concordance HCC1143_mutect_1.vcf \
    -o HCC1143_SM.vcf
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_mutect_1.vcf \
    --concordance HCC1143_varscan_1.vcf \
    -o HCC1143_MV.vcf
  12. $ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_varscan_1.vcf \
    --concordance HCC1143_somaticsniper_1.vcf \
    -o HCC1143_VS.vcf
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM.vcf \
    --concordance HCC1143_varscan_1.vcf \
    -o HCC1143_SMV.vcf
3.3 Counting the full consensus / partial consensus calls
• Count the full-consensus and partial-consensus calls.
$ cd
$ grep -v "#" HCC1143_SM.vcf | wc -l
45
$ grep -v "#" HCC1143_MV.vcf | wc -l
38
$ grep -v "#" HCC1143_VS.vcf | wc -l
42
$ grep -v "#" HCC1143_SMV.vcf | wc -l
32
3.4 Summary
• Programs used: GATK
• Results: consensus / partial consensus data
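The idea behind a pairwise consensus can be illustrated without GATK as a set intersection on site keys. The sketch below is not how SelectVariants implements --concordance; it only compares CHROM, POS, REF, and ALT, and the function names vcf_keys and shared_keys as well as the demo records are hypothetical.

```shell
#!/usr/bin/env bash
# vcf_keys VCF: print one sorted, unique "CHROM:POS:REF:ALT" key per data line.
vcf_keys() {
    awk -F '\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | LC_ALL=C sort -u
}

# shared_keys A.vcf B.vcf: print the keys that occur in both files.
shared_keys() {
    keys_a=$(mktemp)
    keys_b=$(mktemp)
    vcf_keys "$1" > "$keys_a"
    vcf_keys "$2" > "$keys_b"
    LC_ALL=C comm -12 "$keys_a" "$keys_b"   # lines common to both sorted lists
    rm -f "$keys_a" "$keys_b"
}

# Hypothetical demo call sets, made up for illustration.
caller_a=$(mktemp)
caller_b=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tT\n1\t200\t.\tG\tC\n2\t50\t.\tC\tT\n' > "$caller_a"
printf '#CHROM\tPOS\tID\tREF\tALT\n1\t200\t.\tG\tC\n2\t50\t.\tC\tA\n1\t100\t.\tA\tT\n' > "$caller_b"
shared_keys "$caller_a" "$caller_b"
rm -f "$caller_a" "$caller_b"
```

Applying the same intersection to the result and a third call set would give the three-way (full) consensus in this simplified model.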
  13. 4 Applying an additional filter
GATK UnifiedGenotyper can be used to increase specificity.
4.1 Calling normal and tumor variants with UnifiedGenotyper (8 min)
• Call SNPs for the normal and tumor samples with GATK UnifiedGenotyper.
$ cd
$ java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
    -o HCC1143_gatk.tumor.vcf \
    -I tumor.bam \
    --genotype_likelihoods_model SNP \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    -L ccle.gatk.bed
$ java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
    -o HCC1143_gatk.normal.vcf \
    -I normal.bam \
    --genotype_likelihoods_model SNP \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    -L ccle.gatk.bed
4.2 Filtering SNVs - full consensus (optional)
• Using the normal and tumor variants produced by GATK UnifiedGenotyper, keep only the SNVs predicted in the tumor but not in the germline.
$ cd
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SMV.vcf \
    --discordance HCC1143_gatk.normal.vcf \
    -o HCC1143_SMV_discordance_normal.vcf
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SMV_discordance_normal.vcf \
    --concordance HCC1143_gatk.tumor.vcf \
    -o HCC1143_final_filter_concordance.vcf
4.3 Filtering SNVs - partial consensus (SomaticSniper/MuTect)
$ cd
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM.vcf \
    --discordance HCC1143_gatk.normal.vcf \
    -o HCC1143_SM_discordance_normal.vcf
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM_discordance_normal.vcf \
    --concordance HCC1143_gatk.tumor.vcf \
    -o HCC1143_SM_final_filter_concordance.vcf
  14. 4.4 Counting the full consensus / partial consensus calls after the GATK filter
• Count the consensus and partial-consensus calls that remain after the GATK filter.
$ cd
$ grep -v "#" HCC1143_final_filter_concordance.vcf | wc -l
32
$ grep -v "#" HCC1143_SM_final_filter_concordance.vcf | wc -l
45
4.5 Summary
• Programs used: GATK
• Results: consensus / partial consensus data after the GATK filter
  15. 5 Validation
Using the list of mutations reported for the HCC1143 sample in COSMIC and CCLE, check how many of the calls match. The validation.list file is available on the server, or can be downloaded from https://gist.github.com/hongiiv/42194181ce6402d8b629.
5.1 Preparing the COSMIC and CCLE data
• Copy the list of HCC1143 mutations from COSMIC and CCLE (103 in total).
$ cd
$ cp /somatic_bench/reference/validation.list ./
$ cat validation.list | wc -l
103
5.2 Running the validation - consensus / partial consensus
• Check how many of the finally filtered consensus and partial-consensus (SomaticSniper/MuTect) calls match the list.
$ cd
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_final_filter_concordance.vcf \
    -o all.val.filter.vcf \
    -L validation.list
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM_final_filter_concordance.vcf \
    -o sm.val.filter.vcf \
    -L validation.list
$ grep -v "#" all.val.filter.vcf | wc -l
6
$ grep -v "#" sm.val.filter.vcf | wc -l
9
• Also check how many of the consensus calls match before the additional GATK filter.
$ cd
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SMV.vcf \
    -o all.val.vcf \
    -L validation.list
$ java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --variant HCC1143_SM.vcf \
    -o sm.val.vcf \
    -L validation.list
$ grep -v "#" all.val.vcf | wc -l
6
  16. $ grep -v "#" sm.val.vcf | wc -l
9
• consensus (calls/matches): before GATK filter (32/6) - after GATK filter (32/6)
• partial consensus-SM (calls/matches): before GATK filter (45/9) - after GATK filter (45/9)
5.3 Summary
• Programs used: GATK
• Results: the number of final consensus / partial consensus calls that match COSMIC and CCLE
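The call and match counts above can be turned into simple ratios. The sketch below is an illustration rather than part of the original workflow: the helper name percent is hypothetical, and the two ratios it prints (share of calls found in the list, share of the 103 listed mutations recovered) are one possible way to summarize the numbers.

```shell
#!/usr/bin/env bash
# percent NUM DEN: print NUM/DEN as a percentage with two decimals,
# or "NA" when the denominator is zero.
percent() {
    awk -v n="$1" -v d="$2" 'BEGIN {
        if (d == 0) { print "NA" } else { printf "%.2f\n", 100 * n / d }
    }'
}

# Counts taken from the validation step: calls, matches, size of validation.list.
echo "full consensus:       $(percent 6 32)% of calls matched, $(percent 6 103)% of the list recovered"
echo "partial consensus-SM: $(percent 9 45)% of calls matched, $(percent 9 103)% of the list recovered"
```

With these counts, 6 of 32 is 18.75% and 9 of 45 is 20.00%, so the two call sets match the list at a similar rate while the partial consensus recovers more of it.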
  17. 6 Other Somatic Mutation Callers - Strelka, Virmid
6.1 Strelka (1 min 38 s)
Strelka is a somatic mutation caller that uses Bayesian probability with posterior filtering; it was released by Illumina in 2012. Besides Illumina's own aligners isaac and eland, it also supports bwa. It is run somewhat differently from ordinary programs, in a way similar to Illumina's isaac: the analysis is managed as a project of tasks described in a Makefile and executed with the make tool.
• Strelka needs a file that stores its options; default option files are provided for the three aligners bwa, eland, and isaac.
• For exome or targeted sequencing, set isSkipDepthFilters = 1 in the default options.
$ ll /somatic_bench/app/strelka-1.0.14/etc/
total 20
drwxrwxr-x 2 viz  viz  4096 Jul 10 2014 ./
drwxr-xr-x 7 root root 4096 Jan 30 11:06 ../
-rw-rw-r-- 1 viz  viz  3658 Jul 10 2014 strelka_config_bwa_default.ini
-rw-rw-r-- 1 viz  viz  3683 Jul 10 2014 strelka_config_eland_default.ini
-rw-rw-r-- 1 viz  viz  3821 Jul 10 2014 strelka_config_isaac_default.ini
• Set variables for the directory where Strelka is installed and the directory where the analysis results will be stored.
• Copy the default options file and generate the analysis commands with configureStrelkaWorkflow.pl.
• Run the generated analysis with make; the -j option sets the number of threads (CPUs) used for the analysis.
• INDELs and SNVs are written to separate VCF files, each as a passed set and a raw (all) somatic set, giving four result files.
$ STRELKA_INSTALL_DIR=/somatic_bench/app/strelka-1.0.14/
$ echo $STRELKA_INSTALL_DIR
/somatic_bench/app/strelka-1.0.14/
$ WORK_DIR=/root/myWork
$ cp $STRELKA_INSTALL_DIR/etc/strelka_config_isaac_default.ini config.ini
$ $STRELKA_INSTALL_DIR/bin/configureStrelkaWorkflow.pl \
    --normal=/root/normal.bam \
    --tumor=/root/tumor.bam \
    --ref=/somatic_bench/reference/human_g1k_v37_decoy.fasta \
    --config=config.ini \
    --output-dir=./myAnalysis
$ cd ./myAnalysis
$ make -j 8
$ ll results/
total 88
drwxr-xr-x 2 root root  4096 Jan 30 11:39 ./
drwxr-xr-x 5 root root  4096 Jan 30 11:37 ../
-rw-r--r-- 1 root root 13452 Jan 30 11:37 all.somatic.indels.vcf
-rw-r--r-- 1 root root 36736 Jan 30 11:37 all.somatic.snvs.vcf
-rw-r--r-- 1 root root  7098 Jan 30 11:37 passed.somatic.indels.vcf
-rw-r--r-- 1 root root 16070 Jan 30 11:37 passed.somatic.snvs.vcf
• Count the final passed somatic SNVs.
$ cd results/
$ grep -v "#" passed.somatic.snvs.vcf | wc -l
62
  18. 6.2 Virmid (33 min)
Virmid is software created in 2013 by Prof. Sangwoo Kim of Yonsei University. Through sampling, it estimates the proportion of normal sample contained in the tumor (α).
• Count the final passed somatic SNVs.
$ java -jar /somatic_bench/app/Virmid-1.1.1/Virmid.jar \
    -R /somatic_bench/reference/human_g1k_v37_decoy.fasta \
    -D /root/tumor.bam \
    -N /root/normal.bam \
    -t 8 \
    -w /root/virmid
$ cd /root/virmid
$ ls -al
total 98024
drwxr-xr-x 2 root     4096 Jan 30 16:00 ./
drwxr-xr-x 8 root     8192 Jan 30 15:32 ../
-rw-r--r-- 1 root  1252161 Jan 30 16:03 tumor.bam.virmid.germ.all.vcf
-rw-r--r-- 1 root   955213 Jan 30 16:03 tumor.bam.virmid.germ.passed.vcf
-rw-r--r-- 1 root      262 Jan 30 16:00 tumor.bam.virmid.gm
-rw-r--r-- 1 root    36564 Jan 30 16:03 tumor.bam.virmid.loh.all.vcf
-rw-r--r-- 1 root     2233 Jan 30 16:01 tumor.bam.virmid.loh.passed.vcf
-rw-r--r-- 1 root      992 Jan 30 16:03 tumor.bam.virmid.report
-rw-r--r-- 1 root  1364144 Jan 30 15:29 tumor.bam.virmid.sample.control.bai
-rw-r--r-- 1 root 53107377 Jan 30 15:29 tumor.bam.virmid.sample.control.bam
-rw-r--r-- 1 root  1364104 Jan 30 15:29 tumor.bam.virmid.sample.disease.bai
-rw-r--r-- 1 root 41746178 Jan 30 15:29 tumor.bam.virmid.sample.disease.bam
-rw-r--r-- 1 root    84053 Jan 30 16:03 tumor.bam.virmid.som.all.vcf
-rw-r--r-- 1 root     6883 Jan 30 16:03 tumor.bam.virmid.som.passed.vcf
$ grep -v "#" tumor.bam.virmid.som.passed.vcf | wc -l
78
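To see why the normal proportion α matters for calling, consider how it dilutes the variant allele fraction. The sketch below works through that arithmetic under strong simplifying assumptions that are not stated in the source: a clonal, heterozygous somatic SNV in a copy-neutral diploid region, for which the expected tumor allele fraction is (1 - α) / 2. The function name expected_vaf is hypothetical, and this is not Virmid's estimation model.

```shell
#!/usr/bin/env bash
# expected_vaf ALPHA: expected variant allele fraction of a clonal,
# heterozygous somatic SNV in a copy-neutral diploid region when a
# fraction ALPHA of the sequenced cells is normal: (1 - ALPHA) / 2.
expected_vaf() {
    awk -v a="$1" 'BEGIN { printf "%.2f\n", (1 - a) / 2 }'
}

# The n40t60 benchmark sample contains 40% normal cells.
expected_vaf 0.40
```

Under these assumptions the n40t60 sample would show such a variant in about 30% of reads instead of 50%, which is why impure samples are harder for callers that expect higher allele fractions.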
  19. 7 Linux for genomics research
7.1 Practice Linux server
• Server address: xxx.xxx.xxx.xxx
• ID: edu01, edu02
• Password: kogo2015
• Web access: http://xxx.xxx.xxx.xxx:8787
7.2 Connecting to the practice Linux server - Windows users
• Go to http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
• Download putty.exe for Intel x86.
• Host Name: xxx.xxx.xxx.xxx / Port: xx
• If a Security Alert window appears, choose 'Yes (Y)'.
• Login: use the ID and password assigned to you.
7.3 Connecting to the practice Linux server - Mac or Linux users
• On a Mac (OS X), open 'Applications, Utilities, Terminal app'. On Linux, start the terminal from the program menu of your distribution.
$ ssh user_id@host_name
$ ssh root@127.0.0.1
• Connect to the practice Linux server with the ssh command. On the first connection, answer yes; a password prompt then appears, where you enter the password assigned to you.
7.4 Getting Linux system information
This document is written for 'Ubuntu', one of the Linux distributions[3]. Unless noted otherwise, it applies to all kinds of Linux. Linux is an operating system that runs in many distributions and on many kinds of hardware. Knowing which environment your Linux runs in lets you install software that fits it.
• How to identify the Linux distribution you are using. Ubuntu is a freely distributed Linux operating system; the latest version at the time of writing is 14.04 LTS (Long Term Support)[4].
$ cat /etc/issue.net
Ubuntu 12.04.1 LTS
• Linux runs on many kinds of hardware, and software that supports Linux ships separate executables for each hardware platform. Knowing your hardware therefore lets you download the software that matches it. The hardware type of a Linux server is shown by the '-m' (machine) option. 'x86' means an Intel-based CPU and '64' means 64-bit hardware[5].
$ uname -m
x86_64
[3] Linux distributions fall broadly into the Red Hat family and the Debian family, and each family contains various distributions.
[4] The code name is Trusty Tahr.
[5] Also written simply as x64.
  20. • The kernel is the core of the Linux operating system; it carries out the user's commands on the actual hardware. The kernel version differs between distributions. The latest Linux kernel at the time of writing is 3.14.3, released on 6 May 2014. Linux distributions are built on top of kernels released in this way. Let us check the kernel information.
$ uname -r
3.2.0-32-virtual
• The shell is the environment that accepts Linux commands and executes them, and 'PATH' is one of the environment variables that influence how processes behave. export is the command that sets the value of such an environment variable. When you enter a command, Linux searches the directories listed in PATH in order, checks whether the command exists there, and runs it. So when you install your own software, you must add it to PATH to be able to run it from anywhere; otherwise it can only be run from inside its installation directory. Environment variables can be inspected with the 'env' command, and PATH is set with 'export'.
$ env | grep PATH
MANPATH=/usr/local/texlive/2013/texmf/doc/man:
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
INFOPATH=/usr/local/texlive/2013/texmf/doc/info:
$ export PATH=/BIO/app/bwa-0.7.5a/:$PATH
$ env | grep PATH
7.5 The Linux file system
In Linux, one physical disk is divided logically into several areas (partitions), and a file system is created on each partition to manage files and directories.
• Linux is a multi-user system, and every user has a home directory of their own. Inside the home directory you can create and delete files. The 'cd' command moves to the home directory, and the 'pwd' command shows the path of the current directory.
$ cd
$ pwd
/home/hongiiv
• Create a directory and move into it
$ cd
$ mkdir sample_data
$ ls -la
total 2203488
drwxr-xr-x 16 hongiiv hongiiv 4096 May 29 10:34 .
drwxr-xr-x  3 root    root    4096 May  7 13:14 ..
-rw-------  1 hongiiv hongiiv 1908 May 10 11:59 .bash_history
-rw-r--r--  1 hongiiv hongiiv  220 May  7 13:14 .bash_logout
-rw-r--r--  1 hongiiv hongiiv 3763 May 10 17:06 .bashrc
drwxr-xr-x  2 root    root    4096 May 29 10:34 sample_data
$ cd sample_data
$ pwd
/home/hongiiv/sample_data
• Delete directories and files
  21. $ cd
$ rm -rf sample_data
$ ls -la
total 2203488
drwxr-xr-x 16 hongiiv hongiiv 4096 May 29 10:34 .
drwxr-xr-x  3 root    root    4096 May  7 13:14 ..
-rw-------  1 hongiiv hongiiv 1908 May 10 11:59 .bash_history
-rw-r--r--  1 hongiiv hongiiv  220 May  7 13:14 .bash_logout
-rw-r--r--  1 hongiiv hongiiv 3763 May 10 17:06 .bashrc
$
• View the Linux file systems
$ df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/xvda1   19G   14G   4.8G   74%  /
udev        3.9G  4.0K   3.9G    1%  /dev
tmpfs       1.6G  188K   1.6G    1%  /run
none        5.0M     0   5.0M    0%  /run/lock
none        3.9G     0   3.9G    0%  /run/shm
/dev/xvdb1   79G   38G    38G   50%  /home/hongiiv/test
• View the physical disk information - the 21.5 GB physical disk /dev/xvda consists of two partitions, xvda1 and xvda2, which carry a Linux and a Linux swap file system.
$ fdisk -l
Disk /dev/xvda: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00034212
Device      Boot  Start     End       Blocks    Id  System
/dev/xvda1        2048      40038399  20018176  83  Linux
/dev/xvda2        40038400  41940991  951296    82  Linux swap / Solaris
Disk /dev/xvdb: 300.6 GB, 300647710720 bytes
171 heads, 35 sectors/track, 98112 cylinders, total 587202560 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x3459a991
Device      Boot  Start  End        Blocks     Id  System
/dev/xvdb1        2048   587202559  293600256  8e  Linux LVM
• Check the file system mount information
$ cat /etc/fstab
proc        /proc  proc  nodev,noexec,nosuid  0  0
/dev/xvda1  /      ext3  errors=remount-ro    0  1
/dev/xvda2  none   swap  sw                   0  0
  22. 7.6 Adding a Hard Disk in Linux

      • After identifying the added disk with fdisk, the disk becomes usable after three steps: partitioning, creating a file system, and mounting. To make Linux recognize a USB storage device, only the mount step is needed.

        $ fdisk /dev/xvdb
        $ mkfs.ext3 /dev/xvdb1
        $ mkdir /new_hdd
        $ mount /dev/xvdb1 /new_hdd
        $ cd /new_hdd
        $ df -h

      7.7 File-Related Commands

      • touch - creates an empty file of size 0, or changes the timestamp of an existing file. It is also sometimes used when installing or patching certain software.

        $ touch a
        $ ls -al
        -rw-r--r-- 1 root root 0 Jun 18 10:04 a
        $ date
        Wed Jun 18 10:05:10 KST 2014
        $ touch -c a
        $ ls -al
        -rw-r--r-- 1 root root 0 Jun 18 10:05 a

      • cat - shows the contents of a file, and is also used to write short scripts. The command 'cat > test' creates a file named test and writes what you type into it. When you have finished typing, press 'ctrl+D' to exit.

        $ cat > test
        hi there
        my name is hong
        $ cat test
        hi there
        my name is hong
        $ ls -al
        -rw-r--r-- 1 root root 25 Jun 18 10:09 test

      • Counting the files in a given directory

        $ ls -l . | grep ^- | wc -l
        50

      • Printing a file without the lines that start with a given character. In a file such as a VCF, lines that start with '#' are comments; the first command below prints the actual variant list without the comments, and the second prints only the comment lines.

        $ cd /BIO/data/gatk
        $ grep -v "#" dbsnp_138.hg19.vcf | wc -l
        8087914
        $ grep -F "#" dbsnp_138.hg19.vcf | wc -l
        165
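One caveat about the grep patterns above: `grep -v "#"` drops every line that contains a '#' anywhere, not only lines that start with one. Anchoring the pattern with '^' restricts the match to true header lines. The sketch below shows the difference on a small made-up VCF-like file; the function name vcf_counts and the sample records are illustrative assumptions, not taken from dbsnp_138.hg19.vcf.

```shell
#!/bin/sh
# vcf_counts FILE
# Prints the number of header lines (starting with '#') and of record lines.
# Hypothetical helper for illustration.
vcf_counts() {
    file="$1"
    # grep -c exits non-zero when the count is 0, so guard with `|| true`.
    headers=$(grep -c '^#' "$file" || true)
    records=$(grep -vc '^#' "$file" || true)
    printf 'headers=%s records=%s\n' "$headers" "$records"
}

# Demo on made-up data: the last record carries a '#' inside its ID field.
demo_file="$(mktemp)"
printf '%s\n' \
    '##fileformat=VCFv4.1' \
    '#CHROM POS ID REF ALT' \
    'chrM 16390 rs1 G A' \
    'chrM 16391 rs2 G A' \
    'chr1 100 rs#3 C T' > "$demo_file"

vcf_counts "$demo_file"               # anchored: headers=2 records=3
grep -v '#' "$demo_file" | wc -l      # unanchored: counts only 2 records
rm -f "$demo_file"
```

On a file without '#' inside any data field the two approaches agree, so the counts on the slide need not be wrong; the anchored form is simply the safer habit.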
  23. • Printing only a specific column. The extracted column can then be sorted in dictionary order with 'sort -d' or have its repeated values counted with 'uniq -c'.

        $ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | more
        chrM
        chrM
        chrM
        chrM
        chrM
        chrM
        chrM
        chrM
        chrM
        $ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | sort -d
        chr1
        chr2
        $ grep -v "#" dbsnp_138.hg19.vcf | awk '{print $1}' | uniq -c
            475 chrM
        4723878 chr1
        3363561 chr2
        $ grep -v "#" dbsnp_138.hg19.vcf | awk '{if ($1 == "chrM") printf "chrM is: %s\n", $2}'
        chrM is: 16390
        chrM is: 16391
        chrM is: 16429
        chrM is: 16445
        chrM is: 16499

      • Saving the printed output to a file ('>' overwrites the file, '>>' appends to it)

        $ grep -v "#" dbsnp_138.hg19.vcf | awk '{if ($1 == "chrM") printf "chrM is: %s\n", $2}' > ~/chr_pos.txt
        $ grep -v "#" dbsnp_138.hg19.vcf | awk '{if ($1 == "chr1") printf "chr1 is: %s\n", $2}' >> ~/chr_pos.txt

      7.8 Linux Network Information

      • In the network interface information, the inet addr of eth0 is the address used to connect to this Linux machine from outside. [6]

        $ ifconfig
        eth0  Link encap:Ethernet HWaddr 02:00:5b:73:00:33
              inet addr:172.27.252.234 Bcast:172.27.255.255
              inet6 addr: fe80::5bff:fe73:33/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
              RX packets:501386 errors:0 dropped:0 overruns:0 frame:0
              TX packets:346879 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:19357734604 (1 GB) TX bytes:2720265191 (2 GB)
              Interrupt:68
        lo    Link encap:Local Loopback
              inet addr:127.0.0.1 Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host

      [6] The address of this Linux server is 172.27.252.234; the value shown will differ depending on your own practice environment.
  24.         UP LOOPBACK RUNNING MTU:16436 Metric:1
              RX packets:4337 errors:0 dropped:0 overruns:0 frame:0
              TX packets:4337 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:2203478 (2.2 MB) TX bytes:2203478 (2.2 MB)

      7.9 Understanding Compression in Linux

      Linux supports a variety of compression formats, and software and data are usually distributed as compressed files.

      • The commands below decompress the various compression formats used on Linux. The decompression steps contain a problem; a prize goes to the first person to solve it.

        $ cd
        $ cp -R /BIO/data/compress ./compress
        $ cd compress
        $ gzip -d compress01.gz
        $ tar xvfz compress02.tar.gz
        $ unzip compress03.zip
        $ bzip2 -d comress04.bz2
        $ tar xvfz compress05.tar.gz
        $ tar xvf compress06.tar.bz2

      • gzip: Recommended for fast network connections
      • bzip2: Recommended for slower network connections (smaller size but takes longer to compress)
      • zip: Not recommended but is provided as an option for those who cannot open the above formats

      • For large compressed data files, the contents can be inspected without decompressing the file to disk. This is useful for checking FASTQ files and similar data.

        $ gzip -dc CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.vcf.gz | more
        $ gzip -dc CEUTrio.HiSeq.WGS.b37.bestPractices.hg19.tar.gz | tar -tvf -

      7.10 Installing Software on Linux

      There are generally three ways to install software on Linux. The first is used when the binary (executable) file is distributed in compressed form: it can be used as soon as it is decompressed. The second uses the packages provided by the Linux distribution; on Ubuntu this means the APT package management program. The third installs from the source files.

      7.10.1 Installing Software with APT

      • Updating the package lists and installing a package with APT

        $ apt-get update
        $ apt-get install bwa
        Reading package lists... Done
        Building dependency tree
        Reading state information... Done
        Use 'apt-get autoremove' to remove them.
        Suggested packages:
          samtools
  25.   The following NEW packages will be installed:
          bwa
        0 upgraded, 1 newly installed, 0 to remove and 153 not upgraded.
        Need to get 135 kB of archives.
        After this operation, 286 kB of additional disk space will be used.
        Fetched 135 kB in 3s (40.1 kB/s)
        Selecting previously unselected package bwa.
        (Reading database ...17 files and directories currently installed.)
        Unpacking bwa (from .../archives/bwa_0.6.1-1_amd64.deb) ...
        Processing triggers for man-db ...
        Setting up bwa (0.6.1-1) ...
        $ bwa
        Program: bwa (alignment via Burrows-Wheeler transformation)
        Version: 0.6.1-r104
        Contact: Heng Li <lh3@sanger.ac.uk>
        Usage: bwa <command> [options]
        Command: index       index sequences in the FASTA format
                 aln         gapped/ungapped alignment
                 samse       generate alignment (single ended)
                 sampe       generate alignment (paired ended)
                 bwasw       BWA-SW for long queries
                 fastmap     identify super-maximal exact matches
                 fa2pac      convert FASTA to PAC format
                 pac2bwt     generate BWT from PAC
                 pac2bwtgen  alternative algorithm for generating BWT
                 bwtupdate   update .bwt to the new format
                 bwt2sa      generate SA from BWT and Occ
                 pac2cspac   convert PAC to color-space PAC
                 stdsw       standard SW/NW alignment

      • The packages below should be installed in advance as a baseline for installing NGS-related software.

        $ apt-get update -y
        $ apt-get install gcc -y
        $ apt-get install make -y
        $ apt-get install zlib1g-dev -y
        $ apt-get install libncurses5-dev -y
        $ apt-get install g++ -y
        $ apt-get install tcl tk -y
        $ apt-get install tcl-dev -y
        $ apt-get install unzip -y
        $ apt-get install curl -y
        $ apt-get install screen -y
        $ apt-get install python-dev -y
        $ apt-get install python-software-properties -y
        $ add-apt-repository ppa:webupd8team/java
        $ apt-get update -y
        $ apt-get install oracle-java7-installer -y
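After running a package list like the one above, it can help to confirm that the expected command-line tools are actually reachable before starting a build. The sketch below is one way to do that with the POSIX `command -v` builtin. The function name missing_tools is a hypothetical helper, and it only checks executables, so library packages such as zlib1g-dev or libncurses5-dev are outside what it can verify.

```shell
#!/bin/sh
# missing_tools NAME...
# Prints, one per line, each named command that cannot be found on PATH.
# Prints nothing when every command is available.
# Hypothetical helper for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Typical use after the installs (the list is an assumption about which
# executables those packages provide):
#   missing_tools gcc make g++ unzip curl screen
```

An empty result means every listed tool resolved; any name printed is a candidate for another `apt-get install`.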
  26. 7.10.2 Installing Software by Compiling Source Code

      • Installing from source

        $ cd
        $ cp /BIO/app/bwa-0.7.4.tar.bz2 ./
        $ tar xvf bwa-0.7.4.tar.bz2
        $ cd bwa-0.7.4
        $ make
        $ ./bwa
        Program: bwa (alignment via Burrows-Wheeler transformation)
        Version: 0.7.4-r385
        Contact: Heng Li <lh3@sanger.ac.uk>
        Usage: bwa <command> [options]
        Command: index       index sequences in the FASTA format
                 mem         BWA-MEM algorithm
                 fastmap     identify super-maximal exact matches
                 pemerge     merge overlapping paired ends (EXPERIMENTAL)
                 aln         gapped/ungapped alignment
                 samse       generate alignment (single ended)
                 sampe       generate alignment (paired ended)
                 bwasw       BWA-SW for long queries
                 fa2pac      convert FASTA to PAC format
                 pac2bwt     generate BWT from PAC
                 pac2bwtgen  alternative algorithm for generating BWT
                 bwtupdate   update .bwt to the new format
                 bwt2sa      generate SA from BWT and Occ
        $ bwa
        Program: bwa (alignment via Burrows-Wheeler transformation)
        Version: 0.6.2-r126
        Contact: Heng Li <lh3@sanger.ac.uk>
        Usage: bwa <command> [options]
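The two invocations above report different versions because `./bwa` runs the freshly built binary in the current directory, while plain `bwa` runs whichever copy the shell finds first on PATH. A minimal sketch of putting the new build first is shown below. The function name prepend_path is a hypothetical helper, and the directory in the usage comment assumes the build location used in the steps above.

```shell
#!/bin/sh
# prepend_path DIR
# Puts DIR at the front of PATH unless it is already listed, so repeated
# calls (for example from a login script) do not grow PATH.
# Hypothetical helper for illustration.
prepend_path() {
    dir="${1%/}"                    # drop one trailing slash, if any
    case ":$PATH:" in
        *":$dir:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$dir:$PATH" ;;
    esac
    export PATH
}

# Typical use, assuming the build directory from the steps above:
#   prepend_path "$HOME/bwa-0.7.4"
#   command -v bwa      # shows which bwa the shell will now run
```

The change lasts only for the current shell session; to keep it, the same call would have to go into a startup file such as .bashrc.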
