20110114 Next Generation Sequencing Course


Next Generation Sequencing course
2011-01-14 Nantes (By the way, I remember where I found this idea of using Star-Trek: it came from a presentation of the GATK team)

  • Samples consisting of longer fragments are first sheared into a random library of 100-300 base-pair long fragments. After fragmentation the ends of the obtained DNA-fragments are repaired and an A-overhang is added at the 3'-end of each strand. Afterwards, adaptors which are necessary for amplification and sequencing are ligated to both ends of the DNA-fragments. These fragments are then size selected and purified.
  • From the following article: Next-generation DNA sequencing, Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135-1145 (2008), published online 9 October 2008, doi:10.1038/nbt1486. (a) The 454, the Polonator and SOLiD platforms rely on emulsion PCR to amplify clonal sequencing features. In brief, an in vitro–constructed adaptor-flanked shotgun library (shown as gold and turquoise adaptors flanking unique inserts) is PCR amplified (that is, multi-template PCR, not multiplex PCR, as only a single primer pair is used, corresponding to the gold and turquoise adaptors) in the context of a water-in-oil emulsion. One of the PCR primers is tethered to the surface (5'-attached) of micron-scale beads that are also included in the reaction. A low template concentration results in most bead-containing compartments having either zero or one template molecule present. In productive emulsion compartments (where both a bead and a template molecule are present), PCR amplicons are captured to the surface of the bead. After breaking the emulsion, beads bearing amplification products can be selectively enriched. Each clonally amplified bead will bear on its surface PCR products corresponding to amplification of a single molecule from the template library. (b) The Solexa technology relies on bridge PCR (aka 'cluster PCR') to amplify clonal sequencing features. In brief, an in vitro–constructed adaptor-flanked shotgun library is PCR amplified, but both primers densely coat the surface of a solid substrate, attached at their 5' ends by a flexible linker. As a consequence, amplification products originating from any given member of the template library remain locally tethered near the point of origin. At the conclusion of the PCR, each clonal cluster contains approximately 1,000 copies of a single member of the template library. Accurate measurement of the concentration of the template library is critical to maximize the cluster density while simultaneously avoiding overcrowding.
  • During sequencing, the very large number of generated clusters is sequenced simultaneously. The DNA templates are copied base by base using the four nucleotides (A, C, G, T), which are fluorescently labeled and reversibly terminated. After each synthesis step, the clusters are excited by a laser, which causes the last incorporated base to fluoresce. The fluorescent label and the blocking group are then removed, allowing the addition of the next base. The fluorescence signal after each incorporation step is captured by a built-in camera, producing images of the flow cell.
  • The emPCR amplifies each fragment several million times. After amplification the emulsion shell is broken and the clonally amplified beads are ready for loading onto the fibre-optic PicoTiterDevice for sequencing.
  • The template strand is represented in red, the annealed primer is shown in black and the DNA polymerase is shown as the green oval. Incorporation of the complementary base (the blue "G") generates inorganic pyrophosphate (PPi), which is converted to ATP by the sulfurylase (blue arrow). Luciferase (red arrow) uses the ATP to convert luciferin to oxyluciferin, producing light.
  • Genome Biol. 2009; 10(3): R32. Published online 2009 March 27. doi: 10.1186/gb-2009-10-3-r32. PMCID: PMC2691003. Copyright © 2009 Harismendy et al.; licensee BioMed Central Ltd. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Olivier Harismendy, Pauline C Ng, Robert L Strausberg, Xiaoyun Wang, Timothy B Stockwell, Karen Y Beeson, Nicholas J Schork, Sarah S Murray, Eric J Topol, Samuel Levy and Kelly A Frazer. Performance metrics of NGS technologies. (a-f) Error bars represent minimum and maximum values obtained from the four samples. (g-i) Venn diagram representation of false positive calls (g), false negative calls (h) and discrepant variant calls (i). The inset caption displays the color-coding of each NGS technology and overlaps: Roche 454 (red), Illumina GA (yellow) and ABI SOLiD (blue). For each NGS platform the number of base calls with errors associated with specific sequence contexts is given (repeat = repetitive element). When two sequence contexts are present they are both listed.
  • Historical trends in storage prices versus DNA sequencing costs. The blue squares describe the historic cost of disk prices in megabytes per US dollar. The long-term trend (blue line, which is a straight line here because the plot is logarithmic) shows exponential growth in storage per dollar with a doubling time of roughly 1.5 years. The cost of DNA sequencing, expressed in base pairs per dollar, is shown by the red triangles. It follows an exponential curve (yellow line) with a doubling time slightly slower than disk storage until 2004, when next generation sequencing (NGS) causes an inflection in the curve to a doubling time of less than 6 months (red line). These curves are not corrected for inflation or for the 'fully loaded' cost of sequencing and disk storage, which would include personnel costs, depreciation and overhead.
  • Cloud computing and the DNA data race Journal name: Nature Biotechnology Volume: 28, Pages: 691–693 Year published: (2010) DOI: doi:10.1038/nbt0710-691
  • Fields of an Illumina read identifier: HWUSI-EAS100R = the unique instrument name; 6 = flowcell lane; 73 = tile number within the flowcell lane; 941 = 'x'-coordinate of the cluster within the tile; 1973 = 'y'-coordinate of the cluster within the tile; #0 = index number for a multiplexed sample (0 for no indexing); /1 = the member of a pair, /1 or /2 (paired-end or mate-pair reads only).
  • De novo fragment assembly with short mate-paired reads: Does the read length matter?, doi: 10.1101/gr.079053.108, Genome Res. 2009. 19: 336-346. Positional profile of base-calling errors for Illumina reads for 2 million 50-nt-long reads from a human BAC. The error rate across reads is shown (solid line) along with the error rate for reads with a fixed number of errors. The erroneous nucleotides in each read are detected by mapping the read to the reference genome. The high error rate in position 6 is due to the bias in our particular data set rather than a systematic problem with the Illumina technology.
  • Sequence reads with associated read identifiers are shown, with the regions that will be used for seed selection in capital letters and matched seeds of 0011 and 1100. Given read identifiers are associated with the seeds using a hash function (for example, a unique integer representation of each seed). Once such a hash table has been built for either the input read set or the reference genome, the corresponding data can be scanned with the same hash function, resulting in a much smaller subset of reads to more exactly align at each location in the genome.
  • Schematic representation of our implementation of the de Bruijn graph. Each node, represented by a single rectangle, represents a series of overlapping k-mers (in this case, k = 5), listed directly above or below. (Red) The last nucleotide of each k-mer. The sequence of those final nucleotides, copied in large letters in the rectangle, is the sequence of the node. The twin node, directly attached to the node, either below or above, represents the reverse series of reverse complement k-mers. Arcs are represented as arrows between nodes. The last k-mer of an arc’s origin overlaps with the first of its destination. Each arc has a symmetric arc. Note that the two nodes on the left could be merged into one without loss of information, because they form a chain.
  • Genome Res. 2009 Sep;19(9):1586-92. Epub 2009 Aug 5. Sensitive and accurate detection of copy number variants using read depth of coverage. Yoon S, Xuan Z, Makarov V, Ye K, Sebat J. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.
  • cut -f 1,2 pileup.filtered.txt | awk '{printf("%s\t%d\t%d\n",$1,int($2)-1,int($2));}' > $@
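
The read-identifier fields described in the slide notes above (instrument, lane, tile, x, y, index, pair member) can be pulled apart with a small parser. This is a minimal sketch in Python rather than anything from the original course. It assumes the fields are joined as `@instrument:lane:tile:x:y#index/pair`, with the `#index` and `/pair` parts optional. The class `IlluminaReadId` and the function `parse_read_id` are hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple, Optional


class IlluminaReadId(NamedTuple):
    """Fields of an old-style Illumina read identifier (hypothetical helper type)."""
    instrument: str
    lane: int
    tile: int
    x: int
    y: int
    index: Optional[str]        # multiplex index, None when the '#...' part is absent
    pair_member: Optional[int]  # 1 or 2, None when the '/1' or '/2' part is absent


# Assumed layout: @instrument:lane:tile:x:y[#index][/pair]
_READ_ID = re.compile(
    r"^@?(?P<instrument>[^:\s]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"(?:#(?P<index>[^/\s]+))?"
    r"(?:/(?P<pair>[12]))?$"
)


def parse_read_id(header: str) -> IlluminaReadId:
    """Split a FASTQ header line into its named fields, or raise ValueError."""
    match = _READ_ID.match(header.strip())
    if match is None:
        raise ValueError(f"not an old-style Illumina read identifier: {header!r}")
    pair = match.group("pair")
    return IlluminaReadId(
        instrument=match.group("instrument"),
        lane=int(match.group("lane")),
        tile=int(match.group("tile")),
        x=int(match.group("x")),
        y=int(match.group("y")),
        index=match.group("index"),
        pair_member=int(pair) if pair is not None else None,
    )


if __name__ == "__main__":
    # Identifier assembled from the fields listed in the notes.
    print(parse_read_id("@HWUSI-EAS100R:6:73:941:1973#0/1"))
    # First header of the FASTQ example on slide 51 (no '#index' part).
    print(parse_read_id("@IL31_4368:1:1:996:8507/2"))
```

Identifiers from other sources, such as the `SRR018111.1786` name in the SAM example on slide 70, do not follow this layout and make the parser raise `ValueError`.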

    1. 1. Next Generation Sequencing Nantes, December 10th 2010 Pierre Lindenbaum PhD [email_address] http://plindenbaum.blogspot.com Twitter: @yokofakun Institut du Thorax - INSERM UMR915
    2. 2. http://en.wikipedia.org/wiki/File:The_Thinker,_Rodin.jpg About me
    3. 3. This presentation will be posted on http://www.slideshare.net/lindenb
    4. 4. Thank you Biostar ( Istvan Albert,Jeremy Leipzig... ) http://biostar.stackexchange.com/questions/3355
    5. 5. “Next” Generation ?
    6. 6. http://en.wikipedia.org/wiki/File:ST_TOS_Cast.jpg
    7. 7. http://commons.wikimedia.org/wiki/File:Frederick_Sanger2.jpg 1977
    8. 8. http://en.wikipedia.org/wiki/File:Sequencing.jpg
    9. 9. http://en.wikipedia.org/wiki/Star_Trek:_The_Motion_Picture
    10. 10. http://www.flickr.com/photos/widdowquinn/4119516803/
    11. 11. http://commons.wikimedia.org/wiki/File:Sanger_sequencing_read_display.gif
    12. 13. http://www.nature.com/
    13. 14. http://en.wikipedia.org/wiki/Star_Trek_Next_Generation
    14. 15. 3 Main Technologies Solid
    15. 17. http://www.dkfz.de/gpcf/850.html
    16. 18. Credit: Illumina
    17. 20. http://www.dkfz.de/gpcf/850.html
    18. 21. http://www.illumina.com/technology/paired_end_sequencing_assay.ilmn
    19. 23. http://www.dkfz.de/gpcf/849.html
    20. 27. http://www.flickr.com/photos/doe_jgi/4093644608
    21. 29. The development and impact of 454 sequencing Jonathan M Rothberg & John H Leamon Nature Biotechnology 26, 1117 - 1124 (2008) Published online: 9 October 2008 doi:10.1038/nbt1485
    22. 33. Genome Biol. 2009; 10(3): R32. Published online 2009 March 27. doi: 10.1186/gb-2009-10-3-r32. Evaluation of next generation sequencing platforms for population targeted sequencing studies
    23. 35. Published online 20 November 2008 | Nature | doi:10.1038/news.2008.1245 Human genomes in minutes? Not yet, but biotechnology company is on track for 2013.
    24. 37. Sequencing technologies — the next generation Michael L. Metzker Nature Reviews Genetics 11, 31-46 (January 2010) doi:10.1038/nrg2626
    25. 38. Storage
    26. 40. http://blogs.forbes.com/sciencebiz/2010/06/03/your-genome-is-coming/
    27. 41. Genome Biol. 2010;11(5):207. Epub 2010 May 5. The case for cloud computing in genome informatics.
    28. 42. http://www.flickr.com/photos/esquimo_2ooo/5241744434/
    29. 43. http://www.flickr.com/photos/jpf/152611490/
    30. 44. http://commons.wikimedia.org/wiki/File:Torchlight_zip.png
    31. 45. http://www.flickr.com/photos/coreburn/487357814/
    32. 47. http://www.cloudera.com/what-is-hadoop/hadoop-overview/
    33. 50. FASTQ
    34. 51.
        @IL31_4368:1:1:996:8507/2
        TCCCTTACCCCCAAGCTCCATACCCTCCTAATGCCCACACCTCTTACCTTAGGA
        +
        FFCEFFFEEFFFFFFFEFFEFFFEFCFC<EEFEFFFCEFF<;EEFF=FEE?FCE
        @IL31_4368:1:1:996:21421/2
        CAAAAACTTTCACTTTACCTGCCGGGTTTCCCAGTTTACATTCCACTGTTTGAC
        +
        >DBDDB,B9BAA4AAB7BB?7BBB=91;+*@;5<87+*=/*@@?9=73=.7)7*
        @IL31_4368:1:1:997:10572/2
        GATCTTCTGTGACTGGAAGAAAATGTGTTACATATTACATTTCTGTCCCCATTG
        +
        E?=EECE<EEEE98EEEEAEEBD??BE@AEAB><EEABCEEDEC<<EBDA=DEE
        @IL31_4368:1:1:997:15684/2
        CAGCCTCAGATTCAGCATTCTCAAATTCAGCTGCGGCTGAAACAGCAGCAGGAC
        +
        EEEEDEEE9EAEEDEEEEEEEEEECEEAAEEDEE<CD=D=*BCAC?;CB,<D@,
        @IL31_4368:1:1:997:15249/2
        AATGTTCTGAAACCTCTGAGAAAGCAAATATTTATTTTAATGAAAAATCCTTAT
        +
        EDEEC;EEE;EEE?EECE;7AEEEEEE07EECEA;D6D>+EE4E7EEE4;E=EA
        @IL31_4368:1:1:997:6273/2
        ACATTTACCAAGACCAAAGGAAACTTACCTTGCAAGAATTAGACAGTTCATTTG
        +
        EEAAFFFEEFEFCFAFFAFCCFFEFEF>EFFFFB?ABA@ECEE=<F@DE@DDF;
        @IL31_4368:1:1:997:1657/2
        CCCACCTCTCTCAATGTTTTCCATATGGCAGGGACTCAGCACAGGTGGATTAAT
        (...)
    35. 52. Solexa/Illumina Read Format. The syntax of the Solexa/Illumina read format is almost identical to the FASTQ format, but the qualities are scaled differently. Given a character $sq, the following Perl code gives the Phred quality $Q: $Q = 10 * log(1 + 10 ** ((ord($sq) - 64) / 10.0)) / log(10); http://maq.sourceforge.net/fastq.shtml
    36. 54. Mapping the short reads on a reference genome
    37. 55. “ Running these accurate alignment algorithms as a full search of all possible places where the sequence may map is computationally infeasible.” Sense from sequence reads: methods for alignment and assembly Paul Flicek & Ewan Birney Nature Methods 6, S6 - S12 (2009) Published online: 15 October 2009 Corrected online: 6 May 2010 doi:10.1038/nmeth.1376
    38. 56. HashTable Sense from sequence reads: methods for alignment and assembly Paul Flicek & Ewan Birney Nature Methods 6, S6 - S12 (2009) doi:10.1038/nmeth.1376
    39. 57. Hash Reads: MAQ, Illumina's ELAND. Hash Reference: SOAP1, BFAST, MOSAIK.
    40. 58. Burrows-Wheeler Sense from sequence reads: methods for alignment and assembly Paul Flicek & Ewan Birney Nature Methods 6, S6 - S12 (2009) doi:10.1038/nmeth.1376
    41. 59. SOAP2 Bowtie BWA
    42. 60. http://www.broadinstitute.org/gsa/wiki/index.php/File:ExampleDiagram.png
    43. 61. DE NOVO SEQUENCING
    44. 62. De Bruijn graphs. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs doi: 10.1101/gr.074492.107 Genome Res. 2008. 18: 821-829
    45. 63. Sense from sequence reads: methods for alignment and assembly Paul Flicek & Ewan Birney Nature Methods 6, S6 - S12 (2009) doi:10.1038/nmeth.1376
    46. 64. CNV detection Genome Res. 2009 Sep;19(9):1586-92. Epub 2009 Aug 5. Sensitive and accurate detection of copy number variants using read depth of coverage.
    47. 65. RNA-SEQ http://en.wikipedia.org/wiki/File:RNA-Seq-alignment.png gene regulation protein information
    48. 66. Exome Sequencing http://en.wikipedia.org/wiki/File:Exome_Sequencing_Workflow_1a.png
    49. 67. SAM A generic nucleotide alignment format Bioinformatics. 2009 Aug 15;25(16):2078-9. Epub 2009 Jun 8. The Sequence Alignment/Map format and SAMtools.
    50. 68. human-readable, scriptable
    51. 69. Field 1: Query name. Field 2: Flag. Field 3: Reference sequence name. Field 4: 1-based leftmost coordinate of the clipped sequence. Field 5: Mapping quality. Field 6: CIGAR string. Field 7: Mate reference sequence name. Field 8: 1-based leftmost coordinate of the mate's clipped sequence. Field 9: Insert size (5’ to 5’). Field 10: Query sequence. Field 11: Sequence qualities.
    52. 70.
        1 name: SRR018111.1786
        2 flag: 83 (read paired/mapped/reverse strand/first in pair)
        3 refseq: chr22
        4 position: 31232437
        5 qual : 17
        6 cigar: 76M
        7 =
        8 clipped pos: 31232403
        9 insert size: -110
        10 GGCCCTTAAAATCACAAACTATGCTCAACTCACTCTCTACAGCTCTCATAATTTCCAAAATCTATTTTCTT
        11 41===@B=AA??B?B@A?BAAAABBBA@B@C<B>B@BBACBBBBBBCBBCABABBCCCBBBBCBABBBCBB
        12 XT:A:U
        13 NM:i:4
        14 SM:i:17
        15 AM:i:17
        16 X0:i:1
        17 X1:i:0
        18 XM:i:4
        19 XO:i:0
        20 XG:i:0
        21 MD:Z:6A34T0T8C24
    53. 71. Text vs. binary format
    54. 72.
        SAMFileReader inputSam = new SAMFileReader(inputSamOrBamFile);
        SAMFileWriter outputSam = new SAMFileWriterFactory().makeSAMOrBAMWriter(inputSam.getFileHeader(), true, outputSamOrBamFile);
        for ( SAMRecord samRecord : inputSam) {
            samRecord.setReadName(samRecord.getReadName().toUpperCase());
            outputSam.addAlignment(samRecord);
        }
        outputSam.close();
        inputSam.close();
    55. 73. compact, indexed alignments
    56. 74. Is flexible enough to store all the alignment information generated by various alignment programs. Is simple enough to be easily generated by alignment programs or converted from existing alignment formats. Is compact in file size. Allows most operations on the alignment to work on a stream without loading the whole alignment into memory. Allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus.
    57. 75. CIGAR: Compact Idiosyncratic Gapped Alignment Report format. 'M' shows an alignment match (the aligned bases may still differ from the reference), 'I' shows an insertion, 'D' shows a deletion, 'H' hard clipping, 'S' soft clipping. http://www.flickr.com/photos/alexbrn/3032428454/
    58. 76. SAM Flags
        0x0001 the read is paired in sequencing, no matter whether it is mapped in a pair
        0x0002 the read is mapped in a proper pair
        0x0004 the query sequence itself is unmapped
        0x0008 the mate is unmapped
        0x0010 strand of the query (0 for forward; 1 for reverse strand)
        0x0020 strand of the mate
        0x0040 the read is the first read in a pair
        0x0080 the read is the second read in a pair
        0x0100 the alignment is not primary (a read having split hits may have multiple primary alignment records)
        0x0200 the read fails platform/vendor quality checks
        0x0400 the read is either a PCR duplicate or an optical duplicate
    59. 77. SAMTOOLS http://commons.wikimedia.org/wiki/File:Swiss_Army_Knife_Wenger_Opened_20050627.jpg
    60. 78. http://samtools.sourceforge.net/
    61. 79. http://gorgonzola.cshl.edu/pfb/2010/LectureNotes/ngs2/ngs2.pdf
    62. 80. Pileup
        Chrom Position Ref Coverage Read bases Qualities
        seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
        seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
        seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
        seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
        seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
        seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
        seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
        seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
    63. 81. Genome (re)sequencing (why ?) http://www.nature.com/news/2008/080122/full/451378b.html
    64. 82. Map to known sequence
    65. 84. Exome Sequencing: 30,508,378 reads * 55 bp = 1,677,960,790 bp
    66. 85. http://vcftools.sourceforge.net/specs.html VCF format
    67. 86. GATK
    68. 87. http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit
    69. 89. Visualizing the alignments
    70. 90. Samtools: TVIEW
    71. 91. http://www.broadinstitute.org/software/igv/
    72. 92. http://www.flickr.com/photos/ohm17/162622755/
    73. 96. Download FASTA sequence for chr22 (hg18)
    74. 97. curl --proxy ${PROXY} "http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/chr22.fa.gz" | gunzip -c > chr22.fa
    75. 98. What's the length of chr22 ?
    76. 99. Index chr22 with samtools
    77. 100. ${sam.bin} faidx chr22.fa
    78. 101. chr22 49691432 7 50 51
    79. 102. Get some FastQ files (simulation via samtools)
    80. 103. ${sam.dir}/misc/wgsim chr22.fa reads_1.fastq reads_2.fastq > _rand.txt
    81. 104. Index chr22 for BWA
    82. 105. ${bwa.bin} index -p chr22db -a bwtsw chr22.fa
    83. 106. 5, 4, 3, 2, 1... Align!
    84. 107.
        ${bwa.bin} aln chr22db reads_1.fastq > aln1.sai
        ${bwa.bin} aln chr22db reads_2.fastq > aln2.sai
    85. 108. Generate alignments in the SAM format given paired-end reads
    86. 109. ${bwa.bin} sampe chr22db aln1.sai aln2.sai reads_1.fastq reads_2.fastq > aln.sam
    87. 110. Convert SAM to BAM
    88. 111. ${sam.bin} view -b -T chr22.fa aln.sam > aln.bam
    89. 112. Sort the alignments by position
    90. 113. ${sam.bin} sort aln.bam sorted1
    91. 114. Remove the PCR duplicates
    92. 115. ${sam.bin} rmdup sorted1.bam sorted2.bam
    93. 116. Index the alignment
    94. 117. ${sam.bin} index sorted2.bam
    95. 118. What's the coverage/depth ?
    96. 119. java -jar ${gatk.jar} -T DepthOfCoverage -o file.depth -R chr22.fa -I sorted2.bam
    97. 120. GATK: recalibration
    98. 121. http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration
    99. 122. GATK: local realignment
    100. 123. http://www.broadinstitute.org/gsa/wiki/index.php/File:IndelRealignmentAlgorithm.png
    101. 124.
        java -jar ${gatk.jar} -T RealignerTargetCreator -R chr22.fa -o outputs.intervals -I sorted2.bam
        java -jar ${gatk.jar} -T IndelRealigner -I sorted2.bam -targetIntervals outputs.intervals -o $@ -R chr22.fa
        ....
        http://www.flickr.com/photos/didier57/2423562782/
    102. 125. Generate a pileup
    103. 126. ${sam.bin} pileup -v -c -f chr22.fa realigned.bam > pileup.txt
    104. 127. Filter the pileup
    105. 128. ${sam.dir}/misc/samtools.pl varFilter -d 5 pileup.txt > pileup.filtered.txt
    106. 129. Create a VCF
    107. 130. ${sam.dir}/misc/sam2vcf.pl -r chr22.fa < pileup.filtered.txt > pileup.vcf
    108. 131. View the alignment with tview
    109. 132. http://sift.jcvi.org/www/SIFT_chr_coords_submit.html
    110. 133.
        $1 Coordinates : 4,99981527,1,G/A
        $2 Codons : -
        $3 Transcript ID :
        $4 Protein ID :
        $5 Substitution : NA
        $6 Region : NON-GENIC
        $7 dbSNP ID : NA
        $8 SNP Type : NA
        $9 Prediction : Not scored
        $10 Score : NA
        $11 Median Info : NA
        $12 # Seqs at position : NA
        $13 Gene ID : !N/A
        $14 Gene Name : !N/A
        $15 Gene Desc : !N/A
        $16 Protein Family ID : !N/A
        $17 Protein Family Desc : !N/A
        $18 Transcript Status : !N/A
        $19 Protein Family Size : !N/A
        $20 OMIM Disease : !N/A
        $21 Average Allele Freqs : !N/A
        $22 CEU Allele Freqs : !N/A
        $23 User Comment : !N/A
    111. 134. http://genetics.bwh.harvard.edu/pph2/bgi.shtml
    112. 135.
        $1 #o_snp_id : chr19:1779391.TC.uc010dsr.1
        $2 snp_id : chr19:1779391.TC.uc010dsr.1
        $3 acc : Q05DB0
        $4 pos : 87
        $5 aa1 : N
        $6 aa2 : D
        $7 prediction : benign
        $8 pph2_prob : 0.001
        $9 pph2_FPR : 0.86
        $10 pph2_TPR : 0.994
        $11 Comments : !N/A
    113. 136. Give Galaxy a try
    114. 137. http://main.g2.bx.psu.edu/ Galaxy: A platform for interactive large-scale genome analysis: Genome Res. 2005. 15: 1451-1455
    115. 138. Use UCSC Table Browser to find the SNPs
    116. 139. Use UCSC mysql server to find the SNPs, the genes,...
    117. 140. Create a UCSC Custom Track
    118. 141. http://ged.msu.edu/angus/tutorials/ucsc-visualization.html
    119. 142. Wig example
        browser position chr19:59304200-59310700
        browser hide all
        track type=wiggle_0 name="variableStep" description="variableStep format" visibility=full autoScale=off viewLimits=0.0:25.0 color=50,150,255 yLineMark=11.76 yLineOnOff=on priority=10
        variableStep chrom=chr19 span=150
        59304701 10.0
        59304901 12.5
        59305401 15.0
        59305601 17.5
        59305901 20.0
        59306081 17.5
        59306301 15.0
        59306691 12.5
        59307871 10.0
    120. 143. Create a ROR database from the VCF file
    121. 144.
        mkdir -p RAILS
        rails RAILS/rails4pileup
        awk -F ' ' 'BEGIN {printf("create table vcfs(id integer primary key,chrom varchar(50), position int, ref varchar(2), alt varchar(50),depth int);\n");} {printf("insert into vcfs(chrom,position,ref,alt,depth) values(\"%s\",%s,\"%s\",\"%s\",%s);\n",$$1,$$2,$$3,$$4,$$5);}' pileup.filtered.txt | sqlite3 RAILS/rails4pileup/db/vcf.sqlite3
        ruby RAILS/rails4pileup/script/generate scaffold vcf chrom:string position:int ref:string alt:string depth:int
        cat RAILS/rails4pileup/config/database.yml | sed 's/\(test\|development\|production\).sqlite3/vcf.sqlite3/' > /tmp/tmp.yml
        mv /tmp/tmp.yml RAILS/rails4pileup/config/database.yml
        echo "http://localhost:3000/vcfs"
    122. 145. The end.
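
Slide 52 gives a one-line Perl formula for converting a Solexa-scaled quality character into a Phred quality. The same conversion can be sketched in Python as below. This is an illustration, not code from the course: the function names are hypothetical, and the offset of 64 comes from the formula quoted on the slide. The error-probability helper uses the standard Phred definition, Q = -10 log10(p), which the slides do not spell out.

```python
import math

SOLEXA_OFFSET = 64  # ASCII offset used by the formula on slide 52


def solexa_char_to_phred(sq: str) -> float:
    """Convert one Solexa-scaled quality character to a Phred quality.

    Python equivalent of the Perl formula on the slide:
    Q = 10 * log10(1 + 10 ** ((ord(sq) - 64) / 10))
    """
    if len(sq) != 1:
        raise ValueError("expected a single quality character")
    solexa_quality = ord(sq) - SOLEXA_OFFSET
    return 10.0 * math.log10(1.0 + 10.0 ** (solexa_quality / 10.0))


def phred_to_error_probability(phred: float) -> float:
    """Standard Phred definition: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-phred / 10.0)


if __name__ == "__main__":
    # 'h' encodes Solexa quality 40, '@' encodes 0, ';' encodes -5.
    for char in "h@;":
        q = solexa_char_to_phred(char)
        print(char, round(q, 3), phred_to_error_probability(q))
```

The two scales nearly coincide at high quality and diverge at the low end: a Solexa quality of 0 corresponds to a Phred quality of about 3. The conversion only makes sense for files that really use the Solexa scaling. Applying it to qualities encoded on another scale would give meaningless values.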

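
The SAM flag bits (slide 76) and the CIGAR operations (slide 75) can be decoded with a few lines of Python, shown here on the example record from slide 70 (flag 83, position 31232437, CIGAR 76M). This is a minimal sketch under stated assumptions. The names `SAM_FLAGS`, `decode_flag`, `parse_cigar` and `reference_end` are hypothetical helpers, not part of samtools or Picard. The short flag labels are my own abbreviations of the slide's descriptions. The operations `N`, `P`, `=` and `X` are not on the slide and are taken from the SAM specification.

```python
import re
from typing import List, Tuple

# Bit values from slide 76; the labels are shortened for illustration.
SAM_FLAGS = [
    (0x0001, "paired"),
    (0x0002, "proper_pair"),
    (0x0004, "unmapped"),
    (0x0008, "mate_unmapped"),
    (0x0010, "reverse_strand"),
    (0x0020, "mate_reverse_strand"),
    (0x0040, "first_in_pair"),
    (0x0080, "second_in_pair"),
    (0x0100, "not_primary"),
    (0x0200, "fails_qc"),
    (0x0400, "duplicate"),
]

_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference sequence.
_CONSUMES_REFERENCE = set("MDN=X")


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs; '*' means no CIGAR."""
    if cigar == "*":
        return []
    if _CIGAR_FULL.fullmatch(cigar) is None:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in _CIGAR_OP.findall(cigar)]


def reference_end(position: int, cigar: str) -> int:
    """1-based coordinate of the last reference base covered by the alignment."""
    span = sum(n for n, op in parse_cigar(cigar) if op in _CONSUMES_REFERENCE)
    return position + span - 1


if __name__ == "__main__":
    print(decode_flag(83))
    print(parse_cigar("76M"))
    print(reference_end(31232437, "76M"))
```

Decoding 83 gives the four properties annotated on slide 70: paired, mapped in a proper pair, reverse strand, first in pair. Insertions and clipped bases do not consume reference positions, so they are excluded when computing where the alignment ends on the reference.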