2013 oct 2 rna sequencing

Published in: Technology

  • Born July 25, 1920 (93 years ago). 1950: King’s College in London, where she joined a scientific team studying living cells. She was assigned to work on DNA with a graduate student and assumed it was her own project. The lab’s second in command, Maurice Wilkins, was on vacation; when he returned, the relationship became muddled. Personality differences: Franklin was direct, quick, decisive; Wilkins was shy, speculative, passive.
  • Cancer landmarks. Technological advances.
  • Only one processor would work for sequencing the human genome: the only true 64-bit system. IBM had a hybrid system, the SGI R4000 was also not optimized for this architecture, and Sun had the SPARC architecture. IBM was too late with its system to help Celera with its computational infrastructure problem.
  • Illustrating the nature of the reduction in DNA sequencing costs. Moore’s law describes a long-term trend in the computer hardware industry: the doubling of ‘compute power’ every two years (compute power here means the number of transistors, i.e. transistor counts for integrated circuits plotted against their dates of introduction). The law is named after Intel co-founder Gordon E. Moore, who described the trend in a 1965 paper, “noting the number of components in integrated circuits had doubled every year from the invention of the integrated circuit”. Technology improvements that ‘keep up’ with Moore’s law are widely regarded to be doing extremely well. Sequencing data centers transitioned from Sanger-based (dideoxy chain termination) sequencing to ‘second’ or ‘next’ generation DNA sequencing. We have reached a point where huge amounts of data are available to us: the public GEO database already holds half a million microarray datasets, and many new ones are added every day.
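The doubling rule described in this note can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `doubling_factor` and the ten-year horizon are assumptions chosen for the example, not figures from the talk.

```python
def doubling_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor after `years` for a quantity that doubles every
    `doubling_period_years` years."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Doubling every two years, the rate quoted in the note.
    print(doubling_factor(10))
    # Doubling every year, the rate observed in Moore's 1965 paper.
    print(doubling_factor(10, doubling_period_years=1.0))
```

A technology whose cost per unit falls faster than the reciprocal of this factor is outpacing Moore's law, which is the comparison the note makes for sequencing costs.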
  • RNA-Seq allows you to discover and profile the entire transcriptome. No probes. No primers. RNA-Seq delivers unbiased, unparalleled information about the transcriptome. Simple sequencing workflow. Illumina’s optimized TruSeq RNA Sample Prep Kits.
  • align-reads then assemble-alignments approach
  • Trinity supports an alternative, hybrid approach to genome-based transcript reconstruction that uses a combination of RNA-Seq alignments to a genome coupled with RNA-seq read de novo assembly and transcript alignment assembly. This alternative approach involves four major steps: align-reads, assemble-reads, align-transcripts, then assemble-transcript_alignments.
  • http://bayes.cs.ucla.edu/home.htm Judea Pearl
  • Cancer therapeutics. Visit the web site and play the video if possible: http://www.modernatx.com/ (1:52 minutes)
  • Transcript

    • 1. RNA-sequencing: Taking Advantage of this Measurement Revolution October 1, 2013 Anne Deslattes Mays Wellstein/Riegel Laboratory Mentor: Anton Wellstein, MD, PhD 10/2/2013 Wellstein/Riegel Laboratory 1
    • 2. Talk Outline • On the Shoulders of Giants • Timelines • Personal Genome Project • RNA-Sequencing • Causality • Messenger Therapeutics
    • 3. Rosalind Franklin “pioneered use of x-rays to create images of unorganized matter – such as large biological molecules – not just single crystals” http://www.pbs.org/wgbh/aso/databank/entries/bofran.html “Franklin made equipment adjustments to produce an extremely fine beam of x-rays. She extracted finer DNA fibers than ever before and arranged them in parallel bundles. Studied fibers’ reactions to humid conditions. … allowed her to discover crucial keys to DNA’s structure…. Wilkins shared this with Watson & Crick at Cambridge without her knowledge…”
    • 6. Computer Architecture Advances (64 bit): 1961 IBM 7030 Stretch supercomputer (64-bit data words, 32/64-bit instructions); 1976 Cray-1 supercomputer (64-bit word architecture); 1989 Intel i860 RISC processor (“64-bit microprocessor”: 32-bit architecture, 3D graphics unit capable of 64-bit integer operations); 1991 R4000, a 64-bit microprocessor (SGI graphics workstations used this CPU); 1992 DEC introduces the pure 64-bit Alpha architecture; 1997 IBM releases RS64, 64-bit PowerPC (partial); 1999 Intel releases the instruction set for IA-64; 2003 AMD Opteron and Athlon 64 processors (AMD64, the first x86-based 64-bit processor), and Apple ships the “G5” PowerPC CPU; 2013 Apple announces iPhone 5s, the first 64-bit smartphone in the world (A7 ARMv8 system on a chip)
    • 8. Computer Operating Systems (64 bit): 1985 Cray releases UNICOS, a 64-bit implementation of Unix; 1976 Cray-1 supercomputer (64-bit word architecture); 1993 DEC releases DEC OSF/1 AXP, a Unix-like OS later named Tru64 UNIX; 1991 R4000, a 64-bit microprocessor (SGI graphics workstations used this CPU); 1996 IRIX operating system supports 64 bit; 2001 Linux is the first OS to support x86-64 (on a simulator; the chip wasn’t there yet); 1999 Intel releases the instruction set for IA-64; 2003 Mac OS X 10.3 adds 64-bit integer arithmetic support; 2013 iOS 7 on AArch64 processors, with a 64-bit kernel supporting 64-bit applications
    • 9. Celera Infrastructure Choice: 1998 Brian Reid, Palo Alto IX visit; 1998 benchmarked the TIGR assembler on available architectures (SGI, Sun SPARC, IBM RISC, DEC Tru64 Alpha); 1998 DEC’s Tru64 architecture won out; 1998 Compaq buys DEC
    • 13-15. http://fora.tv/2013/04/25/Harvard_Professor_George_Church_Opens_the_GET_Conference
    • 20-21. http://fora.tv/2013/04/25/Harvard_Professor_George_Church_Opens_the_GET_Conference
    • 22. Cancer Systems Biology: taking advantage of the measurement revolution. Declining sequencing costs, decreasing computing costs. How do you leverage all this data? GEO, May 25, 2012; GEO, June 25, 2013
    • 23. Here is an example RNA-Seq Workflow • Experimental Design • Sample Collection • Quality Control • Read Trimming • Differential Analysis • Transcript Identification • Pathway Analysis • Feature Discovery • Sequencing
    • 24-27. http://rnaseq.uoregon.edu/index.html
    • 28. Replicates: Type I and Type II errors
    • 29. Detecting Signal vs. Noise
    • 31. RNA-seq
    • 32. What is unique about RNA-Seq? • Allows you to discover and profile the entire transcriptome of any organism • No probes or primers to design • Novel transcripts • Novel isoforms • Alternative splice sites • Rare transcripts • cSNPs – all of this in one experiment
    • 35. RNA Alternative Splicing: Why you need gapped aligners
    • 36. How much RNA-sequencing data, how much computation power, and where do you go to compute? How much RNA-sequencing data? 1. 20 million paired-end reads ~ 2 GB of data 2. 100 million paired-end reads ~ 10 GB of data. How much computation power? 1. The more memory and processors, the less time it takes to compute 2. If you outsource the analysis, you will still need to store the results somewhere. Where to compute: Amazon Web Services (S3 storage; EC2, Elastic Compute Cloud, an on-demand computational facility); Georgetown University High Performance Computer Core (matrix.georgetown.edu); UPENN Galaxy services
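The two data points on this slide imply a roughly linear rule of about 100 bytes per read pair, which can be turned into a quick storage estimate. The sketch below assumes 1 GB = 10^9 bytes and that the slide's figures scale linearly; the function name `estimated_size_gb` is hypothetical, and real file sizes depend on read length, quality encoding, and compression, none of which the slide specifies.

```python
# Implied by "20 million paired-end reads ~ 2 GB": 2e9 bytes / 20e6 read pairs.
BYTES_PER_READ_PAIR = 2e9 / 20e6  # about 100 bytes per read pair (assumption)


def estimated_size_gb(read_pairs: float) -> float:
    """Rough storage estimate in GB for a given number of paired-end reads,
    extrapolating linearly from the slide's figures."""
    return read_pairs * BYTES_PER_READ_PAIR / 1e9


if __name__ == "__main__":
    for pairs in (20e6, 50e6, 100e6):
        print(f"{pairs / 1e6:.0f} million read pairs ~ {estimated_size_gb(pairs):.1f} GB")
```

An estimate like this is mainly useful for budgeting storage before sequencing; aligned files and intermediate results typically add to the total.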
    • 37. Galaxy is a web-based tool committed to enabling researchers (for more than just RNA-Seq)
    • 39. How to visualize mapped results? • UCSC Genome Browser (Gbrowse) • Integrated Genome Browser (IGB) • Integrative Genomics Viewer (IGV) Many shared formats, reading many of the outputs generated by the programs, ability to generate one’s own tracks
    • 40. [Figure: UCSC Genome Browser view of chr21 (hg19, around 23,600,000-23,650,000, 50 kb scale) showing custom tracks “C7 Random” and “C7 Targeted” with individual read alignments, alongside ENCODE and annotation tracks: transcription factor ChIP-seq, H3K27Ac marks on 7 cell lines, DNaseI hypersensitivity clusters, RefSeq and UCSC genes, human mRNAs and spliced ESTs, dbSNP 137, DNA methylation, chromatin interactions, and Multiz vertebrate alignment and conservation]
    • 43. What do RNA-Seq reads look like for GAPDH? Repeat-masked, BLAT-aligned reads (allowing 1/2 mismatched bases), viewed in IGB 6.7.2
    • 44. RNA-Seq Differential Expression analysis
    • 45. What does GAPDH look like in terms of quantitation? Table headers: TOTAL BM, HPP; RPKM, 3SEQ Counts, BLAT Reads (repeated). Rows: CD34 0.7 340 230 8 8 14; BST1 19.7 5374 31 31; CD133 0.2 173 176 16 16 33; THY1 0 7 4 4; A12 1 0; A5 0 0; ALK 0 9 24 0 0 3; B9 0 0; C1 0 0; C2 0 0; C7 0 0; E7 0 0; E9 2 0; F6 0 0; G12 0 0; GAPDH 3013.2 727831 356289 120.8 5559 2670; H3 0 0. BLAT read raw counts ratio == 3SEQ counts ratio ~= 130 to 1; RPKM ratio ~= 24.3
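The gap between the raw-count ratio and the RPKM ratio on this slide follows from how RPKM normalizes for library size. The sketch below uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and the GAPDH values as printed on the slide; it assumes the first three numbers in the GAPDH row belong to one sample and the last three to the other, and that the RPKM values were derived from the 3SEQ counts. Dividing the two printed RPKM values gives roughly 24.9 rather than the 24.3 quoted, so the slide's figure may come from slightly different inputs.

```python
def rpkm(read_count: float, gene_length_bp: float, total_mapped_reads: float) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# GAPDH values as printed on the slide (first sample, second sample).
GAPDH_3SEQ_COUNTS = (727831, 5559)
GAPDH_BLAT_READS = (356289, 2670)
GAPDH_RPKM = (3013.2, 120.8)

counts_ratio = GAPDH_3SEQ_COUNTS[0] / GAPDH_3SEQ_COUNTS[1]
blat_ratio = GAPDH_BLAT_READS[0] / GAPDH_BLAT_READS[1]
rpkm_ratio = GAPDH_RPKM[0] / GAPDH_RPKM[1]

# For a single gene the length cancels, so
#   rpkm_ratio = counts_ratio * (library size of sample 2 / library size of sample 1).
# The implied ratio of library sizes (sample 1 over sample 2) is therefore:
implied_library_ratio = counts_ratio / rpkm_ratio

if __name__ == "__main__":
    print(f"3SEQ count ratio:      {counts_ratio:.1f}")
    print(f"BLAT read ratio:       {blat_ratio:.1f}")
    print(f"RPKM ratio:            {rpkm_ratio:.1f}")
    print(f"implied library ratio: {implied_library_ratio:.2f}")
```

Under these assumptions the first sample's library would be about five times larger than the second's, which is why a raw-count ratio near 130 shrinks to an RPKM ratio in the mid twenties.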
    • 49. Given a list of differentially expressed genes, enrichment analysis should now be performed • Enrichment analysis lets the researcher leverage documented experiments that provide evidence for genes’ roles in pathways and functions, helping the researcher interpret the results and significance of their experiments • DAVID – Gene ontology – Functional ontology • REVIGO – Output of DAVID may be placed in REVIGO for further interpretation and statistical exploration of the significance of discovered sets of genes
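The core idea behind enrichment analysis can be illustrated with a hypergeometric over-representation test: given a gene list, how surprising is the number of genes that carry a particular annotation? The sketch below is a generic textbook calculation and is not the implementation used by DAVID or REVIGO; the function name and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, drawn: int, hits: int) -> float:
    """P(X >= hits) for X ~ Hypergeometric.

    population: total number of genes considered (background)
    annotated:  background genes carrying the annotation of interest
    drawn:      size of the differentially expressed gene list
    hits:       genes in the list that carry the annotation
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, 200 in a pathway,
    # 500 differentially expressed genes, 15 of which are in the pathway.
    print(enrichment_p_value(20_000, 200, 500, 15))
```

In practice many annotations are tested at once, so the resulting p-values need a multiple-testing correction before they are interpreted.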
    • 50. Using differentially expressed genes, biological pathways should be explored • Differentially expressed genes are put into programs such as Pathway Studio or Ingenuity • Shortest path programs and canonical pathway analysis • Enables a researcher to reverse engineer the pathways expressed in the course of going from a healthy response to a diseased response • Ideally a pathway reveals the observed phenotype – connecting the gene expression program with the phenotype: genotype – gene expression program – phenotype
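The “shortest path” idea mentioned on this slide can be sketched as a breadth-first search over an interaction graph. This is a minimal illustration, not how Pathway Studio or Ingenuity work internally; the graph and the gene names (`geneA` to `geneE`) are hypothetical placeholders.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return one shortest path (fewest edges) from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from goal to start to rebuild the path.
                path = []
                current = goal
                while current is not None:
                    path.append(current)
                    current = previous[current]
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical directed interaction graph (edges point from regulator to target).
TOY_GRAPH = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneD"],
    "geneC": ["geneE"],
    "geneE": ["geneD"],
    "geneD": [],
}

if __name__ == "__main__":
    print(shortest_path(TOY_GRAPH, "geneA", "geneD"))
```

Breadth-first search treats every interaction as equally likely; a tool that weights edges by evidence would need a weighted algorithm such as Dijkstra's instead.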
    • 52. http://bayes.cs.ucla.edu/home.htm
    • 55. RNA-Sequencing: What is it good for? • Transcript Annotation – Mutation identification – Isoform determination – Alternative Splice Variation • Differential Gene Expression – Phenotypically segregating experiments – Allows us to get at the “how” by looking at the response of an organism within a particular cell population to events – Good and careful design will allow us to unfold the dynamics of this response and identify targets for altering disease responses to improve one’s chances of surviving
    • 56. Thank you: Dr. Anton Wellstein, Dr. Anna Riegel, Dr. Marcel Schmidt, Dr. Elena Tassi. The entire Wellstein/Riegel laboratory: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Eric. My committee: Dr. Yuri Gusev, Dr. Anatoly Dritschilo, Dr. Michael Johnson, Dr. Christopher Loffredo, Dr. Habtom Ressom, Dr. Terry Ryan (external committee member). High Performance Core Group: Steve Moore, especially Woonki Chung. Amazon Cloud Services. Dr. Ann Loraine, UNC, IGB developer. Brian Haas, author of the Trinity suite. Keygene.
    • 57. Some Resources • http://personalgenome.org • http://rnaseq.uoregon.edu/index.html • http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq) • http://galaxy.psu.edu/ • http://seqanswers.com/ • http://www.broadinstitute.org/igv/ • http://bioviz.org/igb/index.html • http://www.illumina.com • http://www.otogenetics.com • http://www.dnanexus.com • http://bioconductor.org/packages/2.12/bioc/html/limma.html • http://trinityrnaseq.sourceforge.net/ • http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html • http://cufflinks.cbcb.umd.edu/ • http://brb.nci.nih.gov/BRB-ArrayTools.html • http://www.modernatx.com/
    • 58. Systems Biology History (Wikipedia) • Systems biology roots found in – Quantitative modeling of enzyme kinetics – Mathematical modeling of population growth – Simulations to study neurophysiology – Control theory and cybernetics • Theorists – Ludwig von Bertalanffy – General Systems Theory – Alan Lloyd Hodgkin and Andrew Fielding Huxley – constructed a mathematical model that explained the action potential propagating along the axon of a neuron – Denis Noble – first computer model of the heart pacemaker
    • 59. Scientific knowledge is limited (and advanced) by the limits (and advancements) of measurement • Ilya Shmulevich, Genomic Signal Processing: “Validity of the model involves observation and measurement, scientific knowledge is limited by the limits of measurement” • Erwin Schrödinger, Science Theory and Man: “It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable”