    LectureW2_Intro_Array Presentation Transcript

    • Friday (1/15) computer lab session. Location: 3073 (3rd floor), Department of Computational Biology, BST3, 3501 Fifth Avenue. Time: 9:30-10:45 AM. Play with R (tutorial) at home before the lab session.
    • Agenda
      • Introduction to microarray
        • Motivation & previous techniques
          • Concept of biological pathway
          • Northern blot, RT-PCR and real time RT-PCR
        • Affymetrix microarray experiment
        • cDNA microarray experiment
        • Comparison of the two
        • Codelink, Illumina & Agilent
        • MAQC (Microarray Quality Control) Project
      • Introduction to next generation sequencing (RNA-seq, ChIP-seq etc)
    • Review
    • The central dogma of molecular biology: DNA is transcribed into RNA (mRNA (messenger), rRNA (ribosomal) and tRNA (transfer)), and mRNA is translated into protein at the ribosome. Microarray is a technology for detecting mRNA expression levels globally (simultaneously measuring thousands of genes).
    • Why detect expression level of protein or mRNA?
    • Cell cycle: Cancer cells are malignant cells that do not die but instead reproduce rapidly. It is important to repair problematic mutations during cell division.
    • Example 1: p53 Pathway (an important tumor suppressor). Figure label: (DNA damaged). Source: http://breast-cancer-research.com/content/pdf/bcr426.pdf
    • Example 2: KRas Pathway (an oncogene) (figure legend: upregulation; downregulation). From http://www.icnet.uk/axp/mphh/biomed/lemoine.html
      • Normal cell
        • p53 properly suppresses cell replication.
        • Ras genes properly activate cell replication.
      • Cancerous cell
        • p53 does not suppress cell replication.
        • Ras genes are overexpressed. Cells replicate excessively.
    • Why detect expression level of protein or mRNA?
      • Prediction of a disease: If the mechanism is known, detecting expression levels can help identify cancer patients (e.g. unusual p53 or KRas expression activity).
      • Exploratory: In general, microarrays can help identify candidate genes that contribute to tumor progression and propose hypotheses about the underlying genetic network.
    • Northern Blot (an old technique for measuring mRNA expression). http://www.escience.ws/b572/L13/north.html
      • mRNA is extracted and purified.
      • mRNA is loaded for electrophoresis. Lane 1: size standards. Lane 2: RNA to be tested.
      • The gel is charged (- to +) and the RNA “swims” through the gel according to weight.
      • mRNA is transferred from the gel to a membrane.
      • A labelled probe specific for the RNA fragment is incubated with the blot, so the RNA of interest can be detected. See next page for the details of this step.
    • Northern Blot close-up (color staining). http://www.escience.ws/b572/L13/northupclose.html In this simplified cartoon, two mRNAs (A and B) are bound on the membrane. Labeled DNA complementary to A is prepared and hybridized to all the mRNA on the membrane. The labeled complementary DNA will bind to A but not to B. After washing and detection, the abundance of the target mRNA can be seen.
    • RT-PCR and real-time RT-PCR. See animation of RT-PCR: http://www.bio.davidson.edu/courses/Immunology/Flash/RT_PCR.html See also: http://www.ambion.com/techlib/basics/rtpcr/
      • RT-PCR (reverse transcription-polymerase chain reaction)
        • RNA is reverse transcribed to DNA.
        • PCR procedures can be used to amplify DNA at an exponential rate.
        • Gel quantification for the amplified product.
        • ---- a semi-quantitative method. A smaller amount of sample is needed.
      • Real-time RT-PCR
        • The PCR amplification can be monitored by fluorescence in “real time”.
        • The fluorescence values recorded in each cycle represent the amount of amplified product.
        • ---- a quantitative method. Currently the most advanced and accurate analysis of mRNA abundance. Often used to validate microarray results.
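
The link between per-cycle fluorescence and starting mRNA amount can be illustrated with a small simulation. This is only a sketch under idealized assumptions that are not in the slides: perfect doubling in every cycle, fluorescence proportional to the amount of product, and a hypothetical detection threshold. The function names and all numbers are made up for illustration.

```python
def amplification_curve(initial_copies, cycles, efficiency=1.0):
    """Copies after each cycle, assuming growth by (1 + efficiency) per cycle."""
    return [initial_copies * (1.0 + efficiency) ** c for c in range(cycles + 1)]


def threshold_cycle(curve, threshold):
    """First cycle at which the signal reaches the (hypothetical) threshold."""
    for cycle, signal in enumerate(curve):
        if signal >= threshold:
            return cycle
    return None  # threshold never reached


# Two hypothetical samples; the second starts with 8x more template.
low = amplification_curve(initial_copies=100, cycles=40)
high = amplification_curve(initial_copies=800, cycles=40)

ct_low = threshold_cycle(low, threshold=1e9)
ct_high = threshold_cycle(high, threshold=1e9)

# With perfect doubling, an 8-fold difference in template shifts
# the threshold cycle by log2(8) = 3 cycles.
print(ct_low, ct_high, ct_low - ct_high)
```

Under these assumptions, the sample with more starting template crosses the threshold earlier, which is what makes the cycle-by-cycle readout quantitative.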
    • Limitation of the old techniques
      • Labor intensive
      • Can only detect up to dozens of genes.
        • (gene-by-gene analysis)
      • Need to know the target sequences. For RT-PCR, one needs to know at least the primers to start the PCR.
    • Various microarrays: a new view at the genomic level
    • Affymetrix GeneChip
    • Overview of the Affymetrix GeneChip technology (from Affymetrix Inc.)
    • From experiments to analysis
    • Details of labeling and hybridization. Figure sequences: TACGTATTGCAAAA, TTTTGCAATACGTA, TACGTATTGCAAAA. Biotin label at C and T.
    • Notes
      • Only pyrimidines (C and T) are biotin-labeled. This is where the color intensities come from.
      • The fragmentation makes the biotin-labeled cRNA shorter and helps efficiency of hybridization.
      • Sequence info of the target mRNA should be known so the complementary sequence can be prepared on the array.
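
The complementary-sequence idea in these notes can be checked on the sequences from the labeling slide. A minimal sketch, assuming plain DNA letters and strands written in the same direction; the function name is made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


target = "TACGTATTGCAAAA"  # sequence shown on the labeling slide
probe = reverse_complement(target)
print(probe)  # TTTTGCAATACGTA, the second sequence on the slide

# Taking the reverse complement twice gives back the original strand.
print(reverse_complement(probe) == target)
```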
    • Array Design (from Affymetrix Inc.)
      • Multiple probes (11~16) for each gene
      • 25-mer unique oligo
      • Mismatch in the middle nucleotide
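
The mismatch design can be sketched in a few lines. The slide only says that the mismatch sits at the middle nucleotide; replacing that base with its complement is an assumption made here for illustration, and the 25-mer below is invented.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(matching_probe):
    """Swap the middle base of an odd-length probe for its complement (assumed rule)."""
    if len(matching_probe) % 2 == 0:
        raise ValueError("probe length must be odd to have a single middle base")
    mid = len(matching_probe) // 2  # index 12, i.e. the 13th base of a 25-mer
    return matching_probe[:mid] + COMPLEMENT[matching_probe[mid]] + matching_probe[mid + 1:]


pm = "ATGCGTACGTTAGCCTAGGATCCAT"  # made-up 25-mer
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

The two probes differ at exactly one position, so the mismatch probe can serve as a control for non-specific binding to the matching probe.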
    • Array Manufacturing (from Affymetrix Inc.). Needs at most 4 × 25 = 100 masking and coupling steps. Technology adapted from the semiconductor industry (photolithography and combinatorial chemistry).
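
One way to read the 4 × 25 bound: if all probes are built one position at a time, each position needs at most one masking and coupling step per nucleotide (A, C, G, T). The sketch below counts steps under that simplified model, which is an assumption and not a description of the actual manufacturing process; the short probes are invented.

```python
def count_masking_steps(probes):
    """Count steps when probes are built position by position.

    At each position, one step is needed for every distinct base
    that some probe requires at that position.
    """
    length = len(probes[0])
    if any(len(p) != length for p in probes):
        raise ValueError("all probes must have the same length")
    steps = 0
    for position in range(length):
        bases_needed = {p[position] for p in probes}
        steps += len(bases_needed)
    return steps


probes = ["ACGT", "AGGT", "TCGA"]  # made-up 4-mers
steps = count_masking_steps(probes)
upper_bound = 4 * len(probes[0])  # 4 bases x probe length
print(steps, upper_bound)
print(4 * 25)  # bound for 25-mers
```

The bound depends only on probe length, not on the number of probes, which is why the approach scales to hundreds of thousands of features.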
    • Chip Advances (values listed as HG-U95 / HG-U133 Set / HG-U133 Plus 2.0)
      • Array sequence source: Build 95 UniGene database (Oct 2, 1999??) / Build 133 UniGene database (April 20, 2001) / Build 133 UniGene database (April 20, 2001)
      • Probe uniqueness: 21/25 bases / Two 8-mers including at least one 12-mer / Two 8-mers including at least one 12-mer
      • # of probes: ~16 / 11 / 11
      • # of arrays: 5 / 2 / 1
      • # of transcripts: ~54000 genes (HG-U95Av2: ~12000; HG-U95B-E: ~44000 EST) / ~33,000 genes / ~38500 genes
      • Feature size: 20 µm / 18 µm / 11 µm
    • Chip Advances
      • A few years ago, the U95 set had 5 arrays. Normally only U95Av2 is used.
      • Improved probe selection algorithm to avoid non-specific binding.
        • Decreased # of probes in each probe set (20 => 11)
      • Smaller feature size
        • 20 µm => 11 µm
      • More genes on each array and less cost
        • (Only one array for HG-U133 Plus)
    • Array Probe Level Analysis
      • Three steps: background adjustment, normalization, summarization.
      • Gives an expression measure for each probe set on each array.
      • The result will greatly affect subsequent analysis (e.g. clustering and classification). If not modeled properly => “Garbage in, garbage out”.
      • Details will be discussed in the next lecture.
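
To make the three steps concrete before the detailed lecture, here is a deliberately simple sketch. The specific choices (constant background subtraction, scaling arrays to a common median, median of log2 intensities) are stand-ins picked for illustration; they are assumptions and not the methods the course or any particular software uses. All intensities are made up.

```python
import math
from statistics import median


def background_adjust(intensities, background):
    """Subtract a constant background and floor at 1 so the log is defined."""
    return [max(x - background, 1.0) for x in intensities]


def normalize(arrays):
    """Scale each array so that all arrays share the same median intensity."""
    target = median(median(a) for a in arrays)
    return [[x * target / median(a) for x in a] for a in arrays]


def summarize(probe_values):
    """Summarize one probe set as the median of log2 probe intensities."""
    return median(math.log2(x) for x in probe_values)


# Made-up raw intensities: 2 arrays, one probe set of 5 probes on each.
raw = [[120.0, 220.0, 180.0, 520.0, 150.0],
       [220.0, 420.0, 340.0, 1020.0, 280.0]]

adjusted = [background_adjust(a, background=20.0) for a in raw]
normalized = normalize(adjusted)
expression = [summarize(a) for a in normalized]
print(expression)
```

In this toy input the second array is simply twice as bright after background removal, so normalization brings the two expression measures into agreement; real data need more careful modeling.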
    • Spotted cDNA microarray
    • From experiments to analysis
    • Probe (array) printing
      • 48 grids in a 12x4 pattern.
      • Each grid has 12x16 features (spots).
      • Total 9216 features (spots).
      • Each pin prints 3 grids.
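
The layout numbers on this slide can be checked with a few lines of arithmetic. The pin count is not stated in the slides; it is derived here under the assumption that every pin prints exactly 3 grids.

```python
grid_rows, grid_cols = 12, 4    # grids arranged in a 12x4 pattern
spot_rows, spot_cols = 12, 16   # features (spots) per grid
grids_per_pin = 3

grids = grid_rows * grid_cols
spots_per_grid = spot_rows * spot_cols
total_spots = grids * spots_per_grid
pins = grids // grids_per_pin   # derived, assuming 3 grids per pin throughout

print(grids, spots_per_grid, total_spots, pins)
```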
    • Probe design and printing
    • The experiment (from Y. Chen et al. 1997)
    • An image example (from: http://www.techfak.uni-bielefeld.de/ags/ai/projects/microarray/). Image analysis is more difficult than for Affymetrix arrays: the probes are spotted by a robot instead of being synthesized, so the exact physical location of each spot is not known.
    • Comparison of cDNA array and GeneChip
      • Probe preparation
        • cDNA: Probes are cDNA fragments, usually amplified by PCR and spotted by robot.
        • GeneChip: Probes are short oligos synthesized using a photolithographic approach.
      • Colors
        • cDNA: Two-color (measures relative intensity)
        • GeneChip: One-color (measures absolute intensity)
      • Gene representation
        • cDNA: One probe per gene
        • GeneChip: 11-16 probe pairs per gene
      • Probe length
        • cDNA: Long, varying lengths (hundreds to 1K bp)
        • GeneChip: 25-mers
      • Density
        • cDNA: Maximum of ~15000 probes
        • GeneChip: 38500 genes * 11 probes = 423500 probes
    • Affymetrix GeneChip: one-color design. cDNA microarray: two-color design. Why the difference?
    • Affymetrix GeneChip: photolithography (the amount of oligos on a probe is well controlled). cDNA microarray: robotic spotting (the amount of cDNA spotted on a probe may vary greatly).
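
A short numerical illustration of why a relative (two-color) measurement copes with variable spotting. It assumes that the signal in both channels scales with the amount of cDNA on the spot; the channel names `test` and `reference` and all intensities are hypothetical.

```python
import math


def log_ratio(test_intensity, reference_intensity):
    """log2 ratio of the two channels measured on the same spot."""
    return math.log2(test_intensity / reference_intensity)


# Same gene on two hypothetical spots; the second spot received 3x more cDNA,
# so both of its channels are 3x brighter.
spot_a = {"test": 2000.0, "reference": 1000.0}
spot_b = {"test": 6000.0, "reference": 3000.0}

ratio_a = log_ratio(spot_a["test"], spot_a["reference"])
ratio_b = log_ratio(spot_b["test"], spot_b["reference"])
print(ratio_a, ratio_b)
```

The absolute intensities differ threefold between the spots, but the ratio within each spot is the same, which is the quantity a two-color design reports.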
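    • Advantages and disadvantages of cDNA array and GeneChip
      • Data quality
        • cDNA microarray: The data can be noisy and of variable quality.
        • Affymetrix GeneChip: Specific and sensitive. Results are very reproducible.
      • Hybridization
        • cDNA microarray: Cross (non-specific) hybridization can often happen.
        • Affymetrix GeneChip: Hybridization is more specific.
      • RNA amount
        • cDNA microarray: May need an RNA amplification procedure.
        • Affymetrix GeneChip: Can use a small amount of RNA.
      • Image analysis
        • cDNA microarray: More difficulty in image analysis.
        • Affymetrix GeneChip: Image analysis and intensity extraction are easier.
      • Annotation
        • cDNA microarray: Need to search the database for gene annotation.
        • Affymetrix GeneChip: More widely used. Better quality of gene annotation.
      • Cost
        • cDNA microarray: Cheap (both initial cost and per slide cost).
        • Affymetrix GeneChip: Expensive (~$400 per array + labeling and hybridization).
      • Species
        • cDNA microarray: Can be custom made for special species.
        • Affymetrix GeneChip: Only several popular species are available.
      • Sequence knowledge
        • cDNA microarray: Do not need to know the exact DNA sequence.
        • Affymetrix GeneChip: Need the DNA sequence for probe selection.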
    • Other platforms of microarray
      • GE Codelink (out of market now)
      • Illumina
      • Agilent
    • Codelink
    • Codelink’s gel matrix. Fig.: End-point attachment orients the DNA while the polymeric coating holds it away from the surface of the slide, making the DNA readily available for hybridization.
    • Comparisons (cDNA, GeneChip, Codelink, Agilent)
      • Probe preparation
        • cDNA: Probes are cDNA fragments, usually amplified by PCR and spotted by robot.
        • GeneChip: Probes are short oligos synthesized using a photolithographic approach.
        • Codelink: 3-D aqueous gel matrix
        • Agilent: Probes are printed by inkjet technology from HP.
      • Colors
        • cDNA: Two-color (measures relative intensity)
        • GeneChip: One-color (measures absolute intensity)
        • Codelink: One-color
        • Agilent: One- or two-color
      • Gene representation
        • cDNA: One probe per gene
        • GeneChip: 11-16 probe pairs per gene
        • Codelink: One probe per gene
        • Agilent: One probe per gene
      • Probe length
        • cDNA: Long, varying lengths (hundreds to 1K bp)
        • GeneChip: 25-mers
        • Codelink: 30-mers
        • Agilent: 60-mers
      • Density
        • cDNA: Maximum of ~15000 probes
        • GeneChip: 38500 genes * 11 probes = 423500
        • Codelink: ~57000
        • Agilent: ~22000 probes
      • Manufacturer
        • cDNA: Stanford and many labs
        • GeneChip: Affymetrix
        • Codelink: GE
        • Agilent: Agilent
    • Mechanisms in microarray
      • Important mechanisms that make microarray work:
      • Reverse transcription: mRNA => cDNA. This is usually also the step at which the dye labels are attached.
        • (Protein cannot be reverse translated to mRNA or to another form, so it is difficult to label with dyes.)
      • Double-strand binding of complementary DNA sequences.
        • (Protein does not enjoy such a good property; there are 20 amino acids without complementary binding.)
    • Microarray Quality Control (MAQC) Project: a series of papers published in Nature Biotechnology (Sep 2006)
    • Previous paper in NAR 2003
      • Evaluation of gene expression measurements from commercial microarray platforms. Tan et al. Nucleic Acids Research. 2003. 31:5676-5684.
      • Three commercial platforms were compared.
      • Inconsistent results were found across platforms.
      • The poor consistency raised concerns about using microarrays for precise science and routine clinical use.
    • Experiment Design
      • 7 microarray platforms; each platform was run at 3 test sites; at each site, 4 pools of RNA were hybridized with 5 replicates each. (3*4*5=60 arrays for each platform)
      • The 4 pools of RNA are: A. 100% UHRR; B. 100% HBRR; C. 75% UHRR + 25% HBRR; D. 25% UHRR + 75% HBRR.
      • UHRR: Universal Human Reference RNA from Stratagene
      • HBRR: Human Brain Reference RNA from Ambion
      • 3 RT-PCR based alternative gene expression platforms were also tested: TaqMan, StaRT-PCR and QuantiGene Assays.
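
The design numbers and the mixture pools can be worked through in code. The sketch assumes that the measured signal of a gene in a mixture is linear in the RNA proportions, which is an idealization introduced here and not a claim from the slides; the expression values for A and B are made up.

```python
def mixture_expression(a, b, fraction_a):
    """Expected signal of a mix, assuming signal is linear in RNA proportion."""
    return fraction_a * a + (1.0 - fraction_a) * b


arrays_per_platform = 3 * 4 * 5  # test sites x RNA pools x replicates

# Made-up expression of one gene in pool A (UHRR) and pool B (HBRR).
a, b = 800.0, 200.0
c = mixture_expression(a, b, 0.75)  # pool C: 75% UHRR + 25% HBRR
d = mixture_expression(a, b, 0.25)  # pool D: 25% UHRR + 75% HBRR

print(arrays_per_platform, c, d)
```

Under the linearity assumption, the four pools should rank in a known order for any gene that differs between A and B (here A > C > D > B), which gives a built-in check that does not rely on knowing the true expression levels.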
    • Experiment Design
      • NCI has only 2 test sites. AGL has only 2 samples. Some problematic arrays were removed.
      • AGL is not included in this paper. A total of 386 arrays were analyzed.
    • Difficulties in comparing multiple platforms
      • Each platform has a different probe design.
      • Sensitivity and specificity of the probes differ. (Some cross-platform variability may be due to this annotation problem.)
      • Databases (e.g. NCBI RefSeq) often change, making probes difficult to match.
      • Probes may bind to multiple alternatively spliced transcripts, which may have different functions and expression patterns.
    • Kuo (2006): probe matching within one exon for Gas1. Gene matching across different platforms is not easy. Essentially, each platform detects different targets.
    • Match genes across platforms
      • All probes were mapped to the RefSeq and AceView databases.
      • Each platform assayed 15,429-16,990 Entrez genes.
      • 23,971 of 24,157 RefSeq NM accessions were assayed on at least one platform. Among them, 15,615 accessions (which correspond to 12,091 Entrez genes) were assayed on all platforms.
      • When multiple probes match to one RefSeq, only the probe closest to the 3’ end is used.
      • Finally each platform has 12,091 probes matching to a common set of 12,091 RefSeq from 12,091 different genes.
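
The matching procedure above (keep the probe closest to the 3’ end for each RefSeq accession, then keep only accessions present on every platform) can be sketched with dictionaries and sets. The platform names, probe IDs, accessions and distances below are all invented, and the tuple layout is an assumption made for this example.

```python
def pick_three_prime_probe(probes):
    """For each RefSeq accession keep the probe closest to the 3' end.

    `probes` is a list of (probe_id, refseq, distance_to_3prime) tuples.
    """
    best = {}
    for probe_id, refseq, distance in probes:
        if refseq not in best or distance < best[refseq][1]:
            best[refseq] = (probe_id, distance)
    return {refseq: probe_id for refseq, (probe_id, _) in best.items()}


def common_accessions(platforms):
    """RefSeq accessions assayed by every platform."""
    sets = [set(mapping) for mapping in platforms.values()]
    return set.intersection(*sets)


platforms = {
    "platform_1": pick_three_prime_probe([
        ("p1_a", "NM_0001", 350), ("p1_b", "NM_0001", 120), ("p1_c", "NM_0002", 80),
    ]),
    "platform_2": pick_three_prime_probe([
        ("p2_a", "NM_0001", 200), ("p2_b", "NM_0003", 60),
    ]),
}
shared = common_accessions(platforms)
print(platforms["platform_1"], shared)
```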
    • Number of detected genes called by manufacturers’ software. CV of 5 technical replicates.
    • Blue: CV of 5 technical replicates. Red: CV of all 15 replicates (5 technical replicates × 3 test sites).
    • Blue dot: percentage of genes concordantly called detected in each test site. Blue bar: percentage of genes concordantly called detected in all three test sites.
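
The coefficient of variation (CV) used in these figures is the standard deviation divided by the mean. The sketch below computes it within each site and across all 15 replicates for one gene; the intensities are made up, and using the sample standard deviation is an assumption about how the CV was computed.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


# Made-up intensities of one gene: 5 technical replicates at each of 3 sites.
sites = {
    "site_1": [100.0, 104.0, 98.0, 101.0, 97.0],
    "site_2": [110.0, 113.0, 108.0, 111.0, 108.0],
    "site_3": [92.0, 95.0, 90.0, 93.0, 90.0],
}

within_site_cv = {name: coefficient_of_variation(v) for name, v in sites.items()}
all_replicates = [x for v in sites.values() for x in v]
overall_cv = coefficient_of_variation(all_replicates)

print(within_site_cv)
print(overall_cv)
```

In this toy data the sites differ in their average level, so the CV over all 15 replicates is larger than any within-site CV, which is the kind of gap the blue and red series compare.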
    • Conclusions
      • Microarray provides an opportunity to measure thousands of genes simultaneously and make the global monitoring of cellular activities possible.
      • The method produces noisier data, so the choice of an adequate design and analysis is key.
      • RT-PCR can be used to validate a small number of genes.
      • Data obtained from different platforms and centers are consistent. Ready for routine clinical use.
    • Limitation
      • The method measures mRNA instead of proteins. The actual protein abundance and post-translational modifications cannot be detected.
      • The method usually does not measure spatial or temporal dynamics of the cellular activity.
      • The method is suitable for global monitoring and should be used to generate further hypotheses or be combined with other carefully designed experiments.
    • Introduction to next generation sequencing
    • Introduction
      • What is next generation sequencing?
        • Short reads (35~70 bps)
        • Higher throughput
        • Faster
        • Cheaper
    • Introduction
      • Compared with traditional sequencing
        • Traditional Sequencing
          • No reference sequence available (ab initio)
          • Longer reads and additional linkage information required to assemble the entire sequence
        • Next Generation Sequencing
          • Reference sequence available (Sequenced by traditional sequencing)
          • No need of assembly, just map the short reads back to the reference sequence.
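
Mapping short reads back to a reference can be illustrated with exact string matching. This is a toy sketch: real read mappers tolerate mismatches and use indexed data structures, neither of which is attempted here, and the reference and reads are made up.

```python
def find_all(text, pattern):
    """All start positions (0-based) of pattern in text, overlaps included."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def map_reads(reference, reads):
    """Map each read to all positions where it matches the reference exactly."""
    return {read: find_all(reference, read) for read in reads}


reference = "ACGTACGTTAGCCTAGGATC"  # made-up reference sequence
reads = ["CGTT", "TAGG", "ACGT", "GGGG"]
hits = map_reads(reference, reads)
print(hits)
```

A read that matches in more than one place (here `ACGT`) is ambiguous, and a read with no match (here `GGGG`) is left unmapped; both cases arise with short reads.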
    • Technology
    • Major Applications
      • ChIP-Seq (Chromatin Immunoprecipitation)
        • A substitute for ChIP-chip
        • To find the binding sequence of proteins (TFBS)
      • RNA-Seq
        • A substitute for Microarray
        • To measure the amount of RNA expressed
    • RNA-Seq
      • Compared with microarray
        • Microarray
          • Closed technology: Prior knowledge required
          • Affected by pseudo-genes (homologs of real genes)
          • Cheap and mature
        • RNA-Seq
          • Open technology: No prior knowledge required
          • Not affected by pseudo-genes because exact sequence is measured
          • Can yield additional information (SNPs, alternative splicing)
          • Still more expensive than microarray