
Session ii g3 overview behavior science mmc

  • During this lab, we have: a brief review, the lab’s template, genome exploration practice…
  • DNA fragments amplified by PCR are spotted on a glass microscope slide that has been coated with polylysine before spotting. The polylysine coating ensures DNA fixation through electrostatic interactions. In our case, the PCR fragments are the expressed parts (ORFs) of the 6200 Saccharomyces cerevisiae (baker's yeast) genes. Slide preparation is completed by blocking the polylysine that is not bound to DNA, in order to avoid target binding to the slide itself. Prior to hybridisation, the DNA is denatured to obtain single-stranded DNA on the microarray; this allows the probe to bind the complementary strand from the target. Apart from glass-slide microarrays, other types of chips exist.
  • Target preparation: RNA is extracted from the two yeast cultures whose expression levels we want to compare. Messenger RNAs are then converted into cDNA by reverse transcription. At this stage, cDNA from the first culture is labelled with a green dye, whereas cDNA from the second culture is labelled with a red dye.
    Hybridisation: The green-labelled and red-labelled cDNAs are mixed together (this mix is called the target) and placed on the matrix of spotted single-stranded DNA (called the probe). The chip is then incubated overnight at 60 degrees. At this temperature, a DNA strand that encounters its complementary strand pairs with it to form double-stranded DNA. The fluorescent DNA thus hybridises to the spotted DNA.
    Slide scanning: A laser excites each spot, and the fluorescent emission is collected by a photomultiplier tube (PMT) coupled to a confocal microscope. We obtain two images in which grey levels represent the fluorescence intensities read. If we replace the grey levels with green levels for the first image and red levels for the second, superimposing the two images gives one image whose spots range from green (where only DNA from the first condition is bound) to red (where only DNA from the second condition is bound), passing through yellow (where DNA from the two conditions is bound in equal amounts).
    Data analysis: We now have two microarray images from which we must estimate the amount of DNA in each experimental condition. To do so, we measure the signal at the emission wavelength of the green dye and the signal at the emission wavelength of the red dye. We then normalise these amounts according to various parameters (amount of yeast in each culture condition, emission power of each dye, …). We assume that the amount of fluorescent DNA bound is proportional to the amount of mRNA initially present in each cell, and we calculate the red/green fluorescence ratio. If this ratio is greater than 1 (red on the image), gene expression is greater in the second experimental condition; if it is smaller than 1 (green on the image), gene expression is greater in the first condition. We can visualize these differences in expression using software such as ArrayPlot, developed in the laboratory (cf. image below). From the list of spot intensities, this software displays the red intensity of each spot as a function of its green intensity.
    Expression profile clustering: We can then try to group genes that share the same expression profile across several experiments. This clustering can be done step by step, as in phylogenetic analysis, by calculating a similarity criterion between expression profiles and grouping the most similar ones. We can also use more complex techniques such as principal component analysis or neural networks. In the end, hierarchical clustering is usually displayed as a matrix in which each column represents one experiment and each row a gene. Ratios are displayed using a colour scale going from green (repressed genes) to red (induced genes).
  • Once you have your normalized data file, open it in Excel. You can filter out weak-intensity spots (eliminate the weakest intensities in both channels) and keep the spots with a log2(ratio) greater than 1 or lower than -1. Remember that we are working with log2(ratio), so log2(2) = 1 and a threshold of 1 corresponds to a two-fold change. This method, called “fold change”, was the one used in the early days of microarray analysis and is still useful if you do not have enough replicates to apply statistical treatments. The “fold change” method gives no principled way to set the significance threshold. That is why it is useful to apply a statistical method that takes into account intensity variations and, above all, the variability among experiments.
    Significance Analysis of Microarrays (SAM): SAM is an Excel macro freely available to academics on the web. Running SAM inside an Excel spreadsheet makes the tool easier to use for most microarray users. Using SAM requires several modifications to your data file:
    - The ratio or intensity values in the Excel sheet must not contain any commas, only points, as the decimal separator.
    - The header line depends on the type of analysis you want to perform (refer to the SAM manual for more information), so you must duplicate your header if you do not want to lose the experiment information (see image below).
    - Two annotation columns are available. SAM always reports its calculations by the line number in the original sheet.
    - Before launching the macro, select the data precisely, because SAM rejects lines with too many missing values (such as empty lines).
  • I cannot claim to turn you into a statistician in 20 minutes. I will just give you a few elements for a rapid analysis of microarray data.
  • The following experimental techniques are used to measure gene expression and are listed in roughly chronological order, starting with the older, more established technologies. They are divided into two groups based on their degree of multiplexity.
  • ArrayTrack™ provides an integrated solution for managing, analyzing, and interpreting microarray gene expression data. Specifically, ArrayTrack™ is MIAME (Minimum Information About A Microarray Experiment)-supportive for storing both microarray data and experiment parameters associated with a pharmacogenomics or toxicogenomics study. Many statistical and visualization tools are available with ArrayTrack™ which provides a rich collection of functional information about genes, proteins, and pathways for biological interpretation.  The primary emphasis of ArrayTrack™ is the direct linking of analysis results with functional information to facilitate the interaction between the choice of analysis methods and the biological relevance of analysis results. Using ArrayTrack™, users can easily select a statistical method applied to stored microarray data to determine a list of differentially expressed genes. The gene list can then be directly linked to pathways and gene ontology for functional analysis.
  • Boxplots are useful for determining where the majority of the data lies
  • Session ii g3 overview behavior science mmc
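The “fold change” filter described in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spot names and intensities are made up, the `min_intensity` cut-off of 200 is a hypothetical value (the notes only say to eliminate the weakest intensities in both channels), and the function names are not part of Excel or SAM.

```python
import math

def log2_ratio(red, green):
    """Return log2(red / green) for one spot, or None if a value is missing or non-positive."""
    if red is None or green is None or red <= 0 or green <= 0:
        return None
    return math.log2(red / green)

def fold_change_filter(spots, min_intensity=200.0, threshold=1.0):
    """Keep spots with |log2 ratio| > threshold, skipping spots that are weak in both channels.

    min_intensity is a hypothetical cut-off chosen for this example only.
    """
    selected = []
    for name, red, green in spots:
        m = log2_ratio(red, green)
        if m is None:
            continue  # a missing value is not the same as a ratio of zero
        if red < min_intensity and green < min_intensity:
            continue  # weak signal in both channels
        if abs(m) > threshold:
            # red = second condition, green = first condition
            label = "higher in condition 2" if m > 0 else "higher in condition 1"
            selected.append((name, m, label))
    return selected

# Made-up example intensities: (spot name, red, green)
spots = [
    ("geneA", 4000.0, 1000.0),  # ratio 4, log2 ratio 2
    ("geneB", 500.0, 2000.0),   # ratio 0.25, log2 ratio -2
    ("geneC", 1200.0, 1000.0),  # small change, filtered out
    ("geneD", 150.0, 30.0),     # weak in both channels, filtered out
    ("geneE", None, 800.0),     # missing red value, skipped
]

for name, m, label in fold_change_filter(spots):
    print(name, m, label)
```

With too few replicates this kind of threshold is all that is available; with replicates, a statistical method such as SAM is the better choice, as the notes explain.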

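A boxplot is drawn from five numbers per array (minimum, first quartile, median, third quartile, maximum). The sketch below computes them with Python's standard `statistics` module so that arrays can be compared before analysis. The two arrays and their values are invented for illustration, and the "inclusive" quartile method is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one array, ignoring missing values (None)."""
    present = sorted(v for v in values if v is not None)
    if len(present) < 2:
        raise ValueError("need at least two values to compute quartiles")
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return (present[0], q1, median, q3, present[-1])

# Made-up log2 ratios for two arrays; None marks a missing spot
arrays = {
    "array_1": [-2.0, -1.0, 0.0, 1.0, 2.0],
    "array_2": [-1.0, 0.0, None, 1.0, 2.0, 3.0],
}

for name, values in arrays.items():
    print(name, five_number_summary(values))
```

If the medians of the arrays differ noticeably, as they do for these two made-up arrays, the data probably still need centering or normalization before the arrays are compared.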
    1. Theme: Transcriptional Program in the Response of Human Fibroblasts to Serum. Etienne Z. Gnimpieba, BRIN WS 2013, Mount Marty College – June 24th 2013. Etienne.gnimpieba@usd.edu
    2. Data Manipulation Gene Expression Data Analysis: OMIC World. [Diagram labels: DNA, mRNA, E, S, P; Transcription, Translation, Degradation, Gene Repression, Catalyse; Genomics, Functional Genomics, Transcriptomics, Proteomics, Metabolomics]
    3. OMIC World: GENOMICS
    4. OMIC World. Genomics is the subdiscipline of genetics devoted to the mapping, sequencing, and functional analysis of genomes. Genomics can be said to have appeared in the 1980s, and took off in the 1990s with the initiation of genome projects for several biological species. The most important tools here are microarrays and bioinformatics. DNA microarrays allow rapid measurement and visualization of differential gene expression at the whole-genome scale. Although implementing the technique is quite complicated, its principle is very simple. The major steps involved in this process are described here.
    5. Process. [Diagram labels: Biological question; Experimental design; Microarray experiment; Image analysis; Normalization; Estimation; Testing; Clustering; Discrimination; Differentially expressed genes; Sample class prediction etc.; Biological verification and interpretation]
    6. Process
    7. Microarray Production Process
       High-density filters (macroarrays): size 12 cm x 8 cm; 2400 clones per membrane; radioactive labelling; 1 experimental condition per membrane.
       Glass slides (microarrays): size 5.4 cm x 0.9 cm; 10000 clones per slide; fluorescent labelling; 2 experimental conditions per slide.
       Oligonucleotide chips: size 1.28 cm x 1.28 cm; 300000 oligonucleotides per slide; fluorescent labelling; 1 experimental condition per slide.
    8. Microarray Production Process • Frouin, V. & Gidrol, X. (2005) • CBB group (Berlin) • Transcriptome ENS (France). Steps shown: Target Preparation; Hybridization; Slide Scanning; Expression Profile Clustering.
    9. Microarray Production Process • Image analysis (GenePix) • Normalization (R) • Pre-treatment • Differential expression • Clustering • Data mining • Annotation
    10. Excel Used in Genomics • How to select columns • How to use functions • How to anchor a cell value in a function • How to copy the function result and not the function itself • How to sort data by columns • How to search and replace
    11. Excel Used in Genomics: Pre-Treatment. Centering and Scaling Data
        1. Open the file containing the experiment series (your expression matrix) in Excel, using the tab character as the column separator.
        2. For one column (corresponding to one DNA microarray experiment), calculate the mean value using the Excel AVERAGE function. Verify that the value obtained is equal to zero.
        3. If it is not, subtract the corresponding mean value from each log2(ratio) value of that experiment. Be careful with missing values (empty cells): replace the empty contents with the NULL or NA string, in order to avoid introducing a zero value into the Excel calculation for that cell. A missing value is not the same as a true zero!
        4. Once this operation has been done, verify that the final mean value is equal to zero, in order to catch errors in Excel handling. Be careful with the decimal separator used by your Excel version (dot or comma)!
    12. Excel Used in Genomics: Differential Expression Analysis (1)
        Significance Analysis of Microarrays (SAM): SAM is an Excel macro freely available to academics on the web. Running SAM inside an Excel spreadsheet makes the tool easier to use for most microarray users. Using SAM requires several modifications to your data file:
        • The ratio or intensity values in the Excel sheet must not contain any commas, only points, as the decimal separator.
        • The header line depends on the type of analysis you want to perform (refer to the SAM manual for more information), so you must duplicate your header if you do not want to lose the experiment information (see image below).
        • Two annotation columns are available. SAM always reports its calculations by the line number in the original sheet.
        SAM (Significance Analysis of Microarrays) is an Excel macro that searches for differentially expressed genes using a bootstrapping method. Website: http://www-stat.stanford.edu/~tibs/SAM/
    13. Excel Used in Genomics: Differential Expression Analysis (2)
        • When the SAM macro is launched from the tool bar (“SAM”), a settings window appears. For further information on the various options, it is best to refer to the SAM manual. However, the first important thing to do is to indicate whether or not the data source has been log2-transformed. Then, because data bootstrapping uses a random generator, you need to initialize it several times by creating a number of seeds.
        • Once all the chosen iterations have been done, SAM displays a plot representing each gene by its score in the real distribution compared to the random distributions. The differentially expressed genes are therefore the ones that move away from the 45° line.
        • First, display the delta table. For each delta value, this table indicates the number of putative differentially expressed genes, the significant genes, and the number of false-positive genes estimated using the False Discovery Rate (FDR). The user sets the delta value according to the number of false-positive or significant genes they want to obtain.
        • To choose the delta value, go back to the SAM plot sheet and display the “SAM plot controller” by clicking on the SAM macro button.
        • The SAM Plot Controller window lets you set the delta value you want: “Manually Enter Delta”. Then, if you select the “List Significant Genes” button, SAM displays the list of differentially expressed genes in the “SAM output” sheet according to the delta value you chose.
        • This sheet summarizes the selected parameters and gives you the list of induced and repressed genes.
    14. GEPAS: Gene Expression Pattern Analysis Suite
        • Verify that the data file named FibroGEPAS.txt is available in your folder
        • Open the dataset to read its description
        • Open the GEPAS portal at http://www.transcriptome.ens.fr/gepas/index.html
        • Click on “Tools”
        • Preprocessing
          - Preprocess DNA array data files: log-transformation, replicate handling, missing value imputation, filtering and normalization
          - Filtering
        • Viewing
        • Clustering
        • Differential expression
        • Classification
        • Data mining
    15. Microarray Dataset: Mining and Gene Profile Analysis Using Online Tools. Kruer Lab
    16. • Gene Expression Measurement • Microarray Process • Gene Expression Data Stores • Data Mining / Querying • Data Analysis • Example: ATP13A2 Profile in Stress Conditions
    17. Gene Expression Measurement
        Higher-plex techniques: SAGE, DNA microarray, Tiling array, RNA-Seq, NGS
        Low-to-mid-plex techniques: Reporter gene, Northern blot, Western blot, Fluorescent in situ hybridization, Reverse transcription PCR
        [Navigation sidebar, repeated on slides 17 to 23: Gene expression technologies; Microarray process; Gene expression data stores; Data mining / querying (pb-query-extraction-load-store-pretreat); Data analysis (Question-Answer, descriptive, predictive, modeling); Example: ATP13A2 profile in stress conditions]
    18. Gene Expression Measurement
        Database | Microarray Experiment Sets | Sample Profiles | Date Reported
        ArrayExpress at EBI | 24,838 | 708,914 | October 28, 2011
        ArrayTrack™ | 1,622 | 50,953 | February 11, 2012
        caArray at NCI | 41 | 1,741 | November 15, 2006
        Gene Expression Omnibus - NCBI | 25,859 | 641,770 | October 28, 2011
        Genevestigator database | 2,500 | 65,000 | January 2012
        MUSC database | ~45 | 555 | April 1, 2007
        Stanford Microarray database | 82,542 | Not reported | October 23, 2011
        UNC Microarray database | ~31 | 2,093 | April 1, 2007
        UNC modENCODE Microarray database | ~6 | 180 | July 17, 2009
        UPenn RAD database | ~100 | ~2,500 | September 1, 2007
        UPSC-BASE | ~100 | Not reported | November 15, 2007
        Other resources named on the slide: SAGE, GEO, GUDMAP (421), MGI, BIOGPS
    19. Data Mining / Querying • Problem specification • Query • Extraction • Storage • Load • Pretreat / prepare for analysis
    20. Data Analysis
        • Question-Answer
          – Experimental condition profile: group comparison
          – Annotation profile: biological systems involved
          – Clustering profile: co-regulation
          – Time course profile: time variation
          – …
        • Descriptive
          – Boxplot (SD, mean, median)
          – Scatter plot
        • Predictive / inference (clustering)
        • Modeling (machine learning, simulation)
    21. Data Analysis
        • 3 Questions
          – What is the right dataset (experimental condition)?
          – Is the dataset ready for analysis (quality)?
          – What is the expression profile for a given gene?
          – Significant differential expression in group comparisons
        • Tools
          – ArrayExpress (EBI)
          – Boxplot
          – GEO2R (LIMMA, profile graph)
    22. Data Analysis: Boxplot
    23. Example: ATP13A2 Profile in Stress Conditions
        • Specification: ATP13A2 profile in stress conditions
        • Data querying:
          – GEO
          – Array Express
          – Gene Atlas
        • Data analysis:
          – Online: GEO2R, Genospace, …
          – Desktop: R, ArrayTrack, …
    24. Lab #2: Gene Expression Data Analysis. Resolution Process; Context; Specification & Aims
        Statement of problem / Case study: The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in this complex multicellular response than had previously been appreciated. (Vishwanath R. Iyer, Science, 1999)
        Aim: The purpose of this lab is to introduce the gene expression data analysis process. We simulated the application on “Transcriptional Program in the Response of Human Fibroblasts to Serum”. Now we can understand how a researcher can identify a significantly expressed gene from a microarray dataset.
        T1. Gene expression overview
            T1.1. Review of the place of genomics in the OMIC world
            T1.2. Microarray data techniques and process
            T1.3. Data analysis cycle and tools
        T2. Excel used in Genomics
            Objective: use basic Excel functionalities to solve some gene expression data analysis needs
            T2.1. Column manipulation, functions used, anchor, copy with function, sort data, search and replace
            T2.2. Experiment comparison: data pre-treatment
            T2.3. Differentially expressed genes from replicate experiments (SAM)
        T3. GEPAS: Gene Expression Pattern Analysis Suite
            Objective: use the GEPAS suite to apply the whole microarray data analysis process to the fibroblast data. http://www.transcriptome.ens.fr/gepas/index.html
            Steps: Preprocessing; Viewing; Clustering; Differential expression; Classification; Data mining
        Acquired skills:
            - Gene expression data overview
            - Excel used for genomics
            - Microarray data analysis using GEPAS
        Conclusion: ?
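The centering step from slide 11 (subtract the column mean from every log2(ratio), while keeping missing values missing) can be sketched outside Excel as follows. This is a minimal illustration, not the lab's actual procedure: the cell values are made up, the helper names are hypothetical, and treating a comma as a decimal separator is an assumption that would be wrong for files that use commas as thousands separators.

```python
MISSING_MARKERS = {"", "NA", "NULL"}

def parse_cell(text):
    """Convert one spreadsheet cell to a float, or None if it is marked missing.

    Assumption: a comma in a number is a decimal separator (e.g. "1,5" means 1.5).
    """
    cell = text.strip()
    if cell.upper() in MISSING_MARKERS:
        return None
    return float(cell.replace(",", "."))

def center_column(values):
    """Subtract the mean of the present values; missing values stay None, never 0."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    mean = sum(present) / len(present)
    return [None if v is None else v - mean for v in values]

# Made-up log2(ratio) cells for one experiment (one column of the expression matrix)
raw_column = ["1,5", "NA", "0.5", "", "-0.5"]
values = [parse_cell(cell) for cell in raw_column]
centered = center_column(values)

print(centered)
# Check, as in step 4 of the slide, that the mean of the centered values is zero
remaining = [v for v in centered if v is not None]
print(sum(remaining) / len(remaining))
```

Keeping `None` for missing cells is the code equivalent of writing NA in the sheet: the mean is computed over the three measured spots only, so the two missing spots do not pull it toward zero.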

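The clustering step groups genes whose expression profiles are most similar across experiments. The sketch below shows only the first move of such a procedure: computing a similarity between every pair of profiles and picking the most similar pair. Pearson correlation is used here as the similarity criterion, which is a common choice but an assumption, since the deck does not say which measure GEPAS or the original study used; the gene names and values are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (spread_x * spread_y)

def most_similar_pair(profiles):
    """Return (gene_a, gene_b, correlation) for the most correlated pair of genes."""
    names = sorted(profiles)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if best is None or r > best[2]:
                best = (a, b, r)
    return best

# Made-up log2 ratios over four time points
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.1, 2.1, 4.1, 6.1],  # same trend as geneA, larger amplitude
    "geneC": [3.0, 2.0, 1.0, 0.0],  # opposite trend
}

print(most_similar_pair(profiles))
```

Hierarchical clustering repeats this idea: merge the most similar pair, recompute similarities for the merged group, and continue until all genes are joined in one tree.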