Lab Gene Expression Data Analysis

Published in: Technology
  • During this lab, we have: a brief review; the lab’s template; genome exploration practice…
  • Once you have your normalized data file, open it with Excel. You can filter out weak-intensity spots (eliminate the weakest intensities in both channels) and keep the spots with a log2(ratio) greater than 1 or lower than –1. Remember that we are working with log2(ratio), so log2(2) = 1: these thresholds correspond to a twofold change in either direction. This method, called “fold change”, was the one used in the early days of microarray analysis and is still useful if you do not have enough replicates to apply statistical treatments.
    The “fold change” method offers no principled way to set the significance threshold. That is why it is useful to apply a statistical method that takes into account intensity variations and, above all, the variability among experiments.
    Significance Analysis of Microarrays (SAM): SAM is an Excel macro freely available to academics on the web. Because it runs in an Excel spreadsheet, it is easy to use for most microarray users. Using SAM requires several modifications to your data file:
      - The ratio or intensity values in the Excel sheet must use points, not commas, as the decimal separator.
      - The header line depends on the type of analysis you want to perform; refer to the SAM manual for more information. You must therefore duplicate your header if you don’t want to lose the experiment information (see image below).
      - Two annotation columns are available. SAM always references its calculations to the line number in the source sheet.
      - Before launching the macro, select the data precisely, because SAM rejects lines with too many missing values (such as empty lines).
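The fold-change filter described in these notes can also be written as a short script. The following is only a minimal Python sketch under stated assumptions, not part of the original lab: the function name fold_change_filter, the intensity cutoff of 100 and the four example spots are hypothetical, and both channel intensities are assumed to be strictly positive.

```python
import math

def fold_change_filter(spots, min_intensity=100.0, threshold=1.0):
    """Keep spots that are not weak in both channels and whose
    log2(ratio) is greater than +threshold or lower than -threshold.

    spots: list of (name, red_intensity, green_intensity) tuples.
    min_intensity is a hypothetical cutoff; choose it from your own data.
    """
    selected = []
    for name, red, green in spots:
        # Eliminate spots whose intensity is weak in both channels.
        if red < min_intensity and green < min_intensity:
            continue
        log_ratio = math.log2(red / green)  # assumes red > 0 and green > 0
        # log2(2) = 1, so |log2(ratio)| > 1 means more than a twofold change.
        if log_ratio > threshold or log_ratio < -threshold:
            selected.append((name, log_ratio))
    return selected

# Hypothetical example spots: (name, red, green)
example_spots = [
    ("geneA", 800.0, 200.0),  # fourfold up
    ("geneB", 300.0, 310.0),  # almost unchanged
    ("geneC", 60.0, 10.0),    # large ratio, but weak in both channels
    ("geneD", 150.0, 700.0),  # more than fourfold down
]

for name, log_ratio in fold_change_filter(example_spots):
    print(name, round(log_ratio, 2))
```

Note that the weak-intensity filter runs before the ratio test, so a spot such as geneC is dropped even though its ratio looks large.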

    1. Lab#. Data manipulation: Biostatistics & Gene expression data analysis (Microarray, NGS & qRT-PCR). Theme: Transcriptional Program in the Response of Human Fibroblasts to Serum. Etienne Z. Gnimpieba, BRIN WS 2012, Sioux Falls, May 30 2012. Etienne.gnimpieba@usd.edu
    2. Data manipulation: Gene expression data analysis. OMIC World
        [Figure: the OMIC world. Labels: Genomics, DNA, Transcription, Degradation, mRNA, Transcriptomics, Translation, Functional Genomics, Gene Repression, Proteomics, E, Catalyse, Metabolomics, S, P]
    3. OMIC World: GENOMICS
    4. Excel used in genomics
        • How to select columns
        • How to use functions
        • How to anchor a cell value in a function
        • How to copy the function result and not the function itself
        • How to sort data by columns
        • How to search and replace
        Frouin, V. & Gidrol, X. (2005) • Transcriptome ENS (France) • CBB group (Berlin)
        Etienne Z. Gnimpieba, BRIN WS 2012, Sioux Falls, May 31 2012
    5. Excel used in genomics: Pre-treatment. Centering and scaling data
        1. Open the file containing the experiment series (your expression matrix) in Excel, using the tab character as the column separator. Click on the second spreadsheet, named Fibroblast real, and look it over quickly: it is a realistic data set from a microarray experiment. Click back on the first spreadsheet, named Fibroblast lab. We will be using this condensed version.
        2. For one column (corresponding to one DNA microarray experiment), calculate the mean value using the AVERAGE Excel function. Check whether the value obtained is equal to zero.
        3. If it is not, center each experiment (15MIN, 30MIN, 2HR, etc.) by subtracting the column mean from each log2(Ratio) value:
            • Subtract the average value of each column from the corresponding individual values (for the first example, B2-$B$37). Place these values in the corresponding table on the right (R2). Drag the formula down to quickly finish a column.
            • Continue to center the data for each column (each DNA microarray experiment), filling in the blank table on the right. Use the AVERAGE function again to find the mean value of each column in the new table. Each average should now be zero.
            • Be careful: if there are missing values (empty cells), replace the empty contents with NULL or NA, so that Excel does not introduce a zero value into the calculations for this cell. A missing value is different from a true zero!
            • Be careful with decimal separator handling in Excel (point or comma)!
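The centering steps above can also be checked outside Excel. The following is a minimal Python sketch, not part of the original lab: the function name center_columns and the small example matrix are hypothetical, missing values are represented as None so that they are skipped rather than counted as zeros, and every column is assumed to contain at least one value.

```python
def center_columns(matrix):
    """Center each column (experiment) of a log2(ratio) matrix on zero.

    matrix: list of rows (genes); each row is a list of floats, with None
    marking a missing value. Returns (centered_matrix, column_means).
    """
    n_cols = len(matrix[0])
    means = []
    for j in range(n_cols):
        # Missing values are ignored, not treated as zeros.
        values = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(values) / len(values))
    centered = [
        [None if v is None else v - means[j] for j, v in enumerate(row)]
        for row in matrix
    ]
    return centered, means

# Hypothetical example: 3 genes x 2 experiments, one missing value.
example = [
    [1.0, 0.5],
    [2.0, None],
    [3.0, 1.5],
]
centered, means = center_columns(example)
print("column means before centering:", means)
print("centered matrix:", centered)
```

Keeping None in the output mirrors the advice above: a missing value stays missing after centering instead of silently becoming a number.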
    6. Excel used in genomics: Differential expression analysis (1)
        SAM (Significance Analysis of Microarrays) is an Excel macro that searches for differentially expressed genes using a bootstrapping method. Website: http://www-stat.stanford.edu/~tibs/SAM/
        SAM is freely available to academics on the web. Because it runs in an Excel spreadsheet, it is easy to use for most microarray users. Using SAM requires several modifications to your data file:
        • The ratio or intensity values in the Excel sheet must use points, not commas, as the decimal separator.
        • The header line depends on the type of analysis you want to perform; refer to the SAM manual for more information. You must highlight your header if you don’t want to lose the experiment information.
        • Two annotation columns are available. SAM always references its calculations to the line number in the source sheet.
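The decimal-separator requirement above can be handled before the file is opened in Excel. The following is a minimal Python sketch, not part of SAM or of the original lab: the function names to_point_decimal and normalize_line are hypothetical, the file is assumed to be tab-delimited, and only cells that parse as a number once the comma is replaced are changed, so annotation text is left untouched.

```python
def to_point_decimal(cell):
    """Return the cell with a decimal comma replaced by a point,
    but only if the result parses as a number."""
    candidate = cell.strip().replace(",", ".")
    try:
        float(candidate)
    except ValueError:
        return cell  # not a number (gene name, NA, empty cell): keep as is
    return candidate

def normalize_line(line, sep="\t"):
    """Apply to_point_decimal to every cell of one delimited line."""
    return sep.join(to_point_decimal(cell) for cell in line.split(sep))

# Hypothetical example line: gene name, two ratios, one missing value.
print(normalize_line("GeneA\t0,53\t-1,2\tNA"))
```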
    7. Excel used in genomics: Differential expression analysis (2)
        • Under the Add-Ins tab, find the “SAM” toolbar command. Highlight the cells from R2 to AF37, then select SAM.
        • When the SAM macro is launched from the toolbar, a settings window appears. For further information on the various options, it is best to refer to the SAM manual. The first important thing to do is to indicate whether the source data have been log2-transformed or not. In this case we will select Unlogged.
        • Because data bootstrapping uses a random number generator, you need to initialize it several times by selecting “Generate Random Seed”. Click “OK”.
        • Once all the chosen iterations have been done, SAM displays a plot that places each gene according to its score in the real distribution compared with the random distributions. The differentially expressed genes are the ones that move away from the 45° line.
        • The table that appears indicates, for each delta value, the number of putative differentially expressed genes, the significant genes, and the number of false-positive genes estimated using the False Discovery Rate (FDR). You can change the delta value according to the number of false-positive or significant genes you want to obtain.
        • Choose a delta value by selecting “Manually Enter Delta”. Enter your own delta value between 0 and 0.25.
        • If you then select the “List Significant Genes” button, SAM displays the list of differentially expressed genes in the “SAM output” sheet according to the delta value you chose. This sheet summarizes the selected parameters and gives you the list of induced and repressed genes.
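To see why genes that move away from the 45° line are the ones reported, the idea behind the plot can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SAM Excel macro: it handles only a one-class design (replicate log2 ratios per gene), enumerates every sign flip of the replicates instead of drawing random permutations, uses a fixed hypothetical fudge factor s0, and does not estimate the FDR. The names d_scores and sam_sketch and the example data are hypothetical.

```python
from itertools import product
from statistics import mean, stdev

def d_scores(rows, s0):
    """Score per gene: mean / (standard error + s0)."""
    return [mean(v) / (stdev(v) / len(v) ** 0.5 + s0) for v in rows]

def sam_sketch(genes, rows, delta, s0=0.1):
    """Return (induced, repressed) gene lists for a given delta."""
    # Observed scores, sorted in ascending order (one point per gene).
    observed = sorted(zip(d_scores(rows, s0), genes))
    # Expected scores: average of the sorted scores over all sign flips.
    totals = [0.0] * len(rows)
    n_perm = 0
    for signs in product((1, -1), repeat=len(rows[0])):
        flipped = [[s * x for s, x in zip(signs, v)] for v in rows]
        for k, d in enumerate(sorted(d_scores(flipped, s0))):
            totals[k] += d
        n_perm += 1
    expected = [t / n_perm for t in totals]
    # Distance of each point from the 45-degree line (observed = expected).
    diffs = [obs[0] - exp for obs, exp in zip(observed, expected)]
    # Start at the point closest to the origin and walk outwards.
    origin = min(range(len(expected)), key=lambda k: abs(expected[k]))
    induced, repressed = [], []
    for k in range(origin, len(diffs)):
        if diffs[k] > delta:  # first point above the upper band
            induced = [g for _, g in observed[k:]]
            break
    for k in range(origin, -1, -1):
        if diffs[k] < -delta:  # first point below the lower band
            repressed = [g for _, g in observed[:k + 1]]
            break
    return induced, repressed

# Hypothetical data: one clearly induced gene and five noisy genes.
genes = ["up", "n1", "n2", "n3", "n4", "n5"]
rows = [
    [2.0, 2.2, 1.9, 2.1],
    [0.1, -0.2, 0.05, -0.1],
    [-0.15, 0.1, 0.2, -0.05],
    [0.0, 0.1, -0.1, 0.05],
    [-0.2, 0.15, -0.05, 0.1],
    [0.05, -0.1, 0.1, -0.15],
]
print(sam_sketch(genes, rows, delta=2.0))
```

A larger delta widens the band around the 45° line, so fewer genes are called. The delta scale here depends on the hypothetical s0 and data, so it is not comparable to the 0 to 0.25 range used in the lab.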
    8. GEPAS: Gene Expression Pattern Analysis Suite
        Review this section. Become familiar with the suite on your own by reviewing each section listed under Tools.
        • Verify that the data file FibroGEPAS.txt is in your folder
        • Open the file
        • Open the GEPAS portal at http://www.transcriptome.ens.fr/gepas/index.html
        • Click on “Tools”:
            • Preprocessing: preprocess DNA array data files (log-transformation, replicate handling, missing value imputation, filtering and normalization)
            • Filtering
            • Viewing
            • Clustering
            • Differential expression
            • Classification
            • Data mining
    9. Gene Expression Data Analysis: Context
        Statement of problem / Case study: The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in this complex multicellular response than had previously been appreciated.
        Vishwanath R. Iyer, Science, 1999
        Specification & aims
        Aim: The purpose of this lab is to initiate a gene expression data analysis process. We simulated the application on “Transcriptional Program in the Response of Human Fibroblasts to Serum”. Now we can understand how a researcher can come to identify a significantly expressed gene from microarray datasets.
        Resolution process
        T1. Gene expression overview
            T1.1. Review of the place of genomics in the OMIC world
            T1.2. Microarray data techniques and process
            T1.3. Data analysis cycle and tools
        T2. Excel used in genomics. Objective: use basic Excel functionalities to solve some gene expression data analysis needs
            T2.1. Column manipulation, functions used, anchor, copy with function, sort data, search and replace
            T2.2. Experiment comparison: data pre-treatment
            T2.3. Differentially expressed genes from replicate experiments (SAM)
        T3. GEPAS: Gene Expression Pattern Analysis Suite. Objective: use the GEPAS suite to apply the whole microarray data analysis process to the fibroblast data
            • Preprocessing
            • Viewing
            • Clustering
            • Differential expression
            • Classification
            • Data mining
        [Figure: microarray workflow. Labels: Target preparation, Hybridization, Slide scanning, Expression profile clustering, Data analysis]
        Acquired skills
            - Gene expression data overview
            - Excel used for genomics
            - Microarray data analysis using GEPAS
        Conclusion: ?
    10. END.
