
Functional Genomics and Annotation



Introduces Gene Ontology (GO) and several methods and tools for functional genomics analysis, including GO, KEGG, Reactome, and DAVID. It then explains the levels of annotation and their importance, and finally introduces a general approach to analyzing data from high-throughput experiments.

Date: 2015/06/04

Published in: Science


  1. Bioinformatics: Functional Genomics • Date: 2015/06/04 • Jian-Kai Wang, GSB's MD
  2. Content: Gene Ontology (GO terms; BP/MF/CC ontologies; GO hierarchy; enrichment analysis; p-values; multiple testing) • KEGG (pathway maps in seven areas; pathway searching; customized pathways) • Reactome (pathway enrichment) • DAVID (functional annotation; gene ID conversion) • Homework introduction
  3. Requirements for the class and homework: download the data. Class (3 files): class_UniProtID.txt, class_entrezID.txt, class_KEGG_color.txt. Homework (2 files): HW_entrezID.txt, HW_UniProtID.txt.
  4. Definition: Jonathan Pevsner (2009), Bioinformatics and Functional Genomics, 2nd ed., p. 461.
  5. General approach to high-throughput experiments:
       Biological experiment
       -> High-throughput technologies (microarray, MS/MS, NGS)
       -> Comparison of control vs. case (e.g. normal vs. cancer sample, no treatment vs. drug treatment; between-subject or within-subject designs, ref. Ch. 9)
       -> Differences between the two conditions in genes or proteins (fold change, t-test, clustering)
       -> Functional analysis (GO enrichment analysis, pathway searching/enrichment, interaction networks, functional annotation)
       -> Interpretation
  6. Four levels of annotation (Jennifer L. Reed, Iman Famili et al. (2006), Nature Reviews Genetics 7, 130-141). Related tools: KEGG, Gene Ontology, Reactome, DAVID.
  7. Gene Ontology (GO) project: produces a dynamic, controlled vocabulary of GO terms, with their definitions, that is applied to describe the roles of genes and proteins. Each gene product is mapped to one or more GO terms.
  8. Three categories of GO:
       • Biological process (BP): a biological objective to which the gene or gene product contributes. A process is accomplished via one or more ordered assemblies of molecular functions. Examples: cell growth and maintenance, signal transduction.
       • Molecular function (MF): the biochemical activity (including specific binding to ligands or structures) of a gene product. Examples: enzyme, transporter, ligand.
       • Cellular component (CC): the place in the cell where a gene product is active. Examples: ribosome, proteasome.
  9. Hierarchy of GO terms: a GO term inherits the biological meaning of all its parent terms. Example terms: GO:0055132, GO:0055133.
  10. Enrichment analysis (official site), STEP.1: quick submission (fewer options) or advanced options.
  11. Upload a set of UniProt IDs and set parameters: STEP.2 paste the UniProt IDs (class_UniProtID.txt); STEP.3 to STEP.6 set the parameters.
  12. Enrichment result summary: settings related to the p-value are summarized, and the enrichment result can be exported.
  13. Enrichment result interpretation: for each GO term the table shows its definition, the adjusted p-value (click to rank), the number of proteins in the reference list, and the number of proteins in the uploaded list.
  14. GO terms in enrichment analysis. Question: which GO terms are important? Each case compares two GO terms as matched proteins / query proteins:
       Case 1: GO1 10/50, GO2 10/20
       Case 2: GO1 10/50, GO2 8/50
       Case 3: GO1 10/50, GO2 6/20
       Solution: comparing raw counts cannot settle this, so hypothesis testing is necessary.
  15. P-value in enrichment analysis. Notation:
       N: reference set (e.g. all genes in the human database)
       n: set of genes of interest (e.g. all genes in each GO term)
       m: genes in a specific gene set (e.g. genes in the upload list)
       k: genes of interest that are also in the gene set
       Contingency table:
                           Genes of interest   Not genes of interest   Total
         In gene set       k                   m - k                   m
         Not in gene set   n - k               N - m - n + k           N - m
         Total             n                   N - n                   N
       Null hypothesis (same proportion): $\frac{k}{n-k} = \frac{m-k}{N-m-n+k}$
       Hypothesis tests: Fisher's exact test, hypergeometric test, etc.
  16. Multiple testing problem.
       Decision (prediction) vs. actual situation (truth):
                      H0 true                     H0 false
         Accept H0    Correct decision (1 - α)    Type II error (β)
         Reject H0    Type I error (α)            Correct decision (1 - β)
       Question: if we perform m hypothesis tests, what is the probability of at least one false positive (Type I error)?
       Answer (assuming independent tests):
         $P(\text{error}) = \alpha$ (usually 0.05)
         $P(\text{no error}) = 1 - \alpha$
         $P(\text{no error in } m \text{ tests}) = (1 - \alpha)^m$
         $P(\text{at least one error in } m \text{ tests}) = 1 - (1 - \alpha)^m$
       Methods: the Bonferroni correction controls the family-wise error rate by testing each hypothesis at $\alpha' = \alpha / m$; the Benjamini-Hochberg procedure controls the false discovery rate (FDR); other methods exist.
  17. KEGG (Kyoto Encyclopedia of Genes and Genomes): STEP.1
  18. KEGG pathway database: molecular interaction and reaction networks in seven areas, with detailed pathways for each area.
  19. Search pathway: STEP.2 click Organism; STEP.3; STEP.4
  20. Pathway searching result
  21. Color codes
  22. KEGG pathway enrichment: STEP.1 to STEP.4; STEP.5 NCBI Entrez Gene IDs (class_entrezID.txt); STEP.6
  23. Enrichment result: each entry shows the pathway name with the number of proteins in that pathway in parentheses; click an entry to open the pathway.
  24. Detailed pathway: to download, right-click the image and choose "Save image".
  25. Customized pathway: STEP.1 to STEP.5. Input format: id bgcolor,fgcolor, with different colors for up-regulated and down-regulated genes. File: class_KEGG_color.txt.
  26. Customized pathway representation: select hsa04151 PI3K-Akt signaling pathway - Homo sapiens (human).
  27. Reactome (pathway database): STEP.1
  28. Upload a set of UniProt IDs: STEP.2 UniProt IDs (class_UniProtID.txt); STEP.3
  29. Pathway enrichment: enrichment results, a p-value color legend, and the complete pathway (viewport).
  30. Detailed information on pathways: STEP.1 click Metabolism; STEP.2 click the diagram to see detailed information on each sub-pathway.
  31. DAVID: STEP.1
  32. Upload UniProt IDs: STEP.1 UniProt IDs (class_UniProtID.txt); STEP.2 to STEP.4
  33. DAVID gene ID conversion tool (useful): STEP.5 to STEP.7
  34. Analyze the gene list with DAVID tools: STEP.8
  35. Annotation summary results: STEP.9
  36. Functional annotation clustering
  37. End • Feel free to contact me.
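Slide 9 says a GO term inherits the meaning of all its parent terms. A practical consequence (GO's "true path rule") is that a gene annotated to a term is implicitly annotated to every ancestor of that term, and enrichment tools generally count these propagated annotations. The sketch below shows the propagation over a tiny hypothetical hierarchy; the term names `T_root`, `T_a`, `T_b`, `T_c` and the gene names are placeholders, not real GO IDs.

```python
from typing import Dict, Set

# Hypothetical GO-like hierarchy: each term maps to its parent terms.
# "T_c" has two parents, which is why GO is a directed acyclic graph, not a tree.
PARENTS: Dict[str, Set[str]] = {
    "T_root": set(),
    "T_a": {"T_root"},
    "T_b": {"T_root"},
    "T_c": {"T_a", "T_b"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every ancestor of `term`, excluding the term itself (iterative DFS)."""
    found: Set[str] = set()
    stack = list(parents[term])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents[parent])
    return found


def propagate(direct: Dict[str, Set[str]], parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Extend each gene's direct annotations with all ancestor terms."""
    full: Dict[str, Set[str]] = {}
    for gene, terms in direct.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, parents)
        full[gene] = expanded
    return full


# Hypothetical direct annotations.
direct_annotations = {"gene1": {"T_c"}, "gene2": {"T_a"}}
print(propagate(direct_annotations, PARENTS))
```

Here gene1, annotated only to `T_c`, also counts toward `T_a`, `T_b`, and `T_root`.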
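Slides 11, 28, and 32 paste a list of UniProt IDs from class_UniProtID.txt. Before pasting, it can help to drop blank lines and duplicates and flag malformed entries. The sketch below assumes one accession per line (the file's real layout is not shown in the slides) and uses the accession regular expression given in UniProt's accession-number help page; isoform IDs such as `P04637-2` are deliberately not matched by it.

```python
import re
from typing import Iterable, List, Tuple

# Accession pattern from the UniProt documentation (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def read_uniprot_ids(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split lines into (valid, invalid) accessions, skipping blanks and duplicates."""
    valid: List[str] = []
    invalid: List[str] = []
    seen = set()
    for line in lines:
        accession = line.strip()
        if not accession or accession in seen:
            continue
        seen.add(accession)
        if UNIPROT_ACCESSION.fullmatch(accession):
            valid.append(accession)
        else:
            invalid.append(accession)
    return valid, invalid


valid, invalid = read_uniprot_ids(["P04637\n", "A0A024R161", "", "P04637", "not_an_id"])
print(valid, invalid)  # -> ['P04637', 'A0A024R161'] ['not_an_id']
```

With a real file, pass the open file handle, e.g. `with open("class_UniProtID.txt") as fh: valid, invalid = read_uniprot_ids(fh)`.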
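Under the null hypothesis of slide 15, the overlap k follows a hypergeometric distribution, and the enrichment p-value is the upper tail $P(X \ge k) = \sum_{i=k}^{\min(m,n)} \binom{m}{i}\binom{N-m}{n-i} \big/ \binom{N}{n}$, which is the same as a one-sided ("greater") Fisher's exact test on the 2x2 table. The sketch below uses the slide's notation (N, m, n, k) and only the standard library; the example counts are hypothetical. In SciPy the same value is `scipy.stats.hypergeom.sf(k - 1, N, m, n)`.

```python
from math import comb


def enrichment_p_value(k: int, N: int, m: int, n: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: reference set size, m: genes in the gene set, n: genes of interest,
    k: genes of interest that fall in the gene set (notation of slide 15).
    """
    if not 0 <= k <= min(m, n) or m > N or n > N:
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    numerator = sum(comb(m, i) * comb(N - m, n - i) for i in range(k, min(m, n) + 1))
    return numerator / comb(N, n)


# Hypothetical counts: 20 reference genes, 5 in the gene set, 4 genes of interest, 3 overlapping.
print(enrichment_p_value(3, 20, 5, 4))
```

The result is symmetric in m and n, so it does not matter which of the two sets is called the "gene set" and which the "genes of interest".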
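Slide 16 shows why many tests inflate false positives and names two corrections. The sketch below computes the chance of at least one false positive among m independent tests, Bonferroni-adjusted p-values (multiplying each p-value by m is equivalent to comparing it with α/m), and Benjamini-Hochberg adjusted p-values. The p-values in the example are hypothetical; in practice a library routine such as `statsmodels.stats.multitest.multipletests` can do the same job.

```python
from typing import List


def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive in m independent tests at level alpha)."""
    return 1 - (1 - alpha) ** m


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni adjustment: p * m, capped at 1 (controls the family-wise error rate)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (controls the FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(familywise_error(0.05, 20))  # about 0.64 for 20 tests
pvals = [0.01, 0.04, 0.03, 0.5]  # hypothetical raw p-values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Benjamini-Hochberg is less strict than Bonferroni, which is why it is the usual choice when testing hundreds of GO terms or pathways at once.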
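Slide 25 gives the customized-pathway input format as "id bgcolor,fgcolor", with different colors for up- and down-regulated genes. The sketch below turns a table of log2 fold changes into such lines. The threshold of 1.0, the colors red/blue/black, one entry per line, and the IDs `id1` to `id3` are all assumptions for illustration; check the KEGG Mapper documentation for the ID prefixes and color names it accepts.

```python
from typing import Dict, List


def kegg_color_lines(log2fc: Dict[str, float], threshold: float = 1.0,
                     up: str = "red", down: str = "blue", fg: str = "black") -> List[str]:
    """Build 'id bgcolor,fgcolor' lines; genes inside (-threshold, threshold) are skipped."""
    lines: List[str] = []
    for gene_id, fc in log2fc.items():
        if fc >= threshold:
            lines.append(f"{gene_id} {up},{fg}")    # up-regulated
        elif fc <= -threshold:
            lines.append(f"{gene_id} {down},{fg}")  # down-regulated
    return lines


# Hypothetical IDs and fold changes.
print("\n".join(kegg_color_lines({"id1": 2.0, "id2": -1.5, "id3": 0.2})))
```

Writing the joined string to a text file produces a file in the same layout as class_KEGG_color.txt is described to have.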
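Slide 36 shows DAVID's functional annotation clustering. DAVID's documentation describes grouping annotation terms whose gene memberships overlap, measuring that overlap with kappa statistics. The sketch below computes Cohen's kappa for two terms over one gene list; it illustrates the similarity measure only, not DAVID's full clustering procedure, and the term and gene names are hypothetical.

```python
from typing import Iterable, Set


def cohen_kappa(term_a: Set[str], term_b: Set[str], genes: Iterable[str]) -> float:
    """Cohen's kappa for agreement of two terms' memberships over a gene list."""
    genes = list(genes)
    n = len(genes)
    both = sum(1 for g in genes if g in term_a and g in term_b)
    only_a = sum(1 for g in genes if g in term_a and g not in term_b)
    only_b = sum(1 for g in genes if g not in term_a and g in term_b)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Chance agreement: P(in A) * P(in B) + P(not in A) * P(not in B).
    expected = ((both + only_a) * (both + only_b) + (only_b + neither) * (only_a + neither)) / n ** 2
    if expected == 1:
        return 1.0  # degenerate case: every gene is in both terms or in neither
    return (observed - expected) / (1 - expected)


genes = ["g1", "g2", "g3", "g4"]
print(cohen_kappa({"g1", "g2"}, {"g1", "g2"}, genes))  # identical memberships
print(cohen_kappa({"g1", "g2"}, {"g3", "g4"}, genes))  # disjoint memberships
```

Kappa near 1 means two terms annotate almost the same genes and would tend to fall into the same cluster; values near 0 or below mean no more overlap than chance.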