Your SlideShare is downloading. ×
0
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Network Biology: from lists to underpinnings of molecular behaviour
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Network Biology: from lists to underpinnings of molecular behaviour

1,908

Published on

BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier] …

BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]

Published in: Health & Medicine, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,908
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
90
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • GeneMANIA uses query specific weights for multifaceted function queries.Let’s say you have a co-expression network that was generated from microarray data. You know there is a cluster of cell cycle genes, and a cluster of DNA repair genes, and a few unknown genes between or within those clusters.This tells you a little bit about your genes of interest.But you want to add in a genetic interaction network, which is considerably more complex.And a protein interaction network, which is even more complex.How do you know what network contains the most relevant information about your query genes?The GeneMANIA algorithm weights the networks based on how connected your query genes are. A network is weighted more heavily if your query genes are more connected within that network.GeneMANIA produces a composite network showing the weights of the genetic and protein interaction, and co-expression networks used to generate the composite network.
  • Transcript

    • 1. Network Biology:from lists to underpinnings of molecular behaviour
      Michel Dumontier, Ph.D.
      Associate Professor of Bioinformatics
      Carleton University
      1
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 2. 2
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 3. Provenance
      This talk was prepared in part with input from the “Interpreting Gene Lists” workshop put forward by the Canadian Bioinformatics Workshops (bioinformatics.ca)
      http://bioinformatics.ca/workshops/2009/course-content
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
      3
    • 4. So you did some mass spectrometry?
      Protein Identification
      4
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 5. database search vs de novo
      W
      R
      V
      A
      L
      T
      Database ofknown peptidesMDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..
      G
      E
      P
      L
      K
      C
      W
      D
      T
      W
      R
      V
      A
      L
      T
      G
      E
      P
      L
      K
      C
      W
      D
      T
      Database Search
      de novo
      AVGELTK
      5
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 6. 6
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 7. My experiment worked and I have dozens, hundreds, or thousands of hits…. now what?
      Protein
      Identification
      ?
      7
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 8. Use the list to explore Biology
      Determine significant shared attributes
      Explore putative mechanisms of actions
      Test hypotheses
      Protein
      Identification
      Network
      Biology
      Eureka!
      Hypothesis on the
      molecular basis
      of disease/process
      8
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 9. Detoxification
      Oxidative Metabolism
      # in list having attribute
      Enriched in smokers =
      UP-regulated in smokers
      # in list sharing
      these attributes
      9
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 10. Outline
      Explore identified proteins
      Attribute enrichment
      Networks
      Pathways
      Lab
      10
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 11. A hypothesis underlies the list of identified proteins
      An initial question was posed, an experiment performed and a list of candidates obtained.
      The question is, what are the roles of these entities in the biological process being investigated.
      Normal vs pathological
      Response to stimulus
      Interactions and complexes
      11
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 12. Biological Answers
      Computational systems biology
      Information retrieval and summary
      Interaction network analysis
      Pathway analysis
      Function prediction
      12
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 13. Molecular Attributes
      An attribute provides information about to the entity in question (e.g. shape, function, process)
      Sequence and structure provides information about
      Motifs, domains, interaction/binding sites, post-translational modifications, conformational changes, molecular complexes, mutations, conservation/evolution
      Functions, localization, biological / pathological processes
      13
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 14. Gene Ontology
      Captures terminology related to three aspects
      biological processes
      molecular functions
      cellular components
      Relationships between terms are largely defined with “is a” and “part of” relations
      Cell division
      Isomerase activity
      14
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 15. cell
      membrane chloroplast
      mitochondrial chloroplast
      membrane membrane
      is-a
      part-of
      GO Structure
      Species independent. Some lower-level terms are specific to a group, but higher level terms are not
      15
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 16. Gene Ontology
      30,393 terms, 99.2% with definitions
      • 18,939 biological processes
      • 17. 2,735 cellular components
      • 18. 8,719 molecular functions
      GO Slim is an official reduced set of GO terms
      • Generic, plant, yeast
      • 19. Good for making pie charts
      16
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 20. Annotation
      Manual annotation
      Created by scientific curators
      High quality
      Small number (time-consuming to create)
      Electronic annotation
      Annotation derived without human validation
      Computational predictions (accuracy varies)
      Lower ‘quality’ than manual codes
      Key point: be aware of annotation origin
      17
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 21. Evidence Type(provenance of facts)
      • ISS: Inferred from Sequence/Structural Similarity
      • 22. IDA: Inferred from Direct Assay
      • 23. IPI: Inferred from Physical Interaction
      • 24. IMP: Inferred from Mutant Phenotype
      • 25. IGI: Inferred from Genetic Interaction
      • 26. IEP: Inferred from Expression Pattern
      • 27. TAS: Traceable Author Statement
      • 28. NAS: Non-traceable Author Statement
      • 29. IC: Inferred by Curator
      • 30. ND: No Data available
      • 31. IEA: Inferred from electronic annotation
      18
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 32. Variable Coverage
      Lomax J. Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform. 2005 Sep;6(3):298-304.
      19
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 33. GO Software Tools
      GO resources are freely available to anyone without restriction
      Includes the ontologies, gene associations and tools developed by GO
      Other groups have used GO to create tools for many purposes
      http://www.geneontology.org/GO.tools
      20
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 34. Accessing GO: QuickGO
      http://www.ebi.ac.uk/ego/
      21
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 35. Explore Ontologies
      http://www.ebi.ac.uk/ontology-lookup
      22
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 36. Databases of Molecular Annotation
      NCBI
      Genbank / RefSeq
      Entrez Gene
      EBI
      UniProt
      Ensembl BioMart (eukaryotes)
      Model Organism Databases
      Berkeley Drosophila Genome Project (BDGP)
      dictyBase (Dictyostelium discoideum)
      FlyBase (Drosophila melanogaster)
      GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
      UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
      Gramene (grains, including rice, Oryza)
      Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
      Rat Genome Database (RGD) (Rattus norvegicus)
      Reactome
      Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
      The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
      The Institute for Genomic Research (TIGR): databases on several bacterial species
      WormBase (Caenorhabditis elegans)
      Zebrafish Information Network (ZFIN): (Danio rerio
      23
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 37. 24
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 38. Identifiers
      Identifiers (IDs) are ideally unique, stable names or numbers that help track database records
      E.g. Social Insurance Number, Entrez Gene ID 41232
      Gene and protein information stored in many databases
       Genes have many IDs
      Records for: Gene, DNA, RNA, Protein
      Important to recognize the correct record type
      E.g. Entrez Gene records don’t store sequence. They link to DNA regions, RNA transcripts and proteins.
      25
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 39. NCBI Database Links
      NCBI:
      U.S. National Center for Biotechnology Information
      Part of National Library of Medicine (NLM)
      http://www.ncbi.nlm.nih.gov/Database/datamodel/data_nodes.swf
      26
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 40. Common Identifiers
      Species-specific
      HUGO HGNC BRCA2
      MGI MGI:109337
      RGD 2219
      ZFIN ZDB-GENE-060510-3
      FlyBase CG9097
      WormBase WBGene00002299 or ZK1067.1
      SGD S000002187 or YDL029W
      Annotations
      InterPro IPR015252
      OMIM 600185
      Pfam PF09104
      Gene Ontology GO:0000724
      SNPs rs28897757
      Experimental Platform
      Affymetrix 208368_3p_s_at
      Agilent A_23_P99452
      CodeLink GE60169
      Illumina GI_4502450-S
      Gene
      Ensembl ENSG00000139618
      Entrez Gene 675
      Unigene Hs.34012
      RNA transcript
      GenBank BC026160.1
      RefSeq NM_000059
      Ensembl ENST00000380152
      Protein
      Ensembl ENSP00000369497
      RefSeq NP_000050.2
      UniProt BRCA2_HUMAN or A1YBP1_HUMAN
      IPI IPI00412408.1
      EMBL AF309413
      PDB 1MIU
      Red = Recommended
      27
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 41. Identifier Mapping
      So many IDs!
      Mapping (conversion) is a headache
      Four main uses
      Disambiguate similarly named entities
      Used to reference related information
      Biological and informational provenance
      E.g. Genes to proteins, Entrez Gene to Affy
      Unification during dataset merging
      Equivalent entities
      28
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 42. ID Mapping Services
      Synergizer
      http://llama.med.harvard.edu/synergizer/translate/
      Ensembl BioMart
      http://www.ensembl.org
      UniProt
      http://www.uniprot.org/
      29
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 43. Outline
      Explore identified proteins
      Attribute enrichment
      Networks
      Pathways
      30
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 44. Attribute Enrichment (AE)
      Given:
      list: e.g. RRP6, MRD1, RRP7, RRP43, RRP42
      attributes: e.g. function, process, localization, interactions
      AE Question: Are any of the attributes surprisingly enriched in the list?
      Details:
      How to assess “surprisingly” (statistics)
      How to correct for repeating the tests
      31
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 45. What is a P-value?
      The P-value is (a bound) on the probability that the “null hypothesis” is true,
      Calculated through statistics with the data and testing the probability of observing those statistics, or ones more extreme, given a sample of the same size distributed according to the null hypothesis,
      Intuitively: P-value is the probability of a false positive result (aka “Type I error”)
      32
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 46. How likely are the observed differences between the two distributions due to chance?
      0
      1
      7
      1
      5
      6
      6
      0
      1
      1
      0
      7
      2
      0
      1
      2
      1
      0
      value
      value distribution
      33
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 47. AE using the T-test
      Answer: Two-tailed T-test
      Black: N1=500
      Mean: m1 = 1.1
      Std: s1 = 0.9
      Red: N2=4500
      Mean: m1 = 4.9
      Std: s1 = 1.0
      T-statistic =
      Formal Question: What is the probability of observing the T-statistic or one more extreme if the means of the two distributions were the same?
      = -88.5
      34
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 48. AE using the T-test
      P-value = shaded area * 2
      -88.5
      T-distribution
      Probability density
      0
      T-statistic
      T-statistic =
      Formal Question: What is the probability of observing the T-statistic or one more extreme if the means of the two distributions were the same?
      = -88.5
      35
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 49. T-test limitations
      Values are positive and have increasing density near zero, e.g. sequence counts
      Bimodal “two-bumped” distributions.
      Distributions with outliers, or “heavy-tailed” distributions
      Probability density
      0
      score 
      Probability density
      Probability density
      score 
      score 
      Assumes distributions are both approximately Gaussian (i.e. normal)
      Score distribution assumption is often true for:
      Log ratios from microarrays
      Score distribution assumption is rarely true for:
      Peptide counts, sequence tags (SAGE or NextGen sequencing), transcription factor binding sites hits
      Tests for significance of difference in means of two distribution but does not test for other differences between distributions.
      36
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 50. Kolmogorov-Smirnov (K-S) test
      Probability density
      0
      score 
      Cumulative distribution
      1.0
      Cumulative probability
      0.5
      Length = 0.4
      0
      Question: Are the red and black distributions significantly different?
      score 
      Formal question: Is the length of largest difference between the “empirical distribution functions” statistically significant?
      Calculate cumulative distributions of red and black
      37
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 51. What is the probability of finding 4 or more proteins with feature X in a random sample of 5 proteins
      list
      RRP6
      MRD1
      RRP7
      RRP43
      RRP42
      Background population:
      500 X proteins,
      5000 proteins
      38
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 52. Fisher’s exact test
      Null distribution
      P-value
      Answer = 4.6 x 10-4
      list
      RRP6
      MRD1
      RRP7
      RRP43
      RRP42
      P-value for Fisher’s exact test
      is “the probability that a random draw of the same size as the list from the background population would produce the observed number (or more) of attributes in the list.”,
      depends on size of the list, # with features (in list, background), and the background population.
      Background population:
      500 X proteins,
      5000 proteins
      39
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 53. Important details
      To test for under-enrichment of “black”, test for over-enrichment of “red”.
      Need to choose “background population” appropriately, e.g., if only portion of the total complement is queried (or having annotation), only use that population as background.
      To test for enrichment of more than one independent types of annotation (red vs black and circle vs square), apply Fisher’s exact test separately for each type.
      The hypergeometric test is equivalent to a one-tailed Fisher’s exact test.
      40
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 54. How to win the P-value lottery, part 1
      Random draws
      Expect a random draw with observed enrichment once every 1 / P-value draws
      … 7,834 draws later …
      Background population:
      500 X
      5000 Y
      41
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 55. How to win the P-value lottery, part 2Keep the list the same, evaluate different annotations
      Different annotations
      Observed draw
      RRP6
      MRD1
      RRP7
      RRP43
      RRP42
      RRP6
      MRD1
      RRP7
      RRP43
      RRP42
      42
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 56. Correcting for multiple tests
      The Bonferroni correction controls the probability any one test is due to random chance akaFamily-Wise Error Rate (FWER)
      If M = # of annotations tested: Corrected P-value = M x original P-value
      The Benjamini-Hochberg (B-H) controls the proportion of positive tests (i.e. rejections of the null hypothesis) that are false positives akaFalse Discovery Rate (FDR)
      FDR is the expected proportion of the observed enrichments that are due to random chance.
      Less stringent than the Bonferroni
      43
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 57. Reducing multiple test correction stringency
      The correction to the P-value threshold a depends on the # of tests that you do, so, no matter what, the more tests you do, the more sensitive the test needs to be
      Can control the stringency by reducing the number of tests:
      e.g. use GO slim or restrict testing to the appropriate GO annotations.
      44
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 58. AE tools
      Web-based tools
      Funspec:
      easy tool for yeast, not maintained, uses GO annotations and some annotations (e.g. protein complexes)
      YeastFeatures
      Similar to Funspec, different datasets and presentation
      GoMiner:
      Uses GO annotations, covers many organisms, needs a background set of genes
      Cytoscape-based tools
      BINGO:
      Does GO annotations and displays enrichment results graphically and visually organizes related categories
      45
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 59. Funspec: Simple ORA for yeasthttp://funspec.med.utoronto.ca/
      Choose sources of annotation
      Bonferroni correct? YES!
      Paste list here
      Cavaets:
      • yeast only,
      • 60. last updated 2002
      46
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 61. http://software.dumontierlab.com/yeastfeatures
      47
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 62. 48
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 63. GoMiner, part 1http://discover.nci.nih.gov/gominer
      1. Click “web interface”
      2. Upload background
      3. Upload list
      4. Choose organism
      5. Choose evidence code (All or Level 1)
      49
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 64. GoMiner, part 2
      6. Restrict # of tests via category size
      7. Restrict # of tests via GO hierarchy
      8. Results emailed to this address, in a few minutes
      50
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 65. DAVID, part 1 http://david.abcc.ncifcrf.gov/
      Paste list here
      DAVID automatically detects organism
      Choose ID type
      List type: list or background?
      51
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 66. DAVID, part 2http://david.abcc.ncifcrf.gov/
      52
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 67. BINGO, an ORA cytoscape pluginhttp://www.psb.ugent.be/cbd/papers/BiNGO/index.htm
      Links represent parent-child relationships in GO ontology
      Colours represent significance of enrichment
      Nodes represent GO categories
      53
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 68. 54
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 69. Outline
      Explore identified proteins
      Attribute enrichment
      Networks
      • Physical networks
      • 70. Genetic networks
      • 71. Functional networks
      Pathways
      55
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 72. Why Network and Pathway Analysis?
      Intuitive to Biologists
      • Provide a biological context for results
      • 73. More efficient than searching databases gene-by-gene
      • 74. Intuitive display for sharing data
      Computation on Pathway Content
      • Visualize multiple data types on a pathway or network
      • 75. Find active pathways
      • 76. Identify potential regulators
      56
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 77. network
      In biology, a network is a graph comprised of nodes that correspond to entities (genes, proteins, small molecules) and edges that correspond to physical/agentive or associative relations between entities.
      Vertex (node)
      Cycle
      Edge
      -5
      Directed Edge (Arc)
      Weighted Edge
      10
      7
      57
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 78. Integration in a Network Context
      58
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 79. Integration in a Network Context
      Expression data mapped
      to node colours
      59
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 80. Mapping Biology to a Network
      A simple mapping: Protein-protein interactions
      one protein/node, one interaction/edge
      Edges can represent other relationships
      Physical e.g. protein-protein interaction
      Regulatory e.g. kinase activates target
      Genetic e.g. epistasis
      Similarity e.g. protein sequence similarity
      Critical: understand the mapping for network analysis
      60
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 81. Protein Sequence Similarity Network
      http://apropos.icmb.utexas.edu/lgl/
      61
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 82. Literature Network
      Computationally extract gene relationships from text, usually PubMed abstracts
      Useful if network is not in a database
      Literature search tool
      BUT not perfect
      Problems recognizing gene names
      Natural language processing is difficult
      Agilent Literature Search Cytoscape plugin
      iHOP (www.ihop-net.org/UniPub/iHOP/)
      62
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 83. Agilent Literature Search
      63
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 84. Cytoscape Network produced by Literature Search.
      Abstract from the scientific literature
      Sentences for an edge
      64
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 85. Enrichment Map
      Overlap
      A
      B
      65
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 86. Nodes represent
      gene-sets
      66
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 87. Muscle Contraction
      Olfactory Receptor
      Ubiquitin Processes
      Ubiquitin-dependent Proteolysis
      Ectodermal Dev. &
      Keratinocyte Diff.
      DNA Repair
      Mitotic Cell Cycle
      Ubiquitin Ligase
      DNA Processes
      Cytoskeleton
      DNA Replication
      Intermediate Filament Cytoskeleton
      Microtubule Cytoskeleton
      Ras GTPase
      mRNA Transport
      Chromosome
      RNA Processes
      Serine Endopeptidase
      Chromatin Remodeling
      RNA Splicing
      Fatty Acid Metabolism
      Ion Channel
      Transcription
      Calcium
      rRNA Processing
      Mitochondrial Oxidative Metabolism
      Ribonucleotide Metabolism
      Potassium Sodium
      Translation
      67
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 88. 68
      Physical Networks
      B
      A
      Between two molecular objects
      DNA, RNA, gene, protein, complex, small molecule, photon
      Requires a site of interaction / binding
      Biologically relevant:
      Present/expressed at the same time
      Share a cellular location
      Leads to some biologically relevant outcome
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 89. Molecular Interactions
      RAS interacting with RALGDS
      (PDB: 1LFD)
      Synthetic protein interacting with ATP and Zinc
      (PDB: 2P0X)
      69
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 90. 70
      Experimental Interaction Discovery
      MassSpectrometry
      Genetics
      Two-Hybrid
      Direct, Physical
      Indirect, Physical
      Indirect, Genetic
      Microarray
      X-Ray
      NMR
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 91. 71
      Experimental Considerations
      How do you know if the interaction really exists?
      Each method has its advantages and disadvantages.
      Be aware of systematic errors
      Be aware of contaminants.
      Each method observes interactions from a slightly different experimental condition.
      Support from many different sources is certainly better (necessary) than just one.
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 92. 72
      Some affinity purification caveats
      First and most importantly, this is only a representation of the observation.
      You can only tell what proteins are in the eluate;
      you can’t tell how they are connected to one another.
      If there is only one other protein present (B), then its likely that
      A and B are directly interacting.
      But, what if I told you that two other proteins (B and C) were
      present along with A….
      A
      B
      A
      C
      B
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 93. 73
      Complexes with unknown topology
      A
      A
      A
      B
      C
      B
      C
      B
      C
      Which of these models is correct?
      The complex described by this experimental result is
      said to have an Unknown Topology.
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 94. 74
      Complexes with unknown stoichiometry
      A
      A
      B
      B
      B
      Here’s another possibility?
      The complex described by this experimental result is
      also said to have Unknown Stoichiometry.
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 95. 75
      Interaction Models
      Actual
      Topology
      Spoke
      Matrix
      Simple model, useful for data navigation
      More accurate
      Theoretical max. number of interactions
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 96. 76
      High-throughput Mass Spectrometric Protein Complex Identification (HMS-PCI)
      Mike Tyers, SLRI
      Ste12
      Ho et al. Nature. 2002 Jan 10;415(6868):180-3
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 97. 77
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 98. 78
      k-core analysis
      A part of a graph where every node is connected to other nodes with at least k edges (k=0,1,2,3...)
      Highest k-core is a central most densely connected region of a graph
      Regions of dense connectivity may represent molecular complexes
      Therefore, high k-cores may be molecular complexes
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 99. 79
      Pre MS
      Ho
      6-core
      6-core
      Interaction can define function
      Gavin
      Union
      6-core
      9-core
      MCODE plugin for Cytoscape
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 100. 80
      http://pathguide.org
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 101. Interaction Databases
      Experiment (E)
      Structure detail (S)
      Predicted
      Physical (P)
      Functional (F)
      Curated (C)
      Homology modeling (H)
      *IMEx consortium
      81
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 102. Network Classification of Disease
      Traditional: Gene association
      Limitations: Too many genes reduces statistical power
      New: Active cell map based approaches combining network and molecular profiles
      Chuang HY, Lee E, Liu YT, Lee D, Ideker T
      Network-based classification of breast cancer metastasis
      Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16
      Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S
      Network-based analysis of affected biological processes in type 2 diabetes models
      PLoS Genet. 2007 Jun;3(6):e96
      Efroni S, Schaefer CF, Buetow KH
      Identification of key processes underlying cancer phenotypes using biologic pathway analysis
      PLoS ONE. 2007 May 9;2(5):e425
      82
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 103. Network-Based Breast Cancer Classification
      57k intx from Y2H, orthology, co-citation, HPRD, BIND, Reactome
      2 breast cancer cohorts, different expression platforms
      Chuang HY, Lee E, Liu YT, Lee D, Ideker T
      Network-based classification of breast cancer metastasis
      Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16
      83
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 104. Similar network markers across 2 data sets (better than original overlap)
      Increased classification accuracy
      Better coverage of known cancer risk genes (*)
      84
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 105. PIPE
      Predicts yeast PPI from sequence
      Uses interaction databases to find similar interacting proteins
      Estimates the site of interaction
      75% accuracy (61% sensitivity, 89% specificity)
      Finds new interactions among complexes
      85
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 106. 86
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 107. 87
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 108. PIPE2
      First all-to-all sequence-based computational screen of PPIs in yeast
      29,589 high confidence interactions of ~ 2 x 107 possible pairs
      16,000x faster than PIPE
      99.95% specificity
      88
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 109. 89
      Synthetic Genetic Interactions
      Synthetic genetic interactions (lethal, slow growth)
      Mate two mutants without phenotypes to get a daughter cell with a phenotype
      Synthetic lethal (SL), slow growth
      robotic mating using the yeast deletion library
      Genetic interactions provide functional data on protein interactions or redundant genes
      About 23% of known SLs (1295 - YPD+MIPS) were known protein interactions in yeast
      Tong et al. Science. 2001 Dec 14;294(5550):2364-8
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 110. 90
      Cell Polarity
      Cell Wall Maintenance
      Cell Structure
      Mitosis
      Chromosome Structure
      DNA Synthesis
      DNA Repair
      Unknown
      Others
      Synthetic Genetic Interactions in Yeast
      Tong, Boone
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 111. Validation: Protein Localization
      A – A3: Y2H
      B: physical methods
      C: genetic
      E: immunological
      True positives:
      • Localized in the same cellular compartment
      • 112. Have common cellular role
      Sprinzak, Sattath, Margalit, J Mol Biol, 2003
      91
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 113. Comparisons
      All methods except for Y2H and synthetic lethality technique are biased toward abundant proteins.
      PPI bias toward certain cellular localizations.
      Evolutionarily conserved proteins have much better coverage in Y2H than the proteins restricted to a certain organism.
      C. Von Mering et al, Nature, 2002:
      92
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 114. Functional Associations
      Molecular Interactions
      Regulatory Interactions
      Genetic Interactions
      Similarity relationships
      Co-expression
      Protein sequence
      Domain architecture
      Phylogenetic profiles
      Gene neighborhood
      Gene fusion

      93
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 115. http://string.embl.de/
      von Mering et al., Nucleic Acids Res., 2005
      94
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 116. 95
      95
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 117. 96
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 118. Gene Function Prediction using a
      Multiple Association Network Integration Algorithm
      Query-specific weights for multifaceted function queries
      w1x
      w2x
      w3x
      weights
      CDC27
      Cell
      cycle
      CDC23
      +
      +
      APC11
      UNK1
      Co-complexed
      Durrett 2006
      Genetic
      Tong et al. 2001
      RAD54
      XRS2
      DNA
      repair
      =
      MRE11
      UNK2
      Co-expression
      Pavlidis et al, 2002, Lanckriet et al, 2004
      Mostafavi et al, 2008
      97
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 119. GeneMANIA Cytoscape Plugin
      98
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 120. Outline
      Explore identified proteins
      Attribute enrichment
      Networks
      Pathways
      Lab
      99
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 121. pathway
      In biology, a pathway is a network which consists of inputs (physical entities), outputs (physical entities, biological outcomes), and the molecular machinery and chemical transformations required/expected to realize the end-directed activity.
      100
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 122. Using Pathway Information
      Expert knowledge
      Experimental Data
      Find active processes
      underlying a phenotype
      Databases
      Literature
      Pathway
      Information
      Pathway
      Analysis
      101
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 123. >290 Pathway
      Databases!
      http://pathguide.org
      • Varied formats, representation, coverage
      • 124. Pathway data extremely difficult to combine and use
      Vuk Pavlovic
      Sylva Donaldson
      102
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 125. Aim: Convenient Access to Pathway Information
      http://www.pathwaycommons.org
      Facilitate creation and communication of pathway data
      Aggregate pathway data in the public domain
      Provide easy access for pathway analysis
      103
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 126. Access From Cytoscape
      104
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 127. cardiomyopathy: downregulated genes
      Fatty Acid Degradation?
      Other pathways / processes?
      GenMAPP.org
      105
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 128. Fatty Acid Degradation Pathway
      106
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 129. Cardiomyopathy Data on Fatty Acid Degradation Pathway
      107
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 130. Visualizing Time Course Data on Pathways: Multiple Comparison View
      108
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 131. Outline
      Explore identified proteins
      Attribute enrichment
      Networks
      Pathways
      Lab
      109
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 132. 110
      Network Analysis
      Cytoscape
      Visualize molecular interaction networks and integrate interactions with gene expression profiles and other state data. Data filters & custom plug-in architecture.
      http://www.cytoscape.org
      Biolayout Express 3D
      Large networks
      Gene expression
      www.sanger.ac.uk/Teams/Team101/biolayout/b3d.html
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 133. Expert knowledge
      Experimental Data
      Network Analysis using Cytoscape
      Find biological processes
      underlying a phenotype
      Databases
      Literature
      Network
      Information
      Network
      Analysis
      111
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 134. http://cytoscape.org
      Network visualization and analysis
      Pathway comparison
      Literature mining
      Gene Ontology analysis
      Active modules
      Complex detection
      Network motif search
      UCSD, ISB, Agilent, MSKCC, Pasteur, UCSF, Unilever, UToronto, U Texas
      112
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 135. Manipulate Networks
      Filter/Query
      Interaction Database Search
      Automatic Layout
      113
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 136. Overview
      Zoom
      Focus
      PKC Cell Wall Integrity
      114
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 137. Active Community
      http://www.cytoscape.org
      Help
      8 tutorials, >10 case studies
      Mailing lists for discussion
      Documentation, data sets
      Annual Conference: Houston Nov 6-9, 2009
      10,000s users, 2500 downloads/month
      >40 Plugins Extend Functionality
      Build your own, requires programming
      Cline MS et al. Integration of biological networks and gene expression data using Cytoscape Nat Protoc. 2007;2(10):2366-82
      115
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 138. LAB
      Objective
      Create a map of the functional enrichments from the 14 input proteins
      Methods
      Use HGNC to obtain the gene symbols from the names
      Submit the gene symbols to a tool that already has datasets loaded.
      Get Attributes and do analysis on network
      116
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 139. 14 Proteins
      ISOFORM of APOPTOSIS-INDUCING FACTOR 1, MITOCHONDRIAL
      QUINONE OXIDOREDUCTASE.; 26 KDA PROTEIN.;22 KDA PROTEIN.; 32 KDA PROTEIN.
      14-3-3 PROTEIN EPSILON.
      ELONGATION FACTOR 1-GAMMA.; 50 KDA PROTEIN.
      AFG3-LIKE PROTEIN 2.
      3-KETOACYL-COA THIOLASE, MITOCHONDRIAL
      IMPORTIN BETA-1 SUBUNIT.
      FH1/FH2 DOMAIN-CONTAINING PROTEIN
      ANNEXIN VI ISOFORM 2.; ANNEXIN A6.
      2,4-DIENOYL-COA REDUCTASE, MITOCHONDRIAL
      HYDROXYACYL GLUTATHIONE HYDROLASE ISOFORM 1.; HYDROXYACYLGLUTATHIONE HYDROLASE.
      ISOFORM 1 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA.; ISOFORM 2 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA
      ISOFORM 1 OF LONG-CHAIN-FATTY-ACID--COA LIGASE 1
      PHOSPHOLIPASE C DELTA 4.
      117
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 140. Get their gene symbol/identifiersHGNC - http://www.genenames.org
      Provide a table of mappings
      What challenges did you face when trying to identify the symbols from textual descriptions?
      118
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 141. Identify functional enrichments
      Discuss and provide a plot for the enrichment of Gene Ontology categories
      119
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 142. Build an attribute enrichment network
      Which new proteins are functionally linked?
      What datasets were used in the network construction?
      120
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 143. Attribute Enrichment with a custom data set
      Use BioMart to
      convert HGNC identifiers to Ensembl Identifiers
      Obtain the Gene Ontology categories for the target proteins and the background proteins.
      Use FUNC to do the enrichment analysis
      121
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 144. 122
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 145. 123
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 146. 124
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 147. 125
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 148. Collect the Gene Ontology attributes for the list, then for all the human genes
      126
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]
    • 149. Next steps are harder…
      http://func.eva.mpg.de/
      To use FUNC, you need to convert the BioMART output to the file format above. This is pretty easy to do in excel for the protein list, but excel can’t handle the results for all the human proteins. Need to write a small script… take BIOC3008 and become a competent in simple data manipulation 
      127
      BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]

    ×