Characterization of genes and proteins of cross-species biological pathways

Presented at the 2010 UMUC Biotechnology Symposium, May 21, 2010, Rockville, MD.

Transcript

  • 1. Characterization of genes and proteins of cross-species biological pathways

Jennifer Ivy Dong, Douglas James Joubert–NIH Library, Raina Kumar, & Robert Stephen–ABCC/NCI

Introduction

The new era of genomics and proteomics, with the advent of high-throughput technologies such as microarrays and next-generation sequencing, has opened up great opportunities for the life science research community to better understand biological processes. The gene lists obtained from these experiments are generally analyzed further in the context of biological pathways, as well as with available biological knowledge sets such as gene ontologies, gene sets, and gene enrichments. Efforts are underway to develop new methods to derive biologically meaningful information from the gene lists obtained from such technologies.

Although considerable effort has been expended on building, maintaining, and distributing these gene sets, a system allowing visualization of their conservation across mammalian species has not been developed. We have developed a process to retrieve information from two pathway databases, KEGG and BioCarta, and combine it with information from other biological databases such as Homologene and UniProt to characterize cross-species conservation of genes and proteins and gain insights into new biological knowledge.

Specifically, we are trying to understand which genes and proteins are common in given pathways across mammals such as human (Homo sapiens), mouse (Mus musculus), rat (Rattus norvegicus), dog (Canis lupus familiaris), cow (Bos taurus), and chimpanzee (Pan troglodytes). We also explore the problem of finding the variations or mutations in these genes and proteins that are well tolerated across these species.

Objectives

This project focused on developing methods for deriving cross-species annotations for the gene and protein groups identified in candidate pathways. The project had three primary goals:

1. Produce a matrix containing the genes in a particular biological pathway
2. Construct a list of known protein variations associated with each gene in a pathway
3. Develop a more effective procedure for characterizing cross-species conservation of genes and proteins

Materials and methods

The process has four major modules:

1. Identify homologous proteins using the Homologene database
2. Identify homologous proteins using similarity search
3. Find variations using multiple sequence alignments
4. Find all known variations from the UniProt database

The workflow starts with a BioCarta/KEGG pathway name and retrieves the gene list, with gene sequence IDs, from CGAP. The four modules then proceed as follows.

Identify homologous proteins in the Homologene database (Perl scripts):
  • Retrieve the homolog group ID for each gene from the Homologene database at NCBI
  • Report the value 1 for each mammalian species that has a homolog for the gene
  • Populate matrices (heat map), where genes are on the X-axis and species on the Y-axis

Identify homologous proteins by similarity search (Perl scripts):
  • Map sequence ID to protein ID using BioDbnet
  • Perform BlastP for the proteins
  • Populate matrices with the best hits using the taxonomy report

Find variations from MSA (Perl scripts):
  • Fetch sequences, using the protein sequence ID, for all the homologous genes of each pathway gene in mammals
  • Perform MSA with ClustalW
  • Use the *.dnd files to make a cladogram
  • Search for variations in the *.aln files
  • Report variations in tab-delimited files

Find known variations (Perl scripts):
  • Identify the protein IDs of all the proteins for the same species in the NCBI database using sequence ID, or map sequence ID to UniProt ID using BioDbnet
  • For each protein, search for the UniProt entry in files derived from UniProt
  • Read the known variations in the flat file and return the annotation in a tab-delimited file

Results

Six pathways, three each from BioCarta and KEGG, were analyzed using this process, and the results for these pathways are presented below. The matrices for one of the pathways are also shown for illustration.

BioCarta Pathways

Interferon Gamma (IGP): The IGP pathway has a significant role in the body's immune response. It has 6 genes, all well conserved among mammals except for JAK1 and STAT1 in Pan troglodytes.

Nerve Growth Factor (NGF): NGF is important for the survival of neurons during embryonic development and has an effect on the growth of sensory and sympathetic ganglia. It has 20 genes, and most are well conserved. Across species, the exceptions include DPM2, ELK1, and KLK2. Within species, only Canis lupus familiaris had NGF genes that were less conserved.

Protein Kinase C through G-protein coupled receptor (PKC): GPCRs are involved in signal transduction and play a role in various cellular functions. There are 9 genes in this pathway, and all of them are extremely well conserved.

KEGG Pathways

Hedgehog Pathway: The hedgehog signaling pathway is believed to govern the growth of embryonic stem cells as well as metamorphosis in general. It has 44 genes, of which 23 are conserved among all represented mammals. Three genes, SAP18, DYRK1A, and BTRC, are common in all mammals.

Basal Transcription Factors (BTF): BTF is a major control point for gene expression in eukaryotes and contains 34 genes. Most genes in this pathway are well conserved except GTF2A1L and STON1.

Dorsal-ventral axis formation (DVF): The DVF pathway is controlled by GRK and EGFR and is important in limb development. It has 29 genes, and most of them are well conserved, the exception being FMN2.

Matrices obtained through the Homologene and similarity search methods are shown below:

[Figure: two presence/absence matrices (1 = homolog reported, 0 = none), each with a Gene Symbol column followed by one column per species. Homologene panel species: Human, Chimp, Dog, Cow, Mouse, Rat. Similarity Search panel species include Cynomolgus monkey, Sumatran Orangutan, Rhesus Macaque, European Rabbit, Western baboon, Domestic Sheep, Syrian Hamster, White Bear, Opossum, Wild boar, Platypus, Human, Bonobo, Mouse, Chimp, Horse, Gorilla, Cow, Dog, Rat, and Cat. Genes shown: BRAF, CPEB1, EGFR, ERBB2, ERBB4, ETS1, ETS2, ETV6, ETV7, FMN2, GRB2, KRAS, MAP2K1, MAPK1, MAPK3, NOTCH1, NOTCH2, NOTCH3, NOTCH4, PIWIL1, PIWIL2, PIWIL3, PIWIL4, RAF1, RHBDL1, RHBDL3, SOS1, SOS2. The cell values are not recoverable from the transcript.]

Conclusions

We developed a process for characterizing cross-species conservation of genes and proteins in mammals and for finding variations within these genes and proteins. This project highlights the challenges associated with developing meaningful biological networks from disparate computational tools and databases. It also emphasizes the limitations of relying exclusively on BioCarta and KEGG in pathway discovery.

Future work

Future work includes:

1. Fully automate the process
2. Visualization
3. Develop a database schema to store the results

University of Maryland University College, 3501 University Boulevard East, Adelphi, MD 20783
ABCC/NCI-Frederick, P.O. Box B, Bldg. 430, Frederick, MD 21702

Disclaimer: The opinions and assertions presented here are the private views of the authors and are not necessarily those of ABCC/NCI.
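
Illustrative sketch (not part of the original poster): the matrix-population step described under Materials and methods, where a value of 1 is reported for each species that has a homolog of a pathway gene, might look roughly like the following. The poster's own scripts were written in Perl and are not reproduced in the transcript, so this is a Python approximation. The function names, the input format (a mapping from gene symbol to the set of species with a homolog), and the toy genes GENE_A and GENE_B are all assumptions made for illustration, not the authors' code or data.

```python
from typing import Dict, List, Set

# The six mammals named in the poster's introduction.
SPECIES: List[str] = [
    "Homo sapiens",
    "Pan troglodytes",
    "Canis lupus familiaris",
    "Bos taurus",
    "Mus musculus",
    "Rattus norvegicus",
]


def build_presence_matrix(
    homologs: Dict[str, Set[str]], species: List[str]
) -> Dict[str, List[int]]:
    """Return one 0/1 row per gene: 1 if a homolog was reported for that species."""
    return {
        gene: [1 if sp in found else 0 for sp in species]
        for gene, found in homologs.items()
    }


def conserved_genes(matrix: Dict[str, List[int]]) -> List[str]:
    """Genes with a homolog in every species of the matrix."""
    return sorted(gene for gene, row in matrix.items() if all(row))


def to_tsv(matrix: Dict[str, List[int]], species: List[str]) -> str:
    """Serialize the matrix as tab-delimited text, one gene per line."""
    lines = ["\t".join(["Gene Symbol"] + species)]
    for gene in sorted(matrix):
        lines.append("\t".join([gene] + [str(v) for v in matrix[gene]]))
    return "\n".join(lines)


# Hypothetical toy input; these are NOT results from the poster.
toy_homologs: Dict[str, Set[str]] = {
    "GENE_A": set(SPECIES),                     # homolog found in all six species
    "GENE_B": {"Homo sapiens", "Mus musculus"},  # homolog found in two species only
}

matrix = build_presence_matrix(toy_homologs, SPECIES)
print(to_tsv(matrix, SPECIES))
print("Conserved in all species:", conserved_genes(matrix))
```

In this sketch each gene is a row and each species a column, which matches the layout of the matrices in the figure. A tab-delimited output is used because the poster reports its other results in tab-delimited files.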