Bioinformatics Presentation


Published in: Business, Technology

  1. Stem Cells and Bioinformatics
     Michelle Previtera, Zhenhong Bao
  2. Induced Pluripotent Stem Cell Lines Derived from Human Somatic Cells
     • OCT4
     • SOX2
     • NANOG
     • LIN28
  3. OCT4
     • Exercise 1:
       • Do a PubMed search for OCT4.
       • Refine your search using the Limits tab to include only studies involving humans.
         • What other tools do you see that are useful for refining your search?
         • What is the function of OCT4 in humans according to the published literature?
         • How many articles are found with and without the limit?
       • Do a Google Scholar search for OCT4.
         • Do the advanced search options match the options on the PubMed Limits tab?
         • What can you do in Google Scholar to refine your results?
  4. Answer
     • OCT4 is a transcription factor involved in maintaining pluripotency in ES cells.
     • The difference in article counts is:
       • Without the limit: 358
       • With the limit: 154
     • Google and PubMed do not have the same refinement tools.
     • Use:
       • The "+" operator makes sure your results include common words, letters, or numbers that Google's search technology generally ignores.
       • The "-" operator excludes all results that include the search term.
       • A phrase search returns only results that include the exact phrase.
       • The "OR" operator returns results that include either of your search terms.
       • The "intitle:" operator returns only results that include your search term in the document's title.
       • From
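
The with/without-limit comparison above could also be scripted. This is only a sketch: it builds request URLs for NCBI's E-utilities ESearch service and does not send them. The function name `pubmed_count_url` is hypothetical, and mapping the Limits tab "Humans" option to the `humans[MeSH Terms]` filter is an assumption, so the counts returned need not match the 358 and 154 on the slide.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(term, humans_only=False):
    """Build an ESearch URL that asks PubMed for the hit count of a query.

    Hypothetical helper for illustration; it only builds the URL.
    """
    if humans_only:
        # Assumption: this MeSH filter approximates the Limits tab "Humans" option.
        term = f"({term}) AND humans[MeSH Terms]"
    params = {"db": "pubmed", "term": term, "rettype": "count"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Two URLs whose reported counts can be compared by hand or with an HTTP client.
print(pubmed_count_url("OCT4"))
print(pubmed_count_url("OCT4", humans_only=True))
```

Opening each URL in a browser returns a small XML document with a count element, which gives the two numbers to compare.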
  5. SOX2
     • Using NCBI Entrez Gene:
       • What are two features that make SOX2 unique?
         • HINT: Look at the summary paragraph in Entrez Gene.
       • What does SOX2 interact with, and what is the function of this interaction?
       • Is there a structure for SOX2 in CDD?
         • If so, what is interesting about the structure when you open it in Cn3D?
           • What is the structure aligned to?
           • What do the # marks mean on the feature 1 row?
  6. Answer
     • SOX2 is an intronless gene, and it lies within an intron of another gene called SOX2 overlapping transcript (SOX2OT).
     • SOX2 interacts with OCT4 to establish the first three lineages in ES cells.
     • The structure shows the DNA-binding site; its residues are shown in blue and are conserved.
     • The structure is aligned to Chain D, Crystal Structure of a POU/HMG/DNA Ternary Complex.
     • The "#" marks pinpoint the feature residues.
  7. NANOG
     • Illustration using the EMBOSS suite
     • Exercise:
       • Obtain the mRNA and genomic sequences of human NANOG in FASTA format from NCBI Entrez.
       • According to infoseq, what are the length, GC content, and accession number of the human NANOG mRNA?
       • Align the mRNA sequence to the genomic sequence using dottup and show the results.
       • Do a local alignment of the human mRNA sequence against the mouse sequence using water.
       • Use getorf/plotorf to obtain the ORF of the human NANOG mRNA sequence.
       • Translate the mRNA sequence to a protein sequence with transeq.
       • Show the hydropathy plot of the resulting protein sequence using pepinfo.
  8. NCBI Entrez
     • Obtain sequence files from NCBI
  9. Infoseq
     • Reports name, accession, type, GI, length, GC %, etc.
     • Accession No: NM_024865.2
     • Length: 2098 nt
     • GC content: 45.28%
     • Description: Homo sapiens Nanog homeobox (NANOG), mRNA
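
The length and GC % that infoseq reports are simple to compute by hand. The sketch below is not infoseq itself; `read_fasta` and `gc_percent` are hypothetical helper names, and the toy record stands in for the NANOG FASTA file downloaded from NCBI.

```python
def read_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the start of a second record
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())
    return header, "".join(chunks)


def gc_percent(seq):
    """Percentage of bases in seq that are G or C."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Toy record (hypothetical); replace with the contents of the real FASTA file.
toy = ">toy_record hypothetical example\nATGCGC\nTTAA\n"
header, seq = read_fasta(toy)
print(header, len(seq), gc_percent(seq))
```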
  10. dottup
      • dottup looks for exact matches between sequences
      [Figure: dot plots at Word Size = 10 and Word Size = 20]
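
The exact-match idea behind the dot plot can be sketched in a few lines. This is an illustration of word matching in general, not dottup's implementation; `word_matches` is a hypothetical name and the sequences are toy data. Each returned pair is one dot, and raising the word size removes short chance matches.

```python
from collections import defaultdict


def word_matches(seq_a, seq_b, word_size):
    """Return 0-based (i, j) pairs where the word at seq_a[i] equals the word at seq_b[j]."""
    # Index every word of seq_b by its start positions.
    index = defaultdict(list)
    for j in range(len(seq_b) - word_size + 1):
        index[seq_b[j:j + word_size]].append(j)
    # Look up every word of seq_a in that index.
    matches = []
    for i in range(len(seq_a) - word_size + 1):
        for j in index.get(seq_a[i:i + word_size], []):
            matches.append((i, j))
    return matches


a, b = "ACGTACGT", "TTACGTAA"  # toy sequences
print(word_matches(a, b, 4))   # several short matches
print(word_matches(a, b, 6))   # a larger word size leaves fewer dots
```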
  11. Water
      • A local alignment, as performed by water, searches for regions of local similarity and need not include the entire length of the sequences.
      • Results for NANOG mRNA between Homo sapiens and Mus musculus
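
A minimal Smith-Waterman sketch shows why a local alignment need not span the whole sequences: scores are floored at zero, so the alignment starts and ends wherever similarity is highest. The scoring here is an assumption chosen for readability (match +2, mismatch -1, linear gap -2); water itself uses a substitution matrix with separate gap-open and gap-extend penalties, so its output on real data will differ.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for a simple local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; the floor at 0 is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(0,
                              score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences: only the shared ACGT core is reported.
print(smith_waterman("TTTACGTGG", "CCACGTAA"))
```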
  12. Plotorf/getorf
      • Find and plot potential open reading frames (ORFs).
      • plotorf defines an ORF as a region between a START codon and a STOP codon.
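
The START-to-STOP definition can be sketched as a scan over the three forward reading frames. This is an assumption-laden simplification, not the plotorf or getorf algorithm: only ATG is treated as START, the reverse strand is ignored, and `find_orfs` is a hypothetical name. Coordinates are 1-based and include the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end) ORF coordinates on the forward strand.

    Coordinates are 1-based and inclusive; end is the last base of the stop codon.
    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start + 1, pos + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Toy sequence with one ORF: ATG AAA CCC TAA
print(find_orfs("GGATGAAACCCTAAGG"))
```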
  13. Transeq
      • transeq translates an mRNA sequence into a protein sequence.
      • The ORF is obtained from plotorf/getorf.
      • Human NANOG mRNA ORF: positions 217 to 1131
      • Translated protein sequence
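
Translating a chosen region codon by codon can be sketched as follows, assuming the standard genetic code. `translate` is a hypothetical helper rather than transeq itself; its `start` and `end` arguments are 1-based and inclusive, so the slide's ORF would be requested as `translate(mrna, 217, 1131)` once `mrna` holds the real sequence.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, start=1, end=None):
    """Translate seq[start..end] (1-based, inclusive) in a single reading frame.

    Stop codons become '*'; codons with unexpected letters become 'X'.
    """
    seq = seq.upper().replace("U", "T")
    region = seq[start - 1:end]
    protein = []
    for pos in range(0, len(region) - 2, 3):
        protein.append(CODON_TABLE.get(region[pos:pos + 3], "X"))
    return "".join(protein)


# Toy sequence (hypothetical): the region 3..14 is ATG AAA CCC TAA.
print(translate("GGATGAAACCCTAAGG", 3, 14))
```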
  14. Pepinfo
      • pepinfo produces information on amino acid properties (size, polarity, aromaticity, charge, etc.).
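
A hydropathy plot is a sliding-window average of per-residue hydropathy values. The sketch below assumes the Kyte-Doolittle scale and a window of 9 residues; both are assumptions for illustration, and `hydropathy_profile` is a hypothetical name, not a pepinfo function.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of every full window along the protein.

    Raises KeyError for letters outside the 20 standard amino acids
    (for example a trailing '*' stop symbol, which should be removed first).
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Toy peptide (hypothetical): hydrophobic stretches give positive averages.
for position, value in enumerate(hydropathy_profile("MKPIIIVVVDDEEKK", window=5), start=1):
    print(position, round(value, 2))
```

Plotting the returned values against window position gives the hydropathy curve; positive regions are hydrophobic, negative regions hydrophilic.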
  15. LIN28
      • Illustration using BLAST
      • Exercise:
        • Obtain the human LIN28 protein sequence from NCBI.
        • Search a nucleotide database using a protein query (tblastn).
        • Search the Conserved Domain Database (CDD) for conserved domains.
        • From the taxonomy report, find the best match for the mouse homolog.
        • Compare the conserved domains of the human and mouse proteins.
  16. BLAST
      • tblastn: searches a translated nucleotide database using a protein query
      • Search result: homologs of human LIN28
  17. Conserved Domain Database (CDD)
      • Results for Homo sapiens LIN28:
        • CSD, cold-shock DNA-binding domain: 67 aa, 95.5% aligned
        • AIR1, arginine methyltransferase-interacting protein: 190 aa, 34.7% aligned
  18. Taxonomy Report
      • BLAST results are grouped by species.
      • The best match in Mus musculus:
        • Protein accession number: NP_665832, E value 2e-103
  19. Conserved domains from the mouse protein
      • Do the same CDD search with the mouse LIN28 protein:
        • CSD: 67 aa, 95.5% aligned
        • AIR1: 190 aa, 26.84% aligned
      • Compared to human LIN28:
        • CSD: 67 aa, 95.5% aligned
        • AIR1: 190 aa, 34.7% aligned
  20. Literature
      • Gene functions related to pluripotency:
      • OCT4 is required to maintain the undifferentiated stem cell state, and differentiation to trophectoderm occurs in its absence.
      • NANOG plays a crucial role in maintaining the pluripotent state of primate embryonic stem cells.
      • …