Bioinformatics/Computational Biology: Data preprocessing; Classification problem; Clustering problem; Microarray analysis; Sequence alignment; Hidden Markov Model; Gene finding; Motif finding
Molecular Biology: Basic Genetic Mechanisms (4. DNA and Chromosomes; 6. From DNA to Protein; 7. Control of gene expression); Diseases (23. Cancer; 25. Pathogens); Others: SNP, RNAi
Serious about bioinformatics (all HuGe Lab members): Mini (NIH-) proposal project. Besides preliminary results, include a proposal for future work (i.e. independent studies, theses). Possible collaborations with UTHSCSA and others.
Specific Aim(s): What do you want to do? Why is it important?
Background: What has been done previously? (What makes your approach interesting?) Where do you get your data?
(Preliminary) Result: To elaborate later.
Future Work: To elaborate later.
A project: Same as above, except that future work is not required.
Office hours (for projects): By appointment (send me an email 24 hours before) Tu, Th 10-3, 5-7, 8:30-10. W 10:30-noon.
Inorganic ions: Na+, K+, Mg2+, Ca2+, Cl- [E. coli: 1%, Mammal: 1%]
Phospholipids [E. coli: 2%, Mammal: 3%]
Other lipids [E. coli: -, Mammal: 0.2%]
Polysaccharides [E. coli: 1%, Mammal: 0.25%]
Volume: [E. coli: 2 x 10^-12 cm^3, Mammal: 4 x 10^-9 cm^3]
Relative Volume: [E. coli : Mammal = 1 : 2000]
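The relative volume follows directly from the two total volumes. A minimal check, assuming the volumes quoted above are in cubic centimetres (variable names are illustrative only):

```python
# Total cell volumes quoted in the list above, taken to be in cm^3.
ecoli_volume_cm3 = 2e-12
mammal_volume_cm3 = 4e-9

# Ratio of mammalian cell volume to E. coli cell volume.
relative_volume = mammal_volume_cm3 / ecoli_volume_cm3
print(f"E. coli : mammal = 1 : {relative_volume:.0f}")
```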
A.2 Building Blocks: Structure of bases, nucleosides and nucleotides
DNA: 'polymer of A, G, T, C'
RNA: 'polymer of A, G, U (replaces T), C'
[Figure labels: sugar, base; purines; pyrimidines]
A.2. Building Blocks: Common bases found in nucleic acids
A.2 Building Blocks: 20 amino acids
Polypeptides: chains of amino acids
[Figure labels: amino group, carboxyl group]
A.2 Building Blocks: Abbreviation of Amino Acids

Name            Abbreviation   Linear Structure
Alanine         ala  A         CH3-CH(NH2)-COOH
Arginine        arg  R         HN=C(NH2)-NH-(CH2)3-CH(NH2)-COOH
Asparagine      asn  N         H2N-CO-CH2-CH(NH2)-COOH
Aspartic Acid   asp  D         HOOC-CH2-CH(NH2)-COOH
Cysteine        cys  C         HS-CH2-CH(NH2)-COOH
Glutamic Acid   glu  E         HOOC-(CH2)2-CH(NH2)-COOH
Glutamine       gln  Q         H2N-CO-(CH2)2-CH(NH2)-COOH
Glycine         gly  G         NH2-CH2-COOH
Histidine       his  H         NH-CH=N-CH=C-CH2-CH(NH2)-COOH
Isoleucine      ile  I         CH3-CH2-CH(CH3)-CH(NH2)-COOH
Leucine         leu  L         (CH3)2-CH-CH2-CH(NH2)-COOH
Lysine          lys  K         H2N-(CH2)4-CH(NH2)-COOH
Methionine      met  M         CH3-S-(CH2)2-CH(NH2)-COOH
Phenylalanine   phe  F         Ph-CH2-CH(NH2)-COOH
Proline         pro  P         NH-(CH2)3-CH-COOH
Serine          ser  S         HO-CH2-CH(NH2)-COOH
Threonine       thr  T         CH3-CH(OH)-CH(NH2)-COOH
Tryptophan      trp  W         Ph-NH-CH=C-CH2-CH(NH2)-COOH
Tyrosine        tyr  Y         HO-Ph-CH2-CH(NH2)-COOH
Valine          val  V         (CH3)2-CH-CH(NH2)-COOH
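The three-letter and one-letter abbreviations above map naturally onto a lookup table. A minimal sketch; the function name `to_one_letter` and the hyphen-separated input format are assumptions made for illustration:

```python
# Three-letter to one-letter amino acid codes, as in the abbreviation table.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "glu": "E", "gln": "Q", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}


def to_one_letter(peptide):
    """Convert a hyphen-separated three-letter peptide to one-letter codes.

    The input format (e.g. "Met-Ala-Gly") is an assumption for this sketch.
    """
    return "".join(THREE_TO_ONE[code.lower()] for code in peptide.split("-"))


print(to_one_letter("Met-Ala-Gly"))
```

An unknown code raises a `KeyError`, which is usually preferable to silently dropping a residue.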
A.2. Building blocks: Properties of Amino Acids I http://www.russell.embl-heidelberg.de/aas/aas.html
A.2. Building blocks: Some Terms for describing Properties of Amino Acids
Hydrophobic amino acids are those with side-chains that do not like to reside in an aqueous (i.e. water) environment.
Polar amino acids are those with side-chains that prefer to reside in an aqueous (i.e. water) environment.
Strictly speaking, aliphatic implies that the protein side chain contains only carbon or hydrogen atoms.
A side chain is aromatic when it contains an aromatic ring system.
A.2 Building Blocks: Covalent and Non-covalent Bonds
Covalent bonds: stronger. Nucleic acid and protein polymers are formed by covalent bonds connecting nucleotides and amino acids (respectively) to form a linear backbone.
Non-covalent bonds: weaker and reversible. 4 types:
Hydrogen bonds: N-H...O (double-stranded DNA, protein folding, etc.)
Ionic bonds: Ionic interaction between charged groups, say Na+ and Cl-
Van der Waals: Weak attraction between two atoms, strongest at an optimum distance.
A.6. Translation, post-translation processing and protein structure
A.7. Project ideas
A.4 Transcription and Gene Expression: Transcription
TFBS: Transcription factor binding site
[Figure: gene on DNA, 5' to 3': TFBS, promoter, 5' UTR, start, exon, intron, exon, intron, exon, stop, 3' UTR. Annotations: (1st key), (2nd key, may not be there), TFBS (almost always there), (mostly for non-housekeeping genes). Pre-mRNA (complementary nucleotides) with the same 5' UTR, start, exons, introns, stop and 3' UTR, plus cap and poly A; nuclear membrane with pore.]
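The pre-mRNA is built from nucleotides complementary to the DNA template, with U in place of T, and the introns are later removed. A minimal sketch under simplifying assumptions: the template strand is given 3' to 5', the exon coordinates are supplied by hand, and the names `transcribe` and `splice` are illustrative, not part of any library.

```python
# DNA template base -> complementary RNA base (U replaces T in RNA).
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (5'->3') complementary to a template strand read 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5.upper())


def splice(pre_mrna, exons):
    """Join exon segments given as 0-based, end-exclusive (start, end) pairs.

    Real splice-site recognition is far more involved; the coordinates
    here are simply given as input.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


pre_mrna = transcribe("TACCGGCATTTCATT")   # toy template strand
mrna = splice(pre_mrna, [(0, 6), (12, 15)])  # toy exon coordinates
print(pre_mrna, mrna)
```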
A.4 Transcription and Gene Expression: Gene Regulation
http://henge.bio.miami.edu/mallery/movies/transcription.mov
http://www-class.unl.edu/biochem/gp2/m_biology/animation/gene/gene_a2.html
A.4 Transcription and Gene Expression: RNA Polymerase
There are three classes of RNA polymerases:
Polymerase I: Localized in the nucleolus. Transcribes the 28S, 18S and 5.8S rRNAs (ribosomal RNAs).
Polymerase II: All protein-coding genes and most snRNAs. Unique in capping and polyadenylation.
Polymerase III: tRNAs, other rRNAs, other snRNAs. [The promoter can be downstream of the transcription start site]
Pseudogenes (gene fragments): Formerly functional genes.
Only 2% of the human genome encodes proteins.
A.4 Transcription and Gene Expression: Trans- and cis-elements

Cis-element                    DNA sequence         Trans-acting Factor
GC Box                         GGGCGG               Sp1
TATA Box                       TATAA                TFIID (TFIIA stabilizes it)
CAAT Box                       CCAAT                Many
TRE                            GTGAGT(A/C)A         AP-1 family (many)
CRE (cAMP response element)    GTGACGT(A/C)A(A/G)   CREB/ATF family

Important: The presence of a pattern does not necessarily mean it is a cis-element.
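The consensus sequences above can be turned into regular expressions to scan a sequence for candidate sites. A minimal sketch; it searches the forward strand only, the function name `find_candidate_sites` is illustrative, and, as the note above stresses, a match is only a candidate and not evidence of a functional cis-element.

```python
import re

# Consensus patterns from the slide; (A/C) becomes the character class [AC].
CIS_ELEMENT_PATTERNS = {
    "GC box": "GGGCGG",
    "TATA box": "TATAA",
    "CAAT box": "CCAAT",
    "TRE": "GTGAGT[AC]A",
    "CRE": "GTGACGT[AC]A[AG]",
}


def find_candidate_sites(sequence):
    """Return (element, 1-based start, matched text) tuples sorted by position."""
    sequence = sequence.upper()
    hits = []
    for name, pattern in CIS_ELEMENT_PATTERNS.items():
        for match in re.finditer(pattern, sequence):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Toy sequence made up for this example.
for hit in find_candidate_sites("CCGGGCGGTTATAAAGGCCAATCT"):
    print(hit)
```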
A.4 Transcription and Gene Expression: Promoters
Numbering starts from 1, not 0.
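The numbering note matters when converting between program indices and promoter coordinates. A minimal sketch, assuming the usual convention that the transcription start site is +1, the base just upstream is -1, and there is no position 0; the function name `promoter_position` is illustrative.

```python
def promoter_position(index, tss_index):
    """Convert a 0-based string index to a position relative to the start site.

    Assumed convention: the start site itself is +1, the base immediately
    upstream is -1, and position 0 does not exist.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


# With the start site at string index 5:
print([promoter_position(i, 5) for i in range(3, 8)])
```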
A.4 Transcription and Gene Expression: Enhancers and Silencers (Transcription Factors)
Can be many base pairs away.
A.4 Transcription and Gene Expression: Tissue Specific Genes
Housekeeping genes: Genes encoding histone proteins and ribosomal proteins. Always on.
Tissue or development-specific (non-housekeeping) genes:
Transcriptionally inactive chromatin
Methylation of cytosine, replacing a hydrogen (H) with a methyl group (CH3)
Transcription factors’ expression levels are low.
Microarrays measure the expression levels of genes
A.6 Translation and Post-Translational Processing : Peptide Bond Formation
A.6 Translation and Post-Translational Processing: The Genetic Codes
[Figure labels: N-terminal, C-terminal]
A.6 Translation and Post-Translational Processing: The Genetic Codes
64 possible codons: 1 start codon (AUG), 3 stop codons, 20 amino acids.
Signals in mRNAs can lead to alternative interpretation of stop codons: UGA as the 21st amino acid, selenocysteine; UAG as the 22nd amino acid, pyrrolysine.
Wobble; mitochondrial.
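The codon counts above can be checked by building the standard genetic code table and translating a short mRNA. A minimal sketch; it covers the standard code only (no selenocysteine or pyrrolysine recoding, no mitochondrial variants), and the function name `translate` is illustrative.

```python
from itertools import product

# Standard genetic code with codons enumerated in U, C, A, G order;
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE), stop_codons, translate("GGAUGGCCUGGUAAGC"))
```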
A.6 Translation and Post-Translational Processing: Multiple Post-Translational Cleavages of Polypeptide Precursors
A.6 Translation and Post-Translational Processing: Protein Secondary Structure
A.6 Translation and Post-Translational Processing: Protein Sorting (Localization)
1. Signal Peptide
2. Post-translational modification

Protein Destination: (Typical) Location and form of signal
Endoplasmic reticulum and secretion from cell: N-terminal peptide of 20 or so very hydrophobic AAs.
Mitochondria: N-terminal peptide, α-helix. One side hydrophilic and one side hydrophobic.
Nucleus: Internal sequence of amino acids. Often a string of basic amino acids plus prolines; may be bipartite.
Lysosome: Addition of mannose 6-phosphate residues.
A.6 Translation and Post-Translational Processing: Cellular Function of Proteins