
Introduction to data integration in bioinformatics

A brief introduction to the basic concepts and terms in bioinformatics

  1. 1. Introduction to Data Integration in Bioinformatics. Yan Xu, Dec. 2013
  2. 2. Data Integration: Copy Number, Epigenome, Methylation, miRNA, Gene Expression, Clinical data, Pathways
  3. 3. Recent Publications
     • R. Louhimo, T. Lepikhova, O. Monni, and S. Hautaniemi, “Comparative analysis of algorithms for integration of copy number and expression data,” Nature Methods, 2012.
     • The ENCODE Project Consortium, “An integrated encyclopedia of DNA elements in the human genome,” Nature, 2012.
     • S. Aerts and J. Cools, “Cancer: Mutations close in on gene regulation,” Nature, Jul. 2013.
     • V. J. H. Powell and A. Acharya, “Disease Prevention: Data Integration,” Science, Dec. 2012.
     • A. Vinayagam, Y. Hu, M. Kulkarni, C. Roesel, R. Sopko, S. E. Mohr, and N. Perrimon, “Protein Complex–Based Analysis Framework for High-Throughput Data Sets,” Science Signaling, Feb. 2013.
  4. 4. DNA, the molecule of life. Protein-coding DNA makes up barely 2% of the human genome. About 80% of the bases in the genome may be expressed without an identified function.
  5. 5. Gene Expression. DNA: two long biopolymers made of nucleotides, each carrying one of four nucleobases: A (Adenine), T (Thymine), C (Cytosine), G (Guanine). [Figure labels: cap, start codon, termination codon, poly-A tail, sequence of amino acids]
  6. 6. Microarray. [Figure labels: Reverse Transcription, Result]
  7. 7. Next-generation RNA sequencing. EST: Expressed Sequence Tag. [Figure labels: reads of a single type of nucleotide at one moment (animation); the number of nucleotide reads at one moment; reference: Open Reading Frame; axis: Time]
  8. 8. DNA structural variation: Copy number
     CNV (Copy Number Variation):
     • Accounts for roughly 12% of human genomic DNA
     • About 0.4% of the genome differs in copy number between unrelated people
     • Ranges in size from about 1,000 nucleotide bases to several megabases
     • Inherited, or caused by de novo mutation (not inherited from either parent)
     Relation to disease:
     • Higher EGFR (epidermal growth factor receptor) copy number is found in non-small cell lung cancer (Cappuzzo et al., Journal of the National Cancer Institute, 2005).
     • Higher copy number of CCL3L1 decreases susceptibility to HIV (Gonzalez et al., Nature, 2005).
     • Low copy number of FCGR3B increases susceptibility to inflammatory autoimmune disorders (Aitman et al., Nature, 2006).
  9. 9. Epigenome: DNA Methylation. Why do we look so different even when we have exactly identical genes?
     • Addition of a methyl group to the C or A DNA nucleotides
     • Permanent and unidirectional
     • Can be copied across cell divisions or even passed on to offspring
     [Figure labels: What, when and where; Epigenome directions; Genome]
  10. 10. miRNA (microRNA). Besides protein-coding genes, the genome also has genes that code for small RNAs, e.g., “transfer RNA”, which is used in translation, and “ribosomal RNA”, which forms part of the structure of the ribosome. miRNA: a non-coding RNA of 21-22 nucleotides.
     miRNA pathway:
     • Perfect complementary binding leads to degradation of the target gene's mRNA
     • Imperfect pairing inhibits translation of the mRNA to protein
     RISC: RNA-induced silencing complex. It uses the miRNA as a template for recognizing complementary mRNA.
  11. 11. Clinical data. General clinical checkup data: temperature, blood pressure. Pathology: blood test, antibody test. Radiology: X-ray, CT (computed tomography), ultrasound, MRI (magnetic resonance imaging). [Figure labels: Texture Heterogeneity (high score, low score); Internal Arteries (high score, low score)]
  12. 12. Challenges of data integration analysis
     • Large, highly connected data sources and ontologies
     • Heterogeneity: functions, structures, data access and analysis methods, dissemination formats
     • Incomplete or overlapping data sources
     • Frequent changes
  13. 13. Case I
     • E. Segal et al., “Decoding global gene expression programs in liver cancer by noninvasive imaging,” Nature Biotechnology, May 2007.
     • E. Segal et al., “Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data,” Nature Genetics, 2003.
  14. 14. Case II
     • O. Gevaert et al., “Non–Small Cell Lung Cancer: Identifying Prognostic Imaging Biomarkers by Leveraging Public Gene Expression Microarray Data—Methods and Preliminary Results,” Radiology, Aug. 2012.
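
Slide 12 lists incomplete or overlapping data sources as a challenge of integration. The following is a minimal sketch of how that shows up in practice, not a method taken from the slides. It assumes three hypothetical per-sample tables (copy number, gene expression, clinical data) keyed by a made-up sample ID. Every sample ID, field name, and value is invented for illustration, and the `integrate` helper is hypothetical.

```python
# Hypothetical per-sample tables; all IDs, fields, and values are invented.
copy_number = {"S1": {"EGFR_copies": 4}, "S2": {"EGFR_copies": 2}}
expression = {"S1": {"EGFR_expr": 9.1}, "S3": {"EGFR_expr": 5.4}}
clinical = {"S1": {"stage": "II"}, "S2": {"stage": "I"}, "S3": {"stage": "III"}}


def integrate(sources):
    """Outer-join several {sample_id: {field: value}} tables.

    Returns the merged records and, for each sample, the names of the
    sources that had no entry for it (the "incomplete" part).
    """
    all_ids = sorted(set().union(*(table.keys() for table in sources.values())))
    merged, missing = {}, {}
    for sample_id in all_ids:
        record, absent = {}, []
        for name, table in sources.items():
            if sample_id in table:
                record.update(table[sample_id])
            else:
                absent.append(name)
        merged[sample_id] = record
        missing[sample_id] = absent
    return merged, missing


sources = {"copy_number": copy_number, "expression": expression, "clinical": clinical}
merged, missing = integrate(sources)
for sample_id in merged:
    print(sample_id, merged[sample_id], "missing:", missing[sample_id])
```

Keeping the list of absent sources next to each merged record makes the gaps explicit, so a later analysis can decide whether to drop a sample or impute the missing layer instead of silently treating it as complete.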

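Slide 10 distinguishes perfect complementary binding (mRNA degradation) from imperfect pairing (translation inhibition). The toy sketch below illustrates only that distinction. It assumes Watson-Crick pairing, an invented 21-nucleotide sequence, and hypothetical helper names (`count_mismatches`, `classify_binding`, `reverse_complement`); real target recognition is considerably more nuanced.

```python
# Watson-Crick pairs for RNA (G:U wobble pairs are ignored in this toy model).
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand that pairs perfectly with `rna`, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(rna))


def count_mismatches(mirna, target_site):
    """Count unpaired positions between a miRNA (5'->3') and an equally
    long mRNA site (5'->3'); the two strands pair antiparallel."""
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must have equal length")
    return sum(PAIRS[m] != t for m, t in zip(mirna, reversed(target_site)))


def classify_binding(mirna, target_site):
    """Map the mismatch count onto the two outcomes named on the slide."""
    if count_mismatches(mirna, target_site) == 0:
        return "perfect pairing: mRNA degradation"
    return "imperfect pairing: translation inhibited"


# Invented 21-nucleotide sequence, not a real miRNA.
mirna = "AUGGCUAACGUUAGCCGAUAC"
perfect_site = reverse_complement(mirna)
# Introduce a single mismatch in the middle of the site.
swapped = "A" if perfect_site[10] != "A" else "C"
imperfect_site = perfect_site[:10] + swapped + perfect_site[11:]

print(classify_binding(mirna, perfect_site))
print(classify_binding(mirna, imperfect_site))
```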