Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • Principle of high-density cDNA microarrays: a simple technique, similar in principle to a Southern blot. Sequences of 500-500 bp, averaging around 1000 bp. Requires about 1-3 µg of total RNA per sample. Caveats: discrimination between closely related family members is poor. Sequences with 85-85% homology are not discriminated and contribute to the signal additively. This has consequences for analyzing specific phenotypes; for example, Bcl-XL and Bcl-Xs have opposing phenotypes in vivo and would not be distinguished by this technology.
  • Add scanned image from Integrative p. 75
  • Photolithographic chemistry is used to synthesize 25-mer oligonucleotides on a glass surface: literally millions of oligonucleotide probes per chip.
  • Process of the GeneChip assay. cRNA is fragmented before being hybridized to the array.
  • The perfect match and mismatch approaches control for non-specific hybridization.
  • What it looks like
  • Software that manages the data provides a fluorescence intensity and a call on the present, absent or marginal status of each queried gene.
  • Average difference (fluorescence intensity) was plotted for duplicate experiments. Nearly all genes called present were found to be expressed at the same level in the duplicate experiment. The diagonal represents perfect correlation between experiments; the next line up or down represents a 3-fold difference, and the next lines a 10-fold difference. The variability was within 3-fold, and this was used as our reference for calling a modulation significant.
  • Integrative, pp. 13-15
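The perfect match (PM) and mismatch (MM) probes and the "average difference" mentioned in these notes can be illustrated with a minimal sketch. It assumes the simplest form of the measure, the mean of PM − MM across the probe pairs of one probe set; the vendor software is understood to have also handled outlier probe pairs, which is left out here, and the intensities are invented.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one probe set.

    Simplified sketch: no outlier exclusion and no background correction.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal-length, non-empty PM and MM lists")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for one probe set with four probe pairs.
pm = [1200.0, 950.0, 1100.0, 1010.0]
mm = [300.0, 250.0, 400.0, 310.0]
print(average_difference(pm, mm))  # 750.0
```

Subtracting MM from PM is what lets the mismatch probe act as a control for non-specific hybridization.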

Presentation Transcript

  • Bioinformatics: Definitions, Challenges and Impact on Health Care Systems Daniel Masys, M.D. Professor and Chair Department of Biomedical Informatics Vanderbilt University School of Medicine
  • Topics
    • What is Bioinformatics?
    • Health Informatics compared to Bioinformatics
    • Scope of Bioinformatics
    • Genomics data and patient care
    • Impact of Bioinformatics on Health Information Systems
  • Central Dogma of Molecular Biology: DNA → RNA → Protein → Phenotype (replication copies DNA; transcription DNA → RNA; translation RNA → protein; post-translational modification of proteins)
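The transcription and translation steps of the central dogma can be sketched in a few lines of Python. This is a simplification under stated assumptions: it starts from the DNA coding (sense) strand rather than the template strand, assumes the reading frame begins at the first base, uses the standard genetic code, and ignores splicing, replication and post-translational modification.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases U, C, A, G at each position, first position varying slowest;
# "*" marks a stop codon.
_BASES = "UCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (thymine replaced by uracil)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> one-letter protein sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))  # AUGGCCUAA MA
```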
  • What is Bioinformatics? Definitions…
  • NIH Working Definition
    • Bioinformatics : Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
    • http://www.bisti.nih.gov/CompuBioDef.pdf
  • Another… NCBI (National Center for Biotechnology Information)
    • Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.
    • http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
  • Bioinformatics & Health Informatics
    • Bioinformatics is the study of the flow of information in biological sciences.
    • Health Informatics is the study of the flow of information in patient care.
    • These two fields are on a collision course as genomics data comes into use in patient care.
            • Russ Altman, MD, PhD, Stanford Univ.
  • Different Areas of Strength
    • Bioinformatics
      • Much more data available on the Internet than Health Informatics
      • Much more progress on database integration across multiple data sources
    • Health (Clinical) Informatics
      • Focus on tailoring common functions to local (very complex) healthcare environments
      • More need for aggregation of local, regional, national outcomes, statistics, knowledge
      • Much more progress on terminologies for integration of data
  • Scope of Bioinformatics: Omes and Omics
  • Omes and Omics
    • Genomics
      • Primarily sequences (DNA and RNA)
      • Databanks and search algorithms
      • Supports studies of molecular evolution (“Tree wars”)
    • Proteomics
      • Sequences (Protein) and structures
      • Mass spectrometry, X-ray crystallography
      • Databanks, knowledge bases, visualization
    • Functional Genomics (transcriptomics)
      • Microarray data
      • Databanks, analysis tools, controlled terminologies
    • Systems Biology (metabolomics)
      • Metabolites and interacting systems (interactomics)
      • Graphs, visualization, modeling, networks of entities
  • Central Dogma of Molecular Biology: DNA → RNA → Protein → Phenotype, studied respectively by Structural Genomics, Functional Genomics (Transcriptomics), Proteomics and Phenomics
  • Genome and Genomics
    • Genome – entire complement of DNA in a species
      • Both nuclear and mitochondrial/chloroplast
      • Variants among individuals
    • Genomics – study of the sequence, structure and function of the genome. Study relationships among sets of genes rather than single genes.
    • Comparative genomics – study of the differences among species. Usually covers evolutionary studies of differences & conservation over time.
  • Genome Databases (e.g., GenBank)
    • Consists of
      • Long strings of DNA bases (A, T, C, G…)
      • Annotations that attach meaning to the sequence data
    • Example entry from GenBank:
      • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_000410&dopt=gb Hemochromatosis gene HFE
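Records such as the HFE entry above can also be retrieved programmatically. The viewer.fcgi link is an older web interface; a minimal sketch using NCBI's E-utilities `efetch` service (parameters `db`, `id`, `rettype`, `retmode`) follows. The helper names are hypothetical, and NCBI's usage policies (request-rate limits, identifying `tool`/`email` parameters) should be checked before running it against the live service.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db: str, uid: str, rettype: str = "gb", retmode: str = "text") -> str:
    """Build an E-utilities efetch URL for a single record (hypothetical helper)."""
    query = urlencode({"db": db, "id": uid, "rettype": rettype, "retmode": retmode})
    return f"{EUTILS_BASE}efetch.fcgi?{query}"


def fetch_genbank(accession: str) -> str:
    """Download a nucleotide record as GenBank flat-file text (needs network)."""
    with urlopen(efetch_url("nuccore", accession)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_genbank() to download.
    print(efetch_url("nuccore", "NM_000410"))
```

In the returned GenBank flat file, the FEATURES section carries the annotations and the ORIGIN section carries the base sequence, matching the two components listed on the slide.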
  • Human Genome Project
    • Human Genome Project - International research effort
    • Determine sequence of human genome and other model organisms
    • Began 1987, completed 2003
    • Next steps for ~20,000 genes
      • Function and regulation of all genes
      • Significance of variations between people
      • Cures, therapies, “genomic healthcare”
  • The Genome Sequence is at hand…so? “ The good news is that we have the human genome. The bad news is it’s just a parts list”
  • “ The Human Genome Project has catalyzed striking paradigm changes in biology - biology is an information science.” Leroy Hood, MD, PhD Institute for Systems Biology Seattle, Washington
  • Genomes In Public Databases
    • Source: http://www.genomesonline.org/
      Date   | Published complete genomes | Ongoing prokaryotic genomes | Ongoing eukaryotic genomes
      12/01  |  72 | 255 | 158
      10/02  | 104 | 316 | 218
      8/03   | 156 | 386 | 246
      6/2006 | 375 | 945 | 730
    • Total across all three categories as of 6/2006: 2050
  • Genomics activities
    • Sequence the genes and chromosomes – done by breaking the DNA into parts
    • Map the location of various gene entities to establish their order
    • Compare the sequences with other known sequences to determine similarity
      • Across species, conserved sequence “motifs”
      • Predict secondary structure of proteins
    • Create large databases – GenBank, EMBL, DDBJ
    • Develop algorithms and similarity measures
      • BLAST and its many forms
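To make "similarity measures" concrete, here is a sketch of one classic measure, the Needleman-Wunsch global alignment score. It is not BLAST: BLAST is a heuristic local-alignment search built for speed over large databanks, whereas this exact dynamic program compares two short sequences end to end. The scoring values (match +1, mismatch −1, gap −1) are arbitrary assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,              # align a[i-1] with b[j-1]
                          prev[j] + gap,     # gap in b
                          curr[j - 1] + gap) # gap in a
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GACTATA"))
```

Only two rows of the dynamic-programming matrix are kept, so memory grows with the shorter dimension rather than the product of both lengths.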
  • Structural genomics vocabulary
    • Homolog
      • a gene from one species, for example the mouse, that has a common origin and functions the same as a gene from another species, for example, humans, Drosophila, or yeast
    • Orthologs
      • genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution.
    • Paralogs
      • Genes related by duplication within a genome. Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions
  • Central Dogma of Molecular Biology: DNA → RNA → Protein → Phenotype (Genomics, Transcriptomics, Functional Genetics, Proteomics)
  • Proteome vs Transcriptome
    • Functional genomics (transcriptomics) looks at the timing and regulation of gene products (mRNA, primarily)
    • Proteome is final end-product (set of many or all proteins).
    • Relationship between transcriptome and proteome is complex, due to longevity of mRNA signal, subsequent control of translation to protein, and post translational modifications.
  • Functional Genomics – Microarrays
    • Transcriptome and transcriptomics
    • High throughput technique designed to measure the relative abundance of mRNA in a cell or tissue in response to an experiment.
    • Also called gene expression analysis
  • Functional Genomics Technologies: Slide, Chip and Filter Arrays
  • How Microarrays Work
    • Conceptual description:
      • Set of targets (oligonucleotides, cDNAs, proteins, tissues, etc.) is immobilized in predetermined positions on a substrate
      • Solution containing tagged molecules capable of binding to the targets is placed over the targets
      • Binding occurs between targets and tagged molecules.
      • Fluorescent or radiolabel tags allow visualization of targets that have been bound.
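The "relative abundance" measured by slide arrays is commonly expressed as a log ratio. A minimal sketch, assuming a two-color design in which the experimental and reference samples carry different fluorescent tags on the same spot, and assuming simple local background subtraction; the spot intensities and gene names are hypothetical.

```python
import math


def log2_ratios(spots: dict, floor: float = 1.0) -> dict:
    """Per-spot log2(experimental / reference) after background subtraction.

    Each spot is (exp_signal, exp_background, ref_signal, ref_background).
    Values below `floor` are raised to it so the logarithm stays defined.
    """
    ratios = {}
    for gene, (exp_sig, exp_bg, ref_sig, ref_bg) in spots.items():
        exp = max(exp_sig - exp_bg, floor)
        ref = max(ref_sig - ref_bg, floor)
        ratios[gene] = math.log2(exp / ref)
    return ratios


spots = {
    "gene_a": (4200, 200, 1200, 200),  # 4000 vs 1000 after background
    "gene_b": (700, 200, 1200, 200),   # 500 vs 1000
    "gene_c": (1100, 100, 1100, 100),  # equal
}
print(log2_ratios(spots))  # {'gene_a': 2.0, 'gene_b': -1.0, 'gene_c': 0.0}
```

A log2 ratio of 2 means four times more signal from the experimental sample, −1 means half as much, and 0 means no change.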
  • Schematic of probe preparation, hybridization, scanning and image analysis for slide arrays
  • Array slides: amino-silane/poly-L-lysine coated
  • Arrayer
  • GeneChip synthesis
  • Genechip analysis system
  • Genechip array design
  • Raw data
  • Genechip analysis software
  • Duplicate Experiments: determination of the confidence level between duplicates. 3-fold differences are generally considered significant.
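This slide uses the spread between duplicates, within 3-fold, as the reference for calling a change significant. A minimal sketch of that rule, assuming strictly positive signal values (raw average differences can be negative and would need flooring first) and hypothetical gene names:

```python
def fold_change(a: float, b: float) -> float:
    """Symmetric fold change (always >= 1) between two positive signals."""
    if a <= 0 or b <= 0:
        raise ValueError("signals must be positive")
    return max(a, b) / min(a, b)


def flag_discordant(rep1: dict, rep2: dict, threshold: float = 3.0) -> list:
    """Genes measured in both replicates that differ by more than `threshold`-fold."""
    return sorted(
        gene for gene in rep1.keys() & rep2.keys()
        if fold_change(rep1[gene], rep2[gene]) > threshold
    )


rep1 = {"g1": 500, "g2": 1200, "g3": 80}
rep2 = {"g1": 620, "g2": 300, "g3": 250}
print(flag_discordant(rep1, rep2))  # ['g2', 'g3']
```

The same function applied to a control and a treated sample, rather than two replicates, gives the list of candidate modulated genes under the 3-fold criterion.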
  • Experimental Design
    • A fundamental challenge of microarray experiments: underdetermined systems
    Kohane IS, Kho AT, Butte AJ. Microarrays for an Integrative Genomics. (The MIT Press; Cambridge, MA; 2003), p. 11.
  • Characteristics of Array Data
    • Voluminous – tens of thousands of variables with relatively few observations of each (upside down vs. classical biostatistics)
    • Noisy – error rates up to 8%
    • Methods designed to detect patterns and associations always find patterns and associations
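The warning that pattern-finding methods always find patterns is easy to demonstrate. This sketch generates pure noise for 10,000 "genes" measured in 10 samples, the underdetermined shape described above, and counts how many correlate strongly with an equally random outcome. The sample sizes, random seed and the 0.8 cutoff are arbitrary choices.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n_samples, n_genes = 10, 10_000

# Every value is independent Gaussian noise: there is no real signal.
outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
genes = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]

correlations = [pearson(g, outcome) for g in genes]
strong = sum(abs(r) > 0.8 for r in correlations)
print(f"max |r| = {max(abs(r) for r in correlations):.2f}; "
      f"genes with |r| > 0.8: {strong}")
```

Every correlation found here is spurious by construction, yet with this few samples dozens of genes can be expected to clear the cutoff, which is why multiple-testing correction and independent validation matter for array data.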
  • Uses of Expression Profiling
    • Pharmaceutical research:
      • ID drug targets by comparing expression profile of drug-treated cells with those of cells containing mutations in genes encoding known drug targets
    • Disease Dx and Tx:
      • Distinguish morphologically similar cancers
        • DLBCL (Poulsen et al. (2005) Microarray-based classification of diffuse large B-cell lymphomas. European Journal of Haematology 74(6):453-65.)
      • Therapy potential
        • Rabson AB, Weissmann D. From microarray to bedside: targeting NF-kappaB for therapy of lymphomas. Clin Cancer Res. 2005 Jan 1;11(1):2-6.
  • Future Applications
    • Diagnostic tool to screen for infective agents
      • Chip imprinted with set of pathogenic genomes used to identify bacterial, viral, or parasite genomic material in patient’s body fluids
    • Diagnostic chip to check for mutations involved in drug-gene interactions.
      • Roche AmpliChip
  • Public Microarray Data Repositories
    • Major public repositories:
    • GEO (NCBI)
      • http://www.ncbi.nlm.nih.gov/geo/
    • ArrayExpress (EBI)
      • http://www.ebi.ac.uk/arrayexpress/
  • Standards and Repositories
    • Brazma, A, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nature Genetics. 2001 Dec;29(4):365-71.
    • http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v29/n4/full/ng1201-365.html
    • Ball, CA, et al. Submission of Microarray Data to Public Repositories. PLoS Biology. 2004 September; 2(9): e317
    • http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15340489
  • Central Dogma of Molecular Biology: DNA → RNA → Protein → Phenotype; Tissues, Organs, Organisms (Genomics, Transcriptomics, Functional Genetics, Proteomics)
  • Proteome and Proteomics
    • Proteome – the entire set of proteins (and other gene products) made by the genome.
    • Proteomics – study of the interactions among proteins in the proteome, including networks of interacting proteins and metabolic considerations. Also includes differences in developmental stages, tissues and organs.
  • Protein Functions
    • Catalysis
    • Transport
    • Nutrition and storage
    • Contraction and motility
    • Structural elements
      • Cytoskeleton
      • Basement membranes
    • Defense mechanisms
    • Regulation
      • Genetic
      • Hormonal
    • Buffering capacity
  • Protein Databases
    • SwissProt
    • PIR http://www.pir.uniprot.org/
    • GENE http://www.ncbi.nlm.nih.gov/gene
    • InterPro http://www.ebi.ac.uk/interpro/
    • Correspond to (and derived from) genome databases
    • All connected by Reference Sequences (NCBI)
  • Gene/Protein Database entries
    • HFE record in Entrez GENE (NCBI)
    • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?&db=gene&cmd=retrieve&dopt=Graphics&list_uids=3077
  • Structure & Function Determination
    • X-ray crystallography
    • Nuclear magnetic resonance (NMR) spectroscopy and tandem mass spectrometry (MS/MS)
    • Computational modeling
    • Sequence alignment from others
    • Homology modeling
  • Structure Databases
    • Contain experimentally determined and predicted structures of biological molecules
    • Most structures determined by X-ray crystallography, NMR
    • Example – MMDB molecular modeling db http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
    • HFE Entry
      • http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?form=6&db=t&Dopt=s&uid=9816
  • Protein Interaction Databases
    • Record observations of protein-protein interactions in cells
    • Attempts to detail interactions observed in thousands of small-scale experiments described in published articles
    • Examples:
      • BIND: Biomolecular Interaction Network Database
      • DIP: Database of Interacting Proteins
      • MIPS: Munich Information Center for Protein Sequences
      • PRONET: Protein interaction on the Web
      • Many others, both academic and commercial
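Interaction databases such as those listed are naturally represented as graphs. A minimal sketch, assuming interactions are stored as unordered protein pairs, that finds every protein within a given number of interaction steps using breadth-first search. The edge list is illustrative only and is not an export from BIND, DIP, MIPS or any other database.

```python
from collections import deque


def build_graph(pairs):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def within_hops(graph, start, max_hops):
    """Proteins reachable from `start` in 1..max_hops steps, with their distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return {p: d for p, d in seen.items() if p != start}


# Illustrative edges only.
interactions = [("HFE", "B2M"), ("HFE", "TFRC"), ("TFRC", "TF")]
graph = build_graph(interactions)
print(within_hops(graph, "HFE", 2))
```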
  • Controlled Vocabularies in Bioinformatics
    • The Gene Ontology http://www.geneontology.org/
      • Knowledge about gene function (the ontology itself)
      • Annotation of gene products (for comparisons)
    • The MGED Ontology (arising from MIAME)
      • http://mged.sourceforge.net/
      • Annotation of microarray experiments for public repositories
    • Clinical Bioinformatics Ontology:
      • Annotation of gene tests in electronic medical records
      • http://www.cerner.com/cbo
    • MIAPE from Proteomics Standards Initiative (PSI)
      • Annotation of proteomics experiments for public repositories
      • http://psidev.sourceforge.net/
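One simple way GO annotations support comparisons between gene products is to measure the overlap of their annotation sets. The sketch below uses Jaccard similarity with hypothetical gene names and placeholder term IDs; published GO similarity measures usually also exploit the ontology's hierarchy, which this flat comparison ignores.

```python
def jaccard(a: set, b: set) -> float:
    """Shared annotations divided by all annotations of either gene product."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical gene products with placeholder ontology term IDs.
annotations = {
    "gene_x": {"term_A", "term_B", "term_C"},
    "gene_y": {"term_B", "term_C", "term_D"},
}
print(jaccard(annotations["gene_x"], annotations["gene_y"]))  # 0.5
```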
  • Genomics Data and Patient Care From genotype to phenotype
  • Human Disease Gene Specifics
    • Genes linked to human diseases (9-2004)
    • 425 added in 2 years
    • 1700/20,000 ≈ 9% of loci
  • Informatics Issues related to Genomics Data and Patient Care
    • Linking known data for genes causing human diseases to clinical decision support and EMR documentation
    • Representation of genetic data in electronic medical records
  • Clinical Bioinformatics: Common Questions
    • What genes cause the condition?
    • What is the normal function of the gene?
    • What mutations have been linked to diseases?
    • How does the mutation alter gene function?
    • What laboratories are performing DNA tests?
    • Are there gene therapies or clinical trials?
    • What names are used to refer to the genes and the diseases?
    • What other conditions are linked to these same genes?
  • Answers exist online
    • … but it is not easy; answers in many places
    • Can’t navigate by gene names; must use hot links and numeric identifiers
    • The number and function of alternate forms of the protein are inconsistently reported
    • Synonymy (many names, same meaning) and polysemy (same name, different meanings) cause confusion
    • Upper and lower case are used for species distinctions
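These navigation problems (synonymy, polysemy, case-encoded species) are why resources key records on numeric identifiers. A minimal sketch of a name index, assuming a simple in-memory map from names to identifiers: the human HFE identifier 3077 comes from the Entrez Gene link on a nearby slide, HLA-H is a former name of HFE, and the mouse identifier and the `GENE1` entries are hypothetical placeholders.

```python
from collections import defaultdict


class GeneNameIndex:
    """Map gene names (symbols and aliases) to stable identifiers.

    Lookups are deliberately case-sensitive, because case can encode species
    (human symbols are conventionally upper case, mouse symbols capitalize
    only the first letter).
    """

    def __init__(self) -> None:
        self._by_name = defaultdict(set)

    def add(self, identifier: str, *names: str) -> None:
        for name in names:
            self._by_name[name].add(identifier)

    def resolve(self, name: str) -> set:
        """All identifiers sharing this name; more than one means polysemy."""
        return set(self._by_name.get(name, ()))


index = GeneNameIndex()
index.add("GeneID:3077", "HFE", "HLA-H")   # synonymy: two names, one gene
index.add("mouse-Hfe-ID", "Hfe")            # placeholder, not a real identifier
index.add("hypothetical-1", "GENE1")        # polysemy: two unrelated genes...
index.add("hypothetical-2", "GENE1")        # ...sharing one symbol
print(index.resolve("HLA-H"))               # {'GeneID:3077'}
print(sorted(index.resolve("GENE1")))       # ['hypothetical-1', 'hypothetical-2']
```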
  • Major Challenges of Navigation
    • Complexity of data
    • Dynamic nature of the data
    • Diverse foci and number of data/knowledge base systems
    • Data and knowledge representation lack standards
    • Can navigate if you know what you are looking for.
  • Genetics Home Reference
    • Consumer health resource to help the public navigate from phenotype to genotype.
    • Focus on health implications of the Human Genome Project.
    • http://ghr.nlm.nih.gov
    • Mitchell, Fun, McCray, JAMIA, 2004 Nov 11(6):439-447
  • Genetics is Impacting Medicine Today
    • 1700 genes & health conditions
    • > 1100 gene tests for diagnosis
    • Relate to diagnosis, therapy, drug dosage, occupational hazards, reproductive plans, health risks, ….
  • Well-known Examples
    • Pharmacogenetics:
      • CYP450 alleles: exaggerated, diminished or ultra-rapid drug responses, e.g., warfarin. 93% of patients do well on standard doses; 7% have severe hemorrhage. CYP2C9*2 and CYP2C9*3 are the most severe of 6 known mutations.
    • Environmental susceptibility
      • Sickle cell trait carriers and resistance to the malaria parasite
    • Nutrition
      • PKU and avoidance of phenylalanine
  • Iressa (gefitinib)
    • Non-small cell lung CA ~ 140,000 pt/yr
    • Iressa (AstraZeneca) causes remission in 1 of 10 patients if taken daily for life.
    • Iressa efficacy correlates with EGFR mutation in the tumor. Gene testing for EGFR now exists, so therapy can be targeted to appropriate patients. http://www.sciencemag.org/cgi/content/full/305/5688/1222a
    • BUT – AstraZeneca can’t make money on only 14,000 patients per year.
    • http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=131550
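The 14,000 figure follows from the two numbers earlier on this slide, assuming the 1-in-10 response rate applies across all patients:

$$140{,}000\ \tfrac{\text{patients}}{\text{yr}} \times \frac{1}{10} = 14{,}000\ \tfrac{\text{responders}}{\text{yr}}$$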
  • Implications for Health Care System
    • More gene tests will be ordered. [reports of 300% increase in gene tests in 2003.]
      • Arch Pathol Lab Med – 2004, 128(12):1330-1333
    • Simultaneous testing will cause the “Incidentalome”: unanticipated findings on screening genetic tests.
      • Kohane IS, Masys DR, Altman RB. The incidentalome: a threat to genomic medicine. JAMA, 296(2), 212-5, 2006.
    • Preventive healthcare will play a larger part.
    • Environmental risk factors dictate an OSHA-type approach to worker empowerment and education about safe behavior
  • Unsolved Informatics Issues: What Should Be Stored in the EMR?
    • Complete DNA sequence for specific genes into the EMR? Where?
    • Meta-data about the DNA sequence?
    • If not the sequence (i.e., only the differences from the reference sequence), what should be done when the reference sequence changes?
    • How to trigger alerts and reminders? And for what?
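One way to frame the "differences from the reference" question: a stored variant is only meaningful relative to a specific reference version. A minimal sketch, with a hypothetical record layout and toy sequences, that applies a variant and detects when a record no longer matches the reference it is checked against. It only flags the problem; remapping variants onto a new reference is a separate, harder task not attempted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical EMR record of one difference from a reference sequence."""
    reference_id: str  # reference version the position refers to
    position: int      # 1-based start position on that reference
    ref: str           # base(s) expected at that position on the reference
    alt: str           # base(s) observed in the patient


def check_variant(v: Variant, reference_id: str, reference_seq: str) -> str:
    """Report whether the record can be applied to the given reference."""
    if v.reference_id != reference_id:
        return "different reference version: remap before use"
    start = v.position - 1
    if reference_seq[start:start + len(v.ref)] != v.ref:
        return "reference bases do not match: record may be stale"
    return "ok"


def apply_variant(v: Variant, reference_seq: str) -> str:
    """Reconstruct the patient's sequence by substituting alt for ref."""
    start = v.position - 1
    return reference_seq[:start] + v.alt + reference_seq[start + len(v.ref):]


ref_v1 = "ACGTACGT"  # toy reference, version "ref-v1"
variant = Variant("ref-v1", 3, "G", "A")
print(check_variant(variant, "ref-v1", ref_v1))  # ok
print(apply_variant(variant, ref_v1))            # ACATACGT
print(check_variant(variant, "ref-v2", ref_v1))
```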
  • Genetic data in electronic medical records
    • Implications for component systems:
      • Laboratory
      • Pharmacy
      • Computerized order entry
      • Documentation and notes
    • Knowledge management
      • Alerts and reminders
      • Finding patients matching profiles
      • Practice guidelines and clinical trials
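Using the warfarin example from the earlier slide, an alert rule linking genotype data in the EMR to computerized order entry might be sketched as follows. The rule table, message text and data layout are hypothetical; a real system would draw its rules from curated pharmacogenetic knowledge and clinical guidelines.

```python
# Hypothetical rule table: drug -> (gene, star alleles that should trigger an alert).
# CYP2C9*2 and *3 come from the warfarin example; everything else is illustrative.
ALERT_RULES = {"warfarin": ("CYP2C9", {"*2", "*3"})}


def genotype_alerts(drug: str, genotype: dict) -> list:
    """Return alert messages for a new drug order, given the patient's star alleles."""
    rule = ALERT_RULES.get(drug.lower())
    if rule is None:
        return []  # no pharmacogenetic rule for this drug
    gene, risk_alleles = rule
    found = sorted(set(genotype.get(gene, ())) & risk_alleles)
    if not found:
        return []
    return [f"{drug.lower()} order: patient carries {gene} {', '.join(found)}; "
            "review dosing before release"]


patient = {"CYP2C9": ("*1", "*3")}
print(genotype_alerts("Warfarin", patient))
```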
  • Genome Data and Other Information Systems
    • Genomic information will be pervasive in all healthcare information systems.
    • Also in public health systems
      • Newborn screening
      • Tissue and organ banks
      • DOD requires DNA samples
      • Bioterrorism and homeland security
      • Identification of World Trade Center victims
    • Privacy and security issues are important but not inherently different from those of other EMR data.
  • Summary
    • Informatics will be the key enabling technology for personalized, genomic medicine.
    • Current separation between bioinformatics and clinical informatics will diminish as the two subdisciplines merge.
  • Optional Exercise: Hands-on with GHR
    • Scavenger hunt with hemochromatosis and the genes that influence it.
    • Explore the Genetics Home Reference by answering the following questions. Start at http://ghr.nlm.nih.gov.
  • GHR Scavenger Hunt
    • How common is hemochromatosis?
    • How many genes have been proven to be involved in hemochromatosis when the genes are mutated?
    • What are the symbols for these genes?
    • Can you find the link to MedlinePlus with health information on hemochromatosis?
  • GHR Scavenger Hunt
    • What are the names of the patient support associations for hemochromatosis?
    • One synonym for this condition is “bronze diabetes”. Can you find a reason for this?
    • What kind of damage is done to the liver of people with hemochromatosis?
  • GHR Scavenger Hunt
    • For the genes involved in hemochromatosis, how many of them are available as a DNA test?
    • Give one place where you would choose to send a tissue sample for DNA testing.
    • What sites are listed under “Research Resources” for the TFR2 gene?
      • How many alternately spliced proteins for TFR2?
      • In what tissues is this gene expressed?
  • GHR Scavenger Hunt
    • How do people inherit hemochromatosis?
    • Do the genes involved in hemochromatosis cause other health conditions when they are mutated?
    • Can you find a protein sequence for one of the genes?
    • What clinical trials are available for hemochromatosis patients close to where you live?