
An Introduction to Bioinformatics
Drexel University INFO648-900-200915

A Presentation of Health Informatics Group 5

Cecilia Vernes
Joel Abueg
Kadodjomon Yeo
Sharon McDowell Hall
Terrence Hughes

Published in: Technology

  • Welcome to Group 5’s presentation on bioinformatics for Information Science and Technology Course 648, Health Informatics, Fall 2009, at Drexel University. Group 5 comprises Cecilia, Yeo, Sharon, Terry, and your narrator, Joel.
  • As the Altman and Mooney chapter begins, “Bioinformatics is the study of how information is represented and analysis in biological systems, starting at the molecular level.” None of the group members is a subject matter expert in molecular biology or genomics, or in biology for that matter; none of us could expect to make a meaningful contribution to the field without significant further study. So in this presentation we don’t plan to go into any great depth regarding specific tools and techniques of bioinformatics. We also won’t treat the basic facts of genetics, DNA, or the subject matter managed in bioinformatics as a separate topic. For background, we point to suggested readings, reference material, and websites that you can review. We have approached the subject as outsiders and will focus on things that we found interesting or that connect to things we’ve discussed in this class. In a sense, through our assigned readings and our perusal of the literature, we’ve been hitchhikers moving between islands of information, looking for a sense of connectedness. We don’t claim to provide something comprehensive or very systematic, though we’ve attempted to do so within the constraints of the assignment, technology, and time. Hopefully, you’ll find the information in this “guide” interesting and useful, which goes to a major goal of the presentation: to answer why we (in the specific sense of members of this class) should study bioinformatics. The bioinformatics topic is quite different from others we’ve studied. Although its clinical applications are many, it is part of the basic biological sciences. It has also played a critical role in much of the new biological knowledge gained in recent years, and the rate at which new biological information is produced is accelerating dramatically.
Creating meaning out of this information, that is, adding to biological knowledge (and, through translational research, to clinical and medical knowledge), is a significant challenge that bioinformatics hopes to address. Our plan is as follows. First, we provide some definitions: bioinformatics is a relatively new field, and because it is interdisciplinary, its exact definition varies in the literature. Though we won’t go into details (again, we provide references if you want to pursue something further), it is helpful to give a sense of what has been and will be accomplished with bioinformatics. We do this in part to identify some of the things that have made the field successful, in particular some of the information management methods and technologies that constitute the “informatics” part of bioinformatics. After considering the “bio” and “informatics” parts of the topic, we point out some connections to issues discussed in previous weeks of the class. Finally, we raise issues and questions for discussion. Hopefully, what we provide here, along with what our instructor has given us, will lead to an engaging discussion that is both informative and useful.
  • To meet the goals of this presentation, we’ll try to answer the question of why bioinformatics should be studied in three parts. In the first part, we’ll look at the role bioinformatics has played in an explosion of biological information. We’ll see that the same informatics techniques and tools applied in other domains of biomedical informatics have been applied to great effect in the biological sciences. In section two, we look briefly at some non-medical research supported by bioinformatics, then at ways in which the products of biological research affect medicine and medical informatics. We’ll also explicitly compare bioinformatics and medical informatics, given that one of the assigned papers handles that topic well. Finally, in the last section, we’ll outline the legal, ethical, and social questions raised by genomic data. We’ve provided some additional resources at the end of the presentation: a reference list, two sets of websites that we recommend for further research on the topic, and a short glossary of terms. We also have a short un-narrated presentation on the topic of bioterrorism. In most cases, the content of the narration is found in the notes section of this document.
  • Bioinformatics uses computers and algorithms to acquire, manage, and analyze biological information. It’s a multidisciplinary field that meshes computer science with molecular biology to develop databases of information that can be useful for improving patient care and practice.
  • In fact, bioinformatics also crosses into math and physics, which are important in understanding molecular structures and how they interact. In this presentation, we won’t talk much about specific findings in biology. But we will consider which informatics tools have been important, and the relation to medicine and to clinical, public health, and consumer health informatics.
  • Bioinformatics evolved as biological researchers needed to solve problems of how to store, manage, analyze, and model biological data, from the DNA sequences, to molecules, proteins, cells, all the way up to the organism and its physiology. There are many definitions, but a key point is that a core part of the field is concerned with the fundamental information of biology….
  • Since the discovery of the structure of DNA in the 1950s, biological research has been increasingly reliant on information science, its methods, and its tools. DNA (and RNA) is widely understood as nature’s way of storing much of the key information used by cells and organisms for their structure and function. The total collection of DNA for an organism is its genome. As pointed out in the chapter, the roots of bioinformatics go back to the 1930s with the use of electrophoresis (specifically gel electrophoresis), a key technique of molecular biology used to separate nucleic acids (DNA and RNA) and proteins so they can be studied separately. The advent of computing in the 1950s also contributed to the rise of computational methods in biology. But as we shall see, much of the growth in knowledge has emerged in the last 10 years. Before we go into that, let’s consider what sorts of bioinformatics data there are. The types pictured here are merely a starting point.
  • Most everyone is aware that the Human Genome Project mapped the human genome in 2003, documenting that all humans are essentially 99% similar at the genetic level, in spite of any self-perceived differences in race or ethnicity. The real work is now underway to forge a better understanding of the 1% difference that makes each of us unique, and to better understand how those differences, along with environment and lifestyle, affect our individual health status. This international collaboration made the data available in the public domain to spur scientific breakthroughs, which it has done in pharmacology and other areas of medicine. The field of genomics is changing how diseases are understood and treated, which has significant implications for human health. New technologies are being developed to address the myriad issues related to collecting, storing, and interpreting the data. Personalized medicine is the next frontier, with treatments customized to an individual based upon an understanding of their genome.
  • As complex as the genome (the full set of an organism’s DNA) is, genomics, the study of it, is only one part of an increasingly complex set of things to be studied. Bioinformatics is largely concerned with supporting these –omics.
  • A recent Nursing Outlook article discussed what nurses and consumers should know of the –omics research and its implications for consumer health informatics. The table on the right provides a useful set of definitions from this article that gives some context to the “levels” of research being done and where the linkages to clinical work come in. We’ll discuss the relations between bioinformatics and medical informatics later in this presentation.
  • Any type of biological data that can be recorded and processed by computers is considered bioinformatics data. This can be raw data from experiments, genetic sequences and expression data, images, software systems, and basically any other data that a computer can handle. Since the completion and publication of the Human Genome Project, the field has been overwhelmingly focused on research and data involving genomic study. Let’s take a brief look at the results of genomic studies.
  • Three years after the Human Genome Project was completed, five loci related to disease conditions were identified. These are indicated here on this map with the colored dots.
  • Six months later, more were identified.
  • A year later the picture looked like this.
  • And, as of six months ago, the picture looked like this.
  • If the rate at which these loci were found were plotted, we would see this.
  • And, projecting this out, we get this. This last set of figures was gleaned from a presentation by NIH Director Francis Collins, who spoke at the annual meeting of the National Coalition for Health Professional Education in Genetics last September. The main point is that there’s an information explosion underway in genetics.
  • Since we’ve touched on the “bio” side of bioinformatics, let’s consider the “informatics” side of things. It’s surely the case that information technology has something to do with what we’ve seen. Some have even related the explosion to Moore’s Law. The name, by the way, comes from Gordon Moore, the co-founder of Intel. There are actually two [mors] laws, both of which are relevant to the current discussion. For the non-techies, the more widely known Moore’s Law is not really a law in the natural or legal sense. Rather, it’s an observation and prediction: every two years, the number of fundamental logic components that can be put onto a chip doubles. Effectively, this has meant that computing power grows exponentially over time. We can expect this to be the case until 2015, when physical limitations are likely to slow things down. (We’ll likely need a new paradigm for computing, one in which silicon is not used to make the chips.) The implication for genomic research is that bioinformatics tools, indeed the field itself, have been transformed by growth in computational power. The next slide from Collins’s presentation underscores this, but says something surprising about the subject matter. Before we get to that, we would be remiss in not mentioning the other [mors] law, given that this is an informatics course. Calvin Mooers said that “An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it.” He meant that people may avoid an information system precisely because it gives them information, which can be burdensome to have. This is related to one of our questions for discussion. A related interpretation is that information is more likely to be used if it is easy (and cheap) to access. This most certainly has affected genomics, and it leads us to the next slide.
  • This slide shows the cost of sequencing over the last ten years. The cost predicted by computing power alone (Moore’s Law) is in black. It appears that costs are falling even faster than that. Last year (2008) it cost about $60,000 to sequence a person’s genome. At the beginning of this year, it cost $5,000. By the end of this year it may fall to $1,000. This has important implications for the biological sciences, medicine, and society. We’ll come back to this topic later.
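To make the comparison on this slide concrete, here is a minimal sketch. It assumes, for illustration only, that a Moore's Law style trend means cost halves every two years; the function name and the halving period are our assumptions, not figures taken from the Collins slide. The dollar amounts are the ones quoted in the narration.

```python
def moore_projected_cost(start_cost, years, halving_period_years=2.0):
    """Cost after `years` if cost halves once per halving period (assumed trend)."""
    return start_cost * 0.5 ** (years / halving_period_years)


# Figures quoted in the narration: about $60,000 in 2008, $5,000 in early 2009.
cost_2008 = 60_000
observed_2009 = 5_000

# What a two-year halving trend alone would predict one year later.
projected_2009 = moore_projected_cost(cost_2008, years=1)

print(f"Projected by the assumed trend: ${projected_2009:,.0f}")
print(f"Observed (per the narration):   ${observed_2009:,.0f}")
print(f"Observed cost is lower than the projection: {observed_2009 < projected_2009}")
```

Under this assumption the projected one-year drop is modest, while the quoted drop is more than tenfold, which is the sense in which sequencing costs are falling faster than computing power alone would predict.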
  • Over the last 10 years, there has been an information explosion in the biological sciences. Biology has become data driven. A genome generates quite a bit of data to analyze, especially once we consider that it is only a starting point. Genomic information needs to be related to other types of data: we need to discern the structure of the data and relate it to transcriptomics and proteomics, to structure and physiology, to disease, and to variation. Bioinformatics tools help manage these data. They are also part of the generation of these data. A major outcome of the Human Genome Project was the development of automated systems for laboratory work that was previously done manually. In addition, systems that take advantage of artificial intelligence have helped to automate decisions about which procedures or even experiments should be done. Systems that use machine learning have helped in the discovery process. The explosion in information will continue. Turning it into knowledge that advances understanding in biology and medicine, that is, making connections to other knowledge domains, will be a major task for both bio- and clinical informaticists.
  • Earlier we mentioned that a discussion of genomics doesn’t capture the full range of study supported by bioinformatics. Obtaining sequence data is just the start, and here we briefly mention a few major projects that build upon work done in the Human Genome Project. From the website: Project goals were to identify all the approximately 20,000-25,000 genes in human DNA, determine the sequences of the 3 billion chemical base pairs that make up human DNA, store this information in databases, improve tools for data analysis, transfer related technologies to the private sector, and address the ethical, legal, and social issues (ELSI) that may arise from the project. In his presentation to the NCHPEG Annual Meeting last September, Alan Guttmacher from the National Human Genome Research Institute made the following observations. The Human Genome Project was an international government project that ended ahead of schedule! And under budget!! And from its start it earmarked funds for consideration of its ethical, legal, and social implications (ELSI), the greatest funding ever devoted to bioethics. We quoted Guttmacher in an earlier slide. The project produced the human genome sequence; spawned a new field, genomics; spurred new technologies; and now provides us an unparalleled opportunity to apply new knowledge, technologies, and approaches to health care. It was a centrally coordinated “big” science project that was not hypothesis driven; the project didn’t test a theory except in a broad sense. It demonstrated the value of considering the societal impact of science, and it demonstrated the value of public release, rather than hoarding, of data.
International HapMap Project Overview ( from ) The elucidation of the entire human genome has made possible our current effort to develop a haplotype map of the human genome. The haplotype map, or "HapMap," is a tool that allows researchers to find genes and genetic variations that affect health and disease.
The DNA sequence of any two people is 99.5 percent identical. The variations, however, may greatly affect an individual's disease risk. Sites in the DNA sequence where individuals differ at a single DNA base are called single nucleotide polymorphisms (SNPs). Sets of nearby SNPs on the same chromosome are inherited in blocks. This pattern of SNPs on a block is a haplotype. Blocks may contain a large number of SNPs, but a few SNPs are enough to uniquely identify the haplotypes in a block. The HapMap is a map of these haplotype blocks and the specific SNPs that identify the haplotypes are called tag SNPs. The HapMap is valuable by reducing the number of SNPs required to examine the entire genome for association with a phenotype from the 10 million SNPs that exist to roughly 500,000 tag SNPs. This makes genome scan approaches to finding regions with genes that affect diseases much more efficient and comprehensive, since effort is not wasted typing more SNPs than necessary and all regions of the genome can be included. In addition to its use in studying genetic associations with disease, the HapMap is a powerful resource for studying the genetic factors contributing to variation in response to environmental factors, in susceptibility to infection, and in the effectiveness of and adverse responses to drugs and vaccines.
From the website: The 1000 Genomes Project is an international research consortium formed to create the most detailed and medically useful picture to date of human genetic variation. The project involves sequencing the genomes of approximately 1200 people from around the world. The goal is to catalog human variants present at > 1% (or, within genes, > 0.5%) frequency. The catalog will include not only SNPs, but also rearrangements, deletions, and duplications. In its production phase, the project will produce ~8.2 billion bases/day (> two genomes/day).
Samples come from the HapMap and extended HapMap sets: Yoruba, Japanese, Chinese (Beijing and Denver), Maasai, Toscani, Gujarati Indian, CEPH, Mexican ancestry (L.A.), African ancestry (SW U.S.).
Encyclopedia of DNA Elements Project: Before the best use of the information contained in the [Human Genome] sequence can be made, the identity and precise location of all of the protein-encoding and non-protein-encoding genes in the human genome will have to be determined, as will the identities and locations of other functional elements including promoters and other transcriptional regulatory sequences and determinants of chromosome structure and function, such as origins of replication. To date, much remains unknown about these functional elements in the human genome. A comprehensive encyclopedia of all of these features is needed to fully utilize the sequence to better understand human biology, to predict potential disease risks, and to stimulate the development of new therapies to prevent and treat these diseases.
The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing. Among the most promising strategies are efforts to identify key genetic “targets” within cancer cells and then create therapeutics designed to zero in on those targets. This approach for attacking cancer through its genetic vulnerabilities stems from recent advances made possible by the sequencing of the human genome.
Bioinformatics is not only used to support research on human cells. Human Microbiome Project: Within the body of a healthy adult, microbial cells are estimated to outnumber human cells by a factor of ten to one. These communities, however, remain largely unstudied, leaving almost entirely unknown their influence upon human development, physiology, immunity, and nutrition.
To take advantage of recent technological advances and to develop new ones, the NIH Roadmap has initiated the Human Microbiome Project (HMP) with the mission of generating resources enabling comprehensive characterization of the human microbiota and analysis of its role in human health and disease. We’ll say more about the last project later.
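Returning to the HapMap description in this note: the claim that "a few SNPs are enough to uniquely identify the haplotypes in a block" can be illustrated with a toy sketch. The haplotype strings below are made up, the function name is ours, and the brute-force search is only an illustration of the idea; it is not the method the HapMap project used to select tag SNPs.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest set of SNP positions that tells all distinct haplotypes apart."""
    distinct = sorted(set(haplotypes))
    n_sites = len(distinct[0])
    for size in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), size):
            # Signature of each haplotype when only these positions are typed.
            signatures = {tuple(h[p] for p in positions) for h in distinct}
            if len(signatures) == len(distinct):
                return positions
    return tuple(range(n_sites))


# Toy block: four made-up haplotypes observed across six SNP sites.
block = ["ACGTAC", "ACGTGT", "GTATAC", "GTATGT"]
tags = find_tag_snps(block)
print(f"Typing {len(tags)} of {len(block[0])} sites is enough: positions {tags}")
```

In this toy block, several sites carry redundant information because they are inherited together, so typing a small subset recovers which haplotype is present. That redundancy is what lets the HapMap cut the number of SNPs that need to be typed.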
  • Looking across all these projects, we can identify common features related to informatics. One is that the projects use data derived from “high throughput” methods. The basic idea is to take traditional laboratory methods, the bench methods, and automate them using robotics in order to conduct highly parallel research. Data are automatically processed and eventually stored in public databases that are accessible via the internet. Tools for analysis are also typically available in the public domain. Research is done in different locations and coordinated via electronic communications.
  • All areas of biomedical informatics have challenges, but these are especially great in bioinformatics. Because bioinformatics spans so many disciplines, standardized vocabularies, interoperable data systems, security, and privacy are especially important. Unique challenges have to do with the sheer volume of data generated in bioinformatics applications, and how to collect, store, analyze, and display it. The standard relational database model, for example, is not well suited to genetic sequence data, which is better handled by object-oriented databases. Automated algorithms have been developed to help analyze the data and to more easily identify the 1% of an individual’s genome that is unique. A big challenge is to make meaning of that 1% difference and present it in ways that help the individual understand its implications. Like all data, genetic information must be put into context, but it is not well suited to the printed medical record. An electronic medical record, with data visualization and modeling applications, will be especially helpful in this regard.
  • Hopefully by this point you have some appreciation of the importance of bioinformatics to advances in biological understanding, and a sense of the informatics challenges specific to this domain. On this slide we step back to see some of the methodological commonalities and point out the informatics-related techniques and tools. The most obvious feature of bioinformatics is that it deals with all sorts of data, which we described earlier. These data need to be stored in databases. Research in computer and information science is particularly relevant here. The data are not homogeneous in form, and there’s a lot of it. We need to store information about the data and its structure, link data across many physical databases, and provide tools for easy retrieval. That’s where standards and interoperability between systems come into play. At a higher level of analysis, bioinformatics includes work commonly associated with library folk. Early in the course, we looked at medical coding, and vocabularies more generally. Understanding all this new information is one thing; representing it is another, and work in ontology is key to this. Informatics is also important in the discovery process. Much of the work is distributed in space and time. The internet and groupware tools for communication and coordination of research are certainly an important factor in explaining the pace of progress. Within a given laboratory, work is managed through software (workflow software), and much of the manual work is automated through robotics; some of the intellectual work is done through sophisticated algorithms, or even artificial intelligence. Where machine learning ends, humans, aided by visualization and other modeling tools, take over. Content on discovery adapted from a slide by Mark Gerstein, Yale University.
  • Chicurel (2002): The working biologist now has an enormous number of options when it comes to bioinformatics tools. On one hand, there is a lot of free high-quality software in the public domain. On the other, researchers can buy commercial products offering added features, such as programs to streamline sequential tasks, to access proprietary databases and to enhance data security. And because software producers realize that users’ needs change and their products will rarely be used in isolation, flexibility and modularity are on the rise. An important trend has been the increasing integration and sophistication of tools available to non-experts. A wide range of user-friendly packages incorporating tools for nucleotide and protein sequence analysis are available from companies such as MiraiBio, a Hitachi Software Engineering subsidiary based in Alameda, California; DNASTAR in Madison,Wisconsin; InforMax in Bethesda,Maryland; and Accelrys in San Diego, California. On the non-commercial side, the Biology WorkBench maintained by the Supercomputer Center at the University of California, San Diego, is particularly popular, offering more than 80 bioinformatics tools to more than 10,000 registered users. “It’s a one-stop-shop for doing a lot of things,” says lead developer Shankar Subramaniam. “You can be sitting in front of any type of computer; as long as you have a web browser, you can access it.” Software has also become more user friendly. Back in the early 1990s, users of the GCG Wisconsin package, the grandfather of molecular-biology packages (now sold by Accelrys), had to work with UNIX-based systems. Although these systems are still preferred by some, users can now point and click their way through a wide range of tasks on ordinary desktop computers. Another trend is the increased integration of data analysis with experimental design. 
The needs of bench scientists don’t always coincide with those of professional bioinformaticians producing tools for whole-genome analyses. Genome projects require programs that can efficiently, if not very accurately, process huge amounts of sequence data, but the biologist in the lab is often interested in studying small sets of genes and their products with very high precision. Last month, for example, InforMax released GenomBench, a tool that allows users to predict the structure of genes and their splice variants, progressively refine these predictions, and then design experiments to validate them. “It’s an interactive tool that can work with researchers not just to analyse the data they have, but to design the right experiment to resolve ambiguities in the data,” says Steve Lincoln, senior vice-president of life-science informatics at the company. Others are hooking up their software to catalogues of reagents. As just one example, the genome browser run by the University of California, Santa Cruz, is being used in a collaboration with the National Cancer Institute in Bethesda,Maryland, to identify new genes to expand, and ultimately complete, the Mammalian Gene Collection — a set of cDNA clones of expressed genes for human and mouse. The browser will be linked to the collection’s website, so that users can go straight from analysing an electronic representation of a gene to ordering a clone. A key trend in the development of commercial products is the emergence of workflows, automated chains of operations that can dramatically increase analysis throughput. 
For example, software producer geneticXchange of Menlo Park, California, recently demonstrated a workflow that sorts gene-expression data generated by microarrays, looks up the accession numbers that identify the selected genes, collects sequence information from the US National Center for Biotechnology Information’s UniGene database, gathers annotation information from the LocusLink website, and goes to Medline to assemble a list of relevant references. “You just hit a button and it does what might take a biologist 600 hours to do, in about five hours,” says Mark Haselup, chief technical officer for the company. Some commercial products are valuable because they’re linked to otherwise unavailable proprietary data. One of the main selling points of the Celera Discovery System, for example, is the access it provides to the biotech firm’s high-quality human and mouse genome annotations. Unlike many other collections of annotations, a high proportion of Celera’s have been generated by manual curation. Commercial products often provide greater security for those who don’t wish to manipulate their unpublished or unpatented results openly over the Internet. Although some public sites offer a degree of security, commercial packages usually have more protection options and can be operated behind a firewall. But the recurrent theme in the design of bioinformatics tools is the trend towards increased integration. The Discovery Studio Gene package recently launched by Accelrys is a case in point. “Results are put into a project database that has the ability to be accessed by a set of applications that span both chemistry and biology,” says Scott Kahn, senior vice-president of life science at Accelrys. “We set up the ability to collaborate between domains.” M.C.
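The workflow idea Chicurel describes, an automated chain of operations in which each step feeds the next, can be sketched in a few lines. Everything below is hypothetical: the step functions, gene names, and accession identifiers are made-up stand-ins, and a real workflow would query external databases such as those named in the article rather than the in-memory dictionary used here.

```python
from functools import reduce


def run_workflow(steps, data):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda value, step: step(value), steps, data)


# Hypothetical step 1: keep genes whose expression passes a threshold.
def select_expressed(records, threshold=2.0):
    return [r for r in records if r["expression"] >= threshold]


# Hypothetical step 2: attach an accession number (made-up identifiers).
def attach_accession(records):
    lookup = {"geneA": "ACC0001", "geneB": "ACC0002"}
    return [{**r, "accession": lookup.get(r["gene"])} for r in records]


# Hypothetical step 3: produce a short human-readable report.
def summarize(records):
    return [f'{r["gene"]} ({r["accession"]})' for r in records]


microarray = [
    {"gene": "geneA", "expression": 3.1},
    {"gene": "geneB", "expression": 0.4},
]
report = run_workflow([select_expressed, attach_accession, summarize], microarray)
print(report)
```

Keeping each step as a plain function that takes and returns a list of records is one simple way to get the modularity the article mentions: steps can be reordered, replaced, or reused without changing the runner.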
  • Collins NCHPEG presentation, September 2009: “NIH, Genomics, and Health,” Francis Collins, M.D., Ph.D., NIH Director, September 23, 2009.
  • Fig. 2. Above is a schematic outlining how scientists can use bioinformatics to aid rational drug discovery. MLH1 is a human gene encoding a mismatch repair protein (mmr) situated on the short arm of chromosome 3. Through linkage analysis and its similarity to mmr genes in mice, the gene has been implicated in nonpolyposis colorectal cancer. Given the nucleotide sequence, the probable amino acid sequence of the encoded protein can be determined using translation software. Sequence search techniques can be used to find homologues in model organisms, and based on sequence similarity, it is possible to model the structure of the human protein on experimentally characterised structures. Finally, docking algorithms could design molecules that could bind the model structure, leading the way for biochemical assays to test their biological activity on the actual protein.
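The translation step in this schematic (nucleotide sequence to probable amino acid sequence) is simple enough to sketch. The example below uses the standard genetic code and a short made-up coding sequence; it is not the MLH1 sequence. It also assumes the input is already in the correct reading frame, which real translation software has to work out.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # raises KeyError on non-ACGT input
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up example: ATG (M), GCC (A), AAG (K), TGA (stop).
print(translate("ATGGCCAAGTGA"))  # prints "MAK"
```

The resulting one-letter amino acid string is the kind of input that the later steps in the schematic, homology search and structure modelling, would start from.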
  • Clinically relevant information is growing very quickly. In fields like oncology, patients are becoming more involved in research activities. Providing knowledge support and facilitating professional development in genomics is an obvious role for informatics. Clinical work needs a standardized language of genomics (biomedical ontologies). EMRs/PHRs will need to include this; EMRs/PHRs may be a location for the in silico genome; and EMR data informs clinical genomics research. NCHPEG: National Coalition for Health Professional Education in Genetics. Example problem: in oncology, lab data for certain genetic tests are produced by contracted facilities, and the reports are scanned into the EMR and exist as PDFs, which makes them largely inaccessible to decision support. KRAS: a gene mutation that predicts the aggressiveness of colon cancer. If no mutation is detected, then the patient is eligible for a specific therapy (Cetuximab). The nomenclature for the KRAS test result is not yet standardized, and this has led to some misinterpretation of the assay.
  • It used to be the case that cancer treatment was driven by how the cancer appeared: location, size, and spread. Now it is normal practice to include protein and genetic assays that describe the tumor on a molecular level. So a case of breast cancer is currently described by an alphabet soup of terms (HER2 status, PR (progesterone) status, ER (estrogen) status, ….), all different factors that (1) were identified in the last 10 years through research enabled by bioinformatics, (2) can predict mortality …. ONCOTYPE-DX and Mammaprint are genetic assay panels that incorporate multiple genetic tests to give a “recurrence score” (RS): if you have a low RS, then there is a low chance of recurrence, and so on. They’ve taken it a step further to say that a low recurrence score (i.e., <18) means low benefit from chemotherapy, so the patient should be on hormonal treatment only. A high score means the opposite. Example of a report: HER2 stands for Human Epidermal growth factor Receptor 2. Each normal breast cell contains copies of the HER2 gene, which helps normal cells grow. The HER2 gene is found in the DNA of a cell, and this gene contains the information for making the HER2 protein. The HER2 protein, also called the HER2 receptor, is found on the surface of some normal cells in the body. In normal cells, HER2 proteins help send growth signals from outside the cell to the inside of the cell. These signals tell the cell to grow and divide. In HER2+ breast cancer, the cancer cells have an abnormally high number of HER2 genes per cell. When this happens, too much HER2 protein appears on the surface of these cancer cells. This is called HER2 protein overexpression. Too much HER2 protein is thought to cause cancer cells to grow and divide more quickly. This is why HER2+ breast cancer is considered aggressive.
  • Medical Informatics resources used to be developed using only logical and statistical methods but are now based on knowledge-based methods. The large databases that have been developed have led to huge shifts in how medicine is practiced and how medical research is done. Sometimes physicians forget how complex the systems are that deliver so much data and information transfer. Bioinformatics has been getting a bad rap, too, as “professionals outside the field are cited as considering Bioinformatics research to be easy and cheap, yielding free software, and producing rapid publication of easily verified predictions.” The software systems are very complex, and while they are hard to evaluate on their own, in the context of biological inquiry they become a huge benefit in solving problems and analyzing data.
  • Bioinformatics and Medical Informatics differ in three significant ways. Regarding the actual domain of expertise, Bioinformatics is focused upon biology while Medical informatics is solely based upon medicine. Another difference is in the application of their work; Bioinformatics data is targeted toward fellow bio scientists while Medical informatics data is geared toward use by healthcare professionals. The overall purpose of bioinformatics is to educate other scientists and further the research field of molecular biology; the educational emphasis of Medical Informatics is to help healthcare professionals improve their patient care and medical delivery systems. The scientific world of Bioinformatics yields data that is concrete and may be replicated in similar experiments. The data of Medical Informatics tends to be difficult to replicate due to the subjective nature of clinical observations and the individual variability of the patients. Because of their domains and the methods used, the types of data produced are quite different.
  • While they have different scientific bases and methodologies, the two fields complement each other and will push each other forward. While bioinformatics makes scientific discoveries, the medical informatics field will capitalize on the new data and develop systems for improved patient care. In the same way, medical informatics will identify problems in patient care, giving bioinformatics new biological problems on which to experiment and develop new data sources. We see this in action on the next slide.
  • This slide is from Guttmacher’s presentation to NCHPEG. We mention eMERGE here since it is a project that looks to build knowledge through linkages between genomic data and electronic medical records. It portends the evolving relationship between bioinformatics and medical informatics.
  • While the eMERGE project looks at real data, Lussier & Sarkar took existing knowledge bases in the Clinical and Genetic/Genomic domains and created a mediating knowledge base. They then conducted a gene-traits expression study, using standard bioinformatics tools, to test the feasibility of using such a system for genomic discovery research. Although more research is needed to test the significance of findings from such a system, it was a positive proof of concept, and at the very least (for our purposes), their work neatly outlines the kinds of information linkages that need to occur between EMR and genomic systems. An Integrative Model for In-Silico Clinical-Genomics Discovery Science. Yves A. Lussier, M.D., Indra Neil Sarkar, B.Sc., Michael Cantor, M.D. AMIA 2002 Annual Symposium Proceedings, pp 469-473
  • Thus far we’ve looked at the biological data, and have seen that tools are used to generate, store, manage, and analyze these data. We also have considered that we need to manage the communications among researchers, and their findings. This slide shows that managing information in clinical applications extends further out to even the patients themselves.
  • In the future a person’s in silico genome may exist in the “cloud” or on a device like an iPhone or USB memory stick. From: At the Consumer Genetics Show in Boston, MA, Illumina President and CEO Jay Flatley reveals an iPhone with a conceptualized application called MyGenome. Flatley said a developer at Illumina put the application together in just 10 days. The MyGenome concept application includes device pairing functionality so users can connect to nearby devices via a short-range wireless technology like Bluetooth. The user is connecting to their doctor’s iPhone in this slide to transfer genomic data for their doctor’s perusal and records.
  • With the mention of Web 3.0, we complete the 2nd of 3 parts of this presentation. Now we turn to the implications of bioinformatics and related research. We find ourselves in a situation where the notion of a $1000 personal genome is possible, yet where we might not need to sequence each of the 3 billion base pairs, because single nucleotide polymorphism (SNP) and haplotype maps that associate with genes of importance may be all that is necessary; where advances in functional genomics may be of use to design drugs and to predict and treat disease; and where it appears that using genetic information will become cheaper and more common. This gives rise to many questions (how these are answered may require new laws and technologies). Robertson, J. A. 2003. The $1000 Genome: Ethical and Legal Issues in Whole Genome Sequencing of Individuals. The American Journal of Bioethics 3(3):InFocus. John A. Robertson, The University of Texas School of Law. Abstract: Progress in gene sequencing could make rapid whole genome sequencing of individuals affordable to millions of persons and useful for many purposes in a future era of genomic medicine. Using the idea of $1000 genome as a focus, this article reviews the main technical, ethical, and legal issues that must be resolved to make mass genotyping of individuals cost-effective and ethically acceptable. It presents the case for individual ownership of a person’s genome and its information, and shows the implications of that position for rights to informed consent and privacy over sequencing, testing, and disclosing genomic information about identifiable individuals. Legal recognition of a person’s right to control his or her genome and the information that it contains is essential for further progress in applying genomic discoveries to human lives. See presentation by Robinson Bradshan & Hinson
  • Though there are well-studied principles for examining these issues, the questions are pretty new. In 1865, when Gregor Mendel was experimenting with pea plants in a monastery courtyard, who could have realized that his principles of hereditary units would turn out to be physically manifested in the structure of DNA? All these discoveries and projects have advanced the progress and debates in the bioinformatics field.
  • In 1990, at the start of the Human Genome Project, another research body called the Ethical, Legal and Social Implications (ELSI) Research Program was founded. Its purpose was to research the issues that could arise from the genetic information of humans being known. The concern was, and still is, how that information can be used for or against a person. Genetic testing results would be known to insurance companies as well as employers. No human being should be denied access due to a risk of their coming down with a terminal or long-term condition or illness. Discrimination based on illness or diagnosis has happened in the past in this country. In the 1970s it was sickle cell anemia; now, in the 21st century, some countries do not permit visitors who are HIV positive or have AIDS, although it seems far-fetched that a foreign government would know that status unless the passenger revealed it.
  • In 2008 President Bush signed into law the Genetic Information Non-Discrimination Act (GINA). It was created to protect the rights of individuals concerned that their genetic information or details about their genes could be considered grounds for termination of employment, or that their risk would be viewed by insurance companies as “a pre-existing condition” and thus disqualify them from coverage or make their premiums much more costly for their employers or themselves. The bill took 13 years to pass and considerable effort to craft. However, it does not cover other discriminatory issues such as long-term care or already existing conditions that are based on genetics. In the past, insurance companies used to use family history in coverage decisions. Now family history is available. This frees up physicians to practice good medicine. With a full history available, or even in the medical record, the patient and physician can work together. There are penalties for both insurance companies and employers who are found to violate this law. In the case of employers, the employee addresses the complaint to the EEOC (Equal Employment Opportunity Commission), and there are monetary penalties for employers. To read the entire law please refer to this URL:
  • As you can see, even in 2009, the laws and provisions regarding the guarding and improper use or potential misuse of genetic information are still evolving. The University of Akron just this past week rescinded its plan to require potential employees, including faculty, to be tested in order to check criminal records. In addition, as recently as last month the Federal Register published new interim rules created for GINA: A copy of the rule from the Federal Register is available at
  • To better illustrate some of the issues at hand with genetic testing and profiling, we can take a few cases from the ELSI Research Program website, which is cited at the end of the presentation. These hypothetical cases brought up by the ELSI Project illustrate that the availability of genetic information has implications for bioethics as well as law. As these advances continue, likely at a pace potentially faster than Moore’s Law, standards of ethical behavior must be considered. A few of the hypothetical cases listed on the website have already occurred. The case of a sick young child requiring an exact match, and the parents deciding to have a second child in the hope that he or she will be the match, is one example.
  • In this last section, we described and alluded to the ethical, legal, and social issues raised by a growing body of genetic information. We’ve raised some other related questions for discussion on the board, but please feel free to discuss those mentioned in this presentation as well. In any event, we hope that you have seen that, aside from these difficult questions, bioinformatics data will likely pose challenges in clinical work and in the systems intended to support it. This is one important reason to study it in the context of a health informatics course. Another is that it gives an opportunity to recapitulate or review some of the informatics ideas and methods discussed previously in other contexts. These include technologies and methods like databases, search algorithms, automation, artificial intelligence and machine learning, workflow coordination, groupware, and ontologies. We also explicitly considered the relationship of bioinformatics to other biomedical informatics fields. There are many definitions of bioinformatics, but they all share the concept that it is an interdisciplinary endeavor that hopes to advance biological understanding through computational means. The work of biology has clearly been transformed by bioinformatics, and the unique biological understanding afforded by bioinformatics is clearly impacting the work of medicine.
  • All areas of Biomedical Informatics have challenges, but these are especially great in the area of Bioinformatics. Because Bioinformatics spans so many disciplines, standardized vocabularies, interoperable data systems, security, and privacy are especially important. Unique challenges have to do with the sheer volume of data that is generated in bioinformatics applications, and how to store, collect, analyze, and display it. The standard relational database model is not well suited to genetic sequence data, which is better handled by object-oriented databases. Automated algorithms have been developed to help analyze the data and to more easily identify the 1% of the individual’s genome that is unique. A big challenge is to make meaning of that 1% difference, and to present it in ways that can help the individual understand its implications. Like all data, genetic information must be put into context, but it is not well suited to the printed medical record. An electronic medical record, with data visualization and modeling applications, will be especially helpful in this regard.
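To give a feel for what "automatically identifying the part of a genome that differs" means, here is a deliberately simplified sketch that compares an individual's sequence with a reference, position by position. It assumes the two sequences are already aligned and of equal length, which real tools do not assume; the sequences and function names are made up for illustration.

```python
from typing import List, Tuple


def differing_positions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (index, reference_base, sample_base) wherever the sequences differ.

    Assumes both sequences are pre-aligned and the same length; handling
    insertions and deletions requires a real alignment algorithm.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def fraction_different(reference: str, sample: str) -> float:
    """Fraction of positions at which the sample differs from the reference."""
    if not reference:
        raise ValueError("sequences must not be empty")
    return len(differing_positions(reference, sample)) / len(reference)


if __name__ == "__main__":
    ref = "ACGTACGTAC"  # toy reference sequence
    ind = "ACGTTCGTAA"  # toy individual sequence
    print(differing_positions(ref, ind))
    print(fraction_different(ref, ind))
```

The list of differences is small compared with the full sequence, which is one reason the notes suggest that the differences, placed in context, are what a medical record would most usefully carry.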
  • The state of perception of risk is an important aspect.   Developed countries are more conscious of anthrax, botulism, pneumonic plague, tularaemia and smallpox.   A lot of information has been pouring into websites on these diseases, their potential risks and the precautions to be taken, in the unfortunate event.    However, there is not much concern about the diarrhoeal diseases caused by several viruses and bacteria, which can be used as bioweapons.   Poverty, ignorance, high population density and low levels of hygiene in the developing countries, coupled with incompetence and apathy on the part of public and medical authorities, make such diseases as cholera, pneumonic plague, tularemia, smallpox, hemorrhagic viral infections and other contagious diseases, effective weapons in the arsenal of a bioterrorist targeting the developing countries.     Extensive economic damage can be inflicted through the introduction of animal and plant diseases or pests into the livestock and crops.    The degree of damage an organism or a toxin can inflict varies widely and this also depends upon the socio-economic status of the country or sections of her vulnerable population.    Although anthrax is much in the news currently, it is a poor biological weapon and can cause only a localised damage, as it is primarily not a prevalent human disease and it is not contagious.   Diseases that spread through food and water (or by human contact) cause a far greater damage among the poorer sections of the population of any country.
  • Monitoring changes in individual and population health is likely the only way to detect an intentional disease outbreak. The vision of the Centers for Disease Control and Prevention is to have a system (the National Electronic Disease Surveillance System, NEDSS) that can transfer appropriate public health, laboratory, and clinical data efficiently and securely over the Internet. NEDSS will revolutionize public health by gathering and analyzing information quickly and accurately. This will help to improve the nation's ability to identify and track emerging infectious diseases and potential bioterrorism attacks as well as to investigate outbreaks and monitor disease trends. As we read in the “Roundtable on Bioterrorism Detection,” multiple models of data collection and management are being tested throughout the United States; all seek to detect outbreaks early and to maintain security and confidentiality while collecting data to detect similarities in symptom development, geographic location, etc. If NEDSS is to facilitate transfer of data between health organizations and governmental health departments, the criteria for data collection will need to be uniform; current systems have their own means of collecting data, and when compiled, the collected information may not be truly representative of the outbreak. Developing a system that allows various software systems to communicate with each other or via the central NEDSS will allow bioinformatics to disseminate the data and determine whether biological terrorism is in action. By having science and technology work together, systems will continue to evolve that will make society safer from bioterrorism.
  • As we’ve read, protecting society requires health surveillance systems that can document the data, synthesize, and share information with other networks. To have success, the CDC evaluates systems and looks at how they index symptom definition, frequency, severity, disparity, locality, preventability, clinical course of action, and overall public concern, how they maintain confidentiality and security, how information is shared with other systems, and how the system is managed via personnel and funding. The slide lists additional concerns in the evaluation of surveillance systems.
  • This study reported in the CDC’s Emerging Infectious Diseases in January 2004, identified 55 detection systems and 23 diagnostic decision support systems via literature searches of five databases and internet websites. Of these systems, only 35 systems have been evaluated. If the CDC and health agencies are trying to identify the best systems for safeguarding the population, proper evaluation is crucial.
  • The research found that the following need to be evaluated in order to identify successful systems. First of all, both sensitivity and specificity (or likelihood ratios) must be measured relative to an appropriate reference standard. Because sensitivity and specificity are jointly determined by the choice of threshold for a positive (or abnormal) test, either sensitivity or specificity can be made arbitrarily high at the expense of the other. Secondly, the reference standard should be applied to ALL samples, whether negative or positive, so as to avoid test referral bias, which can produce over- or underestimates of sensitivity and specificity. Thirdly, test-interpretation bias may occur if the result of the detection system is not determined while blinded to the reference test (and vice versa). This bias causes an artificial concordance between the detection system and reference test, which results in overestimates of both sensitivity and specificity. Fourthly, to make the testing valid, the sample population used needs to resemble the population for which the surveillance system was designed. Finally, if a system is being evaluated, the evaluation should use the most realistic scenario possible; testing with real-time data is difficult when the conditions can range from hoaxes with no cases to real-life situations in which a number of cases are reported. As bioinformatics works to design systems that can collect, analyze, disseminate, manage, and share data so as to identify bioterrorism at work, the systems need to be evaluated with concrete data that isn’t going to be skewed by various means of collection and transmission. Evaluating the systems currently in place in the United States and working toward improved diagnostic decision support systems is crucial to protecting society from biological agents.
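The point about sensitivity and specificity trading off against each other as the threshold moves is easier to see with numbers. The sketch below uses made-up alarm scores and made-up reference-standard labels (both hypothetical, not data from the study) and computes the two measures at three thresholds.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], truth: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) against a reference standard.

    A case counts as a positive test when its score is >= threshold.
    `truth` holds the reference-standard result for every case, applied
    to all samples, positive and negative alike.
    """
    tp = fn = tn = fp = 0
    for score, is_case in zip(scores, truth):
        alarm = score >= threshold
        if is_case and alarm:
            tp += 1
        elif is_case:
            fn += 1
        elif alarm:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical detection-system scores and reference-standard labels.
    scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
    truth = [False, False, True, False, True, True]
    for threshold in (0.2, 0.5, 0.8):
        sens, spec = sensitivity_specificity(scores, truth, threshold)
        print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With this toy data, the lowest threshold catches every true case but raises false alarms, and the highest threshold raises no false alarms but misses most cases, which is why both measures have to be reported together.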
  • Bioinformatics

    1. 1. A Hitchhikers Guide to Bioinformatics Drexel University INFO648-900-200915 A Presentation of Health Informatics Group 5 Cecilia Vernes Joel Abueg Kadodjomon Yeo Sharon McDowell Hall Terrence Hughes SlideShare.Net: Click on the Notes tab below to see a transcript of the presentation
    2. 2. Goals of this Presentation <ul><li>Provide some definitions </li></ul><ul><li>Answer the question: Why study it? </li></ul><ul><ul><li>What has been accomplished with it? </li></ul></ul><ul><ul><li>What challenges exist? </li></ul></ul><ul><li>Identify what role it plays (how) </li></ul><ul><li>Relate it to topics from previous weeks </li></ul><ul><li>Raise issues and questions </li></ul>
    3. 3. Bioinformatics: Why study it? <ul><li>Methodological elements </li></ul><ul><ul><li>Tools and techniques of informatics </li></ul></ul><ul><li>Understand how it supports applications </li></ul><ul><ul><li>Non-medical applications </li></ul></ul><ul><ul><li>Genomic Medicine and the challenges posed </li></ul></ul><ul><li>Raise awareness for legal, ethical, social issues </li></ul>
    4. 4. What is Bioinformatics? Bioinformatics is the use of computers for the acquisition, management, and analysis of biological information. It incorporates elements of molecular biology, computational biology, database computing, and the Internet… … bioinformatics is clearly a multi-disciplinary field including: computer systems management, networking, database design, computer programming, molecular biology. From Using Computers for Molecular Biology, Stuart M. Brown, PhD, RCR, NYU Medical Center
    5. 5. <ul><ul><li>Bioinformatics is a multifaceted discipline combining many scientific fields including computational biology, statistics, mathematics, molecular biology and genetics (Fenstermacher, 2005, p. 440). </li></ul></ul>… from Bayat (2002), p 1018.
    6. 6. Bioinformatics: Origins & Definitions <ul><li>Bioinformatics has many definitions </li></ul><ul><ul><li>… the study of how information is represented and analyzed in biological systems, starting at the molecular level … concerned with understanding how basic biological systems conspire to create molecules, organelles, living cells, organs, and entire organisms (Altman & Mooney, 2006, p. 763) </li></ul></ul><ul><ul><li>… application of tools of computation and analysis to the capture and interpretation of biological data (Bayat, 2002, p. 1018) </li></ul></ul>
    7. 7. DNA is nature’s universal information storage medium … increasingly, biological research relies on information science
    8. 8. The Human Genome Project <ul><li>Produced the human genome sequence </li></ul><ul><li>Spawned a new field: genomics </li></ul><ul><li>Spurred new technologies </li></ul><ul><li>And now provides us an unparalleled opportunity to apply new knowledge, technologies, and approaches to health care </li></ul><ul><li>Guttmacher (2009) </li></ul>
    9. 9. Bioinformatics supports “-omics” research … from Bayat (2002), p 1020.
    10. 10. … from Bayat (2002), p 1020. … from McDaniel, Schutte, & Keller (2008), p. 220
    11. 11. Bioinformatics Data From Using Computers for Molecular Biology, Stuart M. Brown, PhD, RCR, NYU Medical Center <ul><li>Bioinformatics deals with any type of data that is of interest to biologists </li></ul><ul><ul><li>DNA and protein sequences </li></ul></ul><ul><ul><li>Gene expression (microarray) </li></ul></ul><ul><ul><li>Raw data collected from field or laboratory experiment </li></ul></ul><ul><ul><li>Images, virtual models, Software </li></ul></ul><ul><ul><li>Articles from literature and databases of citations </li></ul></ul><ul><li>Each type of data can exist in many incompatible computer formats </li></ul><ul><li>The analysis of DNA sequence data has come to dominate the field of bioinformatics, but the term can be applied to any type of biological data that can be recorded as numbers or images and handled by computers </li></ul>
    12. 18. Moore’s Law A Side Note Mooer’s Law
    13. 19. 14,000X
    14. 20. An information explosion… <ul><li>Lots of data in genome </li></ul><ul><li>More data when we attempt to </li></ul><ul><ul><li>discern structure of data </li></ul></ul><ul><ul><li>relate to transcriptomics, proteomics </li></ul></ul><ul><ul><li>relate to structure, physiology </li></ul></ul><ul><ul><li>relate to disease </li></ul></ul><ul><ul><li>relate to variation </li></ul></ul><ul><li>Automated discovery, experiments </li></ul><ul><li>Biomedical knowledge (coming) </li></ul><ul><li>Clinical knowledge (coming) </li></ul>
    15. 21. [Some] Research Projects <ul><li>The Human Genome Project -- old news, 6 years ago </li></ul><ul><li>International HapMap Project -- </li></ul><ul><li>The 1000 Genomes Project – </li></ul><ul><li>Encyclopedia of DNA Elements ( ENCODE ) Project </li></ul><ul><li>The Cancer Genome Atlas (TCGA) </li></ul><ul><li>Human Microbiome Project (HMP) – </li></ul><ul><li>The eMERGE (Electronic Medical Records and Genomics) Network </li></ul>
    16. 22. Common Features of Projects <ul><li>High throughput </li></ul><ul><li>Use of technology, in particular </li></ul><ul><ul><li>Automation (Robotics, AI) </li></ul></ul><ul><ul><li>Databases </li></ul></ul><ul><ul><li>Visualization, simulation/computational models </li></ul></ul><ul><ul><li>Groupware: Coordination and communication </li></ul></ul><ul><li>Public domain tools </li></ul><ul><li>Open sharing of data </li></ul>
    17. 23. Some Challenges <ul><li>Volume of data is staggering </li></ul><ul><ul><li>How to store and collect sequence information? </li></ul></ul><ul><ul><li>RDBMSs don’t handle sequence data well </li></ul></ul><ul><ul><li>Better handled by Object Oriented DBM </li></ul></ul><ul><li>How to analyze and display the data </li></ul><ul><ul><li>Automated algorithms </li></ul></ul><ul><ul><li>Contextual visualization methods </li></ul></ul><ul><ul><ul><li>Clusters, profiles, etc </li></ul></ul></ul><ul><li>Sequence data is meaningless without context </li></ul><ul><ul><li>Not well suited to printed medical record </li></ul></ul>
    18. 24. General Informatics Techniques/Tools in Bioinformatics <ul><li>Discovery and Analyses </li></ul><ul><ul><li>Text String Comparison </li></ul></ul><ul><ul><ul><li>Text search </li></ul></ul></ul><ul><ul><ul><li>Statistical analysis </li></ul></ul></ul><ul><ul><li>Finding Patterns </li></ul></ul><ul><ul><ul><li>AI / Machine Learning </li></ul></ul></ul><ul><ul><ul><li>Clustering </li></ul></ul></ul><ul><ul><ul><li>Data mining </li></ul></ul></ul><ul><ul><li>Geometric </li></ul></ul><ul><ul><ul><li>Robotics </li></ul></ul></ul><ul><ul><ul><li>Graphics (Surfaces, Volumes) </li></ul></ul></ul><ul><ul><ul><li>Comparison and 3D Matching (Vision, Recognition) </li></ul></ul></ul><ul><ul><li>Physical Simulation </li></ul></ul><ul><ul><ul><li>Newtonian Mechanics </li></ul></ul></ul><ul><ul><ul><li>Electrostatics </li></ul></ul></ul><ul><ul><ul><li>Numerical Algorithms </li></ul></ul></ul><ul><ul><ul><li>Simulation </li></ul></ul></ul><ul><li>Storage </li></ul><ul><ul><li>Databases </li></ul></ul><ul><ul><ul><li>Building, Querying </li></ul></ul></ul><ul><ul><ul><li>Complex data </li></ul></ul></ul><ul><ul><ul><li>Annotations </li></ul></ul></ul><ul><ul><ul><li>Citations </li></ul></ul></ul><ul><li>Standards </li></ul><ul><li>Interoperability </li></ul><ul><li>Knowledge Management </li></ul><ul><ul><li>Classification </li></ul></ul><ul><ul><li>Vocabularies </li></ul></ul><ul><ul><li>Ontologies </li></ul></ul><ul><li>Communications </li></ul><ul><li>Process Workflow </li></ul>
    19. 25. Bioinformatics: Tools <ul><li>Annotation </li></ul>… from Chicurel (2002), p 753-754. <ul><ul><li>user friendly, in the public domain, and increasingly integrated </li></ul></ul><ul><ul><li>commercial tools streamline tasks, access proprietary databases </li></ul></ul><ul><li>Visualization </li></ul>
    20. 26. Bioinformatics: Why study it? <ul><li>Methodological elements </li></ul><ul><ul><li>… [Clinical and bioinformatics] share significant methodological elements, so an understanding of the issues in bioinformatics can be valuable for the student of clinical informatics (Altman & Mooney, 2006, p. 763) </li></ul></ul><ul><li>Understand how it supports applications </li></ul><ul><ul><li>Non-medical applications </li></ul></ul><ul><ul><li>Genomic Medicine </li></ul></ul><ul><ul><ul><li>EMRs, PHRs will need to capture genetic data </li></ul></ul></ul><ul><ul><ul><li>Clinical research that links genomic and clinical KBs </li></ul></ul></ul><ul><ul><ul><li>DTC/Consumer informatics: Personalized testing/diagnostics </li></ul></ul></ul>
    21. 27. Bioinformatics are essential to many non-medical applications Agriculture Crops resistant to drought and insects More nutritious products Bio pesticides <ul><li>Risk assessment & mitigation </li></ul><ul><li>Waste cleanup </li></ul><ul><ul><li>Explore the properties of the bacteria Deinococcus radiodurans for clean up of hazardous waste sites </li></ul></ul><ul><li>Reduce likelihood of heritable mutations </li></ul><ul><li>Energy and environment </li></ul><ul><li>New energy sources: bio fuels </li></ul><ul><li>Pollutant detection </li></ul><ul><li>Climate change </li></ul><ul><ul><li>Carbon sequestration </li></ul></ul><ul><li>Forensic science </li></ul><ul><li>DNA </li></ul><ul><ul><li>Detect criminals by analysis of DNA in crime scenes </li></ul></ul><ul><li>Mass spectrometry </li></ul>
    22. 28. Bioinformatics & Molecular Medicine <ul><li>Early detection of genetic predispositions to diseases </li></ul><ul><li>Improved diagnosis of disease </li></ul><ul><li>Pharmacogenomics </li></ul><ul><ul><li>Customized drugs </li></ul></ul><ul><ul><li>Individualized drugs selection </li></ul></ul><ul><ul><li>Better methods for determining drug doses for individuals </li></ul></ul><ul><ul><li>Appropriate doses determination </li></ul></ul><ul><li>Gene therapy and control systems for drugs </li></ul>
    23. 30. <ul><li>Recent Example: Anticoagulant Dosing </li></ul><ul><ul><li>Genetic variability among patients plays an important role in determining the dose of warfarin that should be used when oral anticoagulation is initiated, but practical methods of using genetic information have not been evaluated in a diverse and large population. We developed and used an algorithm for estimating the appropriate warfarin dose that is based on both clinical and genetic data from a broad population base. </li></ul></ul>
    24. 31. A Fun Fact: How Many Human Genes Do All Current Drugs Target? <ul><li>~500 (2.5% of the genome) </li></ul><ul><li>~1,000 (5%) </li></ul><ul><li>~5,000 (25%) </li></ul><ul><li>~10,000 (50%) </li></ul><ul><li>~ 15,000 (75%) </li></ul><ul><li>~20,000 (100%) </li></ul>
    25. 32. A Fun Fact: How Many Human Genes Do All Current Drugs Target? <ul><li>~500 (2.5% of the genome) </li></ul><ul><li>~1,000 (5%) </li></ul><ul><li>~5,000 (25%) </li></ul><ul><li>~10,000 (50%) </li></ul><ul><li>~ 15,000 (75%) </li></ul><ul><li>~20,000 (100%) </li></ul>
    26. 33. Bioinformatics & Drug Discovery … from Luscombe, Greenbaum, Gerstein (2001), p 95.
    27. 34. Genomic medicine <ul><li>Goes beyond genetic risk factor of disease </li></ul><ul><ul><li>Family history </li></ul></ul><ul><li>Considers genetics in effectiveness of drugs </li></ul><ul><ul><li>Genomic assays </li></ul></ul>
    28. 35. Bioinformatics and clinical informatics: Genomic medicine poses several challenges <ul><li>Clinically relevant information growing very quickly </li></ul><ul><ul><li>Patients are becoming more involved in the research activities </li></ul></ul><ul><ul><li>Knowledge support, facilitating professional development in genetics is an obvious role for informatics </li></ul></ul><ul><li>EMR data informs clinical genomics research, and vice versa ( more on this later ) </li></ul><ul><li>Standardized language of genomics in clinical work </li></ul><ul><ul><li>Biomedical Ontologies ( </li></ul></ul><ul><ul><li>EMRs/PHRs will need to include this </li></ul></ul><ul><li>EMRs, PHRs may be a location for in silico genome </li></ul><ul><li>Clinical Decision Support Systems </li></ul>
    29. 36. <ul><li>Clinical Decision Support Systems </li></ul><ul><ul><li>CDSSs will need to include genetic factors, or the results of genetic testing </li></ul></ul><ul><ul><ul><li>An example: Targeted cancer therapy </li></ul></ul></ul><ul><li>The appropriateness of adjuvant therapy in breast cancer patients given presence of gene producing the HER2 protein </li></ul>
    30. 37. Medical Informatics vs. Bioinformatics
    31. 38. Getting Respect?? <ul><li>Medical Informatics </li></ul><ul><ul><li>“ just data sources” but are in fact the product of 30 years of research working to make medical information retrieval a fluid technological system. </li></ul></ul><ul><li>Bioinformatics </li></ul><ul><ul><li>professionals outside the field are cited as considering Bioinformatics research to be easy and cheap, yielding free software, and producing rapid publication of easily verified predictions. </li></ul></ul><ul><ul><li>In truth, Bioinformatics programs use a mixture of mathematical models and expert heuristics in complex software systems. </li></ul></ul>
    32. 39. Medical Informatics and Bioinformatics The Differences? Although Medical Informatics and Bioinformatics both exploit computers and computational tools, they differ in many ways. Arguably, these differences are due to diversity in the domain expertise of the practitioners (medicine vs. biology) and researchers involved in the application field (healthcare professionals vs. bio scientists) and the educational emphasis adopted by the independent disciplines (patient-care vs. basic-research). Training Multidisciplinary Biomedical Informatics Students: Three Years of Experience, JAMIA 2008, Mar-Apr; 15(2): 246 .
    33. 40. Bioinformatics professionals focus on scientific discovery and use exacting specifications, tools, models, and evaluation criteria. Medical Informatics professionals utilize cognitive reasoning and empirically justified decision support systems. Medical Informatics expertise in developing health care applications and the strength of Bioinformatics in biological “discovery science” complement each other well. Maojo & Kulikowski, p. 515 Vs Symbiotic Relationship
    34. 41. eMERGE <ul><li>The eMERGE (Electronic Medical Records and Genomics) Network is a five-member consortium formed to develop, disseminate, and apply approaches to research that combine DNA biorepositories with electronic medical record (EMR) systems for large-scale, high-throughput genetic research. </li></ul>
    35. 42. Relating Clinical and Genetic KBs <ul><ul><li>… from Lussier & Sarkar (2002), p 470. </li></ul></ul>
    36. 43. Bioinformatics & Drug Discovery: Connecting patients with researchers … from Collins (2009) presentation at NCHPEG
    37. 44. Bioinformatics meets public health informatics <ul><li>Health literacy </li></ul><ul><li>Public policy concerning testing </li></ul>
    38. 45. Bioinformatics meets consumer health informatics
    39. 46. My in-silico genome: There may be an “app for that”
    40. 47. Bioinformatics in the future …and biomedical- and other informatics in the future too <ul><li> Web 3.0 is likely to have a big effect on medicine in 2008. In bioinformatics, it will become more common to process ever larger amounts of data. In fact, experts in bioinformatics already search for data from disparate systems, and they have started to build rich semantic relations into information tools for knowledge discovery. Finally, greater capacity for creating knowledge in medicine will be possible if we have the will to publish clinical data openly and transparently, and subject it to scrutiny. </li></ul><ul><li> Developing a more personalised healthcare system will be an important challenge for doctors in web 3.0. In an era of greater personalisation, treating patients’ health problems according to their genetic profiles will depend on using the latest information technologies. </li></ul><ul><li>Giustini editorial in BMJ 2007;335:1273-4 doi: 10.1136/bmj.39428.494236. </li></ul><ul><li>What is Web 3.0? (next 10 years) </li></ul><ul><li>Semantic web </li></ul><ul><li>Uses metadata </li></ul><ul><ul><li>Establish authority (wisdom of crowd vs. experts) </li></ul></ul><ul><ul><li>Ontologies, semantic systems </li></ul></ul><ul><ul><li>Knowledge discovery </li></ul></ul>
    41. 48. Ethical, legal, and social issues <ul><li>ETHICAL </li></ul><ul><li>Control : who controls acquisition of a person’s DNA and the information it contains </li></ul><ul><li>Consent : what uses may be made of that information; who decides how data are used </li></ul><ul><li>Privacy, de-identification : how to do this fairly, acknowledging ownership, consent, and privacy (Is genomic privacy even possible?) </li></ul><ul><li>LEGAL </li></ul><ul><li>Prevent discrimination based on genomic data (cf. Genetic Information Nondiscrimination Act (GINA) of 2008) </li></ul><ul><li>Other regulatory frameworks that may make public genomics possible </li></ul><ul><li>SOCIAL </li></ul><ul><li>Expand access to ensure generalizability (address sample bias) and pursue patient (as opposed to basic research) agendas </li></ul><ul><li>Consider benefits and costs of open or public genomic models </li></ul>
    42. 49. In the beginning, there was Gregor Mendel <ul><li>1865 Gregor Mendel and his pea plants </li></ul><ul><li>1953 Watson & Crick published their article in “Nature” detailing DNA’s 3D structure </li></ul><ul><li>1984 DNA fingerprinting invented </li></ul><ul><li>2003 Human Genome Project completed </li></ul><ul><ul><ul><ul><li>Largest funding for bioethics to date </li></ul></ul></ul></ul><ul><li>2005 HapMap Project completed </li></ul><ul><li>2008 The Genetic Information Non-Discrimination Act (GINA) signed into law </li></ul>
    43. 50. Ethical, Legal and Social Implications (ELSI) Research Program <ul><li>Established at the same time as the Human Genome Project. </li></ul><ul><li>Researchers knew there could be major issues with the genetic information obtained </li></ul><ul><li>Still exists as part of the project and research continues to this day </li></ul><ul><li>Issues brought up during the project are now used to educate the public </li></ul>
    44. 51. Genetic Information Non-Discrimination Act (GINA) <ul><li>Signed into law by President Bush in 2008 </li></ul><ul><li>Protects individuals from the misuse or discrimination that could come from knowledge of their risk for diseases or conditions based upon genetic information </li></ul><ul><li>It is uncertain whether the value placed on GINA is misplaced, or whether the law could be abused </li></ul><ul><li>Physicians are free to practice good medicine by offering genetic tests to patients </li></ul><ul><li>Potential study participants were concerned about clinical research records and how their information would or could be used </li></ul>
    45. 52. A Real Life Case <ul><li>Swabbing for a Job </li></ul><ul><ul><li>The University of Akron rescinded its requirement that potential employees provide a DNA sample </li></ul></ul><ul><ul><li>The requirement “appears to violate a federal law that takes effect on November 21 called the Genetic Information Nondiscrimination Act , better known as GINA. It also could conflict with the Americans with Disabilities Act.” </li></ul></ul>
    46. 53. Hypothetical Cases to Consider <ul><li>Nurse immune to Ebola gives blood sample that later becomes grounds for creating a cure/vaccine against Ebola. </li></ul><ul><li>Is she entitled to royalties from the pharmaceutical company that used her blood and developed the product? </li></ul>
    47. 54. Summary of this Presentation <ul><li>Bioinformatics has many definitions </li></ul><ul><li>Its study is useful </li></ul><ul><ul><li>Methodology of informatics </li></ul></ul><ul><ul><li>Clinical connections </li></ul></ul><ul><li>Bioinformatics data poses challenges </li></ul><ul><ul><li>Technical </li></ul></ul><ul><ul><li>Ethical, legal, social </li></ul></ul>
    48. 55. Questions for Discussion <ul><li>In light of what we have learned about electronic health record systems, what challenges do you see in terms of data integration in Bioinformatics? </li></ul><ul><li>If you were considering marrying someone (and did not plan on having children), would you want him or her to provide you with an analysis of their genome? What if your potential future partner asked this of you? What might be the social, cultural, and genetic implications of using genome information in mate selection? </li></ul><ul><li>The Genetic Information Nondiscrimination Act (“GINA”) becomes effective November 21, 2009, and provides new protections against the improper use of genetic information. How will employer-sponsored wellness programs, particularly those that require a self-reported health risk appraisal, be affected? Does GINA provide sufficient protections against all potential misuse of information? Debate what &quot;misuse&quot; might be. For example, if parts of your genome are critical for development of some new treatment, would it be right for it or some derived work to be patented? </li></ul><ul><li>Assume that health care reform is passed, and a basic set of benefits is mandated. “Personalized medicine” can lead to more effective treatments, but there are costs to determine what is effective for an individual. For example, the cost of some genetic assays for breast cancer is on the order of $5000. Should these tests be a covered benefit, even if it increases overall costs--why or why not? </li></ul>
    49. 56. References <ul><li>Altman, R. B., & Mooney, S. D. (2006). Bioinformatics. In E. H. Shortliffe & J. J. Cimino (Eds.), Biomedical Informatics (pp. 763-789). New York, NY: Springer. </li></ul><ul><li>Bayat, A. (2002). Bioinformatics. British Medical Journal, 324, 1018-1022. </li></ul><ul><li>Brown, S. M. (2009). Using computers for molecular biology. NYU Medical Center Course G16.2604. Retrieved from </li></ul><ul><li>Chicurel, M. (2002). Bioinformatics: Putting it all together. Nature, 419, 751-757. </li></ul><ul><li>Collins, F. (2009, September). NIH, genomics, and health. Presentation at the NCHPEG 2009 Annual Meeting (see website for download). </li></ul><ul><li>ELSI Research Program. (Nov. 6, 2009). Retrieved November 11, 2009, from </li></ul><ul><li>Fenstermacher, D. (2005). Introduction to bioinformatics. Journal of the American Society for Information Science and Technology, 56(5), 440-446. </li></ul><ul><li>Giustini, D. (2007). Web 3.0 and medicine. British Medical Journal, 335, 1273-1274. </li></ul><ul><li>Goodman, N. (2002). Biological data becomes computer literate: New advances in bioinformatics. Current Opinion in Biotechnology, 13, 68-71. </li></ul><ul><li>Guttmacher, A. E. (2009, September). The future of human genome research and its implications for the education of health professionals. PowerPoint presentation at the NCHPEG 2009 Annual Meeting (see website for download). </li></ul><ul><li>Hudson, K. L., Holohan, M. K., & Collins, F. S. (2008). Keeping pace with the times--the Genetic Information Nondiscrimination Act of 2008. The New England Journal of Medicine, 358(25), 2661-2663. </li></ul><ul><li>Jaschik, S. (2009, Oct 29). DNA Swab for Your Job. Inside Higher Ed. Retrieved November 11, 2009, from </li></ul><ul><li>Lussier, Y. A., Sarkar, I. N., & Cantor, M. (2002). An integrative model for in-silico clinical-genomics discovery science. AMIA 2002 Annual Symposium Proceedings, 469-473. </li></ul><ul><li>Maojo, V., & Kulikowski, C. A. (2003). 
Bioinformatics and medical informatics: Collaborations on the road to genomic medicine? Journal of the American Medical Informatics Association, 10(6), 515-522. </li></ul><ul><li>McDaniel, A. M., Schutte, D. L., & Keller, L. O. (2008). Consumer health informatics: From genomics to population health. Nursing Outlook, 56, 216-223. </li></ul><ul><li>National Human Genome Research Institute. A guide to your genome. </li></ul><ul><li>Online Education Kit: Ethical, Legal and Social Implications of Genetic Knowledge. (Feb 13, 2009). Retrieved November 11, 2009, from </li></ul><ul><li>“Prohibiting Discrimination Based on Genetic Information; Interim Final Rules; HIPAA Administrative Simplification; Genetic Information Nondiscrimination Act; Proposed Rules.” Federal Register 74:193 (October 7, 2009), pp. 51644-51697. Available from ; accessed 11/13/09. </li></ul><ul><li>Ramoni, M. F. (2003). Population genetics in the post-genomic era. Presentation for HST950J Medical Computing. Boston, MA: Harvard University-MIT. </li></ul><ul><li>Robertson, J. A. (2003). The $1000 genome: Ethical and legal issues in whole genome sequencing of individuals. The American Journal of Bioethics, 3(3): Infocus. </li></ul><ul><li>U.S. Department of Health and Human Services. (October 1, 2009). New Rules Protect Patient’s Genetic Information. Retrieved November 13, 2009, from the World Wide Web: </li></ul><ul><li>Van Mulligen, E. M., Cases, M., Hettne, K., Molero, E., Weeber, M., Robertson, K. A., Oliva, B., de la Calle, G., & Maojo, V. (2008). Training multidisciplinary biomedical informatics students: Three years of experience. Journal of the American Medical Informatics Association, 15(2), 246-254. </li></ul>
    50. 57. Useful Bioinformatics Websites (Bayat, 2002) <ul><li>National Center for Biotechnology Information: maintains bioinformatic tools and databases </li></ul><ul><li>National Center for Genome Resources: links scientists to bioinformatics solutions by collaborations, data, and software development </li></ul><ul><li>Genbank: stores and archives DNA sequences from both large scale genome projects and individual laboratories </li></ul><ul><li>Unigene: gene sequence collection containing data on map location of genes in chromosomes </li></ul><ul><li>European Bioinformatic Institute: centre for research and services in bioinformatics; manages databases of biological data </li></ul><ul><li>Ensembl: automatic annotation database on genomes </li></ul><ul><li>BioInform: global bioinformatics news service </li></ul><ul><li>SWISS-PROT: important protein database with sequence data from all organisms, which has a high level of annotation (includes function, structure, and variations) and is minimally redundant (few duplicate copies) </li></ul><ul><li>International Society for Computational Biology: aims to advance scientific understanding of living systems through computation; has useful bioinformatic links </li></ul>Appendix B: Website List 1
    51. 58. Useful Bioinformatics Websites from “Group 5” <ul><li>US National Institutes of Health Roadmap for Research in Bioinformatics and Computational Biology ( </li></ul><ul><li>National Human Genome Research Institute ( </li></ul><ul><li>National Coalition for Health Professional Education in Genetics ( ) </li></ul><ul><li>NIH Roadmap for Research in Bioinformatics and Computational Biology ( </li></ul><ul><li>Genomics Law Report ( ) </li></ul><ul><li>Others will be posted on our wiki </li></ul>Appendix B: Website List 2
    52. 59. Appendix C: Some Terms to Know <ul><li>Alleles – </li></ul><ul><li>Forms of the same gene with small differences in their sequence of DNA bases </li></ul><ul><li>( ) </li></ul><ul><li>Gene – </li></ul><ul><li>A hereditary unit consisting of a sequence of DNA that occupies a specific location on a chromosome and determines a particular characteristic in an organism </li></ul><ul><li>(p 943 – Chapter on Biomedical Informatics) </li></ul><ul><li>Genome – </li></ul><ul><li>… all of an organism's genetic material. </li></ul><ul><li>( ) </li></ul><ul><li>Microarray ( gene chip or a DNA chip ) – </li></ul><ul><li>Microarrays consist of large numbers of molecules (often, but not always, DNA) distributed in rows in a very small space. Microarrays permit scientists to study gene expression by providing a snapshot of all the genes that are active in a cell at a particular time. </li></ul><ul><li>( ) </li></ul><ul><li>DNA – </li></ul><ul><li>Deoxyribonucleic acid is the hereditary material in humans and almost all other organisms. </li></ul><ul><li>( http:// ) </li></ul><ul><li>RNA – </li></ul><ul><li>Ribonucleic acid is a molecule similar to DNA. Unlike DNA, RNA is single-stranded. An RNA strand has a backbone made of alternating sugar (ribose) and phosphate groups. Attached to each sugar is one of four bases: adenine (A), uracil (U), cytosine (C), or guanine (G). Different types of RNA exist in the cell: messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA). More recently, some small RNAs have been found to be involved in regulating gene expression. </li></ul><ul><li>( http:// /glossary= rna ) </li></ul><ul><li>Transcription – </li></ul><ul><li>The first major step in gene expression, in which the information coded in DNA is copied into a molecule of RNA. 
</li></ul><ul><li>( ) </li></ul><ul><li>Translation – </li></ul><ul><li>The second major step in gene expression, in which the instructions encoded in RNA are carried out by making a protein or starting or stopping protein synthesis. </li></ul><ul><li>( ) </li></ul><ul><li>SNP (also called “snips”) – a variation of a single base pair </li></ul><ul><li>(Single Nucleotide Polymorphism) A DNA sequence variation that occurs when a single nucleotide in the genome is altered. For example, an SNP might change the nucleotide sequence AAGCCTA to AAGCTTA. A variation must occur in at least 1% of the population to be considered an SNP. </li></ul><ul><li>(p 985 -- Chapter on Biomedical Informatics) </li></ul>
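The SNP and transcription definitions above can be illustrated with a few lines of Python. This is only a minimal sketch: the function names `snp_positions` and `transcribe` are hypothetical, the sequences are the two from the SNP example, and transcription is simplified to the common convention of writing the mRNA as the coding strand with uracil (U) in place of thymine (T).

```python
def snp_positions(reference: str, sample: str) -> list:
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def transcribe(coding_strand: str) -> str:
    """Simplified transcription: copy the coding strand, replacing T with U."""
    return coding_strand.upper().replace("T", "U")


reference = "AAGCCTA"  # sequence from the SNP definition above
variant = "AAGCTTA"    # same sequence with one altered nucleotide

print(snp_positions(reference, variant))  # [4]
print(transcribe(reference))              # AAGCCUA
```

A single differing position is what makes this a single nucleotide change; whether it counts as an SNP also depends on the 1% population frequency criterion, which a two-sequence comparison cannot show.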
    53. 60. Bioterrorism Bioterrorism employs biological weapons to inflict damage on human populations, livestock and the environment.   It is largely a matter of microbiology, principally involving the use of micro-organisms and/or their toxins.   Appendix D: Bioterrorism
    54. 61. Perception of Bioterrorism risk: <ul><li>Developed countries are most concerned about anthrax, botulism, pneumonic plague, tularemia, and smallpox. </li></ul><ul><li>Developing countries are concerned with cholera, pneumonic plague, tularemia, smallpox, hemorrhagic viral infections and other contagious diseases. </li></ul><ul><li>Economic concern that animal and plant diseases or pests may be introduced into the food chain. </li></ul>Are Our Defences Against Bioterrorism Adequate? C Kameswara Rao
    55. 62. National Electronic Disease Surveillance System (NEDSS) <ul><li>This broad initiative is designed to: </li></ul><ul><ul><li>Detect outbreaks rapidly and monitor the health of the nation </li></ul></ul><ul><ul><li>Facilitate the electronic transfer of appropriate information from clinical information systems in the health care system to public health departments </li></ul></ul><ul><ul><li>Reduce provider burden in the provision of information </li></ul></ul><ul><ul><li>Enhance both the timeliness and quality of information provided </li></ul></ul>
    56. 63. Health Surveillance Systems <ul><li>The Centers for Disease Control & Prevention evaluates surveillance systems on the following: </li></ul><ul><ul><li>Indexing of frequency, severity, disparities, associated costs, preventability, potential clinical course and public interest. </li></ul></ul><ul><ul><li>Purpose and objectives </li></ul></ul><ul><ul><li>Planned uses of the data </li></ul></ul><ul><ul><li>Case definition/event under surveillance </li></ul></ul><ul><ul><li>Legal authority for data collection </li></ul></ul><ul><ul><li>Organizational home of system </li></ul></ul><ul><ul><li>Level of integration with other systems </li></ul></ul><ul><ul><li>Flowchart </li></ul></ul><ul><ul><li>Description (population, interval of data collection, data collected, reporting sources, data management, data analysis and dissemination, patient privacy, confidentiality, and system security, and records management) </li></ul></ul><ul><ul><li>Personnel requirements </li></ul></ul><ul><ul><li>Funding sources </li></ul></ul><ul><ul><li>Other resources </li></ul></ul>CDC table as quoted in “Roundtable on Bioterrorism Detection” by W.B. Lober, et al.
    57. 64. Is the Evaluation System Deficient? <ul><li>A 2004 study of the methods used to evaluate detection systems & diagnostic decision support systems found: </li></ul><ul><li>Of 35 evaluated systems, </li></ul><ul><li>--only 4 systems reported both sensitivity and specificity </li></ul><ul><li>--13 were evaluated against a reference standard </li></ul><ul><li>--31 systems were evaluated for timeliness </li></ul><ul><li>Most evaluations of detection systems and some evaluations of diagnostic systems for bioterrorism responses are critically deficient. </li></ul><ul><li>Because false-positive and false-negative rates are unknown for most systems, decision making on the basis of these systems is seriously compromised. </li></ul>“Evaluating Detection and Diagnostic Decision Support Systems for Bioterrorism Response” (D. Bravata et al., 100-108).
    58. 65. Evaluation Methods <ul><li>Both sensitivity & specificity of the system must be measured relative to an appropriate reference standard. </li></ul><ul><li>The reference standard should be applied to all samples, whether positive or negative. </li></ul><ul><li>The tests should be evaluated blind to the results of the reference standard. </li></ul><ul><li>Samples or patient populations need to resemble the populations in which the system will be used. </li></ul><ul><li>Detection systems should be evaluated under the most realistic conditions possible, which may be difficult for bioterrorism agents as conditions can range from hoaxes with no cases to real situations with a number of cases. </li></ul>
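To make the first point concrete, sensitivity and specificity can be computed by comparing a system's alerts against a reference standard applied to every sample. The sketch below is illustrative only: the function name `sensitivity_specificity` and the eight sample results are made up for the example and do not come from any of the systems reviewed.

```python
def sensitivity_specificity(system_flags, reference_truth):
    """Compare system alerts with a reference standard applied to all samples.

    Returns (sensitivity, specificity). A value is None when it is undefined,
    e.g. sensitivity when the reference standard finds no true cases.
    """
    if len(system_flags) != len(reference_truth):
        raise ValueError("every sample needs both a system result and a reference result")

    pairs = list(zip(system_flags, reference_truth))
    tp = sum(1 for flag, truth in pairs if flag and truth)          # true positives
    fn = sum(1 for flag, truth in pairs if not flag and truth)      # false negatives
    tn = sum(1 for flag, truth in pairs if not flag and not truth)  # true negatives
    fp = sum(1 for flag, truth in pairs if flag and not truth)      # false positives

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical results for eight samples (True = agent detected / present).
flags = [True, True, False, False, True, False, False, False]
truth = [True, False, False, False, True, True, False, False]

sens, spec = sensitivity_specificity(flags, truth)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Returning `None` when a denominator is zero mirrors the difficulty noted in the last bullet: in a hoax with no real cases there are no true positives or false negatives, so sensitivity cannot be estimated from that event at all.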
    59. 66. References for Bioterrorism Section Are Our Defences Against Bioterrorism Adequate? C Kameswara Rao Bioinformatics and Medical Informatics: Collaborations on the road to genomic medicine? V. Maojo, MD, C.A. Kulikowski. Journal of the American Medical Informatics Association 10(6), Nov/Dec 2003, 515-521. Roundtable on Bioterrorism Detection. W. B. Lober, MD, et al. Journal of the American Medical Informatics Association 9(2), Mar/Apr 2002, 105-115.