This document discusses the role of bioinformatics in medicine today. It begins by explaining how genomics differs from genetics in studying many genes and genomic features together rather than single genes. It then describes some of the key genomic databases that are used in bioinformatics, including primary sequence databases like GenBank, metadatabases like Entrez, genome databases like Ensembl and UCSC, and pathway and protein databases. The document provides an example of how bioinformatics is used to analyze autism data, including processing sequencing data, identifying copy number variations, mapping genes, building networks, and identifying significant clusters to understand autism better.
Used for a seminar on business opportunities in the health sector.
Official Master in Entrepreneurship and Business Management; Faculty of Economics of the University of Valencia.
http://www.genometra.com/seminar-on-business-opportunities
1. Bioinformatics in medicine today
David Montaner
dmontaner@cipf.es
Centro de Investigación Príncipe Felipe
Institute of Computational Genomics
9 May 2013
in Valencia
David Montaner Bioinformatics in medicine 1/26
2. Genomics
“Progress in science depends on new techniques, new
discoveries and new ideas, probably in that order.”
Sydney Brenner, 1980
Microarray devices and high-throughput sequencing allow us to measure thousands or millions of genomic characteristics.
3. Genomics vs. genetics
Genetics:
• Single genes are responsible for biological changes.
• one gene → one hypothesis → one p-value → conclusions
Genomics:
• Genes or genomic features act together to produce biological changes.
• many genes → many hypotheses → many p-values → more data analysis
• Computational support is needed even to draw conclusions
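To make the "many p-values" point concrete: suppose 30,000 genes are each tested at a 0.05 threshold and none of them is truly affected. About 30,000 × 0.05 = 1,500 genes will still look significant by chance. One widely used correction, not named on the slide, is the Benjamini-Hochberg false discovery rate procedure. Below is a minimal Python sketch of it; the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha (BH step-up)."""
    m = len(pvalues)
    # Sort hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k / m * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])

# Hypothetical p-values for five genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.30]
print(benjamini_hochberg(pvals))  # [0, 1]
```

An uncorrected 0.05 cutoff would call four of these five hypothetical genes significant. The procedure keeps only the first two.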
4. Genomic numbers
Microarray:
• 30,000 genes
• 2 million SNPs
• 100 Mb
Measured features:
• genes, isoforms
• SNPs, polymorphisms
• indels
• loss of heterozygosity
• methylation
• copy number alterations
NGS:
• 30,000 genes
• 30,000 transcripts
• 20 million SNPs
• 10-100 GB
Registered information:
• Genomic characteristics: position, chromosome ...
• Biological function
• Disease association
• miRNA targets
5. Genomic databases
Nucleic Acids Research lists more than 1,500 online databases!
http://www.oxfordjournals.org/nar/database/c
• Many different databases for each category: which one should I use?
• No standards: different IDs, methods, servers, formats, ...
• Lack of international initiatives; many small, local databases
• More than 50 different kinds of gene ID
• In vivo vs. in silico databases
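Each resource uses its own identifiers, so analyses often begin by mapping between ID systems. Below is a minimal sketch that assumes a small hand-written mapping table. In practice such tables are downloaded from a resource like Ensembl BioMart or NCBI Gene. The two rows for TP53 and BRCA1 use real identifiers, but the table and the helper names are illustrative only.

```python
# Hypothetical mapping table: (gene symbol, Entrez Gene ID, Ensembl gene ID).
MAPPING = [
    ("TP53", "7157", "ENSG00000141510"),
    ("BRCA1", "672", "ENSG00000012048"),
]
COLUMNS = {"symbol": 0, "entrez": 1, "ensembl": 2}

def build_index(rows, src, dst):
    """Map column `src` to column `dst`; one source ID may map to several targets."""
    index = {}
    for row in rows:
        index.setdefault(row[src], set()).add(row[dst])
    return index

def convert(ids, src, dst, rows=MAPPING):
    """Translate IDs between systems; unmapped IDs get an empty list."""
    index = build_index(rows, COLUMNS[src], COLUMNS[dst])
    # Keeping unmapped IDs (instead of dropping them) makes losses visible.
    return {i: sorted(index.get(i, set())) for i in ids}

print(convert(["7157", "99999"], "entrez", "ensembl"))
# {'7157': ['ENSG00000141510'], '99999': []}
```

Mapping to sets rather than single values reflects that ID systems do not always correspond one-to-one.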
6. Biological databases (Wikipedia)
1 Primary nucleotide sequence databases
2 Metadatabases
3 Genome databases
4 Protein sequence databases
5 Proteomics databases
6 Protein structure databases
7 Protein model databases
8 RNA databases
9 Carbohydrate structure databases
10 Protein-protein interactions
11 Signal transduction pathway databases
12 Metabolic pathway databases
13 Experimental data repositories (microarrays, NGS, Sanger)
14 Exosomal databases
15 Mathematical model databases
16 PCR / real-time PCR primer databases
17 Specialized databases
18 Taxonomic databases
19 Wiki-style databases
7. Primary nucleotide sequence databases
Contain any kind of nucleotide sequence, from genes to genomes.
The International Nucleotide Sequence Database Collaboration (INSDC):
• GenBank
National Center for Biotechnology Information (NCBI)
• European Nucleotide Archive (ENA)
European Bioinformatics Institute (EBI)
• DNA Data Bank of Japan (DDBJ)
8. GenBank
Primary nucleotide sequence databases
• Available on the NCBI FTP site:
http://www.ncbi.nlm.nih.gov/Ftp/
• A new release is made every two months.
• 3 types of entries:
• CoreNucleotide (the main collection)
• dbEST (Expressed Sequence Tags)
• dbGSS (Genome Survey Sequences)
Access:
• Search for sequence identifiers using Entrez Nucleotide:
http://www.ncbi.nlm.nih.gov/nucleotide/
• Align GenBank sequences to a query sequence using
BLAST (Basic Local Alignment Search Tool).
http://blast.ncbi.nlm.nih.gov/Blast.cgi
• Several other E-utilities (see book)
See an example of a GenBank record.
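Records can also be retrieved from a script through the NCBI E-utilities. Below is a minimal sketch that only builds an EFetch request URL for a GenBank flat-file record. The base URL and the `db`, `id`, `rettype` and `retmode` parameters follow the E-utilities documentation. The accession NM_000546 is just an example. NCBI asks frequent users to identify themselves and to respect rate limits; this sketch handles neither.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_genbank_url(accession):
    """Build an EFetch URL that returns a nucleotide record as GenBank flat text."""
    params = {"db": "nucleotide", "id": accession, "rettype": "gb", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)

print(efetch_genbank_url("NM_000546"))
```

Opening that URL (for example with `urllib.request.urlopen`) returns the record as plain text. The network call is left out so the sketch runs offline.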
9. Metadatabases
• Collect and organize data from primary nucleotide sequence databases and many other resources.
• Make the information available in a convenient format and
provide data handling resources: web pages, application
programming interface (API) …
• Focus on particular species, diseases …
Examples
• Entrez: searches through almost all NCBI resources.
http://www.ncbi.nlm.nih.gov/sites/gquery
• GeneCards: provides genomic, proteomic, transcriptomic,
genetic and functional information for human genes (known
and predicted)
http://www.genecards.org/
10. Entrez
Metadatabases
• Searches through almost all NCBI resources.
• Entrez search page: http://www.ncbi.nlm.nih.gov/sites/gquery
• Queries can be saved if you have a My NCBI account:
http://www.ncbi.nlm.nih.gov/
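The same kind of search can be scripted, one database at a time, through the E-utilities ESearch endpoint. Below is a minimal sketch that builds the request URL. The query uses standard Entrez field tags (`[sym]` for gene symbol, `[orgn]` for organism) and is only an example.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(db, term, retmax=20):
    """Build an ESearch URL; the response lists the UIDs matching `term` in `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)

print(esearch_url("gene", "TP53[sym] AND human[orgn]"))
```

The UIDs returned by ESearch can then be passed to EFetch or ESummary to retrieve the records themselves.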
11. Genome databases
Collect genome sequences and annotation (information about genes) for particular organisms, and try to improve them:
• Data curation.
• Complete missing information using in silico methods.
• Generate a new relational organization of the data.
• Complement feature IDs.
• Provide “easy” access, visualization …
Examples
• Ensembl: automatic annotation of selected eukaryote genomes.
• UCSC Genome Browser: reference sequence and working draft assemblies for a large collection of genomes.
• WormBase: genome of the model organism C. elegans.
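Genome databases distribute their annotation in tabular formats. Ensembl, for instance, provides GTF files. Each line has nine tab-separated columns, and the last column holds `key "value";` attributes. Below is a minimal parser sketch for one such line. The example line is hypothetical and shortened. Repeated attribute keys, which real files can contain, would be collapsed to the last value here.

```python
import re

GTF_FIELDS = ["seqname", "source", "feature", "start", "end",
              "score", "strand", "frame", "attributes"]

def parse_gtf_line(line):
    """Split one GTF line into its nine columns and parse the attribute column."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GTF line has exactly 9 tab-separated columns")
    record = dict(zip(GTF_FIELDS, cols))
    record["start"], record["end"] = int(record["start"]), int(record["end"])
    # Attributes look like: gene_id "X"; gene_name "Y";
    record["attributes"] = dict(re.findall(r'(\S+) "([^"]*)"', cols[8]))
    return record

line = "17\thypothetical\tgene\t100\t500\t.\t-\t.\tgene_id \"G0001\"; gene_name \"GENE1\";"
rec = parse_gtf_line(line)
print(rec["feature"], rec["start"], rec["attributes"]["gene_name"])  # gene 100 GENE1
```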
12. Ensembl
Genome databases
• Ensembl is a joint project between the European Bioinformatics Institute (EBI), part of the European Molecular Biology Laboratory (EMBL), and the Wellcome Trust Sanger Institute.
• It develops a software system that produces and maintains automatic annotation of selected vertebrate and eukaryote genomes.
• http://www.ensembl.org
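Ensembl can also be queried from a script. Below is a minimal sketch that builds request URLs for two lookup endpoints of the Ensembl REST API (`/lookup/id/` and `/lookup/symbol/`). The gene is only an example. The call that actually fetches the JSON is left out so the sketch runs offline.

```python
from urllib.parse import quote

REST = "https://rest.ensembl.org"

def lookup_id_url(stable_id):
    """URL returning metadata (location, biotype, ...) for one Ensembl stable ID."""
    return f"{REST}/lookup/id/{quote(stable_id)}?content-type=application/json"

def lookup_symbol_url(species, symbol):
    """URL for looking up a gene by species name and gene symbol."""
    return f"{REST}/lookup/symbol/{quote(species)}/{quote(symbol)}?content-type=application/json"

print(lookup_id_url("ENSG00000141510"))
print(lookup_symbol_url("homo_sapiens", "TP53"))
```

A plain HTTP GET on either URL (e.g. with `urllib.request.urlopen`) returns a JSON object describing the gene.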
13. UCSC Genome Browser
Genome databases
• UCSC: University of California, Santa Cruz.
• This site contains the reference sequence and working
draft assemblies for a large collection of genomes.
• http://genome.ucsc.edu/
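One practical detail when moving data in and out of the UCSC Genome Browser concerns coordinates. Positions shown in the browser (e.g. `chr17:1,001-2,000`) are 1-based and inclusive. BED files use 0-based, half-open intervals. Below is a minimal conversion sketch; the region is hypothetical.

```python
import re

def position_to_bed(position):
    """Convert a 1-based, inclusive browser position to a 0-based, half-open BED interval."""
    m = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if m is None:
        raise ValueError(f"unrecognised position: {position!r}")
    chrom = m.group(1)
    start, end = (int(g.replace(",", "")) for g in m.group(2, 3))
    # Only the start shifts: base 1,001 (1-based) is offset 1,000 (0-based),
    # and an inclusive 1-based end equals the exclusive 0-based end.
    return chrom, start - 1, end

print(position_to_bed("chr17:1,001-2,000"))  # ('chr17', 1000, 2000)
```

In both systems the region spans the same 1,000 bases. Getting this off by one is a common source of errors when combining files.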
14. Protein sequence databases
• Proteins are often the final unit of interest in research.
• There is a direct conversion from DNA/RNA sequences to protein sequences.
• Gene IDs and protein IDs are used interchangeably by researchers (biologists, not bioinformaticians …)
Examples
• UniProt: Universal Protein Resource (EBI)
• Swiss-Prot (Swiss Institute of Bioinformatics)
• InterPro: classifies proteins into families and predicts the presence of domains and sites.
• Pfam: protein families database of alignments and hidden Markov models (HMMs) (Sanger Institute)
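The "direct conversion" from DNA/RNA to protein is translation through the genetic code. Below is a minimal sketch of it. It assumes the standard code only (mitochondrial and other variant codes differ), reads from the first base (frame 0), and by default stops at the first stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq, to_stop=True):
    """Translate a DNA or RNA coding sequence (frame 0) into a protein sequence."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCTAA"))  # MA
print(translate("AUGUUUUGA"))  # MF
```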
15. RNA databases
• Contain information about RNA molecules.
• Most of them concern gene regulatory factors (gene information is usually held in other repositories).
Examples
• miRBase: microRNAs
http://www.mirbase.org/
• TRANSFAC: transcription factors in eukaryotes (proprietary database).
• JASPAR: transcription factor binding sites for eukaryotes (open access, curated, non-redundant).
http://jaspar.genereg.net/
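JASPAR stores binding profiles as frequency matrices. A typical use is to scan a DNA sequence for the best-matching site. Below is a minimal sketch. The 4-position count matrix is hypothetical, not a JASPAR profile. The sketch also assumes a uniform background, a pseudocount of 0.5, and an uppercase A/C/G/T input sequence.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict of base counts per position).
COUNTS = [
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 7, "C": 1, "G": 1, "T": 1},
]

def log_odds(counts, pseudocount=0.5, background=0.25):
    """Turn a count matrix into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2((n + pseudocount) / total / background)
                    for b, n in column.items()})
    return pwm

def best_site(seq, pwm):
    """Score every window of the sequence; return (best_start, best_score)."""
    width = len(pwm)
    scores = [sum(pwm[j][seq[i + j]] for j in range(width))
              for i in range(len(seq) - width + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

pwm = log_odds(COUNTS)
print(best_site("GGGTATAGG", pwm))  # best window starts at index 3 ("TATA")
```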
16. Protein-protein interactions
• Proteins are the main functional units.
• But they do not work in isolation.
• Of limited use at the moment, but promising for the future …
• Some information is experimental, but most of it is generated in silico.
Examples
• IntAct: protein–small molecule and protein–nucleic acid interactions.
• BIND: Biomolecular Interaction Network Database.
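Because proteins do not work in isolation, interaction data are usually handled as a network. Below is a minimal sketch that builds an undirected network from a list of interacting pairs and counts each protein's partners (its degree). The pairs are hypothetical. Duplicate reports collapse into one edge, and self-interactions are ignored in this sketch.

```python
from collections import defaultdict

# Hypothetical interaction pairs, e.g. as exported from an interaction database.
INTERACTIONS = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "B")]

def build_network(pairs):
    """Undirected interaction network as an adjacency map; duplicate pairs collapse."""
    graph = defaultdict(set)
    for p, q in pairs:
        if p != q:  # self-interactions are ignored in this sketch
            graph[p].add(q)
            graph[q].add(p)
    return graph

net = build_network(INTERACTIONS)
degrees = {protein: len(partners) for protein, partners in net.items()}
print(sorted(degrees.items()))  # [('A', 2), ('B', 2), ('C', 3), ('D', 1)]
```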
17. Signal transduction pathway databases & metabolic pathway databases
• Information about how genes (or proteins) interact with one another.
• Not only physical interactions …
Examples
• Reactome: free online database of biological pathways.
http://www.reactome.org
• KEGG: Kyoto Encyclopedia of Genes and Genomes.
Metabolic pathways.
http://www.genome.jp/kegg/pathway.html
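A common use of pathway databases such as KEGG or Reactome is to ask whether a gene list contains more members of a pathway than expected by chance. Below is a minimal sketch of the one-sided hypergeometric test often used for this. The test choice is ours, not the slide's, and the counts in the example are hypothetical.

```python
from math import comb

def enrichment_pvalue(N, K, n, k):
    """P(X >= k): chance of at least k pathway genes when drawing n genes
    at random from N genes, K of which belong to the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Hypothetical: 20,000 genes, 100 in the pathway, 200 in our list, 10 of them overlap.
# By chance we would expect n * K / N = 1 overlapping gene.
print(enrichment_pvalue(20_000, 100, 200, 10))
```

When many pathways are tested at once, the resulting p-values again need a multiple-testing correction.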
19. Experimental data repositories
Contain microarray, NGS, Sanger and other experimental high-throughput data.
• GEO: Gene Expression Omnibus (NCBI)
http://www.ncbi.nlm.nih.gov/geo/
• ArrayExpress: database of functional genomics experiments (EBI)
http://www.ebi.ac.uk/arrayexpress/
• The Cancer Genome Atlas (TCGA): data on different cancer-related tissues.
http://cancergenome.nih.gov/
20. Bioinformatics
Training
• Biology 1/3
• Statistics 1/3
• Computer science 1/3 ←
Efficiently combine:
• Experimental information
• Database registered knowledge
Time and resources:
• As in the wet lab
22. Example I
Autistic children
1 (Microarray) NGS data processing
• Data quality control, filtering ...
• Mapping against the reference genome
• CNV (copy number variation) calling
2 CNV filtering
• Only 75 rare de novo CNV events (not registered in databases)
• Filter out the long ones
• Keep the ones that contain genes
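The filtering in step 2 can be sketched as follows. The record layout, the `in_databases` flag and the length cutoff are hypothetical placeholders; the slides do not give the study's actual threshold. Gene overlap is used here as a stand-in for "contain genes". Requiring full containment would be a stricter variant.

```python
from dataclasses import dataclass

@dataclass
class CNV:
    chrom: str
    start: int          # 0-based, half-open coordinates
    end: int
    de_novo: bool       # absent from both parents
    in_databases: bool  # already registered in public CNV databases

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def filter_cnvs(cnvs, genes, max_length=5_000_000):
    """Return (cnv, gene_names) for rare de novo CNVs that pass the filters.

    genes: list of (chrom, start, end, name). max_length is a hypothetical cutoff.
    """
    kept = []
    for cnv in cnvs:
        if not cnv.de_novo or cnv.in_databases:
            continue  # keep only rare de novo events
        if cnv.end - cnv.start > max_length:
            continue  # filter out the long ones
        names = [name for chrom, s, e, name in genes
                 if chrom == cnv.chrom and overlaps(cnv.start, cnv.end, s, e)]
        if names:
            kept.append((cnv, names))  # keep the ones that touch genes
    return kept

genes = [("chr1", 100, 200, "GENE_A"), ("chr1", 1000, 1100, "GENE_B")]
cnvs = [
    CNV("chr1", 150, 300, True, False),       # kept: overlaps GENE_A
    CNV("chr1", 400, 500, True, False),       # dropped: no gene
    CNV("chr1", 0, 10_000_000, True, False),  # dropped: too long
    CNV("chr1", 1050, 1200, False, False),    # dropped: inherited
]
result = filter_cnvs(cnvs, genes)
print([(c.start, names) for c, names in result])  # [(150, ['GENE_A'])]
```

The gene names collected for each kept CNV are also what the next step, moving to the gene level, works with.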
23. Example II
3 Move to the gene level
• 47 loci in total, affecting 433 human genes
4 Build the background likelihood network
• GO annotations
• KEGG pathways
• InterPro domains
• Protein-protein interactions. Databases: BIND, BioGRID, DIP, HPRD, InNetDB, IntAct, BiGG, MINT and MIPS
• Sequence homology between the gene pair (BLAST)
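The slides do not say how the evidence sources in step 4 are combined into edge weights. One common approach, shown here only as an assumption, treats the sources as independent and multiplies their likelihood ratios (a naive Bayes combination), which means summing their logarithms. Below is a minimal sketch of that idea. The ratio values are made up. The sketch also treats absent evidence as neutral (ratio 1), a simplification.

```python
import math

def edge_log_likelihood(evidence, likelihood_ratios):
    """Sum of log likelihood ratios for the evidence observed for one gene pair.

    evidence: set of source names supporting the pair, e.g. {"GO", "PPI"}.
    likelihood_ratios: hypothetical per-source P(evidence | related) / P(evidence | unrelated).
    Assumes independent sources (naive Bayes); absent evidence counts as neutral.
    """
    return sum(math.log(likelihood_ratios[s]) for s in evidence)

# Made-up ratios for illustration only.
LR = {"GO": 4.0, "KEGG": 6.0, "InterPro": 2.0, "PPI": 10.0, "BLAST": 3.0}
score = edge_log_likelihood({"GO", "PPI"}, LR)
print(round(math.exp(score), 6))  # 40.0, i.e. 4 x 10
```

Real evidence sources are correlated (for example, GO annotations are partly derived from interaction data), so a published method may weight them differently.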
24. Example III
5 Search for high-scoring clusters affected by CNVs
6 Evaluate the significance of cluster scores: 10,000 simulations
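Step 6 can be read as an empirical (permutation-style) p-value. Cluster scores are computed on many randomized data sets, and we count how often a random score is at least as high as the observed one. The slides do not describe how the study randomized its data. The `simulate` function below is a hypothetical placeholder that draws null scores from a standard normal distribution.

```python
import random

def empirical_pvalue(observed, simulate, n_sim=10_000, seed=0):
    """Fraction of simulated scores >= observed, with the usual +1 correction
    so the estimate is never exactly zero."""
    rng = random.Random(seed)
    hits = sum(simulate(rng) >= observed for _ in range(n_sim))
    return (hits + 1) / (n_sim + 1)

# Hypothetical null model: cluster scores drawn from a standard normal distribution.
p = empirical_pvalue(3.0, lambda rng: rng.gauss(0.0, 1.0))
print(p)
```

With 10,000 simulations the smallest attainable p-value is 1/10,001. Stronger claims would need more simulations.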
25. Example IV
7 Functional characterization of the identified network
8 And, finally, draw conclusions