Bioinformatics Introduction

Introduction to Bioinformatics. Presented on May 9, 2013 at the Hospital La Fe in Valencia.


  • 1. Bioinformatics in medicine today
      David Montaner, dmontaner@cipf.es
      Centro de Investigación Príncipe Felipe, Institute of Computational Genomics
      9 May 2013 in Valencia
      David Montaner Bioinformatics in medicine 1/26
  • 2. Genomics
      “Progress in science depends on new techniques, new discoveries and new ideas, probably in that order.” Sydney Brenner, 1980
      Microarray devices and high-throughput sequencing allow us to measure thousands or millions of genomic characteristics.
  • 3. Genomics vs. genetics
      Genetics:
      • Single genes are responsible for biological changes.
      • one gene → one hypothesis → one p-value → conclusions
      Genomics:
      • Genes or genomic features act together to produce biological changes.
      • many genes → many hypotheses → many p-values → more data analysis
      • Computational support is needed even for drawing conclusions.
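When many hypotheses yield many p-values, a multiple-testing adjustment is usually applied before drawing conclusions. The sketch below uses the Benjamini-Hochberg procedure. This is an assumption chosen for illustration, since the slide does not name a method, and the p-values are made-up toy numbers.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values, one per gene (illustrative only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)

raw_hits = sum(p <= 0.05 for p in p_values)
adjusted_hits = sum(p <= 0.05 for p in adjusted)
print(raw_hits, adjusted_hits)
```

With these toy values, fewer genes pass the 0.05 threshold after adjustment than before. This is the kind of extra analysis step that genomics needs and single-gene genetics does not.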
  • 4. Genomic numbers
      Microarray:
      • 30,000 genes
      • 2 million SNPs
      • 100 Mb
      NGS:
      • 30,000 genes
      • 30,000 transcripts
      • 20 million SNPs
      • 10-100 GB
      Measured features:
      • genes, isoforms
      • SNPs, polymorphisms
      • indels
      • loss of heterozygosity
      • methylation
      • copy number alterations
      Registered information:
      • Genomic characteristics: position, chromosome ...
      • Biological function
      • Disease association
      • miRNA targets
  • 5. Genomic databases
      Nucleic Acids Research lists more than 1,500 online databases!
      • Many different databases for each category: which should I use?
      • No standards: different IDs, methods, servers, formats, ...
      • Lack of international initiatives; many small, local databases
      • Different gene IDs: more than 50
      • In vivo vs in silico databases
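To make the "different gene IDs" problem concrete, here is a minimal sketch of an identifier cross-reference table. The two genes and their NCBI Gene (Entrez), Ensembl and UniProt identifiers are given as an illustration. The table layout and the `translate_id` helper are hypothetical and not part of any database's API.

```python
# Small hand-written cross-reference table (illustrative, not exhaustive).
GENE_IDS = {
    "TP53": {"entrez": "7157", "ensembl": "ENSG00000141510", "uniprot": "P04637"},
    "BRCA1": {"entrez": "672", "ensembl": "ENSG00000012048", "uniprot": "P38398"},
}


def translate_id(value, source, target, table=GENE_IDS):
    """Translate an identifier from one naming scheme to another.

    `source` and `target` are one of "symbol", "entrez", "ensembl", "uniprot".
    Returns None when the identifier is not in the table.
    """
    for symbol, ids in table.items():
        known = dict(ids, symbol=symbol)
        if known.get(source) == value:
            return known.get(target)
    return None


print(translate_id("7157", "entrez", "ensembl"))
print(translate_id("P38398", "uniprot", "symbol"))
```

In practice such tables are obtained from the databases themselves rather than written by hand. The point is only that the same gene carries a different identifier in each resource.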
  • 6. Biological databases (Wikipedia)
      1 Primary nucleotide sequence databases
      2 Metadatabases
      3 Genome databases
      4 Protein sequence databases
      5 Proteomics databases
      6 Protein structure databases
      7 Protein model databases
      8 RNA databases
      9 Carbohydrate structure databases
      10 Protein-protein interactions
      11 Signal transduction pathway databases
      12 Metabolic pathway databases
      13 Experimental data repositories (Microarrays, NGS, Sanger)
      14 Exosomal databases
      15 Mathematical model databases
      16 PCR / real time PCR primer databases
      17 Specialized databases
      18 Taxonomic databases
      19 Wiki-style databases
  • 7. Primary nucleotide sequence databases
      Contain any kind of nucleotide sequences, from genes to genomes.
      The International Nucleotide Sequence Database (INSD) Collaboration:
      • GenBank: National Center for Biotechnology Information (NCBI)
      • European Nucleotide Archive (ENA): European Bioinformatics Institute (EBI)
      • DNA Data Bank of Japan (DDBJ)
  • 8. GenBank
      Primary nucleotide sequence databases
      • Available on the NCBI ftp site:
      • A new release is made every two months.
      • 3 types of entries:
          • CoreNucleotide (the main collection)
          • dbEST (Expressed Sequence Tags)
          • dbGSS (Genome Survey Sequences)
      Access:
      • Search for sequence identifiers using Entrez Nucleotide:
      • Align GenBank sequences to a query sequence using BLAST (Basic Local Alignment Search Tool).
      • Several other e-utilities (see book)
      See an example of a GenBank record.
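As a rough illustration of programmatic access, the sketch below builds an NCBI E-utilities `esearch` request URL for the nucleotide database. It only constructs the URL and does not send it. The search term and the `esearch_url` helper name are assumptions made for this example.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nucleotide", retmax=5):
    """Build an ESearch URL that returns up to `retmax` record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query: BRCA1 sequences from human (illustrative search term).
url = esearch_url("BRCA1[Gene Name] AND Homo sapiens[Organism]")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client. NCBI asks heavy users to respect its request-rate guidelines.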
  • 9. Metadatabases
      • Collect and organize data from primary nucleotide sequence databases and many other resources.
      • Make the information available in a convenient format and provide data handling resources: web pages, application programming interface (API) …
      • Focus on particular species, diseases …
      Examples
      • Entrez: searches through almost all NCBI resources.
      • GeneCards: provides genomic, proteomic, transcriptomic, genetic and functional information for human genes (known and predicted)
  • 10. Entrez
      Metadatabases
      • Searches through almost all NCBI resources.
      • Entrez search page:
      • Queries can be saved if you have a MyNCBI account
  • 11. Genome databases
      Collect genome sequences and annotation (specification about genes) for particular organisms, and try to improve them:
      • Data curation.
      • Complete missing information using in silico methods.
      • Generate new relational organization.
      • Complement feature IDs.
      • Provide “easy” access, visualization …
      Examples
      • Ensembl: automatic annotation on selected eukaryote genomes.
      • UCSC Genome Browser: reference sequence and working draft assemblies for a large collection of genomes
      • WormBase: genome of the model organism C. elegans.
  • 12. Ensembl
      Genome databases
      • Ensembl is a joint project between the European Bioinformatics Institute (EBI), part of the European Molecular Biology Laboratory (EMBL), and the Wellcome Trust Sanger Institute.
      • It develops a software system which produces and maintains automatic annotation on selected vertebrate and other eukaryote genomes.
      • http://www.ensembl.org
  • 13. UCSC Genome Browser
      Genome databases
      • UCSC: University of California, Santa Cruz.
      • This site contains the reference sequence and working draft assemblies for a large collection of genomes.
  • 14. Protein sequence databases
      • Most of the time, proteins are the final unit of interest in research.
      • There is a direct conversion from DNA/RNA sequences to protein sequences.
      • Gene IDs and protein IDs are used interchangeably by researchers (biologists, not bioinformaticians …)
      Examples
      • UniProt: Universal Protein Resource (EBI)
      • Swiss-Prot (Swiss Institute of Bioinformatics)
      • InterPro: classifies proteins into families and predicts the presence of domains and sites.
      • Pfam: protein families database of alignments and HMMs (Sanger Institute)
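The "direct conversion" from nucleotide to protein sequence can be sketched as a lookup in the standard genetic code. The snippet below is a minimal illustration that assumes the standard code (NCBI translation table 1), reading frame 1, and translation that stops at the first stop codon. The example sequences are toy inputs.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA or RNA coding sequence into one-letter amino acids."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    # Read complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))
```

Real annotation pipelines also deal with alternative genetic codes, reading frames, and ambiguous bases, which this sketch leaves out.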
  • 15. RNA databases
      • Contain information about RNA molecules.
      • Most of them concern gene regulatory factors. (Gene information is usually in other repositories.)
      Examples
      • miRBase: microRNAs
      • TRANSFAC: transcription factors in eukaryotes (proprietary database).
      • JASPAR: transcription factor binding sites for eukaryotes (open access, curated, non-redundant).
  • 16. Protein-protein interactions
      • Proteins are the main functional units.
      • But they do not work in isolation.
      • Pretty useless at the moment but promising in the future …
      • Some information is experimental, but most of it is generated in silico.
      Examples
      • IntAct: protein–small molecule and protein–nucleic acid interactions.
      • BIND: Biomolecular Interaction Network Database.
  • 17. Signal transduction pathway databases & Metabolic pathway databases
      • Information about how genes (or proteins) interact with each other.
      • Not only physical interactions …
      Examples
      • Reactome: free online database of biological pathways.
      • KEGG: Kyoto Encyclopedia of Genes and Genomes. Metabolic pathways.
  • 18. KEGG
      Metabolic pathway databases
  • 19. Experimental data repositories
      Contain microarray, NGS, Sanger, and other experimental high-throughput data.
      • GEO: Gene Expression Omnibus (NCBI)
      • ArrayExpress: database of functional genomics experiments (EBI)
      • The Cancer Genome Atlas (TCGA): data on different cancer-related tissues.
  • 20. Bioinformatics
      Training
      • Biology 1/3
      • Statistics 1/3
      • Computer science 1/3 ←
      Efficiently combine:
      • Experimental information
      • Database registered knowledge
      Time and resources:
      • As in the wet lab
  • 21. Example
  • 22. Example I
      Autistic children
      1 (microarray) NGS data processing
          • data quality control, filtering...
          • map against reference genome
          • CNV calling
      2 CNV filtering
          • just 75 rare de novo CNV events (not registered in databases)
          • filter out the long ones
          • keep the ones that contain genes
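The CNV filtering step above can be sketched as three simple predicates: not already registered in databases, not too long, and overlapping at least one gene. The record layout, the length cutoff, and the toy CNVs below are all hypothetical. The slide gives neither the cutoff nor the data format.

```python
from dataclasses import dataclass


@dataclass
class CNV:
    chrom: str
    start: int
    end: int
    in_database: bool  # already registered in a CNV database?
    genes: tuple       # names of genes overlapped by the CNV

    @property
    def length(self):
        return self.end - self.start


# Hypothetical cutoff for "long" CNVs (not taken from the slides).
MAX_LENGTH = 5_000_000


def filter_cnvs(cnvs, max_length=MAX_LENGTH):
    """Keep unregistered CNVs that are not too long and contain genes."""
    return [
        c for c in cnvs
        if not c.in_database and c.length <= max_length and c.genes
    ]


# Toy input with placeholder gene names.
cnvs = [
    CNV("chr1", 1_000_000, 1_200_000, False, ("GENE_A", "GENE_B")),
    CNV("chr2", 5_000_000, 15_000_000, False, ("GENE_C",)),  # too long
    CNV("chr3", 2_000_000, 2_050_000, True, ("GENE_D",)),    # already known
    CNV("chr4", 7_000_000, 7_010_000, False, ()),            # no genes
]
kept = filter_cnvs(cnvs)
print([(c.chrom, c.genes) for c in kept])
```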
  • 23. Example II
      3 Move to the gene level
          • 47 loci in total affecting 433 human genes
      4 Building the background likelihood network
          • GO annotations
          • KEGG pathways
          • InterPro domains
          • protein-protein interactions. Databases: BIND, BioGRID, DIP, HPRD, InNetDB, IntAct, BiGG, MINT, and MIPS
          • sequence homology between the gene pair (BLAST)
  • 24. Example III
      5 Search for high scoring clusters affected by CNVs
      6 Evaluating significance of cluster scores: 10,000 simulations
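One common way to evaluate significance by simulation is an empirical p-value: the fraction of simulated scores that are at least as large as the observed one. The sketch below assumes that reading. How the simulations were actually generated is not stated on the slide, so the random gene-set null model, the scores, and the function names are illustrative only.

```python
import random


def empirical_p_value(observed, simulate, n_sim=10_000, seed=0):
    """Estimate P(simulated score >= observed) from n_sim simulations.

    `simulate` is a function that takes a random.Random instance and
    returns one simulated score. One is added to the numerator and the
    denominator so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    hits = sum(simulate(rng) >= observed for _ in range(n_sim))
    return (hits + 1) / (n_sim + 1)


# Toy null model: a cluster score is the sum of the scores of 5 genes
# drawn at random from a background list (hypothetical numbers).
background = [i / 100 for i in range(100)]


def random_cluster_score(rng, size=5):
    return sum(rng.sample(background, size))


p = empirical_p_value(observed=4.0, simulate=random_cluster_score)
print(p)
```

With 10,000 simulations the smallest p-value this estimator can report is 1/10,001, which sets the resolution of the test.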
  • 25. Example IV
      7 Functional characterization of the identified network
      8 And, finally, draw conclusions
  • 26. Questions
      Thanks