Basic bioinformatics concepts, databases and tools
Module 5: Genome browsers and interpretation of gene lists
Dr. Joachim Jacob, http://www.bits.vib.be
Updated 21 July 2011
http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod5-intro_H1_2011_genomebrowsers.pdf
Integrating biological information
- Genome databases and browsers: integrate all biological information on a per-species basis. Ensembl Genome Browser http://www.ensembl.org/
- Table browsers: retrieve biological (not only sequence) data by applying various criteria. Biomart http://www.biomart.org/
- Interpreting gene lists: 'What is the biology behind my gene list?' DAVID http://david.abcc.ncifcrf.gov/
Reference genome sequences provide a standard genome sequence per species
A genome is assembled from various sequence sources:
- By NCBI: human is currently at assembly (or 'build') 37 (2010)
- By Celera: commercial
Each build differs! A build is produced in three steps:
1. Data freeze: the data used for assembling is fixed (new data from that point on is ignored)
2. Assembly process and annotation
3. Release of the build: the Reference Sequence Genome
http://www.ncbi.nlm.nih.gov/Genomes/
 
Finding your way in genomes: annotation and terms
See also the NCBI Handbook.
- Locus: a place on the genome, roughly a gene (which can have different alleles)
- Rough location: by staining of chromosomes, e.g. 18q12.1 -> chromosome 18, long arm (= q; the short arm is p)
- Exact location: base coordinates on the genome (the assembly must be mentioned!)
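To make the band notation concrete, here is a minimal Python sketch that splits a label such as 18q12.1 into chromosome, arm and band. The function name `parse_cytoband` and the pattern are illustrative assumptions, not part of any tool mentioned in this module, and the pattern only covers simple human-style labels.

```python
import re

# Hypothetical helper for illustration: chromosome (1-2 digits, X or Y),
# arm letter (p or q), then the band number with an optional sub-band.
BAND_PATTERN = re.compile(
    r"^(?P<chrom>[0-9]{1,2}|X|Y)(?P<arm>[pq])(?P<band>[0-9]+(?:\.[0-9]+)?)$"
)

def parse_cytoband(label):
    """Split a cytogenetic location such as '18q12.1' into its parts."""
    match = BAND_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a recognised band label: {label!r}")
    return {
        "chromosome": match["chrom"],
        # q is the long arm, p is the short arm
        "arm": "long" if match["arm"] == "q" else "short",
        "band": match["band"],
    }

location = parse_cytoband("18q12.1")
print(location)
```

For 18q12.1 this gives chromosome 18, long arm, band 12.1. An exact location would instead be a base range, always quoted together with the assembly it refers to.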
Genome browsers: main players
Three main players:
- MapViewer (NCBI)
- UCSC Genome Browser
- Ensembl Genome Browser
See also: BITS UCSC Genome Browser training, BITS Ensembl Genome Browser training
Ensembl Genome Browser
We will use this browser in this session.
- Information is a combination of automatic annotation and manually curated sources (ENS >< Havana (Vega) genes)
- All entries can be accessed through the browser, each with its own clear identifier
http://www.ensembl.org: information about the genomes (28 November 2009)
http://www.ensemblgenomes.org
… or click on the figure feature!
Screenshot labels: TAB, SUMMARY, DETAILED INFORMATION, INFORMATION SELECTOR, DATA MANAGER tab, DAS
Ensembl Genome Browser: usefulness
- One place for all information on a particular gene / structure / location / variation
- But also: comparison to other species
The Ensembl team has many training movies and examples available. Check them out!
http://www.ensembl.org/info/index.html
http://www.ensembl.org/Help/Movie?id=188
Tracks are a way to display information on a genome sequence
Annotation on a genome-wide scale is displayed in tracks. Relevant database content can be formatted as tracks and displayed on a reference genome.
[Screenshot of the Ensembl genome browser: genome reference tracks]
Tracks are a way to display information on a genome sequence
Annotation on a genome-wide scale is displayed in tracks. The most used formats:
- each base receives a value (dense continuous data): WIG format (e.g. %GC)
- annotation has a start and a stop coordinate: BED format (e.g. gene annotations)
http://www.ensembl.org/info/website/upload/bed.html
Variations in genomes are reported in VCF format. Example:
http://www.bits.vib.be/wiki/index.php/.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ
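As an illustration of how such a record is laid out, the following Python sketch reads the fixed columns of one VCF data line into a dictionary. The helper name `parse_vcf_record` is an assumption made for this example. Real VCF files are tab-separated and have richer rules than shown here, so a dedicated parser is the better choice for actual data.

```python
def parse_vcf_record(line):
    """Read the eight fixed VCF columns of one data line into a dict.

    Simplification: real VCF is tab-separated; str.split() is used here
    so that the space-separated lines from the slide also work.
    """
    fields = line.split()
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags (e.g. DB).
    info_dict = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                        # 1-based position
        "id": None if variant_id == "." else variant_id,
        "ref": ref,
        "alt": alt.split(","),                  # several ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }

record = parse_vcf_record(
    "20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ"
)
print(record["chrom"], record["pos"], record["id"], record["info"]["AF"])
```

The first record passes all filters (PASS), while the second one on the slide failed the q10 filter and has no identifier (the "." in the ID column).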
Biomart, your one-stop portal to fetch information
Biomart http://www.biomart.org/
These questions are easy: 'Hey, can you tell me how many genes exist in mouse which regulate transcription and are located on chromosome 19?'
Data sources involved: Ensembl Genes, genome sequence (Ensembl), Gene Ontology (GO:0009299)
Biomart, your one-stop portal to fetch information
Biomart http://www.biomart.org/
- The translated question is reflected in the choice of database and of Filters
- The resulting genes are counted, and the output is set via Attributes
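The database / Filters / Attributes split can be made concrete with a small Python sketch that builds a BioMart-style XML query for the mouse question. Everything specific in it is an assumption to verify in the BioMart web interface: the dataset name `mmusculus_gene_ensembl`, the filter names `chromosome_name` and `go_parent_term`, and the attribute `ensembl_gene_id`. The sketch only builds the query text and does not contact any server.

```python
import xml.etree.ElementTree as ET

def build_biomart_query(dataset, filters, attributes, count=False):
    """Build a BioMart-style XML query string (illustrative helper)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "count": "1" if count else "",   # "1" asks for a count instead of rows
        "datasetConfigVersion": "0.6",
    })
    # Database choice: one Dataset element
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which genes are selected
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes define the output columns
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")

# Assumed names; check them against the dataset's filter and attribute lists.
xml_query = build_biomart_query(
    dataset="mmusculus_gene_ensembl",
    filters={"chromosome_name": "19", "go_parent_term": "GO:0009299"},
    attributes=["ensembl_gene_id"],
    count=True,
)
print(xml_query)
```

The structure mirrors the web interface: pick a dataset, narrow it down with filters, then choose the attributes to report.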
Biomart is available for an increasing number of databases Biomart http://www.biomart.org/
Gene lists resulting from different analyses can reveal their biology
DAVID - http://david.abcc.ncifcrf.gov/ (demo)
Alternatives:
- g:Profiler http://biit.cs.ut.ee/gprofiler/
- Babelomics http://www.babelomics.org/
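Tools of this kind typically ask whether an annotation term occurs in a gene list more often than expected by chance. The Python sketch below shows that idea with a plain hypergeometric tail probability. It is not DAVID's own statistic (DAVID reports the EASE score, a modified Fisher exact test), and the function name and the numbers in the usage line are purely hypothetical.

```python
from math import comb

def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """Probability of seeing at least `list_hits` annotated genes in a list of
    `list_size` genes drawn at random from a background of `background_size`
    genes, of which `background_hits` carry the annotation."""
    total = comb(background_size, list_size)
    upper = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 8 of 50 listed genes carry a term that annotates
# 200 of 20000 background genes.
p = enrichment_p_value(list_hits=8, list_size=50,
                       background_hits=200, background_size=20000)
print(f"p = {p:.3g}")
```

A small p-value suggests the term is over-represented in the list. With many terms tested at once, a multiple-testing correction is needed before interpreting individual values.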
Galaxy allows you to store your data and to (re)analyse it conveniently
Galaxy - http://usegalaxy.org (demo)
Screenshot panels: tools, results, data sets

BITs: Genome browsers and interpretation of gene lists.
