BITS training - UCSC Genome Browser - Part 2

This is the second part of the lecture slides of the BITS bioinformatics training session on the UCSC Genome Browser.

See http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203990:orange-genome-browsers-ucsc-training&catid=81:training-pages&Itemid=190

Transcript

  • 1. UCSC genome browsing. Paco Hulpiau, http://www.bits.vib.be
  • 2. TO GET OTHER DATA: TABLE BROWSER, GET DNA, CURRENT BROWSER GRAPHIC IN PDF, CLICK LINE
  • 3. TO GET OTHER DATA: 2 CLICK LINE
  • 4. Databases & accession numbers
      § GenBank exchanges data daily with its two partners in the International Nucleotide Sequence Database Collaboration (INSDC):
          » European Bioinformatics Institute (EBI, part of EMBL)
          » DNA Data Bank of Japan (DDBJ)
      § Characteristics of GenBank and RefSeq @ NCBI:
          » GenBank: not curated, author submits data; multiple records for the same loci; no limit to species included
          » RefSeq: curated, NCBI creates records from existing data; a single record for each molecule; limited to model organisms
  • 5. Databases & accession numbers
      § The Ensembl automatic gene annotation system (Curwen et al., 2004): the gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences, including GenBank sequences shared by INSDC, UniProtKB and NCBI RefSeq.
  • 6. Databases & accession numbers
      § Database: typical accession numbers
          » GenBank: AAA37420
          » RefSeq: NM_123456 = mRNA; NP_123456 = proteins; XM_123456 = predicted mRNA; XP_123456 = predicted proteins
          » UniProtKB (Swiss-Prot/TrEMBL): P12345, Q1AAA9
          » Ensembl: ENSMUSG00000123456 for genes; ENSMUST00000123456 for transcripts; ENSMUSP00000123456 for proteins
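The accession formats listed on slide 6 are regular enough to recognize by pattern. The sketch below is only an illustration: the function name `classify_accession` and the regular expressions are assumptions made for this example, and they cover just the formats shown on the slide, not every accession format used by these databases.

```python
import re

# Simplified, assumed patterns covering only the formats on slide 6.
PATTERNS = [
    ("RefSeq mRNA", re.compile(r"NM_\d+(\.\d+)?")),
    ("RefSeq protein", re.compile(r"NP_\d+(\.\d+)?")),
    ("RefSeq predicted mRNA", re.compile(r"XM_\d+(\.\d+)?")),
    ("RefSeq predicted protein", re.compile(r"XP_\d+(\.\d+)?")),
    # Ensembl: "ENS", optional 3-letter species code (e.g. MUS), type letter, 11 digits.
    ("Ensembl gene", re.compile(r"ENS(?:[A-Z]{3})?G\d{11}")),
    ("Ensembl transcript", re.compile(r"ENS(?:[A-Z]{3})?T\d{11}")),
    ("Ensembl protein", re.compile(r"ENS(?:[A-Z]{3})?P\d{11}")),
    ("UniProtKB", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    # GenBank protein identifiers such as AAA37420: three letters, five digits.
    ("GenBank protein", re.compile(r"[A-Z]{3}\d{5}")),
]


def classify_accession(accession):
    """Return a label for the first pattern that matches the whole accession."""
    acc = accession.strip().upper()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(acc):
            return label
    return "unknown"


for example in ["AAA37420", "NM_123456", "Q1AAA9", "ENSMUST00000123456"]:
    print(example, "->", classify_accession(example))
```

The patterns are tried in order and the first full match wins, so an identifier that fits none of them is reported as "unknown" rather than guessed.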
  • 7. TO GET OTHER DATA: 2 CLICK LINE
  • 8. Zoom in on exon 1 + upstream
  • 9. Exercises (II)
      1) Are there any diseases related to your gene of interest? (OMIM)
         Which interaction partners are known? (Entrez Gene)
         Are there any important SNPs that change the amino acid sequence?
         Get the multiple sequence alignment (MSA, multiz46way) showing the nucleotide sequences of the human, mouse, chicken, Xenopus and zebrafish genes (CDS FASTA alignment, exons not separate).
         Save your results (e.g. exercises2_1.doc).
  • 10. TO GET OTHER DATA: 3 GET DNA
  • 11. http://www.visibone.com/colorlab/
  • 12. Exercises (II)
      2) Get the DNA sequence for your gene of interest, including 2000 base pairs upstream, and use the following extended case/color options:
         » RefSeq and Ensembl genes in bold
         » SNPs (132) underlined
         » Regulatory information, e.g. from ORegAnno, and miRNA sites in different colors
         » Save your results (e.g. exercises2_2a.doc).
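"Including 2000 base pairs upstream" depends on the strand of the gene: upstream lies before the start on the + strand and after the end on the - strand. The sketch below works this out by hand. The function name `region_with_upstream` is hypothetical, and the coordinates are assumed to be 0-based and half-open; the browser's Get DNA page does this for you.

```python
def region_with_upstream(tx_start, tx_end, strand, upstream=2000):
    """Return (start, end) covering the gene plus `upstream` bases 5' of it.

    Assumes 0-based, half-open coordinates with tx_start < tx_end.
    The end is not clipped to the chromosome length, which this sketch
    does not know about.
    """
    if tx_start < 0 or tx_end <= tx_start:
        raise ValueError("expected 0 <= tx_start < tx_end")
    if strand == "+":
        # Upstream of a + strand gene lies at lower coordinates.
        return max(0, tx_start - upstream), tx_end
    if strand == "-":
        # Upstream of a - strand gene lies at higher coordinates.
        return tx_start, tx_end + upstream
    raise ValueError("strand must be '+' or '-'")


print(region_with_upstream(10000, 15000, "+"))
print(region_with_upstream(10000, 15000, "-"))
```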
  • 13. Exercises (II)
      2) Try to get the DNA sequence for your gene of interest in chicken or zebrafish and use the following extended case/color options:
         » UCSC, RefSeq and Ensembl genes in bold
         » Other RefSeq genes underlined
         » Human proteins in a specific color
         » Save your results (e.g. exercises2_2b.doc).
  • 14. TO GET OTHER DATA: 4 TABLE BROWSER
  • 15. COPY (Ctrl+C)
  • 16. = Accession Number (RefSeq), e.g. NM_001229
      = Gene Name (Entrez), e.g. CASP1
  • 17. Exercises (II)
      3) Get a list of the RefSeq and Ensembl transcripts using the table browser with the following selected fields:
         » name, chromosome, exon count, name2
         » Save the results (exercises2_3a.xls)
         Also get the sequences and save them as genename_transcripts.fasta.
         Search the mouse genome using the filter in the table browser to get all members of a protein family (your research interest) and save the results in a list (exercises2_3b.xls) containing name, chromosome, CDS start and end, exon count and name2.
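Once the selected fields are saved as tab-separated text, the list can also be filtered outside the browser. The sketch below assumes a header line that starts with `#` and column names that end in `name`, `chrom`, `exonCount` and `name2`. The `mm9.refGene.` prefix, the rows in `SAMPLE`, and the function names are all made up for illustration; they are not real query results.

```python
import csv
import io

# Made-up example rows; accessions and gene names are placeholders.
SAMPLE = (
    "#mm9.refGene.name\tmm9.refGene.chrom\tmm9.refGene.exonCount\tmm9.refGene.name2\n"
    "NM_000001\tchr1\t5\tFama1\n"
    "NM_000002\tchr2\t7\tFamb1\n"
    "NM_000003\tchrX\t3\tFama2\n"
)


def read_table(text):
    """Parse tab-separated output into a list of dicts keyed by short field name."""
    lines = text.splitlines()
    # Drop the leading '#' and any 'database.table.' prefix from the header.
    header = [h.split(".")[-1] for h in lines[0].lstrip("#").split("\t")]
    body = io.StringIO("\n".join(lines[1:]))
    return list(csv.DictReader(body, fieldnames=header, delimiter="\t"))


def family_members(rows, prefix):
    """Keep rows whose gene name (name2) starts with the given prefix."""
    wanted = prefix.upper()
    return [row for row in rows if row["name2"].upper().startswith(wanted)]


rows = read_table(SAMPLE)
for row in family_members(rows, "Fama"):
    print(row["name"], row["chrom"], row["exonCount"], row["name2"])
```

A prefix match is a crude stand-in for the table browser filter; family members whose names do not share a prefix would need a list of names instead.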
  • 18. TO GET OTHER DATA
  • 19. TO GET OTHER DATA
  • 20. BLAT = BLAST-Like Alignment Tool
      Ø Searches for high-similarity matches by indexing the entire genome
      Ø DNA limit = 25000 bases, for multiple seqs 50000 bases
      Ø Protein limit = 10000 aa, for multiple seqs 25000 aa
      Ø Total sequences = 25
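The limits on slide 20 can be checked before pasting sequences into the BLAT form. The sketch below reads the numbers as a per-sequence limit plus a total limit for multi-sequence submissions, which is an interpretation of the slide. The names `parse_fasta` and `check_blat_input` are hypothetical helpers, not part of any UCSC tool.

```python
MAX_SEQUENCES = 25
# kind -> (limit for one sequence, total limit for a multi-sequence submission)
LIMITS = {"dna": (25000, 50000), "protein": (10000, 25000)}


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip() or f"seq{len(records) + 1}"
            records.append([name, ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def check_blat_input(records, kind="dna"):
    """Return a list of human-readable problems; an empty list means OK."""
    per_sequence, total = LIMITS[kind]
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences, limit is {MAX_SEQUENCES}")
    for name, seq in records:
        if len(seq) > per_sequence:
            problems.append(f"{name}: length {len(seq)}, limit is {per_sequence}")
    combined = sum(len(seq) for _, seq in records)
    if len(records) > 1 and combined > total:
        problems.append(f"combined length {combined}, limit is {total}")
    return problems


# The two short sequences from slide 22 are far below every limit.
query = ">first\nTTTAGCCAACGAACAGTCGCT\n>second\nTTCTCTTTGCATCTGTCCCAG\n"
print(check_blat_input(parse_fasta(query)))
```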
  • 21. PASTE (Ctrl+V)
  • 22. TTTAGCCAACGAACAGTCGCT TTCTCTTTGCATCTGTCCCAG
  • 23. § The Utilities page contains links to some tools created by the UCSC Genome Bioinformatics Group.
      § DNA Duster & Protein Duster remove non-sequence-related characters from an input sequence.
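What "removing non-sequence-related characters" means can be shown in a few lines. This is a simplified stand-in, not the Duster tools themselves: the function name `dust_dna` is hypothetical, and the sketch assumes that only A, C, G, T and N (in either case) count as sequence characters.

```python
import re


def dust_dna(raw):
    """Strip everything except A, C, G, T, N (upper or lower case)."""
    return re.sub(r"[^ACGTNacgtn]", "", raw)


# Position numbers, spaces and line breaks are dropped; case is preserved.
print(dust_dna("  1 TTTAGCCAAC GAACAGTCGC T\n"))
```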
  • 24. Exercises (II)
      4) Use BLAT to find orthologs of your gene in chicken, zebrafish and fruit fly. What is the genomic location? Are the flanking genes the same?
         Perform an in silico PCR to see what happens when more than one PCR product may arise, and determine product size and Tm:
         » species: human
         » forward primer: TTC AAG GAG GCC TTC TCC CT
         » reverse primer: CTG GGG GAG AAG CTG A (+ click flip reverse)
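Two ideas behind this exercise can be tried by hand: reverse-complementing a primer, and a rough melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C), a rule of thumb for short oligos. It is a swapped-in approximation, not the method used by the browser's in silico PCR tool, so the values it gives may differ from what the tool reports. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def clean(primer):
    """Remove the spaces used to group the primer into triplets."""
    return primer.replace(" ", "").upper()


def reverse_complement(primer):
    """Return the reverse complement of a DNA primer."""
    return clean(primer).translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough Tm in degrees Celsius: 2 per A/T plus 4 per G/C (Wallace rule)."""
    seq = clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


forward = "TTC AAG GAG GCC TTC TCC CT"
reverse = "CTG GGG GAG AAG CTG A"
print("forward Tm estimate:", wallace_tm(forward))
print("reverse Tm estimate:", wallace_tm(reverse))
print("reverse primer, reverse-complemented:", reverse_complement(reverse))
```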