Bioinformatics MiRON

Bioinformatics Presentation - Training of MiRON

    1. Welcome to BIOINFORMATICS - MiRON
    2. Outline
        • Workshop chronology (see handouts)
        • Brief background information
        • Applications and role
        • Bioinformatics tools
        • Practical classes
        • Problem-solving exercises
        • What's expected of you?
        • Questions/comments are welcome at all points
    3. Aims
        • To introduce the concepts and language of bioinformatics.
        • To provide an understanding of how nucleic acid and protein sequence data is obtained and analysed.
        • To develop skills in utilising online databases and interpreting data.
        • To develop an understanding of how bioinformatics can be applied to solve specific problems in biomedical science.
        • To develop transferable IT and communications skills.
    4. In this workshop…
        You will learn how data is generated and analysed, what the generated data can tell us about the molecular biology of organisms, and various practical applications of this knowledge.
    5. What is bioinformatics?
    6. Why bioinformatics?
        • Over the past decade, massive amounts of sequence data have been generated.
        • More recently, these have been joined by gene expression data obtained from microarrays and proteomic technologies.
        • This vast amount of data can only be analysed using specialised computer algorithms.
    7. Main Topics (Review)
        • Genome organisation and analysis
        • Functional genomics
        • Advanced techniques in molecular biology
        • Archives, information retrieval and alignments: nucleic acid sequence databases; genome databases; protein sequence databases; database searching
        • Dot plots (similarity matrix) and sequence alignments (PSI-BLAST)
        • Genome expression: microarray analysis, proteomics, eukaryotic genome expression
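A dot plot is the simplest similarity matrix: one sequence runs down the rows, the other across the columns, and a dot marks each cell where they agree, so similar regions show up as diagonal runs. The Python sketch below is illustrative only; its `window` and `threshold` parameters (draw a dot when at least `threshold` of `window` consecutive positions match) are an assumed noise filter, not the settings of any particular tool.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Similarity matrix: cell [i][j] is True when the windows starting at
    seq_a[i] and seq_b[j] share at least `threshold` identical positions."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    matrix = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            # Count identical positions inside the two windows
            same = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(same >= threshold)
        matrix.append(row)
    return matrix


def render(matrix: list[list[bool]]) -> str:
    """Text rendering: '*' marks a dot, '.' an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    # A sequence compared with itself gives a solid main diagonal
    print(render(dot_plot("GATTACA", "GATTACA")))
```

With `window=1` every shared base becomes a dot, which is very noisy for a four-letter DNA alphabet; a longer window with a threshold below the window length keeps near-identical stretches while suppressing isolated chance matches.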
    8. What bioinformaticians think they are
    9. What they do
    10. Examples of Bioinformatics
        • Database interfaces: GenBank/EMBL/DDBJ, MEDLINE, Swiss-Prot, PDB, …
        • Sequence alignment: BLAST, FASTA
        • Multiple sequence alignment: Clustal W, MultAlin, DiAlign
        • Gene finding: Genscan, GenomeScan, GeneMark, GRAIL
        • Protein domain analysis and identification: Pfam, BLOCKS, ProDom
        • Pattern identification/characterization: Gibbs Sampler, AlignACE, MEME
        • Protein folding prediction: PredictProtein, SWISS-MODEL
    11. Five websites that all biologists should know
        • NCBI (The National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
        • EBI (The European Bioinformatics Institute): http://www.ebi.ac.uk/
        • The Canadian Bioinformatics Resource: http://www.cbr.nrc.ca/
        • SwissProt/ExPASy (Swiss Bioinformatics Resource): http://expasy.cbr.nrc.ca/sprot/
        • PDB (The Protein Data Bank): http://www.rcsb.org/PDB/
    12. Remember, when using web server-based tools:
        • You are using someone else's computer.
        • You are (probably) getting a reduced set of options or capacity.
        • Servers are great for sporadic or proof-of-principle work, but for intensive work the software should be obtained and run locally.
    13. Human Gene Index Database
        • HGI is a database of expressed DNA sequences, mostly made up of ESTs, which are a type of partial cDNA.
        • EST stands for Expressed Sequence Tag.
        • These short sequences were created using essentially the same method used to make cDNAs.
        • As such, they represent the expressed part of a genome: they are made from mRNA, which is transcribed from genes.
    14. Gene Structure
    15. Similarity Searching
        • A variety of computer programs are used to compare DNA sequences.
        • The most popular is BLAST (Basic Local Alignment Search Tool).
        • BLAST is freely available at the NCBI website.
    16. BLAST is Complex
        • Similarity searching relies on the concepts of alignment and distance between pairs of sequences.
        • Distances can only be measured between aligned sequences (match vs. mismatch at each position).
        • A similarity search tests the best alignment of a query sequence against every sequence in a database.
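To make "distance between aligned sequences" concrete, here is a minimal Python sketch of the simplest measure, the proportion of mismatched positions (p-distance) between two sequences that have already been aligned. Skipping gap columns is an assumption of this sketch; other distance measures treat gaps differently.

```python
def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Proportion of mismatched positions between two already-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped; this is one
    common convention and is assumed here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # no base-to-base comparison possible at a gap
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return mismatches / compared
```

For example, `p_distance("GATTACA", "GACTACA")` returns 1/7, since one of the seven aligned positions differs. The function refuses unaligned input of different lengths, which mirrors the point that distance is only defined after alignment.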
    17. Workshop 1 (database search and inference of possible homology)
        • Please refer to "Getting started with bioinformatics".
        • Intro to BLAST (Basic Local Alignment Search Tool): BLAST compares a query sequence with those contained in nucleotide (or protein) databases by aligning it with previously characterised genes, which helps to identify genes.
        • The emphasis of this tool is on finding regions of sequence similarity between two different genes.
        • These alignments can yield clues about the structure and function of a novel sequence, and about its evolutionary history and homology with other sequences in the database.
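The "local alignment" in BLAST's name means the best-scoring pair of sub-sequences rather than an end-to-end match. The exact dynamic-programming method for this is the Smith-Waterman algorithm, which BLAST approximates for speed. The Python sketch below computes only the Smith-Waterman score, assuming a simple scoring scheme (+2 match, -1 mismatch, -2 per gap position) chosen for illustration; real tools use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def smith_waterman_score(seq_a: str, seq_b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # h[i][j]: best score of a local alignment ending at seq_a[i-1] and seq_b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = h[i - 1][j - 1] + (match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # seq_a[i-1] aligned to a gap
            left = h[i][j - 1] + gap    # seq_b[j-1] aligned to a gap
            # Flooring at 0 lets a new local alignment start anywhere
            h[i][j] = max(0, diagonal, up, left)
            best = max(best, h[i][j])
    return best
```

For example, `smith_waterman_score("ACGTACGT", "ACGTCGT")` gives 12: seven matched bases (14) minus one gap (2). Filling an m-by-n table for every sequence in a database is what makes exact searching slow.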
    18. BLAST has Automatic Translation
        • BLASTX automatically translates your DNA query sequence (in all six reading frames) to compare it with protein databanks.
        • TBLASTN automatically translates an entire DNA database (in all six reading frames) to compare it with your protein query sequence.
        • Only make a DNA-DNA search if you are working with a sequence that does not code for protein.
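The six reading frames are the three codon offsets on the given strand plus the three on its reverse complement. A minimal Python sketch of six-frame translation, assuming the standard genetic code (NCBI translation table 1) and the common +1..+3 / -1..-3 frame labels; `X` marks codons containing an ambiguous base such as N.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' = stop, 'X' = codon with an ambiguous base."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(frame, protein)
```

Only one frame usually gives a long stretch without stop codons (`*`), which is why searching translated frames against protein databases is more sensitive than a DNA-DNA search for coding sequence.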
    19. A typical sequence ready for submission to BLAST (FASTA format):
        >THC2465887
        GGCTGCGGAGGACCGACCGTCCCCACGCCTGCCGCCCCGCGACCCCGACCGCCAGCATGATCGCCGCGCAGCTCCTGGCCTATTACTTCACGGAGCTGAAGGATGACCAGGTCAAAAAGATTGACAAGTATCTCTATGCCATGCGGCTCTCCGATGAAACTCTCATAGATATCATGACTCGCTTCAGGAAGGAGATGAAGAATGGCCTCTCCCGGGATTTTAATCCAACAGCCACAGTCAAGATGTTGCCAACATTCGTAAGGTCCATTCCTGATGGCTCTGAAAAGGGAGATTTCATTGCCCTGGATCTTGGTGGGTCTTCCTTTCGAATTCTGCGGGTGCAAGTGAATCATGAGAAAAACCAGAATGTTCACATGGAGTCCGAGGTTTATGACACCCCAGAGAACATCGTGCACGGCAGTGGAAGCCAGCTTTTTGATCATGTTGCTGAGTGCCTGGGAGATTTCATGGAGAAAAGGAAGATCAAGGACAAGAAGTTACCTGTGGGATTCACGTTTTCTTTTCCTTGCCAACAATCCAAAATAGATGAGGCCATCCTGATCACCTGGACAAAGCGATTTAAAGCGAGCGGAGTGGAAGGAGCAGATGTGGTCAAACTGCTTAACAAAGCCATCAAAAAGCGAGGGGACTATGATGCCAACATCGTAGCTGTGGTGAA
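In FASTA format the line starting with `>` is the header (here the identifier THC2465887) and the following line or lines hold the sequence. A minimal Python reader for such text, assuming the identifier is the first word of the header; the `gc_content` helper is a hypothetical addition for quickly spotting skewed base composition.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is taken to be the first word after '>' (an assumption;
    headers often carry a free-text description after it).
    """
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].strip()
            current_id = header.split()[0] if header else ""
            chunks = []
        elif current_id is not None:  # lines before the first header are ignored
            chunks.append(line.upper())  # sequence may be wrapped over many lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

Joining wrapped lines matters because many tools write sequences in fixed-width lines (often 60 or 80 characters), while others, like the example above, put the whole sequence on one line.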
    20. BLAST Output
    21. BLAST line-up of human vs. canine partial cDNAs for hexokinase 1

        Query: 3034 TGCATGGTTTGATTTTGACCTGGTC---C---CCC-ACGTGTGAAGTGTAGTGGCATCCA 3086
                    |||||| | ||||||  ||||||||   |   ||| ||||||||||| |||||||| |||
        Sbjct: 75   TGCATGATCTGATTTCAACCTGGTCGTACGCTCCCCACGTGTGAAGTTTAGTGGCACCCA 134

        Query: 3087 TTTCTAATGTATGCATTCATCCAACAGAGTTATTTATTGGCTGGAGATGGAAAATCACAC 3146
                    |||| | | | ||||||| ||  ||||||||||||||||||   ||||| ||| |||| |
        Sbjct: 135  TTTCCAGTCTCTGCATTCGTCTGACAGAGTTATTTATTGGCCCAAGATGAAAAGTCACGC 194

        Query: 3147 CACCTGACAGGCCTTCTGGG-CCTCCAAAGCCCATCCTTGGGGTTCCCCCTCCCTGTGTG 3205
                    || | | |||||||| |||| ||||   ||||| |||||||||   |  | |||||||||
        Sbjct: 195  CATCCGCCAGGCCTTATGGGGCCTCTGCAGCCCGTCCTTGGGGACACATC-CCCTGTGTG 253

        Query: 3206 AAATGTATTATCACCAGCAGACACTGCCGGGCCTCC-C-TCCCGGGGGCACTGCCTGAAG 3263
                    ||||||||||||||||||||||||||||||| |||| | |||| |||||| | | |
        Sbjct: 254  AAATGTATTATCACCAGCAGACACTGCCGGGACTCCTCCTCCCAGGGGCA-T-CTTAGCT 311

        Query: 3264 GCGAG-TGTGGGCATAGCATTAGCTGCTTCCTCCCCTCCTG-GCA-CCCACTGTGGCC-T 3319
                    ||    |   | |  ||||    |||||  ||  | ||| | | | |||| |  || | |
        Sbjct: 312  GCTTCCTCCCGTCCCAGCACCCACTGCTGTCTGGCGTCCCGAGGATCCCA-TCAGGACGT 370

        Query: 3320 GGC-ATCGCATCGTGGTGTGTCAATGCCACAAAATCGTGTGTCCGTGGAACCAGTCCTAG 3378
                    | | ||  ||  | |  ||||   | ||    || | || ||| | |  ||   || |
        Sbjct: 371  GTCCATGCCACTGAGTCGTGTG--T-CCGTGGAA-C-TG-GTCAGAGCCACT--TCGTGA 422

        Query: 3379 CCGCGTGTGACAGTCTTGCATTCTGTTTGTCTCGTGGGGGGAGGTGGACAG-TCCTGCGG 3437
                    | |  | || || |||  |  |||  | |  | |  ||          ||  ||||| ||
        Sbjct: 423  CAGTCT-TG-CATTCTGTCTGTCT--TGGGGTGGNNGGNAAGNNNNNCCANNTCCTGTGG 478

        Query: 3438 -AAAT--GTGTCTTGTCTCCATTTGGA-TAAAA-GGAA-CCAA--CCAACAAACAATGCC 3489
                     |||   | | |||| ||||||||||  ||||| |||| ||||  ||||||| || ||||
        Sbjct: 479  GAAAAAGGGGCCTTGGCTCCATTTGGGGTAAAAAGGAAACCAAACCCAACAA-CAGTGCC 537

        Query: 3490 A-TCACTGG-AATTTCCC-ACCG-CTTT--GTGAGCCGTG-TCGTATGA-CCTAGTAAAC 3541
                      ||| ||| |||| ||| |  | ||||  ||||||| || | |||||| |||||  ||
        Sbjct: 538  CCTCATTGGGAATTCCCCCATTGGCTTTTTGTGAGCCATGGTTGTATGAACCTAGGTAAA 597

        Query: 3542 TTTGT 3546
                     || |
        Sbjct: 598  CTTNT 602
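Each `|` in the middle row of the line-up marks a column where the two aligned bases are identical; gap columns (`-`) and ambiguous bases (`N`) get no bar. A minimal Python sketch that rebuilds that row and a percent identity from the Query and Sbjct rows; dividing by the full alignment length, gap columns included, is an assumption of this sketch.

```python
def match_line(query: str, sbjct: str) -> str:
    """BLAST-style middle row: '|' where the aligned bases are identical."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q == s and q not in "-N" else " "  # gaps and N never count as matches
        for q, s in zip(query.upper(), sbjct.upper())
    )


def percent_identity(query: str, sbjct: str) -> float:
    """Identical columns as a percentage of the alignment length (gaps included)."""
    if not query:
        raise ValueError("empty alignment")
    return 100.0 * match_line(query, sbjct).count("|") / len(query)
```

Applied to the first block of the line-up (Query 3034-3086 against Sbjct 75-134), `match_line` finds 47 identical columns out of 60, about 78% identity.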
    22. Understand the Statistics!
        • BLAST produces an E-value for every match: the number of matches at least this good that would be expected by chance alone in a database of this size.
        • It plays a role similar to the P value in a statistical test (for small values the two are almost equal).
        • A match is generally considered significant if the E-value is < 0.05 (smaller numbers are more significant).
        • Very low E-values (e.g. 1e-100) usually indicate homologs or identical genes.
        • Moderate E-values may indicate related genes.
        • Long regions of moderate similarity are more important than short regions of high identity.
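Under the Karlin-Altschul statistics used by BLAST, a hit with normalised (bit) score S' has E = m · n · 2^(-S'), where m is the query length and n the total database length, and the probability of at least one such chance hit is P = 1 - e^(-E). The Python sketch below uses raw lengths instead of BLAST's edge-corrected effective lengths, so it is illustrative only.

```python
import math


def evalue_from_bitscore(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S'): expected chance hits scoring at least S' bits.

    m = query length, n = total database length (both in residues). Real BLAST
    uses edge-corrected 'effective' lengths; plain lengths are a simplification.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e_value: float) -> float:
    """P = 1 - exp(-E): probability of at least one chance hit this good."""
    return -math.expm1(-e_value)  # expm1 keeps precision for tiny E


if __name__ == "__main__":
    for e in (1e-100, 1e-3, 0.05, 1.0, 10.0):
        print(f"E = {e:g}  ->  P = {pvalue_from_evalue(e):.4g}")
```

For E below about 0.01 the two values are practically equal, which is why E-values are read like P values; unlike P, E can exceed 1, and for the same score it grows with the size of the database searched.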
    23. BLAST is Approximate
        • BLAST makes similarity searches very quickly because it takes shortcuts: it looks for short, nearly identical "words" (11 bases for DNA).
        • It also makes errors:
          - it misses some important similarities
          - it makes many incorrect matches
          - it is easily fooled by repeats or skewed composition
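The word-finding shortcut can be sketched as follows. This Python example covers only the seeding step (exact shared words of length `word_size`, 11 by default as on the slide) and omits BLAST's extension and scoring stages, so it illustrates the idea rather than BLAST's implementation.

```python
from collections import defaultdict


def find_word_hits(query: str, subject: str, word_size: int = 11) -> list[tuple[int, int]]:
    """Seed step only: (query_pos, subject_pos) for every exact shared word."""
    query, subject = query.upper(), subject.upper()
    # Index every word (k-mer) of the query by its start positions
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    # Scan the subject and look each of its words up in the index
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits
```

`find_word_hits("A" * 8, "A" * 8, word_size=4)` returns 25 hits, every word pairing with every other, which shows how repeats and low-complexity sequence flood a word-based search; BLAST normally masks such regions (for example with the DUST filter for DNA) before searching. Conversely, a real similarity with no exact 11-base word in common produces no seed at all and is missed.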
    24. Bad Genome Annotation
        • Gene finding is at best only about 90% accurate.
        • New sequences are automatically annotated with BLAST scores.
        • Bad annotations propagate.
        • It's going to take us 10-20 years or more to sort this mess out!
    25. Conclusions
        • We have only touched small parts of the elephant.
        • Trial and error (intelligently applied) is often your best tool.
        • Keep up with the five main sites, and you'll have a pretty good idea of what is happening and what is available.
