Welcome to BIOINFORMATICS
                   -MiRON
Outline
• Workshop chronology (see handouts)
• Brief background information
• Applications and role
• Bioinformatics tools
• Practical classes
• Problem-solving exercises
• What's expected of you?
• Questions and comments are welcome at any point
Aims
• To introduce the concepts and language of bioinformatics.
• To provide an understanding of how nucleic acid and protein sequence data are obtained and analysed.
• To develop skills in using online databases and interpreting data.
• To develop an understanding of how bioinformatics can be applied to solve specific problems in biomedical science.
• To develop transferable IT and communication skills.
In this workshop...
• You will learn how data are generated and analysed,
• what the generated data can tell us about the molecular biology of organisms,
• and the various practical applications of this knowledge.
What is bioinformatics?
Why bioinformatics?
• Over the past decade, massive amounts of sequence data have been generated.
• More recently, these have been joined by gene expression data from microarrays and proteomic technologies.
• Data on this scale can only be analysed with specialised computer algorithms.
Main Topics (Review)
• Genome organisation and analysis
• Functional genomics
• Advanced techniques in molecular biology
• Archives, information retrieval and alignments:
  - nucleic acid sequence databases, genome databases, protein sequence databases, database searching
  - dot plots (similarity matrices) and sequence alignments (PSI-BLAST)
• Genome expression: microarray analysis, proteomics, eukaryotic genome expression
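Dot plots appear in the topic list above. As a minimal sketch (a simple exact-match variant with an optional word window, assumed here for illustration rather than taken from any particular tool), a dot plot marks cell (i, j) whenever the window starting at position i of one sequence matches the window starting at position j of the other; diagonal runs of marks reveal similar regions. The function names are hypothetical.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1) -> list[list[bool]]:
    """Cell [i][j] is True if seq_a[i:i+window] exactly equals seq_b[j:j+window]."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix: list[list[bool]]) -> str:
    """Draw the matrix as text: '*' for a match, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    # Comparing a sequence with itself gives a solid main diagonal.
    print(render(dot_plot("GATTACA", "GATTACA", window=3)))
```

A window larger than 1 filters out the scattered single-base matches that otherwise clutter DNA dot plots, since any two random bases match about a quarter of the time.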
What bioinformaticians think they are
What they do
Examples of Bioinformatics
• Database interfaces
  - GenBank/EMBL/DDBJ, Medline, Swiss-Prot, PDB, …
• Sequence alignment
  - BLAST, FASTA
• Multiple sequence alignment
  - ClustalW, MultAlin, DIALIGN
• Gene finding
  - Genscan, GenomeScan, GeneMark, GRAIL
• Protein domain analysis and identification
  - Pfam, BLOCKS, ProDom
• Pattern identification/characterization
  - Gibbs Sampler, AlignACE, MEME
• Protein folding prediction
  - PredictProtein, SWISS-MODEL
Five websites that all biologists should know
• NCBI (The National Center for Biotechnology Information)
  - http://www.ncbi.nlm.nih.gov/
• EBI (The European Bioinformatics Institute)
  - http://www.ebi.ac.uk/
• The Canadian Bioinformatics Resource
  - http://www.cbr.nrc.ca/
• Swiss-Prot/ExPASy (Swiss Bioinformatics Resource)
  - http://expasy.cbr.nrc.ca/sprot/
• PDB (The Protein Data Bank)
  - http://www.rcsb.org/PDB/
Remember when using web server-based tools
• You are using someone else's computer.
• You are (probably) getting a reduced set of options or capacity.
• Servers are great for sporadic or proof-of-principle work, but for intensive work the software should be obtained and run locally.
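For intensive work, the usual local route is NCBI's standalone BLAST+ command-line suite. The sketch below only assembles the command lines; the file names `my_sequences.fa`, `query.fa`, `results.tsv` and the database name `mydb` are hypothetical placeholders. `makeblastdb`, `blastn` and the flags shown are standard BLAST+ options, but check `blastn -help` for the version you install.

```python
import subprocess


def makeblastdb_cmd(fasta: str, db_name: str) -> list[str]:
    """Command that builds a local nucleotide database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query: str, db_name: str, out: str,
               evalue: float = 1e-5, threads: int = 4) -> list[str]:
    """Command that searches `query` against the local database.

    -outfmt 6 writes one tab-separated line per hit, which is easy to parse.
    """
    return ["blastn", "-query", query, "-db", db_name, "-out", out,
            "-outfmt", "6", "-evalue", str(evalue),
            "-num_threads", str(threads)]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn; stops with an error if any step fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file and database names; replace with your own.
    steps = [makeblastdb_cmd("my_sequences.fa", "mydb"),
             blastn_cmd("query.fa", "mydb", "results.tsv")]
    for cmd in steps:
        print(" ".join(cmd))
    # Once BLAST+ is installed and the files exist: run_all(steps)
```

The database is built once and then reused for as many queries as needed, which is where running locally pays off compared with repeated web submissions.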
Human Gene Index Database
• HGI is a database of expressed DNA sequences, made up mostly of ESTs, which are a type of partial cDNA.
• EST stands for Expressed Sequence Tag.
• These short sequences were created using essentially the same method used to make cDNAs.
• As such, they represent the expressed part of a genome: they are made from mRNA, which is transcribed from genes.
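Because ESTs are read from cDNA, their sequence is tied to the mRNA by base pairing. A minimal sketch that ignores the priming, cloning and single-pass sequencing errors of real EST libraries: the first cDNA strand made by reverse transcriptase is the reverse complement of the mRNA, with T in place of U. The helper name `mrna_to_cdna` is hypothetical.

```python
# Watson-Crick partner of each RNA base, written as the corresponding DNA base.
_RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def mrna_to_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (5'->3') for an mRNA given 5'->3'.

    The new strand pairs antiparallel with its template, so each base is
    complemented and the result is reversed.
    """
    mrna = mrna.upper()
    try:
        return "".join(_RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))
    except KeyError as err:
        raise ValueError(f"unexpected RNA base: {err.args[0]!r}") from None


if __name__ == "__main__":
    print(mrna_to_cdna("AUGGCCUAA"))  # prints TTAGGCCAT
```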
Gene Structure
Similarity Searching
• A variety of computer programs are used to compare DNA sequences.
• The most popular is BLAST (Basic Local Alignment Search Tool).
• BLAST is freely available on the NCBI website.
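Besides the web form, NCBI's BLAST server can be queried programmatically through its URL API, where `Blast.cgi` with `CMD=Put` submits a search and returns a request ID (RID) that is later fetched with `CMD=Get`. The sketch below only builds a submission URL and sends nothing; the parameter names follow NCBI's published URL API as we understand it, so confirm them against the current NCBI documentation. NCBI asks scripted users to space out requests, so for bulk work a local installation is preferable.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def submit_url(sequence: str, program: str = "blastn", database: str = "nt") -> str:
    """Build a URL that submits `sequence` to NCBI BLAST (CMD=Put).

    The server replies with a request ID (RID), which is then polled with
    CMD=Get to retrieve the results.
    """
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database,
              "QUERY": sequence}
    return f"{BLAST_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(submit_url("GGCTGCGGAGGACCGACCGTCCCCACGCCTGCC"))
```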
BLAST is Complex
• Similarity searching relies on the concepts of alignment and distance between pairs of sequences.
• Distances can only be measured between aligned sequences (match vs. mismatch at each position).
• A similarity search looks for the best alignment of a query sequence with every sequence in a database.
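To make "the best alignment of a query with every sequence in a database" concrete, here is a minimal sketch of exhaustive local alignment scoring with the Smith-Waterman algorithm, which BLAST approximates rather than runs in full. The scoring scheme (+2 match, -1 mismatch, -2 per gap position) and the toy database are hypothetical, chosen for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Local alignment: a negative running score restarts at 0.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every database entry, best hit first."""
    hits = [(name, local_alignment_score(query, seq))
            for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    toy_db = {"seq1": "TTTTGATTACATTTT", "seq2": "CCCCCCCCCC", "seq3": "GATCACA"}
    for name, score in search("GATTACA", toy_db):
        print(name, score)
```

The cost grows with the product of the query and database lengths, which is why exhaustive searching of large databases is slow and why BLAST uses shortcuts instead.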
Workshop 1 (database search and inference of possible homology)

Please refer to "Getting Started with Bioinformatics".
Intro to BLAST
• BLAST stands for Basic Local Alignment Search Tool.
• It compares a query sequence with those in nucleotide (or protein) databases by aligning the query with previously characterised sequences, which helps to identify genes.
• The tool's emphasis is on finding regions of sequence similarity between the query and database sequences.
• These alignments can yield clues about the structure and function of a novel sequence, and about its evolutionary history and homology with other sequences in the database.
BLAST has Automatic Translation
• BLASTX automatically translates your DNA query sequence in all six reading frames and compares the translations with protein databases.
• TBLASTN automatically translates an entire DNA database (in all six reading frames) and compares it with your protein query sequence.
• Only run a DNA-DNA search (BLASTN) if you are working with a sequence that does not code for protein.
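The six reading frames used by BLASTX and TBLASTN are three frames on the given strand (starting at offsets 0, 1 and 2) plus three on the reverse complement. A minimal sketch, assuming the standard genetic code (NCBI translation table 1), '*' for stop codons and 'X' for codons containing ambiguous bases such as N; BLAST's own translation may handle such details differently, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (first base outermost).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Opposite strand, read 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand and -1..-3 on the reverse complement."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("ATGGCCTAA").items():
        print(frame, protein)
```

Only one of the six translations normally corresponds to the real protein, which is why translated searches compare all of them.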
A typical sequence ready for submission to BLAST (FASTA format)
>THC2465887
GGCTGCGGAGGACCGACCGTCCCCACGCCTGCCGCCCCGCGACCCCGACCGCCAGCATGATCGCCGCGCAGCTCCTGGCC
TATTACTTCACGGAGCTGAAGGATGACCAGGTCAAAAAGATTGACAAGTATCTCTATGCCATGCGGCTCTCCGATGAAAC
TCTCATAGATATCATGACTCGCTTCAGGAAGGAGATGAAGAATGGCCTCTCCCGGGATTTTAATCCAACAGCCACAGTCA
AGATGTTGCCAACATTCGTAAGGTCCATTCCTGATGGCTCTGAAAAGGGAGATTTCATTGCCCTGGATCTTGGTGGGTCT
TCCTTTCGAATTCTGCGGGTGCAAGTGAATCATGAGAAAAACCAGAATGTTCACATGGAGTCCGAGGTTTATGACACCCC
AGAGAACATCGTGCACGGCAGTGGAAGCCAGCTTTTTGATCATGTTGCTGAGTGCCTGGGAGATTTCATGGAGAAAAGGA
AGATCAAGGACAAGAAGTTACCTGTGGGATTCACGTTTTCTTTTCCTTGCCAACAATCCAAAATAGATGAGGCCATCCTG
ATCACCTGGACAAAGCGATTTAAAGCGAGCGGAGTGGAAGGAGCAGATGTGGTCAAACTGCTTAACAAAGCCATCAAAAA
GCGAGGGGACTATGATGCCAACATCGTAGCTGTGGTGAA
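The sequence above is in FASTA format: a header line beginning with '>' (here the identifier THC2465887) followed by the sequence wrapped over several lines. A minimal parsing sketch, assuming plain nucleotide input; the helper names are hypothetical, and for real work a tested library parser (for example Biopython's `SeqIO`) is preferable. The example input is the first 40 bases of the record, rewrapped at 20 per line.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's identifier (first word after '>') to its sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # join wrapped sequence lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def invalid_characters(seq: str, alphabet: str = "ACGTN") -> set[str]:
    """Characters in seq outside the allowed nucleotide alphabet."""
    return set(seq) - set(alphabet)


if __name__ == "__main__":
    example = ">THC2465887\nGGCTGCGGAGGACCGACCGT\nCCCCACGCCTGCCGCCCCGC\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), invalid_characters(seq) or "all bases valid")
```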
BLAST Output
BLAST alignment of human vs. canine partial cDNAs for hexokinase 1

  Query:   3034 TGCATGGTTTGATTTTGACCTGGTC---C---CCC-ACGTGTGAAGTGTAGTGGCATCCA 3086
                |||||| | |||||| ||||||||    |   ||| ||||||||||| |||||||| |||
  Sbjct:     75 TGCATGATCTGATTTCAACCTGGTCGTACGCTCCCCACGTGTGAAGTTTAGTGGCACCCA 134

  Query:   3087 TTTCTAATGTATGCATTCATCCAACAGAGTTATTTATTGGCTGGAGATGGAAAATCACAC 3146
                |||| | | | ||||||| || ||||||||||||||||||    ||||| ||| |||| |
  Sbjct:    135 TTTCCAGTCTCTGCATTCGTCTGACAGAGTTATTTATTGGCCCAAGATGAAAAGTCACGC 194

  Query:   3147 CACCTGACAGGCCTTCTGGG-CCTCCAAAGCCCATCCTTGGGGTTCCCCCTCCCTGTGTG 3205
                || | | |||||||| |||| ||||   ||||| |||||||||   | | |||||||||
  Sbjct:    195 CATCCGCCAGGCCTTATGGGGCCTCTGCAGCCCGTCCTTGGGGACACATC-CCCTGTGTG 253

  Query:   3206 AAATGTATTATCACCAGCAGACACTGCCGGGCCTCC-C-TCCCGGGGGCACTGCCTGAAG 3263
                ||||||||||||||||||||||||||||||| |||| | |||| |||||| | | |
  Sbjct:    254 AAATGTATTATCACCAGCAGACACTGCCGGGACTCCTCCTCCCAGGGGCA-T-CTTAGCT 311

  Query:   3264 GCGAG-TGTGGGCATAGCATTAGCTGCTTCCTCCCCTCCTG-GCA-CCCACTGTGGCC-T 3319
                ||    |   | | ||||     ||||| || | ||| | | | |||| | || | |
  Sbjct:    312 GCTTCCTCCCGTCCCAGCACCCACTGCTGTCTGGCGTCCCGAGGATCCCA-TCAGGACGT 370

  Query:   3320 GGC-ATCGCATCGTGGTGTGTCAATGCCACAAAATCGTGTGTCCGTGGAACCAGTCCTAG 3378
                | | || || | | ||||      | ||    || | || ||| | | ||    || |
  Sbjct:    371 GTCCATGCCACTGAGTCGTGTG--T-CCGTGGAA-C-TG-GTCAGAGCCACT--TCGTGA 422

  Query:   3379 CCGCGTGTGACAGTCTTGCATTCTGTTTGTCTCGTGGGGGGAGGTGGACAG-TCCTGCGG 3437
                | | | || || ||| | ||| | | | | ||                || ||||| ||
  Sbjct:    423 CAGTCT-TG-CATTCTGTCTGTCT--TGGGGTGGNNGGNAAGNNNNNCCANNTCCTGTGG 478

  Query:   3438 -AAAT--GTGTCTTGTCTCCATTTGGA-TAAAA-GGAA-CCAA--CCAACAAACAATGCC 3489
                 |||   | | |||| |||||||||| ||||| |||| |||| ||||||| || ||||
  Sbjct:    479 GAAAAAGGGGCCTTGGCTCCATTTGGGGTAAAAAGGAAACCAAACCCAACAA-CAGTGCC 537

  Query:   3490 A-TCACTGG-AATTTCCC-ACCG-CTTT--GTGAGCCGTG-TCGTATGA-CCTAGTAAAC 3541
                  ||| ||| |||| ||| | | |||| ||||||| || | |||||| ||||| ||
  Sbjct:    538 CCTCATTGGGAATTCCCCCATTGGCTTTTTGTGAGCCATGGTTGTATGAACCTAGGTAAA 597

  Query:   3542 TTTGT 3546
                 || |
  Sbjct:    598 CTTNT 602
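Each block pairs a Query row with a Sbjct row: '|' marks an identical base and '-' marks a gap. A sketch of how summary figures can be recomputed from one block (the helper names are hypothetical; BLAST reports these values itself):

```python
def alignment_stats(query_row: str, sbjct_row: str) -> dict[str, int]:
    """Count aligned positions, identities and gaps in one pair of rows."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, sbjct_row))
    return {
        "length": len(pairs),
        "identities": sum(q == s and q != "-" for q, s in pairs),
        "gaps": sum(q == "-" or s == "-" for q, s in pairs),
    }


def end_coordinate(row: str, start: int) -> int:
    """Last sequence position covered by a row; gap characters do not advance it."""
    return start + len(row.replace("-", "")) - 1


if __name__ == "__main__":
    # First block of the human vs. canine hexokinase 1 alignment.
    q = "TGCATGGTTTGATTTTGACCTGGTC---C---CCC-ACGTGTGAAGTGTAGTGGCATCCA"
    s = "TGCATGATCTGATTTCAACCTGGTCGTACGCTCCCCACGTGTGAAGTTTAGTGGCACCCA"
    stats = alignment_stats(q, s)
    print(stats, f"{100 * stats['identities'] / stats['length']:.1f}% identity")
    print(end_coordinate(q, 3034), end_coordinate(s, 75))
```

For the first block this gives 47 identities and 7 gaps over 60 aligned positions (78.3% identity), and end coordinates 3086 and 134, matching the numbers at the ends of the first Query and Sbjct lines.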
Understand the Statistics!
• BLAST produces an E-value for every match.
  - The E-value is related to, but not the same as, the P value in a statistical test: it is the number of matches with an equal or better score expected by chance in a database of this size. The two are nearly equal for small values.
• A match is generally considered significant if the E-value < 0.05 (smaller numbers are more significant).
• Very low E-values (e.g. 1e-100) indicate homologs or identical genes.
• Moderate E-values indicate related genes.
• Long regions of moderate similarity are more important than short regions of high identity.
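The link between scores, E-values and P values comes from Karlin-Altschul statistics, on which BLAST's reported E-values are based: for a raw score S, query length m and database length n, E = K·m·n·e^(-λS), and the probability of at least one chance hit is P = 1 - e^(-E). A sketch follows; the λ and K values are placeholders for illustration only (real values depend on the scoring scheme and are reported by BLAST), and the effective-length correction BLAST applies to m and n is omitted.

```python
import math


def e_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits scoring >= raw_score."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Score normalised to bits, so that E = m * n * 2 ** (-bits)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def p_value(e: float) -> float:
    """Probability of at least one chance hit, given E (Poisson assumption)."""
    return 1.0 - math.exp(-e)


if __name__ == "__main__":
    LAMBDA, K = 1.37, 0.71   # illustrative placeholders only
    m, n = 500, 10_000_000   # hypothetical query and database lengths
    for s in (30, 40, 60):
        e = e_value(s, m, n, LAMBDA, K)
        print(f"S={s}: bits={bit_score(s, LAMBDA, K):.1f}, E={e:.3g}, P={p_value(e):.3g}")
```

Because P = 1 - e^(-E), an E-value of 0.05 corresponds to a P value of about 0.049, but the two diverge for large E: P can never exceed 1, whereas E can. E also grows in proportion to database size, so the same score becomes less significant as databases grow.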
BLAST is Approximate
• BLAST runs similarity searches very quickly because it takes shortcuts:
  - it looks for short, nearly identical "words" (e.g. 11 bases in classic BLASTN) and extends only those hits.
• It also makes errors:
  - it misses some important similarities;
  - it makes many incorrect matches;
  - it is easily fooled by repeats or skewed composition.
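The word shortcut can be sketched as follows: index every k-letter word of the database sequence, look up each query word, and extend the exact seed matches in both directions without gaps, stopping once the running score falls too far below the best seen (an X-drop rule). This is a simplified illustration in the spirit of BLAST's seeding, not its actual implementation; the scores (+1 match, -3 mismatch) and the X-drop threshold of 5 are hypothetical.

```python
from collections import defaultdict


def word_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in seq to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def extend(query: str, subject: str, qi: int, si: int, k: int,
           match: int = 1, mismatch: int = -3, x_drop: int = 5) -> int:
    """Ungapped extension of the seed query[qi:qi+k] == subject[si:si+k]."""
    total = k * match  # score of the exact seed itself
    for step in (1, -1):  # extend right, then left
        score = best = 0
        q, s = (qi + k, si + k) if step == 1 else (qi - 1, si - 1)
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            if score > best:
                best = score
            elif best - score > x_drop:  # fallen too far below the best: stop
                break
            q += step
            s += step
        total += best
    return total


def seed_and_extend(query: str, subject: str,
                    word_size: int = 11) -> list[tuple[int, int, int]]:
    """Return (query_pos, subject_pos, score) for every exact word hit."""
    index = word_index(subject, word_size)
    hits = []
    for qi in range(len(query) - word_size + 1):
        for si in index.get(query[qi:qi + word_size], []):
            hits.append((qi, si, extend(query, subject, qi, si, word_size)))
    return hits


if __name__ == "__main__":
    print(seed_and_extend("TTGATTACATT", "CCCCGATTACACCCC", word_size=4))
    print(seed_and_extend("TTGATTACATT", "CCCCGATTACACCCC"))  # word size 11
```

With a word size of 4, the shared region GATTACA is found four times (once per overlapping seed), each extending to a score of 7. With a word size of 11 the same pair gives no hits at all, because the shared region is only seven bases long: a small illustration of how seeding can miss real similarities.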
Bad Genome Annotation
• Gene finding is at best only 90% accurate.
• New sequences are automatically annotated on the basis of BLAST hits.
• Bad annotations propagate.
• It's going to take us 10-20 years or more to sort this mess out!
Conclusions
• We have only touched small parts of the elephant.
• Intelligent trial and error is often your best tool.
• Keep up with the five main sites, and you'll have a pretty good idea of what is happening and available.
