Welcome to BIOINFORMATICS
                   -MiRON
Outline
• Workshop chronology (on the handouts)
• Brief background information
• Applications and role
• Bioinformatics tools
• Practical classes
• Problem-solving exercises
• What's expected of you?
• Questions and comments are welcome at all points
Aims
• To introduce the concepts and language of bioinformatics.
• To provide an understanding of how nucleic acid and protein sequence data are obtained and analysed.
• To develop skills in utilising online databases and interpreting data.
• To develop an understanding of how bioinformatics can be applied to solve specific problems in biomedical science.
• To develop transferable IT and communication skills.
In this workshop you will learn:
• How data are generated and analysed
• What the generated data can tell us about the molecular biology of organisms
• Various practical applications of this knowledge
What is bioinformatics?
Why bioinformatics?
• Over the past decade, massive amounts of sequence data have been generated.
• More recently, these have been joined by gene expression data obtained from microarrays and proteomic technologies.
• Data on this scale can only be analysed with specialised computer algorithms.
Main Topics (review)
• Genome organisation and analysis
• Functional genomics
• Advanced techniques in molecular biology
• Archives, information retrieval and alignments:
  - Nucleic acid sequence databases; genome databases; protein sequence databases; database searching
  - Dot plots (similarity matrices) and sequence alignments (e.g., PSI-BLAST)
• Genome expression: microarray analysis, proteomics, eukaryotic genome expression
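A dot plot places one sequence along each axis of a matrix and marks every cell where the two sequences match; runs of dots along a diagonal reveal similar regions. The sketch below is a minimal text-only illustration. The function names and the `window`/`threshold` parameters are illustrative assumptions, not the settings of any particular dot-plot program.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1, threshold: int = 1) -> list[list[bool]]:
    """Cell [i][j] is True when the windows starting at seq_a[i] and seq_b[j]
    share at least `threshold` identical positions."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count identical positions inside the two windows being compared.
            identical = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(identical >= threshold)
        matrix.append(row)
    return matrix


def render(matrix: list[list[bool]]) -> str:
    """Draw the matrix as text: '*' for a dot, '.' for no dot."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    # Comparing a sequence with itself gives a full main diagonal.
    print(render(dot_plot("GATTACA", "GATTACA")))
```

Raising `window` and `threshold` together filters out isolated single-base matches, which is why dot plots of long sequences usually use a sliding window.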
What bioinformaticians think they are
What they do
Examples of Bioinformatics
• Database interfaces
  - GenBank/EMBL/DDBJ, Medline, SwissProt, PDB, …
• Sequence alignment
  - BLAST, FASTA
• Multiple sequence alignment
  - Clustal W, MultAlin, DiAlign
• Gene finding
  - Genscan, GenomeScan, GeneMark, GRAIL
• Protein domain analysis and identification
  - Pfam, BLOCKS, ProDom
• Pattern identification/characterization
  - Gibbs Sampler, AlignACE, MEME
• Protein folding prediction
  - PredictProtein, SWISS-MODEL
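Gene finders such as Genscan combine many signals (splice sites, codon usage, promoters) in statistical models. The much simpler idea underneath them, scanning all six reading frames for open reading frames (ORFs) that run from an ATG start codon to a stop codon, can be sketched as follows. The function names and the `min_codons` cut-off are hypothetical choices for illustration; this sketch does not handle introns and will not reproduce the predictions of any tool listed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is left as N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    Coordinates are 0-based and end-exclusive on the strand reported
    ('-' coordinates refer to the reverse complement). `min_codons`
    counts codons including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG after this stop
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```

A real ORF scan would use a much larger `min_codons` (short ORFs occur by chance all the time), which is one reason simple ORF finding is only a starting point for gene prediction.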
Five websites that all biologists should know
• NCBI (The National Center for Biotechnology Information)
  - http://www.ncbi.nlm.nih.gov/
• EBI (The European Bioinformatics Institute)
  - http://www.ebi.ac.uk/
• The Canadian Bioinformatics Resource
  - http://www.cbr.nrc.ca/
• SwissProt/ExPASy (Swiss Bioinformatics Resource)
  - http://expasy.cbr.nrc.ca/sprot/
• PDB (The Protein Databank)
  - http://www.rcsb.org/PDB/
Remember when using web server-based tools
• You are using someone else's computer.
• You are (probably) getting a reduced set of options or capacity.
• Servers are great for sporadic or proof-of-principle work, but for intensive work the software should be obtained and run locally.
Human Gene Index Database
• The HGI is a database of expressed DNA sequences, made up mostly of ESTs, which are a type of partial cDNA sequence.
• EST stands for Expressed Sequence Tag.
• These short sequences were created using essentially the same method used to make cDNAs.
• As such, they represent the expressed part of a genome: they are made from mRNA, which is transcribed from genes.
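Because ESTs are made by reverse-transcribing mRNA, the first-strand cDNA is the reverse complement of the message, with T in place of U. A minimal sketch, assuming sequences are written 5' to 3' and contain only A, C, G and U; the function names and the example message are hypothetical.

```python
def first_strand_cdna(mrna: str) -> str:
    """Reverse-transcribe mRNA (5'->3', with U) into first-strand cDNA, written 5'->3'.

    The cDNA base-pairs with the mRNA, so it is the reverse complement with T for U.
    """
    pairing = str.maketrans("ACGU", "TGCA")
    return mrna.upper().translate(pairing)[::-1]


def sense_dna(mrna: str) -> str:
    """DNA spelling of the mRNA itself (U -> T); over exons this matches the gene's coding strand."""
    return mrna.upper().replace("U", "T")


if __name__ == "__main__":
    message = "AUGGCCAUUGUAAUGGGCCGCUGA"
    print("first-strand cDNA:", first_strand_cdna(message))
    print("sense DNA:        ", sense_dna(message))
```

An EST read may be reported in either orientation, so both forms are worth checking when comparing an EST with a genomic or mRNA sequence.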
Gene Structure
Similarity Searching
• A variety of computer programs are used to compare DNA sequences.
• The most popular is BLAST (Basic Local Alignment Search Tool).
• BLAST is freely available on the NCBI website.
BLAST is Complex
• Similarity searching relies on the concepts of alignment and distance between pairs of sequences.
• Distances can only be measured between aligned sequences (match vs. mismatch at each position).
• A similarity search finds and scores the best alignment of a query sequence with every sequence in a database.
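The two ideas on this slide, distance between aligned sequences and a best alignment against every database entry, can be illustrated with a toy example. The sketch below computes the p-distance (proportion of mismatched positions) of an existing alignment, then scores a query against each sequence in a small in-memory "database" with the Smith-Waterman local alignment algorithm. The scoring values (match +2, mismatch -1, gap -2) and the dictionary database are illustrative assumptions. This exhaustive approach is exactly what BLAST avoids for speed; it is not how BLAST itself works.

```python
def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Proportion of mismatched positions between two equal-length aligned
    sequences, ignoring columns where either sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def smith_waterman_score(query: str, subject: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == subject[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every database sequence, best hit first."""
    hits = [(name, smith_waterman_score(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    print(p_distance("ACGTTA-C", "ACGATACC"))
    toy_db = {"seq1": "TTGACGTTAC", "seq2": "GGGGCCCC", "seq3": "ACGTTACG"}
    for name, score in search("ACGTTAC", toy_db):
        print(name, score)
```

The cost of `smith_waterman_score` grows with the product of the two sequence lengths, so running it against every entry of a large database is slow; that cost is what motivates BLAST's shortcuts.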
Workshop 1 (database search and inference of possible homology)

Please refer to "Getting Started with Bioinformatics".

INTRO TO BLAST
• BLAST stands for Basic Local Alignment Search Tool.
• It compares a query sequence with those in nucleotide (or protein) databases by aligning the query with previously characterised sequences, which helps to identify genes.
• Its emphasis is on finding regions of local sequence similarity between the query and database sequences.
• These alignments can yield clues about the structure and function of a novel sequence, and about its evolutionary history and homology with other sequences in the database.
BLAST has Automatic Translation
• BLASTX automatically translates your DNA query sequence in all six reading frames and compares the translations with protein databases.
• TBLASTN automatically translates an entire DNA database (in all six frames) and compares it with your protein query sequence.
• Only run a DNA-DNA search if you are working with a sequence that does not code for protein.
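Six-frame translation, the step BLASTX applies to a DNA query, reads three frames on the given strand and three on its reverse complement. A minimal sketch using the standard genetic code (NCBI translation table 1), with frames labelled +1 to +3 and -1 to -3. The function names are hypothetical, and translating codons that contain ambiguous bases (such as N) as 'X' is an assumption of this sketch.

```python
import itertools

# Standard genetic code: codons enumerated in TCAG order, first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Reverse complement; bases other than A, C, G, T are left unchanged."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate whole codons from position 0; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translations of frames +1, +2, +3 (given strand) and -1, -2, -3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
    for offset in range(3):
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(frame, protein)
```

Usually only one of the six frames is free of stop codons over a long stretch; that frame is the likely coding frame, and it is the one that tends to produce the strongest BLASTX hits.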
A typical sequence ready for submission to BLAST (FASTA format)
>THC2465887
GGCTGCGGAGGACCGACCGTCCCCACGCCTGCCGCCCCGCGACCCCGACCGCCAGCATGATCGCCGCGCAGCTCCTGGCC
TATTACTTCACGGAGCTGAAGGATGACCAGGTCAAAAAGATTGACAAGTATCTCTATGCCATGCGGCTCTCCGATGAAAC
TCTCATAGATATCATGACTCGCTTCAGGAAGGAGATGAAGAATGGCCTCTCCCGGGATTTTAATCCAACAGCCACAGTCA
AGATGTTGCCAACATTCGTAAGGTCCATTCCTGATGGCTCTGAAAAGGGAGATTTCATTGCCCTGGATCTTGGTGGGTCT
TCCTTTCGAATTCTGCGGGTGCAAGTGAATCATGAGAAAAACCAGAATGTTCACATGGAGTCCGAGGTTTATGACACCCC
AGAGAACATCGTGCACGGCAGTGGAAGCCAGCTTTTTGATCATGTTGCTGAGTGCCTGGGAGATTTCATGGAGAAAAGGA
AGATCAAGGACAAGAAGTTACCTGTGGGATTCACGTTTTCTTTTCCTTGCCAACAATCCAAAATAGATGAGGCCATCCTG
ATCACCTGGACAAAGCGATTTAAAGCGAGCGGAGTGGAAGGAGCAGATGTGGTCAAACTGCTTAACAAAGCCATCAAAAA
GCGAGGGGACTATGATGCCAACATCGTAGCTGTGGTGAA
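The sequence above is in FASTA format: a '>' header line carrying an identifier, followed by one or more lines of sequence. A minimal parser sketch that collects each record and reports its GC content, assuming the identifier is the first word after '>'; the function names are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record identifier (first word after '>') to its joined, upper-cased sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    example = ">THC2465887\nGGCTGCGGAGGACCGACCGTCCCC\nACGCCTGCCGCCCCGCGACCCC\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), round(gc_content(seq), 3))
```

Pasting the whole record (header included) into the BLAST query box works the same way: the header is treated as a label and only the sequence lines are searched.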
BLAST OUTPUT
BLAST alignment of human vs. canine partial cDNAs for hexokinase 1


  Query:   3034 TGCATGGTTTGATTTTGACCTGGTC---C---CCC-ACGTGTGAAGTGTAGTGGCATCCA 3086
                |||||| | |||||| ||||||||    |   ||| ||||||||||| |||||||| |||
  Sbjct:     75 TGCATGATCTGATTTCAACCTGGTCGTACGCTCCCCACGTGTGAAGTTTAGTGGCACCCA 134

  Query:   3087 TTTCTAATGTATGCATTCATCCAACAGAGTTATTTATTGGCTGGAGATGGAAAATCACAC 3146
                |||| | | | ||||||| || ||||||||||||||||||    ||||| ||| |||| |
  Sbjct:    135 TTTCCAGTCTCTGCATTCGTCTGACAGAGTTATTTATTGGCCCAAGATGAAAAGTCACGC 194

  Query:   3147 CACCTGACAGGCCTTCTGGG-CCTCCAAAGCCCATCCTTGGGGTTCCCCCTCCCTGTGTG 3205
                || | | |||||||| |||| ||||   ||||| |||||||||   | | |||||||||
  Sbjct:    195 CATCCGCCAGGCCTTATGGGGCCTCTGCAGCCCGTCCTTGGGGACACATC-CCCTGTGTG 253

  Query:   3206 AAATGTATTATCACCAGCAGACACTGCCGGGCCTCC-C-TCCCGGGGGCACTGCCTGAAG 3263
                ||||||||||||||||||||||||||||||| |||| | |||| |||||| | | |
  Sbjct:    254 AAATGTATTATCACCAGCAGACACTGCCGGGACTCCTCCTCCCAGGGGCA-T-CTTAGCT 311

  Query:   3264 GCGAG-TGTGGGCATAGCATTAGCTGCTTCCTCCCCTCCTG-GCA-CCCACTGTGGCC-T 3319
                ||    |   | | ||||     ||||| || | ||| | | | |||| | || | |
  Sbjct:    312 GCTTCCTCCCGTCCCAGCACCCACTGCTGTCTGGCGTCCCGAGGATCCCA-TCAGGACGT 370

  Query:   3320 GGC-ATCGCATCGTGGTGTGTCAATGCCACAAAATCGTGTGTCCGTGGAACCAGTCCTAG 3378
                | | || || | | ||||      | ||    || | || ||| | | ||    || |
  Sbjct:    371 GTCCATGCCACTGAGTCGTGTG--T-CCGTGGAA-C-TG-GTCAGAGCCACT--TCGTGA 422

  Query:   3379 CCGCGTGTGACAGTCTTGCATTCTGTTTGTCTCGTGGGGGGAGGTGGACAG-TCCTGCGG 3437
                | | | || || ||| | ||| | | | | ||                || ||||| ||
  Sbjct:    423 CAGTCT-TG-CATTCTGTCTGTCT--TGGGGTGGNNGGNAAGNNNNNCCANNTCCTGTGG 478

  Query:   3438 -AAAT--GTGTCTTGTCTCCATTTGGA-TAAAA-GGAA-CCAA--CCAACAAACAATGCC 3489
                 |||   | | |||| |||||||||| ||||| |||| |||| ||||||| || ||||
  Sbjct:    479 GAAAAAGGGGCCTTGGCTCCATTTGGGGTAAAAAGGAAACCAAACCCAACAA-CAGTGCC 537

  Query:   3490 A-TCACTGG-AATTTCCC-ACCG-CTTT--GTGAGCCGTG-TCGTATGA-CCTAGTAAAC 3541
                  ||| ||| |||| ||| | | |||| ||||||| || | |||||| ||||| ||
  Sbjct:    538 CCTCATTGGGAATTCCCCCATTGGCTTTTTGTGAGCCATGGTTGTATGAACCTAGGTAAA 597

  Query:   3542 TTTGT 3546
                 || |
  Sbjct:    598 CTTNT 602
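BLAST summarises each alignment with counts of identities and gaps. A minimal sketch that recomputes these from a pair of aligned rows (for example the Query and Sbjct strings above, with coordinates removed and the blocks concatenated). Counting a position as identical only when both rows carry the same base A, C, G or T is an assumption of this sketch, and the midline of '|' characters is ignored.

```python
def alignment_stats(query_row: str, sbjct_row: str) -> dict[str, float]:
    """Length, identities, gap columns and percent identity of two aligned rows."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row.upper(), sbjct_row.upper()))
    identities = sum(q == s and q in "ACGT" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    length = len(pairs)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length if length else 0.0,
    }


if __name__ == "__main__":
    # First block of the human vs. canine hexokinase 1 alignment shown above.
    query = "TGCATGGTTTGATTTTGACCTGGTC---C---CCC-ACGTGTGAAGTGTAGTGGCATCCA"
    sbjct = "TGCATGATCTGATTTCAACCTGGTCGTACGCTCCCCACGTGTGAAGTTTAGTGGCACCCA"
    print(alignment_stats(query, sbjct))
```

Percent identity on its own can mislead: a short alignment can score 100% by chance, which is why the E-value on the next slide matters.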
Understand the Statistics!
• BLAST produces an E-value for every match.
  - The E-value is the number of matches with a score at least this high that would be expected by chance in a search of this size. It is not the same as a P-value, but for small values (below about 0.01) the two are nearly equal.
• A match is generally considered significant if E < 0.05 (smaller numbers are more significant).
• Very low E-values (e.g., 1e-100) usually indicate homologs or identical genes.
• Moderate E-values usually indicate related genes.
• Long regions of moderate similarity are more important than short regions of high identity.
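The E-value is usually computed from Karlin-Altschul statistics, E = K·m·n·e^(-λS), where S is the raw alignment score, m and n are the (effective) lengths of the query and the database, and λ and K depend on the scoring system and sequence composition. The probability of at least one chance hit is then P = 1 - e^(-E), which is why E and P are almost equal for small values. The sketch below uses made-up λ and K values purely for illustration; real searches take them from the scoring matrix and apply BLAST's own length corrections.

```python
import math


def evalue(raw_score: float, m: float, n: float, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalised (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: float, n: float) -> float:
    """Equivalent form using the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


def pvalue(e: float) -> float:
    """Probability of at least one chance hit when E are expected: P = 1 - exp(-E).

    math.expm1 keeps precision for very small E.
    """
    return -math.expm1(-e)


if __name__ == "__main__":
    # Illustrative numbers only: these lambda and K values are made up, not BLAST's.
    LAMBDA, K = 0.3, 0.1
    query_len, db_len = 500, 1_000_000
    for score in (40, 60, 80):
        e = evalue(score, query_len, db_len, LAMBDA, K)
        print(f"S={score}  bits={bit_score(score, LAMBDA, K):.1f}  E={e:.2e}  P={pvalue(e):.2e}")
```

Because E scales with m·n, the same alignment gets a larger (less significant) E-value when searched against a bigger database.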
BLAST is Approximate
• BLAST performs similarity searches very quickly because it takes shortcuts.
  - It looks for short, (nearly) identical "words" (e.g., 11 bases in classic BLASTN) and builds alignments only from those seeds.
• It also makes errors:
  - it misses some important similarities;
  - it makes many incorrect matches;
  - it is easily fooled by repeats or skewed (low-complexity) composition.
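The word-seeding shortcut can be sketched as follows: index every k-letter word of the subject, then report each position where a query word matches it exactly. With k = 11, two sequences that differ at every sixth base share no seed at all, which illustrates how BLAST can miss real similarity. This is a simplified sketch under stated assumptions (hypothetical function names, exact nucleotide words only): BLAST itself also extends seeds into scored alignments, and protein searches use "neighbourhood" words that score above a threshold rather than exact matches.

```python
from collections import defaultdict


def word_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word of `seq` to the positions where it starts."""
    index = defaultdict(list)
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    return index


def find_seeds(query: str, subject: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where an exact k-letter word is shared."""
    index = word_index(subject, k)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


if __name__ == "__main__":
    # 15 of 18 positions are identical, but a mismatch every sixth base
    # means no 11-letter word is shared, so no seed (and no hit) is found.
    query = "AAAAAGAAAAAGAAAAAG"
    subject = "AAAAATAAAAATAAAAAT"
    print(find_seeds(query, subject))
    print(find_seeds(query, subject, k=5))
```

Shorter words find more seeds and so more distant similarities, at the cost of speed and more spurious hits; that trade-off is why word size is one of the main sensitivity settings in BLAST.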
Bad Genome Annotation
• Gene finding is, at best, only about 90% accurate.
• New sequences are automatically annotated on the basis of their BLAST matches.
• Bad annotations therefore propagate from one database entry to the next.
• It's going to take us 10-20 years or more to sort this mess out!
Conclusions
• We have only touched small parts of the elephant.
• Intelligent trial and error is often your best tool.
• Keep up with the five main sites, and you'll have a pretty good idea of what is happening and what is available.

More Related Content

What's hot

Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Databricks
 
BLAST
BLASTBLAST
BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)
Sobia
 
BLAST
BLASTBLAST
BLAST
Rabia W.
 
BLAST
BLASTBLAST
BLAST
rishabhaks
 
SooryaKiran Bioinformatics
SooryaKiran BioinformaticsSooryaKiran Bioinformatics
SooryaKiran Bioinformatics
contactsoorya
 
BLAST [Basic Alignment Local Search Tool]
BLAST [Basic Alignment Local Search Tool]BLAST [Basic Alignment Local Search Tool]
BLAST [Basic Alignment Local Search Tool]BiotechOnline
 
Cartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defenseCartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defenseCartic Ramakrishnan
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshopc.titus.brown
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
University of California, Davis
 
Finding Allelic Frequencies Using MapReduce/Hadoop
Finding Allelic Frequencies Using MapReduce/HadoopFinding Allelic Frequencies Using MapReduce/Hadoop
Finding Allelic Frequencies Using MapReduce/Hadoop
Mahmoud Parsian
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenome
c.titus.brown
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis Overview
Sean Davis
 
31931 31941
31931 3194131931 31941
31931 31941
Amit Gupta
 

What's hot (20)

blast bioinformatics
blast bioinformaticsblast bioinformatics
blast bioinformatics
 
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
 
BLAST
BLASTBLAST
BLAST
 
BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)BLAST(Basic Local Alignment Tool)
BLAST(Basic Local Alignment Tool)
 
Blast 2013 1
Blast 2013 1Blast 2013 1
Blast 2013 1
 
BLAST
BLASTBLAST
BLAST
 
BLAST
BLASTBLAST
BLAST
 
Harvester I
Harvester IHarvester I
Harvester I
 
Harvester Ii
Harvester IiHarvester Ii
Harvester Ii
 
SooryaKiran Bioinformatics
SooryaKiran BioinformaticsSooryaKiran Bioinformatics
SooryaKiran Bioinformatics
 
BLAST [Basic Alignment Local Search Tool]
BLAST [Basic Alignment Local Search Tool]BLAST [Basic Alignment Local Search Tool]
BLAST [Basic Alignment Local Search Tool]
 
Cartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defenseCartic Ramakrishnan's dissertation defense
Cartic Ramakrishnan's dissertation defense
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshop
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Finding Allelic Frequencies Using MapReduce/Hadoop
Finding Allelic Frequencies Using MapReduce/HadoopFinding Allelic Frequencies Using MapReduce/Hadoop
Finding Allelic Frequencies Using MapReduce/Hadoop
 
Myers CV_2015
Myers CV_2015Myers CV_2015
Myers CV_2015
 
2015 ohsu-metagenome
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenome
 
RNA-seq Data Analysis Overview
RNA-seq Data Analysis OverviewRNA-seq Data Analysis Overview
RNA-seq Data Analysis Overview
 
Arraygen_Brochure
Arraygen_BrochureArraygen_Brochure
Arraygen_Brochure
 
31931 31941
31931 3194131931 31941
31931 31941
 

Similar to Bioinformatics MiRON

Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
Article
ArticleArticle
Article
MisbahAlwi
 
2012 03 01_bioinformatics_ii_les1
2012 03 01_bioinformatics_ii_les12012 03 01_bioinformatics_ii_les1
2012 03 01_bioinformatics_ii_les1
Prof. Wim Van Criekinge
 
bioinformatic.pptx
bioinformatic.pptxbioinformatic.pptx
bioinformatic.pptx
RitikaChoudhary57
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
Anshika Bansal
 
Sequencedatabases
SequencedatabasesSequencedatabases
SequencedatabasesAbhik Seal
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
KAUSHAL SAHU
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptx
xRowlet
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Arockiyajainmary
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
Ahmed Abdellatif
 
Ncbi
NcbiNcbi
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
journal ijrtem
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
IJRTEMJOURNAL
 
Lecture 5.pptx
Lecture 5.pptxLecture 5.pptx
Lecture 5.pptx
ericndunek
 
Bioinformatics Final Report
Bioinformatics Final ReportBioinformatics Final Report
Bioinformatics Final ReportShruthi Choudary
 
Database Searching
Database SearchingDatabase Searching
Database Searching
Meghaj Mallick
 
BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234
alizain9604
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
GenomeInABottle
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
Syed Lokman
 

Similar to Bioinformatics MiRON (20)

Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
Article
ArticleArticle
Article
 
2012 03 01_bioinformatics_ii_les1
2012 03 01_bioinformatics_ii_les12012 03 01_bioinformatics_ii_les1
2012 03 01_bioinformatics_ii_les1
 
bioinformatic.pptx
bioinformatic.pptxbioinformatic.pptx
bioinformatic.pptx
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Sequencedatabases
SequencedatabasesSequencedatabases
Sequencedatabases
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptx
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Blasta
BlastaBlasta
Blasta
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Ncbi
NcbiNcbi
Ncbi
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
 
Lecture 5.pptx
Lecture 5.pptxLecture 5.pptx
Lecture 5.pptx
 
Bioinformatics Final Report
Bioinformatics Final ReportBioinformatics Final Report
Bioinformatics Final Report
 
Database Searching
Database SearchingDatabase Searching
Database Searching
 
BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234BLAST AND FASTA.pptx12345789999987544321234
BLAST AND FASTA.pptx12345789999987544321234
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Basic BLAST (BLASTn)
Basic BLAST (BLASTn)Basic BLAST (BLASTn)
Basic BLAST (BLASTn)
 

Recently uploaded

2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
JEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questionsJEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questions
ShivajiThube2
 
Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
Mohammed Sikander
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
goswamiyash170123
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 

Recently uploaded (20)

2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
JEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questionsJEE1_This_section_contains_FOUR_ questions
JEE1_This_section_contains_FOUR_ questions
 
Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 

Bioinformatics MiRON

  • 2. Outline  Workshops chronology on hands out  Brief background information  Applications & role  Bioinformatics tools  Practical classes  Problem solving exercises  What’s expected of you ?  Questions/comments are welcome at all points
  • 3. Aims  To introduce the concepts and language of bioinformatics.  To provide an understanding of how nucleic acid and protein sequence data is obtained and analysed.  To develop skills in utilising online databases and interpreting data.  To develop an understanding of how bioinformatics can be applied to solve specific problems in biomedical science.  To develop transferable IT and communications skills.
  • 4. In this workshop…..  You will learn about how data is generated and analysed  As well as what the generated data can tell us about the molecular biology of organisms  And various practical applications of this knowledge
  • 6. Why bioinformatics?  Over the past decade massive amounts of sequence data have been generated  This has more recently been joined by gene expression data obtained from microarrays and proteomic technologies  This vast amount of data can only be analysed using various specialised computer algorithms
  • 7. Main Topics (Review............)  Genome organisation and analysis  Functional genomics  Advanced techniques in molecular biology  Archives, information retrieval and alignments:  Nucleic acid sequence databases; genome databases; protein sequence databases; database searching  Dot plots (SIMILARITY MATRX) and sequence alignments (PSI BLAST);  Genome expression: Microarray analysis, proteomics, eukaryotic genome expression
  • 10. Examples of Bioinformatics  Database interfaces  Genbank/EMBL/DDBJ, Medline, SwissProt, PDB, …  Sequence alignment  BLAST, FASTA  Multiple sequence alignment  Clustal W, MultAlin, DiAlign  Gene finding  Genscan, GenomeScan, GeneMark, GRAIL  Protein Domain analysis and identification  pfam, BLOCKS, ProDom,  Pattern Identification/Characterization  Gibbs Sampler, AlignACE, MEME  Protein Folding prediction  PredictProtein, SwissModeler
  • 11. Five W that all biologists should know  NCBI (The National Center for Biotechnology Information;  http://www.ncbi.nlm.nih.gov/  EBI (The European Bioinformatics Institute)  http://www.ebi.ac.uk/  The Canadian Bioinformatics Resource  http://www.cbr.nrc.ca/  SwissProt/ExPASy (Swiss Bioinformatics Resource)  http://expasy.cbr.nrc.ca/sprot/  PDB (The Protein Databank)  http://www.rcsb.org/PDB/
  • 12. Remember while using web server-based tools  You are using someone else’s computer  You are (probably) getting a reduced set of options or capacity  Servers are great for sporadic or proof- of-principle work, but for intensive work, the software should be obtained and run locally
  • 13. Human Gene Index Database  HGI is a database of expressed DNA sequences, mostly made of ESTs, which are a type of partial cDNA  EST stands for Expressed Sequence Tag  These short sequences were created using essentially the same method used to make cDNAs  As such they represent the expressed part of a genome and are made from mRNA which is ultimately expressed from GENES
  • 16. Similarity Searching
    - A variety of computer programs are used to compare DNA sequences.
    - The most popular is BLAST (Basic Local Alignment Search Tool).
    - BLAST is free at the NCBI website.
  • 17. BLAST is Complex
    - Similarity searching relies on the concepts of alignment and distance between pairs of sequences.
    - Distances can only be measured between aligned sequences (match vs. mismatch at each position).
    - A similarity search finds the best alignment of a query sequence with every sequence in a database.
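Measuring distance only between aligned positions can be sketched in Python as follows. This computes a simple p-distance (the fraction of mismatched positions) over two rows that are already aligned, skipping gap columns. The function name `alignment_distance` is illustrative. BLAST itself scores alignments with substitution scores rather than with this distance.

```python
def alignment_distance(a: str, b: str) -> tuple[int, int, float]:
    """Compare two already-aligned, equal-length rows column by column.

    Columns where either row has a gap ('-') are skipped, so distance is
    measured only where both sequences have a base.
    Returns (matches, mismatches, p_distance), where p_distance is the
    fraction of compared columns that are mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: no base-to-base comparison possible
        if x == y:
            matches += 1
        else:
            mismatches += 1
    compared = matches + mismatches
    p_distance = mismatches / compared if compared else 0.0
    return matches, mismatches, p_distance
```

For the aligned pair ACGT-A / ACGTTC, the gap column is skipped, giving 4 matches, 1 mismatch and a p-distance of 0.2. A database search repeats a comparison of this kind against every database sequence, after first finding the best alignment.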
  • 18. Workshop 1 (database search and inference of possible homology)
    Please refer to "Getting Started with Bioinformatics".
    INTRO TO BLAST
    - Basic Local Alignment Search Tool
    - It compares a query sequence with those in nucleotide databases by aligning it with previously characterised genes, which helps to identify genes.
    - The emphasis of this tool is on finding regions of sequence similarity between two different genes.
    - These alignments can yield clues about the structure and function of a novel sequence, its evolutionary history and its homology with other sequences in the database.
  • 19. BLAST has Automatic Translation
    - BLASTX automatically translates your DNA query sequence (in all six reading frames) and compares the translations with protein databanks.
    - TBLASTN automatically translates an entire DNA database (also in all six frames) and compares it with your protein query sequence.
    - Only run a DNA-DNA search if you are working with a sequence that does not code for protein.
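Here is a rough Python sketch of the six-frame translation that BLASTX applies to a DNA query. It assumes the standard genetic code (NCBI translation table 1) and reads frames -1 to -3 from the reverse complement. The helper names are hypothetical, and the step where BLASTX aligns each frame against proteins is not shown.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, giving the other strand 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon ('*' marks a stop codon).

    An incomplete trailing codon is dropped; codons containing characters
    other than A, C, G, T (e.g. N) become 'X'.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, ATGGCC translates to MA in frame +1 and to GH in frame -1.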
  • 20. A typical sequence ready for submission to BLAST
    >THC2465887
    GGCTGCGGAGGACCGACCGTCCCCACGCCTGCCGCCCCGCGACCCCGACCGCCAGCATGATCGCCGCGCAGCTCCTGGCC
    TATTACTTCACGGAGCTGAAGGATGACCAGGTCAAAAAGATTGACAAGTATCTCTATGCCATGCGGCTCTCCGATGAAAC
    TCTCATAGATATCATGACTCGCTTCAGGAAGGAGATGAAGAATGGCCTCTCCCGGGATTTTAATCCAACAGCCACAGTCA
    AGATGTTGCCAACATTCGTAAGGTCCATTCCTGATGGCTCTGAAAAGGGAGATTTCATTGCCCTGGATCTTGGTGGGTCT
    TCCTTTCGAATTCTGCGGGTGCAAGTGAATCATGAGAAAAACCAGAATGTTCACATGGAGTCCGAGGTTTATGACACCCC
    AGAGAACATCGTGCACGGCAGTGGAAGCCAGCTTTTTGATCATGTTGCTGAGTGCCTGGGAGATTTCATGGAGAAAAGGA
    AGATCAAGGACAAGAAGTTACCTGTGGGATTCACGTTTTCTTTTCCTTGCCAACAATCCAAAATAGATGAGGCCATCCTG
    ATCACCTGGACAAAGCGATTTAAAGCGAGCGGAGTGGAAGGAGCAGATGTGGTCAAACTGCTTAACAAAGCCATCAAAAA
    GCGAGGGGACTATGATGCCAACATCGTAGCTGTGGTGAA
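A FASTA record like the one above has a simple structure: a '>' header line followed by sequence lines. The minimal Python sketch below parses that format. It assumes the identifier is the first word after '>', and the helper names `parse_fasta` and `gc_content` are hypothetical. For real work, a maintained parser such as Biopython's `SeqIO` is the safer choice.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first word after '>'; the sequence lines that
    follow are upper-cased and joined. Lines before the first header are ignored.
    """
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = (line[1:].split() or [""])[0]
            chunks = []
        elif current_id is not None:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```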
  • 22. BLAST line-up of human v canine partial cDNAs for hexokinase 1

    Query: 3034 TGCATGGTTTGATTTTGACCTGGTC---C---CCC-ACGTGTGAAGTGTAGTGGCATCCA 3086
                |||||| | ||||||  ||||||||   |   ||| ||||||||||| |||||||| |||
    Sbjct: 75   TGCATGATCTGATTTCAACCTGGTCGTACGCTCCCCACGTGTGAAGTTTAGTGGCACCCA 134

    Query: 3087 TTTCTAATGTATGCATTCATCCAACAGAGTTATTTATTGGCTGGAGATGGAAAATCACAC 3146
                |||| | | | ||||||| ||  ||||||||||||||||||   ||||| ||| |||| |
    Sbjct: 135  TTTCCAGTCTCTGCATTCGTCTGACAGAGTTATTTATTGGCCCAAGATGAAAAGTCACGC 194

    Query: 3147 CACCTGACAGGCCTTCTGGG-CCTCCAAAGCCCATCCTTGGGGTTCCCCCTCCCTGTGTG 3205
                || | | |||||||| |||| ||||   ||||| |||||||||   |  | |||||||||
    Sbjct: 195  CATCCGCCAGGCCTTATGGGGCCTCTGCAGCCCGTCCTTGGGGACACATC-CCCTGTGTG 253

    Query: 3206 AAATGTATTATCACCAGCAGACACTGCCGGGCCTCC-C-TCCCGGGGGCACTGCCTGAAG 3263
                ||||||||||||||||||||||||||||||| |||| | |||| |||||| | | |
    Sbjct: 254  AAATGTATTATCACCAGCAGACACTGCCGGGACTCCTCCTCCCAGGGGCA-T-CTTAGCT 311

    Query: 3264 GCGAG-TGTGGGCATAGCATTAGCTGCTTCCTCCCCTCCTG-GCA-CCCACTGTGGCC-T 3319
                ||    |   | |  ||||    |||||  ||  | ||| | | | |||| |  || | |
    Sbjct: 312  GCTTCCTCCCGTCCCAGCACCCACTGCTGTCTGGCGTCCCGAGGATCCCA-TCAGGACGT 370

    Query: 3320 GGC-ATCGCATCGTGGTGTGTCAATGCCACAAAATCGTGTGTCCGTGGAACCAGTCCTAG 3378
                | | ||  ||  | |  ||||   | ||    || | || ||| | |  ||   || |
    Sbjct: 371  GTCCATGCCACTGAGTCGTGTG--T-CCGTGGAA-C-TG-GTCAGAGCCACT--TCGTGA 422

    Query: 3379 CCGCGTGTGACAGTCTTGCATTCTGTTTGTCTCGTGGGGGGAGGTGGACAG-TCCTGCGG 3437
                | |  | || || |||  |  |||  | |  | |  ||          ||  ||||| ||
    Sbjct: 423  CAGTCT-TG-CATTCTGTCTGTCT--TGGGGTGGNNGGNAAGNNNNNCCANNTCCTGTGG 478

    Query: 3438 -AAAT--GTGTCTTGTCTCCATTTGGA-TAAAA-GGAA-CCAA--CCAACAAACAATGCC 3489
                 |||   | | |||| ||||||||||  ||||| |||| ||||  ||||||| || ||||
    Sbjct: 479  GAAAAAGGGGCCTTGGCTCCATTTGGGGTAAAAAGGAAACCAAACCCAACAA-CAGTGCC 537

    Query: 3490 A-TCACTGG-AATTTCCC-ACCG-CTTT--GTGAGCCGTG-TCGTATGA-CCTAGTAAAC 3541
                  ||| ||| |||| ||| |  | ||||  ||||||| || | |||||| |||||  ||
    Sbjct: 538  CCTCATTGGGAATTCCCCCATTGGCTTTTTGTGAGCCATGGTTGTATGAACCTAGGTAAA 597

    Query: 3542 TTTGT 3546
                 || |
    Sbjct: 598  CTTNT 602
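In the alignment, the `|` line between each Query and Sbjct row marks identical bases. The minimal Python sketch below shows how such a match line, and a percent-identity figure, can be derived from two aligned rows. It assumes gaps are written as '-' and that ambiguous 'N' bases never count as identities. The function names are illustrative only.

```python
def midline(query: str, sbjct: str) -> str:
    """Return a BLAST-style match line: '|' where the aligned bases are
    identical, a space for mismatches, gaps ('-') and ambiguous 'N' bases."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q == s and q not in "-N" else " "
        for q, s in zip(query.upper(), sbjct.upper())
    )


def percent_identity(query: str, sbjct: str) -> float:
    """Identical columns as a percentage of all alignment columns (gaps included)."""
    line = midline(query, sbjct)
    return 100.0 * line.count("|") / len(line)
```

Applied to the first ten columns of the hexokinase alignment (TGCATGGTTT against TGCATGATCT), it gives `|||||| | |`, which is 80% identity over those columns.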
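  • 23. Understand the Statistics!
    - BLAST produces an E-value for every match.
    - The E-value is the number of matches with an equally good (or better) score that would be expected by chance alone in a database of this size. For small values it behaves like the P value of a statistical test, but it is not the same thing: an E-value can be greater than 1.
    - A match is generally considered significant if the E-value is < 0.05 (smaller numbers are more significant).
    - Very low E-values (e.g. 1e-100) indicate homologs or identical genes.
    - Moderate E-values suggest related genes.
    - Long regions of moderate similarity are more important than short regions of high identity.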
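The Python sketch below relates the E-value to a P value. It uses the textbook Karlin-Altschul relationship in bit-score form, E = m·n·2^(-S'), where m is the query length, n is the total database length and S' is the bit score. It also uses P = 1 - e^(-E). Real BLAST applies effective-length corrections to m and n, so treat the numbers from this sketch as approximations. The function names are hypothetical.

```python
import math


def evalue_from_bitscore(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S'): expected number of chance hits scoring at least S'.

    query_len (m) and db_len (n) are in residues; no edge-effect correction
    is applied here, unlike real BLAST.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(e: float) -> float:
    """Probability of at least one chance hit this good: P = 1 - exp(-E)."""
    return -math.expm1(-e)  # expm1 keeps precision when E is tiny
```

For small E the two values nearly coincide (E = 0.01 gives P of roughly 0.00995), which is why the E-value is often read like a P value. For larger E they diverge, because P can never exceed 1.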
  • 24. BLAST is Approximate
    - BLAST performs similarity searches very quickly because it takes shortcuts: it looks for short, nearly identical "words" (e.g. 11 bases in a standard nucleotide search) and extends only those hits.
    - It also makes errors. It:
      - misses some important similarities
      - makes many incorrect matches
      - is easily fooled by repeats or skewed composition
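The word-based shortcut can be illustrated with a minimal Python sketch of the seeding step only. It assumes exact word matches of a fixed length between a query and one subject sequence. Real BLAST also accepts nearly identical "neighbourhood" words above a score threshold and then extends and scores each seed, and neither step is shown here. `find_seeds` is a hypothetical name.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, word_size: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs (0-based) where the same exact
    word of length `word_size` occurs in both sequences."""
    # Index every word in the query by its starting position(s).
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    # Scan the subject and look up each of its words in the index.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], ()):
            seeds.append((i, j))
    return seeds
```

Running it on a short repeat such as ATATAT against itself with 2-base words yields 13 seed pairs from a 6-base sequence. This is a small illustration of why repeats can swamp a word-based search.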
  • 25. Bad Genome Annotation
    - Gene finding is at best only 90% accurate.
    - New sequences are automatically annotated on the basis of their BLAST hits.
    - Bad annotations propagate: a wrong annotation is copied to new sequences that match it.
    - It's going to take us 10-20 years or more to sort this mess out!
  • 26. Conclusions
    - We have only touched small parts of the elephant.
    - Trial and error (applied intelligently) is often your best tool.
    - Keep up with the five main sites and you'll have a pretty good idea of what is happening and what is available.
