Bioinformatics laboratory

BGE 313
Bioinformatics Laboratory
Submitted To:
Dr. Siraje Arif Mahmud
Associate Professor
Dept. of Biotechnology and Genetic Engineering
Jahangirnagar University

Submitted By:
Effat Jahan Tamanna
Roll No: 131647
Reg No: 36401
Session: 2012 – 13

INDEX
Serial No   Date         Name of the Experiment
01          21.04.2016   Searching basics, AND, OR, NOT, “keywords together”, *
02          21.04.2016   Searching PMC and PubMed using author's name, fields, limits
03          02.05.2016   Retrieving protein sequences using UniProt and creating multi-fasta files
04          02.05.2016   Retrieving relevant DNA sequences using nucleotide and creating multi-fasta file (search by ldh1 NOT hypothetical)
05          02.05.2016   Performing DNA and protein BLAST and analyzing result
06          03.05.2016   Pairwise alignment (global, end gap free), calculate identities, dotplot using BioEdit
07          03.05.2016   Nucleotide composition, complement, reverse complement, DNA to RNA, translate, restriction map, six frame translation using BioEdit
08          03.05.2016   Multiple sequence analysis using BioEdit
09          03.05.2016   Tree Generation with MEGA
10          09.05.2016   Working with single protein sequence: analyzing protein composition (pepdigest, pepstats), protein secondary structure by mEmboss (garnier for protein secondary structure, helixturnhelix for motifs, pepcoil for coiled coil regions)
11          09.05.2016   RNA structure prediction using RNAstructure

Experiment No 01
Searching basics, AND, OR, NOT, “keywords together”, *
i. Searching Basics
Methods:
• Open the PubMed home page.
• In the PubMed search box, type alpha amylase and click Search. 10639 results are shown.
• To filter the results, click Free full text in the Text availability section. The results are reduced to 3108.
• Then click 5 years in the Publication dates section. The results are reduced to 792.
• Click Review in the Article types section. The results are reduced to 16.
• To clear a filter, click Clear to the right of that filter type, or click Clear all.
Result:
Interpretation:
PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts
on life sciences and biomedical topics. PubMed is maintained and updated by the National Library of
Medicine on a weekly basis. A search on alpha amylase returns all articles related to alpha
amylase. To filter the results, we can click Review, which shows only review articles. Free full
text reduces the results to freely available articles, and 5 years reduces them to articles
published in the previous 5 years.

ii. Using Boolean Operators
a. AND
Methods:
• Type gyrase AND topoisomerase and search. 1986 results are found.
• Then filter the results. Clicking Free full text, Review and 5 years in turn reduces the results to 1032, 31 and 6.
Result:
Interpretation:
AND requires both terms to be in each item returned. If one is contained in the document and the other is
not, the item is not included in the resulting list (narrows the search). A search on gyrase AND
topoisomerase returns only articles that contain both keywords.
b. OR
Methods:
• Search antibody OR immunoglobulin.
• 1247231 results are found.
• Then filter the results: Free full text – 363108, Review – 17602, 5 years – 7068.
Result:
Interpretation:
Either term (or both) will be in the returned document (broadens the search). A search on antibody OR
immunoglobulin returns articles containing the word antibody (but not immunoglobulin), articles
containing the word immunoglobulin (but not antibody), and articles containing both words, in any
order and any number of times.
c. NOT
Methods:
• Search immunoglobulin NOT IgG NOT IgA NOT IgM.
• 693211 results are found.
• Then filter the results:
• Free full text – 181743
• Review – 10047
• 5 years – 3618
Result:
Interpretation:
NOT retrieves the records containing the first term and then subtracts any records containing the
term after the operator (narrows the search). A search on immunoglobulin NOT IgG NOT IgA NOT IgM
returns articles about immunoglobulin that do not mention IgG, IgA or IgM.
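The three operators behave like set operations on the records matched by each term. A minimal Python sketch, using made-up record IDs rather than real PubMed data, shows how AND narrows, OR broadens and NOT subtracts:

```python
# Hypothetical record IDs matched by each search term (illustrative only).
gyrase = {1, 2, 3, 4}
topoisomerase = {3, 4, 5}
igg = {4, 5}

both = gyrase & topoisomerase      # AND: records containing both terms
either = gyrase | topoisomerase    # OR: records containing at least one term
without_igg = either - igg         # NOT: drop records containing the excluded term

print(sorted(both), sorted(either), sorted(without_igg))
```

The AND result can never be larger than either input set, while the OR result can never be smaller, which matches the result counts seen above.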
iii. Quoted (“ ”) phrase search
Methods:
• Search “alpha amylase”.
• 8015 results are found.
• Then filter the results:
• Free full text – 2393
• Review – 25
• 5 years – 12
Result:
Interpretation:
When a term is enclosed in quotation marks, only records containing that exact phrase are shown. A
search on “alpha amylase” returns articles that contain the two words together as a phrase, so the
result is more specific than the unquoted search.
iv. * search
Methods:
• Search ldh*
• Then filter the results:
• Free full text – 5691
• Review – 96
• 5 years – 46
Result:
Interpretation:
When a term is searched with *, the asterisk acts as a wildcard (truncation), and records containing
any word that begins with the given root are shown. A search on ldh* returns articles containing
any term that starts with ldh.
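The difference between a quoted phrase and a truncated term can be sketched in Python. This is a toy matcher on plain text, assuming simple word splitting; it is not how PubMed indexes records, and the function names are hypothetical.

```python
import re

def words(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_phrase(text, phrase):
    """True if the words of `phrase` occur adjacent and in order in `text`."""
    w, p = words(text), words(phrase)
    return any(w[i:i + len(p)] == p for i in range(len(w) - len(p) + 1))

def matches_truncation(text, root):
    """True if any word in `text` begins with `root` (the part before '*')."""
    return any(word.startswith(root.lower()) for word in words(text))

print(matches_phrase("Structure of alpha-amylase from barley", "alpha amylase"))
print(matches_truncation("Expression of the ldhA gene", "ldh"))
```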
Experiment No 02:
Searching PMC and PubMed Using Authors name, fields, limits
i. Searching PubMed using author name
Methods:
• Type Schilling CH (1999) in the search box and search.
Result:
Interpretation:
PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts
on life sciences and biomedical topics. PubMed is maintained and updated by the National Library of
Medicine on a weekly basis. A search on Schilling CH (1999) returns the free article by
the author Schilling CH published in 1999.
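The same kind of author search can also be expressed as a URL for the NCBI E-utilities ESearch service. The sketch below only builds the URL and does not contact NCBI; the field tags `[Author]` and `[dp]` (date of publication) are PubMed search tags, while the helper name `build_search_url` and the `retmax` default are illustrative assumptions.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_search_url(term, db="pubmed", retmax=20):
    """Return an ESearch URL for `term` in database `db` (no request is sent)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})

url = build_search_url("Schilling CH[Author] AND 1999[dp]")
pmc_url = build_search_url("Schilling CH[Author]", db="pmc")
print(url)
print(pmc_url)
```

Switching `db` from `pubmed` to `pmc` mirrors selecting PMC instead of PubMed in the web interface.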
ii. Searching PMC using author name
Methods:
• Select PMC.
• Type Schilling CH in the search box and search.
Result:
Interpretation:
PubMed Central is a free digital archive of articles, accessible to anyone from anywhere via a basic web
browser. The full text of all PubMed Central articles is free to read, with varying provisions for reuse.
A search on Schilling CH returns the free articles by the author Schilling CH.
Experiment No 03:
Retrieving protein sequences using UniProt and creating multi-fasta files
Methods:
• Open the UniProt home page.
• Search for ldh1.
• Under Filter by, click Reviewed (893).
• On the left side, in the Other organisms box, type lactobacillus and click Go. 16 results are shown.
• Click the checkbox of the Entry column to select all entries.
• Click Download → Uncompressed → Go.
• Select all sequences (Ctrl + A).
• Copy all sequences (Ctrl + C).
• Open Notepad.
• Paste all sequences (Ctrl + V).
• Save the sequences: click File → Save As (Ctrl + S) → select location → type ldh1P.fasta as the
file name → click Save.
Result:
Interpretation:
UniProt is the Universal Protein resource, a central repository of protein data created by combining the
Swiss-Prot, TrEMBL and PIR-PSD databases. We can search for any protein sequence in UniProt, collect
several sequences into a multi-fasta file and save it with Notepad. We can reuse these fasta files
whenever we need them.
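A multi-fasta file is simply several records in one text file, each starting with a `>` header line followed by sequence lines. The following is a minimal Python sketch for reading and writing that layout; the function names and the example records are hypothetical, not actual UniProt entries.

```python
def parse_fasta(text):
    """Parse multi-fasta text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def to_fasta(records, width=60):
    """Format (header, sequence) tuples as multi-fasta text."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

# Made-up records for illustration only.
example = ">seq1 example protein\nMKV\nLLA\n\n>seq2\nMST\n"
records = parse_fasta(example)
print(records)
print(to_fasta(records, width=3))
```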

Experiment No 04:
Retrieving relevant DNA sequences using nucleotide and creating multi-fasta
file (search by ldh1 NOT hypothetical)
Methods:
• Open the PubMed home page.
• Select Nucleotide.
• Search for ldh1 NOT hypothetical. 51 results are found.
• Select results number 4, 5, 6 and 9.
• Click the arrow to the right of Summary and select FASTA (text).
• Select all sequences (Ctrl + A).
• Copy all sequences (Ctrl + C).
• Open Notepad.
• Paste all sequences (Ctrl + V).
• Save the sequences: click File → Save As (Ctrl + S) → select location → type ldh1.fasta
as the file name → click Save.
Result:
Interpretation:
We can search for any nucleotide sequence in the Nucleotide database, collect several sequences into a
multi-fasta file and save it with Notepad. We can reuse these fasta files whenever we need them.

Experiment No 05:
Performing DNA and protein BLAST and analyzing result
i. blastn
Methods:
• Open the BLAST home page.
• Select nucleotide blast.
• Copy a nucleotide sequence from the previously saved ldh1.fasta file.
• Paste the sequence into the Enter accession number(s), gi(s), or FASTA sequence(s) box.
• From the Choose Search Set options select nucleotide collection (nr/nt).
• From Algorithm parameters select: Max target sequences (50) → Expect threshold (0.1).
• Then click Show results in a new window and click BLAST.
Results:
ii. blastp
Methods:
• Select nucleotide blast.
• Select blastp.
• Copy a protein sequence from the previously saved ldh1P.fasta file.
• From the Choose Search Set options select non-redundant protein sequences (nr).
• From Algorithm parameters select: Max target sequences (50) → Expect threshold (0.1).
Results:
iii. blastx
Methods:
• Select nucleotide blast.
• Select blastx.
• Copy a nucleotide sequence from the previously saved ldh1.fasta file.
• From the Choose Search Set options select non-redundant protein sequences (nr).
• From Algorithm parameters select: Max target sequences (50) → Expect threshold (0.1).
Results:

iv. tblastn
Methods:
• Select “nucleotide blast”.
• Select “tblastn”.
• Copy a protein sequence from the previously saved “ldh1P.fasta” file.
• Paste the sequence into the “Enter accession number(s), gi(s), or FASTA sequence(s)” box.
• From the “Choose Search Set” options select “nucleotide collection (nr/nt)”.
• From “Algorithm parameters” select: Max target sequences (50) → Expect threshold (0.1).
• Then click “Show results in a new window” and click “BLAST”.
Results:

v. tblastx
Methods:
• Select “nucleotide blast”.
• Select “tblastx”.
• Copy a nucleotide sequence from the previously saved “ldh1.fasta” file.
• Paste the sequence into the “Enter accession number(s), gi(s), or FASTA sequence(s)” box.
• From the “Choose Search Set” options select “nucleotide collection (nr/nt)”.
• From “Algorithm parameters” select: Max target sequences (50) → Expect threshold (0.1).
• Then click “Show results in a new window” and click “BLAST”.
Interpretation:
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The
program compares nucleotide or protein sequences to sequence databases and calculates the statistical
significance of matches. The five programs used here are:
• blastn: searches a nucleotide database using a nucleotide query.
• blastp: searches a protein database using a protein query.
• blastx: searches a protein database using a translated nucleotide query.
• tblastn: searches a translated nucleotide database using a protein query.
• tblastx: searches a translated nucleotide database using a translated nucleotide query.
In the graphic summary of the BLAST result, each bar shows the part of the query covered by a hit, and
red bars mark the highest-scoring alignments. From the BLAST result we can download fasta files.
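The choice among the five programs follows directly from the type of the query, the type of the database, and whether translation is involved. A small Python lookup, with hypothetical names, summarizes the rule described above:

```python
# (query type, database type, translated search?) -> BLAST program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", True): "blastx",
    ("protein", "nucleotide", True): "tblastn",
    ("nucleotide", "nucleotide", True): "tblastx",
}

def choose_blast_program(query, database, translated=False):
    """Return the BLAST program name for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query, database, translated)]
    except KeyError:
        raise ValueError(f"no BLAST program for {query} vs {database}") from None

print(choose_blast_program("protein", "nucleotide", translated=True))
```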

Experiment No 06:
Pairwise alignment (global, end gap free), calculate identities, dotplot using BioEdit
i. Pairwise alignment: global
Methods:
• Open BioEdit.
• Click File → Open.
• Select All Files (*.*) from Files of type.
• Open the ldh1.fasta file.
• Select two sequences.
• Click Sequence → Pairwise alignment → Align two sequences (optimal GLOBAL alignment).
• Optionally, click “Shade identities and similarities in alignment window” for color shading, or
“normal view” mode for the normal view.
Results:
Interpretation:
Global alignments, which attempt to align every residue in every sequence, are most useful when the
sequences in the query set are similar and of roughly equal size. In BioEdit we aligned two sequences
globally. With color shading we could see the identities and similarities in the alignment window.
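Global alignment is classically computed with the Needleman-Wunsch dynamic programming method. The sketch below returns only the optimal score; the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration and are not BioEdit's settings.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: every residue of both sequences is aligned."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Because gaps at the ends are penalized like any other gap, the score drops quickly when the two sequences differ in length.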

ii. Pairwise alignment: end gap free
Methods:
• Open BioEdit.
• Click Sequence → Pairwise alignment → Align two sequences (allow ends to slide).
• Optionally, click “Shade identities and similarities in alignment window” for color shading, or
“normal view” mode for the normal view.
Results:
Interpretation:
In end gap free alignment, gaps that appear before the first or after the last letter of a sequence are
free (not penalized). This is especially preferable whenever one of the sequences is significantly
shorter than the other. We aligned two sequences with free end gaps. With color shading we could see
the identities and similarities in the alignment window.
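The only change relative to global alignment is that leading and trailing gaps cost nothing. A minimal sketch of this variant follows, again with assumed scores (match +1, mismatch -1, gap -2) rather than BioEdit's actual parameters.

```python
def end_gap_free_score(a, b, match=1, mismatch=-1, gap=-2):
    """Alignment score where gaps before the first or after the last letter are free."""
    prev = [0] * (len(b) + 1)      # leading gaps in a are free
    best_last_col = prev[-1]
    for i in range(1, len(a) + 1):
        curr = [0]                 # leading gaps in b are free
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        best_last_col = max(best_last_col, curr[-1])
        prev = curr
    # Trailing gaps are free: take the best cell in the last row or last column.
    return max(best_last_col, max(prev))

print(end_gap_free_score("TTTTACGT", "ACGT"))
print(end_gap_free_score("ACGT", "GTAA"))
```

In the first call the short sequence slides to the end of the long one without paying for the four unmatched leading bases.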

iii. Calculate identities
Methods:
• Open BioEdit.
• Click Sequence → Pairwise alignment → Calculate identity/similarity for two sequences.
Interpretation:
Sequence identity is the number of characters that match exactly between two different sequences.
Gaps are not counted, and the measurement is relative to the shorter of the two sequences. In this
experiment, we found the identity between two sequences using BioEdit.
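The definition above can be written out directly. This sketch assumes the two sequences are already aligned (same length, `-` for gaps) and follows the definition given in the text; BioEdit's own calculation may differ in detail.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of exactly matching positions, relative to the shorter ungapped sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count identical columns, ignoring any column that contains a gap.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    return 100.0 * matches / shorter if shorter else 0.0

print(percent_identity("ACGT-A", "AC-TTA"))
```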
iv. Dot plot
Methods:
• Open BioEdit.
• Click Sequence → Pairwise alignment → Dot plot (pairwise comparison).
Result:
Window – 20, mismatch limit - 10
Interpretation:
A dot plot is a graphical method that allows the comparison of two sequences and identifies regions of
close similarity between them. The convenience of dot-plot analysis is that a single graphic shows all
significant pairwise alignments simultaneously. We constructed a dot plot window using BioEdit. We can
see a lot of dots along a diagonal line, which indicates that the two protein sequences contain many
identical amino acids at the same (or very similar) positions along their lengths. This is what we would
expect, because we know that these two proteins are homologues (related proteins).
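A windowed dot plot can be sketched in a few lines of Python. The defaults mirror the settings reported above (window 20, mismatch limit 10); interpreting the mismatch limit as "at most this many mismatches per window" is an assumption about how BioEdit applies it.

```python
def dot_plot(a, b, window=20, mismatch_limit=10):
    """Return (i, j) positions where windows starting at a[i] and b[j] are similar."""
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            mismatches = sum(1 for k in range(window) if a[i + k] != b[j + k])
            if mismatches <= mismatch_limit:
                dots.append((i, j))
    return dots

# Small window so the toy example stays readable.
dots = dot_plot("ACGTAC", "ACGTAC", window=3, mismatch_limit=0)
print(dots)
```

Identical sequences give dots with i equal to j, which is the main diagonal seen in the plot.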

Experiment No 07:
Nucleotide composition, complement, reverse complement, DNA to RNA,
translate, restriction map, six frame translation using BioEdit
i. Nucleotide Composition
Methods:
• Open BioEdit.
• Click File → Open → select All Files (*.*) from Files of type.
• Open the ldh1.fasta file.
• Select one sequence.
• Click Sequence → Nucleic Acid → Nucleotide Composition.
Result:
Interpretation:
Nucleotide composition summaries and plots may be obtained by choosing “Nucleotide Composition”
from the “Nucleic Acid” submenu of the “Sequence” menu. Bar plots show the molar
percent of each residue in the sequence. For nucleic acids, degenerate nucleotide designations are added
to the plot if and as they are encountered. For example, a sequence that has only A, G, C and T will have
four bars on the graph. We can also observe the molecular weight, A+T content and G+C content.
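The molar percentages and the G+C and A+T contents are simple counts. A minimal Python sketch (the function name is hypothetical, and molecular weight is not computed here):

```python
from collections import Counter

def nucleotide_composition(dna):
    """Return (percent per base, G+C percent, A+T percent) for a DNA sequence."""
    seq = dna.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    total = len(seq)
    percent = {base: 100.0 * n / total for base, n in sorted(counts.items())}
    gc = 100.0 * (counts["G"] + counts["C"]) / total
    at = 100.0 * (counts["A"] + counts["T"]) / total
    return percent, gc, at

percent, gc, at = nucleotide_composition("AACCGGTT")
print(percent, gc, at)
```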

ii. Complement
Methods:
• Open BioEdit.
• Click File → Open.
• Select All Files (*.*) from Files of type.
• Select one sequence.
• Click Sequence → Nucleic Acid → Complement.
• To undo, select the sequence and click Sequence → Nucleic Acid → Complement again.
Result:
Interpretation:
We can get the complement sequence of the given sequence. If we want to get back the previous
sequence, we have to complement the sequence again.
iii. Reverse Complement
Methods:
• Open BioEdit.
• Click Sequence → Nucleic Acid → Reverse Complement.
• To undo, select the sequence and click Sequence → Nucleic Acid → Reverse Complement again.
Results:
Interpretation:
We can get the reverse complement sequence of the given sequence. If we want to get back the previous
sequence, we have to reverse complement the sequence again.
iv. DNA to RNA
Methods:
• Open BioEdit.
• Click Sequence → Nucleic Acid → DNA -> RNA.
• To undo, select the sequence and click Sequence → Nucleic Acid → RNA -> DNA.
Result:
Interpretation:
The DNA sequence is converted into an RNA sequence. An RNA sequence contains no thymine (T); uracil
(U) takes its place.
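The complement, reverse complement and DNA to RNA operations of sections ii to iv are simple character substitutions. A short Python sketch with hypothetical function names:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(dna):
    """Replace each base with its pairing partner (A<->T, C<->G)."""
    return dna.translate(COMPLEMENT)

def reverse_complement(dna):
    """Complement the sequence and read it in the opposite direction."""
    return complement(dna)[::-1]

def dna_to_rna(dna):
    """Replace thymine (T) with uracil (U)."""
    return dna.replace("T", "U").replace("t", "u")

print(complement("ATGC"), reverse_complement("ATGC"), dna_to_rna("ATGC"))
```

Applying `complement` or `reverse_complement` twice returns the original sequence, which is why repeating the menu command works as an undo.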

v. Translate
Methods:
• Open BioEdit.
• Click Sequence → Nucleic Acid → Translate → Frame 1, Frame 2, Frame 3.
• To get the remaining 3 frames, select the sequence → Nucleic Acid → Reverse Complement →
select the sequence again → Nucleic Acid → Translate → Frame 1, Frame 2, Frame 3.
Result:

Interpretation:
A DNA sequence has six reading frames: three forward frames and three reverse frames. Selecting a
sequence and translating it in frame 1, frame 2 and frame 3 gives all the forward frames, and shows
which amino acid each group of three nucleotides codes for. Taking the reverse complement of the
sequence and translating it again gives the remaining three reverse frames. In the experiment
we got 3 forward and 3 reverse frames of a selected sequence.
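The six frames can be reproduced with the standard genetic code. The sketch below builds the codon table from the usual compact TCAG ordering; the function names are hypothetical, and incomplete or ambiguous codons are reported as X by assumption.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated with the third base varying fastest.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna, frame=1):
    """Translate starting at position `frame` (1, 2 or 3); '*' marks a stop codon."""
    seq = dna.upper()[frame - 1:]
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(dna):
    """Return translations of the three forward and three reverse frames."""
    rc = reverse_complement(dna.upper())
    frames = {f"+{f}": translate(dna, f) for f in (1, 2, 3)}
    frames.update({f"-{f}": translate(rc, f) for f in (1, 2, 3)})
    return frames

print(six_frames("ATGGCCTAA"))
```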
vi. Restriction Map
Methods:
• Open BioEdit.
• Click Sequence → Nucleic Acid → Restriction Map → deselect enzymes with degenerate
recognition sequences and large recognition sites → select all enzymes by manufacturer → select circular
DNA (ends joined) → Generate Map.
Results:

Interpretation:
A restriction map is a map of known restriction sites within a sequence of DNA. Restriction mapping
requires the use of restriction enzymes. Restriction Map accepts a DNA sequence and returns a textual
map showing the positions of restriction endonuclease cut sites. The map obtained in this
experiment shows the different restriction sites that are cut by different restriction enzymes.
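Finding where an enzyme's recognition sequence occurs is a substring search, with one extra step for circular DNA so that sites spanning the joined ends are not missed. This sketch reports the 1-based start of each recognition site, not the exact cut position, and handles only non-degenerate sites; the test sequences are made up, and GAATTC is the EcoRI recognition sequence.

```python
def find_sites(dna, site, circular=False):
    """Return 1-based start positions of `site` in `dna`."""
    seq, site = dna.upper(), site.upper()
    # For circular DNA, append the first len(site) - 1 bases so that
    # sites spanning the join between the last and first base are found.
    search = seq + seq[:len(site) - 1] if circular else seq
    positions = []
    start = search.find(site)
    while start != -1:
        positions.append(start + 1)
        start = search.find(site, start + 1)
    return positions

print(find_sites("AAGAATTCTT", "GAATTC"))
print(find_sites("TTCAAAAGAA", "GAATTC", circular=True))
```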
vii. Six frame translation
Sorted six frame translation:
• Open BioEdit.
• Click Sequence → Nucleic Acid → Sorted Six-Frame Translation → minimum ORF size 40 →
start codon ATG → Translate.
Result:
Unsorted six frame translation:
• Open BioEdit.
• Click Sequence → Nucleic Acid → Unsorted Six-Frame Translation → minimum ORF size 40 →
start codon ATG → Translate.
Interpretation:
A DNA sequence may be translated in all six reading frames into all possible open reading frames (simple
codon stretches, actually) by highlighting the sequence title in the document window and choosing either
“Sorted Six-Frame Translation” or “Unsorted Six-Frame Translation”.
Sorted: ORFs are reported in order of start position. Negative-frame sequences are sorted according
to their end positions (first position along the positive sequence). The number of sequences which can be
translated and sorted is limited to something above 10,500 sequences. If a sorted translation becomes too
large, the resources for storing the sequences to be sorted run out. If this happens, BioEdit will tell
you, then present the sequences it was able to translate. Multiple sequences may be translated into a
single ORF list suitable for BLAST database creation.
Unsorted: Sequences are reported in the order that their stop codons are encountered in a once-through,
6-frame simultaneous pass through the entire sequence. The codon stretches are written into a file as they
are encountered and therefore do not need to be stored in memory. Very long lists can thus be generated.
Currently, only one sequence at a time may be translated this way.
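An ORF search over six frames with a start codon and a minimum size can be sketched as below. It assumes the minimum size is counted in codons from ATG up to (not including) the stop codon, and it reports coordinates on the strand being scanned; BioEdit's own definition and ordering may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(dna, min_codons=40, start_codon="ATG"):
    """Return (strand, frame, start, end) for each ORF; 0-based start, end exclusive."""
    orfs = []
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == start_codon:
                    start = pos                      # first start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, pos + 3))
                    start = None                     # stop codon closes the ORF
    return orfs

# Toy sequence: ATG + three GCC codons + stop, with a small minimum size.
print(find_orfs("ATG" + "GCC" * 3 + "TAA", min_codons=4))
```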
Experiment No 08:
Multiple sequence analysis using BioEdit
Multiple nucleotide sequence analysis:
• Open BioEdit.
• Select all sequences.
• Click Accessory Application → ClustalW Multiple alignment → Run ClustalW → Shade
identities and similarities in alignment window.
Result:
Multiple protein sequence analysis:
• Open BioEdit.
• Open the ldh1P.fasta file.
• Click Accessory Application → ClustalW Multiple alignment → Run ClustalW → Shade
identities and similarities in alignment window.
Result:
Interpretation:
Multiple Sequence Alignment (MSA) is generally the alignment of three or more biological sequences
(protein or nucleic acid) of similar length. From the output, homology can be inferred and the
evolutionary relationships between the sequences can be studied. ClustalW is a multiple sequence
alignment tool for DNA or protein sequences. ClustalW calculates the best match for the input
sequences based on the parameters entered and generates an easy to interpret report. In this
experiment we observed the alignment of more than two sequences for both nucleotide and protein
sequences. With the color shading we can find similarities and identities among the sequences.
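Once an alignment exists, the fully identical columns that the shading highlights can be found with a short Python function. The alignment below is a made-up example, and this sketch detects identities only, not similarities.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where all sequences share the same residue."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [i for i, column in enumerate(zip(*alignment))
            if len(set(column)) == 1 and column[0] != "-"]

aligned = ["ACG-T", "ACGAT", "ATGAT"]   # hypothetical aligned sequences
print(conserved_columns(aligned))
```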

Experiment No 09:
Tree Generation with MEGA
i. Construct/test maximum likelihood tree
Methods:
• Open BioEdit.
• Open the ldh1P.fasta file.
• Click Accessory Application → ClustalW Multiple alignment → Run ClustalW.
• Click File → Save As → select location → File name (ldh-P-aln.fasta) → Save as type: Fasta
(*.fas, *.fst, *.fsa) → Save.
• Open MEGA 6.
• Click File → Open → ldh-P-aln.fasta → Analyze → select Protein sequences → OK.
• Click Phylogeny → Construct/Test Maximum Likelihood Tree → Yes.
• Test of Phylogeny (Bootstrap method) → No. of Bootstrap Replications (50) → Substitution type
(amino acid) → Model/method (Dayhoff model) → Rates among sites (Gamma distributed with
Invariant sites, G+I) → Compute.
Result:

ii. Construct/test neighbor joining tree
Methods:
• Click Phylogeny → Construct/Test Neighbor-Joining Tree → Yes
(amino acid) → Model/method (Dayhoff model) → Rates among sites (Gamma distributed, G) →
Compute
Result:
iii. Construct/ test minimum- evolution tree
Methods:
• Click Phylogeny → Construct/Test Minimum-Evolution Tree → Yes
Compute
Result:
iv. Construct/ test UPGMA tree
Methods:
• Click Phylogeny → Construct/Test UPGMA Tree → Yes
Compute
Result:

v. Construct/test maximum parsimony tree
Methods:
• Click Phylogeny → Construct/Test Maximum Parsimony Tree → Yes
(amino acid) → Compute
Result:
Interpretation:
A phylogeny, or evolutionary tree, represents the evolutionary relationships among a set of organisms or
groups of organisms, called taxa (singular: taxon). The tips of the tree represent groups of descendant taxa
(often species) and the nodes on the tree represent the common ancestors of those descendants. Two
descendants that split from the same node are called sister groups. With Molecular Evolutionary Genetics
Analysis (MEGA) we constructed different types of phylogenetic trees from the input protein sequences.
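Of the five methods, UPGMA is the simplest to write down: repeatedly merge the two closest clusters and average their distances to the rest. The sketch below takes a ready-made distance matrix and returns only the tree topology in Newick form, without branch lengths. MEGA computes the distances itself (here under the Dayhoff model) and adds bootstrap support, neither of which this sketch attempts; the taxon names and distances are made up.

```python
def upgma(names, dist):
    """Build a UPGMA tree topology (Newick string) from a symmetric distance matrix."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}   # id -> (newick, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda item: item[1])    # closest pair
        (newick_a, size_a), (newick_b, size_b) = clusters.pop(a), clusters.pop(b)
        merged = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            # Size-weighted average of the distances to the two merged clusters.
            merged[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(merged)
        clusters[next_id] = (f"({newick_a},{newick_b})", size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"

tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(tree)
```

A and B are the closest pair, so they are joined first and appear as sister groups in the resulting tree.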
Experiment No 10:
Working with single protein sequence: Analyzing protein composition
(pepdigest, pepstats), Protein secondary structure by mEmboss (garnier for
protein secondary structure, helixturnhelix for motifs, pepcoil for coiled coil
regions)
i. Analyzing protein composition (pepstats)
Methods:
• Open mEMBOSS.
• Click Protein → Composition → pepstats (calculate statistics of protein properties).
• In the input section click Paste → paste the protein sequence → Go.
Result:
Interpretation:
pepstats reads one or more protein sequences and writes an output file with various statistics on
the protein properties. This includes: molecular weight; number of residues; average residue
weight; charge; isoelectric point; for each type of amino acid: number, molar percent and
DayhoffStat; for each physico-chemical class of amino acid: number and molar percent; probability
of protein expression in E. coli inclusion bodies; molar extinction coefficient (A280); and extinction
coefficient at 1 mg/ml (A280). In this experiment we input a protein sequence and obtained these
data.
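The per-residue part of such a report (number and molar percent of each amino acid) can be sketched in Python. This covers only that part of the output; molecular weight, charge, isoelectric point and the other statistics are not computed here, and the function name and example peptide are hypothetical.

```python
from collections import Counter

def residue_composition(protein):
    """Return {residue: (count, molar percent)} for a protein sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: (n, 100.0 * n / len(seq)) for aa, n in sorted(counts.items())}

composition = residue_composition("MKKA")
print(composition)
```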
ii. Analyzing protein composition (pepdigest)- trypsin
Methods:
• Open mEMBOSS.
• Click Protein → Composition → pepdigest (report on protein proteolytic enzyme or reagent
cleavage sites).
• In the input section click Paste → paste the protein sequence.
• In the required section select trypsin → Go.
Result:
Analyzing protein composition (pepdigest)- chymotrypsin
Methods:
• Open mEMBOSS.
• Click Protein → Composition → pepdigest (report on protein proteolytic enzyme or reagent
cleavage sites).
• In the input section click Paste → paste the protein sequence.
• In the required section select chymotrypsin → Go.
Result:

Interpretation:
This program accepts one or more protein sequences and one proteolytic
agent chosen from a list, which may be a proteolytic enzyme or another reagent. It then writes a report
file containing the positions where the agent cuts, together with the peptides produced. The rest
of the file consists of columns holding the following data: start position of the fragment, end
position of the fragment, molecular weight of the fragment, residue before the cut site ('.' if start
of sequence), residue after the second cut site ('.' if end of sequence), sequence of the fragment.
In this experiment we input a protein sequence, selected the proteolytic enzymes trypsin
and chymotrypsin in turn, and obtained these data as the result.
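A tryptic digest can be approximated with the commonly stated rule that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P). The sketch below applies that rule as an assumption; pepdigest's exact rules and options may differ, fragment molecular weights are not computed, and the peptide is made up.

```python
def trypsin_digest(protein):
    """Return (start, end, fragment) tuples with 1-based inclusive positions."""
    seq = protein.upper()
    fragments, start = [], 0
    for i, residue in enumerate(seq):
        is_last = i + 1 == len(seq)
        # Cut after K or R, but not before P and not after the final residue.
        if residue in "KR" and not is_last and seq[i + 1] != "P":
            fragments.append((start + 1, i + 1, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        fragments.append((start + 1, len(seq), seq[start:]))
    return fragments

fragments = trypsin_digest("MKWVTFRPSLKAA")
print(fragments)
```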
iii. Protein secondary structure by mEmboss: garnier for protein secondary structure
Methods:
• Open mEMBOSS.
• Click Protein → 2D Structure → garnier (predict protein secondary structure using GOR
method).
• In the input section click Paste → paste the protein sequence → Go.
Result:
Interpretation:
Garnier is an implementation of the original Garnier Osguthorpe Robson algorithm (GOR I) for
predicting protein secondary structure. It reads an input protein sequence and writes a standard EMBOSS
report file with the predicted secondary structure. The Garnier method is not regarded as the most
accurate prediction, but is simple to calculate on most workstations. In this experiment we input a
protein sequence and obtained its predicted secondary structure.

iv. Protein secondary structure by mEmboss: helixturnhelix for motifs
Methods:
• Open mEMBOSS.
• Click Protein → 2D Structure → helixturnhelix (identify nucleic acid-binding motifs in
protein sequences).
• In the input section click Paste → paste any protein sequence → Go.
Result:
Interpretation:
helixturnhelix uses the method of Dodd and Egan to identify helix-turn-helix nucleic acid binding motifs
in an input protein sequence. The output is a standard EMBOSS report file describing the location, size
and score of any putative motifs. For the sequence we input, the output lists any nucleic
acid-binding motifs identified in the protein sequence.
Experiment No 11:
RNA structure prediction using RNAstructure
Methods:
• Open RNAstructure.
• Click File → New Sequence.
• Title: ldh1RNA → Sequence (copy and paste 2 lines of sequence from ldh1.fasta) → Fold as
RNA → Yes → select location → file name: ldh1RNA → Save → Start → Draw structures.
• Draw → Go to structure number / Zoom.
Interpretation:
RNAstructure is a software package for RNA secondary structure prediction and analysis. It predicts
the lowest free energy structure and other low free energy structures, either by using a heuristic or by
determining all possible low free energy structures. From this process we obtain RNA secondary
structures at different energy levels; the structure with the lowest free energy is the most stable. To
perform this process we should set suitable values for the maximum and minimum energy difference, the
maximum number of structures, the window size, etc.
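Free energy minimization needs a full set of thermodynamic parameters, so it is not reproduced here. As a much simpler stand-in, the sketch below uses Nussinov-style base-pair maximization, a different and cruder algorithm than the one in RNAstructure, to show the dynamic programming idea behind secondary structure prediction. The allowed pairs (including G-U wobble) and the minimum loop size of 3 are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs, with at least `min_loop` bases in each hairpin loop."""
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]   # best[i][j]: max pairs within rna[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                       # case 1: j is unpaired
            for k in range(i, j - min_loop):             # case 2: j pairs with k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]

print(max_base_pairs("GGGAAACCC"))
```

The example folds into a hairpin with three G-C pairs closing an AAA loop.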
