In silico analysis for unknown data

-Tata Santosh Rama Bhadra Rao
Agri Biotech Foundation

What is Bioinformatics?
Bioinformatics lies at the intersection of:
• Mathematics and Statistics
• Biology
• Computer Science

"All aspects of gathering, storing, handling,
analyzing, interpreting and spreading vast amounts
of biological information in databases. The
information involved includes gene sequences,
biological activity/function, pharmacological activity,
biological structure, molecular structure, protein-
protein interactions, and gene expression.
Bioinformatics uses powerful computers and
statistical techniques to accomplish research
objectives, for example, to discover a new
pharmaceutical or herbicide."

Task flow
• The data we have
• Search for similar data in available databases
• Multiple sequence alignment (ClustalW)
• Phylogenetic analysis
• Classification
• Structural analysis
• Functional analysis
• Reporting

Data Outcome
• The data may be a nucleotide sequence (mRNA, gene or genome) or a protein sequence.
• The 16S rRNA gene sequence is most commonly used to classify a species.
• With both forward and reverse sequences the result will be more accurate.
• We can also check with the protein sequence.

Sample for DNA isolation
[Figure: DNA isolation from the sample, panels 1 to 3]

DNA
Symbol   Meaning        Explanation
G        G              Guanine
A        A              Adenine
T        T              Thymine
C        C              Cytosine
R        A or G         puRine
Y        C or T         pYrimidine
N        A, C, G or T   Any base
U        U              Uracil (RNA)

Double helix
5’ A C G T C A T G 3’
3’ T G C A G T A C 5’ (template)

RNA
5’ A C G U C A U G 3’
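
The base-pairing and transcription rules on this slide can be sketched in a few lines of Python. This is only an illustration using the slide's own 8-base example; the function names are hypothetical and only the symbols listed in the table above are handled.

```python
# Complement map for the symbols in the table above (R pairs with Y, N with N).
_COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")

def complement(dna):
    """Return the base-paired strand, read in the same left-to-right order."""
    return dna.upper().translate(_COMPLEMENT)

def reverse_complement(dna):
    """Return the opposite strand written 5' to 3'."""
    return complement(dna)[::-1]

def transcribe(coding_strand):
    """RNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

top = "ACGTCATG"
print(complement(top))          # bottom (template) strand, written 3' to 5'
print(reverse_complement(top))  # the same strand, written 5' to 3'
print(transcribe(top))          # RNA copy of the top strand
```

The RNA is written from the top (coding) strand here because it is complementary to the template strand shown beneath it.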

Isolation of the gene of interest from unknown sample
[Figure: subtractive hybridization workflow, using the cDNA library construction kit from Stratagene]
• Starting material: control mRNA and stress mRNA (poly-A tailed).
• 1st strand cDNA preparation and mRNA removal.
• Hybridization of stress mRNA with excess of complementary 1st strand control cDNA.
• Removal of the commonly hybridized population (commonly expressed mRNA population) by magnetic separation.
• Result: differentially up-regulated mRNA population.

Gene and protein of EIF4A
ATGGCGGCGSCCACCACSTCCCGCCGCGGCGCCGGCGCCTCCCGCAGCATGGACGACGAGAACCTCACCTTCGAGACCTCCCCGGGTG
TCGAGGTCGTCAGCAGCTTCGACCAGATGGGGATCAAGGACGACCTCCTCCGCGGCATCTACGGCTACGGGTTCGAGAAGCCCTCCGC
CATCCAGCAGCGCGCCGTCCTCCCCATCATCAACGGACGCGACGTCATCGCGCAGGCCCAGTCCGGCACCGGGAAGTCATCCATGATC
TCACTCACCGTATGCCAGATCGTCGACACCGCAGTCCGCGAGGTCCAGGCTCTGATCCTCTCACCCACCAGGGAGCTCGCTTCGCAGA
CAGAGAAGGTTATGCTGGCTGTCGGCGACTACCTCAATATCCAAGTGCACGCTTGCATTGGTGGGAAAAGTATCAGCGAGGATATCAG
GAGGCTTGAGAACGGAGTCCATGTTGTCTCTGGGACTCCGGGCAGAGTCTGCGATATGATCAAGAGGAGGACCCTGCGGACAAGAGCC
ATCAAGCTTCTAGTTCTGGATGAGGCTGATGAGATGTTGAGCAGAGGCTTTAAGGATCAGATTTACGATGTCTACAGATACCTCCCAC
CCGAACTTCAGGTCGTTTTGATCTCCGCCACTCTTCCTCACGAGATCCTAGAGATGACTAGCAAGTTCATGACCGAACCAGTTAGGAT
CCTTGTGAAGCGTGATGAGTTGACCCTGGAGGGTATCAAACAATTCTTCGTTGCTGTTGAGAAAGAGGAATGGAAGTTTGATACGCTG
TGTGATCTTTATGATACGTTGACCATCACCCAAGCTGTTATTTTCTGCAATACTAAGAGAAAGGTGGATTGGCTTACTGAAAGAATGC
GCAGCAATAACTTCACAGTATCAGCTATGCATGGTGACATGCCCCAACAGGAAAGGGATGCCATCATGACAGAGTTCAGGTCTGGTGC
AACTCGTGTGCTAATCACTACGGATGTTTGGGCTCGAGGGCTGGATGTTCAGCAGGTTTCACTTGTCATAAATTATGATCTCCCAAAT
AATCGTGAGCTTTACATCCATCGCATCGGTCGCTCTGGTCGTTTTGGGCGCAAGGGTGTGGCGATCAATTTTGTGCGCAAGGATGACA
TCCGTATCCTGAGGGATATAGAACAGTACTACAGCACACAAATTGATGAGATGCCAATGAATGTTGCTGATCTAATTTGA
"MAAXTTSRRGAGASRSMDDENLTFETSPGVEVVSSFDQMGIKDDLLRGIYGYGFEKPSAIQQRAVLPIINGRDVIAQAQSGTGKSSM
ISLTVCQIVDTAVREVQALILSPTRELASQTEKVMLAVGDYLNIQVHACIGGKSISEDIRRLENGVHVVSGTPGRVCDMIKRRTLRTR
AIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEPVRILVKRDELTLEGIKQFFVAVEKEEWKFDT
LCDLYDTLTITQAVIFCNTKRKVDWLTERMRSNNFTVSAMHGDMPQQERDAIMTEFRSGATRVLITTDVWARGLDVQQVSLVINYDLP
NNRELYIHRIGRSGRFGRKGVAINFVRKDDIRILRDIEQYYSTQIDEMPMNVADLI"
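
How the protein sequence follows from the gene sequence can be sketched as below. This is a minimal illustration, assuming the standard genetic code and the IUPAC ambiguity symbols; the function name translate is hypothetical. A codon containing an ambiguous base (such as S, meaning C or G) is reported as X unless every possible reading gives the same amino acid.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        # Expand ambiguous bases into every concrete codon they could stand for.
        readings = {
            CODON_TABLE["".join(bases)]
            for bases in product(*(IUPAC[base] for base in codon))
        }
        protein.append(readings.pop() if len(readings) == 1 else "X")
    return "".join(protein)

# First seven codons of the gene above.
print(translate("ATGGCGGCGSCCACCACSTCC"))
```

For the opening of the gene above, SCC can read as CCC (P) or GCC (A) and so becomes X, while ACS reads as T either way, which matches the MAAXTTS start of the protein sequence.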

In-silico generated protein structures

ABOUT THE GENE AND PROTEIN
GENE LENGTH : 1224 bp
NUMBER OF INTRONS : 7
NUMBER OF EXONS : 8
GENE MOLECULAR WEIGHT : 378411.66 - 378491.72 Daltons
PROTEIN LENGTH : 407 AA
PROTEIN MOLECULAR WEIGHT : 45.2 kDa
ISOELECTRIC POINT : 6.10
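
These figures can be cross-checked with simple arithmetic. The sketch below assumes that the 1224 bp refers to the coding sequence including the stop codon, and uses the common rule of thumb of about 110 Da per amino acid residue; both are assumptions for illustration and the function names are hypothetical.

```python
def protein_length_from_cds(cds_bp):
    """Number of amino acids encoded by a CDS that ends with a stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("a complete coding sequence is a multiple of 3 bp")
    return cds_bp // 3 - 1  # the stop codon encodes no residue

def approx_mass_kda(n_residues, avg_residue_da=110.0):
    """Rough protein mass estimate; 110 Da per residue is only a rule of thumb."""
    return n_residues * avg_residue_da / 1000.0

n_aa = protein_length_from_cds(1224)
print(n_aa)                   # compare with the reported protein length
print(approx_mass_kda(n_aa))  # compare with the reported 45.2 kDa
print(7 + 1)                  # n introns separate n + 1 exons
```

The rough mass estimate lands close to the reported 45.2 kDa; an exact value depends on the actual amino acid composition.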

Search for similar data in available databases
• The data will be subjected to a similarity search in NCBI, Phytozome or other available databases using the BLAST tool.
• Download the matching data from the database.
Note:
• Always keep the data in a plain-text editor such as Notepad for working convenience.
• Here we are presenting unpublished data.
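
Sequences downloaded from such databases are usually plain-text FASTA records, which is why a simple text editor is enough to work with them. The sketch below shows one way to read and write that format with the Python standard library only; the function names and the toy records are hypothetical.

```python
def parse_fasta(text):
    """Read FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def format_fasta(records, width=60):
    """Write records back as FASTA text, wrapping sequences at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

# Toy records, for illustration only.
text = ">seq1 demo record\nACGT\nAC\n>seq2\nTTTT\n"
records = parse_fasta(text)
print(records)
print(format_fasta(records, width=4))
```

Only the first word of each header line is kept as the record name, so database descriptions after the identifier are dropped.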

ClustalW
• The finalized data will now be subjected to Clustal alignment to assess sequence similarity.
• ClustalW is a multiple sequence alignment tool that lines up sequences to reveal their similarities.
• It can be used for both nucleotide and protein sequences.
• Protein sequences are mostly used for the alignment, for accuracy.

SB4g RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP 232
SACetif RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP 232
ZEAMMB73 RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP 231
SIDb RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEITSKFMTEP 232
PgeiF4a RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP 232
OS3g RTRAIKLLILDEADEMLGRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTDP 229
H RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHDILEITSKFMTDP 237
Phys RTRSIKLLILDESDEMLSRGFKDQIYDVYRYLPPELQVVLVSATLPHEILEMTNKFMTDP 222
Jat RTRAIRLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPNEILEMTSKFMTDP 235
RC RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPNEILEMTSKFMTDP 232
GM RTRAIKMLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 232
PHAVU RTRAIKMLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
CA RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
M RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
CS-EIF4A-3-like RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTNKFMTDP 235
MD RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTNKFMTEP 227
***:*::*:***:****.****************:*** *:*****::***:*.****:*
Alignment for retrieved sequences
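
Once sequences are aligned, their similarity can be quantified column by column. The sketch below compares two rows of the alignment block above (SB4g and PgeiF4a); the function name is hypothetical and positions are counted within this 60-column block, not along the full protein.

```python
def compare_aligned(a, b):
    """Return (percent identity, 1-based mismatch columns) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    mismatches = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    identity = 100.0 * (len(a) - len(mismatches)) / len(a)
    return identity, mismatches

sb4g    = "RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP"
pgeif4a = "RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP"

identity, mismatches = compare_aligned(sb4g, pgeif4a)
print(round(identity, 2), mismatches)
```

In this block the two rows differ only at column 39 (C versus V), so 59 of the 60 columns are identical.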

Phylogenetic analysis
• After alignment, the data will be subjected to phylogenetic analysis.
• Here the relationship between the data sources is evaluated.
• The most similar sequences are placed close together; less similar sequences are placed farther apart.
• By measuring the distance we can quantify the relationship between the data sources.
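
The idea of placing similar sequences close together can be illustrated with a simple distance-based clustering. The sketch below uses the p-distance (fraction of differing columns) and average-linkage joining in the style of UPGMA; this method is swapped in for illustration and is not necessarily the one used for the tree in this work. The function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned, ungapped columns at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma_topology(aligned):
    """Repeatedly join the two closest clusters; return a Newick-style topology."""
    # Each cluster label maps to the list of original sequence names it contains.
    clusters = {name: [name] for name in aligned}

    def cluster_distance(c1, c2):
        # Average distance over all member pairs (average linkage).
        members1, members2 = clusters[c1], clusters[c2]
        total = sum(p_distance(aligned[m], aligned[n])
                    for m in members1 for n in members2)
        return total / (len(members1) * len(members2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merged = clusters.pop(c1) + clusters.pop(c2)
        clusters[f"({c1},{c2})"] = merged
    return next(iter(clusters)) + ";"

# Toy aligned sequences, for illustration only.
toy = {"A": "AAAA", "B": "AAAT", "C": "TTTT"}
print(upgma_topology(toy))
```

The sketch returns only the branching order, without branch lengths, and assumes every pair of sequences shares at least one ungapped column.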

Fig 1: 20-404: P-loop containing nucleoside triphosphate hydrolase (IPR027417).
34-62: RNA helicase, DEAD-box type, Q motif (IPR014014).
246-407: Helicase, C-terminal (IPR001650).
183-186: Presence of the DEAD amino acids.
ATG GCG GCG SCC ACC ACS TCC CGC CGC GGC GCC GGC GCC TCC CGC AGC ATG GAC GAC GAG AAC CTC ACC TTC
M A A X T T S R R G A G A S R S M D D E N L T F 24
GAG ACC TCC CCG GGT GTC GAG GTC GTC AGC AGC TTC GAC CAG ATG GGG ATC AAG GAC GAC CTC CTC CGC GGC
E T S P G V E V V S S F D Q M G I K D D L L R G 48
ATC TAC GGC TAC GGG TTC GAG AAG CCC TCC GCC ATC CAG CAG CGC GCC GTC CTC CCC ATC ATC AAC GGA CGC
I Y G Y G F E K P S A I Q Q R A V L P I I N G R
GAC GTC ATC GCG CAG GCC CAG TCC GGC ACC GGG AAG TCA TCC ATG ATC TCA CTC ACC GTA TGC CAG ATC GTC
D V I A Q A Q S G T G K S S M I S L T V C Q I V
GAC ACC GCA GTC CGC GAG GTC CAG GCT CTG ATC CTC TCA CCC ACC AGG GAG CTC GCT TCG CAG ACA GAG AAG
D T A V R E V Q A L I L S P T R E L A S Q T E K
GTT ATG CTG GCT GTC GGC GAC TAC CTC AAT ATC CAA GTG CAC GCT TGC ATT GGT GGG AAA AGT ATC AGC GAG
V M L A V G D Y L N I Q V H A C I G G K S I S E
GAT ATC AGG AGG CTT GAG AAC GGA GTC CAT GTT GTC TCT GGG ACT CCG GGC AGA GTC TGC GAT ATG ATC AAG
D I R R L E N G V H V V S G T P G R V C D M I K
AGG AGG ACC CTG CGG ACA AGA GCC ATC AAG CTT CTA GTT CTG GAT GAG GCT GAT GAG ATG TTG AGC AGA GGC
R R T L R T R A I K L L V L D E A D E M L S R G
TTT AAG GAT CAG ATT TAC GAT GTC TAC AGA TAC CTC CCA CCC GAA CTT CAG GTC GTT TTG ATC TCC GCC ACT
F K D Q I Y D V Y R Y L P P E L Q V V L I S A T
CTT CCT CAC GAG ATC CTA GAG ATG ACT AGC AAG TTC ATG ACC GAA CCA GTT AGG ATC CTT GTG AAG CGT GAT
L P H E I L E M T S K F M T E P V R I L V K R D
GAG TTG ACC CTG GAG GGT ATC AAA CAA TTC TTC GTT GCT GTT GAG AAA GAG GAA TGG AAG TTT GAT ACG CTG
E L T L E G I K Q F F V A V E K E E W K F D T L
TGT GAT CTT TAT GAT ACG TTG ACC ATC ACC CAA GCT GTT ATT TTC TGC AAT ACT AAG AGA AAG GTG GAT TGG
C D L Y D T L T I T Q A V I F C N T K R K V D W
CTT ACT GAA AGA ATG CGC AGC AAT AAC TTC ACA GTA TCA GCT ATG CAT GGT GAC ATG CCC CAA CAG GAA AGG
L T E R M R S N N F T V S A M H G D M P Q Q E R
GAT GCC ATC ATG ACA GAG TTC AGG TCT GGT GCA ACT CGT GTG CTA ATC ACT ACG GAT GTT TGG GCT CGA GGG
D A I M T E F R S G A T R V L I T T D V W A R G
CTG GAT GTT CAG CAG GTT TCA CTT GTC ATA AAT TAT GAT CTC CCA AAT AAT CGT GAG CTT TAC ATC CAT CGC
L D V Q Q V S L V I N Y D L P N N R E L Y I H R
ATC GGT CGC TCT GGT CGT TTT GGG CGC AAG GGT GTG GCG ATC AAT TTT GTG CGC AAG GAT GAC ATC CGT ATC
I G R S G R F G R K G V A I N F V R K D D I R I
CTG AGG GAT ATA GAA CAG TAC TAC AGC ACA CAA ATT GAT GAG ATG CCA ATG AAT GTT GCT GAT CTA ATT TGA
L R D I E Q Y Y S T Q I D E M P M N V A D L I *
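
The position of a signature motif such as DEAD can be located by a plain string search on the protein sequence. The sketch below uses the row of the translation above that covers residues 169 to 192; the function name and the offset parameter are hypothetical helpers for this illustration.

```python
def find_motif(protein, motif, offset=1):
    """Return (first, last) residue numbers of every occurrence of `motif`.

    `offset` is the residue number of the first character of `protein`,
    so a fragment can be searched while keeping full-length numbering.
    """
    hits = []
    start = protein.find(motif)
    while start != -1:
        first = start + offset
        hits.append((first, first + len(motif) - 1))
        start = protein.find(motif, start + 1)  # allow overlapping hits
    return hits

# Eighth row of the translation above (residues 169 to 192).
segment = "RRTLRTRAIKLLVLDEADEMLSRG"
print(find_motif(segment, "DEAD", offset=169))
```

The hit at residues 183 to 186 agrees with the DEAD position given in the figure legend.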

Structural analysis
• Structural analysis of the protein will be carried out through homology modeling and docking.
• Both the secondary and the tertiary structure of the protein sequence must be analyzed.
• The resulting structure must be evaluated using nuclear magnetic resonance (NMR) and X-ray crystallographic scores.
• The Ramachandran plot is especially important for structural validation.
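
A Ramachandran plot shows the backbone dihedral angles phi and psi of each residue: phi is the dihedral over the atoms C(i-1), N, CA, C, and psi over N, CA, C, N(i+1). The sketch below shows how one such dihedral angle can be computed from four atomic coordinates using only the standard library. The function names and test coordinates are hypothetical, and validation tools such as PROCHECK do considerably more than this.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Made-up coordinates: the outer atoms are a quarter turn apart.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying this to the backbone atoms of every residue in a model gives the (phi, psi) pairs that a Ramachandran plot displays.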

In-silico analysis of eIF4A
Homology modeling:
The structure of Pennisetum glaucum eIF4A was modeled using Modeller version 9.12.
[Figure labels: α-helices, β-pleated sheets, DEAD-box motif]
Fig: Homology modeling of the amino acid sequence of eIF4A from P. glaucum,
revealing the signature DEAD-box motif and Mg2+ binding sites. eIF4A
showed the ----helices and --------sheets.

Nuclear magnetic resonance analysis
for protein structure

Ramachandran plots for rice and pearl millet eIF4A structures, generated with PROCHECK

Functional analysis
• Functional analysis will be done through domain, conserved motif and active site analysis.
• These are evaluated with docking and amino acid composition.
• The protein structure can be described in terms of its α-helices and β-pleated sheets.
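
Amino acid composition, mentioned above, is simply the share of each residue in the sequence. A minimal sketch with the Python standard library follows; the function name is hypothetical and the input is a fragment of the alignment shown earlier.

```python
from collections import Counter

def aa_composition(protein):
    """Return the percentage of each residue, most frequent first."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.most_common()}

# Short fragment around the DEAD box, for illustration.
for aa, percent in aa_composition("LLVLDEADEMLSRG").items():
    print(aa, round(percent, 1))
```

Run on the full 407-residue sequence, the same function would give the composition of the whole protein.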

Docking analysis and motif localization in Pennisetum glaucum eIF4A
Docking analysis was performed using SYBYL version 6.7 for motif analysis and structural stability.

Active sites, motifs and domains of rice and pearl millet eIF4A,
respectively, identified by docking studies

Classification
• Functional analysis and structural analysis can be used to classify our protein.
• First, we obtained the relationships of the protein through phylogenetic analysis.
• Now the structural and functional characters can be included, and a clear classification can be performed.

Reporting
• Once the data have been evaluated accurately in this way, you can publish or report them.
• Many submissions and sequence uploads are taking place at various levels.
• Genes, proteins and genomes are all being reported to these databases.
• They will then be available for further research.

Conclusion
• With in-silico studies you will get 60 to 70% accuracy in the information regarding your work.
• With this you can confirm whether or not you are working on the right target before starting your in-vitro studies.
• So you can proceed with your work on the basis of 70% in-silico information and complete the project with 100% success in .

Acknowledgement
• Agri Biotech Foundation
• Department of Biotechnology
• Prof. G. Pakkireddy
• Dr. J. S. Bentur
• Dr. G. Mallikarjun
• My Friends and colleagues
• Dearest participants (transformed with high
energy and patience)
