In-silico Analysis for Unknown Data
-Tata Santosh Rama Bhadra Rao
Agri Biotech Foundation
What is Bioinformatics?
• Mathematics and Statistics
• Biology
• Computer Science
"All aspects of gathering, storing, handling,
analyzing, interpreting and spreading vast amounts
of biological information in databases. The
information involved includes gene sequences,
biological activity/function, pharmacological activity,
biological structure, molecular structure, protein-
protein interactions, and gene expression.
Bioinformatics uses powerful computers and
statistical techniques to accomplish research
objectives, for example, to discover a new
pharmaceutical or herbicide."
What is bioinformatics?
Task flow
• The data we have
• Search for similar data in available databases
• ClustalW alignment
• Phylogenetic analysis
• Classification
• Structural analysis
• Functional analysis
• Reporting
Data Outcome
• The data may be a nucleotide sequence (mRNA, gene, or genome) or a protein sequence.
• 16S rRNA sequences are most often used to classify a species.
• Using both forward and reverse sequences makes the result more accurate.
• The result can also be checked against the protein sequence.
Genetic code table
Sample for DNA isolation
[Figure: DNA isolation from the sample, panels 1 to 3]
DNA
Symbol   Meaning          Explanation
G        G                Guanine
A        A                Adenine
T        T                Thymine
C        C                Cytosine
U        U                Uracil
R        A or G           puRine
Y        C or T           pYrimidine
N        A, C, G or T     Any base
Double helix
5’ A C G T C A T G 3’
3’ T G C A G T A C 5’ (template)
RNA
5’ A C G U C A U G 3’
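The base-pairing shown on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `complement` and `transcribe` are made up for this example, and the input is the eight-base strand from the slide.

```python
# Watson-Crick pairing for DNA bases (A-T, G-C).
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, read in the same left-to-right order."""
    return "".join(DNA_COMPLEMENT[base] for base in strand.upper())


def transcribe(coding_strand: str) -> str:
    """Return the RNA that matches the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


top = "ACGTCATG"          # 5'->3' strand from the slide
print(complement(top))    # the template strand, written 3'->5'
print(transcribe(top))    # the RNA, written 5'->3'
```

The RNA has the same base order as the top strand because it is copied from the complementary template strand.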
Isolation of the gene of interest from unknown sample
cDNA library construction kit from Stratagene
[Figure: mRNA (AAAA) and 1st strand cDNA (TTTT) populations, labelled as follows]
• Control mRNA
• 1st strand cDNA preparation and mRNA removal
• Stress mRNA
• Hybridization of stress mRNA with excess of complementary 1st strand control cDNA
• Commonly expressed mRNA population
• Removal of commonly hybridized population by magnetic separation
• Differentially up-regulated mRNA population
Gene and protein of EIF4A
ATGGCGGCGSCCACCACSTCCCGCCGCGGCGCCGGCGCCTCCCGCAGCATGGACGACGAGAACCTCACCTTCGAGACCTCCCCGGGTG
TCGAGGTCGTCAGCAGCTTCGACCAGATGGGGATCAAGGACGACCTCCTCCGCGGCATCTACGGCTACGGGTTCGAGAAGCCCTCCGC
CATCCAGCAGCGCGCCGTCCTCCCCATCATCAACGGACGCGACGTCATCGCGCAGGCCCAGTCCGGCACCGGGAAGTCATCCATGATC
TCACTCACCGTATGCCAGATCGTCGACACCGCAGTCCGCGAGGTCCAGGCTCTGATCCTCTCACCCACCAGGGAGCTCGCTTCGCAGA
CAGAGAAGGTTATGCTGGCTGTCGGCGACTACCTCAATATCCAAGTGCACGCTTGCATTGGTGGGAAAAGTATCAGCGAGGATATCAG
GAGGCTTGAGAACGGAGTCCATGTTGTCTCTGGGACTCCGGGCAGAGTCTGCGATATGATCAAGAGGAGGACCCTGCGGACAAGAGCC
ATCAAGCTTCTAGTTCTGGATGAGGCTGATGAGATGTTGAGCAGAGGCTTTAAGGATCAGATTTACGATGTCTACAGATACCTCCCAC
CCGAACTTCAGGTCGTTTTGATCTCCGCCACTCTTCCTCACGAGATCCTAGAGATGACTAGCAAGTTCATGACCGAACCAGTTAGGAT
CCTTGTGAAGCGTGATGAGTTGACCCTGGAGGGTATCAAACAATTCTTCGTTGCTGTTGAGAAAGAGGAATGGAAGTTTGATACGCTG
TGTGATCTTTATGATACGTTGACCATCACCCAAGCTGTTATTTTCTGCAATACTAAGAGAAAGGTGGATTGGCTTACTGAAAGAATGC
GCAGCAATAACTTCACAGTATCAGCTATGCATGGTGACATGCCCCAACAGGAAAGGGATGCCATCATGACAGAGTTCAGGTCTGGTGC
AACTCGTGTGCTAATCACTACGGATGTTTGGGCTCGAGGGCTGGATGTTCAGCAGGTTTCACTTGTCATAAATTATGATCTCCCAAAT
AATCGTGAGCTTTACATCCATCGCATCGGTCGCTCTGGTCGTTTTGGGCGCAAGGGTGTGGCGATCAATTTTGTGCGCAAGGATGACA
TCCGTATCCTGAGGGATATAGAACAGTACTACAGCACACAAATTGATGAGATGCCAATGAATGTTGCTGATCTAATTTGA
"MAAXTTSRRGAGASRSMDDENLTFETSPGVEVVSSFDQMGIKDDLLRGIYGYGFEKPSAIQQRAVLPIINGRDVIAQAQSGTGKSSM
ISLTVCQIVDTAVREVQALILSPTRELASQTEKVMLAVGDYLNIQVHACIGGKSISEDIRRLENGVHVVSGTPGRVCDMIKRRTLRTR
AIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEPVRILVKRDELTLEGIKQFFVAVEKEEWKFDT
LCDLYDTLTITQAVIFCNTKRKVDWLTERMRSNNFTVSAMHGDMPQQERDAIMTEFRSGATRVLITTDVWARGLDVQQVSLVINYDLP
NNRELYIHRIGRSGRFGRKGVAINFVRKDDIRILRDIEQYYSTQIDEMPMNVADLI"
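The protein above can be derived from the gene sequence by codon translation. The sketch below is one possible way to do it in plain Python, not necessarily the tool used for these slides. It uses the standard genetic code and assumes the IUPAC ambiguity codes (for example S = C or G, which appears twice in the gene sequence); a codon whose ambiguity changes the amino acid is reported as X. All function and variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids listed in TCAG x TCAG x TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes: each symbol maps to the bases it may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one codon; return X if the ambiguity gives different amino acids."""
    options = {
        CODON_TABLE["".join(bases)]
        for bases in product(*(IUPAC[symbol] for symbol in codon))
    }
    return options.pop() if len(options) == 1 else "X"


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, ignoring trailing extra bases."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(translate_codon(cds[i:i + 3]) for i in range(0, usable, 3))


# First seven codons of the gene sequence above.
print(translate("ATGGCGGCGSCCACCACSTCC"))
```

The codon SCC can be CCC (P) or GCC (A), so it becomes X, while ACS is threonine either way; this matches the start of the protein shown above (MAAXTTS).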
In-silico generated protein structures
ABOUT THE GENE AND PROTEIN
GENE LENGTH : 1224 bp
NUMBER OF INTRONS : 7
NUMBER OF EXONS : 8
GENE MOLECULAR WEIGHT : 378411.66 - 378491.72 Daltons
PROTEIN LENGTH : 407 AA
PROTEIN MOLECULAR WEIGHT : 45.2 kDa
ISOELECTRIC POINT : 6.10
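These figures can be cross-checked with simple arithmetic. The sketch below treats the 1224 bp as the coding sequence length (an assumption) and uses the common rule of thumb of about 110 Da per amino acid residue, which is only a rough average and not the method used to obtain the 45.2 kDa value. Function names are illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def expected_protein_length(cds_length_bp: int) -> int:
    """Number of amino acids encoded by a coding sequence that ends in a stop codon."""
    if cds_length_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds_length_bp // 3 - 1  # the stop codon encodes no amino acid


def rough_mass_kda(protein_length: int) -> float:
    """Very rough protein mass estimate in kDa."""
    return protein_length * AVERAGE_RESIDUE_MASS_DA / 1000.0


length = expected_protein_length(1224)
print(length)                  # 407
print(rough_mass_kda(length))  # close to, but not the same as, the reported 45.2 kDa
```

The 1224 bp give 408 codons, one of which is the stop codon, which agrees with the 407 amino acids listed.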
Search for similar data in available databases
• The data are subjected to a similarity search in NCBI, Phytozome, or other available databases using the BLAST tool.
• Download the matching data from the database.
Note:
• Always keep the data in a plain-text editor such as Notepad for working convenience.
• The data presented here are unpublished.
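To give a feel for what a similarity search does, here is a toy sketch that ranks database entries by the short words (k-mers) they share with the query. This is not BLAST; it only illustrates the idea of finding shared words between sequences, and the database entries and all names below are hypothetical.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Return the set of all overlapping words of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_kmers(query: str, database: dict, k: int = 3) -> list:
    """Rank database entries by the fraction of k-mers shared with the query."""
    query_words = kmers(query, k)
    scores = []
    for name, seq in database.items():
        entry_words = kmers(seq, k)
        union = query_words | entry_words
        score = len(query_words & entry_words) / len(union) if union else 0.0
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical mini-database of short protein fragments.
database = {
    "entry_same": "LVLDEADEMLSRG",
    "entry_close": "LILDESDEMLSRG",
    "entry_unrelated": "WWWWWWWWWWWWW",
}
for name, score in rank_by_shared_kmers("LVLDEADEMLSRG", database):
    print(name, round(score, 2))
```

A real search against NCBI or Phytozome also extends the word hits into alignments and reports statistics such as E-values, which this sketch leaves out.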
BLAST
ClustalW
• The finalized data are then subjected to ClustalW alignment to assess sequence similarity.
• ClustalW is a multiple sequence alignment tool that lines up sequences so that similar regions can be found and mapped.
• It works for both nucleotide and protein sequences.
• Protein sequences are mostly used for the alignment, for accuracy.
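Multiple alignment tools such as ClustalW start from pairwise comparisons of the sequences. As a rough illustration of that building block, the sketch below computes a global pairwise alignment score with the Needleman-Wunsch recursion. It is not ClustalW itself, and the scoring values (match +1, mismatch -1, gap -1) are simple assumptions rather than a real substitution matrix.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch, score only)."""
    # prev[j] holds the best score for aligning the processed part of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("ACGT", "AGT"))   # 2
```

Only two rows of the scoring table are kept in memory, which is enough when the score is needed but not the alignment itself.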
MEGA
SB4g             RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP 232
SACetif          RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTEP 232
ZEAMMB73         RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP 231
SIDb             RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEITSKFMTEP 232
PgeiF4a          RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTSKFMTEP 232
OS3g             RTRAIKLLILDEADEMLGRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTSKFMTDP 229
H                RTRAIKLLVLDEADEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHDILEITSKFMTDP 237
Phys             RTRSIKLLILDESDEMLSRGFKDQIYDVYRYLPPELQVVLVSATLPHEILEMTNKFMTDP 222
Jat              RTRAIRLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPNEILEMTSKFMTDP 235
RC               RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPNEILEMTSKFMTDP 232
GM               RTRAIKMLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 232
PHAVU            RTRAIKMLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
CA               RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
M                RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPDLQVCLISATLPHEILEMTNKFMTDP 231
CS-EIF4A-3-like  RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVVLISATLPHEILEMTNKFMTDP 235
MD               RTRAIKLLVLDESDEMLSRGFKDQIYDVYRYLPPELQVCLISATLPHEILEMTNKFMTEP 227
                 ***:*::*:***:****.****************:*** *:*****::***:*.****:*
Alignment for retrieved sequences
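The bottom line of a Clustal alignment marks how conserved each column is. A simplified sketch of that idea is given below: it marks only fully identical columns with `*` and leaves everything else blank, so it does not reproduce the `:` and `.` symbols that Clustal uses for similar residues. The helper names are illustrative.

```python
def conservation_line(aligned: list) -> str:
    """Return '*' for columns where every sequence has the same residue, else a space."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return "".join("*" if len(set(column)) == 1 else " " for column in zip(*aligned))


def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns in which two aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# First six columns of three rows from the alignment (PgeiF4a, Phys, Jat).
rows = ["RTRAIK", "RTRSIK", "RTRAIR"]
print(conservation_line(rows))
print(round(percent_identity(rows[0], rows[1]), 1))
```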
Phylogenetic analysis
• After alignment, the data are subjected to phylogenetic analysis.
• Here the relationships between the data sources are evaluated.
• The most similar sequences are placed close together; less similar sequences are placed farther apart.
• By measuring the distance, we can quantify the relationship between data sources.
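The distance idea can be sketched with the simplest possible measure, the proportion of aligned positions that differ (often called the p-distance). Phylogenetic software uses more refined models and tree-building methods, so this is only an illustration. The sequences below are the first 20 columns of four rows from the alignment slide; the function names are made up for this example.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


def closest_relative(name: str, aligned: dict) -> str:
    """Return the name of the sequence with the smallest distance to the given one."""
    others = [other for other in aligned if other != name]
    return min(others, key=lambda other: p_distance(aligned[name], aligned[other]))


aligned = {
    "PgeiF4a": "RTRAIKLLVLDEADEMLSRG",
    "ZEAMMB73": "RTRAIKLLVLDEADEMLSRG",
    "OS3g": "RTRAIKLLILDEADEMLGRG",
    "Phys": "RTRSIKLLILDESDEMLSRG",
}
for other in ("ZEAMMB73", "OS3g", "Phys"):
    print(other, p_distance(aligned["PgeiF4a"], aligned[other]))
print(closest_relative("PgeiF4a", aligned))
```

A smaller distance means a closer relationship, which is what places two sequences near each other in the tree.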
Phylogenetic tree
Fig 1: 20-404: P-LOOP CONTAINING NUCLEOSIDE TRIPHOSPHATE HYDROLASE (IPR027417).
34-62: RNA HELICASE, DEAD-BOX TYPE, Q MOTIF (IPR014014).
246-407: HELICASE C-TERMINAL (IPR001650).
183-186: PRESENCE OF THE DEAD AMINO ACIDS.
ATG GCG GCG SCC ACC ACS TCC CGC CGC GGC GCC GGC GCC TCC CGC AGC ATG GAC GAC GAG AAC CTC ACC TTC
M A A X T T S R R G A G A S R S M D D E N L T F 24
GAG ACC TCC CCG GGT GTC GAG GTC GTC AGC AGC TTC GAC CAG ATG GGG ATC AAG GAC GAC CTC CTC CGC GGC
E T S P G V E V V S S F D Q M G I K D D L L R G 48
ATC TAC GGC TAC GGG TTC GAG AAG CCC TCC GCC ATC CAG CAG CGC GCC GTC CTC CCC ATC ATC AAC GGA CGC
I Y G Y G F E K P S A I Q Q R A V L P I I N G R
GAC GTC ATC GCG CAG GCC CAG TCC GGC ACC GGG AAG TCA TCC ATG ATC TCA CTC ACC GTA TGC CAG ATC GTC
D V I A Q A Q S G T G K S S M I S L T V C Q I V
GAC ACC GCA GTC CGC GAG GTC CAG GCT CTG ATC CTC TCA CCC ACC AGG GAG CTC GCT TCG CAG ACA GAG AAG
D T A V R E V Q A L I L S P T R E L A S Q T E K
GTT ATG CTG GCT GTC GGC GAC TAC CTC AAT ATC CAA GTG CAC GCT TGC ATT GGT GGG AAA AGT ATC AGC GAG
V M L A V G D Y L N I Q V H A C I G G K S I S E
GAT ATC AGG AGG CTT GAG AAC GGA GTC CAT GTT GTC TCT GGG ACT CCG GGC AGA GTC TGC GAT ATG ATC AAG
D I R R L E N G V H V V S G T P G R V C D M I K
AGG AGG ACC CTG CGG ACA AGA GCC ATC AAG CTT CTA GTT CTG GAT GAG GCT GAT GAG ATG TTG AGC AGA GGC
R R T L R T R A I K L L V L D E A D E M L S R G
TTT AAG GAT CAG ATT TAC GAT GTC TAC AGA TAC CTC CCA CCC GAA CTT CAG GTC GTT TTG ATC TCC GCC ACT
F K D Q I Y D V Y R Y L P P E L Q V V L I S A T
CTT CCT CAC GAG ATC CTA GAG ATG ACT AGC AAG TTC ATG ACC GAA CCA GTT AGG ATC CTT GTG AAG CGT GAT
L P H E I L E M T S K F M T E P V R I L V K R D
GAG TTG ACC CTG GAG GGT ATC AAA CAA TTC TTC GTT GCT GTT GAG AAA GAG GAA TGG AAG TTT GAT ACG CTG
E L T L E G I K Q F F V A V E K E E W K F D T L
TGT GAT CTT TAT GAT ACG TTG ACC ATC ACC CAA GCT GTT ATT TTC TGC AAT ACT AAG AGA AAG GTG GAT TGG
C D L Y D T L T I T Q A V I F C N T K R K V D W
CTT ACT GAA AGA ATG CGC AGC AAT AAC TTC ACA GTA TCA GCT ATG CAT GGT GAC ATG CCC CAA CAG GAA AGG
L T E R M R S N N F T V S A M H G D M P Q Q E R
GAT GCC ATC ATG ACA GAG TTC AGG TCT GGT GCA ACT CGT GTG CTA ATC ACT ACG GAT GTT TGG GCT CGA GGG
D A I M T E F R S G A T R V L I T T D V W A R G
CTG GAT GTT CAG CAG GTT TCA CTT GTC ATA AAT TAT GAT CTC CCA AAT AAT CGT GAG CTT TAC ATC CAT CGC
L D V Q Q V S L V I N Y D L P N N R E L Y I H R
ATC GGT CGC TCT GGT CGT TTT GGG CGC AAG GGT GTG GCG ATC AAT TTT GTG CGC AAG GAT GAC ATC CGT ATC
I G R S G R F G R K G V A I N F V R K D D I R I
CTG AGG GAT ATA GAA CAG TAC TAC AGC ACA CAA ATT GAT GAG ATG CCA ATG AAT GTT GCT GAT CTA ATT TGA
L R D I E Q Y Y S T Q I D E M P M N V A D L I *
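The position of the DEAD motif quoted in the figure legend can be checked with a simple string search. The sketch below is illustrative only; `find_motif` is a made-up helper, and the fragment used is the eighth row of the layout above, which starts at residue 169 because each row holds 24 residues.

```python
def find_motif(protein: str, motif: str, first_residue_number: int = 1) -> list:
    """Return (start, end) residue numbers for every occurrence of the motif."""
    positions = []
    index = protein.find(motif)
    while index != -1:
        start = index + first_residue_number
        positions.append((start, start + len(motif) - 1))
        index = protein.find(motif, index + 1)
    return positions


# Eighth row of the codon layout: residues 169 to 192.
fragment = "RRTLRTRAIKLLVLDEADEMLSRG"
print(find_motif(fragment, "DEAD", first_residue_number=169))
```

The search returns residues 183 to 186, in agreement with the legend.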
Structural analysis
• Structural analysis of the protein is carried out through homology modeling and docking.
• Both the secondary and the tertiary structure of the protein sequence must be analyzed.
• The modeled structure must be evaluated against nuclear magnetic resonance (NMR) and X-ray crystallographic scores.
• The Ramachandran plot is especially important for structural validation.
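A Ramachandran plot shows the backbone dihedral angles phi and psi of each residue; phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1). The sketch below shows how one such dihedral angle can be computed from four atom positions. It is a geometric illustration only, not the PROCHECK procedure, and the coordinates are invented test points rather than atoms from the eIF4A model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees defined by four points, in the range -180 to 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Invented points: the last atom is rotated a quarter turn around the central bond.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing phi and psi for every residue and plotting one against the other gives the Ramachandran plot used to judge whether the model has realistic backbone geometry.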
In-silico analysis of eIF4A
Homology modeling:
Using Modeller version 9.12, we built a structural model of Pennisetum glaucum eIF4A.
[Figure labels: α-helices, β-pleated sheets, DEAD box motif]
Fig: Homology modeling of the amino acid sequence of eIF4A from P. glaucum, revealing the signature motifs of the DEAD box and the Mg2+ binding sites. eIF4A showed the ----helices and --------sheets.
Nuclear magnetic resonance analysis
for protein structure
REPRESENTATION OF RAMACHANDRAN PLOT FOR RICE AND PEARL MILLET EIF4A STRUCTURES DONE BY PROCHECK
Peptide position and bonds
Functional analysis
• Functional analysis is based on domains, conserved motifs, and active-site analysis.
• These are evaluated using docking and amino acid composition.
• The protein structure can be described in terms of its α-helices and β-pleated sheets.
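Amino acid composition is straightforward to compute from the protein sequence. The sketch below is a minimal example using only the Python standard library; the function name is illustrative and the input is a short test string, not the full eIF4A protein.

```python
from collections import Counter


def amino_acid_composition(protein: str) -> dict:
    """Return the percentage of each amino acid, ignoring non-letter characters."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


print(amino_acid_composition("DEAD"))
```

Applied to the full 407-residue sequence, the same function would give the percentage of each residue type, which can then be compared between related proteins.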
Docking analysis and motif localization in Pennisetum glaucum eIF4A
Docking analysis was performed using SYBYL version 6.7 for motif analysis and structural stability.
Active sites, motifs and domains of rice and pearl millet eIF4A, respectively, identified by docking studies
Classification
• Functional and structural analysis can be used to classify our protein.
• First, phylogenetic analysis gave us the relationships of the protein.
• Structural and functional characters can now be included to give a clear classification.
Reporting
• Once the data have been evaluated accurately in this way, you can publish or report them.
• Many submissions and sequence uploads take place at various levels.
• Genes, proteins, and genomes are all reported to those databases.
• They then become available for further research.
Conclusion
• With in-silico studies you can obtain 60 to 70% accuracy in the information about your work.
• This lets you confirm whether you are working on the right target before starting your in-vitro studies.
• You can then proceed with about 70% of the information coming from in-silico analysis and complete the project with 100% success in .
Acknowledgement
• Agri Biotech Foundation
• Department of Biotechnology
• Prof. G. Pakkireddy
• Dr. J. S. Bentur
• Dr. G. Mallikarjun
• My Friends and colleagues
• Dearest participants (transformed with high
energy and patience)
Editor's Notes

  • #23 Fig: Structural organization of the eIF4A gene from Pennisetum glaucum: a schematic representation of the PgeiF4A protein structure containing three motifs: 34-62: RNA helicase, DEAD-box type Q motif (indicated as a straight line); 246-407: helicase C-terminal (dotted lines). The deduced amino acid sequence is placed beneath the nucleotide sequence (single-letter code). Various functional domains in the sequence have been marked: N-terminal ATPase domain (regular font), linker region (bold font), substrate-binding domain (shaded gray) and C-terminal domain (italics). The deduced amino acid sequence contains four binding sites: 1. ATP binding site (), 2. Mg2+ binding site, 3. Nucleotide binding site, 4. ATP binding site ().
  • #27 Ramachandran plot for protein structure analysis
  • #30 Docking analysis was performed using SYBYL version 6.7 for motif analysis and structural stability.