IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. I (May-Jun. 2016), PP 20-22
www.iosrjournals.org
DOI: 10.9790/0661-1803012022 www.iosrjournals.org 20 | Page
Computational Analysis of Sequences to Determine Expectation
Value Commonly Used in Bioinformatics Database.
.Uma Kumari1
,Ashok Kumar Choudhary2
1
Department of Biotechnology, Jharkhand Rai University,Ranchi-835222,Jharkhand,India.
2
Department of Botany,Ranchi University,Ranchi-834008,Jharkhand,India.
Abstract: Solanum lycopersicum economically important crop world wide, intensively investigated and
model system for genetic studies in plant ,variability is a measure spread of data set. Genome analysis and
annotation using genome from the libraries, automatic annotation using the blast (basic local alignment
search tools )low complexity sequence have unusual composition that can create a problem in sequence
similarity searching the color bars in the graphic summarize the BLAST tools. Blast have been developed to
provided the sequence in the form of customized data extraction utilities for some of customized data
extraction utilities for some of commonly used database such as NCBI,FASTA ,BLAST,ORF ,NEB Cutter. Blast
has a bioinformatics algorithm to align sequences as if they were found in the database search .when expect -
value is increases from default value ,a larger list with more scoring hits can be reported. Ncbi thus provided a
common data extraction platform using sequence analysis when decrease exponentially as the score of the
match increased .when desired subset of the data compiled using Blast can be subsequently used for observed
the expectation value to analyses and knowledge discovery.
Keywords: Bioinformatics ---Biological database--- Customized data retrieved—Sequence analysis—Data
compiled---Expectation Value.
I. Introduction
The data available for biological system are diverse in nature and include various types such as
sequences,structures,expression data,interaction,pathway,system data.The rate at which the data are generated
has increased exponentially due to technological advances in field of genomics,transcriptomics,
proteomics,structural genomics, system biology etc.As the biological data constitutes an important component
of big data it demands to genome sequencing to curate ,compile,organize,archive,and query and analyse the
sequences( Higgins Des,Taylor Willie 2000). Various types of primary data along with annotation continue to
be useful for processing, analysis, interpretation of data so as to generate higher order information and
knowledge. The existing databases archiving molecular data and built around the focused them and after
relevant annotations. The tools facilitate analysis of the data and integration of diverse data type are therefore
the need of the hour. A number of online as well as offline tools and server are available for accessing and
retrieving large amount of data from public domain resources.NCBI E-utilities ,for an instance provide
customized data utilities for various databases available at NCBI.These utilities require generation of
URLs(Utility web services (http://www.cbi.nlm.nih.gov) can be generated for the user in a format specific for
respective database either by manually or by writing scripts. NCBI computational biology branch focus on
theoretical, analytical and applied computational approaches to a broad range of fundamental problem in
molecular biology. A sequence in FASTA format is represented as a series of lines, each of which should no
longer then 120 character and usually do not exceed 80 character. Although, multiple URL can be generated. As
the initial step of reach this goal, Casey, R. M. (2005). NCBI, BLAST, FASTA tools have been developed to
make the existing search utilities more effective and productive towards computational analysis of sequence.
II. Method
Computational analysis of sequence alignment computer programming for bioinformatics and data
management .NCBI focuses on theoretical, analytical and applied computational approaches and widely used
primary database such as European nucleotide archive, Uniport kb/Swiss –prot,a widely used method for
assignment of secondary structure. A fasta sequence alignment software package used to functional and
evolutionary relationship between sequences .Operating system—UNIX, LINUX,Ms-Windows.
2.1 Database And Corresponding Webservices
Database name Web services type: URL
NCBI E—Utility web services (http://www.cbi.nlm.nih.gov
BLAST www.ebi.ac.uk/tools/sss/ncbiblast
FASTA www.ebi.ac.uk/tools
Computational Analysis Of Sequences To Determining Expectation Value Commonly Used In
DOI: 10.9790/0661-1803012022 www.iosrjournals.org 21 | Page
EMBL/EBI EMBL-EBI web services (http://www.ebi.ac.uk/tools/
Uniprot KB Programmatic access services (http://www.uniprot.org)
III. Results And Discussion
Searching and browsing the database and generating curated datasets is an essential for processing
analysis and interpretation.NCBI,FASTA,BLAST,this need for searching subset of sequence/data from few
widely used database providing e-value in blast.occuring bychance with the observed the score/high score in E-
Value.NCBI-BLAST provide a platform with a user-friendly interface.some of the common features of all
utilities computational analysis of the sequence by using NCBI,BLAST,and FASTA.All utilities support data
from single entry as well as multiple entries. The data is exchanged among these database on the daily basis
.The Ncbi houses a series of databases relevant to bioinformatics tools and services. Major sequence include
gene bank for dna sequences and pubmed. Epigenomic database of the ncbi (National center of biotechnology
information) at NIH (National institute of health) means to collect the maps of epigenetic modification and the
occurrence across the human genome. List of accession number may be provided in an interactive mode of a
uploading a text file.
>gi|1002623395|ref|NM_001320673.1| Solanum lycopersicum cysteine proteinase inhibitor A (LOC543632),
mRNA
GCTTTAATCAAACGCGCTCCATTAAATTCGTTGATTGTGACTGACTATTCTTCTTCTTCTTCTTATAT
AT
CTCAAAAACCCCATTTACAGAGACTCAAAAATGGCGACATTAGGAGGAATTCGTGAAGCTGGAGG
ATCAG
AAAACAGCCTAGAGATCAACGATCTTGCTCGTTTTGCTGTTGATGAACACAATAAGAAACAGAAT
GCTCT
TTTGGAGTTTGGAAAGGTTGTGAATGTGAAGGAACAAGTGGTTGCTGGAACCATGTACTACATAA
CACTA
GAGGCGACTGAAGGTGGTAAGAAGAAAGCATACGAAGCCAAAGTCTGGGTGAAGCCATGGCAGA
ACTTCA
AGCAAGTTGAAGACTTCAAGCTTATTGGGGATGCTGCTACTGCTTAACAAGCGCTGAACGATGTAT
GACT
CTTATGTCCTGAAAATAAAGCTAAACATATTTTAGCTTGTTCGTATTTGAATATCATAAAGTAAGTT
CAT
AACTCTATCGTGGATCTAAATTACGGATAACTATAGCTTTACAACGTTCCTTTTTCGTTCTATGCTC
TTA
TCTTATATACGATTTTGCTTTTCTGTTGCTAATAATATCTGAGAAACACAAGC
(nucleotide sequence)
(Sources—Fasta sequence related to solanum lycopersicum retrived from NCBI.)
3.1 Advanced Features Of Sequence Analysis
The features table block is an important section in a nucleotide sequence entry .the sequence analysis of
database accession number and cross linking is carried out by using the base URLs to relevant entries.
An e-value of 1e-3 is annotate that there is a 0.001 chance that alignment would exist in the database by
chance..if the database 610 sequence, then might expect that alignment occur may be 7 times. the score is
measure of similarity between the sequences. It is a statical calculation based on the quality of alignment
obtained from one database. These soft links could be dynamically established using properties such as
homology, structural, functional similarities, membership to a certain biological process etc. An e-value of 1e-3
is saying that there is a 0.001 chance that that alignment would exist in the database by chance, that is, if the
database contains 10000 sequences. An e-value of 0 is actually a rounded down probability (maybe 1e-250 or
something), and is simply saying that there is (almost) no chance that alignment can occur by chance.
IV. Conclusion
Analysis of the sequence has been developed with the objective of providing a single platform for
customizable data from the some of the major biological database .E value is increased from default value,
larger list with more low scoring hits can be reported based on quality of alignment (the score) and size of the
database by applying the sequence alignment method and bioinformatics tools.. The closer the E-value is
towards 0, the better the alignment.
V. Acknowledgement
We extended our sincere thanks to Dr.Savita ―vice chancellor ―of Jharkhand rai university, Ranchi, India for kindly providing me the
platform to carry out the research.
Computational Analysis Of Sequences To Determining Expectation Value Commonly Used In
DOI: 10.9790/0661-1803012022 www.iosrjournals.org 22 | Page
References
[1]. Baxevanis D.Andreas,Quellete Fracis B.F. ,A Practical guide to the Analysis of gene and Proteins.,3rd
Eddition October
2004,Published by Wiley, john and Sons
[2]. Brudno M, Malde2.S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S (2003)."Glocal alignment: finding rearrangements
during alignment". Bioinformatics. 19. Suppl 1 (90001): i54–62. doi:10.1093/bioinformatics/btg1005. PMID 12855437.
[3]. Casey, R. M. (2005). "BLAST Sequences Aid in Genomics and Proteomics". Business Intelligence Network.
[4]. Eddy SR; Rost, Burkhard (2008). Rost, Burkhard, ed. "A probabilistic model of local sequence alignment that simplifies statistical
significance estimation". PLoS Comput Biol4 (5): e1000069. doi:10.1371/journal.pcbi.1000069. PMC 2396288.PMID 18516236.
[5]. Higgins Des,Taylor Willie 2000,Bioinformatics:Sequence structure and database practical approach‖ 1st
Eddition october
2000,Published by oxford University Press.
[6]. Jean-Michel Claverie & Cedric Notredame, Bioinformatics for Dummies, Wiley Publishing
[7]. Lipman, DJ; Pearson, WR (1985). "Rapid and sensitive protein similarity searches". Science 227 (4693): 1435–
41. doi:10.1126/science.2983426. PMID 2983426.
[8]. Mount .David 2004,Bioinformatics:-sequence $ Genome Analysis‖, published by Cold spring Harbour laboratory press.
[9]. Oehmen, C.; Nieplocha, J. (2006). "ScalaBLAST: A Scalable Implementation of BLAST for High-Performance Data-Intensive
Bioinformatics Analysis". IEEE Transactions on Parallel and Distributed Systems 17 (8): 740. doi:10.1109/TPDS.2006.112.
[10]. "Program Selection Tables of the Blast NCBI web site".
[11]. Pearson, WR; Lipman, DJ (1988). "Improved tools for biological sequence comparisons". Proceedings of the National Academy of
Sciences of the United States of America 85 (8): 2444–8.doi:10.1073/pnas.85.8.2444. PMC 280013. PMID 3162770.
[12]. Rick CM.Yader JT,Annu Rev Genet 1988 Classical and Molecular Genetics of tomato.
[13]. http://www.ncbi.nlm.nih.gov/BLAST/full_options.html
[14]. Scott Jw,Harbaugh Bk:Micro Tom: A miniature dwarf tomato. Florid Agar Experiment 1989.
[15]. Taylor Willie, Higgins Des 2000,Bioinformatics :Sequence structure and database practical approach ―,1st
Edition October 2000
,Published by Oxford university press.
[16]. Whitworth, W.A. (1901) Choice and Chance with One Thousand Exercises. Fifth edition. Deighton Bell, Cambridge. [Reprinted by
Hafner Publishing Co., New York, 1959.]
[17]. Zhao,K.;Chu,X.(2014). "G-
BLASTN:acceleratingnucleotidealignmentbygraphicsprocessors". Bioinformatics 30 (10):138491. doi:10.1093/bioinformatics/btu0
47.PMID 24463183.

D1803012022

  • 1.
    IOSR Journal ofComputer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. I (May-Jun. 2016), PP 20-22 www.iosrjournals.org DOI: 10.9790/0661-1803012022 www.iosrjournals.org 20 | Page Computational Analysis of Sequences to Determine Expectation Value Commonly Used in Bioinformatics Database. .Uma Kumari1 ,Ashok Kumar Choudhary2 1 Department of Biotechnology, Jharkhand Rai University,Ranchi-835222,Jharkhand,India. 2 Department of Botany,Ranchi University,Ranchi-834008,Jharkhand,India. Abstract: Solanum lycopersicum economically important crop world wide, intensively investigated and model system for genetic studies in plant ,variability is a measure spread of data set. Genome analysis and annotation using genome from the libraries, automatic annotation using the blast (basic local alignment search tools )low complexity sequence have unusual composition that can create a problem in sequence similarity searching the color bars in the graphic summarize the BLAST tools. Blast have been developed to provided the sequence in the form of customized data extraction utilities for some of customized data extraction utilities for some of commonly used database such as NCBI,FASTA ,BLAST,ORF ,NEB Cutter. Blast has a bioinformatics algorithm to align sequences as if they were found in the database search .when expect - value is increases from default value ,a larger list with more scoring hits can be reported. Ncbi thus provided a common data extraction platform using sequence analysis when decrease exponentially as the score of the match increased .when desired subset of the data compiled using Blast can be subsequently used for observed the expectation value to analyses and knowledge discovery. Keywords: Bioinformatics ---Biological database--- Customized data retrieved—Sequence analysis—Data compiled---Expectation Value. I. Introduction The data available for biological system are diverse in nature and include various types such as sequences,structures,expression data,interaction,pathway,system data.The rate at which the data are generated has increased exponentially due to technological advances in field of genomics,transcriptomics, proteomics,structural genomics, system biology etc.As the biological data constitutes an important component of big data it demands to genome sequencing to curate ,compile,organize,archive,and query and analyse the sequences( Higgins Des,Taylor Willie 2000). Various types of primary data along with annotation continue to be useful for processing, analysis, interpretation of data so as to generate higher order information and knowledge. The existing databases archiving molecular data and built around the focused them and after relevant annotations. The tools facilitate analysis of the data and integration of diverse data type are therefore the need of the hour. A number of online as well as offline tools and server are available for accessing and retrieving large amount of data from public domain resources.NCBI E-utilities ,for an instance provide customized data utilities for various databases available at NCBI.These utilities require generation of URLs(Utility web services (http://www.cbi.nlm.nih.gov) can be generated for the user in a format specific for respective database either by manually or by writing scripts. NCBI computational biology branch focus on theoretical, analytical and applied computational approaches to a broad range of fundamental problem in molecular biology. A sequence in FASTA format is represented as a series of lines, each of which should no longer then 120 character and usually do not exceed 80 character. Although, multiple URL can be generated. As the initial step of reach this goal, Casey, R. M. (2005). NCBI, BLAST, FASTA tools have been developed to make the existing search utilities more effective and productive towards computational analysis of sequence. II. Method Computational analysis of sequence alignment computer programming for bioinformatics and data management .NCBI focuses on theoretical, analytical and applied computational approaches and widely used primary database such as European nucleotide archive, Uniport kb/Swiss –prot,a widely used method for assignment of secondary structure. A fasta sequence alignment software package used to functional and evolutionary relationship between sequences .Operating system—UNIX, LINUX,Ms-Windows. 2.1 Database And Corresponding Webservices Database name Web services type: URL NCBI E—Utility web services (http://www.cbi.nlm.nih.gov BLAST www.ebi.ac.uk/tools/sss/ncbiblast FASTA www.ebi.ac.uk/tools
  • 2.
    Computational Analysis OfSequences To Determining Expectation Value Commonly Used In DOI: 10.9790/0661-1803012022 www.iosrjournals.org 21 | Page EMBL/EBI EMBL-EBI web services (http://www.ebi.ac.uk/tools/ Uniprot KB Programmatic access services (http://www.uniprot.org) III. Results And Discussion Searching and browsing the database and generating curated datasets is an essential for processing analysis and interpretation.NCBI,FASTA,BLAST,this need for searching subset of sequence/data from few widely used database providing e-value in blast.occuring bychance with the observed the score/high score in E- Value.NCBI-BLAST provide a platform with a user-friendly interface.some of the common features of all utilities computational analysis of the sequence by using NCBI,BLAST,and FASTA.All utilities support data from single entry as well as multiple entries. The data is exchanged among these database on the daily basis .The Ncbi houses a series of databases relevant to bioinformatics tools and services. Major sequence include gene bank for dna sequences and pubmed. Epigenomic database of the ncbi (National center of biotechnology information) at NIH (National institute of health) means to collect the maps of epigenetic modification and the occurrence across the human genome. List of accession number may be provided in an interactive mode of a uploading a text file. >gi|1002623395|ref|NM_001320673.1| Solanum lycopersicum cysteine proteinase inhibitor A (LOC543632), mRNA GCTTTAATCAAACGCGCTCCATTAAATTCGTTGATTGTGACTGACTATTCTTCTTCTTCTTCTTATAT AT CTCAAAAACCCCATTTACAGAGACTCAAAAATGGCGACATTAGGAGGAATTCGTGAAGCTGGAGG ATCAG AAAACAGCCTAGAGATCAACGATCTTGCTCGTTTTGCTGTTGATGAACACAATAAGAAACAGAAT GCTCT TTTGGAGTTTGGAAAGGTTGTGAATGTGAAGGAACAAGTGGTTGCTGGAACCATGTACTACATAA CACTA GAGGCGACTGAAGGTGGTAAGAAGAAAGCATACGAAGCCAAAGTCTGGGTGAAGCCATGGCAGA ACTTCA AGCAAGTTGAAGACTTCAAGCTTATTGGGGATGCTGCTACTGCTTAACAAGCGCTGAACGATGTAT GACT CTTATGTCCTGAAAATAAAGCTAAACATATTTTAGCTTGTTCGTATTTGAATATCATAAAGTAAGTT CAT AACTCTATCGTGGATCTAAATTACGGATAACTATAGCTTTACAACGTTCCTTTTTCGTTCTATGCTC TTA TCTTATATACGATTTTGCTTTTCTGTTGCTAATAATATCTGAGAAACACAAGC (nucleotide sequence) (Sources—Fasta sequence related to solanum lycopersicum retrived from NCBI.) 3.1 Advanced Features Of Sequence Analysis The features table block is an important section in a nucleotide sequence entry .the sequence analysis of database accession number and cross linking is carried out by using the base URLs to relevant entries. An e-value of 1e-3 is annotate that there is a 0.001 chance that alignment would exist in the database by chance..if the database 610 sequence, then might expect that alignment occur may be 7 times. the score is measure of similarity between the sequences. It is a statical calculation based on the quality of alignment obtained from one database. These soft links could be dynamically established using properties such as homology, structural, functional similarities, membership to a certain biological process etc. An e-value of 1e-3 is saying that there is a 0.001 chance that that alignment would exist in the database by chance, that is, if the database contains 10000 sequences. An e-value of 0 is actually a rounded down probability (maybe 1e-250 or something), and is simply saying that there is (almost) no chance that alignment can occur by chance. IV. Conclusion Analysis of the sequence has been developed with the objective of providing a single platform for customizable data from the some of the major biological database .E value is increased from default value, larger list with more low scoring hits can be reported based on quality of alignment (the score) and size of the database by applying the sequence alignment method and bioinformatics tools.. The closer the E-value is towards 0, the better the alignment. V. Acknowledgement We extended our sincere thanks to Dr.Savita ―vice chancellor ―of Jharkhand rai university, Ranchi, India for kindly providing me the platform to carry out the research.
  • 3.
    Computational Analysis OfSequences To Determining Expectation Value Commonly Used In DOI: 10.9790/0661-1803012022 www.iosrjournals.org 22 | Page References [1]. Baxevanis D.Andreas,Quellete Fracis B.F. ,A Practical guide to the Analysis of gene and Proteins.,3rd Eddition October 2004,Published by Wiley, john and Sons [2]. Brudno M, Malde2.S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S (2003)."Glocal alignment: finding rearrangements during alignment". Bioinformatics. 19. Suppl 1 (90001): i54–62. doi:10.1093/bioinformatics/btg1005. PMID 12855437. [3]. Casey, R. M. (2005). "BLAST Sequences Aid in Genomics and Proteomics". Business Intelligence Network. [4]. Eddy SR; Rost, Burkhard (2008). Rost, Burkhard, ed. "A probabilistic model of local sequence alignment that simplifies statistical significance estimation". PLoS Comput Biol4 (5): e1000069. doi:10.1371/journal.pcbi.1000069. PMC 2396288.PMID 18516236. [5]. Higgins Des,Taylor Willie 2000,Bioinformatics:Sequence structure and database practical approach‖ 1st Eddition october 2000,Published by oxford University Press. [6]. Jean-Michel Claverie & Cedric Notredame, Bioinformatics for Dummies, Wiley Publishing [7]. Lipman, DJ; Pearson, WR (1985). "Rapid and sensitive protein similarity searches". Science 227 (4693): 1435– 41. doi:10.1126/science.2983426. PMID 2983426. [8]. Mount .David 2004,Bioinformatics:-sequence $ Genome Analysis‖, published by Cold spring Harbour laboratory press. [9]. Oehmen, C.; Nieplocha, J. (2006). "ScalaBLAST: A Scalable Implementation of BLAST for High-Performance Data-Intensive Bioinformatics Analysis". IEEE Transactions on Parallel and Distributed Systems 17 (8): 740. doi:10.1109/TPDS.2006.112. [10]. "Program Selection Tables of the Blast NCBI web site". [11]. Pearson, WR; Lipman, DJ (1988). "Improved tools for biological sequence comparisons". Proceedings of the National Academy of Sciences of the United States of America 85 (8): 2444–8.doi:10.1073/pnas.85.8.2444. PMC 280013. PMID 3162770. [12]. Rick CM.Yader JT,Annu Rev Genet 1988 Classical and Molecular Genetics of tomato. [13]. http://www.ncbi.nlm.nih.gov/BLAST/full_options.html [14]. Scott Jw,Harbaugh Bk:Micro Tom: A miniature dwarf tomato. Florid Agar Experiment 1989. [15]. Taylor Willie, Higgins Des 2000,Bioinformatics :Sequence structure and database practical approach ―,1st Edition October 2000 ,Published by Oxford university press. [16]. Whitworth, W.A. (1901) Choice and Chance with One Thousand Exercises. Fifth edition. Deighton Bell, Cambridge. [Reprinted by Hafner Publishing Co., New York, 1959.] [17]. Zhao,K.;Chu,X.(2014). "G- BLASTN:acceleratingnucleotidealignmentbygraphicsprocessors". Bioinformatics 30 (10):138491. doi:10.1093/bioinformatics/btu0 47.PMID 24463183.