BioScope
  Advanced Search Grammar Tool for identification of Functional
                        Noncoding Elements

          Principal Investigator - Hariharane Ramasamy
                         Sanjeev Mishra
                          Tulasi Ravuri
Summary
The completion of several genomic sequences has provided the
motivation for development of a tool that can aid in locating
and analyzing transcription factor binding sites (TFBS)
responsible for regulating gene transcription. TFBS are
short sequences, 4-20 base pairs in length, that are often
located near the genes they regulate. These sequences occur in
groups or modules, also called enhancers or cis-regulatory
modules (CRM). A CRM contains one or more TFBS and interacts
with a specific combination of transcription factors to regulate
gene expression. Such sequences are often abundant near the
genes they regulate. The goal of developmental biologists is to
understand how these CRM are organized in a genome and how they
regulate genes. Laboratory methods for locating CRM are often
laborious and time consuming, so computational methods have
become an invaluable tool. The success of computational methods
depends on how well they can be utilized in a lab environment.
Several
computational tools exist to locate motifs in a genomic
sequence. These tools fall into two categories. Tools in the
first category employ statistical and probabilistic methods
using known motifs and the frequencies of codons in a genomic
sequence. Although some motifs have been discovered using these
tools, they often yield many false positives. Tools in the
second category employ fundamental principles of the
combinatorial logic underlying the occurrence of the enhancers /
cisRegulatory modules (CRM). It is believed that genes with
similar temporal and spatial expression patterns are controlled
by similar CRM. Experimental biologists who are
knowledgeable about CRM occurrences need an efficient tool to
locate them by applying combinatorial knowledge such as
counts of binding site occurrences within a specified width,
logical combinations of one or more binding sites, orientation,
and more. Such tools should be efficient, scalable, and fast.
The aim of this proposal is to build such tools.

1 Introduction
Several genomes including the human and the mouse genomes have
been sequenced close to completion. In this post-genomic era, it
is imperative that researchers are equipped with novel
methodologies that enable them to rapidly and
accurately identify, annotate and functionally characterize
genes. Mining genomics and proteomics data using
computational approaches is an effective way to extract
information from these resources in a short time frame. The
transcriptional regulation of a gene depends on the concerted
action of multiple transcription factors that bind to cis-
regulatory modules located in the vicinity of the gene. Cis-
regulatory modules are regulatory elements that occur close to
each other and control the spatial and temporal expression of
genes. The regulatory language that the genome uses to dictate
transcriptional dynamics can be revealed by identifying these
cis-regulatory elements. Often these elements are conserved
across organisms during evolution, with few mutations and
without losing their functional value. Knowledge of these motifs
may help drive discovery of similar genes in other closely
related organisms. The availability of accurate models along
with useful search methods with enhanced sensitivity and
specificity will be the first step in being able to detect
putative regulatory elements in a genome-wide manner.

2 Background

The identification of regulatory sequences and their location in
a genome is an important step in understanding gene
expression. Genes that have similar expression are believed to
have similar regulatory logic. Such genes are governed by unique
combinatorial transcriptional codes known as cis-acting
regulatory modules (CRMs) or enhancers. CRMs are oligonucleotide
sequences that act together to activate or suppress a gene. In
the past, several studies have examined
the behavior of enhancers and their role in developmental
biology. The experiments performed to study the expression of
a gene at a developmental stage are often time consuming
and laborious. Biologists therefore often turn to computational
tools to scan the whole genome for better candidate
regulatory regions.

Several computational methods exist to predict regulatory
motif sequences. The motifs are overrepresented near the genes
they regulate. Using prior knowledge and position-based
probabilities, several tools were built to predict new
regulatory motifs. CisAnalyst, developed by Berman et al., has
been successfully applied to the fruit fly to find new clusters
using a purely computational approach. BioProspector uses a
Gibbs sampler to predict regulatory sequences. The main problems
with these tools are the presence of background noise and the
inability to differentiate between a true regulatory motif
and a false positive. In addition, variations in genomic
sequence across species further increase the noise. Although
computational methods have served well for finding
genes and even individual exons in genomic data, regulatory
element predictions have proven difficult.

Markstein et al. [3] developed a tool that lets biologists search
using prior knowledge of enhancers. The tool allows
biologists to input desired regular expressions over {A,T,G,C},
a gene name, width, and proximity constraints. However, the tool
is genome-specific and lacks some important
constraints, such as distance to the next binding site,
orientation and order of the motifs, low-affinity sequences,
variable-length regular expressions, and user-defined overlap
constraints.
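
As an illustration of the orientation constraint mentioned above,
the following is a minimal Python sketch, not the proposed tool's
implementation. It assumes exact motif matching; the function
names reverse_complement and find_both_strands are hypothetical.
A motif on the reverse strand is found by searching the forward
sequence for the motif's reverse complement.

```python
# Hypothetical helper names; exact matching only (an assumption).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_both_strands(motif, sequence):
    """Return sorted (start, strand) pairs for every exact occurrence.

    start is a 0-based position on the forward sequence. A "-" hit
    means the reverse complement of the motif was found there.
    """
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # a palindromic site is reported once, as "+"
        patterns.append(("-", rc))

    hits = []
    for strand, pattern in patterns:
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start, strand))
            # step by one base so overlapping occurrences are kept
            start = sequence.find(pattern, start + 1)
    return sorted(hits)
```

A search restricted to one orientation would simply filter the
returned pairs by strand.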

A brief survey of the computational identification of regulatory
DNA is given by Papatsenko and Levine [2]. The
paper explains the need for computational tools and provides a
comparison of available tools without going into the specific
details of the algorithms. It also emphasizes the
need for fast and efficient computational tools.

3 Project Proposal

The project aims to provide the following:
1. Restrictive search capabilities such as distance to the next
motif, orientation of the motif, low-affinity motifs, and order
of motif occurrence [5].
2. Limited integrated information such as nearby genes/exons,
gene expression data, and annotation details around the target
once it is located [5].
3. Interactive chain search, where a search for a target in an
organism can be linked to an intra-species or cross-species
search.
4. Scalability and efficiency.
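
The order and distance constraints in item 1 can be sketched as
follows. This is only an illustration under stated assumptions:
the hit positions for two motifs A and B are already known, and
the function name ordered_pairs and its parameters are
hypothetical rather than part of the proposed tool.

```python
from bisect import bisect_left, bisect_right


def ordered_pairs(hits_a, hits_b, len_a, max_gap):
    """Return (a, b) start pairs where B follows A within max_gap bases.

    hits_a, hits_b: 0-based start positions of motifs A and B.
    len_a: length of motif A, so A occupies [a, a + len_a).
    max_gap: largest allowed number of bases between the end of A
             and the start of B (0 means directly adjacent).
    """
    hits_b = sorted(hits_b)
    pairs = []
    for a in sorted(hits_a):
        end_a = a + len_a  # first position after motif A
        lo = bisect_left(hits_b, end_a)
        hi = bisect_right(hits_b, end_a + max_gap)
        pairs.extend((a, b) for b in hits_b[lo:hi])
    return pairs
```

Because the B positions are sorted once and searched by bisection,
each A hit costs a logarithmic lookup plus the matches it yields.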


More importantly, our proposed module will be highly flexible,
allowing constant integration of newer genomes while remaining
a powerful tool that allows the researcher to
search for complex gene clusters.

To that end we developed a software program that will locate
the regulatory region more precisely and with far more ease for
the researcher than programs that are currently available. More
importantly, control over the results of the program will be
given to the developmental biologist. The tool is well suited to
a lab environment.
3.1 Phase I Specific Aims
1. To develop a web-based module that allows the researcher to
search for cis-regulatory elements. The tool will take motifs
and search constraints as input, as shown in Figure 1, and will
display results as shown in Figures 2 and 3. The search feature
of the program will provide
        ◦the ability to enter 10 regular expressions using A, T,
        G, C and the letters given in the table below
        ◦an option to allow self-overlap
        ◦the capacity to input a name for each motif
        ◦a box to specify a width constraint
        ◦the flexibility to input logical combinations of the
        motifs entered in the first item, such as (2A and 2B) or
        (A or B or C)
        ◦the ability to disallow overlap across the motifs
        entered in the first item
        ◦a field to type the name of a gene within a specified
        distance of a cluster found using the above rules
        ◦a name under which to save the results; the name can be
        used in SuperCluster


       Letter       Bases
       B            C,G,T
       D            A,G,T
       H            A,C,T
       K            G,T
       M            A,C
       N            A,C,G,T
       R            A,G
       S            C,G
       V            A,C,G
       W            A,T
       Y            C,T
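
The search features listed above can be sketched in Python as
follows. This is a minimal illustration, not the proposed tool's
implementation: the function names are hypothetical, Y is taken
as C or T following the standard IUPAC convention, and only the
"and" form of a logical combination, such as (2A and 2B), is
covered.

```python
import re

# Letter-to-bases mapping from the table, written as regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "K": "[GT]",
    "M": "[AC]", "N": "[ACGT]", "R": "[AG]", "S": "[CG]",
    "V": "[ACG]", "W": "[AT]", "Y": "[CT]",
}


def motif_to_regex(motif):
    """Translate a motif written with the letters above into a regex."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def find_sites(motif, sequence, allow_self_overlap=True):
    """Return 0-based start positions of the motif in the sequence."""
    body = motif_to_regex(motif)
    # A lookahead matches without consuming bases, so occurrences
    # that overlap each other are all reported.
    pattern = f"(?=({body}))" if allow_self_overlap else body
    return [m.start() for m in re.finditer(pattern, sequence)]


def find_clusters(sites, required, width):
    """Return (start, end) windows that satisfy every count requirement.

    sites: {motif name: [start positions]}
    required: {motif name: minimum count}, e.g. {"A": 2, "B": 2}
    width: window size in bases; a window opens at each site start
           and counts the site starts in [start, start + width).
    """
    events = sorted(
        (pos, name) for name, positions in sites.items() for pos in positions
    )
    clusters = []
    for i, (start, _) in enumerate(events):
        counts = {}
        for pos, name in events[i:]:
            if pos >= start + width:
                break
            counts[name] = counts.get(name, 0) + 1
        if all(counts.get(n, 0) >= k for n, k in required.items()):
            clusters.append((start, start + width))
    return clusters
```

Note that the window test counts site start positions only; a
stricter variant would also require each site to end inside the
window.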


4 Summary: Significance of proposed work
The tool will also provide integration and maintenance features
that include
1. Updating to new versions of genomic sequences when they become
available from the public site.
2. Rerunning the program on old queries and automatically
notifying users via email of new results.
3. Integrating with Gene Ontology information and other useful
databases as advised by biologists.
4. Providing a workflow-like tool that takes a query run on one
organism and applies it to another organism with a single key.
5. Storage and maintenance of results.

5 Commercialization Strategy
After the Phase I launch, every person who visits the site will
be asked to fill in a profile, along with the purpose of the
visit, before gaining access to the program. Visitors will
also be asked to give feedback, which will be collected and
used as leads to prepare the BioRegulatory Appliance in Phase II.

6 KEY PERSONNEL

1) Hariharane Ramasamy is pursuing his PhD in Computer Science at
Illinois Institute of Technology, IL, and has more than 15
years of experience in developing applied computational tools
for biomedical engineering. A few relevant tools include
•A motif search system for genomic sequences that
displays the results graphically on the screen along with the
sequence annotation.
•A surveillance system to detect novel sequences.
•A program that calculates the digest of peptides for
user-input proteins and also performs differential combination
of post-translational modifications along with pI/Mw
calculations.
•Pattern-induced multiple alignment using properties of amino
acids.
•New Extended Genetic Algorithm for 3D lattice simulation of
protein folding using conflicting criteria.
•Simulation of human stand-sit movement using a 3-link
stick-figure model.

Sanjeev Mishra
Sanjeev Mishra is a seasoned professional with about 20 years
of industry experience. Half of his career has been spent in
startups in the fields of business activity management, business
intelligence, and mobile application and management platforms,
and the other half in research and development. He has been
awarded one US patent. Sanjeev is passionate about biking,
hiking, running, meditation and gardening. Sanjeev holds a
master's degree in Physics from DBS College Dehradun, India.


Tulasi Ravuri
Tulasi Ravuri is an experienced software engineering manager
with 23 years of experience at several Silicon Valley companies
such as Unisys, Novell, McAfee, DoCoMo Labs and others. Through
his broad career he has helped bring several products to market.
His most recent work is on a Life Sciences Regulatory Compliance
and Administration software suite used by universities such as
Stanford, Berkeley, and Harvard; pharma companies such as GSK;
hospitals such as Palo Alto Medical Foundation; and government.
He advises several software companies and is an advocate of open
source software. He has an MSCS from University of Louisiana and
a BS (Chemical Engg.) from Andhra University, India.


7 Consultants
In Phase I, the following help will be used to guide the program
to Phase II:
1. Two student interns for refining the search and gathering
data on the abilities of the program
2. A consultant for designing the user interface and graphics
display

8 Prior Support
The proposal has no prior or current support.

References cited
[1] Marc S. Halfon, Yonatan Grad, George M. Church, Alan M.
Michelson, Computation-Based Discovery of Related
Transcriptional Regulatory Modules and Motifs Using an
Experimentally Validated Combinatorial Model. Howard Hughes
Medical Institute and Department of
Medicine, Brigham and Women's Hospital, Linköping University,
Sweden.
[2] Dmitri Papatsenko, Michael Levine, Computational
identification of regulatory DNAs underlying animal development.
Nature Methods, Vol. 2 No. 7:529-534, 2005.
[3] Markstein, M., Markstein, P., Markstein, V., Levine, M.S.,
Genome-wide analysis of clustered Dorsal binding sites
identifies putative target genes in the Drosophila embryo.
Proc. Natl Acad. Sci. USA, Vol. 99:763-768, 2002.
[4] Benjamin P. Berman, Barret D. Pfeiffer, Todd R. Laverty,
Steven L. Salzberg, Gerald M. Rubin, Michael B. Eisen and Susan
E. Celniker, Computational identification of developmental
enhancers: conservation and function of transcription factor
binding-site clusters in Drosophila melanogaster and Drosophila
pseudoobscura. Genome Biology, Vol. 5:R81, 2004.
[5] Alan M. Michelson, Deciphering genetic regulatory codes: A
challenge for functional genomics. PNAS, Vol. 99 No. 2, 546-548,
2002.
[6] Matthias Harbers, Piero Carninci, Tag-based approaches for
transcriptome research and genome annotation. Nature Methods,
Vol. 2, No. 7, 499-502, 2005.
[7] Yueyi Liu, Liping Wei, Serafim Batzoglou, Douglas L. Brutlag,
Jun S. Liu and X. Shirley Liu, A suite of web-based programs to
search for transcriptional regulatory motifs. Nucleic Acids
Research, Vol. 32 Web Server Issue, 2004.
[8] Mike P. Liang, Olga G. Troyanskaya, Alain Laederach, Douglas
L. Brutlag, and Russ B. Altman, Computational Functional
Genomics. IEEE Signal Processing Magazine, 2004.

Budget


   Description                          Expense Amount for 6 months
   Salary for Principal Investigator    $36,000
   Salary for Software engineer         $30,000
   Salary for 2 student interns         $24,000
   Salary for Biology consultant        $24,000
   Hardware and Software cost (4)       $24,000
   Internet & Cloud hosting services    $12,000
   Miscellaneous expenses               $6,000
   Office rent & expenses               $15,000
   Travel                               $5,000
   Total Cost                           $176,000

Figure 1: Input web form to search the genomic sequence using
user-defined constraints
Figure 2: Results summary
Figure 3: Detailed results display for
Figure 4: Flow chart describing the flow of the algorithm
Figure 5: Diagram describing the Phase I flow

Appendix

 The ultimate goal is to build a self-contained BioRegulatory
appliance that supports automatic updates of the genomic
sequences, reruns old queries on the new sequences, and informs
users of new results, thereby saving an enormous amount of time
for the developmental biologists who depend on computers to
locate the target.

Phase II Plan

Specific Aims - To enhance the available module, Biocis, so that
it is user friendly and easy for a
researcher to navigate. Phase II will also aim to create a
workflow module that will allow easy storage and retrieval of
data from disparate sources and will integrate with useful
information. The Phase II features will include
1. An advanced regular expression search tool for genomic
sequences that uses prebuilt index positions for 4-base
sequences (AAAA, AAAG, ..., GCGC, ..., TTTT) to locate the
motifs.
2. An advanced multithreaded server tool to perform fast parallel
searches of the motif sequences.
3. Advanced caching in memory, on disk, and in a database to
avoid repeated searches of previous sequences.
4. An automated daemon process to fetch new releases, rerun the
saved searches, and inform scientists via email of new results.
5. A link to the Gene Ontology database, which provides gene
function information.
6. Cross-species ortholog results from existing public annotated
databases.
7. Simple statistical tools to look at motif occurrences across
the whole genome, starting from the interesting results.
8. Creation of the BioRegulatory software package and a plan for
designing a spec for the BioRegulatory Appliance.
9. A SuperCluster tool that will perform a search similar
to that in Aim I. The inputs A-J are the names of searches
performed in Aim I. The tool will help support the theory that
clusters of enhancers act together in regulating a gene. A
sample input form is shown in Figure 6.
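
The prebuilt index of 4-base positions in item 1 can be sketched
as follows. This is only one possible reading of the idea, under
the assumptions that motifs are matched exactly and are at least
4 bases long; the names build_index and find_exact are
hypothetical.

```python
from collections import defaultdict

K = 4  # length of the indexed words (AAAA, AAAG, ..., TTTT)


def build_index(sequence, k=K):
    """Map every k-base word to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_exact(motif, sequence, index, k=K):
    """Locate exact motif occurrences, seeded by the motif's first k bases.

    Only positions where the first k bases already match are checked
    against the full motif, instead of scanning the whole sequence.
    """
    if len(motif) < k:
        raise ValueError("motif must be at least k bases long")
    candidates = index.get(motif[:k], [])
    return [p for p in candidates if sequence.startswith(motif, p)]
```

The index is built once per genome release and reused across
queries, which is where the saving over a full scan would come
from.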



3.1.2 Phase III
Phase III will focus on
•Creating a sound computing infrastructure. The infrastructure
requires writing a separate server to perform the
search and caching. The search module will not be run
inside a web server, as in some of the existing tools, because
every request to perform a search on the web server means the
whole genome sequence is read into memory. Genomic sequences
vary from 1 megabyte to 200 megabytes in length. As the
number of users on the system grows, the system will run out of
memory, thus imposing a limit on the number of users. Using a
web server to preload the data during startup is not advisable.
Hence a separate server that performs the search for any generic
genome sequence is needed. The caching in Phase I is achieved at
two levels: memory and disk.
•Adding more features to the query, creating
continuity in search.

For example, once one performs a search, the result will display
genes along with their orthologs in other species. The search can
then be immediately repeated for the same enhancer in the species
that has the closest orthologs. Phase III will also look at
improving the performance of the BioRegulatory appliance.
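
The two-level caching (memory, then disk) mentioned above could
look roughly like the following Python sketch. It is an
illustration only: the class name TwoLevelCache, the JSON file
layout, and the simple oldest-first eviction are assumptions, not
details taken from the proposal.

```python
import hashlib
import json
import os


class TwoLevelCache:
    """Keep recent search results in memory and all results on disk."""

    def __init__(self, directory, max_memory_items=128):
        self.directory = directory
        self.max_memory_items = max_memory_items
        self.memory = {}  # dicts preserve insertion order
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the query text so any key maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".json")

    def _remember(self, key, value):
        if key not in self.memory and len(self.memory) >= self.max_memory_items:
            # Evict the entry that was inserted first.
            self.memory.pop(next(iter(self.memory)))
        self.memory[key] = value

    def get(self, key):
        """Return the cached value, or None if the query was never stored."""
        if key in self.memory:  # level 1: memory
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):  # level 2: disk
            with open(path) as fh:
                value = json.load(fh)
            self._remember(key, value)
            return value
        return None

    def put(self, key, value):
        """Store a JSON-serializable result under the given query key."""
        self._remember(key, value)
        with open(self._path(key), "w") as fh:
            json.dump(value, fh)
```

A search server would call get with the query text before running
a search and put with the result afterwards, so repeated queries
never rescan the genome.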
Figure 6: SuperCluster - Web form for user input

Controlling informative features for improved accuracy and faster predictions...
 
Classification of Microarray Gene Expression Data by Gene Combinations using ...
Classification of Microarray Gene Expression Data by Gene Combinations using ...Classification of Microarray Gene Expression Data by Gene Combinations using ...
Classification of Microarray Gene Expression Data by Gene Combinations using ...
 
Dynamic Radius Species Conserving Genetic Algorithm for Test Generation for S...
Dynamic Radius Species Conserving Genetic Algorithm for Test Generation for S...Dynamic Radius Species Conserving Genetic Algorithm for Test Generation for S...
Dynamic Radius Species Conserving Genetic Algorithm for Test Generation for S...
 
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...
Improving Prediction Accuracy Results by Using Q-Statistic Algorithm in High ...
 

Recently uploaded

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 

Recently uploaded (20)

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

Bio Scope

BioScope
Advanced Search Grammar Tool for Identification of Functional Noncoding Elements

Principal Investigator - Hariharane Ramasamy
Sanjeev Mishra
Tulasi Ravuri

Summary
The completion of several genomic sequences has motivated the development of a tool that can aid in locating and analyzing the transcription factor binding sites (TFBS) responsible for regulating gene transcription. TFBS are short sequences, 4-20 bases in length, that are often located near the genes they regulate. They occur in groups or modules, also called enhancers or cis-regulatory modules (CRM). A CRM contains one or more TFBS and interacts with a specific combination of transcription factors to regulate gene expression. Such sequences are often abundant near the genes they regulate. The goal of developmental biologists is to understand how these CRM are organized in a genome and how they regulate genes. Laboratory methods for locating CRM are often laborious and time consuming, so computational methods have become an invaluable tool. The success of computational methods depends on how well they can be used in a lab environment.

Several computational tools exist to locate motifs in a genomic sequence. These tools fall into two categories. Tools in the first category employ statistical and probabilistic methods using known motifs and the frequencies of codons in a genomic sequence. Although some motifs have been discovered using these tools, they often yield many false positives. Tools in the second category employ fundamental principles of the combinatorial logic underlying the occurrence of enhancers / cis-regulatory modules (CRM). It is believed that genes with similar temporal and spatial expression patterns are controlled by similar CRM.

Experimental biologists who are knowledgeable about CRM occurrences need an efficient tool to locate them by applying combinatorial knowledge such as counts of binding site occurrences within a specified width, logical combinations of one or more binding sites, orientation, and more. The tools should be efficient, scalable, and fast. The aim of this proposal is to build such tools.

1 Introduction
Several genomes, including the human and mouse genomes, have been sequenced close to completion. In this post-genomic era, it is imperative that researchers be equipped with novel methodologies that help them rapidly and accurately identify, annotate, and functionally characterize genes. Mining genomics and proteomics data with computational approaches therefore appears to be the most effective way to extract information from these resources in a short time frame.

The transcriptional regulation of a gene depends on the concerted action of multiple transcription factors that bind to cis-regulatory modules located in the vicinity of the gene. Cis-regulatory modules are regulatory elements that occur close to each other and control the spatial and temporal expression of genes. The regulatory language that the genome uses to dictate transcriptional dynamics can be revealed by identifying these cis-regulatory elements. These elements are often conserved across organisms, with few mutations and without loss of their functional value. Knowledge of these motifs may help drive the discovery of similar genes in other closely related organisms. The availability of accurate models, along with useful search methods with enhanced sensitivity and specificity, will be the first step toward detecting putative regulatory elements in a genome-wide manner.

2 Background
The identification of regulatory sequences and their location in a genome is an important step in understanding gene expression. Genes that have similar expression are believed to have similar regulatory logic. Such genes are governed by unique combinatorial transcriptional codes known as cis-acting regulatory modules (CRMs) or enhancers. CRMs are oligonucleotide sequences that act together to activate or suppress the gene. Several past studies have examined the behavior of enhancers and their role in developmental biology. The experiments performed to study the expression of a gene at a developmental stage are often time consuming and laborious. Biologists therefore often turn to computational tools to scan the whole genome for better candidate selection of these regulatory regions.

Several computational methods exist to predict regulatory motif sequences. The motifs are overrepresented near the genes whose transcription they regulate. Using earlier knowledge and position-based probabilities, several tools were built to predict new regulatory motifs. CisAnalyst, developed by Berman et al., has been successfully applied to the fruit fly to find new clusters using a purely computational approach. BioProspector uses a Gibbs sampler to predict regulatory sequences. The main problems with these tools are the presence of background noise and the inability to differentiate between a true regulatory motif and a false positive. In addition, variations in genomic sequence across species further increase the noise. Although computational methods have served well for finding genes and even individual exons in genomic data, regulatory element prediction has proven difficult.

Markstein [1] developed a tool that lets biologists search using previous knowledge of enhancers. The tool allows biologists to input the desired regular expressions using {A,T,G,C}, a gene name, a width, and proximity constraints. However, the tool is genome-specific and lacks some important constraints such as distance to the next binding site, orientation and order of the motifs, low affinity sequences, variable length regular expressions, and user-defined overlap constraints. A brief survey of the computational identification of regulatory DNA is given by Dmitri Papatsenko and Michael Levine. The paper elucidates the need for computational tools and compares the available tools without going into the specific details of the algorithms. The article does, however, emphasize the need for fast and efficient computational tools.

3 Project Proposal
The project aims to provide the following:
1. Restrictive search capabilities such as distance to the next motif, orientation of the motif, low affinity motifs, and order of motif occurrence [5].
2. Limited integrated information such as nearby genes/exons, gene expression data, and annotation details around the target once it is located [5].
3. Interactive chain search, where a search for a target in one organism can be linked to an intra-species or cross-species search.
4. Scalability and efficiency.

More importantly, our proposed module will be highly flexible, allowing constant integration of newer genomes while remaining a powerful tool that allows the researcher to search for complex gene clusters. To that end we developed a software program that will locate the regulatory region more precisely, and with far more ease for the researcher, than programs that are currently available. More importantly, control over the results of the program will be given to the developmental biologist. The tool is well suited to a lab environment.
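The constrained search described above (motifs written as regular expressions, a width limit, minimum site counts, and orientation) can be illustrated with a small sketch. This is not the proposal's implementation: the function names, the example motifs, and the rule that a cluster is a window in which every motif reaches its minimum count are assumptions made for illustration. The only outside fact used is the standard IUPAC nucleotide code table.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def reverse_complement(motif):
    """Reverse complement of an IUPAC motif (opposite orientation)."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def find_sites(sequence, motif, both_strands=True):
    """Return sorted (start, strand) hits; self-overlapping hits are kept."""
    motif = motif.upper()
    patterns = [("+", motif_to_regex(motif))]
    rc = reverse_complement(motif)
    if both_strands and rc != motif:  # skip palindromes to avoid double hits
        patterns.append(("-", motif_to_regex(rc)))
    hits = set()
    for strand, pattern in patterns:
        # A lookahead lets finditer report overlapping occurrences.
        for match in re.finditer("(?=" + pattern + ")", sequence):
            hits.add((match.start(), strand))
    return sorted(hits)


def find_clusters(sequence, min_counts, width):
    """Hypothetical cluster rule: a window of `width` bases, anchored at a
    site start, in which every motif meets its minimum count (logical AND)."""
    sites = {m: sorted({pos for pos, _ in find_sites(sequence, m)})
             for m in min_counts}
    starts = sorted({p for positions in sites.values() for p in positions})
    clusters = []
    for start in starts:
        end = min(start + width, len(sequence))
        if all(
            sum(1 for p in sites[m] if p >= start and p + len(m) <= end) >= need
            for m, need in min_counts.items()
        ):
            clusters.append((start, end))
    return clusters


# Made-up example sequence and motifs, for illustration only.
seq = "AAGGAATTCACGTGTTGGGATT"
print(find_clusters(seq, {"GGRA": 2, "CACGTG": 1}, width=20))  # [(2, 22)]
```

Anchoring candidate windows at site starts keeps the sketch short; a production tool would more likely slide over sorted site positions once and would add the distance, order, and overlap constraints listed in the proposal.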
3.1 Phase I Specific Aims
1. To develop a web-based module that allows the researcher to search for cis-regulatory elements. The tool will take motifs and search constraints as input, as shown in Figure 1, and will display results as shown in Figures 2 and 3. The search feature of the program will provide:
◦ the ability to enter 10 regular expressions using A, T, G, C and the letters given in the table below
◦ an option to allow self overlap
◦ the capacity to input a name for each motif
◦ a box to specify a width constraint
◦ the flexibility to input a logical combination of the motifs typed in the first item, such as (2A and 2B) or (A or B or C)
◦ the ability to disallow overlap across the motifs typed in the first item
◦ a field for the name of a gene that must lie within a specified distance once a cluster is found using the above rules
◦ a name under which to save the results; the name can be used in SuperCluster

Letter   Bases
B        C,G,T
D        A,G,T
H        A,C,T
K        G,T
M        A,C
N        A,C,G,T
R        A,G
S        C,G
V        A,C,G
W        A,T
Y        C,T

4 Summary: Significance of proposed work
The tool will also provide integration and maintenance, including:
1. Updating to new versions of genomic sequences when they become available from the public site.
2. Rerunning the program on old results and automatically notifying users of new results by email.
3. Integration with Gene Ontology information and other useful databases as advised by biologists.
4. A workflow-like tool that takes a query run on one organism and applies it to another organism with a single key.
5. Storage and maintenance of results.

5 Commercialization Strategy
After the Phase I launch, every person who visits the site will be asked to fill in a profile, including the purpose of the visit, before being given access to the program. Visitors will also be asked for feedback, which will be collected and used as leads to prepare the BioRegulatory Appliance in Phase II.

6 KEY PERSONNEL
1) Hariharane Ramasamy is pursuing his PhD in Computer Science at Illinois Institute of Technology, IL, and has more than 15 years of experience in developing applied computational tools for biomedical engineering. A few relevant tools include:
• A motif search system for genomic sequences that displays the results graphically on the screen along with the sequence annotation.
• A surveillance system to detect novel sequences.
• A program that calculates the digest of peptides for user-input proteins and also performs differential combination of post-translational modifications along with pI/Mw calculations.
• Pattern-induced multiple alignment using properties of amino acids.
• A new extended genetic algorithm for 3D lattice simulation of protein folding using conflicting criteria.
• Simulation of human stand-sit movement using a 3-link stick figure model.

Sanjeev Mishra
Sanjeev Mishra is a seasoned professional with about 20 years of industry experience. Half of his career has been spent in startups in the fields of business activity management, business intelligence, and mobile application and management platforms, and the other half in research and development. He has been awarded one US patent. Sanjeev is passionate about biking, hiking, running, meditation and gardening. Sanjeev holds a master's degree in Physics from DBS College Dehradun, India.
Tulasi Ravuri
Tulasi Ravuri is an experienced software engineering manager with 23 years of experience at several Silicon Valley companies such as Unisys, Novell, McAfee, DoCoMo Labs and others. Through his broad career he has helped bring several products to market. His most recent work is on a Life Sciences Regulatory Compliance and Administration software suite used by universities such as Stanford, Berkeley, and Harvard; pharma companies such as GSK; hospitals such as Palo Alto Medical Foundation; and government. He advises several software companies and is an advocate of open source software. He has an MSCS from University of Louisiana and a BS (Chemical Engg.) from Andhra University, India.

7 Consultants
In Phase I, the following help will be used to guide the program to Phase II:
1. Two student interns for refining the search and gathering data on the abilities of the program.
2. A consultant for designing the user interface and graphics display.

8 Prior Support
The proposal has no prior or current support.

References cited
[1] Marc S. Halfon, Yonatan Grad, George M. Church, Alan M. Michelson, Computation-Based Discovery of Related Transcriptional Regulatory Modules and Motifs Using an Experimentally Validated Combinatorial Model. Howard Hughes Medical Institute and Department of Medicine, Brigham and Women's Hospital, Linköping University, Sweden.
[2] Dmitri Papatsenko, Michael Levine, Computational identification of regulatory DNAs underlying animal development. Nature Methods, Vol. 2 No. 7:529-534, 2005.
[3] Markstein, M., Markstein, P., Markstein, V., Levine, M.S., Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl Acad. Sci. USA, Vol. 99:763-768, 2002.
[4] Benjamin P. Berman, Barret D. Pfeiffer, Todd R. Laverty, Steven L. Salzberg, Gerald M. Rubin, Michael B. Eisen and Susan E. Celniker, Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biology, Vol. 5:R81, 2004.
[5] Alan M. Michelson, Deciphering genetic regulatory codes: A challenge for functional genomics. PNAS, Vol. 99 No. 2, 546-548, 2002.
[6] Matthias Harbers, Piero Carninci, Tag-based approaches for transcriptome research and genome annotation. Nature Methods, Vol. 2, No. 7, 499-502, 2005.
[7] Yueyi Liu, Liping Wei, Serafim Batzoglou, Douglas L. Brutlag, Jun S. Liu and X. Shirley Liu, A suite of web-based programs to search for transcriptional regulatory motifs. Nucleic Acids Research, Vol. 32 Web Server Issue, 2004.
[8] Mike P. Liang, Olga G. Troyanskaya, Alain Laederach, Douglas L. Brutlag, and Russ B. Altman, Computational Functional Genomics. IEEE Signal Processing Magazine, 2004.

Budget
Description                              Expense amount for 6 months
Salary for Principal Investigator        $36,000
Salary for Software engineer             $30,000
Salary for 2 student interns             $24,000
Salary for Biology consultant            $24,000
Hardware and Software cost (4)           $24,000
Internet & Cloud hosting services        $12,000
Miscellaneous expenses                   $6,000
Office rent & expenses                   $15,000
Travel                                   $5,000
Total Cost                               $176,000
Figure 1: Input web form to search the genomic sequence using user defined constraints
Figure 2: Results summary
Figure 3: Detailed results display for
Figure 4: Flow chart describing the flow of the algorithm
Figure 5: Diagram describing the Phase I flow
Appendix
The ultimate goal is to build a self-contained BioRegulatory appliance that supports automatic updates of the genomic sequences, reruns old queries on the new sequences, and informs users of new results, thereby saving an enormous amount of time for developmental biologists who depend on computers to locate the target.

Phase II Plan
Specific Aims - To enhance the available module, Biocis, so that it is user friendly and easy for a researcher to navigate. Phase II will also aim to create a workflow module that allows easy storage and retrieval of data from disparate sources and integrates with useful information. The Phase II features will include:
1. An advanced regular expression search tool for genomic sequences that uses prebuilt index positions for 4-base sequences (AAAA, AAAG, ..., GCGC, ..., TTTT) to locate the motifs.
2. An advanced multithreaded server tool to perform fast parallel searches of the motif sequences.
3. Advanced caching in memory, on disk, and in a database to avoid repeating previous searches.
4. An automated daemon process that fetches new releases, reruns the saved searches, and informs scientists of new results by email.
5. A link to the Gene Ontology database, which provides gene function information.
6. Cross-species ortholog results from existing public annotated databases.
7. Simple statistical tools to look at motif occurrences across the whole genome, starting from the interesting results.
8. Creation of the BioRegulatory software package and a plan for designing a spec for the BioRegulatory Appliance.
9. A SuperCluster tool that performs a search similar to the one in Aim I.
10. The inputs A-J are the names of searches performed in Aim I. The tool will help support the theory that clusters of enhancers act together in regulating the gene. A sample input form is shown in Figure 6.

3.1.2 Phase III
Phase III will:
• Create a sound computing infrastructure. The infrastructure requires writing a separate server to provide the search and caching capabilities. The search module will not be run via a web server, as it is in some of the existing tools. When the search runs inside the web server, every search request means that the whole genome sequence is read into memory. Genomic sequences range from 1 megabyte to 200 megabytes in size. If the number of users on the system grows, the system will run out of memory, which imposes a limit on the number of users. Using a web server to preload the data during startup is not advisable. Hence a separate server that performs the search for any generic genome sequence is needed. The caching in Phase I is achieved at two levels: memory and disk.
• Concentrate on adding more features to the query, creating continuity in search. For example, once a search has been performed, the result will display genes along with their orthologs in other species. The search can then be performed immediately for the same enhancer in the species that has the closest orthologs. Phase III will also look at improving the performance of the BioRegulatory appliance.
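The prebuilt index of 4-base positions mentioned in the Phase II plan could look roughly like the sketch below. It is an assumption about how such an index might be used, not the proposal's design: the names `build_index` and `find_motif` are hypothetical, and the sketch handles only exact A/C/G/T motifs of at least four bases, using the first four bases as a seed and verifying the rest at each candidate position.

```python
from collections import defaultdict

K = 4  # seed length, matching the 4-base index described in the plan


def build_index(sequence, k=K):
    """Map every k-base substring to the sorted list of its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_motif(sequence, index, motif, k=K):
    """Locate an exact motif by looking up its first k bases in the index
    and checking the full motif only at those candidate positions."""
    if len(motif) < k:
        raise ValueError("motif must be at least %d bases long" % k)
    return [pos for pos in index.get(motif[:k], [])
            if sequence.startswith(motif, pos)]


# Made-up sequence, for illustration only.
genome = "ACGTACGTGACGTT"
idx = build_index(genome)          # built once, reused for every query
print(find_motif(genome, idx, "ACGTG"))  # [4]
```

Because the index is built once per genome and shared across queries, a long-running search server of the kind argued for above can answer repeated searches without rescanning the sequence; degenerate motifs would need the seed lookup expanded over every 4-base string the motif prefix can match.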
Figure 6: SuperCluster - Web form for user input