
ArthropodEST pipeline poster



Published in: Education, Technology

ArthropodEST: K-State Bioinformatics EST analysis pipeline *

Sanjay Chellapilla (1), Yoonseong Park (2), Doina Caragea (3) and Susan J. Brown (1)

(1) Bioinformatics Center, Division of Biology
(2) Department of Entomology
(3) Department of Computing and Information Sciences
Kansas State University, Manhattan KS 66506

ABSTRACT

Expressed Sequence Tags (ESTs), produced by single-pass end-sequencing of cDNA clones, generate large datasets that are instrumental in gene discovery and gene sequence determination. Although several EST data analysis pipelines are available on the WWW (e.g. ESTpass, EGassembler, ESTexplorer), the WWW-accessible K-State Bioinformatics EST analysis pipeline 'ArthropodEST' goes further than these existing pipelines by providing more options and analyses, along with a user-friendly interface. The pipeline was developed using freely available bioinformatics and system software (academic or F/OSS licenses).

Available options in the pipeline include cleaning the input sequences and screening them for vectors and contaminants, masking repetitive sequences using repeat databases, clustering and assembly into contigs, computing ORFs (Open Reading Frames) and/or signal-peptide predictions, and assigning functional annotations to the contigs and singletons.

The pipeline automatically emails the user a result notification containing a unique URL from which the results can be downloaded. An automatically generated summary report of the analyses is included in the downloadable results. The pipeline is accessible at [URL not preserved].

Acknowledgements: Supported by KSU-TE-AGC (SC), KSU Bioinformatics Center (DC, SC) and K-INBRE (DC, SC).
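As a rough illustration of the ORF step mentioned in the abstract, the following minimal sketch scans the three forward reading frames of a sequence for the longest stretch running from an ATG to the first in-frame stop codon. It is not the pipeline's own implementation (the poster lists EMBOSS among its components); the function name `longest_orf` and the `min_codons` parameter are assumptions introduced here for illustration, and the reverse strand is deliberately ignored to keep the sketch short.

```python
# Hypothetical sketch: forward-strand ORF search, not the ArthropodEST code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq, min_codons=1):
    """Return (start, end, frame) of the longest forward-strand ORF, or None.

    An ORF here starts at ATG and ends at the first in-frame stop codon
    (the stop codon is included). Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                coding_codons = (end - start) // 3 - 1  # exclude the stop codon
                is_longer = best is None or (end - start) > (best[1] - best[0])
                if coding_codons >= min_codons and is_longer:
                    best = (start, end, frame)
                start = None  # look for the next ORF in this frame
    return best
```

For example, calling `longest_orf("CCATGAAATTTTAGGG")` locates the ORF `ATGAAATTTTAG` in the third reading frame. A production tool would also handle the reverse complement, ambiguous bases and ORFs that run off the end of a partial EST.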
KANSAS STATE UNIVERSITY · KSU BIOINFORMATICS CENTER · KSU ARTHROPOD GENOMICS CENTER · K-INBRE

WORKFLOW

client-side (User)
- ArthropodEST homepage
- User-input: project name, e-mail address, input files and options/parameters for analyses

server-side CGI script
- Process user inputs, display project-receipt confirmation and summary, send automatic confirmation email, invoke pipeline shell script

server-side Pipeline shell-script
- Input sequence cleaning
- Vector/contaminant screening
- Repeat-masking with standard RepBase libraries
- Assembly, with optional prior clustering, into contigs and singletons
- Further analyses: functional annotations and/or signal-peptide predictions

client-side (User)
- User downloads results and report from unique URL automatically sent by email

COMPONENTS OF THE PIPELINE

(a) System software: GNU/Linux Ubuntu 2.6.24-23-server, bash 3.2.39, Apache 2.2.8 with mod_perl/2.0.3, Perl 5.8.8 with Perl modules CGI 3.29, Mail::Mailer 1.74 and File::Temp 0.18, MySQL 5.0, and Postfix 2.5.4 Mail Transport Agent (MTA).

(b) Bioinformatics software:
- TGICL software suite
- Vector databases: NCBI UniVec, EMBL EmVec
- RepeatMasker and associated RepBase libraries (requires either cross_match or wu-blastall)
- CAP3 sequence-assembly program
- NCBI BLAST suite and/or wu-blastall
- blast2GO pipeline version B2G4PIPE
- signalp and EMBOSS

(c) In-house developed software: WWW-interface HTML/CSS, server-side CGI, Perl, bash shell and awk scripts
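The workflow above can be sketched, under stated assumptions, as two small helpers: one that turns the user's selected options into an ordered list of stages, and one that builds a hard-to-guess download URL for the results. The actual pipeline is written in Perl, bash and awk; this Python sketch only illustrates the control flow, and the option keys, function names and base URL are hypothetical.

```python
import secrets

# Stage order follows the workflow described on the poster.
# The option keys ("clean", "screen", ...) are hypothetical names.
STAGES = [
    ("clean", "input sequence cleaning"),
    ("screen", "vector/contaminant screening"),
    ("mask", "repeat masking"),
    ("cluster", "clustering"),
    ("assemble", "assembly into contigs and singletons"),
    ("annotate", "functional annotation"),
    ("signalp", "signal-peptide prediction"),
]


def plan_stages(options):
    """Return the labels of the enabled stages, in workflow order."""
    known = {key for key, _ in STAGES}
    unknown = set(options) - known
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return [label for key, label in STAGES if options.get(key)]


def result_url(base_url, project):
    """Build a per-project download URL with a random, URL-safe token."""
    token = secrets.token_urlsafe(16)
    return f"{base_url.rstrip('/')}/{project}-{token}"
```

A call such as `plan_stages({"clean": True, "mask": False, "assemble": True})` yields only the cleaning and assembly stages, in that order, and `result_url("https://example.org/results", "myproject")` returns a different URL on each call. Keeping the stage order in one table means the sequence of steps stays fixed no matter in which order the form fields arrive.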