SUBMITTING DNA
SEQUENCES TO THE
DATABASES, SEQUIN
P:1 U:2
Vedanti S. Gharat
Roll No. :- 09
M.Sc. Biotech part 1
• DNA sequence records from the public databases
(DDBJ/EMBL/GenBank) are essential components of
computational analysis in molecular biology.
• The sequence records are also reagents for improved curated
resources like LocusLink or many of the protein databases.
• Accurate and informative biological annotation of sequence
records is critical in determining the function of a disease gene
by sequence similarity search.
• The names or functions of the encoded protein products, the
name of the genetic locus, and the link to the original
publication of that sequence make a sequence record of
immediate value to the scientist who retrieves it as the result of a
BLAST or Entrez search.
• The submission process is governed by an international,
collaborative agreement. Sequences submitted to any one of the
three databases participating in this collaboration will appear in
the other two databases within a few days of their release to the
public.
WHY, WHERE, AND WHAT TO
SUBMIT?
• One should submit to whichever of the three public databases is most
convenient.
• This may be the database that is closest geographically, it may be the
repository one has always used in the past, or it may simply be the
place one’s submission is likely to receive the best attention.
• Under normal circumstances, an accession number will be returned
within one workday, and a finished record should be available
within 5–10 working days, depending on the information provided
by the submitter.
• Submitting data to the database is not the end of one’s scientific
obligation. Updating the record as more information becomes
available will ensure that the information within the record will
survive time and scientific rigor.
• Submissions of sequences are done electronically: via the World
Wide Web, by electronic mail, or on a computer disk sent via regular
postal mail.
DNA/RNA
• The submission process is quite simple, but care must be taken
to provide information that is accurate and as biologically sound
as possible, to ensure maximal usability by the scientific
community.
1] Nature of the Sequence.
Is it of genomic or mRNA origin?
2] Is the Sequence Synthetic, But Not Artificial?
There is a special division in the nucleotide databases for synthetic
molecules, sequences put together experimentally that do not
occur naturally in the environment. The DNA sequence databases
do not accept computer-generated sequences.
3] How Accurate is the Sequence?
The assumption that the submitted sequence is as accurate as
possible usually means at least two-pass coverage on the whole
submitted sequence. Equally important is the verification of the
final submitted sequence.
• Organism
All DNA sequence records must show the organism from which the
sequence was derived. Many inferences are made from the phylogenetic
position of the records present in the databases.
• Citation
Having a citation in the submission being prepared is of great
importance, even if it consists of just a temporary list of authors and a
working title. Updating these citations at publication time is also
important to the value of the record.
• Coding Sequence(s)
A submission of nucleotide also means the inclusion of the protein
sequences it encodes. This is important for two reasons:
• Protein databases (e.g., SWISS-PROT and PIR) are almost entirely
populated by protein sequences present in DNA sequence database
records.
• The inclusion of the protein sequence serves as an important, if not
essential, validation step in the submission process.
The coding sequence features, or CDS, are the links between the DNA
or RNA and the protein sequences, and their correct positioning is
central in the validation, as is the correct genetic code.
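Why the genetic code matters for this validation can be shown with a small sketch. It translates the same coding sequence under two NCBI translation tables (table 1, standard; table 2, vertebrate mitochondrial). The function name and the toy sequence are illustrative assumptions, not part of any database tool.

```python
# Illustrative sketch: the same CDS gives different proteins under different
# genetic codes. Codons are enumerated in NCBI order (T, C, A, G).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Amino-acid strings for NCBI translation table 1 (standard) and
# table 2 (vertebrate mitochondrial); '*' marks a stop codon.
STANDARD = dict(zip(
    CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))
VERT_MITO = dict(zip(
    CODONS, "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"))


def translate(cds, table):
    """Translate a coding sequence codon by codon; unknown codons become 'X'."""
    seq = cds.upper().replace("U", "T")
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


toy_cds = "ATGTGAATATAA"  # hypothetical sequence: ATG TGA ATA TAA
print(translate(toy_cds, STANDARD))   # TGA read as stop, ATA as Ile
print(translate(toy_cds, VERT_MITO))  # TGA read as Trp, ATA as Met
```

With the wrong table selected, a valid mitochondrial CDS would appear to contain an internal stop codon, which is why the genetic code is part of the CDS annotation.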
POPULATION, PHYLOGENETIC, AND MUTATION
STUDIES
• The nucleotide databases are now accepting population,
phylogenetic, and mutational studies as submitted sequence sets,
and, although this information is not adequately represented in
the flatfile records, it is appearing in the various databases.
• This allows the submission of a group of related sequences
together, with entry of shared information required only once.
• Sequin also allows the user to include the alignment generated
with a favorite alignment tool and to submit this information
with the DNA sequence.
PROTEIN-ONLY SUBMISSIONS
• In most cases, protein sequences come with a DNA sequence.
There are some exceptions—people do sequence proteins
directly—and such sequences must be submitted without a
corresponding DNA sequence. SWISS-PROT presently is the
best venue for these submissions.
HOW TO SUBMIT ON THE WORLD WIDE WEB
• The World Wide Web is now the most common interface used to submit sequences to the
three databases. The Web-based submission systems include Sakura at DDBJ, WebIn at
EBI, and BankIt at the NCBI.
• Some 75–80% of individual submissions to NCBI are done via the Web.
• On entering a BankIt submission, the user is asked about the length of the nucleotide
sequence to be submitted. The next BankIt form is straightforward: it asks about the contact
person, the citations, the organism, the location, some map information, and the nucleotide
sequence itself.
• At the end of the form, there is a BankIt button, which calls up the next form. At this point,
some validation is made, and, if any necessary fields were not filled in, the form is
presented again. If all is well, the next form asks how many features are to be added and
prompts the user to indicate their types.
• If no features were added, BankIt will issue a warning and ask for confirmation that not
even one CDS is to be added to the submission. The user can say no (zero new CDSs) or
take the opportunity to add one or more CDS.
• To begin to save a record, press the BankIt button again. The view that now appears must be
approved before the submission is completed; that is, more changes may be made, or other
features may be added. To finish, press BankIt one more time.
• The final screen will then appear; after the user toggles the Update/Finished set of buttons
and hits BankIt one last time, the submission will go to NCBI for processing. A copy of the
just-finished submission should arrive promptly via E-mail.
HOW TO SUBMIT WITH SEQUIN
• Sequin is designed for preparing new sequence records and updating
existing records for submission to DDBJ, EMBL, and GenBank.
• It is a tool that works on most computer platforms and is suitable for a
wide range of sequence lengths and complexities, including traditional
(gene-sized) nucleotide sequences, segmented entries, long (genome-
sized) sequences with many annotated features, and sets of related
sequences (i.e., population, phylogenetic, or mutation studies of a
particular gene, region, or viral genome).
• Sequin is more practical for more complex cases. Certain types of
submission (e.g., segmented sets) cannot be made via the Web unless
explicit instructions to the database staff are inserted.
• For sets of related or similar sequences (e.g., population or phylogenetic
studies), Sequin accepts information from the submitter on how the
multiple sequences are aligned to each other.
• Finally, Sequin can be used to edit and resubmit a record that already
exists in GenBank, either by extending (or replacing) the sequence or by
annotating additional features or alignments.
SUBMISSION MADE EASY
• Sequin has a number of attributes that greatly simplify the process of
building and annotating a record.
• The most profound aspect is automatic calculation of the intervals on
a CDS feature given only the nucleotide sequence, the sequence of
the protein product, and the genetic code. This ‘‘Suggest Intervals’’
process takes consensus splice sites into account in its calculations.
• Another important attribute is the ability to enter relevant annotation
in a simple format in the definition line of the sequence data file.
• Sequin recognizes and extracts this information when reading the
sequences and then puts it in the proper places in the record. This is
especially important for population and phylogenetic studies, where
the source modifiers are necessary to distinguish one component
from another.
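The definition-line annotation described above can be sketched as follows. The sketch assumes the bracketed [modifier=value] convention commonly shown for FASTA definition lines prepared for Sequin; the sequence identifier, modifier values, and function name are hypothetical, and this is not Sequin's own parser.

```python
import re

# Matches one bracketed modifier such as [organism=Homo sapiens].
MODIFIER = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_defline(defline):
    """Split a FASTA definition line into (sequence id, modifiers, title)."""
    text = defline.lstrip(">").strip()
    seq_id, _, rest = text.partition(" ")
    modifiers = {key.strip().lower(): value.strip()
                 for key, value in MODIFIER.findall(rest)}
    # Whatever remains after removing the bracketed parts is treated as the title.
    title = " ".join(MODIFIER.sub("", rest).split())
    return seq_id, modifiers, title


# Hypothetical definition line for one member of a population study.
line = ">ABC-1 [organism=Saccharomyces cerevisiae] [strain=S288C] Alpha-tubulin gene, partial cds"
seq_id, modifiers, title = parse_defline(line)
print(seq_id)
print(modifiers)
print(title)
```

Keeping the modifiers in the sequence file means each member of a set carries its own strain or isolate label, so the shared information is entered once and only the distinguishing values vary.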
STARTING A NEW SUBMISSION
• Sequin begins with a window that allows the user to start a new
submission or load a file containing a saved record. If Sequin has been
configured to be network aware, this window also allows the
downloading of existing database records that are to be updated.
• A new submission is made by filling out several forms. The forms use
folder tabs to subdivide a window into several pages, allowing all the
requested data to be entered without the need for a huge computer
screen. These entry forms have buttons for Previous Page and Next
Page. When the user arrives at the last page on a form, the Next Page
button changes to Next Form.
• The Submitting Authors form requests a tentative title, information
on the contact person, the authors of the sequence, and their
institutional affiliations.
• The Sequence Format form asks for the type of submission (single
sequence, segmented sequence, or population, phylogenetic, or
mutation study).
• The Organism and Sequences form asks for the biological data. On
the Organism page, as the user starts to type the scientific name, the list
of frequently used organisms scrolls automatically.
Entering a Single Nucleotide Sequence
and its Protein Products
• For a single sequence or a segmented sequence, the rest of the
Organism and Sequences form contains Nucleotide and Protein folder
tabs.
• The Nucleotide page has controls for setting the molecule type (e.g.,
genomic DNA or mRNA) and topology (usually linear, occasionally
circular) and for indicating whether the sequence is incomplete at the
5′ or 3′ ends.
• For each protein sequence, Suggest Intervals is run against the
nucleotide sequence, and a CDS feature is made with the resulting
intervals. A Gene feature is generated, with a single interval spanning
the CDS intervals. A protein product sequence is made, with a Protein
feature to give it a name.
• In most cases, it is much easier to enter the protein sequence and let
Sequin construct the record automatically than to manually add a CDS
feature later.
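The idea of deriving CDS intervals from a nucleotide sequence, a protein sequence, and a genetic code can be illustrated with a deliberately simplified sketch. It assumes an unspliced coding region on the forward strand and the standard genetic code; Sequin's actual Suggest Intervals also handles introns and splice sites, which this sketch does not attempt. All names and the toy sequence are hypothetical.

```python
# Simplified illustration only: locate a single, unspliced interval whose
# translation matches the supplied protein (forward strand, standard code).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(
    CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def translate(seq):
    """Translate codon by codon; unknown codons become 'X'."""
    return "".join(STANDARD.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def suggest_interval(nucleotides, protein):
    """Return (start, end) as 1-based inclusive coordinates, or None if no match."""
    seq = nucleotides.upper().replace("U", "T")
    target = protein.upper()
    for frame in range(3):
        translated = translate(seq[frame:])
        pos = translated.find(target)
        if pos == -1:
            continue
        start = frame + 3 * pos + 1
        end = start + 3 * len(target) - 1
        # Extend over the stop codon when one directly follows the match.
        if translated[pos + len(target):pos + len(target) + 1] == "*":
            end += 3
        return start, end
    return None


print(suggest_interval("GGATGGCTAAATGACC", "MAK"))
```

A real coding region split across exons would yield several intervals rather than one, which is the case the tool's use of consensus splice sites addresses.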
Entering an Aligned Set of Sequences
• A growing class of submissions involves sets of related sequences. A large number of HIV
sequences come in as population studies. A common phylogenetic study involves ribulose-
1,5-bisphosphate carboxylase (RUBISCO).
• The same submission information form is used to enter author and contact information.
• In the Sequence Format form, the user chooses the desired type of submission. Population
studies are generally from different individuals in the same (crossbreeding) species.
Phylogenetic studies are from different species.
• Multiple sequence studies can be submitted in FASTA format, in which case Sequin should
later be called on to calculate an alignment.
• The Organism and Sequences form is slightly different for sets of sequences. The
Organism page for phylogenetic studies allows the setting of a default genetic code only
for organisms not in Sequin’s local list of popular species. Instead of a Protein page, there
is now an Annotation page.
• As a final step, Sequin displays an editor that allows all organism and source modifiers on
each sequence to be edited. On confirmation of the modifiers, Sequin finishes assembling
the record into the proper structure.
Viewing the Sequence Record
• Sequin provides a number of different views of a sequence record.
The traditional flatfile can be presented in FASTA, GenBank or
EMBL format.
• There is a more detailed view that shows the features on the actual
sequence. For records containing alignments one can request either
a graphical overview showing insertions, deletions, and mismatches
or a detailed view showing the alignment of sequence letters.
• Clicking on a feature, a sequence, or the graphical representation of
an alignment between sequences will highlight that object.
Validation
• To ensure the quality of data being submitted, Sequin has a built-in
validator that searches for missing organism information, incorrect
coding region lengths, internal stop codons in coding regions,
mismatched amino acids, and nonconsensus splice sites.
• The validator also checks for inconsistent use of ‘‘partial’’
indications, especially among coding regions, the protein product,
and the protein feature on the product.
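A few of the checks listed above can be sketched in a short function. This is a hypothetical illustration of the kinds of tests involved (missing organism, coding region length, internal stop codons, mismatched amino acids) under the standard genetic code; it is not Sequin's validator, and the problem labels are invented for the example.

```python
# Hypothetical sketch of some validator-style checks on one CDS.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(
    CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def translate(seq):
    """Translate codon by codon; unknown codons become 'X'."""
    return "".join(STANDARD.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def validate_cds(organism, cds, protein):
    """Return a list of problem labels; an empty list means no problem was found."""
    problems = []
    if not organism.strip():
        problems.append("missing-organism")
    if len(cds) % 3 != 0:
        problems.append("length-not-multiple-of-3")
    translated = translate(cds.upper().replace("U", "T"))
    # A single terminal stop codon is expected; any other '*' is internal.
    body = translated[:-1] if translated.endswith("*") else translated
    if "*" in body:
        problems.append("internal-stop")
    if body != protein.upper():
        problems.append("protein-mismatch")
    return problems


print(validate_cds("Homo sapiens", "ATGGCTAAATGA", "MAK"))
print(validate_cds("", "ATGTAAGCTTGA", "MAK"))
```

Returning every problem found, rather than stopping at the first, mirrors how a submitter would want to see all issues in a record before correcting it.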
SENDING THE SUBMISSION
• A finished submission can be saved to disk and E-mailed to one of the
databases. It is also a good practice to save frequently throughout the
Sequin session, to make sure nothing is inadvertently lost.
CONCLUDING REMARKS
• The act of depositing records into a database and seeing these records
made public has always been an exercise of pride on the part of
submitters, a segment of the scientific activity from their laboratory
that they present to the scientific community. In this process,
submitters always hope to provide information in the most complete
and useful fashion, allowing maximum use of their data by the
scientific community.
• The databases strongly encourage the submission of sequence data
and of all appropriate updates. Many tools are available to facilitate
this task, and together the databases support Sequin as the tool to
use for new submissions, in addition to their respective Web
submission tools. Submitting data to the databases has now become
a manageable (and sometimes enjoyable) task, with scientists no
longer having good excuses for neglecting it.

More Related Content

What's hot

Phylogenetic analysis
Phylogenetic analysis Phylogenetic analysis
Phylogenetic analysis Nitin Naik
 
Bio153 microbial genomics 2012
Bio153 microbial genomics 2012Bio153 microbial genomics 2012
Bio153 microbial genomics 2012Mark Pallen
 
Transfection methods (DNA to host cell)
Transfection methods (DNA to host cell) Transfection methods (DNA to host cell)
Transfection methods (DNA to host cell) Erin Davis
 
BLAST AND FASTA.pptx
BLAST AND FASTA.pptxBLAST AND FASTA.pptx
BLAST AND FASTA.pptxPiyushBehgal1
 
Introduction to sequence alignment partii
Introduction to sequence alignment partiiIntroduction to sequence alignment partii
Introduction to sequence alignment partiiSumatiHajela
 
Phage stratagies
Phage stratagiesPhage stratagies
Phage stratagiesAmith Reddy
 
nucleic acid hybridization
nucleic acid hybridizationnucleic acid hybridization
nucleic acid hybridizationPragati Randive
 
Lectut btn-202-ppt-l23. labeling techniques for nucleic acids
Lectut btn-202-ppt-l23. labeling techniques for nucleic acidsLectut btn-202-ppt-l23. labeling techniques for nucleic acids
Lectut btn-202-ppt-l23. labeling techniques for nucleic acidsRishabh Jain
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshitaHarshita Bhawsar
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-naveed ul mushtaq
 
shotgun sequncing
 shotgun sequncing shotgun sequncing
shotgun sequncingSAIFALI444
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignmentRamya S
 
cDNA Library Construction
cDNA Library ConstructioncDNA Library Construction
cDNA Library ConstructionStella Evelyn
 

What's hot (20)

Phylogenetic analysis
Phylogenetic analysis Phylogenetic analysis
Phylogenetic analysis
 
Bio153 microbial genomics 2012
Bio153 microbial genomics 2012Bio153 microbial genomics 2012
Bio153 microbial genomics 2012
 
Transfection methods (DNA to host cell)
Transfection methods (DNA to host cell) Transfection methods (DNA to host cell)
Transfection methods (DNA to host cell)
 
BLAST AND FASTA.pptx
BLAST AND FASTA.pptxBLAST AND FASTA.pptx
BLAST AND FASTA.pptx
 
Introduction to sequence alignment partii
Introduction to sequence alignment partiiIntroduction to sequence alignment partii
Introduction to sequence alignment partii
 
Express sequence tags
Express sequence tagsExpress sequence tags
Express sequence tags
 
Sequence assembly
Sequence assemblySequence assembly
Sequence assembly
 
YEAST TWO HYBRID SYSTEM
 YEAST TWO HYBRID SYSTEM YEAST TWO HYBRID SYSTEM
YEAST TWO HYBRID SYSTEM
 
Phage stratagies
Phage stratagiesPhage stratagies
Phage stratagies
 
nucleic acid hybridization
nucleic acid hybridizationnucleic acid hybridization
nucleic acid hybridization
 
M13 phage
M13 phageM13 phage
M13 phage
 
Lectut btn-202-ppt-l23. labeling techniques for nucleic acids
Lectut btn-202-ppt-l23. labeling techniques for nucleic acidsLectut btn-202-ppt-l23. labeling techniques for nucleic acids
Lectut btn-202-ppt-l23. labeling techniques for nucleic acids
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshita
 
Plant expression vectors
Plant expression vectorsPlant expression vectors
Plant expression vectors
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
 
shotgun sequncing
 shotgun sequncing shotgun sequncing
shotgun sequncing
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
smith - waterman algorithm.pptx
smith - waterman algorithm.pptxsmith - waterman algorithm.pptx
smith - waterman algorithm.pptx
 
cDNA Library Construction
cDNA Library ConstructioncDNA Library Construction
cDNA Library Construction
 
Shotgun and clone contig method
Shotgun and clone contig methodShotgun and clone contig method
Shotgun and clone contig method
 

Similar to Submitting DNA sequences to the databases, SEQUIN.pptx

Biological databases
Biological databasesBiological databases
Biological databasesAshfaq Ahmad
 
How to submit a sequence in NCBI
How to submit a sequence in NCBIHow to submit a sequence in NCBI
How to submit a sequence in NCBIMinhaz Ahmed
 
SOME OTHER TOOLS USED IN BIOINFORMATICS.pptx
SOME OTHER TOOLS USED IN BIOINFORMATICS.pptxSOME OTHER TOOLS USED IN BIOINFORMATICS.pptx
SOME OTHER TOOLS USED IN BIOINFORMATICS.pptxdhanyalakshmi11
 
De-centralized but global: Redesigning biodiversity data aggregation for impr...
De-centralized but global: Redesigning biodiversity data aggregation for impr...De-centralized but global: Redesigning biodiversity data aggregation for impr...
De-centralized but global: Redesigning biodiversity data aggregation for impr...taxonbytes
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysisDr. Naveen Gaurav srivastava
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriMonica Munoz-Torres
 
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim D. Pruitt
 
Presentation on entrez as used in bioinformatics
Presentation on entrez as used in bioinformaticsPresentation on entrez as used in bioinformatics
Presentation on entrez as used in bioinformaticsCharityAyebale
 
Open Source Networking Solving Molecular Analysis of Cancer
Open Source Networking Solving Molecular Analysis of CancerOpen Source Networking Solving Molecular Analysis of Cancer
Open Source Networking Solving Molecular Analysis of CancerOpen Networking Summit
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224GenomeInABottle
 
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...Thitichai Sripan
 
Primary sequencing of nucleic acids
Primary sequencing of nucleic acidsPrimary sequencing of nucleic acids
Primary sequencing of nucleic acidsvibhakumari12
 
Introduction to bioinformatics
Introduction to bioinformaticsIntroduction to bioinformatics
Introduction to bioinformaticsmaulikchaudhary8
 
Data retreival system
Data retreival systemData retreival system
Data retreival systemShikha Thakur
 
biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptxscience lover
 
Next Generation Sequencing methods
Next Generation Sequencing methods Next Generation Sequencing methods
Next Generation Sequencing methods Zohaib HUSSAIN
 
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECAProject
 

Similar to Submitting DNA sequences to the databases, SEQUIN.pptx (20)

Biological databases
Biological databasesBiological databases
Biological databases
 
How to submit a sequence in NCBI
How to submit a sequence in NCBIHow to submit a sequence in NCBI
How to submit a sequence in NCBI
 
SOME OTHER TOOLS USED IN BIOINFORMATICS.pptx
SOME OTHER TOOLS USED IN BIOINFORMATICS.pptxSOME OTHER TOOLS USED IN BIOINFORMATICS.pptx
SOME OTHER TOOLS USED IN BIOINFORMATICS.pptx
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
 
De-centralized but global: Redesigning biodiversity data aggregation for impr...
De-centralized but global: Redesigning biodiversity data aggregation for impr...De-centralized but global: Redesigning biodiversity data aggregation for impr...
De-centralized but global: Redesigning biodiversity data aggregation for impr...
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysis
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015Kim Pruitt trainingbiocuration2015
Kim Pruitt trainingbiocuration2015
 
Presentation on entrez as used in bioinformatics
Presentation on entrez as used in bioinformaticsPresentation on entrez as used in bioinformatics
Presentation on entrez as used in bioinformatics
 
Open Source Networking Solving Molecular Analysis of Cancer
Open Source Networking Solving Molecular Analysis of CancerOpen Source Networking Solving Molecular Analysis of Cancer
Open Source Networking Solving Molecular Analysis of Cancer
 
KnetMiner Overview Oct 2017
KnetMiner Overview Oct 2017KnetMiner Overview Oct 2017
KnetMiner Overview Oct 2017
 
Giab ashg webinar 160224
Giab ashg webinar 160224Giab ashg webinar 160224
Giab ashg webinar 160224
 
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
A Reliable Password-based User Authentication Scheme for Web-based Human Geno...
 
Primary sequencing of nucleic acids
Primary sequencing of nucleic acidsPrimary sequencing of nucleic acids
Primary sequencing of nucleic acids
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Introduction to bioinformatics
Introduction to bioinformaticsIntroduction to bioinformatics
Introduction to bioinformatics
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
 
biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptx
 
Next Generation Sequencing methods
Next Generation Sequencing methods Next Generation Sequencing methods
Next Generation Sequencing methods
 
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
 

More from Ved Gharat

Investigational New Drug presentation.pptx
Investigational New Drug presentation.pptxInvestigational New Drug presentation.pptx
Investigational New Drug presentation.pptxVed Gharat
 
Affinity chromatography.pptx
Affinity chromatography.pptxAffinity chromatography.pptx
Affinity chromatography.pptxVed Gharat
 
PRODUCTION OF MONOCLONAL ANTIBODIES.pptx
PRODUCTION OF MONOCLONAL ANTIBODIES.pptxPRODUCTION OF MONOCLONAL ANTIBODIES.pptx
PRODUCTION OF MONOCLONAL ANTIBODIES.pptxVed Gharat
 
EMULSIFIERS .pptx
EMULSIFIERS .pptxEMULSIFIERS .pptx
EMULSIFIERS .pptxVed Gharat
 
Secondary culture.pptx
Secondary culture.pptxSecondary culture.pptx
Secondary culture.pptxVed Gharat
 
Patentable and Non-Patentable inventions.pptx
Patentable and Non-Patentable inventions.pptxPatentable and Non-Patentable inventions.pptx
Patentable and Non-Patentable inventions.pptxVed Gharat
 
Ergot alkaloids.pptx
Ergot alkaloids.pptxErgot alkaloids.pptx
Ergot alkaloids.pptxVed Gharat
 
Hypersensitivity types.pptx
Hypersensitivity types.pptxHypersensitivity types.pptx
Hypersensitivity types.pptxVed Gharat
 
FRUIT FLY AND ZEBRA FISH AS MODEL ORGANISMS.pptx
FRUIT FLY AND ZEBRA FISH AS MODEL ORGANISMS.pptxFRUIT FLY AND ZEBRA FISH AS MODEL ORGANISMS.pptx
FRUIT FLY AND ZEBRA FISH AS MODEL ORGANISMS.pptxVed Gharat
 
Enzyme coupled receptors.pptx
Enzyme coupled receptors.pptxEnzyme coupled receptors.pptx
Enzyme coupled receptors.pptxVed Gharat
 
Ubiquitin proteasome pathway.pptx
Ubiquitin proteasome pathway.pptxUbiquitin proteasome pathway.pptx
Ubiquitin proteasome pathway.pptxVed Gharat
 
HOT AIR OVEN .pptx
HOT AIR OVEN .pptxHOT AIR OVEN .pptx
HOT AIR OVEN .pptxVed Gharat
 
IONIC AND OSMOTIC HOMEOSTASIS, REACTIVE OXYGEN SPECIES.pptx
IONIC AND OSMOTIC HOMEOSTASIS, REACTIVE OXYGEN SPECIES.pptxIONIC AND OSMOTIC HOMEOSTASIS, REACTIVE OXYGEN SPECIES.pptx
IONIC AND OSMOTIC HOMEOSTASIS, REACTIVE OXYGEN SPECIES.pptxVed Gharat
 
ABSORPTION OF DRUGS FROM LUNGS.pptx
ABSORPTION OF DRUGS FROM LUNGS.pptxABSORPTION OF DRUGS FROM LUNGS.pptx
ABSORPTION OF DRUGS FROM LUNGS.pptxVed Gharat
 
ANAEROBIC BIOLOGICAL TREATMENT .pptx
ANAEROBIC BIOLOGICAL TREATMENT .pptxANAEROBIC BIOLOGICAL TREATMENT .pptx
ANAEROBIC BIOLOGICAL TREATMENT .pptxVed Gharat
 
PRINCIPLES OF CHEESE MAKING.pptx
 PRINCIPLES OF CHEESE MAKING.pptx PRINCIPLES OF CHEESE MAKING.pptx
PRINCIPLES OF CHEESE MAKING.pptxVed Gharat
 
PEPTIDOGLYCAN SYNTHESIS IN BACTERIA.pptx
PEPTIDOGLYCAN SYNTHESIS IN BACTERIA.pptxPEPTIDOGLYCAN SYNTHESIS IN BACTERIA.pptx
PEPTIDOGLYCAN SYNTHESIS IN BACTERIA.pptxVed Gharat
 
MECHANISM OF ACTION OF BETA-LACTAM ANTIBIOTICS (1).pptx
MECHANISM OF ACTION OF BETA-LACTAM ANTIBIOTICS (1).pptxMECHANISM OF ACTION OF BETA-LACTAM ANTIBIOTICS (1).pptx
MECHANISM OF ACTION OF BETA-LACTAM ANTIBIOTICS (1).pptxVed Gharat
 
GUIDELINES TO GLP.pptx
GUIDELINES TO GLP.pptxGUIDELINES TO GLP.pptx
GUIDELINES TO GLP.pptxVed Gharat
 
PROTEASES.pptx
PROTEASES.pptxPROTEASES.pptx
PROTEASES.pptxVed Gharat
 

More from Ved Gharat (20)

Investigational New Drug presentation.pptx
Investigational New Drug presentation.pptxInvestigational New Drug presentation.pptx
Investigational New Drug presentation.pptx
 
Affinity chromatography.pptx
Affinity chromatography.pptxAffinity chromatography.pptx
Affinity chromatography.pptx
 
PRODUCTION OF MONOCLONAL ANTIBODIES.pptx
PRODUCTION OF MONOCLONAL ANTIBODIES.pptxPRODUCTION OF MONOCLONAL ANTIBODIES.pptx
PRODUCTION OF MONOCLONAL ANTIBODIES.pptx
 
EMULSIFIERS .pptx
EMULSIFIERS .pptxEMULSIFIERS .pptx
EMULSIFIERS .pptx
 
Secondary culture.pptx
Secondary culture.pptxSecondary culture.pptx
Secondary culture.pptx
 
Patentable and Non-Patentable inventions.pptx
Patentable and Non-Patentable inventions.pptxPatentable and Non-Patentable inventions.pptx
Patentable and Non-Patentable inventions.pptx
 
Ergot alkaloids.pptx
Ergot alkaloids.pptxErgot alkaloids.pptx
Ergot alkaloids.pptx
 
Hypersensitivity types.pptx
Hypersensitivity types.pptxHypersensitivity types.pptx
Hypersensitivity types.pptx
 
FRUIT FLY AND ZEBRA FISH AS MODEL ORGANISMS.pptx
FRUIT FLY AND ZEBRA FISH AS MODEL ORGANISMS.pptxFRUIT FLY AND ZEBRA FISH AS MODEL ORGANISMS.pptx
FRUIT FLY AND ZEBRA FISH AS MODEL ORGANISMS.pptx
 
Enzyme coupled receptors.pptx
Enzyme coupled receptors.pptxEnzyme coupled receptors.pptx
Enzyme coupled receptors.pptx
 
Ubiquitin proteasome pathway.pptx
Ubiquitin proteasome pathway.pptxUbiquitin proteasome pathway.pptx
Ubiquitin proteasome pathway.pptx
 
HOT AIR OVEN .pptx
HOT AIR OVEN .pptxHOT AIR OVEN .pptx
HOT AIR OVEN .pptx
 
IONIC AND OSMOTIC HOMEOSTASIS, REACTIVE OXYGEN SPECIES.pptx
IONIC AND OSMOTIC HOMEOSTASIS, REACTIVE OXYGEN SPECIES.pptxIONIC AND OSMOTIC HOMEOSTASIS, REACTIVE OXYGEN SPECIES.pptx
IONIC AND OSMOTIC HOMEOSTASIS, REACTIVE OXYGEN SPECIES.pptx
 
ABSORPTION OF DRUGS FROM LUNGS.pptx
ABSORPTION OF DRUGS FROM LUNGS.pptxABSORPTION OF DRUGS FROM LUNGS.pptx
ABSORPTION OF DRUGS FROM LUNGS.pptx
 
ANAEROBIC BIOLOGICAL TREATMENT .pptx
ANAEROBIC BIOLOGICAL TREATMENT .pptxANAEROBIC BIOLOGICAL TREATMENT .pptx
ANAEROBIC BIOLOGICAL TREATMENT .pptx
 
PRINCIPLES OF CHEESE MAKING.pptx
 PRINCIPLES OF CHEESE MAKING.pptx PRINCIPLES OF CHEESE MAKING.pptx
PRINCIPLES OF CHEESE MAKING.pptx
 
PEPTIDOGLYCAN SYNTHESIS IN BACTERIA.pptx
PEPTIDOGLYCAN SYNTHESIS IN BACTERIA.pptxPEPTIDOGLYCAN SYNTHESIS IN BACTERIA.pptx
PEPTIDOGLYCAN SYNTHESIS IN BACTERIA.pptx
 
MECHANISM OF ACTION OF BETA-LACTAM ANTIBIOTICS (1).pptx
MECHANISM OF ACTION OF BETA-LACTAM ANTIBIOTICS (1).pptxMECHANISM OF ACTION OF BETA-LACTAM ANTIBIOTICS (1).pptx
MECHANISM OF ACTION OF BETA-LACTAM ANTIBIOTICS (1).pptx
 
GUIDELINES TO GLP.pptx
GUIDELINES TO GLP.pptxGUIDELINES TO GLP.pptx
GUIDELINES TO GLP.pptx
 
PROTEASES.pptx
PROTEASES.pptxPROTEASES.pptx
PROTEASES.pptx
 



Submitting DNA sequences to the databases, SEQUIN.pptx

  • 1. SUBMITTING DNA SEQUENCES TO THE DATABASES, SEQUIN P:1 U:2 Vedanti S. Gharat Roll No. :- 09 M.Sc. Biotech part 1
  • 2.
      • DNA sequence records from the public databases (DDBJ/EMBL/GenBank) are essential components of computational analysis in molecular biology.
      • The sequence records are also reagents for improved curated resources like LocusLink or many of the protein databases.
      • Accurate and informative biological annotation of sequence records is critical in determining the function of a disease gene by sequence similarity search.
      • The names or functions of the encoded protein products, the name of the genetic locus, and the link to the original publication of that sequence make a sequence record of immediate value to the scientist who retrieves it as the result of a BLAST or Entrez search.
      • The submission process is governed by an international, collaborative agreement. Sequences submitted to any one of the three databases participating in this collaboration will appear in the other two databases within a few days of their release to the public.
  • 3. WHY, WHERE, AND WHAT TO SUBMIT?
      • One should submit to whichever of the three public databases is most convenient.
      • This may be the database that is closest geographically, it may be the repository one has always used in the past, or it may simply be the place one’s submission is likely to receive the best attention.
      • Under normal circumstances, an accession number will be returned within one workday, and a finished record should be available within 5–10 working days, depending on the information provided by the submitter.
      • Submitting data to the database is not the end of one’s scientific obligation. Updating the record as more information becomes available will ensure that the information within the record will survive time and scientific rigor.
      • Submissions of sequences are done electronically: via the World Wide Web, by electronic mail, or on a computer disk sent via regular postal mail.
  • 4. DNA/RNA
      • The submission process is quite simple, but care must be taken to provide information that is accurate and as biologically sound as possible, to ensure maximal usability by the scientific community.
      1] Nature of the Sequence. Is it of genomic or mRNA origin?
      2] Is the Sequence Synthetic, But Not Artificial? There is a special division in the nucleotide databases for synthetic molecules, sequences put together experimentally that do not occur naturally in the environment. The DNA sequence databases do not accept computer-generated sequences.
      3] How Accurate is the Sequence? The assumption that the submitted sequence is as accurate as possible usually means at least two-pass coverage on the whole submitted sequence. Equally important is the verification of the final submitted sequence.
  • 5.
      • Organism: All DNA sequence records must show the organism from which the sequence was derived. Many inferences are made from the phylogenetic position of the records present in the databases.
      • Citation: Having a citation in the submission being prepared is of great importance, even if it consists of just a temporary list of authors and a working title. Updating these citations at publication time is also important to the value of the record.
      • Coding Sequence(s): A nucleotide submission should also include the protein sequences it encodes. This is important for two reasons:
          • Protein databases (e.g., SWISS-PROT and PIR) are almost entirely populated by protein sequences present in DNA sequence database records.
          • The inclusion of the protein sequence serves as an important, if not essential, validation step in the submission process.
        The coding sequence features, or CDS, are the links between the DNA or RNA and the protein sequences, and their correct positioning is central in the validation, as is the correct genetic code.
  • 6. POPULATION, PHYLOGENETIC, AND MUTATION STUDIES
      • The nucleotide databases are now accepting population, phylogenetic, and mutational studies as submitted sequence sets, and, although this information is not adequately represented in the flatfile records, it is appearing in the various databases.
      • This allows the submission of a group of related sequences together, with entry of shared information required only once.
      • Sequin also allows the user to include the alignment generated with a favorite alignment tool and to submit this information with the DNA sequence.
    PROTEIN-ONLY SUBMISSIONS
      • In most cases, protein sequences come with a DNA sequence. There are some exceptions (people do sequence proteins directly), and such sequences must be submitted without a corresponding DNA sequence. SWISS-PROT presently is the best venue for these submissions.
  • 7. HOW TO SUBMIT ON THE WORLD WIDE WEB
      • The World Wide Web is now the most common interface used to submit sequences to the three databases. The Web-based submission systems include Sakura at DDBJ, WebIn at EBI, and BankIt at the NCBI.
      • Some 75–80% of individual submissions to NCBI are done via the Web.
      • On entering a BankIt submission, the user is asked about the length of the nucleotide sequence to be submitted. The next BankIt form is straightforward: it asks about the contact person, the citations, the organism, the location, some map information, and the nucleotide sequence itself.
      • At the end of the form, there is a BankIt button, which calls up the next form. At this point, some validation is made, and, if any necessary fields were not filled in, the form is presented again. If all is well, the next form asks how many features are to be added and prompts the user to indicate their types.
      • If no features were added, BankIt will issue a warning and ask for confirmation that not even one CDS is to be added to the submission. The user can say no (zero new CDSs) or take the opportunity to add one or more CDSs.
      • To begin to save a record, press the BankIt button again. The view that now appears must be approved before the submission is completed; that is, more changes may be made, or other features may be added. To finish, press BankIt one more time.
      • The final screen will then appear; after the user toggles the Update/Finished set of buttons and hits BankIt one last time, the submission will go to NCBI for processing. A copy of the just-finished submission should arrive promptly via E-mail.
  • 8. HOW TO SUBMIT WITH SEQUIN
      • Sequin is designed for preparing new sequence records and updating existing records for submission to DDBJ, EMBL, and GenBank.
      • It is a tool that works on most computer platforms and is suitable for a wide range of sequence lengths and complexities, including traditional (gene-sized) nucleotide sequences, segmented entries, long (genome-sized) sequences with many annotated features, and sets of related sequences (i.e., population, phylogenetic, or mutation studies of a particular gene, region, or viral genome).
      • Sequin is more practical for more complex cases. Certain types of submission (e.g., segmented sets) cannot be made via the Web unless explicit instructions to the database staff are inserted.
      • For sets of related or similar sequences (e.g., population or phylogenetic studies), Sequin accepts information from the submitter on how the multiple sequences are aligned to each other.
      • Finally, Sequin can be used to edit and resubmit a record that already exists in GenBank, either by extending (or replacing) the sequence or by annotating additional features or alignments.
  • 9. SUBMISSION MADE EASY
      • Sequin has a number of attributes that greatly simplify the process of building and annotating a record.
      • The most profound aspect is automatic calculation of the intervals on a CDS feature given only the nucleotide sequence, the sequence of the protein product, and the genetic code. This “Suggest Intervals” process takes consensus splice sites into account in its calculations.
      • Another important attribute is the ability to enter relevant annotation in a simple format in the definition line of the sequence data file.
      • Sequin recognizes and extracts this information when reading the sequences and then puts it in the proper places in the record. This is especially important for population and phylogenetic studies, where the source modifiers are necessary to distinguish one component from another.
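
As an illustration of definition-line annotation, here is a minimal Python sketch of how such modifiers might be pulled out of a FASTA definition line. It assumes the bracketed `[key=value]` form used in NCBI submission documentation; the slides do not spell out the syntax. The function name `parse_defline`, the sequence ID, and the organism and strain values are hypothetical, and this is not Sequin's own parser.

```python
import re

# Assumed modifier syntax: bracketed [key=value] pairs after the sequence ID.
MODIFIER = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_defline(defline):
    """Split a FASTA definition line into (seq_id, modifiers, title)."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line must start with '>'")
    body = defline[1:].strip()
    # The sequence ID is everything up to the first space.
    seq_id, _, rest = body.partition(" ")
    # Collect every [key=value] pair, normalising the key to lower case.
    modifiers = {key.strip().lower(): value.strip()
                 for key, value in MODIFIER.findall(rest)}
    # Whatever remains after removing the modifiers is the free-text title.
    title = " ".join(MODIFIER.sub(" ", rest).split())
    return seq_id, modifiers, title


# Hypothetical definition line, for illustration only.
example = (">isolate7 [organism=Example species] [strain=XYZ-1] "
           "hypothetical envelope gene, partial cds")
seq_id, modifiers, title = parse_defline(example)
print(seq_id, modifiers, title)
```

In a population or phylogenetic set, each sequence would carry its own modifiers (for example a different strain or isolate), which is what lets the members of the set be told apart.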
  • 10. STARTING A NEW SUBMISSION
      • Sequin begins with a window that allows the user to start a new submission or load a file containing a saved record. If Sequin has been configured to be network aware, this window also allows the downloading of existing database records that are to be updated.
      • A new submission is made by filling out several forms. The forms use folder tabs to subdivide a window into several pages, allowing all the requested data to be entered without the need for a huge computer screen. These entry forms have buttons for Previous Page and Next Page. When the user arrives at the last page on a form, the Next Page button changes to Next Form.
      • The Submitting Authors form requests a tentative title, information on the contact person, the authors of the sequence, and their institutional affiliations.
      • The Sequence Format form asks for the type of submission (single sequence, segmented sequence, or population, phylogenetic, or mutation study).
      • The Organism and Sequences form asks for the biological data. On the Organism page, as the user starts to type the scientific name, the list of frequently used organisms scrolls automatically.
  • 11. Entering a Single Nucleotide Sequence and its Protein Products
      • For a single sequence or a segmented sequence, the rest of the Organism and Sequences form contains Nucleotide and Protein folder tabs.
      • The Nucleotide page has controls for setting the molecule type (e.g., genomic DNA or mRNA) and topology (usually linear, occasionally circular) and for indicating whether the sequence is incomplete at the 5′ or 3′ ends.
      • For each protein sequence, Suggest Intervals is run against the nucleotide sequence, and a CDS feature is made with the resulting intervals. A Gene feature is generated, with a single interval spanning the CDS intervals. A protein product sequence is made, with a Protein feature to give it a name.
      • In most cases, it is much easier to enter the protein sequence and let Sequin construct the record automatically than to manually add a CDS feature later.
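
To make the idea behind Suggest Intervals concrete, the following Python sketch locates the single forward-strand interval that encodes a given protein. It is a deliberately simplified stand-in, not Sequin's algorithm: it assumes the standard genetic code and an unspliced coding region, and it ignores the reverse strand and the consensus splice sites that the real tool considers. The names `translate` and `suggest_interval` and the toy sequence are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nuc):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    nuc = nuc.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(nuc[i:i + 3], "X")
                   for i in range(0, len(nuc) - 2, 3))


def suggest_interval(nuc, protein):
    """Return the 1-based inclusive (start, stop) of the first forward-frame
    region that encodes `protein`, or None if no frame matches."""
    protein = protein.upper().rstrip("*")
    if not protein:
        raise ValueError("protein sequence is empty")
    for frame in range(3):
        pos = translate(nuc[frame:]).find(protein)
        if pos != -1:
            start = frame + 3 * pos + 1
            stop = start + 3 * len(protein) - 1
            return start, stop
    return None


# Toy example: the protein MAK is encoded by bases 3..11 (ATG GCT AAA).
interval = suggest_interval("GGATGGCTAAATGACC", "MAK")
print(interval)
```

A spliced gene would need one interval per exon, which is why the real process has to reason about splice sites rather than search a single open frame.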
  • 12. Entering an Aligned Set of Sequences
      • A growing class of submissions involves sets of related sequences. A large number of HIV sequences come in as population studies. A common phylogenetic study involves ribulose-1,5-bisphosphate carboxylase (RUBISCO).
      • The same submission information form is used to enter author and contact information.
      • In the Sequence Format form, the user chooses the desired type of submission. Population studies are generally from different individuals in the same (crossbreeding) species. Phylogenetic studies are from different species.
      • Multiple sequence studies can be submitted in FASTA format, in which case Sequin should later be called on to calculate an alignment.
      • The Organism and Sequences form is slightly different for sets of sequences. The Organism page for phylogenetic studies allows the setting of a default genetic code only for organisms not in Sequin’s local list of popular species. Instead of a Protein page, there is now an Annotation page.
      • As a final step, Sequin displays an editor that allows all organism and source modifiers on each sequence to be edited. On confirmation of the modifiers, Sequin finishes assembling the record into the proper structure.
  • 13. Viewing the Sequence Record
      • Sequin provides a number of different views of a sequence record. The traditional flatfile can be presented in FASTA, GenBank, or EMBL format.
      • There is a more detailed view that shows the features on the actual sequence. For records containing alignments, one can request either a graphical overview showing insertions, deletions, and mismatches or a detailed view showing the alignment of sequence letters.
      • Clicking on a feature, a sequence, or the graphical representation of an alignment between sequences will highlight that object.
    Validation
      • To ensure the quality of data being submitted, Sequin has a built-in validator that searches for missing organism information, incorrect coding region lengths, internal stop codons in coding regions, mismatched amino acids, and nonconsensus splice sites.
      • The validator also checks for inconsistent use of “partial” indications, especially among coding regions, the protein product, and the protein feature on the product.
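
Three of the checks listed under Validation (coding region length, internal stop codons, and mismatched amino acids) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: standard genetic code, a single unspliced forward-strand CDS, and hypothetical names (`validate_cds`, the message strings). It is not Sequin's validator and it skips the organism, splice-site, and "partial" checks.

```python
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nuc):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(nuc[i:i + 3], "X")
                   for i in range(0, len(nuc) - 2, 3))


def validate_cds(nuc, start, stop, protein):
    """Check a CDS given as a 1-based inclusive interval; return a list of
    problem descriptions (an empty list means no problem was found)."""
    problems = []
    cds = nuc[start - 1:stop].upper().replace("U", "T")
    if len(cds) % 3 != 0:
        problems.append("coding region length is not a multiple of 3")
    translated = translate(cds)
    # A single terminal stop codon is expected and not reported.
    body = translated[:-1] if translated.endswith("*") else translated
    if "*" in body:
        problems.append("internal stop codon at codon %d" % (body.index("*") + 1))
    expected = protein.upper().rstrip("*")
    if len(body) != len(expected):
        problems.append("translation length differs from protein length")
    mismatches = [i + 1 for i, (got, want) in enumerate(zip(body, expected))
                  if got != want]
    if mismatches:
        problems.append("mismatched amino acids at positions %s" % mismatches)
    return problems


clean = validate_cds("GGATGGCTAAATGACC", 3, 14, "MAK")   # ATG GCT AAA TGA
broken = validate_cds("ATGTAAGCTTGA", 1, 12, "MAA")      # ATG TAA GCT TGA
print(clean, broken)
```

Running such checks before sending a record catches the most common annotation slips, such as an interval that is off by one base or a protein pasted from the wrong reading frame.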
  • 14. SENDING THE SUBMISSION
      • A finished submission can be saved to disk and E-mailed to one of the databases. It is also a good practice to save frequently throughout the Sequin session, to make sure nothing is inadvertently lost.
    CONCLUDING REMARKS
      • The act of depositing records into a database and seeing these records made public has always been an exercise of pride on the part of submitters, a segment of the scientific activity from their laboratory that they present to the scientific community. In this process, submitters always hope to provide information in the most complete and useful fashion, allowing maximum use of their data by the scientific community.
      • The databases strongly encourage the submission of sequence data and of all appropriate updates. Many tools are available to facilitate this task, and together the databases support Sequin as the tool to use for new submissions, in addition to their respective Web submission tools. Submitting data to the databases has now become a manageable (and sometimes enjoyable) task, with scientists no longer having good excuses for neglecting it.