DNA Sequencing & Data Analysis Techniques
1. DNA Sequencing & Data Analysis
30-Nov-2016
Dr. Jassim Mohammed Abdo
Director of Duhok Research Center
PhD in Molecular Biology & Immunology
Issued by Ludwig-Maximilians University
Munich, Germany
2. History
1953 - structure of DNA established as a double
helix.
1970 - first method of DNA sequencing involved a
location-specific primer extension strategy.
1977 - Frederick Sanger published a method for DNA
sequencing with chain-terminating inhibitors.
3. 1977 - Allan Maxam and Walter Gilbert developed
DNA sequencing by chemical degradation.
1977 - the first genome to be sequenced was that of
bacteriophage φX174.
1990s - several new methods were developed in the mid
to late 1990s.
2003 - completion of the Human Genome Project.
5. • The first sequence of the human genome was
obtained using so-called »first generation«
sequencing technology.
• In the following years, »second« or »next
generation« sequencing (NGS) technologies were
developed, characterized by massive parallelization,
improved automation and speed, and, most
importantly, greatly reduced price.
7. For example, in 2001 the cost of sequencing a human genome
was almost $100 million; in 2015 it was just $1,245.
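To put the two figures above in proportion, the short calculation below works out the overall fold reduction and the average yearly factor. It is only a rough sketch: the 2001 value is taken as exactly 100 million (the slide says "almost"), and a steady year-on-year decline is assumed purely for illustration.

```python
# Figures quoted on the slide (US dollars per human genome).
cost_2001 = 100_000_000  # "almost 100 million", so treat this as an upper bound
cost_2015 = 1_245

# Overall fold reduction between the two years.
fold_reduction = cost_2001 / cost_2015

# Average factor by which the cost shrank each year,
# under the simplifying assumption of a steady decline.
years = 2015 - 2001
yearly_factor = fold_reduction ** (1 / years)

print(f"Fold reduction 2001 -> 2015: about {fold_reduction:,.0f}x")
print(f"Average yearly reduction factor: about {yearly_factor:.2f}x")
```

In other words, the price fell roughly 80,000-fold over the 14 years.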
10. Primary nucleic acid (NA) sequences can be produced by
Sanger-based technologies or NGS technologies.
21. Primary sequence databases are synchronised, and
every sequence receives a unique identifier.
22. Each sequence entry contains three categories of
information.
26. Data analysis
• BioEdit
• Chromas
• DNASTAR
• Lasergene
• Gegenees
• DNA Master
• Oligo Analyzer
• DNA Club
27. After a sample has been sequenced, the result needs to be
interpreted. The normal steps in the process include:
1. Open the chromatogram file, check the quality of the sequence,
and determine the length of high-quality sequence.
2. Differentiate the vector sequence (if used) from the insert's
sequence using restriction sites as markers.
3. If you know exactly what the sequence should be, make a pairwise
alignment of the DNA sequences using BioEdit, ClustalW or
NCBI's BLAST2.
4. If the DNA sequence contains variations from the consensus,
perform a pairwise alignment of the predicted peptide
sequences.
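Steps 3 and 4 above can be sketched in plain Python. The example below is an illustration only, not what BioEdit, ClustalW or BLAST2 do internally: it uses a basic Needleman-Wunsch global alignment with arbitrary scoring values, the standard genetic code, and two short made-up sequences. All function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[len(a)][len(b)]


def differences(aligned_a, aligned_b):
    """List (1-based column, expected, observed) for every mismatching column."""
    return [(pos, x, y)
            for pos, (x, y) in enumerate(zip(aligned_a, aligned_b), start=1)
            if x != y]


# Made-up example: the observed read differs from the expected sequence by one base.
expected = "ATGGCCATTGTA"
observed = "ATGGCCGTTGTA"

# Step 3: pairwise alignment of the DNA sequences.
dna_a, dna_b, dna_score = global_align(expected, observed)
dna_diffs = differences(dna_a, dna_b)
print("DNA differences:", dna_diffs)

# Step 4: if the DNA differs, align the predicted peptides as well.
pep_diffs = []
if dna_diffs:
    pep_a, pep_b, _ = global_align(translate(expected), translate(observed))
    pep_diffs = differences(pep_a, pep_b)
    print("Peptide differences:", pep_diffs)
```

Comparing the peptides shows whether a DNA variation is silent or changes an amino acid. In this made-up case the single base change turns an isoleucine codon into a valine codon.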
29. Analysing sequences and chromatograms
Molecular biologists sequence samples of DNA for a huge
range of reasons, and we will explore this fundamental
technique here. 'Dye terminator sequencing' is currently the
preferred method used by molecular biologists for
sequencing DNA samples.
30. Chromatogram files can be opened with BioEdit, Chromas, Lasergene, DNASTAR and similar programs.
These programs will display the chromatogram of the sequence; it is up to you to
determine the reliability of the sequence.
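One way to make this reliability judgement less subjective is to look at per-base quality values. The sketch below assumes the base caller reports a Phred-style quality value for each base and that a cutoff of 20 is acceptable. Both are assumptions for illustration, not something stated on the slides. The function name and the example data are hypothetical.

```python
def longest_high_quality_run(quals, threshold=20):
    """Return (start, end) of the longest run of bases with quality >= threshold.

    Indices are 0-based and `end` is exclusive, so bases[start:end] is the
    trimmed read. Returns (0, 0) if no base reaches the threshold.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= threshold:
            if run_start is None:
                run_start = i  # a new high-quality run begins here
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None  # a low-quality base ends the current run
    return best_start, best_end


# Made-up read: poor quality at both ends, good quality in the middle.
bases = "GNTACGGATCCANN"
quals = [5, 8, 12, 25, 30, 42, 50, 48, 38, 31, 22, 15, 9, 6]

start, end = longest_high_quality_run(quals)
trimmed = bases[start:end]
print(f"High-quality region: bases {start + 1} to {end} ({end - start} bases)")
print("Trimmed sequence:", trimmed)
```

Sanger reads are typically least reliable at the start and towards the end, which is why the example keeps only the central run. The peaks in the chromatogram should still be inspected by eye.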
38. How to submit a sequence to NCBI
We use BankIt if:
• we have a single sequence, a simple set of sequences (for example: 16S rRNA, matK,
ITS/rRNA, amoE, tefB, cytb, or COI sets), or a small batch of different sequences
• we prefer to use a web-based submission tool
• the feature annotation for our sequences is not complicated
• we do not require advanced sequence analysis tools
We use Sequin if:
• we prefer to work on our submission off-line
• we have a sequence or sequences that are complex
• we would like graphical viewing and editing options, including an alignment editor
• we would like the option to have network access to related analytical tools
41. GenBank Sequence Submission Policy
• The GenBank database is intended for new sequence data that is determined by
and annotated by the submitter.
• Sequences built or derived from other GenBank primary data intended for
the Third Party Annotation (TPA) database may be submitted through BankIt.
• The following types of submissions are NOT acceptable:
  • sequences less than 200 nucleotides long, unless they represent complete
  exons, non-coding RNAs (ncRNAs), microsatellites or ancient DNA
  • non-contiguous sequences that have been artificially joined; for example,
  multiple exons without their intervening introns or without a 'gap'
  representing any missing sequence
  • single sequences that are a mix of molecule types, such as a mix of genomic
  and mRNA sequence data
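Two of the rules above can be checked before submission with a few lines of code. The sketch below is a hypothetical helper, not an NCBI tool or API. The function name, the labels used for sequence types and the example sequences are all made up for illustration. The rule about artificially joined sequences depends on how the sequence was assembled, so it is not checked here.

```python
MIN_LENGTH = 200  # minimum length stated in the policy above

# Sequence types that the policy exempts from the minimum length
# (the label strings themselves are an assumption of this sketch).
SHORT_EXEMPT_TYPES = {"complete exon", "ncRNA", "microsatellite", "ancient DNA"}


def check_submission(sequence, sequence_type=None, molecule_types=("genomic DNA",)):
    """Return a list of policy problems for one candidate sequence.

    An empty list means none of the checked rules is violated. It does not
    mean the submission will be accepted.
    """
    problems = []
    if len(sequence) < MIN_LENGTH and sequence_type not in SHORT_EXEMPT_TYPES:
        problems.append(
            f"sequence is {len(sequence)} nt long; the minimum is {MIN_LENGTH} nt "
            "unless it is a complete exon, ncRNA, microsatellite or ancient DNA"
        )
    if len(set(molecule_types)) > 1:
        problems.append("single sequence mixes molecule types: "
                        + ", ".join(sorted(set(molecule_types))))
    return problems


# Made-up examples.
short_read = "ACGT" * 30   # 120 nt
long_read = "ACGT" * 60    # 240 nt

print(check_submission(short_read))
print(check_submission(short_read, sequence_type="microsatellite"))
print(check_submission(long_read, molecule_types=("genomic DNA", "mRNA")))
```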