GenoTHREAT
A biosecurity software to screen DNA
synthesis orders against pathogens
GBCB seminar
Laura Adam
10/07/2014
GenoThreat / GenoGUARD -- open source biosecurity solution for the gene synthesis industry and the synthetic biology community.

DNA sequence screening software that implements the best match method recommended by the federal government.

Publication: Adam L et al, Strengths and limitations of the federal guidance on synthetic DNA, Nature Biotechnology (2011) 29, 208–210 doi:10.1038/nbt.1802
US Department of Health and Human Services voluntary guidelines “Screening Framework Guidance for Synthetic Double-Stranded DNA Providers” November 2009.

Software: http://sourceforge.net/projects/genothreat/

  3. (2005) Science, 310(5745), 77. AAAS.
  5. http://www.washingtonpost.com/wp-srv/nation/daily/graphics/wmdbio_123004.html
  6. CURRENT REGULATIONS
  7. The Gene Synthesis Industry
  8. Industry Response to Dual Use
     • 5 members (all based in Germany)
     • Undersigned by:
       ► 6 German or German/American
       ► 2 Chinese
     • “Code of Conduct for Best Practices in Gene Synthesis”
     • 5 companies (American)
     • 80% of worldwide synthesis capacity
     • “Harmonized Screening Protocol”
  9. Major Sections:
     Customer screening
     Sequence screening
     Record retention
     Government contact
  10. Our Primary Objectives
      1. Interpret the (draft) guidance as an algorithm
      2. Implement it as software: GenoTHREAT
      3. Characterize screening efficacy
  11. Road Map
      I. Current regulations
      II. Sequence screening algorithm: interpreting the guidance
      III. GenoTHREAT: implementation and characterization
      IV. Conclusions
  12. [Guidance] : Purpose
      “[…] to minimize the risk that unauthorized individuals or individuals with malicious intent will obtain “toxins and agents of concern” through the use of nucleic acid synthesis technologies, and to simultaneously minimize any negative impacts on the conduct of research and business operations.”
  13. [Guidance] : Goals of sequence screening
      • Agent of concern?
      • Select Agents and Toxins
      • Sequences of concern?
      • “dsDNA sequences derived from or encoding Select Agents and Toxins”
      • Sequence unique to select agent
      • No house-keeping genes
      • Both DNA strands and the six-frame translation
      • Detect any “sequence of concern”
      • Embedded: as small as 200 bp
      → Use Best match approach (at least)
  16. [Guidance] : Major Points
      1. Perform Six Frame Translation
      2. Divide the query sequences into subsequences of 200bp or 66aa
      3. For each subsequence
         i. BLAST
         ii. Best Matches
         iii. Flag if SAT
      4. Automatic decision
  17. Road Map
      I. Current regulations
      II. Sequence screening algorithm: interpreting the guidance
          1. Perform Six Frame Translation
          2. Divide the query sequences into subsequences of 200bp or 66aa
          3. For each subsequence
             i. BLAST
             ii. Best Matches
             iii. Flag if SAT
          4. Automatic decision
      III. GenoTHREAT: implementation and characterization
      IV. Conclusions
  18. [Algorithm] : Input a query DNA sequence to screen
  19. [Algorithm] : Six Frame translation
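The six-frame translation step can be sketched as follows. This is a minimal illustration, not the GenoTHREAT source: it assumes the standard genetic code (NCBI translation table 1), and the function names are hypothetical. Each strand is read at offsets 0, 1 and 2, giving three forward and three reverse-complement frames.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from position 0; a trailing partial codon is dropped.

    Codons containing anything other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> list[str]:
    """Return frames +1, +2, +3 followed by -1, -2, -3."""
    reverse = reverse_complement(dna)
    forward_frames = [translate(dna[offset:]) for offset in range(3)]
    reverse_frames = [translate(reverse[offset:]) for offset in range(3)]
    return forward_frames + reverse_frames
```

Applied to the 21-nucleotide example shown later in the deck (GATTTGGACACTCATTTCACC), the first forward frame is the peptide DLDTHFT given on that slide.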
  21. [Algorithm] : Division
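A minimal sketch of the division step, using the 200 bp and 66 aa window sizes from the guidance. The function name is hypothetical, and keeping the shorter final chunk is an assumption: the slides do not say how the tail of a sequence is handled.

```python
NUCLEOTIDE_WINDOW = 200  # bp, from the guidance
PROTEIN_WINDOW = 66      # aa, from the guidance


def divide(sequence: str, size: int) -> list[str]:
    """Split a sequence into consecutive, non-overlapping chunks of `size`.

    Assumption: the last chunk is kept even when it is shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]
```

For a 450 bp query this yields chunks of 200, 200 and 50 bp. Sequences of concern that straddle a chunk boundary are the case the extension method later in the deck is meant to address.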
  23. [Algorithm] : What should we do with subsequences?
  25. [Algorithm] : BLAST subsequences against entire GenBank database
  26. Basic Local Alignment Search Tool (BLAST)
      • Developed at the U.S. National Center for Biotechnology Information
      • One of the most widely used bioinformatics tools
      • Aligns query sequences against sequences in the GenBank sequence database
      • Algorithm emphasizes speed over sensitivity
  27. BLAST
      Query Sequence
      Database of sequences
      Local alignment
  28. BLAST Output
      Percent Identity
        ► The percentage of identical nucleotides (or amino acids) in the aligned region
      Query Coverage
        ► The percentage of the query sequence length covered by the alignment
  29. [Algorithm] : What should we do with all those BLAST results?
  31. [Guidance] : The Best match approach
      • Use a local sequence alignment tool
      • BLAST is suggested
      • Best matches = greatest percent identity over the entire fragment
      • 66 aa or 200 bp fragments
  32. [Algorithm] : Identify Best Matches
  33. BLAST [Example]
      BLAST results    PI (%)   QC (%)
      Mus musculus     100      100
      Mus musculus     100      100
      Danio rerio      97       100
      Danio rerio      43       80
      Best matches: Mus musculus, Mus musculus
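The selection of Best Matches in this example can be sketched as below. This is one reading of "greatest percent identity over the entire fragment": only hits with 100% query coverage are considered, and every hit tied for the highest percent identity is kept. The `Hit` type, the function name and the `min_coverage` parameter are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    organism: str
    percent_identity: float  # PI (%)
    query_coverage: float    # QC (%)


def best_matches(hits: list[Hit], min_coverage: float = 100.0) -> list[Hit]:
    """Keep hits covering the whole fragment, then return all hits that
    share the greatest percent identity. Returns [] if none qualify."""
    full_length = [h for h in hits if h.query_coverage >= min_coverage]
    if not full_length:
        return []
    top = max(h.percent_identity for h in full_length)
    return [h for h in full_length if h.percent_identity == top]


# The BLAST results shown on the example slide.
EXAMPLE_HITS = [
    Hit("Mus musculus", 100, 100),
    Hit("Mus musculus", 100, 100),
    Hit("Danio rerio", 97, 100),
    Hit("Danio rerio", 43, 80),
]
```

With `EXAMPLE_HITS`, the function returns the two Mus musculus entries, matching the slide. Exposing `min_coverage` as a parameter reflects the later observation in the deck that requiring 100% query coverage can be too restrictive.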
  35. [Algorithm] : Determine nature of Best Matches
  36. [Algorithm] : How can we know if a Best Match is to a Select Agent or Toxin?
      Problem: no suggestion in guidance
      Solution: keyword and anti-keyword list
  37. BLAST [Example] : Is this subsequence a hit?
      BLAST results                    PI (%)   QC (%)
      Bacillus anthracis               100      100
      Bacillus anthracis str. Sterne   100      100
      Danio rerio                      97       100
      Danio rerio                      43       80
      Best matches: Bacillus anthracis, Bacillus anthracis str. Sterne
  38. [Example] : Keyword vs. Anti-keyword
      If a GenBank entry contains a keyword, then the sequence is flagged
      Result shown: SA
  39. [Example] : Keyword vs. Anti-keyword
      If a GenBank entry contains both a keyword and an anti-keyword, the order is not flagged
      Result shown: NSA
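The keyword and anti-keyword rule on these two slides can be sketched as a simple text test on the GenBank entry description. The function name and the two word lists are illustrative assumptions; the actual GenoTHREAT lists are not reproduced in the slides. Treating "Sterne" as an anti-keyword is a guess based on the example above.

```python
def is_select_agent_entry(
    description: str,
    keywords: list[str],
    anti_keywords: list[str],
) -> bool:
    """Return True when the entry contains a keyword and no anti-keyword.

    Matching is a case-insensitive substring test, which is an assumption.
    """
    text = description.lower()
    has_keyword = any(k.lower() in text for k in keywords)
    has_anti_keyword = any(a.lower() in text for a in anti_keywords)
    return has_keyword and not has_anti_keyword


# Hypothetical lists, for illustration only.
KEYWORDS = ["bacillus anthracis"]
ANTI_KEYWORDS = ["sterne"]
```

Under these lists, "Bacillus anthracis" is classified as SA, while "Bacillus anthracis str. Sterne" and "Danio rerio" are classified as NSA.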
  40. [Algorithm] : When to flag the subsequence?
  41. BLAST [Example] : Is this subsequence a hit?
      BLAST results   Score   QC (%)
      Mus musculus    100     100
      Mus musculus    100     100
      Danio rerio     97      100
      Danio rerio     43      80
      Best matches: Mus musculus, Mus musculus
  42. BLAST [Example] : Is this subsequence a hit?
      BLAST results              Score   QC (%)
      Lumpy skin disease virus   100     100
      Sheeppox virus             100     100
      Goatpox virus              98      100
      Deerpox virus              44      80
      Best matches: Lumpy skin disease virus, Sheeppox virus
  43. BLAST [Example] : Is this subsequence a hit?
      BLAST results             Score   QC (%)
      Bacillus anthracis        100     100
      Bacillus cereus           100     100
      Plasmodium falciparum     63      100
      Clostridium ljungdahlii   44      80
      Best matches: Bacillus anthracis, Bacillus cereus
      [Guidance] : « unique to Select Agent » !!!
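The slides do not state the flagging rule in text. One reading consistent with the three examples and with the « unique to Select Agent » note is that a subsequence is flagged only when every Best Match is a Select Agent or Toxin. The sketch below encodes that assumption; the function name is hypothetical, and the SA/NSA status of each Best Match is taken as an input.

```python
def flag_subsequence(best_match_is_sat: list[bool]) -> bool:
    """Flag only if there is at least one Best Match and all of them are SAT.

    An empty list (no Best Match found) is not flagged.
    """
    return bool(best_match_is_sat) and all(best_match_is_sat)
```

Under this reading, the Mus musculus example is not a hit, the Lumpy skin disease virus and Sheeppox virus example presumably is, and the Bacillus anthracis and Bacillus cereus example is not, because the match is shared with an organism that is not a Select Agent.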
  44. [Algorithm] : No Best Matches…
  45. [Algorithm] : Points of the Guidance left to interpretation
      How do you identify sequences of concern of 200bp or greater which partially span two adjacent subsequences?
      Problem: no suggestion in guidance
      Solution: extension method
  46-57. [Algorithm] : Extension Method
      [Animated figure across slides 46 to 57. Recoverable labels from slide 52: "Extend to meet possible alignments", "120bp", "80bp", "New subsequence"]
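The figure for the extension method is not preserved, so the following is only a guess at the idea suggested by the surviving labels: when an alignment covers just the end of one subsequence (120 bp in the figure) and continues into the next (80 bp), a new 200 bp subsequence is cut starting where the alignment begins. The function name and its parameters are hypothetical.

```python
def extended_subsequence(sequence: str, alignment_start: int, size: int = 200) -> str:
    """Return a new window of `size` starting where a partial alignment begins.

    `alignment_start` is a 0-based position in the full query sequence.
    """
    if alignment_start < 0:
        raise ValueError("alignment_start must be non-negative")
    return sequence[alignment_start:alignment_start + size]


# Toy query: a 200 bp region of interest (uppercase) starting at position 80,
# so fixed 200 bp chunks split it into 120 bp and 80 bp pieces.
TOY_QUERY = "a" * 80 + "C" * 200 + "a" * 120
```

With `TOY_QUERY`, the first fixed chunk holds 120 bp of the region and the second holds 80 bp, while `extended_subsequence(TOY_QUERY, 80)` returns the region in full so it can be screened as a single fragment.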
  59. [Algorithm] : Recap
  60. Road Map
      I. Current regulations
      II. Sequence screening algorithm: interpreting the guidance
      III. GenoTHREAT: implementation and characterization
          1. Software implementation
          2. Software Characterization
      IV. Conclusions
  61. Using BLAST
      Online BLAST: performs BLAST via the NCBI website interface
        ► Faster per BLAST
        ► Computationally less expensive
        ► Only sequential, due to NCBI restrictions
        ► Lack of privacy
      Local BLAST: performs BLAST in parallel on a local machine
        ► User privacy
        ► Faster per sequence due to parallelization
        ► Computationally expensive (memory and CPU intensive)
  62. Screening time & hardware
      Hardware labels: Online, Desktop, Business Class Server
      Sequence length (bp)   Screening time (min)*
      2,000                  2
      10,000                 12.5
      *Screening performed using business class server
  63. Road Map
      I. Current regulations
      II. Sequence screening algorithm: interpreting the guidance
      III. GenoTHREAT: implementation and characterization
          1. Software implementation
          2. Software Characterization
             i. Database of test sequences
             ii. Keyword list variation
             iii. Detection of potentially dangerous sequences
             iv. BLAST parameters
             v. Real world gene orders simulation
      IV. Conclusions
  64. Database of Test Sequences
      • Implementations must be compared to assess quality
      • Standardized set of test sequences is needed
      • Test Set contains 184 sequences:
        • Select Agents
          o Genes associated with toxins or pathogenicity
          o Genes associated with normal function
        • Model Organisms
  65. Database of Test Sequences
      Contribute to the development of a standard test set of sequences
  67. Keyword and Anti-Keyword list
      • Test with the unmodified sequences (184 sequences)
      • Two lists of keywords
        • Limited
        • Extensive
      • With or without the anti-keyword list
  68. Keyword List Content Variation
      [Bar chart: Correct SAT and Correct NSAT for limited keywords vs. extensive keywords; axis from 0 to 120]
      Keyword list method not mentioned in guidance
      Limited keyword list: composed only of words in the SAT list
      Extensive keyword list: extension of the limited keyword list containing words uniquely related to SAT
  69. Anti-Keywords
  71. Modified Test Sequences
      Modifications performed on the initial unmodified sequences
        ► Intervening sequences
        ► Degenerate sequences
        ► Mutated sequences (BLAST parameters)
  72. Degenerate Sequences
      Potential Danger: Codon optimized nucleotide sequences
      Unmodified Nucleotide: GATTTGGACACTCATTTCACC
      Amino Acid Sequence: DLDTHFT
      Degenerate Nucleotide: GATACGTCAACCTTTTAA GC
      Result: all codon optimized sequences detected due to screening of amino acid sequences
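The reason amino acid screening catches codon-optimized sequences can be illustrated with a small sketch, assuming the standard genetic code. `ORIGINAL` is the unmodified nucleotide sequence from the slide; `RECODED` is a hypothetical synonymous recoding written for this example, not the degenerate sequence from the slide.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from position 0, dropping any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identical_positions(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


ORIGINAL = "GATTTGGACACTCATTTCACC"  # from the slide
RECODED = "GACCTCGATACACACTTTACG"   # hypothetical synonymous recoding
```

Both sequences translate to DLDTHFT even though they differ at several nucleotide positions, so a nucleotide-only comparison sees reduced identity while the protein-level comparison sees an exact match.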
  73. Intervening sequences
      Potential Danger: SAT sequences hidden within larger, benign sequences
      Segments shown: 300bps NSAT, 200bps SAT, 300bps NSAT, 300bps NSAT, 300bps NSAT, 250bps SAT
      Result: All hidden sequences were detected
  75. Mutated sequences
      Potential Danger: mutated, but still active, SAT sequences which do not align to GenBank entries
  76. Nucleotide subsequences
      Result: BLAST parameter settings affect screening capability
  77. Amino-acid subsequences
      Result: BLAST parameters do not clearly change the efficiency of the screening
  78. Nucleotide subsequences
      Result: Direct relationship between screening time and ability to identify mutated sequences
  79. Amino-acid subsequences
  81. Real world gene orders simulation
      Gene synthesis company: a low number of false hits is needed
      1. iGEM registry
         • Registry completed by iGEM teams each year
         • Contains 10,000 sequences
      2. GenoCAD database
         • 1,258 sequences longer than 200 bp
  82. iGEM Registry
      First step: screen registry sequences 1 --> 1724
      Hit rate: 6.5%
      Major causes of hits:
      • 100% query coverage for Best Match too restrictive
      • Some results have 100% query coverage but very low Percent Identity
      • Keyword list issues
      Slide annotations: 95%, 60%, solved, 2.9%
  83. iGEM Registry
  84. GenoCAD database
      • 1,258 sequences
      • 32 hits: 2.54%
      • Manual review:
        • YopH: protein from Y. pestis (gi|14488772)
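The GenoCAD figure can be checked with a one-line calculation. The helper name is hypothetical; the counts are the ones on the slide.

```python
def hit_rate_percent(hits: int, screened: int) -> float:
    """Hit rate as a percentage of screened sequences."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return 100 * hits / screened


GENOCAD_RATE = round(hit_rate_percent(32, 1258), 2)  # 32 hits out of 1,258 sequences
```

`GENOCAD_RATE` evaluates to 2.54, in agreement with the slide. The iGEM slide reports only a rate (6.5% over registry sequences 1 to 1724), so its underlying hit count can only be estimated from the rounded percentage.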
  85. Real world gene orders simulation
      Remaining hits are due to:
      • Very often: 1 subsequence of 1 protein frame leads to a correct hit
        → Is it worth flagging the entire sequence?
      • Sometimes: many subsequences lead to correct hits
        → Probably worth flagging
  87. GenoTHREAT
      • “Best Match”
      • Hardware and software parameters
      • Keyword list
      • BLAST parameters
      • Certain types of sequence modifications
      • High-resolution screen
  88. Guidance conclusion
      Government Guidance potentially usable by companies:
      • Reasonable time
      • Good detection of sequences of concern
      • Number of false hits potentially low (manual review)
  89. Image sources:
      http://www.dagorret.net/2009/12/18/new-technology-developed-by-microsoft-for-photography-dna-image/
      http://www.wadsworth.org/testing/biodefense/education.shtml
  90. © iGEM and Justin Knight.
  94. [Decorative DNA sequence background] http://sourceforge.net/projects/genothreat/
  95. Acknowledgements
      Dr. Jean Peccoud
      Mandy L. Wilson
      The VT-ENSIMAG iGEM team (2010): Michael Kozar, Gaelle Letort, Olivier Mirat, Arunima Srivastava, Tyler Stewart
      My PhD committee: Dr. Bevan, Dr. Garner, Dr. Peccoud, Dr. Ramakrishnan, Dr. Setubal