DNA - based signatures defend against biological warfare agents and their makers



  1. DNA-BASED SIGNATURES DEFEND AGAINST BIOLOGICAL WARFARE AGENTS AND THEIR MAKERS. ARCHANA TIWARI, SUSMIT KOSTA, ROOPESH JAIN. With the end of the Cold War, the threat of nuclear holocaust faded, but another threat emerged: attack by terrorists or even nations using biological agents such as bacteria, viruses, biological toxins, and genetically altered organisms. The former Soviet Union once had a formidable biological weapons program. Now, several countries and extremist groups are believed to possess or to be developing biological weapons that could threaten urban populations, destroy livestock, and wipe out crops. Even terrorists with limited skills and resources could make biological weapons without much difficulty. It's not complex, it's not expensive, and you don't need a large facility. For these reasons, biological weapons have been dubbed the poor man's atomic bomb. Contributing to the ease of making and concealing biological weapons is the dual-use nature of the materials needed to produce such weapons, because they are found in many legitimate medical research and agricultural activities as well. The agents used in biological weapons are difficult to detect and to identify quickly and reliably. Yet early detection and identification are crucial for minimizing their potentially catastrophic human and economic cost. A major objective of biological warfare defense is developing better equipment, both fixed and portable, to detect biological agents. However, any detection system is dependent on knowing the signatures of organisms likely to be used in biological weapons. These signatures are telltale bits of DNA unique to pathogens (disease-causing microbes). Without proper signatures, medical authorities could lose hours or days trying to determine the cause of an outbreak, or they could be treating victims with ineffective antibiotics.
Because of their importance, biological signatures are a key thrust of the effort to improve response to terrorist attacks. Scientific teams that have worked on the problem over the past several years expect to produce species-level signatures for all the most likely biological warfare pathogens. The team also expects to have an initial set of species-level signatures for likely agricultural pathogens, because an attack on a nation's food supply could be just as disruptive as an attack on the civilian population. Modern health and security concerns have raised interest in the real-time detection and identification of pathogenic microbes. Bacterial and viral pathogens have always represented one of the greatest threats to human health, and in recent times this threat has increased due to the possibility of engineered biological agents. For these and other reasons, the genome sequencing field has targeted and sequenced the complete genomes of hundreds of bacteria and thousands of viruses over the past decade, with many more sequences expected to appear in the near future. These sequences now make it possible to develop probe-based assays capable of identifying any of hundreds of organisms in environmental and clinical samples. Such assays rely on detecting a DNA sequence that distinguishes the target organism from all other known bacteria and viruses and from background material, which could include DNA from humans, other animals, plants, or other species. A probe that accurately distinguishes between a target genome (or set of genomes) and all other background genomes is termed a signature sequence. DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species.

Several Levels of Signature

The prime aim is to develop strain-level signatures for the top suspected agents.
Strains are a subset of a species, and their DNA may differ by about 0.1 percent within the species. A species, in turn, is a member of a larger related group (genus), and its DNA may differ by a percent or so from that of other members of the genus. Strain-level signatures are essential for determining the native origin of a pathogen associated with an outbreak; such information could help law enforcement identify the group or groups behind the attack. The biological foundations work aims to provide validated signatures useful to public health and law enforcement agencies as well as classified signatures for the national security community. In developing these signatures, biological foundations researchers are also shedding light on poorly understood aspects of biology, microbiology, and genetics, such as immunology, evolution, and virulence. Increased knowledge in these fields holds the promise of better medical treatments, including new kinds of vaccines. The biological foundations work is one element in DOE's Chemical and Biological Nonproliferation Program. Livermore's component of this work is managed by its Nonproliferation, Arms Control, and International Security Directorate. Other components of the overall program include detection, modeling and prediction, decontamination, and technology demonstration projects. Livermore researchers were among the first to recognize, in the early 1990s, the tremendous potential of detectors based on DNA signatures. "We knew that a lot of work was necessary to
  2. An ideal signature:
     • Has few short regions (<0.1% total)
     • Occurs only in the pathogen (no false positives)
     • Occurs in all variants, or strains (no false negatives)
     • Is necessary for virulence ("unspoofable" in engineered organisms)
     • Tests for antibiotic resistance
     • Provides redundancy

     Figure 1: Bacterial chromosomes (DNA) form loops, unlike human chromosomes, which form strands. In the loop, between two and five million bases of bacterial DNA are screened to locate unique regions (circled), which are marked with primer pairs. The marked regions are amplified thousands of times using polymerase chain reaction technology and then processed to identify and characterize an organism.

     of confidence requires several days; the goal again is to reduce the time to less than 30 minutes. The final signature level, intended primarily for law enforcement use, will permit detailed identification of a specific strain of a pathogen (for example, Yersinia pestis KIM) and correlate that strain with other forensic evidence. Such data will help to identify and prosecute attackers. The typical time lag for results is currently a few weeks, and the goal is to reduce that to a few days.

     Biological scientists assemble a list of natural pathogens most likely to be used in a domestic attack. The list includes bacteria, viruses, and other classes of threats, such as agricultural pathogens. Two extremely virulent pathogens head the list: B. anthracis and Y. pestis, which cause anthrax and plague in humans, respectively. Bacillus anthracis has few detectable differences among its strains, whereas Y. pestis strains can vary considerably in genetic makeup. Unraveling the significant differences between the two organisms will give national laboratory researchers experience vital for facing the challenges of the next few years, as they develop signatures for a wide spectrum of microbes.
Bioterrorism and Biological Warfare

develop the signatures the new detectors would need," says Weinstein. In particular, the researchers recognized several pitfalls. For example, if signatures are overly specific, they do not identify all strains of the pathogen and so can give a false-negative reading. On the other hand, if signatures are based on genes that are widely shared among many different bacteria, they can give a false-positive reading. As a result, signatures must be able, for example, to separate a nonpathogenic vaccine strain from an infectious one.

Several Levels of Identification

To enhance their detection development effort, researchers are exploring advanced methods that distinguish slight differences in DNA. They are using a multidisciplinary approach. In this case, DNA signature development involves a team of microbiologists, molecular biologists, biochemists, geneticists, and computer experts. Researchers are performing suppressive subtractive hybridization to distinguish the DNA of various species of virulent organisms.

Figure: This phylogenetic tree is a simple representation of the bacterial kingdom. All human bacterial pathogens belong to the Gram-positive (red) or Proteobacteria (magenta) divisions. The other divisions consist of nonpathogenic bacteria associated with diverse environments. Biological signatures must be able to differentiate infectious bacteria from hundreds of thousands of harmless ones. Each genus of bacteria has many species, and each species can have thousands of different strains.

Much of the work is focused on screening the two to five million bases that comprise a typical microbial genome to design unique DNA markers that will identify the microbe. The markers, called primer pairs, are typically segments of about 30 bases and bracket specific regions of DNA that are a few hundred bases long (Figure 1).
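The bracketing role of a primer pair can be illustrated with a small in-silico sketch. This is only a toy model, not the method used by the researchers described here: it assumes exact primer matches on a single strand, and the function names `reverse_complement` and `find_amplicon` are hypothetical names introduced for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the region bracketed by a primer pair, or None if either primer fails to match.

    Assumption: primers must match exactly. Real primers tolerate some
    mismatches and are also chosen for melting temperature and specificity.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on this strand
    # its binding site reads as the primer's reverse complement.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Toy template: flank + forward site + 10-base interior + reverse site + flank.
template = "GGG" + "ATGCCGTA" + "T" * 10 + "CAGGTCAA" + "CCC"
forward = "ATGCCGTA"
reverse = reverse_complement("CAGGTCAA")
amplicon = find_amplicon(template, forward, reverse)
```

In a real assay the bracketed region would be a few hundred bases long, as the text states; the short strings here only keep the example readable.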
The bracketed regions are replicated many thousands of times with a detector that uses polymerase chain reaction (PCR) technology. Then they are processed to unambiguously identify and characterize the organism of interest. Different signatures will be needed for different levels of resolution. For example, authorities trying to characterize an unknown material or respond to a suspected act of bioterrorism will begin with fairly simple signatures that flag potentially harmful pathogens within a few minutes. Typically, such a signature would encompass one or two primer pairs and be sufficient for identification at the genus level (Yersinia or Bacillus, for example) or below. A signature in the next level of resolution is needed for unambiguously identifying a pathogen at the species level (Yersinia pestis, for example). This signature involves about 10 primer pairs. Currently, it takes several days to obtain conclusive data for a species-level signature. The goal is to reduce that time to less than 30 minutes. The third signature level is used in pathogen characterization, identifying any features that could affect medical response (for example, harmless vaccine materials versus highly virulent or antibiotic-resistant pathogens). This signature level involves some 20 to 30 primer pairs. Together, the primer pairs offer a certainty of correct identification. Currently, providing such a high level
  3. During the Cold War, the Soviet Union ran several offensive biowarfare programs to develop so-called "Super Bugs." One such program, Project Bonfire, worked to create bacteria that were resistant to about ten varieties of antibiotics (Figure 2). This was done by identifying and cutting out genes that conferred antibiotic resistance in many different strains of bacteria. By pasting these genes into the DNA of the anthrax bacterium, the Project Bonfire researchers created a strain of anthrax that resisted any existing cure, making it impossible to treat.

     Figure 2: Project Bonfire

     The Hunter Program was another Soviet biological warfare research program that focused on combining whole genomes of different viruses to produce completely new hybrid viruses. These artificial viruses could cause unpredictable symptoms that have no known treatment. In an innovative twist, the Hunter Program researchers also created bacteria strains that carried pathogenic viruses inside them (Figure 3).

     Figure 3: Hunter Project

     These strains would be double trouble: a person who contracted the bacterial disease would likely be treated with an antibiotic, which would stop the infection by disrupting the bacterial cells. This would release the virus, resulting in an outbreak of viral disease. Such a scenario would confuse medical personnel, making treatment very difficult.

     Mechanisms of offensive biological warfare: creating harmful biological agents

     In their natural state, bacteria, viruses and fungi can make pretty good biological weapons. Throw some genetic engineering into the mix, however, and more harmful agents can emerge.

     Each of these organisms maintains its genetic information in the form of DNA or, in some viruses, RNA.
This genetic material contains genes, which encode all of the information the organism needs to survive and replicate. Some of these genes govern the organism's pathogenicity, or its ability to infect a cell of a plant or animal. Through genetic engineering, pathogenicity genes may be manipulated to make the organism more infectious, or more resistant to a therapy or cure.
  4. Figure 4: Idaho Technology's R.A.P.I.D. detection unit.

     • The state health department conducts a full investigation to determine whether the incident is an act of biological warfare. To protect themselves from any potentially harmful biological or chemical agents, investigators at the scene are outfitted in protective suits and self-contained breathing apparatus (SCBA) respirators (same as the "SCUBA" gear used underwater).
     • Investigators collect samples from patients and the surrounding environment, then test them for the presence of harmful biological agents. In order to know which agents to test for, investigators evaluate all of the evidence they collect at the scene, including signs and symptoms shown by patients and patterns of disease transmission. Through a process of deduction, investigators can narrow the list of suspected pathogens to just a few candidates.
     • While testing can take place in existing laboratories, it can be performed more quickly in temporary field laboratories, using compact, portable detection units such as Idaho Technology's R.A.P.I.D., which stands for "Ruggedized Advanced Pathogen Identification Device." The R.A.P.I.D. detection unit uses PCR to identify the unique DNA signatures of suspected pathogens (Figure 4). Genomic DNA or RNA extracted from collected samples is added to a cocktail of reagents that will amplify a particular pathogen's DNA signature. If that specific pathogen is present in the sample collected, it will be positively identified using this approach. The entire process from sample preparation to detection takes less than 60 minutes.
     • If a biowarfare incident is confirmed or thought to be probable, the state investigators notify the FBI and local law enforcement agencies immediately. Law enforcement and health officials work together to implement a plan to contain the site of contamination, clean it up and
pinpoint the source of the attack.

Defensive biological warfare: vaccines and detection methods

In 1969, President Richard M. Nixon terminated the U.S. offensive biological warfare program and ordered stockpiles destroyed. The biological warfare research focus shifted from offensive to defensive techniques. Three years later, at the 1972 Biological and Toxin Weapons Convention, more than 100 nations signed a treaty prohibiting the possession of deadly biological agents, except for purposes of defensive research. Nations around the world concentrated on developing vaccines as well as enhancing detection of biological agents. The Soviet Union signed the treaty, but instead of dismantling its offensive program, it stepped up the pace. The Soviet program was not terminated until 1992, after the collapse of the Soviet Union, when Russian President Boris Yeltsin banned all offensive biological weapons-related activity. All biological weapons stockpiles were destroyed and research was considerably downsized, but it is unknown if Russia has completely dissolved the old Soviet program.

Vaccines

Traditionally, vaccines consisted of a preparation of the infectious agent itself, either living, weakened or killed. Introducing the vaccine into the body activates the immune system, resulting in the production of antibodies against that particular agent. If a vaccinated person is later exposed to the infectious agent, he or she will already have built up immunity against it. More recently, researchers have started using fragments of the pathogen's DNA genome as a vaccine, rather than the entire organism. This approach helps eliminate the risk of infection that comes with using traditional vaccines.
Detection methods

While vaccination helps protect a population from known infectious agents, rapid detection of a suspected act of biowarfare allows fast action to be taken to control the spread of disease. Current detection methods take advantage of the fact that each biological agent maintains its own unique DNA signature. Rapid detection methods use a technique called Polymerase Chain Reaction (PCR) to make a billion copies of a single DNA strand within minutes. This method positively identifies an infectious agent, by means of its DNA signature, using even the tiniest samples.

Putting It All Together

The Dark Winter project gave us a glimpse of how a biowarfare scenario might unfold. But what is the government's planned response to such a scenario?

  • Although it may be difficult to confirm right away that an unusual illness in a community is caused by a biological attack, the local health officer is immediately notified. It is this person's responsibility to inform the state health department, which in turn notifies the federal Centers for Disease Control and Prevention (CDC).
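The "billion copies" figure quoted under Detection methods follows from the doubling behaviour of PCR. The sketch below is a simplified model, not a description of any particular instrument: it assumes each thermal cycle multiplies the copy count by (1 + efficiency), with efficiency 1.0 meaning perfect doubling, and the function names are hypothetical.

```python
import math


def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Copies present after a number of cycles, assuming constant per-cycle efficiency."""
    return start_copies * (1 + efficiency) ** cycles


def cycles_to_reach(target_copies: float, start_copies: int = 1, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles needed to reach at least target_copies."""
    return math.ceil(math.log(target_copies / start_copies) / math.log(1 + efficiency))


# With perfect doubling, one strand passes a billion copies at cycle 30,
# because 2**30 = 1,073,741,824.
cycles_needed = cycles_to_reach(1e9)
copies_at_30 = copies_after(30)
```

Under this idealised model about 30 cycles suffice; real reactions run below perfect efficiency, so the `efficiency` parameter can be lowered to see how many extra cycles that costs.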
  5. simultaneously analyze 96 strains of DNA. The technique is also being used to aid the poultry industry by providing a handy way to detect Salmonella enteritidis. This bacterium can cause illness if eggs are eaten raw or undercooked. Subtractive hybridization results have been so successful that the signature can now be used to distinguish between subtypes of the salmonella bacterium. In addition to the DNA-based pathogen detection methods, researchers are developing detection capabilities using antibodies that can tag a pathogen by attaching to a molecular-level physical feature of the organism. Antibody assays are likely to play an important role in pathogen detection because they are generally fast and easy to use (commercial home-use medical tests use this form of assay). Researchers are working to improve these detection methods as well. One study concerns a bacteriophage (bacteria-killing virus) that attacks only Y. pestis and none of its cousins. It was discovered that the virus produces a unique protein component to attach to the bacterium's cell wall at a certain site and gain entry. Recognizing the distinct site could form the basis of a foolproof antibody signature. If this can be achieved with Y. pestis, it may be possible to do it with other pathogens.

     Sensing Virulence

     As more information about pathogens and their disease mechanisms becomes available and as genetic engineering tools to transplant genes become cheaper and simpler to use, the threat of genetically engineered pathogens increases. Biodetectors must be able to sense the virulence signatures of genetically engineered pathogens, or they will be blind to an entire class of threats. The ultimate objective is to identify several specific virulence factors that might be used in engineered biological warfare organisms so that we can detect these engineered organisms and break their virulence pathway. One key factor useful for detecting
engineered organisms is an antibiotic resistance gene. When transplanted into an infectious microbe, the gene could greatly increase the effectiveness of a biological attack and complicate medical response. Some antibiotic resistance genes are widely shared among bacteria and are easily transferred with elementary molecular biology methods. In fact, a standard biotechnology research technique is introducing antibiotic resistance genes into bacteria as an indicator of successful cloning. Such genes need to be recognized rapidly so that the medical response is appropriate. Another telltale indication of genetic tampering is the presence of virulence genes in a microbe that should not contain them. Virulence genes are often involved in producing toxins or molecules that cause harm or that simply evade a host defense. If a series of genes is available to perform their functions at the right time, they could cause real damage. If interfering with the action of one of these genes or its proteins interrupts the virulence pathway, the disease process can be halted. Identifying and characterizing important virulence genes and determining their detailed molecular structure will greatly aid the development of vaccines, drugs, and other medical treatments. As an example, Y. pestis disables the immune system in humans by injecting proteins into macrophages, one of the body's key defenders against bacterial attack.

Figure 5: Two extremely virulent organisms head the list of pathogens most likely to be used by terrorists: B. anthracis (left) and Y. pestis (right), which cause anthrax and plague in humans, respectively.

Focus on Plague
The main focus is on Y. pestis, Francisella tularensis (a bacterium causing a plague-like illness in humans), and several other microbes that threaten human and animal health. Eleven species and many thousands of strains belong to the Yersinia genus. The most notorious species, Y. pestis, causes bubonic plague and is usually fatal unless treated quickly with antibiotics. The disease is transmitted by rodents and their fleas to humans and other animals. The seemingly subtle DNA differences among many Yersinia species mask important differences. One species causes gastroenteritis, another is often fatal, and a third is virtually harmless; yet all have very similar genetic makeup. Researchers use insertion sequence-based fingerprinting to understand these slight genetic differences. Insertion sequences are mobile sections of DNA that replicate on their own. Analyzing for their presence will not only help refine signatures for Y. pestis but also shed light on how microorganisms evolve into strains that produce lethal toxins. This understanding, in turn, should give ammunition to researchers seeking an antidote or vaccine. To better understand the genetic differences among species and strains, researchers are comparing the genetic complement of Y. pestis with that of another member of the Yersinia group (pseudotuberculosis) that causes an intestinal disease. The two are closely related, and yet they cause such different diseases.

Better and Faster, with More Uses

There are a number of methods that allow more rapid identification and characterization of unique segments of DNA. Each method has advantages and drawbacks, with some more applicable to one organism than another. In addition to the insertion sequence method, another promising technique is called suppressive subtractive hybridization. The method takes an organism and its near neighbor, hybridizes the DNA from both, and determines the fragments not in common as the basis of a signature. One goal is to
  6. Because the protein acts as an immunosuppressant to disable the macrophage, understanding its structure not only would help scientists fashion a drug that physically blocks the protein but also would shed light on autoimmune diseases such as arthritis and asthma.

     Virulence Genes in Common

     Virulence genes spread naturally among pathogens and thus are also found in unrelated microbial species. Therefore, virulence genes alone are not sufficient for species-specific DNA-based detection. One aim is to differentiate the virulence genes in natural organisms from those in engineered organisms. Researchers are using different methods for distinguishing virulence genes from among the thousands of genes comprising the genomes of pathogens. One technique looks for genes that start making proteins at the internal temperatures of mammals. For example, there are genes of Y. pestis that become much more active at 37°C. It seems a safe bet that many of these genes are associated with the bacterium multiplying within a warm-blooded host. Another line of work concerns the sequence of the three plasmids (bits of DNA located outside the microorganism's circular chromosome) that contain most of the virulence genes required for full development of bubonic plague in animals and humans. Plasmids sometimes transfer their genes to neighboring bacteria in what is called lateral evolution. (Antibiotic resistance genes are also located on plasmids.) The Y. pestis strain that causes bubonic plague, for example, may have evolved some 20,000 years ago. Such understanding is relevant to HIV, which may not have become infectious for humans until the 20th century.

     Working with End Users

     There needs to be a strong relationship between development of biological signatures and detection technologies and their end uses. One aim is to make diagnostic tools available to regional public health agencies and thus create a national mechanism for responding quickly to bioterrorism threats.
Currently, many health agencies use detection methods that are not sufficiently sensitive, selective, or fast. For example, one culture test for detecting anthrax takes two days. Major damage and even death may have occurred in that time. DNA signatures will be thoroughly validated before being released, because their use might lead to evacuations of subways, airports, or sporting events, and such evacuations cannot be undertaken lightly. As part of the validation effort, researchers are characterizing natural microbial backgrounds to make sure that the signatures are accurate under actual conditions. To that end, they are collecting background microbial samples in air, water, and soil, as well as in human blood, urine, and saliva. B. anthracis is related to B. thuringiensis, a naturally occurring harmless microbe that lives in dirt and can give a false-positive reading for anthrax if the signature used is not adequately specific. The characterization effort is being aided by a device called the Gene Chip. The device simultaneously monitors the expression of thousands of genes. Equally important, the researchers envision a strong mechanism linking biomedical scientists with public health and law enforcement officials to develop new signatures speedily and cost-effectively to stay several steps ahead of terrorists.

DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Here we describe Insignia, a new, comprehensive system for the rapid identification of signatures in the genomes of bacteria and viruses.
With the availability of hundreds of complete bacterial and viral genome sequences, it is now possible to use computational methods to identify signature sequences in all of these species, and to use these signatures as the basis for diagnostic assays to detect and genotype microbes in both environmental and clinical samples. The success of such assays critically depends on the methods used to identify signatures that properly differentiate between the target genomes and the sample background. We have used Insignia to compute accurate signatures for most bacterial genomes and made them available through our Web site. A sample of these signatures has been successfully tested on a set of 46 Vibrio cholerae strains, and the results indicate that the signatures are highly sensitive for detection as well as specific for discrimination between these strains and their near relatives. Our approach, whereby the entire genomic complement of organisms is compared to identify probe targets, is a promising method for diagnostic assay development, and it provides assay designers with the flexibility to choose probes from the most relevant genes or genomic regions. The Insignia system is freely accessible via a Web interface and has been released as open source software at: http://insignia.cbcb.umd.edu.

Occurrence and Expression of Insignia

Modern health and security concerns have raised interest in the real-time detection and identification of pathogenic microbes. Bacterial and viral pathogens have always represented one of the greatest threats to human health, and in recent times this threat has increased due to the possibility of engineered biological agents. For these and other reasons, the genome sequencing field has targeted and sequenced the complete genomes of hundreds of bacteria and thousands of viruses over the past decade, with many more sequences expected to appear in the near future. These sequences
now make it possible to develop probe-based assays capable of identifying any of hundreds of organisms in environmental and clinical samples. Such assays rely on detecting a DNA sequence that distinguishes the target organism from all other known bacteria and viruses and from background material, which could include DNA from humans, other animals, plants, or other species. A probe that accurately distinguishes between a target genome (or set of genomes) and all other background genomes is termed a signature sequence.
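The definition of a signature sequence given above can be written as a simple predicate. The following is a minimal sketch under strong assumptions, not the Insignia implementation: genomes are plain strings, matching is exact substring search on one strand only, and `is_signature` is a hypothetical name.

```python
def is_signature(probe: str, targets: list[str], background: list[str]) -> bool:
    """True if the probe occurs in every target genome and in no background genome.

    Assumptions: exact matching on a single strand. A practical check would
    also consider the reverse-complement strand and near matches.
    """
    in_all_targets = all(probe in genome for genome in targets)
    in_no_background = not any(probe in genome for genome in background)
    return in_all_targets and in_no_background


# Toy sequences standing in for genomes.
targets = ["AACGTTGCA", "TTCGTTGCC"]
background = ["AAAATTTT", "GGGCCCAA"]

conserved_and_unique = is_signature("CGTTGC", targets, background)   # in both targets, no background
missing_in_a_target = is_signature("AA", targets, background)        # absent from second target
found_in_background = is_signature("TT", targets, background)        # also in a background genome
```

The two failing probes correspond to the two error modes discussed in the chapter: a probe missing from some target strain risks a false negative, and a probe present in the background risks a false positive.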
  7. anthracis whose 16S rRNA sequences are identical [Keim et al., 1999, 2000]. Although these methods are effective, they only provide a limited number of signatures, which are not always sufficient to identify bacteria or viruses in a new sample; in particular, if the sample contains an unknown strain, it might contain genetic variability in precisely the region for which assays are designed. Thus, in general, one would like to have as many assays available as possible. Insignia addresses this by using the complete genome to generate all unique signatures, from which the assay designer can choose those that are best suited for a particular application.

     Recent increases in the amount of available genomic sequence have made it possible to largely automate the design and screening of probes via computational search algorithms. Large-scale computational prediction of DNA signatures was first undertaken for the Biological Aerosol Sentry and Information System (BASIS), deployed at the Salt Lake City Olympic Games in 2002 [Fitch et al., 2002, 2003]. The related BioWatch project operates by collecting and analyzing airborne microbial samples for known pathogens, using PCR probe-based detection methods. Newer aerosol detection systems, such as the Autonomous Pathogen Detection System (APDS) [McBride et al., 2003], automate the process and can identify a known bioweapon in 0.5 to 1.5 hours [Brown, 2004]. Similar techniques are not limited to aerosols, and can be used in clinical or agricultural settings [Lim, 2005].

     The success of these assays depends on both the available sequence databases and the computational methods used to identify signatures that differentiate the threat organisms from the background. Signature design for both BASIS and BioWatch was handled by
Lawrence Livermore National Laboratory (LLNL), and what began as a simple proof-of-concept BLAST search at LLNL evolved into the sophisticated KPATH signature pipeline [Slezak et al., 2003]. KPATH identifies sequences shared by a collection of target genomes, yet unique with respect to all other microbial genomes, and is notable for its ability to handle such a large search space. Other methods for probe selection more rigorously address hybridization efficiency (binding energy, self-hybridization, etc.), but do not scale well for large target and background sets [Kaderali and Schliep, 2002; Gordon and Sensen, 2004; Nordberg, 2005; Li and Stormo, 2001]. Most notable are the approaches that promise the scalability of KPATH combined with the hybridization considerations of the other methods [Tembe et al., 2007; Rahmann, 2003].

Because of its history of use in real-world diagnostic systems, a more detailed description of KPATH is warranted. It consists of four major components. First, a whole-genome multi-alignment is performed on a set of target genomes. This produces a "consensus gestalt," which represents the sequences that are conserved in all the target genomes. Next, this consensus is matched against a database of background sequences using Vmatch [Kurtz, 2003]. This step computes all exact matches between the target consensus and

By the definition, a signature sequence must be conserved among a set of target genomes and dissimilar to any sequence in the surrounding environment. To detect a target with existing technology such as qPCR assays, signatures must be relatively short; however, if they are too short, they will not be unique. For example, because there are only 4^10 (about 1 million) 10-bp (base-pair) sequences, and a typical bacterial genome is more than 1 million bp in length,
most 10-mers will be shared by many genomes and therefore make unsuitable signatures. Increasing the length, k, of the signature alleviates this problem, but if k is too large, it may not be possible to find a signature shared by a set of target genomes. Therefore, there is a tradeoff between signature sensitivity (the number of genomes that share the signature) and specificity (the number of genomes that do not possess the signature). For instance, a long signature may be highly specific to a particular strain or isolate, but it may not be sensitive enough to detect closely related strains that might cause the same disease or have other shared phenotypic characteristics. Because genomic sequence is nonrandom, and only a small sample of genomes has been sequenced, it is difficult to estimate an optimal signature length. In practice, signature length is usually determined by the constraints of the detection technology (e.g., about 20 bp for PCR primers).

Current probe-based technologies are generally based on either PCR or microarray hybridization. These methods are beginning to replace traditional gel-based fingerprinting because they can more effectively differentiate between closely related microbes [Willse et al., 2004]. Microarray methods are particularly promising because of their ability to multiplex many probes on a single chip [Willse et al., 2004; Wang et al., 2002; Volokhov et al., 2004], improving both the redundancy and capabilities of the diagnostic. PCR does not multiplex as nicely; however, it remains popular because of its robustness, speed, and low cost [Slezak et al., 2003; O'Connell et al., 2006; Moser, 2006]. Unlike restriction fingerprinting, both PCR and microarray methods require explicit knowledge of the underlying DNA sequence, therefore necessitating probe design.
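The length tradeoff described above can be made concrete with a small sketch. It is a naive illustration only, not the KPATH or Insignia method: it enumerates k-mers with Python sets, and the function names (`kmers`, `naive_signatures`) and the toy sequences are invented for this example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def naive_signatures(targets, backgrounds, k):
    """k-mers present in every target and absent from every background."""
    shared = set.intersection(*(kmers(t, k) for t in targets))
    in_background = set().union(*(kmers(b, k) for b in backgrounds))
    return shared - in_background


# Size of the 10-mer space quoted in the text: 4 letters, 10 positions.
print(4 ** 10)  # 1048576

# Toy sequences (hypothetical, chosen only to show the effect of k).
targets = ["ACGTACGGTTCA", "TTACGTACGGTA"]
backgrounds = ["ACGTTTGGTTCA"]
for k in (3, 6, 9, 10):
    print(k, sorted(naive_signatures(targets, backgrounds, k)))
```

With these toy inputs, some short k-mers that the targets share also occur in the background and are rejected, while at k = 10 the two targets no longer share any k-mer at all. Set-based enumeration like this would not scale to real genome collections; it only shows the sensitivity and specificity criteria acting on the same candidates.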
Traditional probe design strategies have focused on single genes or other loci that are determined a priori to be useful in distinguishing one target organism from another. Examples include genes that are associated with phylogenetic distance (e.g., 16S rRNA genes) and variable number tandem repeats (VNTRs). In the former case, where the gene or locus is conserved among target and nontarget organisms, gene sequence alignments would be used to aid in probe design. Probes would then be manually designed and screened for sensitivity and specificity to the target. Those assays failing to identify all target organisms, or producing false positives, would be invalidated and the design revised. This manual screening made diagnostic assay design expensive and only worth doing for a few select pathogens. Alternatively, variable number tandem repeats (VNTRs) have proven very useful in classifying and distinguishing many closely related strains of bacteria, such as Bacillus
  8. 8. the matches may take days to compute, the signatures can be extracted from this cached information in seconds.

Match Pipeline

The function of the match pipeline is to identify exact matches between all pairs of target and background sequences in the database. The size of the Insignia sequence database is currently about 60 billion nucleotides, and even with the linear-time algorithms described below, this is too large to search in real time. Some computational effort is saved by limiting targets to microbial genomes only, but the process of matching all pairs of target and background genomes remains expensive.

To complete the matching phase within a reasonable amount of time, all exact matches of 18 bp or longer are first identified using MUMmer [Delcher et al., 1999; Delcher et al., 2002; Kurtz et al., 2004], a linear time and space suffix tree matching algorithm. To expedite the process, MUMmer searches are partitioned across a 192-node Linux cluster. Even with the use of an efficient search algorithm, however, the size of the database and the high repeat content of many genomes cause the size of the output (the number of matches between all pairs of genomes) to reach unmanageable levels (e.g., the number of matches can be quadratic with respect to the size of the genomes). To combat this problem, matches are converted to a minimalized "match cover" data structure, described next. This structure saves substantial space and later provides a convenient mechanism for computing signatures.

The match cover is not a lossless conversion, however, because it discards information about where a match occurred in the background. The information is nonetheless sufficient for signature computation, where it suffices to know which regions of a target are unique.
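One way to picture the match cover is as a list of target intervals from which nested intervals have been removed. The sketch below is a reading of the description above, not the Insignia implementation: the function names, the half-open coordinates, and the example intervals are all assumptions made for illustration. Overlapping intervals are deliberately not merged, because a k-mer that spans two overlapping matches is not contained in either one and so is still unique with respect to the background.

```python
def match_cover(matches):
    """Keep only maximal target intervals; drop any interval nested in another.

    `matches` holds (start, end) target coordinates, end exclusive.
    Background coordinates are not stored at all.
    """
    cover = []
    # Sort by start ascending, then end descending, so that a containing
    # interval is always seen before the intervals nested inside it.
    for start, end in sorted(set(matches), key=lambda m: (m[0], -m[1])):
        if not cover or end > cover[-1][1]:
            cover.append((start, end))
    return cover


def unique_kmer_starts(target_len, cover, k):
    """Start positions of k-mers not fully contained in any cover interval."""
    starts = []
    j = 0
    for i in range(target_len - k + 1):
        # Skip intervals that end before this k-mer ends; they cannot
        # contain this k-mer or any later one.
        while j < len(cover) and cover[j][1] < i + k:
            j += 1
        covered = j < len(cover) and cover[j][0] <= i
        if not covered:
            starts.append(i)
    return starts


# Hypothetical matches on a 40-bp target.
cover = match_cover([(0, 20), (5, 15), (10, 30), (10, 30), (35, 40)])
print(cover)
print(unique_kmer_starts(40, cover, 18))
```

Because both starts and ends increase along the cover, the scan in `unique_kmer_starts` moves a single pointer forward and runs in time linear in the target length plus the cover size, which is in the spirit of the linear-time claim in the text.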
Furthermore, by excluding irrelevant background match positions, large background databases can be accommodated without drastically increasing the match cover size, and draft quality genomic sequences can be incorporated without difficulty. As the next section will show, the match cover encapsulates all the necessary information for signature discovery and allows for the rapid construction of signatures for any set of target and background genomes in linear time.

For perspective, it is worth mentioning that the match cover is an equivalent, interval representation of matching statistics [Chang & Lawler, 1994; Gusfield, 1991]. Both formalizations represent the longest contiguous match beginning at any position of a sequence, but our interval representation is space-efficient and easier to interpret in the context of signature discovery. Rahmann also leverages the properties of matching statistics in describing a "jump list" for the discovery of DNA probes [Rahmann, 2003], and it is interesting to note that although the match cover and jump list were arrived at independently, they are analogous given their shared utilization of matching statistics.

the background. Matching sequences are masked out to create a "uniqueness gestalt," which represents all sequences that are shared between target genomes and unique with respect to the background. Third, signature sequences are supplied to the Primer3 program [Rozen and Skaletsky, 2000], which designs PCR assays based on those sequences. Primer3 produces a set of oligos suitable for testing by a TaqMan PCR assay: a forward primer, a reverse primer, and an intervening probe oligomer [Livak et al., 1995]. Finally, assay candidates are screened using BLAST [Altschul et al., 1990] for near matches that might disrupt the hybridization process, and ranked according to their satisfaction of PCR experimental constraints.
The result of this four-stage process is a set of ranked, prescreened assays, which are then subjected to rigorous laboratory validation. The transition to these computational methods from previously manual design methods has resulted in greatly increased design efficiency by limiting the number of assays that fail during laboratory validation.

In addition to the computational restrictions, limitations of TaqMan PCR have been demonstrated for rapidly diverging target genomes, such as hepatitis and HIV viruses [Gardner et al., 2004; Gardner et al., 2003]. However, for typical bacterial targets, TaqMan assays remain one of the most rapid and sensitive methods for signature detection. In the case where TaqMan is inadequate, different detection technologies, such as chip-hybridization methods, could be used to remove the TaqMan requirement for three adjacent probes and to provide greater signature redundancy. Insignia would easily support the design of such assays.

Viruses pose significant challenges for all detection methods because of their small genomes and high mutation rates. The Insignia database contains thousands of viral genomes; however, for large target sets there are often no conserved signatures. To address highly divergent targets, future Insignia versions may include the ability to identify signatures with degenerate bases, for cases where no exact signature is shared between them. An alternative is to compute the minimum signature set, where each signature might not identify every target, but the set contains at least one identifying signature for each target. This approach is particularly suited for chip assays where signatures can be multiplexed. A related approach selects combinations of non-unique probes, such that certain viral strains can be identified by their hybridization pattern [Urisman et al., 2005]. Insignia support for specialized viral diagnostics is left for future work.
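The minimum signature set mentioned above is an instance of the set cover problem, so one plausible way to approximate it is the standard greedy heuristic. The sketch below is that generic heuristic, not a method described in the text: the function name, the signature labels, and the strain labels are hypothetical, and the greedy result is not guaranteed to be the true minimum.

```python
def greedy_signature_set(coverage, targets):
    """Pick signatures until every target is detected by at least one.

    `coverage` maps a signature name to the set of targets it identifies.
    """
    remaining = set(targets)
    chosen = []
    while remaining:
        # Take the signature that detects the most still-uncovered targets;
        # sorting the names first makes tie-breaking deterministic.
        best = max(sorted(coverage),
                   key=lambda s: len(coverage[s] & remaining),
                   default=None)
        if best is None or not (coverage[best] & remaining):
            raise ValueError(f"no signature detects: {sorted(remaining)}")
        chosen.append(best)
        remaining -= coverage[best]
    return chosen


# Hypothetical coverage table for five viral strains.
coverage = {
    "sigA": {"s1", "s2", "s3"},
    "sigB": {"s3", "s4"},
    "sigC": {"s4", "s5"},
    "sigD": {"s5"},
}
print(greedy_signature_set(coverage, ["s1", "s2", "s3", "s4", "s5"]))
```

In this toy table no single signature covers all five strains, yet two signatures together do, which is the situation the text describes for multiplexed chip assays.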
Insignia provides real-time signature retrieval for an arbitrary set of target and background genomes. This requires the vast majority of computational work to be done in advance and cached, so that a minimum amount of computation is necessary at the time of the query. To accommodate this, Insignia is designed as two separate components: the match pipeline and the signature pipeline. This distinction separates the computationally intensive matching step from the much simpler signature generation step, and allows sequence matches to be recomputed offline as new genomes become available. While
  9. 9. Signature Pipeline

The function of the signature pipeline is to generate valid signatures for any set of target and background genomes. Because there are thousands of possible targets and many more backgrounds, combinatorics rules out the pre-computation of all signatures; however, it is possible to generate signatures from the match information with minimal overhead. The pipeline for doing so is divided into two parallel stages, corresponding to the two primary criteria a valid signature must meet:
1. a signature must be shared by all genomes in the target set; and
2. a signature must not exist in any genome in the background set.

Occurrence and Expression of MannDB

MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from GenBank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the GenBank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided.
MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bio-terrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive and extensive reports. Access to MannDB is freely available at http://manndb.llnl.gov/.

MannDB was created to meet a need for rapid, comprehensive sequence analysis with an emphasis on protein processing, surface characteristics, and functional classification to support selection of pathogen or virulence-associated proteins suitable as targets for driving the development of protein-based reagents (e.g., antibodies, non-natural amino-acid ligands, synthetic high affinity ligands) for pathogen detection [Slezak et al., 2003; Zhou et al., 2005]. Because comprehensive analyses of this type required using a large number of open-source tools, and because it was necessary to scale the computations for analysis of whole proteomes, we built a fully automated system for executing sequence analysis tools and for storage, integration, and display of protein sequence analysis and annotation data. In order to be able to rapidly examine and compare whole bacterial and viral proteomes for selection of suitable target proteins for bio-defense applications, we compiled data for whole proteomes from representative organisms from all categories of biological threat agents listed by several governmental agencies: APHIS, CDC, HHS,
USDA, USFDA, NIAID, and WHO [APHIS Agricultural Select Agent Program select agent and toxin list; CDC bioterrorism agents/diseases list; HHS and USDA select agents and toxins list; USFDA Bad Bug Book; NIAID category A, B and C priority pathogens; WHO list of major zoonotic diseases; WHO list of diseases covered by the Epidemic and Pandemic Alert and Response (EPR)] as well as taxonomic near-neighbor species as appropriate. Therefore, the scope of MannDB is automated sequence analysis and evidence integration for proteins from all currently recognized bio-threat pathogens. Emphasis is placed upon analyses that are most useful in characterizing potential protein targets and surface motifs that could be exploited for development of detection reagents. The content of MannDB is updated on a regular basis.

In recent years several software systems and accompanying databases have been developed for microbial genome annotation, each with a particular emphasis [Andrade et al., 1999; Frishman et al., 2001; Gattiker et al., 2003; Goesmann et al., 2005; Markowitz et al., 2005; Meyer et al., 2003; Vallenet, 2006; Van Domselaar et al., 2005]. Some databases place an emphasis on gene prediction and DNA-based analyses vs. protein sequence-based analyses, or provide automated (primary) vs. curated (secondary) annotations. Although microbial annotation databases frequently include predictions of biological, chemical, structural, and physical properties of proteins (e.g., antigenicity, post-translational modifications, hydrophobicity, membrane helices), none currently offers the comprehensive suite of analyses (see MannDB website for complete list of tools) contained within MannDB for characterizing viral as well as bacterial proteins from human and agricultural/veterinary pathogens of interest to the bio-defense community and for rapidly identifying putative virulence-associated proteins for development of functional assays.
The MannDB database was built and linked to MvirDB [MvirDB microbial virulence database] in order to meet these requirements. In addition, we focus on sequence analyses that assist in selection of protein features (e.g., surface characteristics) most suited for targeting detection reagent development.

Construction and Content

MannDB is implemented as an Oracle 10g relational database. The schema for MannDB data organization is available on the website. MannDB captures
  10. 10. [Fig. 6: whole-proteome sequence analysis process pipeline; figure content not recoverable from the scan]

results from our fully automated, high-throughput, whole-proteome sequence analysis process pipeline, depicted in Fig. 6. Proteomes (lists of hypothetical and known proteins) representing human bacterial and viral pathogens and near-neighbor species are downloaded from GenBank and parsed into MannDB. Whenever possible, we begin with gene calls on finished genomes. However, the system can be used to predict genes on draft genomes, and can be used to analyze arbitrary lists of protein sequences. Reference genomes are updated on a quarterly basis to ensure that the software tools are being run on current sequence data.
Annotations from SwissProt are downloaded when GenBank entries contain SwissProt identifiers, or when identical sequences are detected by blasting MannDB entries against the SwissProt protein fasta database. MannDB contains at least one reference genome for each category of pathogen listed as a bio-threat organism on websites maintained by APHIS, CDC, HHS, USDA, USFDA, NIAID, and WHO.

Open-source tools are run either on local systems or by means of batch submission to external servers. As of this writing the system executes 36 tools, which are listed on the MannDB web site. Automated sequence analyses include predictions of post-translational modifications, structural conformation, chemical properties, functional assignment, and antigenicity, as well as motif detection and pre-computed BLAST against protein and nucleic acid sequences in MvirDB, our database of microbial virulence factors, protein toxins, and antibiotic resistance genes [MvirDB microbial virulence database]. Tools that are run in-house are updated periodically to ensure that the system is running the most recent software versions against the most recent data sets. Tools are selected, and input parameters are set, according to the taxon of the organism from which the protein set is constructed. For example, some tools (e.g., NetPicoRNA [Blom et al., 1996]) are run only on specific organisms, whereas others (e.g., SignalP [Bendtsen et al., 2004]) have taxon-specific settings. In some cases we run more than one tool for a similar prediction. TMHMM and TopPred both predict membrane helices, but results may differ, for example, in the start and end residues for a given segment. Our strategy is to employ more than one tool, when available, so that conflicting results can be noted and evaluated by the user.
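The idea of running two predictors and surfacing their disagreements can be sketched as a simple segment comparison. This is only an illustration of the strategy described above: it does not parse real TMHMM or TopPred output, and the function name, the (start, end) residue tuples, and the example coordinates are all hypothetical.

```python
def compare_segments(pred_a, pred_b, tolerance=0):
    """Pair overlapping segments from two tools and flag boundary differences.

    Each prediction is a list of (start, end) residue positions, inclusive.
    Returns (segment_a, segment_b, status) tuples for the user to review.
    """
    report = []
    unmatched_b = list(pred_b)
    for a_start, a_end in pred_a:
        # First not-yet-paired segment from tool B that overlaps this one.
        partner = next(
            (b for b in unmatched_b if b[0] <= a_end and a_start <= b[1]),
            None,
        )
        if partner is None:
            report.append(((a_start, a_end), None, "only in A"))
            continue
        unmatched_b.remove(partner)
        shift = max(abs(a_start - partner[0]), abs(a_end - partner[1]))
        status = "agree" if shift <= tolerance else "boundaries differ"
        report.append(((a_start, a_end), partner, status))
    report.extend((None, b, "only in B") for b in unmatched_b)
    return report


# Hypothetical membrane-helix predictions for one protein.
tool_a = [(12, 34), (50, 72), (90, 110)]
tool_b = [(12, 34), (53, 70)]
for row in compare_segments(tool_a, tool_b):
    print(row)
```

Nothing is resolved automatically here; conflicting rows are reported as they are, which matches the stated intent of leaving the evaluation to the user.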
In parsing results from each tool, data are inserted into one of nine tables (see schema on web site) depending on the type of prediction (e.g., protein chemistry); tools that make similar predictions tend to produce similarly structured output (although formatting differs considerably), which facilitates data storage and retrieval.

A web client browser enables viewing of automated analysis results, annotations, and links to MvirDB. The user first selects a proteome, then a specific protein for which to view summary results, and finally selects the specific categories of analysis to be viewed. Only analyses returning results are displayed. Hyperlinks to external data sources are provided for additional information whenever external database identifiers are returned. The MannDB
  11. 11. tool set includes a BLAST interface, which can be used to quickly identify an entry of interest by its sequence, when the gene name or locus tag is unknown, or to identify protein sequences related to a sequence of interest. A query tool allows the user to construct 3 types of searches: 1) free-text searches against ten database fields that contain descriptive information, including fields containing gene names or external database identifiers; 2) structured searches against specific analysis types; and 3) a search for proteins linked to entries in MvirDB either by common unique identifier or by pre-computed blast homology. Reports and results sets from the query tool can be downloaded into Excel.

Zhou et al., BMC Bioinformatics 2006, 7:459, doi:10.1186/1471-2105-7-459

Utility

MannDB provides users with pre-computed sequence analyses for complete proteomes of bacterial and viral pathogens from several governmental agencies' lists of bio-threat agents. The genomes and tools are maintained up to date, with predictions being re-run every 3 months. The user can browse proteomes, or can blast sequences against MannDB to pull up related entries and associated data. MannDB provides a convenient source of automated sequence analyses and downloaded annotation information for whole proteomes of human pathogenic bacteria and viruses and has a high degree of integration with external databases.

MannDB provides sequence analysis information of primary interest to researchers in the bio-defense community. We have been using MannDB for several years to "annotate" DNA signatures [Slezak et al., 2003] and more recently to assist collaborators in efforts to down-select from whole bacterial and viral genomes to identify suitable protein targets and protein features for driving the development of detection reagents [Zhou et al., 2005].
For example, a common requirement for a detection assay is that it be performed with minimal sample disruption. Therefore, an initial down selection for proteins expected to be on the surface of a bacterial particle might entail identification of proteins that are predicted to be secreted or membrane bound by using tools such as PSORT [Gardy et al., 2005; Nakai and Horton, 1999], TMHMM [Krogh et al., 2001], SignalP, TargetP [Emanuelsson et al., 2000], TopPred [Claros et al., 1994], and HMMTOP [Tusnady and Simon, 1998]. Having results from several tools that provide similar predictions but use different algorithms or slightly different approaches allows us to compare predictions and make selections with greater confidence. Identification of surface features for targeting of detection reagents is done primarily by means of additional sequence- and structure-based analyses [Zhou et al., 2005], although predictions pertaining to post-translational modifications (e.g., glycosylation, cleavage) are taken into consideration as they may affect protein recognition.

Availability and Requirements

MannDB is freely accessible at http://manndb.llnl.gov/. Although the software that populates and updates MannDB is not open-source, the user may request collaborative sequence analysis services by contacting Wi group@kpath.llnl.gov.

List of abbreviations

BLAST = Basic local alignment search tool.
APHIS = Animal and Plant Health Inspection Service.
CDC = Centers for Disease Control and Prevention.
HHS = Health and Human Services.
USDA = United States Department of Agriculture.
USFDA = United States Food and Drug Administration.
NIAID = National Institute of Allergy and Infectious Diseases.
WHO = World Health Organization.
Comparative genomics tools applied to bioterrorism defence

Rapid advances in the genomic sequencing of bacteria and viruses over the past few years have made it possible to consider sequencing the genomes of all pathogens that affect humans and the crops and livestock upon which our lives depend. Recent events make it imperative that full genome sequencing be accomplished as soon as possible for pathogens that could be used as weapons of mass destruction or disruption. This sequence information must be exploited to provide rapid and accurate diagnostics to identify pathogens and distinguish them from harmless near-neighbours and hoaxes. The Chem-Bio Non-Proliferation (CBNP) programme of the US Department of Energy (DOE) began a large-scale effort of pathogen detection in early 2000 when it was announced that the DOE would be providing bio-security at the 2002 Winter Olympic Games in Salt Lake City, Utah. Our team at the Lawrence Livermore National Lab (LLNL) was given the task of developing reliable and validated assays for a number of the most likely bioterrorist agents. The short timeline led us to devise a novel system that utilised whole-genome comparison methods to rapidly focus on parts of the pathogen genomes that had a high probability of being unique. Assays developed with this approach have been validated by the Centers for Disease Control (CDC). They were used at the 2002 Winter Olympics, have entered the public health system, and have been in continual use for non-publicised aspects of homeland defence since autumn 2001. Assays have been developed for all major threat list agents for which adequate genomic sequence is available, as well as for other pathogens requested by various government agencies. Collaborations with comparative genomics algorithm developers have enabled our LLNL team to make major advances in pathogen detection, since many of the existing tools simply did not scale well enough to be of practical use for this application.
It is hoped that a discussion of a real-life practical application of comparative genomics
  12. 12. algorithms may help spur algorithm developers to tackle some of the many remaining problems that need to be addressed. Solutions to these problems will advance a wide range of biological disciplines, only one of which is pathogen detection. For example, exploration in evolution and phylogenetics, annotating gene coding regions, predicting and understanding gene function and regulation, and untangling gene networks all rely on tools for aligning multiple sequences, detecting gene rearrangements and duplications, and visualising genomic data. Two key problems currently needing improved solutions are: (1) aligning incomplete, fragmentary sequence (e.g., draft genome contigs or arbitrary genome regions) with both complete genomes and other fragmentary sequences; and (2) ordering, aligning and visualising non-colinear gene rearrangements and inversions in addition to the colinear alignments handled by current tools.

DNA-based signatures are needed to quickly and accurately identify biological warfare agents and their makers. DNA signatures are nucleotide sequences that can be used to detect the presence of an organism and to distinguish that organism from all other species. Insignia, a new, comprehensive system, is applicable for the rapid identification of signatures in the genomes of bacteria and viruses. With the availability of hundreds of complete bacterial and viral genome sequences, it is now possible to use computational methods to identify signature sequences in all of these species, and to use these signatures as the basis for diagnostic assays to detect and genotype microbes in both environmental and clinical samples.
The success of such assays critically depends on the methods used to identify signatures that properly differentiate between the target genomes and the sample background. Insignia has been used to compute accurate signatures for most bacterial genomes, and these are available through the Web site. A sample of these signatures has been successfully tested on a set of 46 Vibrio cholerae strains, and the results indicate that the signatures are highly sensitive for detection as well as specific for discrimination between these strains and their near relatives. Comparing the entire genomic complement of organisms to identify probe targets is a promising method for diagnostic assay development, and it provides assay designers with the flexibility to choose probes from the most relevant genes or genomic regions. The Insignia system is freely accessible via a Web interface and has been released as open source software at: http://insignia.cbcb.umd.edu.

MannDB is a genome-centric database containing comprehensive automated sequence analysis predictions for protein sequences from organisms of interest to the bio-defense research community. Computational tools for the MannDB automated pipeline were selected based on customer needs in providing down selections from large sets of proteins (e.g., whole proteomes) to short lists of proteins most suitable for developing reagents to be used in field assays for detection of pathogens. For that reason we have focused our efforts on applying tools that would enable selection of proteins that meet
assay requirements, such as cellular localization, that would assist in determining the value of a surface feature for targeting ligand binding, or that would identify antigenic sub-sequences of particular value in antibody development. As the goals of some of these assays have been to detect toxins or proteins associated with virulence, we constructed hard links between protein sequences in MannDB with entries in MvirDB in order to conveniently identify and characterize protein targets and features for these applications. We believe that MannDB will be of general use to the bio-defense and medical research communities as a resource for predictive sequence analyses and virulence information.

References

Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990): Basic local alignment search tool. J Mol Biol, 215, 403-410.
Andrade MA, Brown NP, Leroy C, Hoersch S, de Daruvar A, Reich C, Franchini A, Tamames J, Valencia A, Ouzounis C and Sander C (1999): Automated genome sequence analysis and annotation. Bioinformatics, 15, 391-412.
APHIS Agricultural Select Agent Program select agent and toxin list [http://www.aphis.usda.gov/programs/ag_selectagent/ag_bioterr_toxinslist.html]
Bendtsen JD, Nielsen H, von Heijne G and Brunak S (2004): Improved prediction of signal peptides: SignalP 3.0. Journal of Molecular Biology, 340, 783-795.
Blom N, Hansen J, Blaas D and Brunak S (1996): Cleavage site analysis in picornaviral polyproteins: Discovering cellular targets by neural networks. Protein Science, 5, 2203-2216.
Brown K (2004): Biosecurity. Up in the air. Science, 305, 1228-1229.
CDC bioterrorism agents/diseases list [http://www.bt.cdc.gov/agent/agentlist-category.asp]
Chang WI and Lawler EL (1994): Sublinear expected time approximate string matching and biological applications. Algorithmica, 12, 327-344.
Claros MG and von Heijne G (1994): TopPred II: An improved software for membrane protein structure predictions. CABIOS, 10, 685-686.
DeIcher AL, KasifS, Fleischniann RD, Peterson J, White °and et al. (1999): . Alignmentofwholegenomes. NucleicAcids Re.s, 27,2369-2376. Deicher AL, Phillippy A, Carlton J and Salzberg SL (2002): Fast algorithms for . large-scale genome aHgrimentand comparison~Nucleic Acids Res; 30,2478- ~~. . 'Emanuelsson 0, Nielsen H, Brunak S and vOll Heijne G (2000): PrediCting subcellular localization of proteins based on their N-terminal amino acid sequence. Journal of Molecular Biology, 300, 1005-1016.' •••
  13. 13. 292· DNA- Based Signatures Against Biological Warfare 293 Keim P, PriceLB, KlevytskaAM, Smith KL and Schupp JM, (2000) Multiple- locus variable-number tandem repeat analysis reveals genetic relationships within Bacfllus anthracis. JBacteriol182, 2928-2936. 'Krogh A, Larsson B, von Heijne G, Sonnhammer ELL Year: Predictmg . transmembrane protein topology with a hidden Markov model: application to .complete genQmes. Kurtz S (2003): A time and space efficient algorithm for the substring matching problem. TechDicalReport. Hamburg: Zentrum fiirBioinformatik, Universitiit Bamburg. Kurtz S, Phillippy A, DeIcher AL, Smoot M, Shumway M and et al. (2004) : Versatile and open software for comparing large genomes. Genome Biol, 5, R12. Li F and Stonno GD (200 I): Selection of optimal DNA oligos for geneexpression . arrays. Bioinformatics, 17, 1067-1076. Limnv,Simpson 1M,Keams EAand Kramer MF (2005): Current anddeveloping technologies for monitoring agents of bioterrorismaml biowarfare. Clin Microbiol Rev. 18,583-'607.. . .L-ivakKJ, Flood SJ, Marmaro J, Giusti Wand Deetz K (1995) Oligonucleotides with fluorescent dyes at opposite ends provide a quenched probe system useful for detecting peR product and nucleic acid hybridization. PCR Methods Appl, 4,357-362. , McBride MT, Masquelier D, Hindson BJ, MakarewiczAJ and Brown S (2003): Autonomous detection of aerosolized Bacillus anthracis and Yersinia pestis.AnaIChem, 75,5293-5299. Markowifz vM, Korzeniewski F,PalaniappanK:; Szeto P, lv.anovaN and Kyrpides NC(200S): The integrated microbial genomes (IMG) system: a case study in biological data management. Proceedings of the 3J§t VLDB Conference: 2005; TrondheimNorway.2005, 1067-1078. . Meyer F, GoesmannA, McHardy AC, Bartels D, Bekel T, Clausen J, Kafinowski· J, Linke B, Rupp 0, Giegerich Rand PuhlerA (2003): GenDB - an open source genome annotation system for prokaryote genomes. Nucleic Acids Research, 31,2187-2195. 
Peterson ill,Umayam LA; Dickinson TM, Hickey EK and WhiteO (200I): The comprehensive microbial resource. Nucleic Acids Research, 29, 123-125.. . . MvirDB microbial virulence database [bttp://mvirdb.llnl.gov} webcite. Moser MJ, Christensen DR, Norwood D an~Prudent JR (2006): Multiplexed .detection of anthrax-related toxin genes.J Mol Diagn, 8, 89~96. NakaiK and Horton P (1999) : PSORT: a program for detecting the sorting signals of proteins and predicting their subcellular localization. NlAID category A, B and C prioritY pathogenS (bttp://wWw3.niaid.tiih.govl biodefenselbandc Jlriority.htm) webcite , . ..•. : .. '. .,.,~ Bioterrorismand Biological Warfare Fitch jp, Gardner SN, Kuczmarski TA; Kurtz S,MyerS R and et al. (2002):.Rapid . deveiopment of nucleic aciddiagnpstics. Proc IEEE, 90, 1708:-1721. . Fitch IP,Raber Eand Imbro DR(2003): Technology challenges in ~esponding to biologiCal or chemical attacks in the civilian sector. Science, 302, .1350- ,,1354. .. . .. .... . FtjshmartD,Albermanrt K,Hari I, Heumann K, MetariomskiA,Zollner A, Mewes . H-W (2001): Functional ari-d structural genomics using PEDANT .. '.'1>,(j;, Bioinjormatics, 17,44~57. . . Gardy JL, Laird MR,CheriF, Rey S, WalshCJ; EsterM and BrfukmanFSL' (2005):PSOR1b v.2.0: expanded prediction of bacterial prQteinsubcellular localization and ~·insighis.gained from comparative proteome analysis.~· Bioinfo~matics, 21,617-623. . Gardner SN, Lam MW, Mulakkeil NI, Torres CL; Smith JR and ef al. (2004):· Sequencing needs for viral di<!-gnostics.J Clin Microbiol, 42, '5472-5476. Gardner SN, Kuczmarski TA, Vitalis EAand Slezak TR (2003): Limitations of TaqMilDPCR for detecting divergent viral pathogens illustrated by hepatitis A, B, C, and E viruses and human iminunodeficiency virus. JClin Microbiol, 41,2417-2427. 
Gattiker A, Michoud K, Rivoire C, Auchincloss AH, Coudert E, Lima T, Kersey P, Pagni M, Sigrist CJA, Lachaize C, Veuthey A-L, Gasteiger E and Bairoch A, (2003): Automated amiotation of microbial proteomes in SWISS-PROTo Computational Biology and Che"Jistry, 27,49-58. . GoesmannA, Linke B, Bartels D, Dondrup M; Drause L, Neuweger H, Oehm S, Paczian T, Wilke A and Meyer F, (2005): BRIGEP - the BRIDGE-based~ genome-transcriptome-proteomebrowser. Nucleic Acids· Research, 33, W710-W716. Gordon PM and Sensen CW (2004): Osprey: A comprehensive tool employing. novel methods for the design of oligonucleotides for DNA sequencing and microairays.NucleicAcids Res, 32, e133. . Gusfield b (1997): Algorithms on strings, trees, and sequences: Computer' science and computational biology. New York: Cambridge University Press. 554p. HHS and USDA select agents and toxinS list (http://www.cdc.gov/od/sap/docsl salist.pdf) webcite. Hohl M, Kurtz S aild Ohlebusdr E (2002): Efficient multiple genome alignment. Bioinjormatics" 18, S312-8320. Kaderali L and Schliep A(2002): Selecting signature oligonucleotides to identify organisms using DNA arrays.Bioinjormatics, 18, 1340-1349" Keim P, KlevytskaAM, Price La, Schupp JM and Zinser G (1999) Molecular diversity in Bacillus anthracis. I Appl Microbiol87: 215c..217.. lli1 il! ;1 ,~ !l , ~f i·. :11. :.:[' 1 ',I ~ 111
  14. 14. i'l '~II ,..-' ! ' " , q' Ii ! '~:> I" : .:~j ;: i i !.]i , 'I ri !! I I : I 1' I ' jl i : ~" r: ",,: -' 294 Bioterrorismand Biological Warfare Nordberg EK (2005) YODA: Selecting signatl,lre oligonucleotides'. Bioinformatics, 21, 1365~1370. ' O'Connell KP, Bucher JR, Anderson PE, Cao CJ, Khan AS and et al. (2006): Real-time fluorogenic reverse trans~ription-PQRassays for detection ofbacteriophage MS2. Appl Environ Mic.robiol, 12, 478~83. ,KD, Tatusova T"Maglott DR: N~I reference/sequence' (RefSeq), year: a curated non-redundant sequence database of genomes" transcripts and 'iI!~ '" proteins; Nucleic Acids Res 35: D61-D65. ' Pruitt KD, Tatusova T and Maglott DR (2007): NCBI reference sequences (RefSeq): A curated nonredundantsequence qatabase of genomes, transcripts, and protein's. Nucleic Acids Res, 35, D61~D65. Rahmann S (2003): Fast and sensitive probe selection for DNA chips usmg jumps in matching statistics. Proc IEEEComput Soc·Bio~form Conf2, 57- 64.' , Rozen Sand Skaletsky H(2000): Primer3 on the WWW for general users and for biologist programmers. Methods Mol BioI, 132: 365-386: ' , Slez3I< T, KuczmarsId T, OU'L, Torres C, Medeiros D and et al. (2003): Comparative genomics tools applied to Qioterrorismdefense. BriefBloinform 4,133-149. Slezak T, Kuczmarski T, Ott L, Torres C, Mederos D, Smith J, Truitt B,Mulakken N, Lam M, Vitalis E, ZemlaA, Zhou,C and Gardner S (2003) : Comparative genomics tools applied to bioterrorism defense. Briefings in Bioinformatics, 4,133-149. Tembe W, Zavaljevski N, Bode E,ChaseC, Geyer J andet al. (2007): Oligonucleotide fmgerprint identification for micro array-based pathogen . diagnostic assays. Bioinformatics, 23, 5~13. ' Tusnady GE, Simon I year? : Principles governing ~inoacid composition of integral membrane proteins: applications to t9P.ology prediction. , UrismanA, Fischer KF, Chiu CY,Kistler AL, Beck S andet al. 
(2005):E-Predict: - '0_, A computational'strategy for speciesidentificatiori 'based, on ob'served DNA micro array hybridization patterns. Genome BioI, 6~R78.", ' USFDABad Bug Book [http://www.cfsan.fda~gov/-mowr.iltroJitmllweb~ite Vallenet D, Labarre L, Roily Z, Barbe V"Bocs S~Cruveiller S;Lajtis A, pascal 0. Scarpelli C and Medigue C (2006): MaGe: a Diicrobial genome annotation system supported by synteny results. Nucleic Acids Research, 34,53-65. . Van Domselaar GH, Stothard P, Shrivastava S;CMJA, Guo A, Dong X,LuP, Szafran D,Gremer Rand WIShart DS (2005) :BASys:IIweb server fot~tOrnated " bacterial genome annotation. NudeicAcids Research,;~3(Yl455;;'W 459, ,,'. ': ... '",.','" ',.1,' " ,V610khov D,Pomerantsev A, Kivovich WRaSoolyA afidChi~ikov V(2004): ,0; Identification ofBacillus,antl;zracisbYU1f1.ltip(o~ Iiiicro~ay hybridization. DiagnMicrobiolInjectDis; 49, 163q7f: ' ' , I';"t,~{, ,,~t~. _it~ , 4t 't' 'i'}i:;''!~t~. J.;i' ~1i:~ . "', ;. {.: ii/ 1.:, ,';t~ ;~;,:: ~'l!.' 1 'i i t " .tI.oi.,~~"t., ,,' , DNA- Based SignaturesAgainst Biological Warfa,re 295 . Wang D, Coscoy L, Zylberberg M, Avila PC, Boushey HA and et al. (2002): Microarray.;.based detection and genotyping of viral pathogens. Proc Natl AcadSci USA, 99,15687-15692. WHO list of major zoonotic diseases [http://www.who.intlzoonoses/diseases/ 'en/) webcite ' WHO list of diseases covered by the Epidemic and Pandemic Alert and Response (EPR) [http://wWw.who.iiltlcsr/diseaseleRl] webcite Willse A, Straub TM, v~schel SC, Small JA, Call DR and et al. (2004): Quantitative oligonucleotide micro array fingerprinting of Salmonella ent~rica isolates. Nucleie;Acids Res, 32, 1848-1856. Zhou CEZ, ZernlaA, Roe D, YoungM, Lam M, SchoeingerJ and Balhom R (2005) :Computational approaches for identification of conserved/unique binding pockets in ,the A chain of ricin. [http:// bioinforrnatics.oxfordj ournals.o rg/cgilreprint/21114/3089] webcite, Bio'informatics, 21, 3085~3096. ' it. .4
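To make concrete the idea of a signature that differentiates target genomes from the sample background, the following is a minimal sketch. It is not Insignia's actual algorithm: it simply treats a candidate signature as a fixed-length substring (k-mer) that occurs in every target genome and in no background genome. The function names, the choice of k, and the toy sequences are all assumptions made for illustration.

```python
from typing import Iterable, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def candidate_signatures(
    targets: Iterable[str], background: Iterable[str], k: int
) -> Set[str]:
    """Hypothetical helper: k-mers present in every target genome
    and absent from every background genome."""
    targets = list(targets)
    if not targets:
        return set()

    # Keep only k-mers shared by all target genomes (sensitivity).
    shared = kmers(targets[0], k)
    for genome in targets[1:]:
        shared &= kmers(genome, k)

    # Discard any k-mer that also occurs in the background (specificity).
    for genome in background:
        shared -= kmers(genome, k)
    return shared


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    target_genomes = ["ACGTTGCA", "TTACGTTG"]
    background_genomes = ["ACGTAAAA"]
    print(sorted(candidate_signatures(target_genomes, background_genomes, k=4)))
```

The intersection step stands in for sensitivity (the signature is found in every target strain) and the subtraction step for specificity (it is not found in near relatives or other background). This sketch ignores reverse-complement strands, mismatches, and probe-design constraints such as melting temperature, and holding every k-mer in memory would not scale to whole-genome collections without more efficient indexing.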