
COMET presentation at 8th European HIV Drug Resistance Workshop, Sorrento 2010


  1. COMET: A novel approach to HIV-1 subtype prediction (Context-based Modeling for Expeditious Typing)
     Daniel Struck, CRP-SANTÉ, Laboratory of Retrovirology (daniel.struck@crp-sante.lu)
     comet.retrovirology.lu
  2. Background
     • HIV-1 subtype is often used for epidemiological studies.
     • Many different subtyping tools exist:
       – jpHMM, RIP (LANL), NCBI genotyping, STAR, REGA Subtyping Tool, …
     • Subtyping remains a controversial topic → compare the results from different approaches.
  3. COMET HIV-1 subtyping tool
     • Context-based modeling for classification of HIV-1 sequences, adapted from the PPM (prediction by partial match) compression algorithm
       – takes ambiguities from population sequencing into consideration
     • Software written in Java (Linux, Windows, Apple, …)
     • Core algorithm fits in approx. 300 lines of code
     • Does not require any external analysis tool (muscle / mafft / clustal, paup / raxml / phyml)
     • Multi-threaded (takes advantage of all available CPU cores)
  4. Algorithm
     • Training of the model with the subtype reference sequences from Los Alamos National Lab (LANL) from 2008 and 30 additional near-full-length sequences from LANL.
     • Slide over the sequence and determine the probabilities for each subtype. Simplified example with a model 4:

         Sequence:   C T A G C A A C A

         Subtype A   0.5  0.5  0.1  0.2  0.3
         Subtype B   0.5  0.5  0.4  0.6  0.8
         Subtype C   0.3  0.2  0.1  0.2  0.1

     • Determine the most probable subtype.
     • Then slide over the table of probabilities with a window size of 250 bp and a step size of 2 bp to detect possible recombination events.
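The steps on the Algorithm slide can be illustrated with a minimal sketch. It is not COMET's implementation: it swaps PPM for a plain fixed-order context model with add-one smoothing, it reads "model 4" as a context of 4 bases (an assumption), and it sums log-probabilities to score a sequence (also an assumption). The function names `train`, `position_log_probs`, `classify` and `window_calls` are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train(references, order=4):
    """Count next-base frequencies after each context of `order` bases, per subtype."""
    models = {}
    for subtype, seqs in references.items():
        counts = defaultdict(lambda: defaultdict(int))
        for seq in seqs:
            for i in range(order, len(seq)):
                counts[seq[i - order:i]][seq[i]] += 1
        models[subtype] = counts
    return models

def position_log_probs(model, query, order=4):
    """Log-probability of each base given its preceding context (add-one smoothing)."""
    out = []
    for i in range(order, len(query)):
        ctx = model.get(query[i - order:i], {})
        total = sum(ctx.values())
        out.append(math.log((ctx.get(query[i], 0) + 1) / (total + len(ALPHABET))))
    return out

def classify(models, query, order=4):
    """Return the subtype whose model gives the query the highest total log-probability."""
    scores = {s: sum(position_log_probs(m, query, order)) for s, m in models.items()}
    return max(scores, key=scores.get)

def window_calls(models, query, order=4, window=250, step=2):
    """Best-scoring subtype for each sliding window over the per-position table."""
    table = {s: position_log_probs(m, query, order) for s, m in models.items()}
    n = len(next(iter(table.values())))
    calls = []
    for start in range(0, max(n - window, 0) + 1, step):
        sums = {s: sum(lp[start:start + window]) for s, lp in table.items()}
        calls.append(max(sums, key=sums.get))
    return calls
```

A change of the winning subtype along the list returned by `window_calls` is what would hint at a recombination breakpoint in this sketch. Ambiguity codes from population sequencing, which the slides say COMET handles, are not modeled here.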
  5. Analysis of 27017 prot-RT sequences from LANL
     • Dataset for analysis:
       – 27017 prot-RT sequences downloaded from LANL.
     • Query parameters:
       – HXB2 start point: 2253, end point: 3450 (prot-RT region)
       – Sequence length < 1700 bp
     • Download subtype results from the STAR and REGA subtyping tools.
       – STAR: all PURE, CRF: 01_AE - 02_AG
       – REGA v2: all PURE, CRF: 01_AE - 14_BG
  6. Subtype distribution of the dataset (27017 prot-RT sequences)

     Subtype        STAR    REGA   COMET
     B             19988   19722   20282
     C              1329    1334    1329
     A               672    1200    1194
     D               555     186     499
     G               246     441     393
     F               205     206     193
     H                19      21      19
     J                 3       6       2
     CRF02_AG        867     787     829
     CRF01_AE        414     419     416
     other CRF         0     653     806
     unassigned     2719    2042    1055
  7. Comparison of STAR, REGA & COMET (27017 prot-RT sequences)
     • All 3 tools agreed in 88.3% of cases (23854)
       – 22352 PURE
       – 777 CRF
       – 725 unassigned
     • All 3 tools disagreed in only 0.1% of cases (30).
     • COMET & REGA agreed in 6.4% of cases (1722); STAR disagreed
       – 1034 PURE, 582 CRF, 106 unassigned
     • COMET & STAR agreed in 4.0% of cases (1090); REGA disagreed
       – 910 PURE, 40 CRF, 140 unassigned
     • REGA & STAR agreed in 1.2% of cases (321); COMET disagreed
       – 77 PURE, 8 CRF, 236 unassigned
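The breakdown on the comparison slide amounts to sorting each sequence into one of five agreement patterns. A small sketch of that bookkeeping follows. The function names `agreement_pattern` and `tally` are hypothetical, and exact string equality of the three calls is assumed as the agreement criterion. The `reported` counts are copied from the slide and are used only to recompute its percentages.

```python
from collections import Counter

def agreement_pattern(star, rega, comet):
    """Name the agreement pattern for one sequence's three subtype calls."""
    if star == rega == comet:
        return "all agree"
    if rega == comet:
        return "COMET & REGA agree"
    if star == comet:
        return "COMET & STAR agree"
    if star == rega:
        return "REGA & STAR agree"
    return "all disagree"

def tally(calls):
    """Count patterns over an iterable of (star, rega, comet) tuples."""
    return Counter(agreement_pattern(*c) for c in calls)

# Counts as reported on the slide
reported = {
    "all agree": 23854,
    "all disagree": 30,
    "COMET & REGA agree": 1722,
    "COMET & STAR agree": 1090,
    "REGA & STAR agree": 321,
}
total = sum(reported.values())  # should equal the dataset size, 27017
percentages = {k: round(100 * v / total, 1) for k, v in reported.items()}
```

The five reported counts add up to the full dataset, so the patterns partition the 27017 sequences with no sequence left uncounted.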
  8. Comparison of REGA & COMET to LANL
     Of the 27017 sequences in the dataset, 24735 had a subtype (PURE, CRF, URF) assigned in the LANL database.
     For comparison, 24576 sequences were analyzed (PURE, CRF: 01_AE → 14_BG, URF).
     REGA & LANL agreed in 93.9% of cases (23077) and disagreed in 6.1% of cases (1499). Fleiss kappa = 0.84
     COMET & LANL agreed in 96.9% of cases (23818) and disagreed in 3.1% of cases (758). Fleiss kappa = 0.92
     “The Fleiss kappa measure calculates the degree of agreement in classification over that which would be expected by chance and is scored as a number between 0 and 1.”
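As an illustration of chance-corrected agreement between one tool and the LANL assignment, here is a minimal two-rater (Cohen-style) kappa. The slides use both "Fleiss" and "Cohen" for the statistic and do not say which estimator produced the reported values, so this is a sketch of the general idea, not a reproduction of those numbers. The function name `cohen_kappa` is hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long lists of labels."""
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labeled independently at their own rates
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1

# Raw agreement percentages recomputed from the counts on the slide
rega_agreement = round(100 * 23077 / 24576, 1)
comet_agreement = round(100 * 23818 / 24576, 1)
```

Kappa can fall slightly below 0 when observed agreement is lower than chance agreement, which is one way a value such as the -1.09E-04 reported for 10_CD can arise.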
  9. Cohen Kappa

     Subtype   REGA ↔ LANL   COMET ↔ LANL   training set
     01_AE     0.98          0.98           5
     02_AG     0.92          0.93           6
     03_AB     0             0              2
     04_CPX    0.86          0.86           4
     05_DF     1             0              3
     06_CPX    0.83          0.77           5
     07_BC     1             0.98           4
     08_BC     0.97          0.97           2
     09_CPX    0             0.8            4
     10_CD     -1.09E-04     0              2
     11_CPX    0.64          0.64           3
     12_BF     0.65          0.61           5
     13_CPX    0.8           0.8            3
     14_BG     0             0              2
     A         0.96          0.96           7 A1, 2 A2
     B         0.92          0.98           7
     C         0.99          0.99           6
     D         0.41          0.94           6
     F         0.94          0.92           6 F1, 2 F2
     G         0.9           0.91           4
     H         0.97          0.91           4
     J         0.5           0.5            3
     K         0             0              2
     URF       0.38          0.55
  10. Benchmark
      Analysis of the 27017 prot-RT sequences:
      392 +/- 2 seconds (6 ½ minutes) on Opteron server (2 x quad-core, 2.5 GHz) => 68 prot-RT sequences / second
      144 +/- 0 seconds (2 ½ minutes) on new Intel server (2 x quad-core, newest generation, 2.93 GHz) => 187 prot-RT sequences / second
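The throughput figures follow directly from the run times. A short worked check is given below; the helper name `throughput` is hypothetical, and the slide's values appear to be rounded down.

```python
SEQUENCES = 27017  # prot-RT sequences in the benchmark

def throughput(n_sequences, seconds):
    """Sequences processed per second."""
    return n_sequences / seconds

opteron = throughput(SEQUENCES, 392)  # about 68.9, reported as 68 on the slide
intel = throughput(SEQUENCES, 144)    # about 187.6, reported as 187 on the slide
speedup = 392 / 144                   # about 2.7x between the two servers
```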
  11. Ultra-deep sequencing (UDS) application
      In-house UDS (454) software:
      • alignment, trimming
      • filtering
      • compressing
      • automatic correction of homopolymer count & “carry forward” errors
      • …
      • added adapted COMET module with bootstrap analysis (100 values per sequence, threshold 75%)
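The slides do not describe how the bootstrap is carried out inside the COMET module. One plausible reading, sketched here purely as an assumption, is to resample the per-position scores with replacement 100 times, take the best-scoring subtype in each replicate, and accept a call only when at least 75% of replicates agree. The function name `bootstrap_call`, the input layout (`position_scores` as a dict of per-position log-probabilities per subtype) and the `seed` parameter are all hypothetical.

```python
import random
from collections import Counter

def bootstrap_call(position_scores, replicates=100, threshold=0.75, seed=0):
    """Return (call, support) from bootstrap resampling of per-position scores.

    position_scores maps each subtype to a list of per-position log-probabilities;
    all lists are assumed to have the same length.
    """
    rng = random.Random(seed)
    subtypes = list(position_scores)
    n = len(position_scores[subtypes[0]])
    votes = Counter()
    for _ in range(replicates):
        # Resample positions with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        totals = {s: sum(position_scores[s][i] for i in idx) for s in subtypes}
        votes[max(totals, key=totals.get)] += 1
    best, count = votes.most_common(1)[0]
    support = count / replicates
    return (best if support >= threshold else "unassigned"), support
```

With this design a short read whose positions favor two subtypes about equally ends up "unassigned" instead of being forced into one subtype.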
  12. UDS application, dataset:
      64 patients from Rwanda AMATA study
      454 sequence length: 333 bp (454, RT, AA 88 → 198)
      Total sequences analyzed: 267749 (seq. with frameshifts excluded)
      Time needed for analysis (100 bootstraps / seq.): 5 ½ minutes
      Sanger (prot-RT) (URF: 2 AC, 5 CA, 1 CAC, 1 AD, 2 CD, 1 DC, 1 GH)
  13. UDS application, results:
      (COMET: major subtype through minority %; subtype confirmation: REGA through man. align. insp.)

      patient  major subtype  number  minor subtype  number  unassigned  minority %  REGA   STAR   jpHMM  man. align. insp.  Sanger
      5        A1             4312    C              1       0           0.02        ok     A1/u   ok     ok                 URF_CA
      8        D              6853    A1             1       57          0.01        ok     ok     ok     ok                 D
      9        C              6603    A1             14      28          0.21        u/A1   u/A1   H/A1   C-H?/A1            URF_GH
      17       A1             5727    C              3       0           0.05        ok     ok     ok     ok                 A1
      18       C              3279    A1             5       0           0.15        ok     ok     ok     ok                 C
      21       A1             2856    C              4       0           0.14        u/u    ok     ok     ok                 A1
      22       C              5995    A1             5       0           0.08        u/A1   ok     ok     ok                 C
      24       A1             6361    C              13      0           0.2         u/C    ok     ok     ok                 A1
      25       C              6412    A1             15      0           0.23        C/u    ok     ok     ok                 URF_CD
      26       A1             7350    C              1       0           0.01        u/C    ok     ok     ok                 C
      32       C              6094    A1             11      0           0.18        C/u    ok     ok     ok                 URF_DC
      33       A1             2226    C              1       0           0.04        ok     ok     ok     ok                 A1
      35       A1             4864    C              4       0           0.08        A1/u   ok     ok     ok                 A1
      36       A1             670     C              1       0           0.15        ok     ok     ok     ok                 A1
      47       A1             3290    C              2       0           0.06        u/C    ok     ok     ok                 A1
      48       A1             4120    C              1       0           0.02        u/C    ok     ok     ok                 A1
      49       C              5279    A1             58      0           1.09        ok     ok     ok     ok                 C
      64       C              1695    A1             9       0           0.53        ok     ok     ok     ok                 URF_CA
      65       A1             6346    C              8       0           0.13        A1/u   ok     ok     ok                 A1
      73       C              3335    A1             1       0           0.03        ok     ok     ok     ok                 C
      79       A1             3244    C              3       0           0.09        ok     ok     ok     ok                 A1

      21 out of 64 patients (32.81%) seem to be dually infected by two different subtypes.
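The "minority %" column is not defined on the slide. The values are consistent with the minor-subtype read count divided by all reads for the patient (major + minor + unassigned), which is an inference from the table rather than a stated formula. A short check under that assumption follows, with the hypothetical helper `minority_percent`.

```python
def minority_percent(major, minor, unassigned=0):
    """Share of reads assigned to the minor subtype, in percent of all reads."""
    return 100 * minor / (major + minor + unassigned)

# (major reads, minor reads, unassigned reads, minority % as printed on the slide)
rows = {
    5: (4312, 1, 0, 0.02),
    8: (6853, 1, 57, 0.01),
    9: (6603, 14, 28, 0.21),
    49: (5279, 58, 0, 1.09),
    64: (1695, 9, 0, 0.53),
}
recomputed = {p: round(minority_percent(ma, mi, un), 2) for p, (ma, mi, un, _) in rows.items()}
```

For these patients the recomputed values match the printed ones, for example 14 / (6603 + 14 + 28) ≈ 0.21% for patient 9.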
  14. Summary
      • Reliable prediction of HIV-1 subtype
      • Generally it is best to compare the results of different approaches to define the subtype of a sequence
      • High performance and scalability
        – suitable for deep sequencing (454) analysis
      • In preparation: stand-alone desktop version with possibility to inspect the recombination pattern
  15. http://comet.retrovirology.lu
      Subtype results can be downloaded in CSV format.
  16. Acknowledgements
      CRP-Santé, Laboratory of Retrovirology: Jean-Claude Schmit, Carole Devaux, Danielle Perez Bercoff, Jean-Claude Karasi
      CRP-Santé, Laboratory of Cardiovascular Research: Francisco Azuaje
