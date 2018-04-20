

DiagTree: Diagnostic Tree for Differential Diagnosis (CIKM17)

Presentation slides for DiagTree: Diagnostic Tree for Differential Diagnosis (CIKM17)

  1. POSTECH. DiagTree: Diagnostic Tree for Differential Diagnosis. CIKM 2017. Yejin Kim; joint work with Jingyun Choi, Yosep Chong, Xiaoqian Jiang, and Hwanjo Yu. 11/9/17
  2. Problem: Find the Optimal Decision Rules. Differential diagnosis distinguishes among candidate diseases D1, D2, D3, each with a probability. Stochastic binary tests t1, t2, t3, t4 each return positive or negative with some probability. Which tests should be used, and in which order?
  3. Partially Observed Markov Decision Process (POMDP). Start from a uniform prior over diseases $y_1, y_2, y_3$: $p(\mathbf{y}) = (0.33, 0.33, 0.33)$.
     Test t1 has $p(t_1 = 1 \mid \mathbf{y}) = (0.8, 0.0, 0.5)$. A positive result gives $p(\mathbf{y} \mid t_1 = 1) = (0.62, 0.00, 0.38)$. A negative result gives $p(\mathbf{y} \mid t_1 = 0) = (0.12, 0.58, 0.29)$.
     After a positive t1, test t4 has $p(t_4 = 1 \mid \mathbf{y}) = (0.1, 0.5, 0.7)$. This gives $p(\mathbf{y} \mid t_1 = 1, t_4 = 1) = (0.19, 0.00, 0.81)$ and $p(\mathbf{y} \mid t_1 = 1, t_4 = 0) = (0.83, 0.00, 0.17)$.
  4. Limitation 1: Time to Finish the Decision Process. Assume we can use up to 9 tests. Run one at a time, the 9 tests take 9 days, which is too long. Run three at a time (t1 t3 t6, then t7 t8 t9, then t2 t5 t10), the same 9 tests take only 3 days.
  5. Limitation 2: Limited Number of Tests. In practice, the number of tests that can be used is limited by insurance policy. For example, the National Health Insurance Service of Korea imposes an additional fee of about 1.8 times on the seventh cell immunity test (a penalty).
  6. Limitation 3: No Reused Tests. Medical tests sometimes involve an irreversible chemical reaction between the antigen (tumor cell) and the antibody (test) that cannot be repeated. A test such as t3 therefore cannot be used a second time on the same path.
  7. Limitation 4: No Untested Diseases. For each disease, at least one used test must have a non-missing value of $p(t_m = 1 \mid y_i)$. In the slide's example table (tests t1 to t4 versus diseases $y_1$ to $y_3$, X = missing value), t2 must be used to test for $y_2$. Otherwise the process can rule out $y_1$ ("No!") and $y_2$ ("No!") and conclude "Your cancer is $y_3$!" even though $y_3$ was never tested.
  8. Related Work
     - Decentralized POMDP [1] enables multiple agents to perform tests at the same time.
     - ρPOMDP [2] reduces uncertainty about hidden states without instant rewards.
     - Comparison: Dec-POMDP supports multiple tests. ρPOMDP reduces uncertainty. DiagTree supports multiple tests, reduces uncertainty, and handles real-world constraints.
     [1] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4):819–840, 2002.
     [2] M. Araya, O. Buffet, V. Thomas, and F. Charpillet. A POMDP extension with belief-dependent rewards. In Advances in Neural Information Processing Systems, pages 64–72, 2010.
  9. DiagTree: Diagnostic Tree for Differential Diagnosis
     - A tree-like structure that represents a policy and its results
     - Allows multiple tests in each stage
     - Uses integer programming to find the best policy to detect the target disease
  10. DiagTree Structure, (Q, q) = (6, 3). Assume the maximum number of tests is Q = 6 and the maximum number of tests in each stage is q = 3. Each internal node holds q = 3 tests and has a $2^3$-way child node, with one child for each of the $2^3$ results from (0,0,0) to (1,1,1). The root (stage 0) runs t1, t3, t6. Its children (stage 1) run further triples such as t7 t8 t9, t2 t4 t7, or t2 t5 t8. Detected diseases ($y_1$, $y_2$, $y_3$) sit at the leaf nodes. Number of stages = 2.
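  11. Probability Inference and Disease Detection. Start from $p(\mathbf{y}) = (0.33, 0.33, 0.33)$ over $y_1, y_2, y_3$. The root runs t1, t3, t6, and the result $(t_1, t_3, t_6) = (0, 1, 0)$ gives $p(\mathbf{y} \mid t_1 = 0, t_3 = 1, t_6 = 0) = (0.69, 0.00, 0.31)$. The child node runs t7, t8, and a dummy test $\phi$. The result $(t_7, t_8) = (1, 0)$ gives $p(\mathbf{y} \mid t_1 = 0, t_3 = 1, t_6 = 0, t_7 = 1, t_8 = 0) = (0.92, 0.00, 0.08)$.
     Probability inference in internal nodes follows Bayes' theorem. The tests are assumed conditionally independent given the disease (memoryless, as in a Markov process):
     $$p(\mathbf{y} \mid t_1, \dots, t_M) = \frac{p(\mathbf{y})\, p(t_1, \dots, t_M \mid \mathbf{y})}{p(t_1, \dots, t_M)} = \frac{p(\mathbf{y}) \prod_{m=1}^{M} p(t_m \mid \mathbf{y})}{\sum_{i} p(y_i) \prod_{m=1}^{M} p(t_m \mid y_i)}$$
     The likelihood in a leaf node $u_L$ is computed for a target disease $y^*$ among $K$ candidate diseases. It is the target's posterior (the maximum value) divided by the geometric mean of the other diseases' posteriors:
     $$l(u_L) = \frac{p(y^* \mid u_L)}{\left(\prod_{i:\, y_i \neq y^*} p(y_i \mid u_L)\right)^{1/(K-1)}}$$
     Zero probabilities are replaced by an arbitrarily small value $\epsilon$. Here $y_1$ is the target disease, so $l(u_L) = 0.92 / (\epsilon \times 0.08)^{1/2} \approx 325$.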
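  12. DiagTree Representation. To formulate the optimization problem of building a DiagTree, we represent the DiagTree as a set of binary indicator variables:
     $$X = \{x_{m,u,j}\} \quad \forall m = 1, \dots, M \text{ (all tests)},\ \forall u \text{ (all internal nodes)},\ \forall j = 1, \dots, q \text{ (all positions)}$$
     $$x_{m,u,j} = \begin{cases} 1 & \text{if test } t_m \text{ is assigned to position } j \text{ on node } u \\ 0 & \text{otherwise} \end{cases}$$
     Example for (Q, q) = (6, 3), where u = 0 is the root and its children are u = 1, …, 8. The root holds t1, t3, t6, so $x_{1,0,1} = x_{3,0,2} = x_{6,0,3} = 1$. Node u = 5 holds t7, t8, and a dummy test $\phi$ in positions 1 to 3, so $x_{7,5,1} = x_{8,5,2} = x_{\phi,5,3} = 1$.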
  13. Objective Function for Optimal DiagTree. We derive an objective function that is linear in $x_{m,u,j}$, so that the problem can be solved by integer programming. Its terms are the likelihood at leaf nodes and the total number of non-pruned positions.
  14. Constraints
     1. Each internal node has q tests.
     2. No reuse.
     3. No non-covered diseases.
     4. Only one test for each position.
     5. Maximum target-disease posterior probability, $\log p(y^* \mid u_L)$.
     6. Child nodes reached by negative results of the dummy test always have dummy tests.
  15. Branch-and-Bound (BNB)
     - Building the optimal DiagTree is a mixed integer linear programming (MILP) problem with exponential worst-case time complexity. We therefore use Branch-and-Bound, an exact (non-heuristic) method that finds the globally optimal solution when run to completion.
     - We first solve the relaxed linear programming problem without the binary constraints. We then restore the relaxed variables to binary values by iteratively selecting the most promising variable.
  16. Experiment Setting
     - Dataset: immunohistochemistry (IHC) profiles, which are used to diagnose abnormal cells found in cancer tumors.
     - The profiles give the probabilities $p(t_m = 1 \mid y_i)$ linking 115 antibodies (tests) to 104 antigens (diseases).
     - Excerpt of $p(t_m = 1 \mid y_i)$ (antibody columns CD2, CD4, …, CD16): T-cell prolymphocytic leukemia 0.95, 0.75. T-cell large granular lymphocytic leukemia 0.95, 0.95. … Aggressive NK cell leukemia 0.00, 0.75.
  17. Experiment Setting – Algorithm evaluation
     - Six IHC sub-profiles with diseases of interest
     - Prior probability from [3]
     - Baselines: Random, IG (greedy information gain), LG (greedy likelihood gain)
     - Measure: averaged log likelihood in leaf nodes
     [3] S. O. Yoon, C. Suh, D. H. Lee, H.-S. Chi, C. J. Park, S.-S. Jang, H.-R. Shin, B.-H. Park, and J. Huh. Distribution of lymphoid neoplasms in the Republic of Korea: analysis of 5318 cases according to the World Health Organization classification. American Journal of Hematology, 2010.
  18. Experiment Results – Algorithm evaluation. DiagTree optimized using BNB produced high confidence (averaged leaf log likelihood) for most (Q, q) settings.
  19. Experiment Setting – Comparison with clinicians
     - Baseline: clinicians' rules extracted from pathologic reports at UC San Diego Medical Center (2,029 patients, 2009–2014, without test results) and Yeouido St. Mary's Hospital, Korea (131 patients, 2009–2015).
     - Measures: accuracy and cost (number of stages, number of tests used).
     - The extracted rules are encoded as a table of test results (Test1, Test2, …, Test N) and the resulting disease.
  20. Diagnosis Rule from DiagTree. Example DiagTree to diagnose the five diseases in IHC4:
     - 9823/3: Chronic lymphocytic leukemia/small lymphocytic lymphoma
     - 9673/3: Mantle cell lymphoma
     - 9680/3 CD5+: Diffuse large B-cell lymphoma (DLBCL), CD5+
     - 9699/3: Extranodal marginal zone lymphoma of mucosa-associated lymphoid tissue (MALT lymphoma)
     - 9695/3: Follicular lymphoma
     Figure: DiagTree for Q = 6, q = 3 (tests shown: TCL-1, CD10, FoxP1, Bcl6, MUM1).
  21. Diagnosis Rule from DiagTree. Figures: DiagTrees for (Q, q) = (6, 6) and (6, 1) (tests shown: MUM1, FMC7, IgG, TCL1, CD43, IgD).
  22. Accuracy. DiagTree detected the diseases with high accuracy (ACC), a high true positive rate (TPR), and a low false positive rate (FPR).
  23. Cost (number of tests, number of stages). Compared to clinicians' rules, DiagTree produced lower costs for most diseases.
  24. Conclusions. We propose DiagTree, a new decision process framework for differential diagnosis, which:
     - minimizes the cost of the decision process;
     - allows multiple tests at the same time;
     - can support clinicians by suggesting an automated and cost-effective diagnosis process.
  25. Do you have any questions or comments?
  26. Appendix
  27. Partially Observed Markov Decision Process (POMDP). Markov models are classified by two questions: do we have control over the state transitions, and are the states completely observable?
     - No control, completely observable: Markov chain
     - Control, completely observable: MDP (Markov Decision Process)
     - No control, not completely observable: HMM (Hidden Markov Model)
     - Control, not completely observable: POMDP (Partially Observed Markov Decision Process)
     Retrieved from https://www.cs.cmu.edu/~ggordon/780-fall07/lectures/POMDP_lecture.pdf
  28. DiagTree Structure, (Q, q) = (6, 1). Assume the maximum number of tests is Q = 6 and the maximum number of tests in each stage is q = 1. Each internal node holds q = 1 test and has a $2^1$-way child node for the $2^1$ results. Branches can stop early and detect a disease; dummy tests $\phi$ follow an early stop. Number of stages = 5.
  29. DiagTree Structure, (Q, q) = (6, 6). Assume Q = 6 and q = 6. The single internal node runs t1, t3, t6, t7, t8, t9. It has a $2^6$-way child node, with one child for each result from (0,0,0,0,0,0) to (1,1,1,1,1,1), and each leaf detects a disease ($y_1$, $y_2$, or $y_3$). Number of stages = 1.
  30. DiagTree Structure, (Q, q) = (6, 6). Assume Q = 6 and q = 6. Using only t1, t3, t6, t7 gives a $2^4$-way child node for the $2^4$ results from (0,0,0,0) to (1,1,1,1). Filling the two remaining positions with dummy tests (t1, t3, t6, t7, $\phi$, $\phi$) gives a $2^{4+2}$-way child node for $2^{4+2}$ results, for example (0,0,0,0,0,0), (0,0,0,0,1,1), (1,1,0,1,1,1), and (1,1,1,1,1,1). Number of stages = 1.
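The belief update on slide 3 is Bayes' rule applied one test at a time. The Python sketch below assumes the tests are conditionally independent given the disease; the function name `update_belief` is hypothetical. It reproduces the slide's posteriors up to rounding. One difference: this sketch rounds to two decimals, so the t1 = 0 case prints 0.59 for $y_2$ where the slide shows 0.58.

```python
from typing import List


def update_belief(prior: List[float], p_pos: List[float], outcome: int) -> List[float]:
    """Posterior over diseases after one binary test result.

    prior:   p(y_i) for each disease
    p_pos:   p(t = 1 | y_i) for the same diseases
    outcome: 1 for a positive result, 0 for a negative one
    """
    likelihood = [p if outcome == 1 else 1.0 - p for p in p_pos]
    joint = [pr * lk for pr, lk in zip(prior, likelihood)]
    evidence = sum(joint)  # p(t = outcome), the normalizing constant
    return [j / evidence for j in joint]


prior = [1 / 3, 1 / 3, 1 / 3]  # p(y) for y1, y2, y3
p_t1 = [0.8, 0.0, 0.5]         # p(t1 = 1 | y)
p_t4 = [0.1, 0.5, 0.7]         # p(t4 = 1 | y)

after_t1_pos = update_belief(prior, p_t1, 1)
after_t1_neg = update_belief(prior, p_t1, 0)
after_t1_pos_t4_pos = update_belief(after_t1_pos, p_t4, 1)
after_t1_pos_t4_neg = update_belief(after_t1_pos, p_t4, 0)

for name, post in [("t1=1", after_t1_pos), ("t1=0", after_t1_neg),
                   ("t1=1,t4=1", after_t1_pos_t4_pos),
                   ("t1=1,t4=0", after_t1_pos_t4_neg)]:
    print(name, [round(p, 2) for p in post])
```

Because each update only needs the previous posterior, the same function chains across stages of any depth.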
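The coverage requirement of slide 7 (Limitation 4) can be checked mechanically. This sketch uses a hypothetical table, not the slide's. In it, only t2 has a non-missing entry for $y_2$, and `None` stands for the slide's X. The helper `uncovered_diseases` is illustrative, not the authors' code.

```python
from typing import List, Optional

# Hypothetical p(t_m = 1 | y_i) table: rows are tests t1..t4, columns are
# diseases y1..y3; None marks a missing value (an "X" on the slide).
TABLE: List[List[Optional[float]]] = [
    [0.8, None, 0.5],   # t1
    [None, 0.9, None],  # t2: the only test with a value for y2
    [0.2, None, 0.7],   # t3
    [None, None, 0.6],  # t4
]


def uncovered_diseases(table: List[List[Optional[float]]], used: List[int]) -> List[int]:
    """0-based indices of diseases with no non-missing entry among the used tests."""
    covered = {i for m in used for i, p in enumerate(table[m]) if p is not None}
    return [i for i in range(len(table[0])) if i not in covered]


print(uncovered_diseases(TABLE, [0, 2, 3]))  # [1]: y2 is never tested
print(uncovered_diseases(TABLE, [0, 1]))     # []
```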
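The shape of a full DiagTree follows from (Q, q): there are Q/q stages, and each internal node has $2^q$ children. This sketch assumes q divides Q. It ignores early stopping and pruning (slide 28, for instance, shows 5 stages), so it counts the unpruned tree. For (6, 3) it gives 9 internal nodes, matching the root u = 0 and children u = 1, …, 8 on slide 12. The helper names are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple


def diagtree_shape(Q: int, q: int) -> Dict[str, int]:
    """Size of a full (unpruned) DiagTree with Q tests in total, q per stage."""
    if q <= 0 or Q % q != 0:
        raise ValueError("this sketch assumes q divides Q")
    stages = Q // q
    branching = 2 ** q  # one child per joint result of q binary tests
    internal = sum(branching ** s for s in range(stages))  # nodes in stages 0..stages-1
    return {"stages": stages, "branching": branching,
            "internal_nodes": internal, "leaves": branching ** stages}


def child_results(q: int) -> List[Tuple[int, ...]]:
    """All 2^q joint results of one node's q tests, from (0,...,0) to (1,...,1)."""
    return list(itertools.product((0, 1), repeat=q))


for Q, q in [(6, 3), (6, 1), (6, 6)]:
    print((Q, q), diagtree_shape(Q, q))
print(child_results(3)[0], child_results(3)[-1], len(child_results(3)))
```

Every setting ends with $2^Q = 64$ leaves. Larger q trades fewer stages (less time) for more tests run at once.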
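The leaf score on slide 11 divides the target's posterior by the geometric mean of the other diseases' posteriors. The slide only calls ε an arbitrarily small value. This sketch assumes ε = 1e-4, which reproduces the slide's ≈ 325 for the leaf (0.92, 0.00, 0.08). The function name `leaf_likelihood` is hypothetical.

```python
import math
from typing import List


def leaf_likelihood(posterior: List[float], target: int, eps: float = 1e-4) -> float:
    """Target posterior over the geometric mean of the other diseases' posteriors.

    Zeros are replaced by eps so the ratio stays finite; eps = 1e-4 is an
    assumed value (the slide only calls it an arbitrarily small value).
    """
    others = [max(p, eps) for i, p in enumerate(posterior) if i != target]
    geo_mean = math.exp(sum(math.log(p) for p in others) / len(others))
    return max(posterior[target], eps) / geo_mean


print(round(leaf_likelihood([0.92, 0.00, 0.08], target=0)))  # 325
```

Working in log space keeps the geometric mean numerically stable when many diseases have tiny posteriors.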
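Slide 12's indicator variables are mostly zero, so they can be stored sparsely in a dictionary keyed by (m, u, j), where a missing key means 0. The helper `to_indicators` is hypothetical. The string label `"phi"` for the dummy test is also an assumption, since the slide does not give the dummy test's index.

```python
from typing import Dict, List, Tuple, Union

Test = Union[int, str]  # real tests by index m, the dummy test by a label
PHI = "phi"             # hypothetical label; the slide does not give its index


def to_indicators(assignment: Dict[int, List[Test]]) -> Dict[Tuple[Test, int, int], int]:
    """Turn {node u: [test at position 1, ..., test at position q]} into the
    nonzero entries x[(m, u, j)] = 1; every key not present is 0."""
    x: Dict[Tuple[Test, int, int], int] = {}
    for u, tests in assignment.items():
        for j, m in enumerate(tests, start=1):  # positions are 1-based, as on the slide
            x[(m, u, j)] = 1
    return x


# Slide 12 example, (Q, q) = (6, 3): root u = 0 and node u = 5.
x = to_indicators({0: [1, 3, 6], 5: [7, 8, PHI]})
print(x)
```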
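Constraints 1 to 3 of slide 14 can be checked along a single root-to-leaf path. This sketch rests on two assumptions. First, "no reuse" and "no non-covered diseases" are checked per path; the slide does not say whether coverage is per path or per tree. Second, the `covers` map is hypothetical data listing the diseases for which each test has a non-missing $p(t_m = 1 \mid y_i)$. Constraint 4 holds by construction because each position stores exactly one test.

```python
from typing import Dict, List, Set, Union

Test = Union[int, str]
DUMMY = "phi"  # hypothetical label for the dummy test


def path_violations(path: List[List[Test]], q: int,
                    covers: Dict[Test, Set[int]], n_diseases: int) -> List[str]:
    """Violations of constraints 1-3 along one root-to-leaf path.

    path:   tests at each node on the path, one entry per position
    covers: diseases (0-based) with a non-missing p(t_m = 1 | y_i) for each test
    """
    problems: List[str] = []
    seen: Set[Test] = set()
    covered: Set[int] = set()
    for stage, node in enumerate(path):
        if len(node) != q:  # constraint 1: each internal node has q tests
            problems.append(f"stage {stage}: {len(node)} positions, expected {q}")
        for m in node:
            if m == DUMMY:
                continue  # dummy tests pad positions and may repeat
            if m in seen:  # constraint 2: no reuse
                problems.append(f"stage {stage}: test t{m} reused")
            seen.add(m)
            covered |= covers.get(m, set())
    for i in range(n_diseases):  # constraint 3: no non-covered diseases
        if i not in covered:
            problems.append(f"disease y{i + 1} never tested")
    return problems


# Hypothetical coverage data for three diseases y1..y3 (indices 0..2).
covers = {1: {0, 2}, 3: {0}, 6: {2}, 7: {1}, 8: {0, 1}}
print(path_violations([[1, 3, 6], [7, 8, DUMMY]], 3, covers, 3))  # []
print(path_violations([[1, 3, 6], [3, 8, DUMMY]], 3, covers, 3))  # ['stage 1: test t3 reused']
```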
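Slide 15 describes the generic branch-and-bound pattern: solve the LP relaxation, then branch on variables until they are binary. The sketch below applies it to a toy 0-1 knapsack rather than the DiagTree MILP, whose full objective is not reproduced here. It is not the authors' solver. The knapsack's LP relaxation is solved greedily by value/weight ratio. Branching follows that ratio order as a simple stand-in for "the most promising variable". All weights are assumed positive.

```python
from typing import List, Tuple


def knapsack_bnb(values: List[float], weights: List[float],
                 capacity: float) -> Tuple[float, List[int]]:
    """Exact 0-1 knapsack by depth-first branch-and-bound.

    The bound at each node is the optimum of the LP relaxation (items may be
    taken fractionally), which for knapsack is solved greedily by value/weight.
    """
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    n = len(v)
    best_value = 0.0
    best_pick: List[int] = []

    def lp_bound(k: int, value: float, room: float) -> float:
        # Relaxation of the remaining items k..n-1: 0 <= x_i <= 1 instead of binary.
        for i in range(k, n):
            if w[i] <= room:
                room -= w[i]
                value += v[i]
            else:
                return value + v[i] * room / w[i]
        return value

    def branch(k: int, value: float, room: float, pick: List[int]) -> None:
        nonlocal best_value, best_pick
        if value > best_value:  # every partial pick is feasible, so record it
            best_value, best_pick = value, pick[:]
        if k == n or lp_bound(k, value, room) <= best_value:
            return  # prune: the relaxation cannot beat the incumbent
        if w[k] <= room:  # branch x_k = 1
            pick.append(order[k])
            branch(k + 1, value + v[k], room - w[k], pick)
            pick.pop()
        branch(k + 1, value, room, pick)  # branch x_k = 0

    branch(0, 0.0, capacity, [])
    return best_value, sorted(best_pick)


print(knapsack_bnb([60, 100, 120], [10, 20, 30], 50))
```

The relaxation never underestimates the best completion, so pruning never discards the optimum. That is why branch-and-bound is exact rather than heuristic.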
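The IG baseline on slide 17 picks, at each step, the test with the largest expected reduction in the entropy of $p(\mathbf{y})$. The sketch below shows one greedy step on the two tests from slide 3. The helper names are hypothetical, and the LG baseline is not shown.

```python
import math
from typing import Dict, List


def entropy(p: List[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def information_gain(prior: List[float], p_pos: List[float]) -> float:
    """Expected entropy reduction of p(y) from one binary test with p(t = 1 | y) = p_pos."""
    gain = entropy(prior)
    for outcome in (0, 1):
        joint = [pr * (pp if outcome == 1 else 1.0 - pp) for pr, pp in zip(prior, p_pos)]
        p_outcome = sum(joint)  # p(t = outcome)
        if p_outcome > 0:
            gain -= p_outcome * entropy([j / p_outcome for j in joint])
    return gain


def greedy_ig_pick(prior: List[float], tests: Dict[str, List[float]]) -> str:
    """One greedy step: the test with maximal information gain."""
    return max(tests, key=lambda name: information_gain(prior, tests[name]))


prior = [1 / 3, 1 / 3, 1 / 3]
tests = {"t1": [0.8, 0.0, 0.5], "t4": [0.1, 0.5, 0.7]}  # p(t = 1 | y) from slide 3
for name, p_pos in tests.items():
    print(name, round(information_gain(prior, p_pos), 3))
print("greedy pick:", greedy_ig_pick(prior, tests))
```

Here t1 gains about 0.41 bits and t4 about 0.20 bits, so the greedy step picks t1. A greedy rule like this looks one step ahead only, whereas DiagTree optimizes the whole tree.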
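The metrics on slide 22 can be computed per disease, one-vs-rest. The slide does not define how ACC, TPR, and FPR are aggregated across diseases. This sketch, with hypothetical labels, therefore treats one disease as positive and all others as negative.

```python
from typing import Sequence, Tuple


def one_vs_rest_rates(y_true: Sequence[str], y_pred: Sequence[str],
                      disease: str) -> Tuple[float, float, float]:
    """Return (ACC, TPR, FPR) treating `disease` as positive and all others as negative."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        actual, predicted = t == disease, p == disease
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    fpr = fp / (fp + tn) if fp + tn else 0.0  # 1 - specificity
    return acc, tpr, fpr


# Hypothetical true and predicted diagnoses for four patients.
y_true = ["y1", "y1", "y2", "y3"]
y_pred = ["y1", "y2", "y2", "y3"]
print(one_vs_rest_rates(y_true, y_pred, "y1"))  # (0.75, 0.5, 0.0)
```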
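Slide 23 measures cost as the number of tests and the number of stages. The sketch below assumes that dummy tests $\phi$ cost nothing and that a stage counts only if it runs at least one real test; neither assumption is stated on the slide. It contrasts a three-tests-per-stage DiagTree path with a one-test-per-stage sequence of the same five tests.

```python
from typing import List, Tuple, Union

Test = Union[int, str]
DUMMY = "phi"  # hypothetical label for the dummy test


def path_cost(path: List[List[Test]]) -> Tuple[int, int]:
    """(#tests, #stages) along one root-to-leaf path.

    Assumptions: dummy tests cost nothing, and a stage counts only if it
    runs at least one real test.
    """
    real = [[m for m in node if m != DUMMY] for node in path]
    n_tests = sum(len(node) for node in real)
    n_stages = sum(1 for node in real if node)
    return n_tests, n_stages


# Three tests per stage, as in the (Q, q) = (6, 3) example ...
print(path_cost([[1, 3, 6], [7, 8, DUMMY]]))  # (5, 2)
# ... versus the same five tests run one per stage.
print(path_cost([[1], [3], [6], [7], [8]]))   # (5, 5)
```

The two paths use the same number of tests, but running tests in parallel cuts the number of stages, which is the time saving motivated on slide 4.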
