“To Fuse or Not to Fuse: Cognitive Diversity for Combining Multiple Scoring Systems”

Dr. Frank Hsu from Fordham University presented this to the Cognitive Systems Institute Group Speaker Series on December 17, 2015.

  1. To Fuse or Not to Fuse: Cognitive Diversity for Combining Multiple Scoring Systems (MSS). Frank Hsu, Fordham University. IBM Cognitive Systems Institute Group (CSIG), Dec. 17, 2015.
  2. To rank a list of choices (subjects, objects, items, options, …) in areas such as:
     • Biomedical and health: genes, ligands, or DNA fragments in biomedical science.
     • STEM areas: targets, documents, trajectories, or host names in technology or engineering.
     • Society and social choices: movies, books, apartments, skaters, or sports teams in social networks or social choices.
     • Business and finance: customers, vendors, corporate risks, or stocks.
     • Classification and affective computing: labels and degree of stress, respectively.
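  3. Each choice (or option) has, or can be described by, a set of variables (attributes, criteria, cues, features, indicators, judges, parameters, …). Each variable acts as a scoring system that assigns a score and a rank to each of the n choices d1, d2, …, dn. From variables A and B, two combined scoring systems are formed: C = SC(A, B) by score combination and D = RC(A, B) by rank combination. Systems A, B, C, and D have score functions sA, sB, sC, sD and rank functions rA, rB, rC, rD.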
  4. Domain examples: active search in chemical space; Internet search strategy; figure skating judgment; crossing the street.
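  5. Combining multiple scoring systems (MSS) to rank a group of eight skaters, d1 to d8, scored by three judges, J1, J2, and J3.

     (a) Scores and score combination (SC is the sum of the three scores; the highest SC gets final rank 1)

     | Skater | J1 | J2 | J3 | SC | Final rank |
     |---|---|---|---|---|---|
     | d1 | 8.5 | 7 | 9.7 | 25.2 | 4 |
     | d2 | 7.6 | 8.4 | 9.6 | 25.6 | 3 |
     | d3 | 8.3 | 5.6 | 9.75 | 23.65 | 7 |
     | d4 | 6.4 | 5.4 | 9.81 | 21.61 | 8 |
     | d5 | 9.4 | 7.8 | 9.68 | 26.88 | 2 |
     | d6 | 9.5 | 8.5 | 9.2 | 27.2 | 1 |
     | d7 | 7.9 | 6.3 | 10 | 24.2 | 6 |
     | d8 | 10 | 10 | 5.1 | 25.1 | 5 |

     (b) Ranks and rank combination (RC is the sum of the three ranks; the lowest RC gets final rank 1, and tied skaters share the average rank)

     | Skater | J1 | J2 | J3 | RC | Final rank |
     |---|---|---|---|---|---|
     | d1 | 4 | 5 | 4 | 13 | 4.5 |
     | d2 | 7 | 3 | 6 | 16 | 7 |
     | d3 | 5 | 7 | 3 | 15 | 6 |
     | d4 | 8 | 8 | 2 | 18 | 8 |
     | d5 | 3 | 4 | 5 | 12 | 3 |
     | d6 | 2 | 2 | 7 | 11 | 2 |
     | d7 | 6 | 6 | 1 | 13 | 4.5 |
     | d8 | 1 | 1 | 8 | 10 | 1 |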
  6. Similarity between two scoring systems, d(A, B):
     (a) Data correlation (1885–): Pearson's correlation coefficient (P), Spearman's footrule (F), Kendall's rank correlation tau (T), and Spearman's rank correlation rho (R).
     (b) Information diversity: the cognitive diversity d(A, B) between two scoring systems A and B is based on their rank-score characteristic (RSC) functions, fA and fB.

     RSC functions fJ1, fJ2, fJ3 (figure: score, 0 to 1, versus rank, 1 to 8):

     | Rank | fJ1 | fJ2 | fJ3 |
     |---|---|---|---|
     | 1 | 1 | 1 | 1 |
     | 2 | 0.86 | 0.75 | 0.97 |
     | 3 | 0.71 | 0.63 | 0.93 |
     | 4 | 0.57 | 0.5 | 0.9 |
     | 5 | 0.43 | 0.38 | 0.86 |
     | 6 | 0.28 | 0.25 | 0.83 |
     | 7 | 0.14 | 0.13 | 0.8 |
     | 8 | 0 | 0 | 0 |
  7. Combinatorial Fusion Algorithm (CFA): let D be a set of classes, documents, genes, or molecules with |D| = n, let N be the set {1, 2, …, n}, and let R be a set of real numbers.
     (a) Multiple scoring systems (MSS): each scoring system A has a score function $s_A : D \to R$, a rank function $r_A : D \to N$ (a one-to-one assignment of ranks, so that $r_A^{-1}$ is defined), and a rank-score characteristic (RSC) function $f_A : N \to R$ given by $f_A(i) = (s_A \circ r_A^{-1})(i) = s_A(r_A^{-1}(i))$.
     (b) The diversity (or similarity) d(A, B) between two scoring systems A and B can be defined using score functions, rank functions, or RSC functions: d(A, B) = d(sA, sB), d(rA, rB), or d(fA, fB).
     Ref: Hsu et al., in Advanced Data Mining Technologies in Bioinformatics, Idea Group Inc., 2006.
  8. Combining MSS for structure-based virtual screening (I): combining 2 to 5 scoring systems (by rank or by score), with performance comparisons. Combinations of different methods improve performance, and the combination of B and D works best on thymidine kinase (TK). Ref: Yang et al., Journal of Chemical Information and Modeling 45 (2005), pp. 1134–1146.
     Figure: "The Performance of Thymidine Kinase (TK)", RSC curves (score versus rank, ranks 0 to 1000) for GEMDOCK-Binding, GEMDOCK-Pharma, GOLD-GoldScore, GOLD-Goldinter, and GOLD-ChemScore.
     Figure: average GH score on TK for each single system A to E and for every combination of them (DE, CE, …, ABCDE), by rank combination and by score combination.
  9. Combining MSS for structure-based virtual screening (II): positive cases (o) versus negative cases (x) for 80 two-system combinations, plotted by performance ratio (x-coordinate) and cognitive diversity (y-coordinate).
  10. It was shown in the information retrieval domain that, under certain conditions (one of which is higher cognitive diversity), rank combination can be better than score combination. Ref: Hsu, D.F., Taksa, I. Information Retrieval 8(3), pp. 449–480, 2005.
  11. Target tracking with three features (Color + Position + Shape):
      • Color: average normalized RGB color.
      • Position: location of the target region's centroid.
      • Shape: area of the target region.
      Ref: Lyons, D.M., Hsu, D.F. Information Fusion 10(2), pp. 124–136, 2009.
  12. Target tracking results by sequence (Seq.). RUN2 uses score fusion; RUN3 uses score and rank fusion, with ground truth used to select; RUN4 uses score and rank fusion, with the rank-score function used to select. Each run reports the average (Avg.) and variance (Var.) of the MSSD. Lower MSSD implies better tracking performance.

      | Seq. | RUN2 MSSD Avg. | RUN2 MSSD Var. | RUN3 MSSD Avg. | RUN3 MSSD Var. | RUN4 MSSD Avg. | RUN4 MSSD Var. |
      |---|---|---|---|---|---|---|
      | 1 | 1537.22 | 694.47 | 1536.65 | 695.49 | 1536.9 | 694.24 |
      | 2 | 816.53 | 8732.13 | 723.13 | 3512.19 | 723.09 | 3511.41 |
      | 3 | 108.89 | 61.61 | 108.34 | 60.58 | 108.89 | 61.61 |
      | 4 | 23.14 | 2.39 | 23.04 | 2.30 | 23.14 | 2.39 |
      | 5 | 334.13 | 120.11 | 332.89 | 119.39 | 334.138 | 120.11 |
      | 6 | 96.40 | 119.22 | 66.9 | 12.91 | 67.28 | 13.38 |
      | 7 | 577.78 | 201.29 | 548.6 | 127.78 | 577.78 | 201.29 |
      | 8 | 538.35 | 605.84 | 500.9 | 57.91 | 534.3 | 602.85 |
      | 9 | 143.04 | 339.73 | 140.18 | 297.07 | 142.33 | 294.94 |
      | 10 | 260.24 | 86.65 | 252.17 | 84.99 | 258.64 | 85.94 |
      | 11 | 520.13 | 2991.17 | 440.98 | 2544.69 | 470.27 | 2791.62 |
      | 12 | 1188.81 | 745.01 | 1188.81 | 745.01 | 1188.81 | 745.01 |

      RUN4 is as good as or better than RUN2 in all cases (highlighted in gray on the slide). RUN4 is, predictably, not always as good as RUN3 (the "best case").
  13. Cognitive informatics: combining two visual perception systems. Ref: A. Batallones et al., On the combination of two visual cognition systems using combinatorial fusion, Brain Informatics 2 (2015), pp. 21–32.
  14. Cognitive diversity (CD) provides information diversity, complementary to and in contrast with statistical data correlation:
      • In similarity measurement between two scoring systems (or data distributions): CD vs. Pearson, footrule, Kendall tau, and Spearman rho.
      • In goodness of fit between two models (or hypotheses): CD vs. the chi-square test and the Kolmogorov–Smirnov test.
      • In cognitive computing between two hypotheses (or scoring systems), to decide when and how to fuse (or combine) multiple scoring systems: CD vs. NLP, ML, DM, IR, ensemble methods, and MADM, using combination methods such as SC, RC, majority voting, weighted SC, weighted RC, POSet, max, min, ave., …
  15. Cognitive systems that can combine a group of diverse, well-performing scoring systems from a variety of sensors, sources, and software can serve as a resilient engine and an effective telescope for the new scientific discovery paradigm (integration vs. reduction) in the era of data-driven, human-interactive knowledge discovery. D. F. Hsu, IBM CSIG seminar, Dec. 17, 2015.
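
The SC and RC columns of slide 5 can be reproduced with a short Python sketch. It assumes SC is the plain sum of the judges' scores (highest sum ranks first), RC is the plain sum of the judges' ranks (lowest sum ranks first), and tied totals share the average of the positions they occupy, which is how d1 and d7 both end up at 4.5. The J2 score for d4 is taken as 5.4, the value consistent with both the SC total of 21.61 and J2's rank of 8 for d4.

```python
from typing import Dict, List

# Scores from slide 5, table (a): skater -> [J1, J2, J3].
# Assumption: d4's J2 score is 5.4 (consistent with SC = 21.61 and J2 rank 8).
SCORES: Dict[str, List[float]] = {
    "d1": [8.5, 7.0, 9.7],
    "d2": [7.6, 8.4, 9.6],
    "d3": [8.3, 5.6, 9.75],
    "d4": [6.4, 5.4, 9.81],
    "d5": [9.4, 7.8, 9.68],
    "d6": [9.5, 8.5, 9.2],
    "d7": [7.9, 6.3, 10.0],
    "d8": [10.0, 10.0, 5.1],
}


def to_ranks(values: Dict[str, float], higher_is_better: bool = True) -> Dict[str, float]:
    """Assign ranks 1..n; tied values share the average of their positions."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every item tied with ordered[i].
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        for item in ordered[i : j + 1]:
            ranks[item] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


n_judges = 3

# (a) Score combination: sum the scores, highest total ranks first.
sc = {d: sum(s) for d, s in SCORES.items()}
sc_final = to_ranks(sc, higher_is_better=True)

# (b) Rank combination: rank each judge separately, sum the ranks,
# lowest total ranks first.
judge_ranks = [to_ranks({d: s[j] for d, s in SCORES.items()}) for j in range(n_judges)]
rc = {d: sum(r[d] for r in judge_ranks) for d in SCORES}
rc_final = to_ranks(rc, higher_is_better=False)

for d in SCORES:
    print(d, round(sc[d], 2), sc_final[d], rc[d], rc_final[d])
```

The two combinations already disagree at the top of the list: score combination puts d6 first, while rank combination puts d8 first, because J3's very low score for d8 weighs more heavily in the score sum than its rank of 8 does in the rank sum.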
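
The four data-correlation measures listed on slide 6 can be computed directly from the rank columns of slide 5, table (b). The sketch below writes out textbook definitions by hand rather than calling a statistics library: Pearson's coefficient, Spearman's footrule as the sum of absolute rank differences, Spearman's rho as 1 - 6·Σd²/(n(n² - 1)) (valid only without ties), and Kendall's tau as (concordant - discordant pairs)/(n(n - 1)/2). The footrule is a distance (0 means identical rankings); the other three are correlations in [-1, 1].

```python
import math
from itertools import combinations
from typing import Sequence

# Rank columns from slide 5, table (b), in skater order d1..d8.
J1 = [4, 7, 5, 8, 3, 2, 6, 1]
J2 = [5, 3, 7, 8, 4, 2, 6, 1]
J3 = [4, 6, 3, 2, 5, 7, 1, 8]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def footrule(x: Sequence[int], y: Sequence[int]) -> int:
    """Spearman's footrule: sum of absolute rank differences (a distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def spearman_rho(x: Sequence[int], y: Sequence[int]) -> float:
    """Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x: Sequence[int], y: Sequence[int]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    conc = disc = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (len(x) * (len(x) - 1) / 2)


for name, other in (("J2", J2), ("J3", J3)):
    print(f"J1 vs {name}: F={footrule(J1, other)} "
          f"R={spearman_rho(J1, other):.3f} T={kendall_tau(J1, other):.3f} "
          f"P={pearson(J1, other):.3f}")
```

For J1 vs J2 this gives F = 8, R ≈ 0.738, and T ≈ 0.643; for J1 vs J3 it gives F = 28, R ≈ -0.714, and T ≈ -0.643, so J3 ranks the skaters nearly in reverse of J1. Applied to untied ranks, Pearson's coefficient coincides with Spearman's rho.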
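
Slide 7 defines the RSC function as f(i) = s(r⁻¹(i)), the score held by whichever choice is ranked i. A minimal Python sketch of that definition, plus one possible way to turn two RSC functions into a diversity value, follows. Two choices here are assumptions, not taken from the slides: scores are min-max normalized to [0, 1] before f is built (so systems on different scales are comparable, and slide 6's curves all run from 1 down to 0), and d(fA, fB) is taken to be the root-mean-square difference between the two RSC functions, since the slides do not give a formula. The RSC values are copied from the table on slide 6.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def rsc(scores: Dict[str, float]) -> List[float]:
    """Rank-score characteristic f(i) = s(r^-1(i)) for i = 1..n.

    Assumption: scores are min-max normalized to [0, 1] first, and rank 1
    goes to the highest score, so the returned list is non-increasing.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    normalized = {d: (v - lo) / span if span else 0.0 for d, v in scores.items()}
    # Sorting by score lists the choices in rank order: r^-1(1), r^-1(2), ...
    by_rank = sorted(normalized, key=normalized.get, reverse=True)
    return [normalized[d] for d in by_rank]


def cognitive_diversity(f_a: Sequence[float], f_b: Sequence[float]) -> float:
    """Assumed form of d(fA, fB): root-mean-square gap between two RSC functions."""
    n = len(f_a)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_a, f_b)) / n)


# RSC values read from the table on slide 6 (ranks 1..8).
F = {
    "J1": [1, 0.86, 0.71, 0.57, 0.43, 0.28, 0.14, 0],
    "J2": [1, 0.75, 0.63, 0.5, 0.38, 0.25, 0.13, 0],
    "J3": [1, 0.97, 0.93, 0.9, 0.86, 0.83, 0.8, 0],
}

for a, b in combinations(F, 2):
    print(f"d({a}, {b}) = {cognitive_diversity(F[a], F[b]):.3f}")
```

Under this assumed metric, J1 and J2 are nearly indistinguishable (about 0.058), while J3 is far from both (about 0.37 and 0.41), which matches the visibly different shape of fJ3 on slide 6.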
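
Slide 8 compares every combination of two to five docking scoring systems, by rank and by score. Enumerating those combinations is mechanical: with five systems there are 10 + 10 + 5 + 1 = 26 of them (31 once the single systems are included, as on the slide's x-axis). The sketch below enumerates them and fuses each subset by average normalized score and by average rank. The system labels A to E, the compound names, and their scores are hypothetical placeholders, and evaluating each fused list (for example by GH score against known actives) is left out.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def normalize(s: Scores) -> Scores:
    """Min-max normalize one system's scores to [0, 1] (higher is better)."""
    lo, hi = min(s.values()), max(s.values())
    return {d: (v - lo) / (hi - lo) if hi > lo else 0.0 for d, v in s.items()}


def ranks(s: Scores) -> Dict[str, int]:
    """Rank 1 = highest score; ties are broken by input order for brevity."""
    return {d: i + 1 for i, d in enumerate(sorted(s, key=s.get, reverse=True))}


def fuse(systems: Dict[str, Scores], subset: Tuple[str, ...], by: str) -> List[str]:
    """Return the choices ordered best-first under score or rank combination."""
    items = list(systems[subset[0]])
    if by == "score":
        normed = [normalize(systems[name]) for name in subset]
        avg = {d: sum(s[d] for s in normed) / len(subset) for d in items}
        return sorted(items, key=avg.get, reverse=True)  # higher average score first
    ranked = [ranks(systems[name]) for name in subset]
    avg = {d: sum(r[d] for r in ranked) / len(subset) for d in items}
    return sorted(items, key=avg.get)  # lower average rank first


def all_combinations(systems: Dict[str, Scores], by: str) -> Dict[Tuple[str, ...], List[str]]:
    """Fuse every subset of 2..k systems, keyed by the tuple of system names."""
    names = sorted(systems)
    return {
        subset: fuse(systems, subset, by)
        for k in range(2, len(names) + 1)
        for subset in combinations(names, k)
    }


# Hypothetical scores of four compounds under five systems A..E (higher = better).
SYSTEMS: Dict[str, Scores] = {
    "A": {"m1": 0.9, "m2": 0.4, "m3": 0.7, "m4": 0.1},
    "B": {"m1": 12.0, "m2": 30.0, "m3": 18.0, "m4": 5.0},
    "C": {"m1": -7.5, "m2": -9.1, "m3": -6.0, "m4": -8.2},
    "D": {"m1": 3.0, "m2": 1.0, "m3": 4.0, "m4": 2.0},
    "E": {"m1": 0.2, "m2": 0.8, "m3": 0.5, "m4": 0.3},
}

by_rank = all_combinations(SYSTEMS, "rank")
by_score = all_combinations(SYSTEMS, "score")
print(len(by_rank), by_rank[("B", "D")], by_score[("B", "D")])
```

On these made-up numbers, fusing B and D by rank ties m1 and m2 (average rank 2.5 each), while fusing by score separates them and puts m2 ahead of m1, a small instance of the two combination methods disagreeing.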
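
Each point on slide 9 places one two-system combination by performance ratio and cognitive diversity, marked positive or negative. The slide does not spell out these definitions, so the sketch below assumes them: the performance ratio is the lower individual performance divided by the higher one (1.0 means the two systems perform equally well), a positive case is one where the combination performs at least as well as the better of the two, and cognitive diversity is the root-mean-square gap between the two RSC functions. The numbers in the usage line are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PairPoint:
    performance_ratio: float    # x-coordinate on slide 9
    cognitive_diversity: float  # y-coordinate on slide 9
    positive: bool              # plotted as 'o' if True, 'x' otherwise


def cognitive_diversity(f_a: Sequence[float], f_b: Sequence[float]) -> float:
    """Assumed metric: root-mean-square gap between two RSC functions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_a, f_b)) / len(f_a))


def pair_point(perf_a: float, perf_b: float, perf_combined: float,
               f_a: Sequence[float], f_b: Sequence[float]) -> PairPoint:
    """Place one 2-combination; performances must be positive, higher = better."""
    low, high = sorted((perf_a, perf_b))
    return PairPoint(
        performance_ratio=low / high,                 # assumed: lower / higher
        cognitive_diversity=cognitive_diversity(f_a, f_b),
        positive=perf_combined >= high,               # assumed positive-case criterion
    )


# Hypothetical example: two systems with GH-style performances 0.40 and 0.50
# whose fusion reaches 0.55.
point = pair_point(0.40, 0.50, 0.55, [1.0, 0.6, 0.3, 0.0], [1.0, 0.9, 0.8, 0.0])
print(point)
```

Plotting many such points, one per pair, gives a picture like slide 9's, where positive cases can be checked against regions of high diversity and a high performance ratio.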
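
The three tracking features on slide 11 can be computed from a target region in a few lines. The sketch below assumes the region is given as a boolean mask over an RGB image stored as nested lists, takes "normalized RGB" to mean per-pixel chromaticity (r, g, b) / (r + g + b), reports the centroid as (row, column), and measures area in pixels. How Lyons and Hsu turn these features into scores for candidate regions is not shown on the slide and is not attempted here.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def region_features(
    image: List[List[Pixel]], mask: List[List[bool]]
) -> Tuple[Tuple[float, float, float], Tuple[float, float], int]:
    """Return (color, position, shape) for the pixels where mask is True.

    Assumptions:
    color    -- mean normalized RGB, each pixel scaled to (r, g, b) / (r + g + b)
    position -- centroid of the region as (row, column)
    shape    -- area of the region in pixels
    """
    sum_r = sum_g = sum_b = 0.0
    sum_row = sum_col = 0.0
    area = 0
    for y, (pixels, inside_row) in enumerate(zip(image, mask)):
        for x, ((r, g, b), inside) in enumerate(zip(pixels, inside_row)):
            if not inside:
                continue
            total = r + g + b
            if total > 0:  # pure black has no chromaticity; it adds (0, 0, 0)
                sum_r += r / total
                sum_g += g / total
                sum_b += b / total
            sum_row += y
            sum_col += x
            area += 1
    if area == 0:
        raise ValueError("mask selects no pixels")
    color = (sum_r / area, sum_g / area, sum_b / area)
    position = (sum_row / area, sum_col / area)
    return color, position, area


# A 2x2 toy image: the target is the red pixel at (0, 0) and the blue one at (1, 1).
image = [[(255, 0, 0), (10, 10, 10)], [(0, 255, 0), (0, 0, 255)]]
mask = [[True, False], [False, True]]
print(region_features(image, mask))
```

For the toy image this gives color (0.5, 0.0, 0.5), centroid (0.5, 0.5), and area 2. Normalizing by r + g + b makes the color feature insensitive to uniform changes in brightness.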
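
Slide 14 lists several ways to combine scoring systems besides plain SC and RC. Some of them (weighted SC, weighted RC, max, min, average, and majority voting) fit in a short sketch. It assumes scores are already normalized to [0, 1] with higher meaning better, and that ranks run from 1 (best) upward. The weights are supplied by the caller, since the slide does not say how they are chosen. POSet-based combination is not attempted.

```python
from collections import Counter
from typing import Dict, List, Sequence

Table = Dict[str, float]  # choice -> normalized score (or rank)


def weighted_sc(scores: Sequence[Table], weights: Sequence[float]) -> Table:
    """Weighted score combination: weighted mean of normalized scores (higher = better).

    The weights are caller-supplied; how to choose them is an open assumption.
    """
    total = sum(weights)
    return {d: sum(w * s[d] for w, s in zip(weights, scores)) / total for d in scores[0]}


def weighted_rc(ranks: Sequence[Table], weights: Sequence[float]) -> Table:
    """Weighted rank combination: weighted mean of ranks (lower = better)."""
    return weighted_sc(ranks, weights)  # same arithmetic, opposite sort order


def max_fusion(scores: Sequence[Table]) -> Table:
    return {d: max(s[d] for s in scores) for d in scores[0]}


def min_fusion(scores: Sequence[Table]) -> Table:
    return {d: min(s[d] for s in scores) for d in scores[0]}


def ave_fusion(scores: Sequence[Table]) -> Table:
    return weighted_sc(scores, [1.0] * len(scores))


def majority_vote(labels: Sequence[str]) -> str:
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def best_first(combined: Table, higher_is_better: bool = True) -> List[str]:
    return sorted(combined, key=combined.get, reverse=higher_is_better)


# Two hypothetical systems scoring three choices x, y, z.
A = {"x": 1.0, "y": 0.0, "z": 0.5}
B = {"x": 0.0, "y": 1.0, "z": 0.6}
print(best_first(weighted_sc([A, B], [3, 1])), best_first(max_fusion([A, B])),
      best_first(min_fusion([A, B])), majority_vote(["cat", "dog", "cat"]))
```

On these two toy systems, weighted SC with weights 3 and 1 ranks x first, while min fusion ranks z first, since z is the only choice that neither system scores at zero. Which operator is appropriate is exactly the "when and how to fuse" question the slide raises.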
