1. FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Nikolay Zagoruiko, Irina Borisova, Vladimir Dyubanov, Olga Kutnenko
Institute of Mathematics of the Siberian Division of the Russian Academy of Sciences,
Pr. Koptyuga 4, 630090 Novosibirsk, Russia
[email_address]
2. Data Analysis, Pattern Recognition, Empirical Prediction, Discovering of Regularities, Data Mining, Machine Learning, Knowledge Discovery, Intelligent Data Analysis, Cognitive Computations

Of special interest is the human ability:
- to estimate similarities and distinctions between objects,
- to classify objects,
- to recognize whether new objects belong to the available classes,
- to discover natural dependences between characteristics, and
- to use these dependences (knowledge) for forecasting.
3. Specificity of Data Mining tasks:
- Polytypic attributes
- Number of attributes >> number of objects
- Presence of noise, spikes and blanks
- Absence of information on distributions
4. Situation in Data Mining
- Thousands of algorithms.
- Reasons: types of scales, dependences between features, laws of distribution, linear vs. nonlinear decision rules, small vs. big training sets.
- How can we build algorithms that are invariant to these features? Which function is common to all DM algorithms?
- The basic function used by a person in clustering, recognition, feature selection, etc. is the function estimating similarity between objects.
5. Measures of Similarity
6. Similarity is not an absolute but a relative category.
Is an object b similar to a, or is it not? Do the objects b and a belong to one class?
We should know the answer to the question: similar in competition with what?
7. Measure F(z, a | b) of similarity of the object z to object a in competition with object b
- Locality: F depends only on the distances r(z, a) and r(z, b).
- Normality: if z = a, then F(z, a | b) = +1; if z = b, then F(z, a | b) = -1; if r(z, a) = r(z, b), then F(z, a | b) = F(z, b | a) = 0.
- Invariance to translation and rotation of coordinates.
- Antisymmetry: F(z, a | b) = -F(z, b | a).
======================================
Metric properties that are not required:
- Symmetry: F(z, a | b) = F(z, b | a)
- Triangularity: F(z, a | b) + F(a, b | z) >= F(b, z | a)
======================================
Competitive Space
8. Function of Concurrent (Rival) Similarity (FRiS)
[Figure: F for an object z lying between standards A and B, with distances r1 = r(z, A) and r2 = r(z, B); F ranges from +1 at A to -1 at B.]
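The slide defines FRiS only through the figure. Below is a minimal Python sketch, assuming Euclidean distance and the ratio form F = (r2 - r1)/(r2 + r1) used in Zagoruiko's FRiS publications; the exact normalization drawn on the slide is not recoverable.

```python
import numpy as np

def fris(z, a, b):
    """Rival similarity F(z, a | b) of object z to a in competition with b.

    Assumes Euclidean distance and the ratio form
        F = (r(z, b) - r(z, a)) / (r(z, b) + r(z, a)),
    which satisfies the axioms of slide 7 (locality, normality, antisymmetry).
    """
    z, a, b = map(np.asarray, (z, a, b))
    r_a = np.linalg.norm(z - a)   # distance to the object we compare with
    r_b = np.linalg.norm(z - b)   # distance to the rival
    if r_a + r_b == 0.0:          # z coincides with both competitors
        return 0.0
    return (r_b - r_a) / (r_b + r_a)

# Normality checks from slide 7:
assert fris([0, 0], [0, 0], [1, 0]) == 1.0    # z = a  ->  F = +1
assert fris([1, 0], [0, 0], [1, 0]) == -1.0   # z = b  ->  F = -1
assert fris([0.5, 3], [0, 3], [1, 3]) == 0.0  # equidistant -> F = 0
```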
9. DM methods using the FRiS function allow improving old algorithms and solving some new tasks:
- Quantitative estimation of compactness
- Choice of informative attributes
- Construction of decision rules
- Censoring of the training set
- Generalized classification
- Filling of blanks (imputation)
- Forecasting
- Ordering of objects
10. Compactness
All pattern recognition methods are based on the hypothesis of compactness (Braverman E.M., 1962). The patterns are compact if:
- the number of boundary points is small compared to their total number;
- the patterns are separated from each other by not too elaborate borders.
11. Compactness
High compactness requires:
- maximum similarity between objects of one pattern;
- minimum similarity between objects of different patterns.
12. Compactness: the Max_in condition
Compact patterns should show maximal similarity between objects of the same pattern.
13. Compactness: the Min_out condition
Compact patterns should show maximal difference between their objects and the objects of other patterns.
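One plausible way to turn Max_in and Min_out into a single number is to average rival similarities over a pattern; a hedged sketch (the talk's exact estimator is not shown):

```python
import numpy as np

def fris_compactness(X_own, X_alien):
    """Hedged numeric reading of Max_in + Min_out: for every object of the
    pattern, compute its rival similarity to the nearest same-pattern
    neighbour (Max_in pushes r1 down) in competition with the nearest alien
    object (Min_out pushes r2 up), then average.  Values near +1 indicate a
    compact pattern.
    """
    X_own, X_alien = np.asarray(X_own, float), np.asarray(X_alien, float)
    scores = []
    for i, z in enumerate(X_own):
        rest = np.delete(X_own, i, axis=0)
        r1 = np.linalg.norm(rest - z, axis=1).min()     # nearest own neighbour
        r2 = np.linalg.norm(X_alien - z, axis=1).min()  # nearest rival
        scores.append((r2 - r1) / (r2 + r1))
    return float(np.mean(scores))
```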
14. Algorithm FRiS-Stolp for selection of the standards ("stolps"). Decision rules
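The algorithm itself is shown only as pictures on the following slides. Here is a loose greedy sketch of stolp selection under stated assumptions (a coverage threshold `f_min` and sum-of-FRiS scoring are choices of this sketch, not details from the talk):

```python
import numpy as np

def fris_stolp_one_class(X_own, X_alien, f_min=0.0):
    """Greedy stolp selection for one pattern: repeatedly pick the object
    that, used as a standard, maximizes the total rival similarity of the
    still-uncovered objects (rival distance = nearest alien object), then
    drop every object already covered with F >= f_min.
    """
    X_own, X_alien = np.asarray(X_own, float), np.asarray(X_alien, float)

    def fris_to(z, stolp):
        r1 = np.linalg.norm(z - stolp)
        r2 = np.linalg.norm(X_alien - z, axis=1).min()
        return (r2 - r1) / (r2 + r1)

    uncovered, stolps = list(range(len(X_own))), []
    while uncovered:
        best = max(uncovered,
                   key=lambda c: sum(fris_to(X_own[j], X_own[c]) for j in uncovered))
        stolps.append(best)
        uncovered = [j for j in uncovered
                     if fris_to(X_own[j], X_own[best]) < f_min]  # keep the unprotected
    return X_own[stolps]   # the standards; run once per pattern
```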
15. Decision rules
16. Recognition
17. k = K
18. k = K + 2
19. k = K + 11
20. k = K + 29
21. Censoring of the training set
22. Censoring of the training set
23. Censoring of the training set
24. Censoring of the training set
H_P = argmax |r(H, P)|, P = 1, 2, …, 7; the best censoring depth is P = 4 or 5.
[Figure: correlation |r| and object counts for each censoring depth P:
P = 1: 0.8689, 90(90)-20;  P = 2: 0.8902, 90(90)-20;  P = 3: 0.9084, 90(90)-20;
P = 4: 0.9167, 90(90)-20;  P = 5: 0.8903, 90(90)-20;  P = 6: 0.7309, 88(90)-9;
P = 7: 0.2324, 86(90)-7.]
25. Criteria of informativeness
Informativeness by Fisher for a normal distribution. Compactness has the same sense and can be used as a criterion of informativeness that is invariant to the law of distribution and to the ratio N:M.
Comparative studies have shown an appreciable advantage of this criterion over the number of errors under cross-validation.
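The formula on this slide was an image and is lost. A common two-class form of the Fisher informativeness criterion for a feature x (an assumption about which variant the slide used):

```latex
J(x) \;=\; \frac{\bigl(\mu_1(x) - \mu_2(x)\bigr)^2}{\sigma_1^2(x) + \sigma_2^2(x)}
```

where \mu_i(x) and \sigma_i^2(x) are the class means and variances of x; larger J means better separation of the two patterns along x.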
26. Comparison of the criteria (CV vs. FRiS)
[Figure: two orderings of attributes by informativeness on data with noise; N = 100 attributes, M = 2*100 objects, m_t = 2*35 training and m_c = 2*65 control objects, plus noise attributes. Correlation with the true ordering: C = 0.661 for one criterion, C = 0.883 for the other.]
27. Algorithm GRAD
- It is based on a combination of two greedy approaches: forward and backward searches.
- At the forward stage the algorithm Addition is used.
- At the backward stage the algorithm Deletion is used.
28. Algorithm AdDel
- To ease the influence of accumulated errors, a relaxation method is applied: n steps forward, n/2 steps back.
- n1: the number of most informative attributes added to the subsystem (Addition).
- n2 < n1: the number of least informative attributes eliminated from the subsystem (Deletion).
[Figure: reliability (R) of recognition by AdDel at different dimensions of the space.]
R(AdDel) > R(DelAd) > R(Ad) > R(Del)
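The relaxation scheme lends itself to a short sketch. This hedged Python version assumes a generic `quality(subset)` criterion (the talk uses a FRiS-based informativeness measure, not reproduced here):

```python
def addel(features, quality, n1=4, n2=2):
    """Relaxation selection: n1 greedy Addition steps, then n2 < n1 Deletion
    steps, repeated while the criterion improves.  `quality(subset)` is any
    informativeness criterion scoring a list of features.
    """
    selected, best_q = [], float("-inf")
    while True:
        for _ in range(n1):                      # Addition: steps forward
            rest = [f for f in features if f not in selected]
            if not rest:
                break
            selected.append(max(rest, key=lambda f: quality(selected + [f])))
        for _ in range(n2):                      # Deletion: steps back
            if len(selected) <= 1:
                break
            drop = max(selected,                 # losing it hurts quality least
                       key=lambda f: quality([g for g in selected if g != f]))
            selected.remove(drop)
        q = quality(selected)
        if q <= best_q:                          # no further improvement
            return selected
        best_q = q
```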
29. Algorithm GRAD
- AdDel can work not only with single attributes but also with groups of attributes (granules) of different power m = 1, 2, 3, …
- The granules could be formed by exhaustive search. But: the problem of combinatorial explosion!
- Decision: orientation on the individual informativeness of attributes.
[Figure: dependence of the frequency f of hits into an informative subsystem on the serial number L of an attribute ordered by individual informativeness.]
- This allows granulating only the most informative part of the attributes.
30. Algorithm GRAD (Granulated AdDel)
1. Independent testing of N attributes; selection of the m1 << N first best (m1 granules of power 1).
2. Forming combinations; selection of the m2 first best (m2 granules of power 2).
3. Forming combinations; selection of the m3 first best (m3 granules of power 3).
M = <m1, m2, m3> is the set of secondary attributes (granules).
AdDel(M) selects the m* << |M| best granules, which include n* attributes.
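A hedged sketch of the granulation stage; the slide leaves the combination rule implicit, so forming pairs and triples only over the m1 individually best attributes is an assumption of this sketch:

```python
from itertools import combinations

def grad_granules(features, quality, m1, m2, m3):
    """Granulation stage of GRAD: granules of power 1, 2 and 3 are formed
    over the m1 individually best attributes, avoiding the combinatorial
    explosion of a full exhaustive search; the joint set M is then handed
    to AdDel.  `quality` scores a tuple of attributes.
    """
    best = lambda cands, m: sorted(cands, key=quality, reverse=True)[:m]
    g1 = best([(f,) for f in features], m1)           # m1 granules of power 1
    base = [f for (f,) in g1]
    g2 = best(list(combinations(base, 2)), m2)        # m2 granules of power 2
    g3 = best(list(combinations(base, 3)), m3)        # m3 granules of power 3
    return g1 + g2 + g3                               # secondary attributes M
```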
31. Value of FRiS for points on a plane
32. Classification (Algorithm FRiS-Class)
- FRiS-Cluster divides the objects into clusters.
- FRiS-Tax unites the clusters into classes (taxons).
Using the FRiS function allows:
- building taxons of any shape;
- finding the optimal number of taxons.
34. Examples of taxonomy by the FRiS-Class algorithm
35. Comparison of FRiS-Class with other taxonomy algorithms
36. Taxonomic Decision Rule
37. Taxonomic Decision Rule
38. Taxonomic Decision Rule
39. Universal classification
Labeled (Pattern Recognition) | Semilabeled (ТРФ) | Unlabeled (Clustering)
40. Universal classification
Unlabeled (Clustering) | Semilabeled | Labeled (Pattern Recognition)
=================================
FRiS-TDR (one algorithm for the whole range)
41. Some real DM tasks (K classes, M objects, N attributes)

Task                                 K    M       N
Medicine:
  Diagnostics of Diabetes II type    3    43      5520
  Diagnostics of Prostate Cancer     4    322     17153
  Recognition of type of Leukemia    2    38      7129
Physics:
  Complex analysis of spectra        7    20-400  1024
Commerce:
  Forecasting of book selling
  (Data Mining Cup 2009)             -    4812    1862
42. Data Mining Cup 2009
http://www.prudsys.de/Service/Downloads/bin
Prognosis of data on an absolute scale: 19344 cells are to be predicted.
[Figure: table layout with rows 1…2418 (a training block on top, a control block below) and columns 1…1856; about 84% of the entries equal 0.]
43. DMC 2009
618 teams from 164 universities in 42 countries participated; 231 sent decisions, 49 were selected for the rating.

NN  Team                               Errors    NN  Team                                 Errors
 1  Uni Karlsruhe TH_II                 17260    16  TU Graz                               23626
 2  TU Dortmund                         17912    18  Uni Weimar_I                          23796
 3  TU Dresden                          18163    19  Zhejiang University of Sc. and Tech   23952
 4  Novosibirsk State University        18353    20  University Laval                      24884
 5  Uni Karlsruhe TH_I                  18763    24  University of Southampton             25694
 6  FH Brandenburg_I                    19814    25  Telkom Institute of Technology        25829
 7  FH Brandenburg_II                   20140    26  University of Central Florida         26254
 8  Hochschule Anhalt                   20767    32  Indian Institute of Technology        28517
 9  Uni Hamburg                         21064    34  Anna University Coimbatore            28670
10  KTH Royal Institute of Technology   21195    38  Technical University of Kosice        32841
11  RWTH Aachen_I                       21780    39  University of Edinburgh               45096
14  Budapest University of Technology   23277    48  Warsaw School of Economics            77551
15  Isfahan University of Technology    23488    49  FH Hannover                         1938612
44. Comparison with 10 methods of feature selection
Jeffery I., Higgins D., Culhane A. Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. http://www.biomedcentral.com/1471-2105/7/359
- 9 tasks on microarray data; 10 feature selection methods; independent attributes; selection of the n first (best).
- Criterion: minimum of errors on CV (10 times by 50%).
- 4 decision rules: Support Vector Machine (SVM), Between Group Analysis (BGA), Naive Bayes Classification (NBC), K-Nearest Neighbors (KNN).
- 40 decisions for each of the 9 tasks.
45. Methods of selection

Method                                                         Result
Significance analysis of microarrays (SAM)                     42
Analysis of variance (ANOVA)                                   43
Empirical Bayes t-statistic                                    32
Template matching                                              38
maxT                                                           37
Between group analysis (BGA)                                   43
Area under the receiver operating characteristic curve (ROC)   37
Welch t-statistic                                              39
Fold change                                                    47
Rank products                                                  42
FRiS-GRAD                                                      12

Notes: the empirical Bayes t-statistic is best for a middle-sized set of objects; the area under a ROC curve for small noise and a large set; rank products for large noise and a small set.
46. Results on tasks

Task      N0     m1/m2   Max of 4   GRAD
ALL1      12625  95/33   100.0      100.0
ALL2      12625  24/101  78.2       80.8
ALL3      12625  65/35   59.1       73.8
ALL4      12625  26/67   82.1       83.9
Prostate  12625  50/53   90.2       93.1
Myeloma   12625  36/137  82.9       81.4
ALL/AML   7129   47/25   95.9       100.0
DLBCL     7129   58/19   94.3       89.8
Colon     2000   22/40   88.6       89.5
47. Recognition of two types of Leukemia: ALL and AML

              Total  ALL  AML
Training set  38     27   11     (N = 7129)
Control set   34     20   14

I. Guyon, J. Weston, S. Barnhill, V. Vapnik. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 2002, 46 (1-3): pp. 389-422.
48. SVM-RFE (Guyon et al.) vs. FRiS decision rules on ALL/AML
Training set: 38; test set: 34.

SVM-RFE (I. Guyon, J. Weston, S. Barnhill, V. Vapnik); Pentium, T = 3 hours:

N_g    V_suc  V_ext   V_med  T_suc  T_ext   T_med  P
7129   0.95   0.01    0.42   0.85   -0.05   0.42   29
4096   0.82   -0.67   0.30   0.71   -0.77   0.34   24
2048   0.97   0.00    0.51   0.85   -0.21   0.41   29
1024   1.00   0.41    0.66   0.94   -0.02   0.47   32
512    0.97   0.20    0.79   0.88   0.01    0.51   30
256    1.00   0.59    0.79   0.94   0.07    0.62   32
128    1.00   0.56    0.80   0.97   -0.03   0.46   33
64     1.00   0.45    0.76   0.94   0.11    0.51   32
32     1.00   0.45    0.65   0.97   0.00    0.39   33
16     1.00   0.25    0.66   1.00   0.03    0.38   34
8      1.00   0.21    0.66   1.00   0.05    0.49   34
4      0.97   0.01    0.49   0.91   -0.08   0.45   31
2      0.97   -0.02   0.42   0.88   -0.23   0.44   30
1      0.92   -0.19   0.45   0.79   -0.27   0.23   27

FRiS decision rules (Zagoruiko N., Borisova I., Dyubanov V., Kutnenko O.); Pentium, T = 15 sec:

Weight    Genes (gene/multiplicity)         P
0.72656   537/1, 1833/1, 2641/2, 4049/2     34
0.71373   1454/1, 2641/1, 4049/1            34
0.71208   2641/1, 3264/1, 4049/1            34
0.71077   435/1, 2641/2, 4049/2, 6800/1     34
0.70993   2266/1, 2641/2, 4049/2            34
0.70973   2266/1, 2641/2, 2724/1, 4049/2    34
0.70711   2266/1, 2641/2, 3264/1, 4049/2    34
0.70574   2641/2, 3264/1, 4049/2, 4446/1    34
0.70532   435/1, 2641/2, 2895/1, 4049/2     34
0.70243   2641/2, 2724/1, 3862/1, 4049/2    34
          2641/1, 4049/1                    33
          2641/1                            32

In the first 27 subspaces P = 34/34.

Best features   SVM-RFE    FRiS
803, 4846       30 (88%)   33 (97%)
4846            27 (79%)   30 (88%)
49. Projection of the training set onto features 2641 and 4049 (AML vs. ALL)
50. Diabetes of type II: ordering of patients (M = 43 = 17 + 8 + 18, N = 5520)
- Average similarity F_av of patients to healthy people, on a scale from F = +1 (healthy) to F = -1 (ill).
[Figure: persons ordered by F_av into three groups: healthy, risk group, patients.]
- The risk group did not participate in training.
- This is useful for early diagnostics of diseases and for monitoring the process of treatment.
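A hedged sketch of how such an ordering could be computed: F_av as the mean rival similarity of a person to the healthy group, with the ill group as competitor. The exact averaging used in the talk is not shown; the nearest-rival choice below is an assumption.

```python
import numpy as np

def f_av(z, healthy, ill):
    """Average rival similarity of person z to the healthy group, with the
    ill group as the competitor: mean over FRiS(z, h | nearest ill) for all
    healthy h.  +1 is the healthy end of the scale, -1 the ill end.
    """
    z, healthy, ill = (np.asarray(v, float) for v in (z, healthy, ill))
    r2 = np.linalg.norm(ill - z, axis=1).min()   # rival: nearest ill person
    r1 = np.linalg.norm(healthy - z, axis=1)     # distances to each healthy one
    return float(np.mean((r2 - r1) / (r2 + r1)))

# Ordering all persons (including a risk group never seen in training):
# order = sorted(people, key=lambda z: f_av(z, healthy, ill), reverse=True)
```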
51. DM methods using the FRiS function allow improving old algorithms and solving some new tasks:
- Quantitative estimation of compactness
- Choice of informative attributes
- Construction of decision rules
- Censoring of the training set
- Generalized classification
- Filling of blanks (imputation)
- Forecasting
- Ordering of objects
52. Unsettled problems
- Stolp + corridor (FRiS + LDR)
- Imputation of polytypic tables
- Unification of tasks of different types (UC + X)
- Optimization of algorithms
- Realization of a program system (OTEX 2)
- Applications (medicine, genetics, …)
- …
53. Conclusion
The FRiS function:
1. Provides an effective measure of similarity, informativeness and compactness.
2. Provides unification of methods and invariance to the parameters of tasks, the law of distribution, and the ratio M:N.
3. Provides decisions of high enough quality.
Publications: http://math.nsc.ru/~wwwzag
54. Thank you! Questions, please?