"빅" 데이터의 분석적 시각화


Published on

Presented at the Korean Society of Health Informatics and Statistics, November 29, 2013

Published in: Education, Technology

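  1. Korean Society of Health Informatics and Statistics, Fall Conference 2013. Analytic Visualization of “Big” Data (Analytic Data Visualization). M.H. Huh (許明會), Department of Statistics, Korea University, stat420@korea.ac.kr. 2013.11.29, Health Info & Stat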
  2. Data Visualization
     - Descriptive vs Analytic ...
     - Small vs Big ...
     - science, technology, art
  3. Contents
     - Scatterplot
     - Biplot
     - Regression Biplot
     - Kernel PCA
     - SVM Biplot
  4. Scatterplot
     - The “Lego” brick of analytic data visualization
     - Reflecting a third variable. quakes data: longitude (= x), latitude (= y), depth (= z)
  5. Scatterplot
     - For large $n$, over-plotting can seriously distort the picture. Skin Segmentation Data: R (red) vs. G (green)
  6. Scatterplot
     - For large $n$, the alpha channel (point transparency) can be used. Skin Segmentation Data: R (red) vs. G (green)
  7. Scatterplot
     - lowess: a nonparametric regression for bivariate data. cars data: distance vs. speed
  8. Scatterplot
     - 3D rotation for three variables. Skin Segmentation Data: R (red), G (green), B (blue)
     - ggobi: 3D rotation for four or more variables
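A small sketch of why the alpha channel helps with over-plotting. It assumes standard "over" compositing of identically coloured points, under which the opacity accumulated where $k$ points overlap is $1 - (1 - \alpha)^k$. The function name and the value `alpha = 0.01` are illustrative and do not come from the slides.

```python
def stacked_opacity(alpha: float, k: int) -> float:
    """Opacity where k points of opacity alpha overlap (standard 'over' compositing)."""
    return 1.0 - (1.0 - alpha) ** k


# With a small alpha, sparse regions stay faint while dense regions saturate,
# so the plot shows density instead of a solid blob.
for k in (1, 10, 100, 1000):
    print(k, round(stacked_opacity(0.01, k), 3))
```

Choosing alpha so that the densest region only just saturates is a common rule of thumb; the right value depends on the data and the plotting device.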
  9. Biplot of Observations and Variables, Gabriel (1971)
     - The biplot is a graph that shows $n$ observations and $p$ variables together. Protein data (rows: 25 nations, columns: 9 protein sources)
  10. Biplot of Observations and Variables, Gabriel (1971)
      - Idea: linear projection. Protein data: variable cereal
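The linear-projection idea behind the biplot can be sketched with a singular value decomposition. This is a minimal illustration, not the author's code: it assumes the principal-component form of the biplot (row markers are the PC scores, column markers are the loadings), and it uses synthetic data with the same shape as the protein data (25 rows, 9 columns) instead of the real data set. The function name `biplot_coordinates` is hypothetical.

```python
import numpy as np


def biplot_coordinates(X, scale=True):
    """Return the centred data Z, row markers G (n x 2) and column markers H (p x 2)."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)                 # centre each variable
    if scale:
        Z = Z / Z.std(axis=0, ddof=1)      # optionally standardise
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    G = U[:, :2] * d[:2]                   # observations: first two PC scores
    H = Vt[:2].T                           # variables: loadings (arrow tips)
    return Z, G, H


rng = np.random.default_rng(0)
X = rng.normal(size=(25, 9))               # synthetic stand-in, NOT the protein data
Z, G, H = biplot_coordinates(X)

# The inner product of row marker i and column marker j approximates Z[i, j];
# the relative error of this rank-2 approximation is printed below.
print(np.linalg.norm(Z - G @ H.T) / np.linalg.norm(Z))
```

Reading a biplot then amounts to projecting an observation's point onto a variable's arrow, which recovers the (approximate) standardised value of that variable.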
  11. Regression Biplot, Huh and Lee (2013)
      - The regression biplot is a graph of $n$ observations on $x_1, \ldots, x_p$, arranged by predicted $y$.
      - Assume that the model fit is determined by a function of a linear combination of $x_1, \ldots, x_p$. For instance, $\hat{y} = b_0 + b_1 x_1 + \cdots + b_p x_p$, or $\log\{\hat{\pi}/(1 - \hat{\pi})\} = b_0 + b_1 x_1 + \cdots + b_p x_p$.
      - Set the vertical dimension by the direction of the regression coefficients $b = (b_1, \ldots, b_p)'$, that is, $v_1 = b / \|b\|$.
      - Set the horizontal dimension by the direction of the principal axis of $\mathbf{x}_1^{\perp}, \ldots, \mathbf{x}_n^{\perp}$, where $\mathbf{x}_i^{\perp}$ denotes the orthogonal component generated from the projection of observation $\mathbf{x}_i$ on $v_1$.
  12. Regression Biplot, Huh and Lee (2013). Example 1. Stack Loss Data (response $y$ = loss of ammonia)
  13. Regression Biplot, Huh and Lee (2013). Example 2. Magazine Data (response $y$ = subscription, coded 0/1)
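The construction described on the regression biplot slides (vertical axis along the regression coefficients, horizontal axis along the principal axis of the orthogonal components) can be sketched for the linear-regression case as follows. This is an illustration under stated assumptions, not the implementation of Huh and Lee (2013): the predictors are centred but not rescaled, the data are synthetic, and the function name `regression_biplot_axes` is hypothetical.

```python
import numpy as np


def regression_biplot_axes(X, y):
    """Return plot coordinates (n x 2), variable arrows (p x 2), and the two axes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    # Ordinary least squares slopes (the intercept is absorbed by centring).
    b, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
    v1 = b / np.linalg.norm(b)                    # vertical direction
    X_perp = Xc - np.outer(Xc @ v1, v1)           # components orthogonal to v1
    _, _, Vt = np.linalg.svd(X_perp, full_matrices=False)
    v2 = Vt[0]                                    # first principal axis of X_perp
    coords = np.column_stack([Xc @ v2, Xc @ v1])  # (horizontal, vertical)
    arrows = np.column_stack([v2, v1])            # unit vector of variable j on the plane
    return coords, arrows, v1, v2


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                      # synthetic predictors
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=40)
coords, arrows, v1, v2 = regression_biplot_axes(X, y)
print("v1 =", np.round(v1, 3), " v1.v2 =", round(float(v1 @ v2), 12))
```

Because the fitted value is a linear function of the vertical coordinate, observations are ordered vertically by their predicted response, which is the property the slide describes.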
  14. Kernel PCA, Schölkopf et al. (1998)
      - For $n$ observations $x_1, \ldots, x_n$ ($p \times 1$), consider the nonlinear mapping $\Phi(x_1), \ldots, \Phi(x_n)$ to a Hilbert space, in which $\langle \Phi(x_i), \Phi(x_{i'}) \rangle = K(x_i, x_{i'})$.
      - Denoting $K = \{K(x_i, x_{i'})\}$ ($n \times n$), kernel PCA is obtained by eigen-decomposing $\tilde{K} = (I - \tfrac{1}{n} 1 1')\, K\, (I - \tfrac{1}{n} 1 1')$.
      - Kernel PCA yields a plot of observations by projecting $\tilde{\Phi}(x_1), \ldots, \tilde{\Phi}(x_n)$ on $v = \sum_{i'} \alpha_{i'} \tilde{\Phi}(x_{i'})$, where $\tilde{\Phi}(x_{i'}) = \Phi(x_{i'}) - \bar{\Phi}$, $\bar{\Phi} = \tfrac{1}{n} \sum_{i} \Phi(x_i)$, and $\alpha$ is an eigenvector of $\tilde{K}$.
  15. Kernel PCA Diagram (or Kernel Biplot), Huh (2013)
      - Aim: representation of the $p$ variables in the kernel PC plot of observations.
      - Proposed procedure:
        1) For each $x_i$ ($i = 1, \ldots, n$), map $x_i + \delta e_j$ on the plane, $j = 1, \ldots, p$, where $\delta$ is a constant and $e_j = (0, \ldots, 1, \ldots, 0)'$ is the $j$-th unit vector. The projection of a point $x$ is given by $\sum_{i} \alpha_i \{ K(x_i, x) - \tfrac{1}{n} \sum_{i'} K(x_{i'}, x) - \tfrac{1}{n} \sum_{i''} K(x_i, x_{i''}) + \tfrac{1}{n^2} \sum_{i'} \sum_{i''} K(x_{i'}, x_{i''}) \}$.
        2) For each $i$, link the projection points of $x_i$ and $x_i + \delta e_j$ by an arrow.
  16. Example 1. Arrow diagrams for kernel PCA of the iris data with RBF kernel
  17. Example 1. Arrow diagrams for kernel PCA of the iris data with RBF kernel
  18. Example 2. Arrow diagrams for kernel PCA of the spam data
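The kernel PCA plot and its arrow diagrams can be sketched as below: fit kernel PCA on the observations, then project each observation and its perturbed copy (one variable shifted by a constant) and join the two projections by an arrow. This is a rough illustration, not the code behind the slides. It assumes the RBF kernel is parameterised as $\exp(-\sigma \|a - b\|^2)$, uses synthetic data instead of the iris or spam data, and picks `sigma` and `delta` arbitrarily; all function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """K(a, b) = exp(-sigma * ||a - b||^2); this parameterisation is an assumption."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sigma * sq)


def fit_kernel_pca(X, sigma, n_components=2):
    """Eigen-decompose the doubly centred kernel matrix; return K and scaled eigenvectors."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    J = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(J @ K @ J)
    order = np.argsort(vals)[::-1][:n_components]
    # Scale so that each principal axis has unit length in the feature space.
    alphas = vecs[:, order] / np.sqrt(vals[order])
    return K, alphas


def project(X_train, K, alphas, X_new, sigma):
    """Project new points on the kernel principal axes, centring with training means."""
    k = rbf_kernel(X_new, X_train, sigma)
    kc = k - k.mean(axis=1, keepdims=True) - K.mean(axis=0) + K.mean()
    return kc @ alphas


def arrow_endpoints(X, K, alphas, sigma, j, delta):
    """Start and end points of the arrows for variable j under a shift of size delta."""
    start = project(X, K, alphas, X, sigma)
    X_shift = X.copy()
    X_shift[:, j] += delta
    end = project(X, K, alphas, X_shift, sigma)
    return start, end


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))        # synthetic stand-in, NOT the iris data
sigma, delta = 0.1, 0.5             # illustrative values only
K, alphas = fit_kernel_pca(X, sigma)
for j in range(X.shape[1]):
    start, end = arrow_endpoints(X, K, alphas, sigma, j, delta)
    print(j, np.linalg.norm(end - start, axis=1).mean())   # mean arrow length
```

Unlike the linear biplot, the arrows for one variable generally differ in length and direction from observation to observation, because the mapping is nonlinear.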
  19. SVM-Guided Biplot as an extension of Regression Biplot
      - Idea: combine the linear/logistic regression biplot and kernel PCA.
      - Classification/regression part: the SVM classifier assigns $-1$ or $1$ to each $x_i$, $i = 1, \ldots, n$, through $f(x) = \sum_{i} \alpha_i y_i K(x_i, x) + b = \langle w, \Phi(x) \rangle + b$, where $w = \sum_{i} \alpha_i y_i \Phi(x_i)$, $\alpha_i \geq 0$. The vertical dimension is set to $v_1 = w / \|w\|$, with $\|w\|^2 = \sum_{i} \sum_{i'} \alpha_i \alpha_{i'} y_i y_{i'} K(x_i, x_{i'})$.
  20. SVM-Guided Biplot: Classification
      - Kernel PCA part: the component of $\Phi(x_i)$ orthogonal to $v_1$ is $\Phi^{\perp}(x_i) = \Phi(x_i) - \langle \Phi(x_i), v_1 \rangle v_1$. Therefore $\langle \Phi^{\perp}(x_i), \Phi^{\perp}(x_{i'}) \rangle = K(x_i, x_{i'}) - g_i g_{i'}$, where $g_i = \langle \Phi(x_i), v_1 \rangle = \tfrac{1}{\|w\|} \sum_{i'} \alpha_{i'} y_{i'} K(x_{i'}, x_i)$, $i = 1, \ldots, n$. Hence $K \to K^{\perp} = K - g g'$ with $g = (g_1, \ldots, g_n)'$. The horizontal dimension is determined by eigen-decomposing $K^{\perp}$.
      - Perturbation scheme for arrow diagrams: define $x_i^{*} = x_i + \delta e$, where $e$ ($p \times 1$) represents a perturbation whose magnitude is controlled by $\delta$. Then project $\Phi(x_i^{*})$ on the first (vertical) and the second (horizontal) dimension.
  21. Example 1. Iris Data: Versicolor vs. Virginica [sigma = 0.1, C = 1]
  22. Importance of Variables (in the case of large $p$)
      - It is necessary to select a small number of variables in determining the first and second dimensions.
      - Measures of importance (definition): length of arrows 1) in the vertical direction, 2) in the horizontal direction.
      - Plot arrow diagrams for the important variables only.
  23. Example 2. Spam Data [sigma = 0.1, C = 10]
  24. SVM-Guided Biplot: Regression
      - The same method can be applied to SVM regression.
      - Example 3. Aerobic Fitness data for oxygen uptake (= $y$) with RBF kernel ($\sigma$ = 0.1, C = 10, $\epsilon$ = 0.1)
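The two axes of the SVM-guided biplot can be sketched from a kernel matrix and the dual coefficients of an SVM. The sketch below is an assumption-laden illustration, not the method as implemented by the author: the coefficients $\alpha_i y_i$ are random placeholders (no SVM is fitted here), the data are synthetic, the deflated kernel matrix is decomposed without further centring (the slides do not make this detail recoverable), and the function name `svm_guided_axes` is hypothetical.

```python
import numpy as np


def svm_guided_axes(K, coef):
    """K: n x n kernel matrix; coef[i] = alpha_i * y_i from an SVM (assumed given)."""
    w_norm = np.sqrt(coef @ K @ coef)          # ||w|| in the feature space
    g = K @ coef / w_norm                      # <Phi(x_i), w/||w||>: vertical coordinates
    K_perp = K - np.outer(g, g)                # kernel of the components orthogonal to w
    vals, vecs = np.linalg.eigh(K_perp)        # eigenvalues in ascending order
    horizontal = vecs[:, -1] * np.sqrt(vals[-1])   # scores on the leading axis
    return g, horizontal, K_perp


rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))                   # synthetic observations
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-0.1 * sq)                          # RBF kernel, sigma = 0.1 (assumed form)
y = np.where(X[:, 0] > 0, 1.0, -1.0)
alpha = rng.uniform(0.0, 1.0, size=20)         # placeholder multipliers, NOT a fitted SVM
coef = alpha * y

g, horizontal, K_perp = svm_guided_axes(K, coef)
# The direction w has been removed, so K_perp annihilates the coefficient vector.
print(np.abs(K_perp @ coef).max())
```

In practice `coef` would come from an SVM solver, and the vertical coordinate `g` differs from the decision value only by the scale factor $\|w\|$ and the bias term.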
  25. Concluding Remarks
      - The biplot method can be extended to suit linear regression or classification (logistic regression).
      - The biplot method can be extended to allow nonlinear mapping of observations and variables, by fully utilizing the kernel trick.
      - http://blog.naver.com/huh4200 Goldfish Bowl (on the iPad)
  26. References
      - Gabriel, K.R. (1971). “The biplot graphic display of matrices with application to principal component analysis”. Biometrika, 58, 453-467.
      - Huh, M.H. (2013). “Arrow diagrams for kernel principal component analysis”. Communications for Statistical Applications and Methods, 20, 175-184.
      - Huh, M.H. (2013). “SVM-guided biplot of observations and variables”. Communications for Statistical Applications and Methods. (to appear)
      - Huh, M.H. and Lee, Y.G. (2013). “Biplots of multivariate data guided by linear and/or logistic regression”. Communications for Statistical Applications and Methods, 20, 129-136.
      - Schölkopf, B., Smola, A. and Müller, K.R. (1998). “Nonlinear component analysis as a kernel eigenvalue problem”. Neural Computation, 10, 1299-1319.