Output fica.beamer.43

Slides for IWSC 2012: filtering clones for individual users based on machine learning analysis.

  1. Filtering Clones for Individual User Based on Machine Learning Analysis. Jiachen Yang, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, Shinji Kusumoto. Graduate School of Information Science and Technology, Osaka University. June 4, 2012. Footer: Jiachen Yang (IST, Osaka-U), Fica@IWSC2012, June 4, 2012, 1 / 14.
  2. Motivating Example. [Figure: participants of survey against clone sets. Red: un-interesting. Blue: interesting.]
  3. Motivating Example. [Figure: participants of survey (1 to 8) against clone sets. Red: un-interesting. Blue: interesting.]
  5. Interesting (U:0 vs I:8). Figure: Example of source code in bash-4.2.
     (a) lib/readline/histexpand.c, lines 1542-1554:
         static char *
         history_substring (string, start, end)
              const char *string;
              int start, end;
         {
           register int len;
           register char *result;
           len = end - start;
           result = (char *)xmalloc (len + 1);
           strncpy (result, string + start, len);
           result[len] = 0;
           return result;
         }
     (b) stringlib.c, lines 126-138:
         char *
         substring (string, start, end)
              const char *string;
              int start, end;
         {
           register int len;
           register char *result;
           len = end - start;
           result = (char *)xmalloc (len + 1);
           strncpy (result, string + start, len);
           result[len] = 0;
           return (result);
         }
  6. Un-Interesting (U:8 vs I:0). Figure: Example of source code in bash-4.2.
     (a) expr.c, lines 191-197:
         ... __P((char *, arrayind_t, char *));
         static intmax_t subexpr __P((char *));
         static intmax_t expcomma __P((void));
         static intmax_t expassign __P((void));
         static intmax_t expcond __P((void));
         static intmax_t explor __P((void));
         static intmax_t expland __P((void));
     (b) shell.c, lines 309-315:
         static int run_one_command __P((char *));
         static int run_wordexp __P((char *));
         static int uidget __P((void));
         static void init_interactive __P((void));
         static void init_noninteractive __P((void));
         static void init_interactive_script __P((void));
         static void set_shell_name __P((char *));
  7. Disagreed (U:4 vs I:4). Figure: Example of source code in bash-4.2.
     (a) execute_cmd.c, lines 710-725:
         static int
         displen (s)
              const char *s;
         {
           wchar_t *wcstr;
           size_t wclen, slen;
           wcstr = 0;
           slen = mbstowcs (wcstr, s, 0);
           if (slen == -1)
             slen = 0;
           wcstr = (wchar_t *)xmalloc (sizeof ....
           mbstowcs (wcstr, s, slen + 1);
           wclen = wcswidth (wcstr, slen);
           free (wcstr);
           return ((int)wclen);
         }
     (b) subst.c, lines 1098-1111:
         else
           {
             if (wcharlist == 0)
               {
                 size_t len;
                 len = mbstowcs (wcharlist, charlist, 0);
                 if (len == -1)
                   len = 0;
                 wcharlist = (wchar_t *)xmalloc (sizeof ....
                 mbstowcs (wcharlist, charlist, len + 1);
               }
             if (wcschr (wcharlist, wc))
               break;
           }
  8. Fica, the name: Filter for Individual user on code Clone Analysis.
  9. Fica, the website. [Figure: Snapshot of Fica.]
  14. Compare Code Clone Similarity. Pi = possibility to be interesting; Pu = possibility to be un-interesting.
      Len   Pi      Pi/Pu   Pu      Comp
      50    5.56%   1.18    4.72%   O
      87    2.89%   1.11    2.59%   O
      79    1.97%   0.69    2.87%   X
      63    3.55%   0.64    5.57%   O
      77    2.66%   0.46    5.83%   X
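
The Pi/Pu column in the table above can be re-derived from the Pi and Pu columns. The sketch below does that arithmetic; the rule "a ratio above 1 leans interesting" is an assumption made for illustration, since the slide does not state how the ratio is turned into a decision or what the Comp column encodes.

```python
# Rows copied from the slide's table: (Len, Pi %, Pi/Pu as printed, Pu %).
rows = [
    (50, 5.56, 1.18, 4.72),
    (87, 2.89, 1.11, 2.59),
    (79, 1.97, 0.69, 2.87),
    (63, 3.55, 0.64, 5.57),
    (77, 2.66, 0.46, 5.83),
]

# Recompute the ratio from the rounded percentages shown on the slide.
recomputed = [pi / pu for _, pi, _, pu in rows]

# Assumption (not stated on the slide): a ratio above 1 leans "interesting".
leaning = ["interesting" if r > 1 else "un-interesting" for r in recomputed]

for (length, _, printed, _), r, label in zip(rows, recomputed, leaning):
    print(f"Len={length}: printed {printed:.2f}, recomputed {r:.3f}, leans {label}")
```

Because the slide prints rounded percentages, the recomputed ratios agree with the printed ones only to within about 0.01.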
  15. Good Experiment Result. All training: 44. All evaluation: 34. Matched: 32. Accuracy: 94.12%. Un-interesting: 1. Interesting: 1.
  16. Bad Experiment Result. All training: 47. All evaluation: 31. Matched: 14. Accuracy: 45.16%. Un-interesting: 16. Interesting: 1.
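
The accuracy figures on the two result slides are consistent with accuracy = matched / all evaluation. A minimal check in Python follows; the function name `accuracy_percent` is illustrative, and reading the "un-interesting" and "interesting" counts as unmatched clones is an assumption, supported only by the fact that they sum with the matched count to the evaluation total.

```python
def accuracy_percent(matched, evaluated):
    """Share of evaluated clones whose prediction matched the user's mark, in percent."""
    return round(100 * matched / evaluated, 2)

good = accuracy_percent(32, 34)  # "Good Experiment Result" slide
bad = accuracy_percent(14, 31)   # "Bad Experiment Result" slide

# Assumption: the remaining two counts on each slide are unmatched clones,
# so matched + un-interesting + interesting should equal the evaluation total.
good_total = 32 + 1 + 1
bad_total = 14 + 16 + 1

print(good, bad, good_total, bad_total)
```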
  17. Open Question. How to improve accuracy? By combining metrics like McCabe Cyclomatic Complexity? Thank you!
  18. Unmatched: User un-interesting.
  19. Unmatched: User interesting.
  20. Overall Workflow. [Figure: Overall Workflow of Fica with CDT.]
      1. Submits source code
      2. Detects clones
      3. Marks clones as "interesting" or not
      4. Records marked clones into database
      5. Studies characteristics of marks using machine learning algorithms
      6. Ranks unmarked clones based on machine learning
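
Steps 3, 4, and 6 of the workflow can be sketched as a small store of marks plus a ranking function. Everything named here (`MarkStore`, `rank_unmarked`, `overlap_score`) is hypothetical, and the token-overlap score is only a stand-in so the example runs; it is not the learning step the slides describe.

```python
from dataclasses import dataclass, field

@dataclass
class MarkStore:
    """Hypothetical in-memory stand-in for the database of step 4."""
    interesting: list = field(default_factory=list)
    uninteresting: list = field(default_factory=list)

    def mark(self, clone, is_interesting):
        """Step 3 and 4: record a user's mark for one clone."""
        target = self.interesting if is_interesting else self.uninteresting
        target.append(clone)

def rank_unmarked(unmarked, store, score):
    """Step 6: order unmarked clones by a caller-supplied score, highest first."""
    return sorted(unmarked, key=lambda clone: score(clone, store), reverse=True)

def overlap_score(clone, store):
    """Placeholder score: shared tokens with liked clones minus shared tokens with disliked ones."""
    likes = sum(len(clone & m) for m in store.interesting)
    dislikes = sum(len(clone & m) for m in store.uninteresting)
    return likes - dislikes

# Clones are modelled as sets of tokens purely for illustration.
store = MarkStore()
store.mark(frozenset({"xmalloc", "strncpy", "return"}), True)
store.mark(frozenset({"static", "__P", "void"}), False)

unmarked = [
    frozenset({"static", "void", "int"}),
    frozenset({"xmalloc", "return", "len"}),
]
ranked = rank_unmarked(unmarked, store, overlap_score)
```

Passing the score in as a parameter keeps the ranking step independent of whichever similarity measure is plugged in.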
  21. Calc Similarity of Clones
      \begin{align}
      \mathrm{tf}(t, d) &= \frac{|\{t : t \in d\}|}{|d|} \tag{1} \\
      \mathrm{idf}(t, D) &= \log \frac{|D|}{1 + |\{d \in D : t \in d\}|} \tag{2} \\
      \mathrm{tfidf}(t, d, D) &= \mathrm{tf}(t, d) \times \mathrm{idf}(t, D) \tag{3} \\
      \overrightarrow{\mathrm{tfidf}}(d, D) &= [\, \mathrm{tfidf}(t, d, D) \;\; \forall t \in d \,] \tag{4}
      \end{align}
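
Equations (1) to (4) translate almost directly into code. The following is a minimal sketch, assuming each clone is a list of tokens and the corpus D is a list of such lists; the tokenisation and the sample corpus are made up for illustration and are not taken from Fica.

```python
import math
from collections import Counter

def tf(term, doc):
    """Eq. (1): occurrences of term in doc divided by the length of doc."""
    return Counter(doc)[term] / len(doc)

def idf(term, corpus):
    """Eq. (2): log(|D| / (1 + number of documents containing term))."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + containing))

def tfidf_vector(doc, corpus):
    """Eq. (4): map each distinct term of doc to tf * idf, as in Eq. (3)."""
    return {term: tf(term, doc) * idf(term, corpus) for term in set(doc)}

# Hypothetical token lists standing in for four clones.
corpus = [
    ["static", "int", "id", "(", ")"],
    ["static", "void", "id", "(", ")"],
    ["return", "id", ";"],
    ["xmalloc", "(", "id", ")"],
]
vec = tfidf_vector(corpus[0], corpus)
```

With the `1 +` term in the denominator of Eq. (2), a token that occurs in every document (here `id`) gets a negative idf, so its weight in the vector is negative.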
  22. Predicting Category
      \begin{align}
      \mathrm{sim}(a, b, D) &= \overrightarrow{\mathrm{tfidf}}(a, D) \cdot \overrightarrow{\mathrm{tfidf}}(b, D) \tag{5} \\
      \mathrm{nsim}(a, b, D) &= \begin{cases} 0, & \mathrm{sim}(a, b, D) = 0 \\ \dfrac{\mathrm{sim}(a, b, D)}{|\mathrm{sim}(a, b, D)|}, & \text{otherwise} \end{cases} \tag{6} \\
      \mathrm{poss}(t, M) &= \begin{cases} 1, & |M| = 0 \\ \dfrac{\sum_{\forall m \in M} \mathrm{nsim}(t, m, M)}{|M|}, & \text{otherwise} \end{cases} \tag{7}
      \end{align}
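
Equations (5) to (7) can be sketched as follows, assuming the tf-idf vectors have already been computed and are stored as sparse dictionaries from token to weight. The sample vectors and variable names are invented for illustration. Note that Eq. (6), exactly as it appears in this transcript, yields only -1, 0, or 1.

```python
def sim(u, v):
    """Eq. (5): dot product of two sparse tf-idf vectors."""
    return sum(w * v[t] for t, w in u.items() if t in v)

def nsim(u, v):
    """Eq. (6) as printed: 0 when sim is 0, otherwise sim / |sim|."""
    s = sim(u, v)
    return 0.0 if s == 0 else s / abs(s)

def poss(target, marked):
    """Eq. (7): 1 for an empty set, otherwise the mean nsim against the marked clones."""
    if not marked:
        return 1.0
    return sum(nsim(target, m) for m in marked) / len(marked)

# Hypothetical precomputed vectors for one unmarked clone and two marked sets.
target = {"xmalloc": 0.5, "strncpy": 0.3}
interesting = [{"xmalloc": 0.4}, {"strncpy": 0.2, "free": 0.1}]
uninteresting = [{"static": 0.3}, {"xmalloc": -0.1}]

p_i = poss(target, interesting)    # possibility to be interesting
p_u = poss(target, uninteresting)  # possibility to be un-interesting
p_empty = poss(target, [])         # no marks yet
```

Comparing `p_i` with `p_u` then gives a category for the unmarked clone; how the two values are combined into a final decision is not spelled out in the transcript.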
  23. Result: bash. [Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.]
  24. Result: git. [Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.]
  25. Result: xz. [Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.]
  26. Result: e2fsprogs. [Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.]
  27. Result: All Projects. [Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.]
