Filtering Clones for Individual User Based on Machine Learning Analysis

           Jiachen Yang, Keisuke Hotta, Yoshiki Higo,
                 Hiroshi Igaki, Shinji Kusumoto
          Graduate School of Information Science and Technology, Osaka University


                                   June 4, 2012

Jiachen Yang (IST, Osaka-U)   Fica@IWSC2012   June 4, 2012
Motivating Example

[Figure: clonesets as marked by the participants of the survey (1 2 3 4 5 6 7 8). Red: Un-interesting; Blue: Interesting]
Interesting U:0 vs I:8

(a) lib/readline/histexpand.c, lines 1542-1554

    static char *
    history_substring (string, start, end)
         const char *string;
         int start, end;
    {
      register int len;
      register char *result;
      len = end - start;
      result = (char *)xmalloc (len + 1);
      strncpy (result, string + start, len);
      result[len] = '\0';
      return result;
    }

(b) stringlib.c, lines 126-138

    char *
    substring (string, start, end)
         const char *string;
         int start, end;
    {
      register int len;
      register char *result;
      len = end - start;
      result = (char *)xmalloc (len + 1);
      strncpy (result, string + start, len);
      result[len] = '\0';
      return (result);
    }

Figure: Example of source code in bash-4.2
Un-Interesting U:8 vs I:0



(a) expr.c, lines 191-197

    ... __P((char *, arrayind_t, char *));
    static intmax_t subexpr __P((char *));
    static intmax_t expcomma __P((void));
    static intmax_t expassign __P((void));
    static intmax_t expcond __P((void));
    static intmax_t explor __P((void));
    static intmax_t expland __P((void));

(b) shell.c, lines 309-315

    static int run_one_command __P((char *));
    static int run_wordexp __P((char *));
    static int uidget __P((void));
    static void init_interactive __P((void));
    static void init_noninteractive __P((void));
    static void init_interactive_script __P((void));
    static void set_shell_name __P((char *));

Figure: Example of source code in bash-4.2



Disagreed U:4 vs I:4

(a) execute_cmd.c, lines 710-725

    static int
    displen (s)
         const char *s;
    {
      wchar_t *wcstr;
      size_t wclen, slen;
      wcstr = 0;
      slen = mbstowcs (wcstr, s, 0);
      if (slen == -1)
        slen = 0;
      wcstr = (wchar_t *)xmalloc (sizeof ....
      mbstowcs (wcstr, s, slen + 1);
      wclen = wcswidth (wcstr, slen);
      free (wcstr);
      return ((int)wclen);
    }

(b) subst.c, lines 1098-1111

    else
    {
      if (wcharlist == 0)
        {
          size_t len;
          len = mbstowcs (wcharlist, charlist, 0);
          if (len == -1)
            len = 0;
          wcharlist = (wchar_t *)xmalloc (sizeof ....
          mbstowcs (wcharlist, charlist, len + 1);
        }
      if (wcschr (wcharlist, wc))
        break;
    }

Figure: Example of source code in bash-4.2
Fica — the name


 Filter for
 Individual user on code
 Clone
 Analysis
Fica — the website




                          Figure: Snapshot of Fica


Compare Code Clone Similarity

Pi = possibility to be interesting
Pu = possibility to be un-interesting
 Len   Pi      Pi/Pu   Pu      Comp
 50    5.56%   1.18    4.72%   O
 87    2.89%   1.11    2.59%   O
 79    1.97%   0.69    2.87%   X
 63    3.55%   0.64    5.57%   O
 77    2.66%   0.46    5.83%   X
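
The Pi/Pu column can be re-derived from the two percentage columns. The sketch below recomputes it from the table rows; the results agree with the slide to within the rounding of the displayed percentages. The rule in `predicted_label` (the larger possibility wins) is an assumption made for illustration and is not stated on the slide, and the Comp column is left uninterpreted.

```python
# Rows copied from the table above: (Len, Pi in %, Pu in %).
ROWS = [
    (50, 5.56, 4.72),
    (87, 2.89, 2.59),
    (79, 1.97, 2.87),
    (63, 3.55, 5.57),
    (77, 2.66, 5.83),
]


def ratio(pi, pu):
    """Return Pi / Pu without rounding."""
    return pi / pu


def predicted_label(pi, pu):
    # Assumption (not from the slide): the larger possibility wins.
    return "interesting" if pi > pu else "un-interesting"


RATIOS = [ratio(pi, pu) for _, pi, pu in ROWS]
LABELS = [predicted_label(pi, pu) for _, pi, pu in ROWS]

for (length, pi, pu), r, label in zip(ROWS, RATIOS, LABELS):
    print(f"Len {length}: Pi/Pu = {r:.3f} -> {label}")
```

Under that assumed rule, the first two rows (ratio above 1) come out as interesting and the remaining three as un-interesting.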


Good Experiment Result
All training                     44
All evaluation                   34
Matched                          32
Accuracy                         94.12%
Unmatched, user un-interesting   1
Unmatched, user interesting      1




Bad Experiment Result
All training                     47
All evaluation                   31
Matched                          14
Accuracy                         45.16%
Unmatched, user un-interesting   16
Unmatched, user interesting      1
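
The accuracy figures on the two result slides are consistent with the number of matched clones divided by the size of the evaluation set. A minimal sketch that reproduces both values; the function name `accuracy` is chosen here for illustration:

```python
def accuracy(matched, evaluated):
    """Percentage of evaluation clones whose prediction matched the user's mark."""
    if evaluated == 0:
        raise ValueError("evaluation set is empty")
    return 100.0 * matched / evaluated


GOOD = accuracy(32, 34)  # good experiment: 32 of 34 matched
BAD = accuracy(14, 31)   # bad experiment: 14 of 31 matched

print(f"{GOOD:.2f}% {BAD:.2f}%")  # prints "94.12% 45.16%"
```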




Open Question



 How to improve accuracy?
     By combining metrics like McCabe Cyclomatic Complexity?

 Thank you!




Unmatched: User un-interesting




Unmatched: User interesting




Overall Workflow
   1. Submits source code
   2. Detects clones
   3. Marks clones as "interesting" or not
   4. Records marked clones into database
   5. Studies characteristics of marks using machine learning algorithms
   6. Ranks unmarked clones based on machine learning

   Figure: Overall Workflow of Fica with CDT
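
Steps 4 to 6 of the workflow can be pictured with a small sketch. Everything named here is hypothetical: the `Marks` dictionary stands in for the database, and the `score` callback stands in for whatever the learning step produces. It is not the Fica implementation.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the marks database (step 4):
# clone id -> True if the user marked the clone "interesting".
Marks = Dict[str, bool]


def rank_unmarked(clone_ids: List[str], marks: Marks,
                  score: Callable[[str, Marks], float]) -> List[str]:
    """Step 6: order the clones the user has not marked, highest score first."""
    unmarked = [c for c in clone_ids if c not in marks]
    return sorted(unmarked, key=lambda c: score(c, marks), reverse=True)


# Usage with made-up scores standing in for the learning step (step 5).
marks = {"c1": True, "c2": False}
scores = {"c3": 0.2, "c4": 0.9, "c5": 0.5}
ranked = rank_unmarked(["c1", "c2", "c3", "c4", "c5"], marks,
                       lambda clone_id, _marks: scores[clone_id])
print(ranked)  # ['c4', 'c5', 'c3']
```

Passing the scorer as a callback keeps the ranking step independent of how the marks are learned from.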
Calc Similarity of Clones


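\begin{align}
\mathrm{tf}(t, d) &= \frac{|\{t : t \in d\}|}{|d|} \tag{1} \\
\mathrm{idf}(t, D) &= \log \frac{|D|}{1 + |\{d \in D : t \in d\}|} \tag{2} \\
\mathrm{tfidf}(t, d, D) &= \mathrm{tf}(t, d) \times \mathrm{idf}(t, D) \tag{3} \\
\overrightarrow{\mathrm{tfidf}}(d, D) &= [\,\mathrm{tfidf}(t, d, D) \;\; \forall t \in d\,] \tag{4}
\end{align}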
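
Equations (1) to (4) translate into a few lines of Python. This is a minimal sketch under stated assumptions: a clone is represented as a non-empty list of tokens, the corpus D is a non-empty list of such lists, and the natural logarithm is used (the slide does not give the base). The function names mirror the equations and are not taken from the Fica source.

```python
import math
from collections import Counter


def tf(term, doc):
    """Eq. (1): occurrences of term in doc divided by the length of doc."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """Eq. (2): log(|D| / (1 + number of documents that contain term))."""
    containing = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / (1 + containing))


def tfidf(term, doc, corpus):
    """Eq. (3): product of tf and idf."""
    return tf(term, doc) * idf(term, corpus)


def tfidf_vector(doc, corpus):
    """Eq. (4): one weight per distinct term of doc, stored as a sparse dict."""
    return {t: tfidf(t, doc, corpus) for t in set(doc)}


# Toy corpus of three token lists (made up for illustration).
CORPUS = [["a", "b", "a"], ["b", "c"], ["c", "d"]]
VECTOR = tfidf_vector(CORPUS[0], CORPUS)
print(VECTOR)
```

Because of the `1 +` in the denominator of Eq. (2), a term that appears in every document gets a negative idf, and a term that appears in all but one document gets an idf of zero, as `"b"` does in the toy corpus.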


Predicting Category


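\begin{align}
\mathrm{sim}(a, b, D) &= \overrightarrow{\mathrm{tfidf}}(a, D) \cdot \overrightarrow{\mathrm{tfidf}}(b, D) \tag{5} \\
\mathrm{nsim}(a, b, D) &= \begin{cases} 0, & \mathrm{sim}(a, b, D) = 0 \\ \dfrac{\mathrm{sim}(a, b, D)}{|\mathrm{sim}(a, b, D)|}, & \text{otherwise} \end{cases} \tag{6} \\
\mathrm{poss}(t, M) &= \begin{cases} 1, & |M| = 0 \\ \dfrac{\sum_{\forall m \in M} \mathrm{nsim}(t, m, M)}{|M|}, & \text{otherwise} \end{cases} \tag{7}
\end{align}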
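
A minimal sketch of equations (5) to (7), reading them literally as they appear on the slide. Clones are assumed to be non-empty token lists, and the helper `tfidf_vector` is a compact restatement of equations (1) to (4) so that the block is self-contained. Read literally, Eq. (6) divides a scalar by its own absolute value, so `nsim` can only be -1, 0, or +1; if the talk intended a normalization by vector lengths instead, the values would differ.

```python
import math
from collections import Counter


def tfidf_vector(doc, corpus):
    """Eqs. (1)-(4): sparse tf-idf vector of doc with respect to corpus."""
    vec = {}
    for term, count in Counter(doc).items():
        df = sum(1 for d in corpus if term in d)
        vec[term] = (count / len(doc)) * math.log(len(corpus) / (1 + df))
    return vec


def sim(a, b, corpus):
    """Eq. (5): dot product of the two tf-idf vectors."""
    va, vb = tfidf_vector(a, corpus), tfidf_vector(b, corpus)
    return sum(w * vb[t] for t, w in va.items() if t in vb)


def nsim(a, b, corpus):
    """Eq. (6), read literally: 0 if sim is 0, otherwise sim / |sim|."""
    s = sim(a, b, corpus)
    return 0.0 if s == 0 else s / abs(s)


def poss(t, marked):
    """Eq. (7): 1 if there are no marks, else the mean nsim against the marks."""
    if not marked:
        return 1.0
    return sum(nsim(t, m, marked) for m in marked) / len(marked)


# Toy data (made up): four marked clones and one unmarked clone.
MARKED = [["a", "b"], ["a", "c"], ["d", "e"], ["f", "g"]]
TARGET = ["a", "x"]
print(poss(TARGET, MARKED))  # 0.5: shares the token "a" with two of four marks
```

Following the table of Pi and Pu earlier in the talk, `poss` would be evaluated once against the clones marked interesting and once against those marked un-interesting; that pairing is an inference from the slides, not something the formulas state.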


Result — bash
[Figure: Accuracy (%), 0 to 100, against Percentage of Training Set (%), 10 to 100; legend: A B C D E F G H]
Result — git
[Figure: Accuracy (%), 0 to 100, against Percentage of Training Set (%), 10 to 100; legend: A B C D E F G H]
Result — xz
[Figure: Accuracy (%), 0 to 100, against Percentage of Training Set (%), 10 to 100; legend: A B C D E F G H]
Result — e2fsprogs
[Figure: Accuracy (%), 0 to 100, against Percentage of Training Set (%), 10 to 100; legend: A B C D E F G H]
Result — All Projects
[Figure: Accuracy (%), 0 to 100, against Percentage of Training Set (%), 10 to 100; legend: A B C D E F G H]

More Related Content

More from Jiachen Yang

データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性Jiachen Yang
 
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...Jiachen Yang
 
チェックリストと分割に基づく 網羅と使用テスト
チェックリストと分割に基づく  網羅と使用テストチェックリストと分割に基づく  網羅と使用テスト
チェックリストと分割に基づく 網羅と使用テストJiachen Yang
 
Active Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsActive Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsJiachen Yang
 
Inference and Checking of Object Ownership
Inference  and  Checking  of  Object OwnershipInference  and  Checking  of  Object Ownership
Inference and Checking of Object OwnershipJiachen Yang
 
基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究Jiachen Yang
 

More from Jiachen Yang (7)

データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性データモデルの更新を効率よく検証するの並列可能性
データモデルの更新を効率よく検証するの並列可能性
 
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
Slides for Semantic Versioning versus Breaking Changes: A Study of the Maven ...
 
チェックリストと分割に基づく 網羅と使用テスト
チェックリストと分割に基づく  網羅と使用テストチェックリストと分割に基づく  網羅と使用テスト
チェックリストと分割に基づく 網羅と使用テスト
 
Active Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly ReportsActive Refinement of Clone Anomaly Reports
Active Refinement of Clone Anomaly Reports
 
Inference and Checking of Object Ownership
Inference  and  Checking  of  Object OwnershipInference  and  Checking  of  Object Ownership
Inference and Checking of Object Ownership
 
基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究基于OpenNEbula的虚拟化服务器集群中节能 的研究
基于OpenNEbula的虚拟化服务器集群中节能 的研究
 
Cloud sim report
Cloud sim reportCloud sim report
Cloud sim report
 

Recently uploaded

BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...noida100girls
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 
HONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsHONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsMichael W. Hawkins
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Servicediscovermytutordmt
 
GD Birla and his contribution in management
GD Birla and his contribution in managementGD Birla and his contribution in management
GD Birla and his contribution in managementchhavia330
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageMatteo Carbone
 
Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Roland Driesen
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Dipal Arora
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.Aaiza Hassan
 
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 DelhiCall Girls in Delhi
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesDipal Arora
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLSeo
 
Value Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsValue Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsP&CO
 
Understanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key InsightsUnderstanding the Pakistan Budgeting Process: Basics and Key Insights
Understanding the Pakistan Budgeting Process: Basics and Key Insightsseribangash
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdfRenandantas16
 

Recently uploaded (20)

BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
HONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsHONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael Hawkins
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 
GD Birla and his contribution in management
GD Birla and his contribution in managementGD Birla and his contribution in management
GD Birla and his contribution in management
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usage
 
Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
 
M.C Lodges -- Guest House in Jhang.

Output fica.beamer.43

  • 1. Filtering Clones for Individual User Based on Machine Learning Analysis. Jiachen Yang, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, Shinji Kusumoto. Graduate School of Information Science and Technology, Osaka University. June 4, 2012. Jiachen Yang (IST, Osaka-U), Fica@IWSC2012, June 4, 2012.
  • 2. Motivating Example. Figure: participants of survey (1 2 3 4 5 6 7 8) against clonesets. Red: Un-interesting; Blue: Interesting.
  • 5. Interesting, U:0 vs I:8
      (a) lib/readline/histexpand.c
          static char *
          history_substring (string, start, end)
               const char *string;
               int start, end;
          {
            register int len;
            register char *result;
            len = end - start;
            result = (char *)xmalloc (len + 1);
            strncpy (result, string + start, len);
            result[len] = '\0';
            return result;
          }
      (b) stringlib.c
          char *
          substring (string, start, end)
               const char *string;
               int start, end;
          {
            register int len;
            register char *result;
            len = end - start;
            result = (char *)xmalloc (len + 1);
            strncpy (result, string + start, len);
            result[len] = '\0';
            return (result);
          }
      Figure: Example of source code in bash-4.2
  • 6. Un-Interesting, U:8 vs I:0
      (a) expr.c
          ... __P((char *, arrayind_t, char *));
          static intmax_t subexpr __P((char *));
          static intmax_t expcomma __P((void));
          static intmax_t expassign __P((void));
          static intmax_t expcond __P((void));
          static intmax_t explor __P((void));
          static intmax_t expland __P((void));
      (b) shell.c
          static int run_one_command __P((char *));
          static int run_wordexp __P((char *));
          static int uidget __P((void));
          static void init_interactive __P((void));
          static void init_noninteractive __P((void));
          static void init_interactive_script __P((void));
          static void set_shell_name __P((char *));
      Figure: Example of source code in bash-4.2
  • 7. Disagreed, U:4 vs I:4
      (a) execute_cmd.c
          static int
          displen (s)
               const char *s;
          {
            wchar_t *wcstr;
            size_t wclen, slen;
            wcstr = 0;
            slen = mbstowcs (wcstr, s, 0);
            if (slen == -1)
              slen = 0;
            wcstr = (wchar_t *)xmalloc (sizeof ....
            mbstowcs (wcstr, s, slen + 1);
            wclen = wcswidth (wcstr, slen);
            free (wcstr);
            return ((int)wclen);
          }
      (b) subst.c
          else
            {
              if (wcharlist == 0)
                {
                  size_t len;
                  len = mbstowcs (wcharlist, charlist, 0);
                  if (len == -1)
                    len = 0;
                  wcharlist = (wchar_t *)xmalloc (sizeof ....
                  mbstowcs (wcharlist, charlist, len + 1);
                }
              if (wcschr (wcharlist, wc))
                break;
            }
      Figure: Example of source code in bash-4.2
  • 8. Fica: the name. Filter for Individual user on code Clone Analysis.
  • 9. Fica: the website. Figure: Snapshot of Fica.
  • 14. Compare Code Clone Similarity
      Pi = possibility to be interesting; Pu = possibility to be un-interesting
      Len   Pi      Pi/Pu   Pu      Comp
      50    5.56%   1.18    4.72%   O
      87    2.89%   1.11    2.59%   O
      79    1.97%   0.69    2.87%   X
      63    3.55%   0.64    5.57%   O
      77    2.66%   0.46    5.83%   X
  • 15. Good Experiment Result
      All training: 44     All evaluation: 34
      Matched: 32          Accuracy: 94.12%
      un-interesting: 1    interesting: 1
  • 16. Bad Experiment Result
      All training: 47     All evaluation: 31
      Matched: 14          Accuracy: 45.16%
      un-interesting: 16   interesting: 1
  • 17. Open Question. How to improve accuracy? By combining metrics like McCabe Cyclomatic Complexity? Thank you!
  • 18. Unmatched: User un-interesting
  • 19. Unmatched: User interesting
  • 20. Overall Workflow
      1. Submits source code
      2. Detects clones
      3. Marks clones as “interesting” or not
      4. Records marked clones into database
      5. Studies characteristics of marks using machine learning algorithms
      6. Ranks unmarked clones based on machine learning
      Figure: Overall Workflow of Fica with CDT
  • 21. Calc Similarity of Clones
      $$\mathrm{tf}(t, d) = \frac{|t : t \in d|}{|d|} \tag{1}$$
      $$\mathrm{idf}(t, D) = \log \frac{|D|}{1 + |d \in D : t \in d|} \tag{2}$$
      $$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D) \tag{3}$$
      $$\overrightarrow{\mathrm{tfidf}}(d, D) = [\mathrm{tfidf}(t, d, D) \;\; \forall t \in d] \tag{4}$$
  • 22. Predicting Category
      $$\mathrm{sim}(a, b, D) = \overrightarrow{\mathrm{tfidf}}(a, D) \cdot \overrightarrow{\mathrm{tfidf}}(b, D) \tag{5}$$
      $$\mathrm{nsim}(a, b, D) = \begin{cases} 0, & \mathrm{sim}(a, b, D) = 0 \\ \dfrac{\mathrm{sim}(a, b, D)}{|\mathrm{sim}(a, b, D)|}, & \text{otherwise} \end{cases} \tag{6}$$
      $$\mathrm{poss}(t, M) = \begin{cases} 1, & |M| = 0 \\ \dfrac{\sum_{\forall m \in M} \mathrm{nsim}(t, m, M)}{|M|}, & \text{otherwise} \end{cases} \tag{7}$$
  • 23. Result: bash. Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.
  • 24. Result: git. Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.
  • 25. Result: xz. Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.
  • 26. Result: e2fsprogs. Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.
  • 27. Result: All Projects. Plot: Accuracy (%) against Percentage of Training Set (%), series A to H.
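
The similarity and prediction formulas on the "Calc Similarity of Clones" and "Predicting Category" slides, equations (1) to (7), can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Fica implementation: representing a clone as a list of token strings is an assumption, the function names simply mirror the symbols on the slides, and the token lists in the usage example are made up. Equation (6) is coded exactly as it appears on the slide, where it reduces to the sign of the dot product.

```python
import math


def tf(term, doc):
    """Equation (1): occurrences of term in doc divided by the doc length."""
    return doc.count(term) / len(doc)


def idf(term, corpus):
    """Equation (2): log(|D| / (1 + number of docs containing term))."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + containing))


def tfidf_vector(doc, corpus):
    """Equations (3) and (4): map every term of doc to tf * idf."""
    return {term: tf(term, doc) * idf(term, corpus) for term in set(doc)}


def sim(a, b, corpus):
    """Equation (5): dot product of the two tf-idf vectors."""
    va, vb = tfidf_vector(a, corpus), tfidf_vector(b, corpus)
    return sum(weight * vb[term] for term, weight in va.items() if term in vb)


def nsim(a, b, corpus):
    """Equation (6) as written on the slide: 0 if sim is 0, else sim / |sim|."""
    s = sim(a, b, corpus)
    return 0.0 if s == 0 else s / abs(s)


def poss(target, marked):
    """Equation (7): 1 if nothing is marked, else the mean nsim against marked."""
    if not marked:
        return 1.0
    return sum(nsim(target, m, marked) for m in marked) / len(marked)


# Hypothetical token lists standing in for clones a user has already marked.
marked = [
    ["xmalloc", "strncpy", "result"],
    ["xmalloc", "free"],
    ["static", "int"],
    ["static", "void"],
]
target = ["xmalloc", "result", "return"]

print(poss(target, marked))  # 0.5
```

Assuming Pi and Pu on the "Compare Code Clone Similarity" slide are obtained by calling `poss` once with the clones marked interesting and once with those marked un-interesting, the Pi/Pu column would be the ratio of the two results; that link is not spelled out on the slides.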