Your SlideShare is downloading. ×
MeCC: Memory Comparison based Clone Detector
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

MeCC: Memory Comparison based Clone Detector

1,466
views

Published on

ICSE 2011 technical session presentation by Yungbum Jung from SNU.

ICSE 2011 technical session presentation by Yungbum Jung from SNU.


0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,466
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
11
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. MeCC: Memory Comparison- based Clone DetectorHeejung Kim1,Yungbum Jung1, Sunghun Kim2, and Kwangkeun Yi1 Seoul National University 12 The Hong Kong University of Science and Technology http://ropas.snu.ac.kr/mecc/ 1
  • 2. Code Clones • similar code fragments (syntactically or semantically)static PyObject * static PyObject *float_add(PyObject *v, PyObject *w) float_mul(PyObject *v, PyObject *w){ { double a,b; double a,b; CONVERT_TO_DOUBLE(v,a); CONVERT_TO_DOUBLE(v,a); CONVERT_TO_DOUBLE(w,b); CONVERT_TO_DOUBLE(w,b); PyFPE_START_PROTECT(“add”,return 0) PyFPE_START_PROTECT(“multiply”,return 0) a = a + b; a = a * b; PyFPE_END_PROTECT(a) PyFPE_END_PROTECT(a) return PyFloat_FromDouble(a); return PyFloat_FromDouble(a);} } 2
  • 3. Applications of Code Clones• software refactoring• detecting potential bugs• understanding software evolution• detecting software plagiarism (malicious duplication) 3
  • 4. Clone Detectors• CCFinder [TSE’02] textual tokens• DECKARD [ICSE’07] AST characteristic vectors• PDG-based [ICSE‘08, SAS’01] program dependence graph Effective for syntactic code clones limited for semantic code clones 4
  • 5. Three code clonesmissed by syntax-based clone detection 5
  • 6. #1 Control ReplacementPyObject *PyBool_FromLong (long ok) static PyObject *get_pybool (int istrue){ { PyObject *result; PyObject *result = if (ok) result = Py_True; istrue? Py_True: Py_False; else result = Py_False; Py_INCREF(result); Py_INCREF(result); return result; return result;} } syntactically different but semantically identical 6
  • 7. #2 Capturing Procedural Effectsvoid appendPQExpBufferChar (PQExpBuffer str, char ch) { /* Make more room if needed *. if (!enlargePQExpBuffer(str, 1)) return; /* OK, append the data */ str->data[str->len] = ch; str->len++; str->data[str->len] = ‘0’;}void appendBinaryPQExpBuffer (PQExpBuffer str, const char* data, size_t datalen) { /* Make more room if needed *. if (!enlargePQExpBuffer(str, datalen)) return; /* OK, append the data */ memcpy(str->data + str->len, data, datalen); understanding memory str->len+= datalen; str->data[str->len] = ‘0’; behavior of procedures} 7
  • 8. ... *set_access_name(cmd_parms *cmd, void *dummy, const char *arg){ void *sconf = cmd->server->module_config; core_server_config *conf = ap_get_module_config(sconf, &core_module); const char *err = ap_check_cmd_context(sconf,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); if (err != NULL) { return err; } conf->access_name = apr_pstrdup(cmd->pool,arg); return NULL;} #3 More Complex Clone... *set_protocol(cmd_parms *cmd, void *dummy, const char *arg){ const char *err = ap_check_cmd_context(cmd,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); core_server_config *conf = ap_get_module_config(cmd->server->module_config, &core_module); char *proto; if (err != NULL) { return err; } proto = apr_pstrdup(cmd->pool,arg); ap_str_tolower(proto); conf->protocol = proto; return NULL; 8}
  • 9. ... *set_access_name(cmd_parms *cmd, void *dummy, const char *arg){ void *sconf = cmd->server->module_config; core_server_config *conf = ap_get_module_config(sconf, &core_module); const char *err = ap_check_cmd_context(sconf,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); if (err != NULL) { return err; } conf->access_name = apr_pstrdup(cmd->pool,arg); return NULL;} statement reordering... *set_protocol(cmd_parms *cmd, void *dummy, const char *arg){ const char *err = ap_check_cmd_context(cmd,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); core_server_config *conf = ap_get_module_config(cmd->server->module_config, &core_module); char *proto; if (err != NULL) { return err; } proto = apr_pstrdup(cmd-pool,arg); ap_str_tolower(proto); conf-protocol = proto; return NULL; 9}
  • 10. ... *set_access_name(cmd_parms *cmd, void *dummy, const char *arg){ void *sconf = cmd-server-module_config; core_server_config *conf = ap_get_module_config(sconf, core_module); const char *err = ap_check_cmd_context(sconf,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); if (err != NULL) { return err; } conf-access_name = apr_pstrdup(cmd-pool,arg); return NULL;} statement intermediate reordering variables... *set_protocol(cmd_parms *cmd, void *dummy, const char *arg){ const char *err = ap_check_cmd_context(cmd,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); core_server_config *conf = ap_get_module_config(cmd-server-module_config, core_module); char *proto; if (err != NULL) { return err; } proto = apr_pstrdup(cmd-pool,arg); ap_str_tolower(proto); conf-protocol = proto; return NULL; 10}
  • 11. ... *set_access_name(cmd_parms *cmd, void *dummy, const char *arg){ void *sconf = cmd-server-module_config; core_server_config *conf = ap_get_module_config(sconf, core_module); const char *err = ap_check_cmd_context(sconf,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); if (err != NULL) { return err; } conf-access_name = apr_pstrdup(cmd-pool,arg); return NULL;} statement intermediate statement reordering variables splitting... *set_protocol(cmd_parms *cmd, void *dummy, const char *arg){ const char *err = ap_check_cmd_context(cmd,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); core_server_config *conf = ap_get_module_config(cmd-server-module_config, core_module); char *proto; if (err != NULL) { return err; } proto = apr_pstrdup(cmd-pool,arg); ap_str_tolower(proto); conf-protocol = proto; return NULL; 11}
  • 12. ... *set_access_name(cmd_parms *cmd, void *dummy, const char *arg){ void *sconf = cmd-server-module_config; core_server_config *conf = ap_get_module_config(sconf, core_module); const char *err = ap_check_cmd_context(sconf,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); if (err != NULL) { return err; } conf-access_name = apr_pstrdup(cmd-pool,arg); return NULL;} statement intermediate statement reordering variables splitting... *set_protocol(cmd_parms *cmd, void *dummy, const char *arg){ const char *err = ap_check_cmd_context(cmd,NOT_IN_DIR_LOC_FILE | NOT_IN_LIMIT); core_server_config *conf = ap_get_module_config(cmd-server-module_config, core_module); char *proto; if (err != NULL) { return err; } proto = apr_pstrdup(cmd-pool,arg); ap_str_tolower(proto); conf-protocol = proto; return NULL; 12}
  • 13. These Semantic Clonesare Identified by MeCC 13
  • 14. MeCC: Our Approach• Static analyzer estimates the semantics of programs• Abstract memories are results of analysis• Comparing abstract memories is a measure 14
  • 15. Clone Detection Process procedures P P1 P2 P3 P4program 15
  • 16. Clone Detection Process procedures P abstract P1 P2 memories P3 P4 Static F (P ) = Mprogram Analyzer
  • 17. Clone Detection Process procedures P abstract P1 P2 memories P3 P4 Static F (P ) = Mprogram Analyzer Comparing Memories S(M, M ) similarities 17
  • 18. Clone Detection Process procedures P abstract P1 P2 memories P3 P4 Static F (P ) = Mprogram Analyzer Comparing Memories Code Clones Grouping P1 P2 S(M, M ) P3 P4 similarities 18
  • 19. Clone Detection Process procedures P abstract P1 P2 memories P3 P4 Static F (P ) = Mprogram Analyzer Comparing Memories Code Clones Grouping P1 P2 S(M, M ) P3 P4 similarities 19
  • 20. Estimating Semantics by log MinEntry Abstract Memories S(M1 , M2 ) log(| M1 | + | M2 |) 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82int make (list *a, int count){ int r = count + 1; Address Values if (a!=0){ a → {(true, α)} a-next = malloc(...); count → {(true, β)} a-next-val = count; r → {(true, β + 1)} } else { α.next → {(α = 0, )} return r - 1; .val → {(α = 0, β)} } RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} return r;} a → {(true, α)} b → {(true, β)}• Estimating an abstract memory at the α.n .v → → {(α = 0, )} {(α = 0, β)} procedure’s exit point RETV → {(α = 0, β), (α = 0, β + 2)} {}, {} P ⇓ v, M• Abstract memory is a map from abstract {}, {} P : τ addresses to abstractlist next} type list = {int x, values 20 let list node = {x:=1, next:={}}
  • 21. Estimating Semantics by log MinEntry Abstract Memories S(M1 , M2 ) log(| M1 | + | M2 |) 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82int make (list *a, int count){ int r = count + 1; Address Values if (a!=0){ a → {(true, α)} a-next = malloc(...); count → {(true, β)} a-next-val = count; r → {(true, β + 1)} } else { α.next → {(α = 0, )} return r - 1; .val → {(α = 0, β)} } RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} return r;} a → {(true, α)} b → {(true, β)}• Estimating an abstract memory at the α.n .v → → {(α = 0, )} {(α = 0, β)} procedure’s exit point RETV → {(α = 0, β), (α = 0, β + 2)} {}, {} P ⇓ v, M• Abstract memory is a map from abstract {}, {} P : τ addresses to abstractlist next} type list = {int x, values 21 let list node = {x:=1, next:={}}
  • 22. Estimating Semantics by log MinEntry Abstract Memories S(M1 , M2 ) log(| M1 | + | M2 |) 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82 int make (list *a, int count){ int r = count + 1; Address Values if (a!=0){ a → {(true, α)} a-next = malloc(...); count → {(true, β)} a-next-val = count; r → {(true, β + 1)} } else { α.next → {(α = 0, )} return r - 1; .val → {(α = 0, β)} } RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} return r; } a → {(true, α)} b → {(true, β)}• Use symbols for unknown input values α.n → {(α = 0, )} .v → {(α = 0, β)} RETV → {(α = 0, β), (α = 0, β + 2)}• All abstract values are guarded by execution {}, {} P ⇓ v, M path conditions {}, {} P : τ type list = {int x, list next} 22 let list node = {x:=1, next:={}}
  • 23. Estimating Semantics by log MinEntry Abstract Memories S(M1 , M2 ) log(| M1 | + | M2 |) 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82 int make (list *a, int count){ int r = count + 1; Address Values if (a!=0){ a → {(true, α)} a-next = malloc(...); count → {(true, β)} a-next-val = count; r → {(true, β + 1)} } else { α.next → {(α = 0, )} return r - 1; .val → {(α = 0, β)} } RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} return r; } a → {(true, α)} b → {(true, β)}• Use symbols for unknown input values α.n → {(α = 0, )} .v → {(α = 0, β)} RETV → {(α = 0, β), (α = 0, β + 2)}• All abstract values are guarded by execution {}, {} P ⇓ v, M path conditions {}, {} P : τ type list = {int x, list next} 23 let list node = {x:=1, next:={}}
  • 24. Estimating Semantics by log MinEntry Abstract Memories S(M1 , M2 ) log(| M1 | + | M2 |) 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82int make (list *a, int count){ int r = count + 1; Address Values if (a!=0){ a → {(true, α)} a-next = malloc(...); count → {(true, β)} a-next-val = count; r → {(true, β + 1)} } else { α.next → {(α = 0, )} return r - 1; .val → {(α = 0, β)} } RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} return r;} a → {(true, α)} b → {(true, β)}copy and modify α.n .v → → {(α = 0, )} {(α = 0, β)} RETV → {(α = 0, β), (α = 0, β + 2)}int make2 (list2 *a, int b){ if (a==0) return b; {}, {} P ⇓ v, M a-n = malloc(...); a-n-v = b; return b + 2; {}, {} P : τ} type list = {int x, list next} 24 let list node = {x:=1, next:={}}
  • 25. Estimating Semantics by log MinEntry Abstract Memories S(M1 , M2 ) log(| M1 | + | M2 |) 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82int make (list *a, int count){ int r = count + 1; Address Values log MinEntry if (a!=0){ a S(M→M ) log(| M1 {(true, α)} , 1 2 | + | M2 |) a-next = malloc(...); count → {(true, β)} a-next-val = count; r → {(true, β + 1)} } else { 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82 α.next → {(α = 0, )} return r - 1; .val → {(α = 0, β)} } a RETV → → {(α = 0, {(true, α)}(α = 0, β + 1)} β + 1 − 1), return r; count → {(true, β)}} r a→ → {(true, 1)} {(true, β +α)} α.next b→ → {(true, β)} {(α = 0, )}copy and modify .val α.n →→ {(α = = 0, )} {(α 0, β)} RETV Address → = 0, β + Values(α = 0, β + 1)} .v {(α → {(α = 1 − 1), 0, β)} RETV → {(α = 0, β), (α = 0, β + 2)}int make2 (list2 *a, int b){ a → {(true, α)} if (a==0) return b; b → {}, {} {(true, β)} P ⇓ v, M a-n = malloc(...); α.n → {(α = 0, )} a-n-v = b; → return b + 2; .v {}, {(α = 0, τ {} P : β)} RETV → {(α = 0, β), (α = 0, β + 2)}} type list = {int x, list next} {}, {} P ⇓ v, M 25 let list node = {x:=1, next:={}}
  • 26. Clone Detection Process procedures P abstract P1 P2 memories P3 P4 Static F (P ) = Mprogram Analyzer Comparing Memories Code Clones Grouping P1 P2 S(M, M ) P3 P4 similarities 26
  • 27. a → {(true, α)} log MinEntry count → {(true, β)} S(M1 , M2 ) log(| M1 | + | M2 |) r → {(true, β + 1)} Comparing Abstract Memories 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82 α.next .val → → {(α = 0, )} {(α = 0, β)} RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} a → {(true, α)} a → {(true, α)} count → {(true, β)} b → {(true, β)} r → {(true, β + 1)} α.n → {(α = 0, )} α.next → {(α = 0, )} .v → {(α = 0, β)} .val → {(α = 0, β)} RETV a {(tru → {(α = 0, β), (α = 0, β + 2)} RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} count {(tru {}, {} P ⇓ v, M aa → {(true, α)} {(true, α)} r {(true, b → {(true, β)} α.next {(α = count α.n → {(α = 0, )} {(true,log MinEntry β)} {}, {} P : τ α.val {(α = r → {(α = 0, β)} M2 ) log(| M1 | + | M2 |) {(true, β + 1)} 1. Classifying addresses into similar classes .v α.next S(M1 , type list = {int x, list next} MinEntry RETV a log RETV → {(α = 0, β), (α = 0, {(α2)} 0, )} log MinEntry β + = {(true, α)} {(α = 0, β + 1 − 1 S(M , M log(| M1 | + | M2 |) a {(true α.val {}, {} 2(2letM S(M1.0 +21)1·log(| )M+ |5) = M2 |) {(true, β)} {(α ,= {x:=1,2 next:={}} 0.82 list node =0, β)} count + | ·v, local return parameters P ⇓in1.0 + 2 · 1 M field addresses {(true, β + 1)} {(true 0.5)/(6 1 r 0, β + 1)} {(true, α)} b RETV {(α = 0, βa 1 − 1), (α = + a node.next.x 2(2{(true, α)} α.next address variables· 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) ={(αα.n0, )} = 0.82β)} {(α = 0 {}, {} P : τ count .val 1.0 + 1α.n .v α.val {(true,α.v β)} +{(true, in0.5)/(6 + 5) = 0.82 = 0, 2(2x· 1.0 {a:=1,α)} β)}E 2 · b:=2} · {(α {(α = 0 count a let {(true, :== {int x, list next} a b a r RETV {(true, .val α)} {(true, β)}βlist1)} .v {(true, βRETV (α{(α = 0, β), (α {(α = 0, β + 1 −+ 1)} 0, β + 1)} 1), = r type list{(true, {(true, α)} → α.next x, = {int + next} {(α {(α )}0, .vprev} 0, x, tsil α.n .val β)} α.nextcount type tsil =={(true,)} β)}ode = {x:=1, next:={}} countα.n {int = {(true, a {(α {(true,)} {}, {} P ⇓ v = 0, α)} → 27 → = 0, β b {(true, α)} = 0, β)} α.val {(true, a{(α = 0, β)} β)} + 1)} {(α {(true, β)}xt.x r α.v α.val r let→ {(true, β + {(α ... {x:=1, next:={}} 1)}
  • 28. a → {(true, α)} log MinEntry count → {(true, β)} S(M1 , M2 ) log(| M1 | + | M2 |) r → {(true, β + 1)} Comparing Abstract Memories 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82 α.next .val → → {(α = 0, )} {(α = 0, β)} RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} a → {(true, α)} a → {(true, α)} count → {(true, β)} b → {(true, β)} r → {(true, β + 1)} α.n → {(α = 0, )} α.next → {(α = 0, )} .v → {(α = 0, β)} .val → {(α = 0, β)} RETV → {(α = 0, β), (α = 0, β + 2)} RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} a {(true, α)} {}, {} P ⇓ v, M counta → {(true, α)} β)} {(true, b → {(true, β)} r {(true, )}+ 1)} {(α = 0, β {}, {} P : τ α.n → α.next .v 2. Compareβ)} )} → {(α = guarded values in the same {(α = 0, 0, type list = {int x, list next} α.val similar classes (score 0.0 to 1.0) RETV → {(α = 0, {(α = 0, β + 2)} β), (α = 0, β)} RETV {(α {}, {} β P ⇓letM1), (α = 0, β + 1)} − = 0, + 1v, list node = {x:=1, next:={}} a in {(true, α)} count {(true, α)} α)} a {(true, {(true, β)} {}, {} P : τ node.next.x score 1.0 t r b {(true, β)} x{(true, β b:=2} in E {(true, β)} let := {a:=1, + 1)} α.next = 0, β+ = − )}(α= 0, )} 1)} = {int x, listα.n next} {(α 1 0, {(α {(true, β +1)} 1), = 0, β + {(α α.val{(α = {(α = 0, β)}= = 0, β)} α.v 0, )} ={(αβ + 2)} type list score {int x, list next}ext = {x:=1, next:={}}{(α = 0, β), (α tsil = {int x, tsil prev} ode 0.5 RETV RETV {(α =typeβ +0, − 1), (α = 0, β + 1)} 0, 1 28al xt.x {(α = 0, β)} let ... {x:=1, next:={}}
  • 29. a → {(true, α)} log MinEntry count → {(true, β)} S(M1 , M2 ) log(| M1 | + | M2 |) r → {(true, β + 1)} Comparing Abstract Memories 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82 α.next .val → → {(α = 0, )} {(α = 0, β)} RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} a → {(true, α)} a → {(true, α)} count → {(true, β)} b → {(true, β)} r → {(true, β + 1)} α.n → {(α = 0, )} α.next → {(α = 0, )} .v → {(α = 0, β)} .val → {(α = 0, β)} {(true, α)}RETV → {(α = 0, β), (α = 0, β + 2)} RETV → {(α = 0, β + 1 − 1), (α = 0, β + 1)} {}, {} P ⇓ v, M a → {(true, α)} →(4 × 1.0 + 1 β)} 0.0 + 4 × 1.0 + 2 × {(true, × 0.5) 3. Find the best combination that maximizes the b α.n → {}, = P : τ {} 0.82 {(α = 0, )} 6 + 5 total score .v → {(α = 0, β)} type list = {int x, list next} RETV → {(α = 0, β), (α = 0, β + 2)} maximum score {}, {} P ⇓ v, M1 , M2 ) = S(Mlist node = {x:=1, next:={}} let in {(true, α)} 1 | + | M2 | |M node.next.x {}, {} P : τ | {a:=1, − F(c )E| let x := F(c) b:=2} in (4 × 1.0 + 1 × 0.0 + 4 × 1.0 + 2 × 0.5)= {int x, list next} type list = {int x, list next} = 0.82 ≥ 0.8ode = {x:=1, next:={}} type6tsil 5 {int x, tsil prev} + = 29 10xt.x let ... {x:=1, next:={}}
  • 30. Experimental Results 30
  • 31. Subject Projects Projects KLOC Procedures Application Python 435 7,657 interpreter Apache 343 9,483 web serverPostgreSQL 937 10,469 database 31
  • 32. Detected Clones Total 623 6% 2% code clones 39% 53% Type-1 Type-2 Type-3 Type-4C. K. Roy and J. R. Cordy. A survey on software clone detection research. SCHOOL OF COMPUTING TR 2007-541, QUEENʼS UNIVERSITY, 115, 2007.
  • 33. Semantic Clones45% Total 623 6% 2% code clones 39% 53% Type-1 Type-2 Type-3 Type-4
  • 34. Comparison CCfinder CCfinder textual tokensPDG-basedDECKARD PDG-based MeCC program 0 75 150 225 300 dependency graphs CCfinder DECKARDPDG-based characteristic vectorsDECKARD MeCC Type-3 Type-4 0 10 20 30 40 34
  • 35. Applications of Code Clones• software refactoring• detecting potential bugs• understanding software evolution• detecting software plagiarism (malicious duplication) 35
  • 36. Finding Potential Bugs• A large portion of semantic clones are due to inconsistent changes• Inconsistent changes may lead to potential bugs (inconsistent clones) Two semantic clones with potential bugs 36
  • 37. #1 Missed Null Checkconst char *GetVariable (VariableSpace space, const char *name){ struct_variable *current; if (!space) parameter name also should be checked! return NULL; for (current=space-next;current;current=current-next) { if (strcmp(current-name,name) == 0) { return current-value; } } return NULL;}const char *PQparameterStatus (const PGconn *conn, const char *paramName){ const pgParameterStatus *pstatus; if (!conn || !paramName) return NULL; for (pstatus=conn-pstatus; pstatus!=NULL; pstatus = pstatus-next) { if (strcmp(pstatus-name,paramName)== 0) return pstatus-value; } return NULL;} 37
  • 38. #2 A Resource Leak BugPyObject *pwd_getpwall (PyObject *self){ PyObject *d; struct passwd *p; if ((d = PyList_New(0)) == NULL) return NULL; setpwent(); open user database while ((p = getpwent()) != NULL) { PyObject *v = mkpwent(p); if (v==NULL || PyList_Append(d,v)!=0) { Py_XDECREF(v); Py_DECREF(d); return NULL; A resource leak without } Py_DECREF(v); endpwent() procedure call } endpwent(); close user database return d;} Python project revision #20157 38
  • 39. A Bug-free Procedure PyObject *spwd_getspall (PyObject *self,PyObject *pwd_getpwall (PyObject *self) PyObject *args){ { PyObject *d; PyObject *d; struct passwd *p; struct spwd *p; if ((d = PyList_New(0)) == NULL) if ((d = PyList_New(0)) == NULL) return NULL; return NULL; setpwent(); setspent(); while ((p = getpwent()) != NULL) { while ((p = getspent()) != NULL) { PyObject *v = mkpwent(p); PyObject *v = mkspent(p); if (v==NULL || PyList_Append(d,v)!=0) { if (v==NULL || PyList_Append(d,v)!=0) { Py_XDECREF(v); Py_XDECREF(v); Py_DECREF(d); Py_DECREF(d); endspent(); return NULL; return NULL; } } Py_DECREF(v); Py_DECREF(v); } } endpwent(); endspent(); return d; return d;} } Python project revision #38359 39
  • 40. The Bug is Fixed Later PyObject *spwd_getspall (PyObject *self,PyObject *pwd_getpwall (PyObject *self) PyObject *args){ { PyObject *d; PyObject *d; struct passwd *p; struct spwd *p; if ((d = PyList_New(0)) == NULL) if ((d = PyList_New(0)) == NULL) return NULL; return NULL; setpwent(); setspent(); while ((p = getpwent()) != NULL) { while ((p = getspent()) != NULL) { PyObject *v = mkpwent(p); PyObject *v = mkspent(p); if (v==NULL || PyList_Append(d,v)!=0) { if (v==NULL || PyList_Append(d,v)!=0) { Py_XDECREF(v); Py_XDECREF(v); Py_DECREF(d); Py_DECREF(d); endpwent(); return NULL; bug-fixed endspent(); return NULL; } } Py_DECREF(v); Py_DECREF(v); } } endpwent(); endspent(); return d; return d;} } Python project revision #73017 40
  • 41. Procedure A was created revision #20157 with a resource leak Procedure B (a code clone of A) revision #38359 is introduced without resource leaks4 years the resource leak can be fixed if MeCC were applied The resource leak bug in revision #73017 procedure A is fixed 41
  • 42. const char *GetVariable (VariableSpace space, const char *name) const char *PQparameterStatus (const PGconn *conn, const char *paramName){ { struct_variable *current; const pgParameterStatus *pstatus; if (!space) if (!conn || !paramName) return NULL; return NULL; for (current=space-next;current;current=current-next) for (pstatus=conn-pstatus; pstatus!=NULL; pstatus = pstatus-next) { { if (strcmp(current-name,name) == 0) if (strcmp(pstatus-name.paramName)== 0) { return pstatus-value; return current-value; } } return NULL; } } return NULL;} MeCC successfully identifies these procedures PyObject *spwd_getspall (PyObject *self, PyObject *pwd_getpwall (PyObject *self) PyObject *args) { { PyObject *d; PyObject *d; struct passwd *p; struct spwd *p; if ((d = PyList_New(0)) == NULL) if ((d = PyList_New(0)) == NULL) return NULL; return NULL; setpwent(); setspent(); while ((p = getpwent()) != NULL) { while ((p = getspent()) != NULL) { PyObject *v = mkpwent(p); PyObject *v = mkspent(p); if (v==NULL || PyList_Append(d,v)!=0) { if (v==NULL || PyList_Append(d,v)!=0) { Py_XDECREF(v); Py_XDECREF(v); Py_DECREF(d); Py_DECREF(d); endspent(); return NULL; return NULL; } } Py_DECREF(v); Py_DECREF(v); } } endpwent(); endspent(); return d; return d; } } 42
  • 43. Potential Bugs and Code Smells #Semantic Potential Code Clones Bugs (%) Smells (%)Python 95 26 (27.4%) 23 (24.2%)Apache 81 8 ( 9.9%) 27 (33.3%)PostgreSQL 102 21 (20.6%) 20 (19.6%)Total 278 55 (19.8%) 70 (25.2%) detected by MeCC 43
  • 44. Study Limitation• Projects are open source and may not be representative• All clones are manually inspected• Default options are used for other tools (CCfinder, Deckard, PDG-based) 44
  • 45. Conclusion• MeCC: Memory Comparison-based Clone Detector • a new clone detector using semantics- based static analysis • tolerant to syntactic variations • can be used to find potential bugs 45
  • 46. Thank You! http://ropas.snu.ac.kr/mecc/ 46
  • 47. Backup Slides 47
  • 48. Time Spent Projects KLOC FP Total Time Python 435 39 264 1h Apache 343 24 191 5hPostgreSQL 937 47 278 7h Ubuntu 64-bit machine with a 2.4 GHz Intel Core 2 Quad CPU and 8 GB RAM.• False positive ratio is less than 15%• Slower than other tools (deep semantic analysis) 48
  • 49. Structure Initialization 49
  • 50. Structure Initialization 50
  • 51. Judgement of Clones• Two parameters • In our experiment, similarity threshold 0.8 is used • Penalty function for small size of code clones log MinEntry S(M1 , M2 ) log(| M1 | + | M2 |) 2(2 · 1.0 + 2 · 1.0 + 1 · 0.5)/(6 + 5) = 0.82 51 a {(true, α)}
  • 52. Static Analyzer• Flow-sensitive• Context-sensitive by procedural summaries• Path-sensitive• Abstract interpretation http://spa-arrow.com 52