MeCC: Memory Comparison-based Clone Detector
Heejung Kim¹, Yungbum Jung¹, Sunghun Kim², and Kwangkeun Yi¹
¹ Seoul National University
² The Hong Kong University of Science and Technology
http://ropas.snu.ac.kr/mecc/
Code Clones
• similar code fragments
(syntactically or semantically)
static PyObject *
float_add(PyObject *v, PyObject *w)
{
    double a,b;
    CONVERT_TO_DOUBLE(v,a);
    CONVERT_TO_DOUBLE(w,b);
    PyFPE_START_PROTECT("add",return 0)
    a = a + b;
    PyFPE_END_PROTECT(a)
    return PyFloat_FromDouble(a);
}

static PyObject *
float_mul(PyObject *v, PyObject *w)
{
    double a,b;
    CONVERT_TO_DOUBLE(v,a);
    CONVERT_TO_DOUBLE(w,b);
    PyFPE_START_PROTECT("multiply",return 0)
    a = a * b;
    PyFPE_END_PROTECT(a)
    return PyFloat_FromDouble(a);
}
MeCC: Our Approach
• A static analyzer estimates the semantics of programs
• Abstract memories are the results of the analysis
• Comparing abstract memories gives a measure of code similarity
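To make the idea of comparing abstract memories concrete, here is a minimal sketch in C. It is a toy model, not MeCC's actual representation or metric: the `entry` struct, the string encoding of locations and values, and the ratio 2 × common / (size1 + size2) are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model: one entry maps an abstract location
 * to a symbolic value, both encoded as strings. */
struct entry {
    const char *location;
    const char *value;
};

/* Count the entries of m1 that also occur in m2 with the same
 * location and the same value. Assumes locations are unique
 * within each memory. */
size_t count_common(const struct entry *m1, size_t n1,
                    const struct entry *m2, size_t n2)
{
    size_t common = 0;

    assert(n1 == 0 || m1 != NULL);
    assert(n2 == 0 || m2 != NULL);
    for (size_t i = 0; i < n1; i++) {
        for (size_t j = 0; j < n2; j++) {
            if (strcmp(m1[i].location, m2[j].location) == 0 &&
                strcmp(m1[i].value, m2[j].value) == 0) {
                common++;
                break; /* each entry of m1 is counted at most once */
            }
        }
    }
    return common;
}

/* Assumed similarity measure: 2 * |common| / (|m1| + |m2|),
 * which ranges from 0.0 (nothing shared) to 1.0 (identical). */
double memory_similarity(const struct entry *m1, size_t n1,
                         const struct entry *m2, size_t n2)
{
    if (n1 + n2 == 0)
        return 1.0; /* two empty memories are treated as identical */
    return 2.0 * (double)count_common(m1, n1, m2, n2) / (double)(n1 + n2);
}
```

Under this toy measure, two memories of two entries each that share exactly one entry score 0.5. A detector built this way would report a pair of procedures as clones when the score exceeds a chosen threshold.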
Subject Projects
Projects     KLOC   Procedures   Application
Python        435        7,657   interpreter
Apache        343        9,483   web server
PostgreSQL    937       10,469   database
Detected Clones
Total: 623 code clones
[Pie chart of clone types (Type-1, Type-2, Type-3, Type-4); slice labels: 53%, 39%, 6%, 2%]
C. K. Roy and J. R. Cordy. A survey on software clone detection research. Technical Report 2007-541, School of Computing, Queen's University, 115, 2007.
Finding Potential Bugs
• A large portion of semantic clones are due
to inconsistent changes
• Inconsistent changes may lead to potential
bugs (inconsistent clones)
Two semantic clones with potential bugs
#1 Missed Null Check
const char *GetVariable (VariableSpace space, const char *name)
{
    struct _variable *current;
    if (!space)              /* parameter name also should be checked! */
        return NULL;
    for (current=space->next;current;current=current->next)
    {
        if (strcmp(current->name,name) == 0)
        {
            return current->value;
        }
    }
    return NULL;
}
const char *PQparameterStatus (const PGconn *conn, const char *paramName)
{
    const pgParameterStatus *pstatus;
    if (!conn || !paramName)
        return NULL;
    for (pstatus=conn->pstatus; pstatus!=NULL; pstatus = pstatus->next)
    {
        if (strcmp(pstatus->name,paramName)== 0)
            return pstatus->value;
    }
    return NULL;
}
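One way the missed check could be closed is to guard both parameters, as PQparameterStatus already does. The sketch below is not the PostgreSQL source: `struct variable` and `get_variable_checked` are hypothetical stand-ins so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the linked list of variables on the slide. */
struct variable {
    const char *name;
    const char *value;
    struct variable *next;
};

/* Look up a variable by name. As in the slide's code, the list head is
 * a dummy node, so the search starts at space->next. */
const char *get_variable_checked(const struct variable *space, const char *name)
{
    const struct variable *current;

    /* Check both parameters, mirroring the guard in PQparameterStatus. */
    if (!space || !name)
        return NULL;

    for (current = space->next; current; current = current->next) {
        /* Assumption of this sketch: stored names are never NULL. */
        assert(current->name != NULL);
        if (strcmp(current->name, name) == 0)
            return current->value;
    }
    return NULL;
}
```

With the extra guard, a NULL name yields NULL instead of reaching strcmp, so the two clones behave consistently on invalid input.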
#2 A Resource Leak Bug
PyObject *pwd_getpwall (PyObject *self)
{
    PyObject *d;
    struct passwd *p;
    if ((d = PyList_New(0)) == NULL)
        return NULL;
    setpwent();                      /* open user database */
    while ((p = getpwent()) != NULL) {
        PyObject *v = mkpwent(p);
        if (v==NULL || PyList_Append(d,v)!=0) {
            Py_XDECREF(v);
            Py_DECREF(d);
            return NULL;             /* a resource leak without endpwent() procedure call */
        }
        Py_DECREF(v);
    }
    endpwent();                      /* close user database */
    return d;
}
Python project revision #20157
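The leak pattern above can be avoided by closing the user database on every exit path. The following sketch uses only the POSIX `<pwd.h>` calls that appear on the slide (`setpwent`, `getpwent`, `endpwent`). The function `visit_all_users` and the two sample callbacks are hypothetical names for illustration, not code from the Python project.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pwd.h>
#include <stddef.h>

/* Hypothetical per-entry callback: returns 0 on success, nonzero on failure. */
typedef int (*entry_fn)(const struct passwd *entry, void *ctx);

/* Visit every entry of the user database. Returns the number of entries
 * visited, or -1 if the callback failed. endpwent() runs on every exit
 * path, including the early return. */
long visit_all_users(entry_fn visit, void *ctx)
{
    struct passwd *p;
    long count = 0;

    assert(visit != NULL);
    setpwent();                      /* open user database */
    while ((p = getpwent()) != NULL) {
        if (visit(p, ctx) != 0) {
            endpwent();              /* close before the early return */
            return -1;
        }
        count++;
    }
    endpwent();                      /* close user database */
    return count;
}

/* Sample callback that always succeeds. */
int accept_entry(const struct passwd *entry, void *ctx)
{
    (void)entry;
    (void)ctx;
    return 0;
}

/* Sample callback that always fails, to exercise the error path. */
int reject_entry(const struct passwd *entry, void *ctx)
{
    (void)entry;
    (void)ctx;
    return 1;
}
```

The same shape applies to pwd_getpwall: the error branch would call endpwent() before returning NULL.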
• Revision #20157: procedure A was created with a resource leak
• Revision #38359: procedure B (a code clone of A) was introduced without resource leaks
• Revision #73017: the resource leak bug in procedure A was fixed
• The resource leak could have been fixed 4 years earlier if MeCC had been applied
Study Limitation
• Projects are open source and may not be
representative
• All clones are manually inspected
• Default options are used for other tools
(CCFinder, Deckard, PDG-based)
Conclusion
• MeCC: Memory Comparison-based Clone
Detector
• a new clone detector using semantics-
based static analysis
• tolerant to syntactic variations
• can be used to find potential bugs
Time Spent
Projects     KLOC   FP   Total   Time
Python        435   39     264   1h
Apache        343   24     191   5h
PostgreSQL    937   47     278   7h
Ubuntu 64-bit machine with a 2.4 GHz Intel Core 2 Quad CPU and 8 GB RAM.
• False positive ratio is less than 15%
• Slower than other tools
(deep semantic analysis)