The presentation starts by summarizing some results of the Eureka/ITEA project GGCC (Global GNU Compiler Collection), in which Julio collaborated on the design of an open platform for coding rule validation.
It then elaborates on the different connections between formal techniques, in a broad sense, and open source software development.
Finally, I will discuss how these examples lead naturally to the emergent concept of a semantic forge.
GCC RULES
1. The Eureka/ITEA Global GCC Project
Julio Mariño
(joint work with Guillem Marpons and others)
Babel Research Group, Universidad Politécnica de Madrid
FOSSA09, Grenoble
Mariño et al. (UPM), Global GCC, FOSSA, November 2009
2. Overview
1 Project Overview
2 Coding Rule Validation
Structural Rule Validation
Domain-specific language: CRISP
3 The need for static analysis
4 Lessons learned
5 The way ahead
3. Context
The Global GCC Project (2006–2008)
ITEA-labeled consortium of industrial / research partners
Industrial: Mandriva, Bertin, Telefonica I+D, small/medium-sized
companies
Research labs: INRIA, CEA-LIST, UPM
Goal: make the GNU Compiler Collection (GCC) more attractive to
the (European) software industry by transferring academic results
in three areas:
Project-wide static analysis
Global optimization
Minimise programming hazards by means of coding rules
Global GCC knowledge base: integrates heterogeneous information
provided by the different components of GGCC
http://www.ggcc.info
4. Coding Rules
Definition
Coding Rules constrain admissible constructs of a
language to help produce more reliable and
maintainable code.
Standard coding rule sets do exist, e.g.:
High-Integrity C++ (HICPP): general C++ applications
MISRA-C (C language): automotive industry / embedded systems
Many organisations need to write their own rule sets
or adapt existing ones.
5. Coding Rules
Some Actual Examples
“Do not call the malloc() function” (MISRA-C 20.4)
“Do not use the ‘inline’ keyword for member functions” (HICPP 3.1.7)
“Expressions that are effectively Boolean should not be used as operands to operators other than (&&, || and !)” (MISRA-C 12.6)
“If a virtual function in a base class is not overridden in any derived class, then make it non virtual” (HICPP 3.3.6)
“All automatic variables shall have been assigned a value before being used” (MISRA-C 9.1)
“Behaviour should be implemented by only one member function in a class” (HICPP 3.1.9)
10. Rule Conformance Checking
Problems with Current Approaches
Rules are specified in natural language:
Ambiguity
Automatic checking hindered
Closed tools
Lack of extensibility
Proposed Solution
Define a logic-based language that allows for precisely specifying
rule sets such as MISRA-C or HICPP
Use logic programming to get an automatic rule conformance
checking procedure
Integrate information provided by different program analyses
12. Other Tools
Proprietary tools:
Compilers: IAR Systems (C)
QA: Parasoft, Klocwork, Coverity, Semmle Code (Java)
Free software:
Checkstyle (Java)
Gendarme (ECMA CIL, Mono and .Net)
Drawbacks:
Lack of appropriate extensibility mechanisms
Ambiguity in natural language
Interoperability is difficult
13. Motivation: C++ “Strange” Behavior
class A
{
public:
  A();
  virtual void func();
};

A::A() {
  func();
}

class B : public A
{
public:
  B() : A() {}
  virtual void func();
};

B* d = new B();
// A::func or B::func?
Coding Rule:
“Do not invoke virtual methods of the declared class
in a constructor or destructor.”
15. C++ “strange” behavior (2)
class Base {};
class Derived : public Base
{
public:
  ~Derived() {}
};
void foo()
{
  Derived* d = new Derived;
  delete d;   // correctly calls derived destructor
}
void boo()
{
  Derived* d = new Derived;
  Base* b = d;
  delete b;   // problem! does not call derived destructor!
}
Rule HICPP 3.3.2
“Write a ‘virtual’ destructor for base classes.”
17. Example
Rule Formalisation
Rule HICPP 3.3.15
“Ensure base classes common to more than one derived class are
virtual”
violate_hicpp_3_3_15(a, b, c, d) ←
    b ≠ c ∧
    direct_base_of(a, b) ∧
    direct_base_of(a, c) ∧
    base_of(b, d) ∧ base_of(c, d) ∧
    ¬virtual_base_of(a, c)
Rules are specified in an enriched LP-language with: disequality,
quantifiers, constructive negation and sorts.
18. Example
Extraction of Program Information and Search of Violations
Rule HICPP 3.3.15 in Prolog
violate_hicpp_3_3_15(A, B, C, D) :-
    class(B), class(C),
    B \= C,
    class(D), class(A),
    direct_base_of(A, B),
    direct_base_of(A, C),
    base_of(B, D),
    base_of(C, D),
    \+ virtual_base_of(A, C).

class('::Animal'). class('::WingedAnimal').
class('::Mammal'). class('::Bat').
direct_base_of('::Animal', '::Mammal').
direct_base_of('::Animal', '::WingedAnimal').
direct_base_of('::Mammal', '::Bat').
direct_base_of('::WingedAnimal', '::Bat').
virtual_base_of('::Animal', '::Mammal').
19. Proposed Approach
1 Formalize rules in a logic-based specification language
that is executable: CRISP
2 Use GCC ?? for gathering necessary program
information
20. Our Rule Checking Procedure
1 Coding rule(s) written once in the logic-based formalism
2 Extract program information (+ analysis information if available) using GCC, and store it in Prolog
3 Search (using a Prolog engine) for a counterexample
[Figure: pipeline diagram. Boxes: coding rules (in English); coding rules formalized in CRISP for C++; coding rule compiler; coding rules compiled into Prolog; C++ project source files; g++' (project build); project facts in Prolog; Ciao Prolog engine; rule violations report.]
24. CRISP Building Blocks 1: Sorts
Variable, DataMember, LocalVariable
Function, MemberFunction, Constructor
Type, PointerType, Record
Scope, Namespace, Record, CompoundStatement
Operator
ArgumentTypeInFunctionType
ClassMember
Thing
25. CRISP Building Blocks 2: (Binary) Relations
Function calls Function
Record hasImmediateBase Record
Variable hasType NonFunctionType
Function hasType FunctionType
Thing isDefinedIn Scope
Scope isNestedIn Scope
Record hasMember MemberFunction
Record hasMember DataMember
Record hasBase Record
Record isPrivateBaseOf Record
Record isVirtualBaseOf Record
PointerType hasPointedType Type
FunctionType hasReturnType Type
Record hasFriend Record
Record hasFriend MemberFunction
ClassMember hasVisibility Visibility
26. Example of Rule Formalization
Rule HICPP 3.3.13:
“Do not invoke virtual methods of the declared class
in a constructor or destructor.”
rule HICPP 3.3.13
violated by Caller : MemberFunction; Callee : VirtualFunction
when exists R : Record such that
  (
    R hasMember Caller
    and R hasMember Callee
    and
      (
        Caller is Constructor
        or Caller is Destructor
      )
    and Caller calls+ Callee
  )
.
28. Formalization of Rule HICPP 3.3.2
Rule HICPP 3.3.2:
“Write a ‘virtual’ destructor for base classes.”
rule HICPP 3.3.2
violated by C : Record
when exists C’ such that C’ hasBase C
and not exists VD : Destructor such that
  (
    VD isDefinedIn C
    and VD is VirtualFunction
  )
.
29. Auxiliary Sorts and Relations
relation F : Function overloads F’ : Function
when exists S : Scope ; N : String such that
  (
    F isDefinedIn S
    and F’ isDefinedIn S
    and F hasUnqualifiedName N
    and F’ hasUnqualifiedName N
    and F ≠ F’
  )
.
sort M : ClassMember is PrivateClassMember
when exists V : Visibility such that
  (
    M hasVisibility V and V is ‘private’
  )
.
30. Experimental Results
Project    KLOC   Load time   # Violations (checking time), per HICPP rule
                              3.3.1      3.3.2       3.3.11       3.3.15
Bacula       20      0.24     0 (0.0)    3 (0.0)     0 (0.0)      0 (0.0)
CLAM         46      1.62     1 (0.0)    15 (0.5)    115 (0.1)    0 (0.2)
Firebird    439      2.61     16 (0.0)   60 (1.0)    115 (0.2)    0 (0.3)
IT++         39      0.42     0 (0.0)    6 (0.0)     12 (0.0)     0 (0.0)
OGRE        209      3.05     0 (0.0)    15 (0.9)    79 (0.2)     0 (0.3)
Orca         89      1.17     1 (0.0)    12 (0.4)    0 (0.1)      0 (0.2)
Qt          595     10.42     15 (0.0)   75 (10.5)   1155 (1.3)   4 (1.2)
All times expressed in seconds.
31. Work in Progress
1 Implement / Enrich the CRISP Language
2 Implement more rules with information given by other tools
3 Open our abstract representation of programs to external tools
32. Implement / enrich the CRISP language
Quantification and true negation needed
Both performed over certain domains (sorts)
Infinite domains may appear with templates / generics
We have an implementation of constructive intensional negation
Goals automatically reordered
Extend CRISP to other languages: Java, Ada, C, Fortran, . . .
33. Integration of Information from External Analyzers
[Figure: the rule checking pipeline extended with external analyzers. Boxes: coding rules (in English); coding rules formalized in CRISP for C++; coding rule compiler; C++ project source files; g++' (project build); External Analyzer; Translation; Knowledge Base about the compiled program; Ciao Prolog engine; rule violations report.]
35. Example of New Relation that Needs Specific Analysis
relation F : MemberFunction maySelfCall G : MemberFunction
when (
  exists C : Record ; L : ProgramLocation such that
    (
      C hasMember F
      and C hasMember G
      and F ≠ G
      and F hasProgramLocation L
      and G calledOn L
      and L mayAlias ’this’
    )
  )
  or F mustSelfCall G
.
36. Example of Rule that Needs Specific Analysis (1)
Rule HICPP 3.4.2:
“Do not return non-const handles to class data from const member functions”
rule HICPP 3.4.2
violated by F : ConstMemberFunction
when exists C : Record;
            L : ProgramLocation;
            A : PrivateDataMember;
            P : PointerType
such that
  (
    A hasType P
    and not P is ConstType
    and C hasMember A
    and C hasMember F
    and F returns L
    and L mayAlias A
  )
.
37. Example of Rule that Needs Specific Analyses (2)
Rule HICPP 3.2.5:
“Ensure destructors release all objects owned by the object”
rule HICPP 3.2.5
violated by D : Destructor
when exists C : Record; A : DataMember; F : MemberFunction;
            L : ProgramLocation such that
  (
    C hasMember D
    and C hasMember A
    and not D releases A
    and L isFreshLocationIn F
    and A mayPointTo L
    and not exists G : MemberFunction such that
      (
        C hasMember G
        and not A mustBeLinkedFromHeapIn G
      )
  )
.
38. New Relations
ProgramLocation mayPointTo AbstractMemoryLocation
ProgramLocation mustPointTo AbstractMemoryLocation
ProgramLocation mayAlias ProgramLocation
ProgramLocation mustAlias ProgramLocation
39. Lessons learned
go out & meet people
Industrial projects are different, but there is a whole world of
problems to solve out there.
Take advantage of European instruments to get in contact with
industry. Our overall impression of ITEA is quite positive.
Do not try to include your own research agenda in the proposal,
that will not work!
. . . but it can work in the opposite direction:
DESAF10S (2010–2012), Spanish Ministry of Science and
Innovation
PROMETIDOS (2010–2013), Madrid Regional
Government/European Social Fund
A PhD on its way!
40. Lessons learned
be open, in several ways if possible
Adding the open source label to your project proposal may be beneficial
but try to avoid the obvious, naive arguments.
Global GCC exemplified the benefits of openness in several aspects:
The GCC suite itself, as a vehicle for efficient transfer of advanced
compilation techniques to European industry, alleviating its
dependency on external proprietary solutions.
Our proposal for an extensible platform for coding rule
specification and validation is itself open source in the sense that
specs are code that can be shared and enhanced by a new market
of potential users.
This is only possible thanks to a variety of existing static analysers
and tools (e.g. Ciao) from academia already distributed under open
source licenses.
41. Lessons learned
keep your ears open for unexpected applications
Coding rules for COBOL and beyond. . .
Tools for semi-automatic refactoring
Better source code searches at Google
SAFE-GCC: NXP, Trimedia. . .
42. Lessons learned
some negative bits. . .
The GNU compiler collection itself may be a problem, sometimes,
due to an obsolete architecture
Issues with copyright transfer to the FSF
Multiplicity of languages has been a problem as well (i.e. multiple
front-ends)
Do not try to solve all the problems of our planet. . . Get focused!
Read the small print: national issues concerning European
projects, etc.
43. The way ahead
current state of affairs
Preliminary conclusions:
Clean (declarative) semantics given to potentially ambiguous
coding rules by means of (extended) logic programming
A number of rules implemented using plain Prolog
Rule violations found in highly regarded C++ projects!
Checker: low resource (memory and time) consumption
Future work:
Complete definition of a highly expressive language aimed at
specifying rules and translation scheme into efficient Prolog
Connect the framework with other parts of the GGCC project
Improve performance of overall checking procedure
http://www.ggcc.info
44. The way ahead
a research agenda
Focus on tools
Do not miss reliability of open software as a real issue!
Bring semantics to open source software development
type systems
description logics (ontologies, etc.)
static program analysis (abstract interpretation, model checking,
etc.)
programming language design (DSLs, concurrency. . . )
The future is. . . SF
searching sources based on types (Foogle)
ontology powered semantic desktops (Nepomuk)
coherent management of packages (Mancoosi)
automatic discovery and composition of sw (AMOS, EZweb)
safe composition of components
etc.