The use of the code analysis library OpenC++: modifications, improvements, error corrections

Author: Andrey Karpov
Date: 12.01.2008

Abstract

The article may be of interest to developers who use or plan to use the OpenC++ library (OpenCxx). The author describes his experience of improving the OpenC++ library and modifying it to solve special tasks.

Introduction

One often hears in forums that there are plenty of C++ syntax analyzers ("parsers") and that many of them are free, or that one can take YACC, for example, and easily implement one's own analyzer. Don't believe it: it is not that easy [1, 2]. This becomes especially clear once you remember that parsing the syntax is not even half the task. You also need to implement structures for storing the program tree and semantic tables containing information about the various objects and their scopes. This is especially important when developing specialized applications for processing and static analysis of C++ code. Such applications need the whole program tree to be preserved, which only a few libraries can provide. One of them is the open library OpenC++ (OpenCxx) [3], which is the subject of this article.

We would like to help developers master the OpenC++ library and to share our experience of modernizing it and fixing some of its defects. The article is a collection of tips, each devoted to correcting a defect or implementing an improvement.

The article is based on recollections of the changes made in the VivaCore library [4], which is based on OpenC++. Of course, only a small part of these changes is discussed here; remembering and describing them all would be difficult. For example, a description of how C language support was added to OpenC++ would take too much space. But you can always refer to the source code of the VivaCore library and find a lot of interesting information there.

It remains to say that the OpenC++ library is unfortunately out of date and needs serious improvement to support the modern C++ language standard. That is why, if you are going to implement a modern compiler, for example, you had better turn to GCC or to commercial libraries [5, 6]. But OpenC++ remains a good and convenient tool for many developers of systems for specialized processing and modification of program code. Many interesting solutions have been developed with OpenC++, for example the OpenTS execution environment [7] for the T++ programming language (developed at the Program Systems Institute of the RAS), the static code analyzer Viva64 [8], and Synopsis, a tool for generating documentation from source code [9].

The purpose of the article is to show by example how one can modify and improve the OpenC++ library code. The article describes 15 library modifications related to error correction or the addition of new functionality. Each of them not only makes the OpenC++ library better but also gives an opportunity to study its principles of operation in more depth. Let's get acquainted with them.

1. Skipping development environment keywords that do not affect program processing

While developing a code analyzer for a specific development environment, you are likely to come across language constructions specific to it. These constructions are often directives for a particular compiler and may be of no interest to you. But the OpenC++ library cannot process them because they are not part of the C++ language. In this case one of the simplest ways to ignore them is to add them to the rw_table table with the Ignore key. For example:

    static rw_table table[] = {
        ...
        { "__ptr32", Ignore},
        { "__ptr64", Ignore},
        { "__unaligned", Ignore},
        ...
    };

When adding entries, keep in mind that the words in the rw_table table must be arranged in alphabetical order. Be careful.

2. Adding a new lexeme

If you want to add a keyword that should be processed, you need to create a new lexeme ("token"). Let's look at the example of adding a new keyword "__w64". First create an identifier for the new lexeme (see the token-name.h file), for example in this way:

    enum {
        Identifier = 258,
        Constant = 262,
        ...
        W64 = 346, // New token name
        ...
    };

Update the table "table" in the lex.cc file:

    static rw_table table[] = {
        ...
        { "__w64", W64 },
        ...
    };

The next step is to create a class for the new lexeme, which we'll call LeafW64.

    namespace Opencxx
    {
    class LeafW64 : public LeafReserved {
    public:
        LeafW64(Token& t) : LeafReserved(t) {}
        LeafW64(char* str, ptrdiff_t len) :
            LeafReserved(str, len) {}
        ptrdiff_t What() { return W64; }
    };
    }

To create an object we'll need to modify the optIntegralTypeOrClassSpec() function:

    ...
    case UNSIGNED :
        flag = 'U';
        kw = new (GC) LeafUNSIGNED(tk);
        break;
    case W64 : // NEW!
        flag = 'W';
        kw = new (GC) LeafW64(tk);
        break;
    ...

Note that since we have decided to treat "__w64" as a data type, we need the 'W' symbol for encoding this type. You can learn more about the type encoding mechanism in the Encoding.cc file.

When introducing a new type we must remember to update such functions as Parser::isTypeSpecifier(), for example.

The last important point is the modification of the Encoding::MakePtree function:
    Ptree* Encoding::MakePtree(unsigned char*& encoded, Ptree* decl)
    {
        ...
        case 'W' :
            typespec = PtreeUtil::Snoc(typespec, w64_t);
            break;
        ...
    }

Of course, this is only an example, and adding other lexemes may take much more effort. A good way to add a new lexeme correctly is to take an existing one that is close to it in meaning and then find and examine all the places in the OpenC++ library where it is used.

3. Skipping complex development environment constructions that do not affect program processing

We have already examined how to skip single keywords that are meaningless for our program but impede code parsing. Unfortunately, sometimes things are more difficult. As a demonstration, let's take the constructions __pragma and __noop, which you can see in the header files of Visual C++:

    __forceinline DWORD HEAP_MAKE_TAG_FLAGS (
        DWORD TagBase, DWORD Tag )
    {
        __pragma(warning(push)) __pragma(warning(disable : 4548)) do
        {__noop(TagBase);} while((0,0) __pragma(warning(pop)) );
        return ((DWORD)((TagBase) + ((Tag) << 18)));
    }

You can look up the description of the __pragma and __noop constructions in MSDN. The following points are important for our program: a) they are of no interest to us; b) they take parameters; c) they impede code analysis.

Let's first add new lexemes, as described before, but this time let's use the InitializeOtherKeywords() function for this purpose:

    static void InitializeOtherKeywords(bool recognizeOccExtensions)
    {
        ...
        verify(Lex::RecordKeyword("__pragma", MSPRAGMA));
        verify(Lex::RecordKeyword("__noop", MS__NOOP));
        ...
    }

The solution consists in modifying the Lex::ReadToken function so that when we come across a DECLSPEC, MSPRAGMA or MS__NOOP lexeme we skip it, and then skip all the lexemes that make up the parameters of __pragma and __noop. To skip all the unnecessary lexemes we use the SkipDeclspecToken() function, as shown below.

    ptrdiff_t Lex::ReadToken(char*& ptr, ptrdiff_t& len)
    {
        ...
        else if(t == DECLSPEC){
            SkipDeclspecToken();
            continue;
        }
        else if(t == MSPRAGMA) { // NEW
            SkipDeclspecToken();
            continue;
        }
        else if(t == MS__NOOP) { // NEW
            SkipDeclspecToken();
            continue;
        }
        ...
    }

4. Function for expanding full file paths

In source code analysis tasks, a large part of the functionality is related to generating error messages and to navigating through source files. What is inconvenient is that file names returned by such functions as Program::LineNumber() may be presented in different ways. Here are some examples:

    C:\Program Files\MSVS 8\VC\atlmfc\include\afx.h
    .\drawing.cpp
    c:\src\wxwindows-2.4.2\samples\drawing\wx/defs.h
    Boost\boost-1_33_1\boost/variant/recursive_variant.hpp
    ..\FieldEdit2\Src\amsEdit.cpp
    ..\..\..\src\base\ftbase.c

The path may be full or relative, and different delimiters may be used. All this makes such paths inconvenient for processing or for output in messages. That is why we offer an implementation of the FixFileName() function, which brings paths to a uniform full form. An auxiliary function, GetInputFileDirectory(), is used to return the path to the directory where the file being processed is located.

    const string &GetInputFileDirectory() {
      static string oldInputFileName;
      static string fileDirectory;
      string dir;
      VivaConfiguration &cfg = VivaConfiguration::Instance();
      string inputFileName;
      cfg.GetInputFileName(inputFileName);
      if (oldInputFileName == inputFileName)
        return fileDirectory;
      oldInputFileName = inputFileName;
      filesystem::path inputFileNamePath(inputFileName,
                                         filesystem::native);
      fileDirectory = inputFileNamePath.branch_path().string();
      if (fileDirectory.empty()) {
        TCHAR curDir[MAX_PATH];
        if (GetCurrentDirectory(MAX_PATH, curDir) != 0) {
          fileDirectory = curDir;
        } else {
          assert(false);
        }
      }
      algorithm::replace_all(fileDirectory, "/", "\\");
      to_lower(fileDirectory);
      return fileDirectory;
    }
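The listings in this section depend on Boost.Filesystem, the Win32 API and VivaCore's own configuration class. For readers who only want to experiment with the idea of bringing mixed-style paths to one uniform form, here is a minimal portable sketch built on C++17 std::filesystem. It is not the article's implementation: NormalizePath and its baseDir parameter are hypothetical names that exist in neither OpenC++ nor VivaCore, and the sketch unifies delimiters to '/' rather than to the backslash used in the article.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, not part of OpenC++ or VivaCore.
// Brings a file name to a uniform form: forward slashes only, relative
// names resolved against baseDir, "." and ".." collapsed, lower case.
std::string NormalizePath(std::string name, const fs::path& baseDir) {
    // Unify the delimiters first so that mixed names such as "a\b/c" work.
    std::replace(name.begin(), name.end(), '\\', '/');

    fs::path p(name);
    if (p.is_relative())
        p = baseDir / p;  // baseDir plays the role of GetInputFileDirectory()

    // Purely lexical clean-up: the file system is not touched,
    // so the file does not have to exist.
    std::string result = p.lexically_normal().generic_string();

    // Lower-casing is only appropriate for case-insensitive file systems.
    std::transform(result.begin(), result.end(), result.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return result;
}
```

With a POSIX-style base directory, NormalizePath(".\\Drawing.cpp", "/src/samples") yields "/src/samples/drawing.cpp". Unlike the article's code, the sketch does not cache results; a std::map from the original name to the normalized one could be added in the same way.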
FixFileName() itself caches every converted name in a map, so each distinct path is processed only once:

    typedef map<string, string> StrStrMap;
    typedef StrStrMap::iterator StrStrMapIt;

    void FixFileName(string &fileName) {
      static StrStrMap FileNamesMap;
      StrStrMapIt it = FileNamesMap.find(fileName);
      if (it != FileNamesMap.end()) {
        fileName = it->second;
        return;
      }
      string oldFileName = fileName;
      algorithm::replace_all(fileName, "/", "\\");
      algorithm::replace_all(fileName, "\\\\", "\\");
      filesystem::path tmpPath(fileName, filesystem::native);
      fileName = tmpPath.string();
      algorithm::replace_all(fileName, "/", "\\");
      to_lower(fileName);
      if (fileName.length() < 2) {
        assert(false);
        FileNamesMap.insert(make_pair(oldFileName, fileName));
        return;
      }
      if (fileName[0] == '.' && fileName[1] != '.') {
        const string &dir = GetInputFileDirectory();
        if (!dir.empty())
          fileName.replace(0, 1, dir);
        FileNamesMap.insert(make_pair(oldFileName, fileName));
        return;
      }
      if (isalpha(fileName[0]) && fileName[1] == ':' ) {
        FileNamesMap.insert(make_pair(oldFileName, fileName));
        return;
      }
      const string &dir = GetInputFileDirectory();
      if (dir.empty())
        fileName.insert(0, ".\\");
      else {
        fileName.insert(0, "\\");
        fileName.insert(0, dir);
      }
      FileNamesMap.insert(make_pair(oldFileName, fileName));
    }

5. Getting values of numerical literals

A function that returns the value of a numerical literal may be useful in systems that build documentation from code. For example, with its help one can find out that the argument of the function "void foo(a = 99)" is 99 and use this for some purpose.

The GetLiteralType() function that we offer returns the type of a literal and, if the literal is an integer, its value. GetLiteralType() is intended for getting the information that is needed most often and does not support rarely used notations. But if you need to support UCNs, for example, or to get values of the double type, you can extend the functions given below yourself.

    unsigned __int64 GetHexValue(unsigned char c) {
      if (c >= '0' && c <= '9')
        return c - '0';
      if (c >= 'a' && c <= 'f')
        return c - 'a' + 0x0a;
      if (c >= 'A' && c <= 'F')
        return c - 'A' + 0x0a;
      assert(false);
      return 0;
    }

    bool GetHex(const char *&from, size_t len,
                unsigned __int64 &retValue) {
      unsigned __int64 c, n = 0, overflow = 0;
      int digits_found = 0;
      const char *limit = from + len;
      while (from < limit)
      {
        c = *from;
        if (!isxdigit(c))
          break;
        from++;
        overflow |= n ^ (n << 4 >> 4);
        n = (n << 4) + GetHexValue(c);
        digits_found = 1;
      }
      if (!digits_found)
        return false;
      if (overflow) {
        assert(false);
      }
      retValue = n;
      return true;
    }

    bool GetOct(const char *&from, size_t len,
                unsigned __int64 &retValue) {
      unsigned __int64 c, n = 0;
      bool overflow = false;
      const char *limit = from + len;
      while (from < limit)
      {
        c = *from;
        if (c < '0' || c > '7')
          break;
        from++;
        overflow |= static_cast<bool>(n ^ (n << 3 >> 3));
        n = (n << 3) + c - '0';
      }
      retValue = n;
      return true;
    }

    #define HOST_CHARSET_ASCII

    bool GetEscape(const char *from, size_t len,
                   unsigned __int64 &retValue) {
      /* Values of \a \b \e \f \n \r \t \v respectively. */
      // HOST_CHARSET_ASCII
      static const char charconsts[] =
        { 7, 8, 27, 12, 10, 13, 9, 11 };
      // HOST_CHARSET_EBCDIC
      //static const uchar charconsts[] =
      //  { 47, 22, 39, 12, 21, 13, 5, 11 };
      unsigned char c;
      c = from[0];
      switch (c)
      {
        /* UCNs, hex escapes, and octal escapes
           are processed separately. */
      case 'u': case 'U':
        // convert_ucn - not supported. Return: 65535.
        retValue = 0xFFFFui64;
        return true;
      case 'x': {
        const char *p = from + 1;
        return GetHex(p, len, retValue);
      }
      case '0': case '1': case '2': case '3':
      case '4': case '5': case '6': case '7': {
        const char *p = from + 1;
        return GetOct(p, len, retValue);
      }
      case '\\': case '\'': case '"': case '?':
        break;
      case 'a': c = charconsts[0]; break;
      case 'b': c = charconsts[1]; break;
      case 'f': c = charconsts[3]; break;
      case 'n': c = charconsts[4]; break;
      case 'r': c = charconsts[5]; break;
      case 't': c = charconsts[6]; break;
      case 'v': c = charconsts[7]; break;
      case 'e': case 'E': c = charconsts[2]; break;
      default:
        assert(false);
        return false;
      }
      retValue = c;
      return true;
    }

    // 'A', '\t', L'A', '\xFE'
    static bool GetCharLiteral(const char *from, size_t len,
                               unsigned __int64 &retValue) {
      if (len >= 3) {
        if (from[0] == '\'' && from[len - 1] == '\'') {
          unsigned char c = from[1];
          if (c == '\\') {
            verify(GetEscape(from + 2, len - 3, retValue));
          } else {
            retValue = c;
          }
          return true;
        }
      }
      if (len >= 4) {
        if (from[0] == 'L' &&
            from[1] == '\'' &&
            from[len - 1] == '\'') {
          unsigned char c = from[2];
          if (c == '\\') {
            verify(GetEscape(from + 3, len - 4, retValue));
          } else {
            retValue = c;
          }
          return true;
        }
      }
      return false;
    }

    // "string"
    static bool GetStringLiteral(const char *from, size_t len) {
      if (len >= 2) {
        if (from[0] == '"' && from[len - 1] == '"')
          return true;
      }
      if (len >= 3) {
        if (from[0] == 'L' &&
            from[1] == '"' &&
            from[len - 1] == '"')
          return true;
      }
      return false;
    }

    bool IsRealLiteral(const char *from, size_t len) {
      if (len < 2)
        return false;
      bool isReal = false;
      bool digitFound = false;
      for (size_t i = 0; i != len; ++i) {
        unsigned char c = from[i];
        switch(c) {
          case 'x': return false;
          case 'X': return false;
          case 'f': isReal = true; break;
          case 'F': isReal = true; break;
          case '.': isReal = true; break;
          case 'e': isReal = true; break;
          case 'E': isReal = true; break;
          case 'l': break;
          case '-': break;
          case '+': break;
          case 'L': break;
          default:
            if (!isdigit(c))
              return false;
            digitFound = true;
        }
      }
      return isReal && digitFound;
    }

    SimpleType GetRealLiteral(const char *from, size_t len) {
      assert(len > 1);
      unsigned char rc1 = from[len - 1];
      if (is_digit(rc1) || rc1 == '.' ||
          rc1 == 'l' || rc1 == 'L' ||
          rc1 == 'e' || rc1 == 'E')
        return ST_DOUBLE;
      if (rc1 == 'f' || rc1 == 'F')
        return ST_FLOAT;
      assert(false);
      return ST_UNKNOWN;
    }

    bool GetBoolLiteral(const char *from, size_t len,
                        unsigned __int64 &retValue) {
      if (len == 4 && strncmp(from, "true", 4) == 0) {
        retValue = 1;
        return true;
      }
      if (len == 5 && strncmp(from, "false", 5) == 0) {
        retValue = 0;
        return true;
      }
      return false;
    }
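The overflow test in GetHex(), written as n ^ (n << 4 >> 4), is compact but easy to misread. Shifting left and then right by four bits clears the top four bits of n, so the XOR is non-zero exactly when those bits are set, which means that the next shift by one hexadecimal digit would lose data. Below is a small self-contained sketch of the same check. It uses the standard std::uint64_t instead of unsigned __int64, and the name ParseHexDigits and its out-parameters are hypothetical, chosen only for this illustration.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical stand-alone variant of the check used in GetHex().
// Parses the leading hex digits of `text`; returns false if there are none.
// `overflowed` is set when the digits do not fit into 64 bits.
bool ParseHexDigits(const std::string& text,
                    std::uint64_t& value, bool& overflowed) {
    std::uint64_t n = 0;
    bool found = false;
    overflowed = false;
    for (unsigned char c : text) {
        if (!std::isxdigit(c))
            break;  // a suffix such as "u" or "i64" starts here
        // Equivalent to n ^ (n << 4 >> 4): are any of the top four bits set?
        if ((n >> 60) != 0)
            overflowed = true;
        const unsigned digit = std::isdigit(c)
                                   ? c - '0'
                                   : std::tolower(c) - 'a' + 10;
        n = (n << 4) | digit;
        found = true;
    }
    value = n;
    return found;
}
```

Sixteen hexadecimal digits still fit, so "FFFFFFFFFFFFFFFF" is parsed without the flag being raised, while a seventeenth digit sets it. Reporting the overflow to the caller instead of asserting is a design choice of this sketch, not of the article's code.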
The remaining functions recognize hexadecimal, octal and decimal literals and determine the literal type from its suffix:

    bool IsHexLiteral(const char *from, size_t len) {
      if (len < 3)
        return false;
      if (from[0] != '0')
        return false;
      if (from[1] != 'x' && from[1] != 'X')
        return false;
      return true;
    }

    SimpleType GetTypeBySufix(const char *from, size_t len) {
      assert(from != NULL);
      if (len == 0)
        return ST_INT;
      assert(!isdigit(*from));
      bool suffix_8 = false;
      bool suffix_16 = false;
      bool suffix_32 = false;
      bool suffix_64 = false;
      bool suffix_i = false;
      bool suffix_l = false;
      bool suffix_u = false;
      while (len != 0) {
        --len;
        const char c = *from++;
        switch(c) {
          case '8': suffix_8 = true; break;
          case '1':
            if (len == 0 || *from++ != '6') {
              assert(false);
              return ST_UNKNOWN;
            }
            --len;
            suffix_16 = true;
            break;
          case '3':
            if (len == 0 || *from++ != '2') {
              assert(false);
              return ST_UNKNOWN;
            }
            --len;
            suffix_32 = true;
            break;
          case '6':
            if (len == 0 || *from++ != '4') {
              assert(false);
              return ST_UNKNOWN;
            }
            --len;
            suffix_64 = true;
            break;
          case 'I':
          case 'i': suffix_i = true; break;
          case 'U':
          case 'u': suffix_u = true; break;
          case 'L':
          case 'l': suffix_l = true; break;
          default:
            assert(false);
            return ST_UNKNOWN;
        }
      }
      assert(suffix_8 + suffix_16 + suffix_32 + suffix_64 <= 1);
      if (suffix_8 || suffix_16)
        return ST_LESS_INT;
      if (suffix_32) {
        if (suffix_u)
          return ST_UINT;
        else
          return ST_INT;
      }
      if (suffix_64) {
        if (suffix_u)
          return ST_UINT64;
        else
          return ST_INT64;
      }
      if (suffix_l) {
        if (suffix_u)
          return ST_ULONG;
        else
          return ST_LONG;
      }
      if (suffix_u)
        return ST_UINT;
      assert(suffix_i);
      return ST_INT;
    }

    SimpleType GetHexLiteral(const char *from, size_t len,
                             unsigned __int64 &retValue) {
      assert(len >= 3);
      const char *p = from + 2;
      if (!GetHex(p, len, retValue)) {
        return ST_UNKNOWN;
      }
      ptrdiff_t newLen = len - (p - from);
      assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len));
      return GetTypeBySufix(p, newLen);
    }

    bool IsOctLiteral(const char *from, size_t len) {
      if (len < 2)
        return false;
      if (from[0] != '0')
        return false;
      return true;
    }

    SimpleType GetOctLiteral(const char *from, size_t len,
                             unsigned __int64 &retValue) {
      assert(len >= 2);
      const char *p = from + 1;
      if (!GetOct(p, len, retValue)) {
        return ST_UNKNOWN;
      }
      ptrdiff_t newLen = len - (p - from);
      assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len));
      return GetTypeBySufix(p, newLen);
    }

    SimpleType GetDecLiteral(const char *from, size_t len,
                             unsigned __int64 &retValue) {
      assert(len >= 1);
      const char *limit = from + len;
      unsigned __int64 n = 0;
      while (from < limit) {
        const char c = *from;
        if (c < '0' || c > '9')
          break;
        from++;
        n = n * 10 + (c - '0');
      }
      ptrdiff_t newLen = limit - from;
      if (newLen == static_cast<ptrdiff_t>(len))
        return ST_UNKNOWN;
      retValue = n;
      assert(newLen >= 0 && newLen < static_cast<ptrdiff_t>(len));
      return GetTypeBySufix(from, newLen);
    }

    SimpleType GetLiteralType(const char *from, size_t len,
                              unsigned __int64 &retValue) {
      if (from == NULL || len == 0)
        return ST_UNKNOWN;
      retValue = 1;
      if (from == NULL || len == 0)
        return ST_UNKNOWN;
      if (GetCharLiteral(from, len, retValue))
        return ST_LESS_INT;
      if (GetStringLiteral(from, len))
        return ST_POINTER;
      if (GetBoolLiteral(from, len, retValue))
        return ST_LESS_INT;
      if (IsRealLiteral(from, len))
        return GetRealLiteral(from, len);
      if (IsHexLiteral(from, len))
        return GetHexLiteral(from, len, retValue);
      if (IsOctLiteral(from, len))
        return GetOctLiteral(from, len, retValue);
      return GetDecLiteral(from, len, retValue);
    }

6. Correction of the string literal processing function

We suggest modifying the Lex::ReadStrConst() function as shown below. This corrects two errors related to the processing of string literals that are split into parts. The first error occurs while processing strings of the following kind:

    const char *name = "Viva\
    Core";

The second:

    const wchar_t *str = L"begin"L"end";

The corrected variant of the function:

    bool Lex::ReadStrConst(size_t top, bool isWcharStr)
    {
        char c;
        for(;;){
            c = file->Get();
            if(c == '\\'){
                c = file->Get();
                // Support: "\"
                if (c == '\r') {
                    c = file->Get();
                    if (c != '\n')
                        return false;
                } else if(c == '\0')
                    return false;
            }
            else if(c == '"'){
                size_t pos = file->GetCurPos() + 1;
                ptrdiff_t nline = 0;
                do{
                    c = file->Get();
                    if(c == '\n')
                        ++nline;
                } while(is_blank(c) || c == '\n');
                if (isWcharStr && c == 'L') {
                    //Support: L"123" L"456" L "789".
                    c = file->Get();
                    if(c == '"')
                        /* line_number += nline; */ ;
                    else{
                        file->Unget();
                        return false;
                    }
                } else {
                    if(c == '"')
                        /* line_number += nline; */ ;
                    else{
                        token_len = ptrdiff_t(pos - top);
                        file->Rewind(pos);
                        return true;
                    }
                }
            }
            else if(c == '\n' || c == '\0')
                return false;
        }
    }

7. Partial correction of the processing of expressions like "bool r = a < 1 || b > (int) 2;"

There is an error in OpenC++ related to the processing of some expressions that are wrongly taken for templates. For example, in the line "bool r = a < 1 || b > (int) 2;" the variable "a" is taken for a template name, and a lot of trouble with syntactic analysis follows. A full correction of this error requires extensive changes and has not been implemented yet. We offer a temporary solution that eliminates the major part of such errors. The functions that should be added or modified are given below.

    bool VivaParser::MaybeTypeNameOrClassTemplate(Token &token) {
      if (m_env == NULL) {
        return true;
      }
      const char *ptr = token.GetPtr();
      ptrdiff_t len = token.GetLen();
      Bind *bind;
      bool isType = m_env->LookupType(ptr, len, bind);
      return isType;
    }

    static bool isOperatorInTemplateArg(ptrdiff_t t) {
      return t == AssignOp || t == EqualOp || t == LogOrOp ||
             t == LogAndOp || t == IncOp || t == RelOp;
    }

    /*
      template.args : '<' any* '>'
      template.args must be followed by '(' or '::'
    */
    bool Parser::isTemplateArgs()
    {
        ptrdiff_t i = 0;
        ptrdiff_t t = lex->LookAhead(i++);
        if(t == '<'){
            ptrdiff_t n = 1;
            while(n > 0){
                ptrdiff_t u = lex->LookAhead(i++);
                /* TODO. :(
                   Fixing: bool r = a < 1 || b > (int) 2;
                   We'll correct not all the cases but it will be
                   better anyway.
                   Editing method. If an identifier is found near the
                   operator, it is obviously not a template because
                   only a type or a constant expression may stay
                   inside the brackets.
                   An example which doesn't work anyway:
                     r = a < fooi() || 1 > (int) b;
                   Unfortunately, the following expression is
                   processed incorrectly now, but such cases are
                   fewer than corrected ones.
                     template <int z>
                     unsigned TFoo(unsigned a) {
                       return a + z;
                     }
                     enum EEnum { EE1, EE2 };
                     b = TFoo < EE1 && EE2 > (2);
                */
                ptrdiff_t next = lex->LookAhead(i);
                if (u == Identifier &&
                    isOperatorInTemplateArg(next))
                    return false;
                if (isOperatorInTemplateArg(u) &&
                    next == Identifier)
                    return false;
                if(u == '<')
                    ++n;
                else if(u == '>')
                    --n;
                else if(u == '('){
                    ptrdiff_t m = 1;
                    while(m > 0){
                        ptrdiff_t v = lex->LookAhead(i++);
                        if(v == '(')
                            ++m;
                        else if(v == ')')
                            --m;
                        else if(v == '\0' || v == ';' || v == '}')
                            return false;
                    }
                }
                else if(u == '\0' || u == ';' || u == '}')
                    return false;
            }
            t = lex->LookAhead(i);
            return bool(t == Scope || t == '(');
        }
        return false;
    }
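To see the effect of the two added checks in isolation, the heuristic can be tried on a plain vector of tokens. The sketch below is a simplified model, not OpenC++ code: the Tok enumeration and the LooksLikeTemplateArgs function are hypothetical, only a few operators are modelled, keywords such as int are represented by a separate TypeName token, and the skipping of parenthesized groups is omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified token set for this illustration only.
enum class Tok { Identifier, Constant, TypeName, Less, Greater,
                 LogOr, LogAnd, Assign, LParen, RParen,
                 Scope, Semicolon, End };

static bool IsOperatorInTemplateArg(Tok t) {
    return t == Tok::Assign || t == Tok::LogOr || t == Tok::LogAnd;
}

// tokens[0] is expected to be the '<' that follows a name.
// Returns true if the sequence looks like "< ... >" followed by '(' or '::'.
bool LooksLikeTemplateArgs(const std::vector<Tok>& tokens) {
    // Reading past the end behaves like reaching the end of input.
    auto at = [&tokens](std::size_t i) {
        return i < tokens.size() ? tokens[i] : Tok::End;
    };
    std::size_t i = 0;
    if (at(i++) != Tok::Less)
        return false;
    int depth = 1;
    while (depth > 0) {
        const Tok u = at(i++);
        const Tok next = at(i);
        // The two checks from the article: an identifier next to a logical
        // or assignment operator means this is an expression.
        if (u == Tok::Identifier && IsOperatorInTemplateArg(next))
            return false;
        if (IsOperatorInTemplateArg(u) && next == Tok::Identifier)
            return false;
        if (u == Tok::Less)
            ++depth;
        else if (u == Tok::Greater)
            --depth;
        else if (u == Tok::End || u == Tok::Semicolon)
            return false;
    }
    const Tok after = at(i);
    return after == Tok::Scope || after == Tok::LParen;
}
```

For the tokens that follow "a" in "a < 1 || b > (int) 2;" the function returns false because "||" is followed by an identifier, while for "Foo < int > ( x )" it returns true. The known limitation from the comment in isTemplateArgs() is reproduced as well: "TFoo < EE1 && EE2 > (2)" is rejected although it is a valid template call.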
8. Improved error recovery

Unfortunately, the error recovery mechanism in OpenC++ sometimes causes the program to crash. The problem places in OpenC++ are pieces of code similar to this one:

    if(!rDefinition(def)){
        if(!SyntaxError())
            return false;
        SkipTo('}');
        lex->GetToken(cp); // WARNING: crash in the same case.
        body = PtreeUtil::List(new Leaf(op), 0, new Leaf(cp));
        return true;
    }

You should look through the places where errors are processed and correct them in the way shown by the example of the Parser::rLinkageBody() and Parser::SyntaxError() functions. The general idea of the corrections is that after an error occurs, the presence of the next lexeme should first be checked with the CanLookAhead() function instead of extracting it immediately with GetToken().

    bool Parser::rLinkageBody(Ptree*& body)
    {
        Token op, cp;
        Ptree* def;
        if(lex->GetToken(op) != '{')
            return false;
        body = 0;
        while(lex->LookAhead(0) != '}'){
            if(!rDefinition(def)){
                if(!SyntaxError())
                    return false; // too many errors
                if (lex->CanLookAhead(1)) {
                    SkipTo('}');
                    lex->GetToken(cp);
                    if (!lex->CanLookAhead(0))
                        return false;
                } else {
                    return false;
                }
                body =
                    PtreeUtil::List(new (GC) Leaf(op), 0,
                                    new (GC) Leaf(cp));
                return true; // error recovery
            }
            body = PtreeUtil::Snoc(body, def);
        }
        lex->GetToken(cp);
        body = new (GC)
            PtreeBrace(new (GC) Leaf(op), body, new (GC) Leaf(cp));
        return true;
    }

    bool Parser::SyntaxError()
    {
        syntaxErrors_ = true;
        Token t, t2;
        if (lex->CanLookAhead(0)) {
            lex->LookAhead(0, t);
        } else {
            lex->LookAhead(-1, t);
        }
        if (lex->CanLookAhead(1)) {
            lex->LookAhead(1, t2);
        } else {
            t2 = t;
        }
        SourceLocation location(GetSourceLocation(*this, t.ptr));
        string token(t2.ptr, t2.len);
        errorLog_.Report(ParseErrorMsg(location, token));
        return true;
    }

9. Update of the rTemplateDecl2 function

Without going into details, we suggest replacing the rTemplateDecl2() function with the variant given below. This eliminates some errors that occur while working with template classes.

    bool Parser::rTemplateDecl2(Ptree*& decl,
                                TemplateDeclKind &kind)
    {
        Token tk;
        Ptree *args = 0;
        if(lex->GetToken(tk) != TEMPLATE)
            return false;
        if(lex->LookAhead(0) != '<') {
            if (lex->LookAhead(0) == CLASS) {
                // template instantiation
                decl = 0;
                kind = tdk_instantiation;
                return true; // ignore TEMPLATE
            }
            decl = new (GC)
                PtreeTemplateDecl(new (GC) LeafReserved(tk));
        } else {
            decl = new (GC)
                PtreeTemplateDecl(new (GC) LeafReserved(tk));
            if(lex->GetToken(tk) != '<')
                return false;
            decl = PtreeUtil::Snoc(decl, new (GC) Leaf(tk));
            if(!rTempArgList(args))
                return false;
            if(lex->GetToken(tk) != '>')
                return false;
        }
        decl = PtreeUtil::Nconc(decl,
            PtreeUtil::List(args, new (GC) Leaf(tk)));
        // ignore nested TEMPLATE
        while (lex->LookAhead(0) == TEMPLATE) {
            lex->GetToken(tk);
            if(lex->LookAhead(0) != '<')
                break;
            lex->GetToken(tk);
            if(!rTempArgList(args))
                return false;
            if(lex->GetToken(tk) != '>')
                return false;
        }
        if (args == 0)
            // template < > declaration
            kind = tdk_specialization;
        else
            // template < ... > declaration
            kind = tdk_decl;
        return true;
    }
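Judging by the branches of the function above, the three TemplateDeclKind values appear to correspond to the three forms in which the template keyword can open a declaration. The following standard C++ fragment shows one example of each form; the Box class is a hypothetical name used only for illustration, and the mapping in the comments is a reading of the listing rather than documented OpenC++ behaviour.

```cpp
// tdk_decl: an ordinary template declaration, "template < ... >".
template <class T>
class Box {
public:
    explicit Box(T v) : value_(v) {}
    T get() const { return value_; }
private:
    T value_;
};

// tdk_specialization: an empty argument list, "template < >".
template <>
class Box<bool> {
public:
    explicit Box(bool v) : value_(v) {}
    bool get() const { return value_; }
    bool negated() const { return !value_; }
private:
    bool value_;
};

// tdk_instantiation: "template" directly followed by "class", with no '<'
// after the keyword. The function returns early here and sets decl to 0.
template class Box<int>;
```

Note that the function only checks for the CLASS token in the last case, so an explicit instantiation of a function template would take the other branch.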
10. Detecting the position of a Ptree in the program text

In some cases it is necessary to know which part of the program text a particular Ptree object was built from.

The function given below returns the addresses of the beginning and the end of the memory area holding the program text from which the given Ptree object was created.

    void GetPtreePos(const Ptree *p, const char *&begin,
                     const char *&end) {
      if (p == NULL)
        return;
      if (p->IsLeaf()) {
        const char *pos = p->GetLeafPosition();
        if (begin == NULL) {
          begin = pos;
        } else {
          begin = min(begin, pos);
        }
        end = max(end, pos);
      }
      else {
        GetPtreePos(p->Car(), begin, end);
        GetPtreePos(p->Cdr(), begin, end);
      }
    }

11. Support of definitions of the "const A (a)" kind

The OpenC++ library does not support variable definitions of the form "const A (a)". To correct this defect, a part of the code inside the Parser::rOtherDeclaration function should be changed:

    if(!rDeclarators(decl, type_encode, false))
        return false;

The following code should be used instead:

    if(!rDeclarators(decl, type_encode, false)) {
      // Support: const A (a);
      Lex::TokenIndex after_rDeclarators = lex->Save();
      lex->Restore(before_rDeclarators);
      if (lex->CanLookAhead(3) && lex->CanLookAhead(-2)) {
        ptrdiff_t c_2 = lex->LookAhead(-2);
        ptrdiff_t c_1 = lex->LookAhead(-1);
        ptrdiff_t c0 = lex->LookAhead(0);
        ptrdiff_t c1 = lex->LookAhead(1);
        ptrdiff_t c2 = lex->LookAhead(2);
        ptrdiff_t c3 = lex->LookAhead(3);
        if (c_2 == CONST && c_1 == Identifier &&
            c0 == '(' && c1 == Identifier && c2 == ')' &&
            (c3 == ';' || c3 == '='))
        {
          Lex::TokenContainer newEmptyContainer;
          ptrdiff_t pos = before_rDeclarators;
          lex->ReplaceTokens(pos + 2, pos + 3, newEmptyContainer);
          lex->ReplaceTokens(pos + 0, pos + 1, newEmptyContainer);
          lex->Restore(before_rDeclarators - 2);
          bool res = rDeclaration(statement);
          return res;
        }
      }
    }

This code uses some auxiliary functions that are not discussed in this article, but you can find them in the VivaCore library.

12. Support of in-class definitions of functions of the "T (min)() { }" kind

Sometimes one has to use workarounds in programming to reach the desired result. For example, the widely known macro "max" often causes trouble when a method like "T max() {return m;}" is defined in a class. In this case one resorts to a trick and defines the method as "T (max)() {return m;}". Unfortunately, OpenC++ does not understand such definitions inside classes. To correct this defect, the Parser::isConstructorDecl() function should be changed in the following way:
  • bool Parser::isConstructorDecl(){ if(lex->LookAhead(0) != () return false; else{ // Support: T (min)() { } if (lex->LookAhead(1) == Identifier && lex->LookAhead(2) == ) && lex->LookAhead(3) == () return false; ptrdiff_t t = lex->LookAhead(1); if(t == * || t == & || t == () return false; // declarator else if(t == CONST || t == VOLATILE) return true; // constructor or declarator else if(isPtrToMember(1)) return false; // declarator (::*) else return true; // maybe constructor }}13. Processing of constructions "using" and "namespace" insidefunctionsOpenC++ library doesnt know that inside functions "using" and "namespace" constructions may beused. But one can easily correct it by modifying Parser::rStatement() function:bool Parser::rStatement(Ptree*& st){... case USING : return rUsing(st);
  • case NAMESPACE : if (lex->LookAhead(2) == =) return rNamespaceAlias(st); return rExprStatement(st);...}14. Making "this" a pointerAs it is known "this" is a pointer. But its not so in OpenC++. Thats why we should correctWalker::TypeofThis() function to correct the error of type identification.Replace the codevoid Walker::TypeofThis(Ptree*, TypeInfo& t){ t.Set(env->LookupThis());}withvoid Walker::TypeofThis(Ptree*, TypeInfo& t){ t.Set(env->LookupThis()); t.Reference();}15. Optimization of LineNumber() functionWe have already mentioned Program::LineNumber() function when saying that it returns file names indifferent formats. Then we offered FixFileName() function to correct this situation. But LineNumber()function has one more disadvantage related to its slow working speed. Thats why we offer you anoptimized variant of LineNumber() function./* LineNumber() returns the line number of the line pointed to by PTR.*/size_t Program::LineNumber(const char* ptr, const char*& filename,
                           ptrdiff_t& filename_length,
                           const char *&beginLinePtr) const
{
    beginLinePtr = NULL;
    ptrdiff_t n;
    size_t len;
    size_t name;
    ptrdiff_t nline = 0;
    size_t pos = ptr - buf;
    size_t startPos = pos;
    if(pos > size){
        // error?
        assert(false);
        filename = defaultname.c_str();
        filename_length = defaultname.length();
        beginLinePtr = buf;
        return 0;
    }
    ptrdiff_t line_number = -1;
    filename_length = 0;
    while(pos > 0){
        if (pos == oldLineNumberPos) {
            line_number = oldLineNumber + nline;
            assert(!oldFileName.empty());
            filename = oldFileName.c_str();
            filename_length = oldFileName.length();
            assert(oldBeginLinePtr != NULL);
            if (beginLinePtr == NULL)
                beginLinePtr = oldBeginLinePtr;
            oldBeginLinePtr = beginLinePtr;
            oldLineNumber = line_number;
            oldLineNumberPos = startPos;
            return line_number;
        }
        switch(buf[--pos]) {
        case '\n' :
            if (beginLinePtr == NULL)
                beginLinePtr = &(buf[pos]) + 1;
            ++nline;
            break;
        case '#' :
            len = 0;
            n = ReadLineDirective(pos, -1, name, len);
            if(n >= 0){         // unless #pragma
                if(line_number < 0) {
                    line_number = n + nline;
                }
                if(len > 0 && filename_length == 0){
                    filename = (char*)Read(name);
                    filename_length = len;
                }
            }
            if(line_number >= 0 && filename_length > 0) {
                oldLineNumberPos = pos;
                oldBeginLinePtr = beginLinePtr;
                oldLineNumber = line_number;
                oldFileName = std::string(filename, filename_length);
                return line_number;
            }
            break;
        }
    }
    if(filename_length == 0){
        filename = defaultname.c_str();
        filename_length = defaultname.length();
        oldFileName = std::string(filename, filename_length);
    }
    if (line_number < 0) {
        line_number = nline + 1;
        if (beginLinePtr == NULL)
            beginLinePtr = buf;
        oldBeginLinePtr = beginLinePtr;
        oldLineNumber = line_number;
        oldLineNumberPos = startPos;
    }
    return line_number;
}

16. Correction of the error occurring while analyzing "#line" directive

In some cases the Program::ReadLineDirective() function malfunctions, mistaking irrelevant text for a "#line" directive. The corrected variant of the function looks as follows:

ptrdiff_t Program::ReadLineDirective(size_t i, ptrdiff_t line_number,
                                     size_t& filename,
                                     size_t& filename_length) const
{
    char c;
    do{
        c = Ref(++i);
    } while(is_blank(c));
#if defined(_MSC_VER) || defined(IRIX_CC)
    if(i + 5 <= GetSize() && strncmp(Read(i), "line ", 5) == 0) {
        i += 4;
        do{
            c = Ref(++i);
        }while(is_blank(c));
    } else {
        return -1;
    }
#endif
    if(is_digit(c)){            /* # <line> <file> */
        unsigned num = c - '0';
        for(;;){
            c = Ref(++i);
            if(is_digit(c))
                num = num * 10 + c - '0';
            else
                break;
        }
        /* line_number'll be incremented soon */
        line_number = num - 1;
        if(is_blank(c)){
            do{
                c = Ref(++i);
            }while(is_blank(c));
            if(c == '"'){
                size_t fname_start = i;
                do{
                    c = Ref(++i);
                } while(c != '"');
                if(i > fname_start + 2){
                    filename = fname_start;
                    filename_length = i - fname_start + 1;
                }
            }
        }
    }
    return line_number;
}

Conclusion

Of course, this article covers only a small part of the possible improvements. But we hope that they will be useful for developers using the OpenC++ library and will serve as examples of how one can specialize the library for one's own tasks.

We'd like to remind you once more that the improvements shown in this article and many other corrections can be found in the VivaCore library's code. The VivaCore library may be more convenient for many tasks than OpenC++.

If you have questions or would like to add or comment on something, our Viva64.com [10] team is always glad to communicate. We are ready to discuss any questions that arise, give recommendations and help you use the OpenC++ library or the VivaCore library. Write to us!

References

 1. Zuev E.A. The rare occupation. PC Magazine/Russian Edition. N 5(75), 1997. http://www.viva64.com/go.php?url=43.
 2. Margaret A. Ellis, Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison Wesley, 1990.
 3. OpenC++ library. http://www.viva64.com/go.php?url=16.
 4. Andrey Karpov, Evgeniy Ryzhkov. The essence of the code analysis library VivaCore. http://www.viva64.com/art-2-2-449187005.html
 5. Semantic Designs site. http://www.viva64.com/go.php?url=19.
 6. Interstron Company. http://www.viva64.com/go.php?url=42.
 7. What is OpenTS? http://www.viva64.com/go.php?url=17.
 8. Evgeniy Ryzhkov. Viva64: what is it and for whom is it meant? http://www.viva64.com/art-1-2-903037923.html
 9. Synopsis: A Source-code Introspection Tool. http://www.viva64.com/go.php?url=18.
 10. OOO "Program Verification Systems" site. http://www.viva64.com.
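
As a supplementary illustration of item 12, here is a minimal sketch of the "T (max)()" idiom on its own, independent of OpenC++ and VivaCore. The macro definition, the Range class and the parenthesizedNamesWork() function are all hypothetical names introduced only for this example. The macro stands in for the kind of function-like "max" macro that some system headers define.

```cpp
#include <cassert>

// Hypothetical stand-in for a function-like "max" macro that some
// system headers define; it is an assumption made for this sketch.
#define max(a, b) (((a) > (b)) ? (a) : (b))

// Hypothetical class used only for illustration.
class Range {
public:
    Range(int lo, int hi) : lo_(lo), hi_(hi) {}

    // Because "max" is followed by ")" rather than "(", the preprocessor
    // does not treat it as a macro invocation and leaves the name alone.
    int (max)() const { return hi_; }

    // The same declarator form is valid even when no macro is involved.
    int (min)() const { return lo_; }

private:
    int lo_;
    int hi_;
};

// Exercises both the parenthesized member functions and the macro.
bool parenthesizedNamesWork() {
    Range r(3, 7);
    assert(max(2, 5) == 5);                     // the macro itself is still usable
    return (r.max)() == 7 && (r.min)() == 3;    // calls are parenthesized too
}

// Remove the demonstration macro so it cannot leak into later code.
#undef max
```

Note that the call site needs the same care as the definition: writing r.max() while the macro is visible would be treated as a macro invocation with the wrong number of arguments, which is why the sketch calls (r.max)(). A parser that is meant to accept real-world code therefore has to handle the parenthesized form in declarations, which is what the change to Parser::isConstructorDecl() addresses.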