SlideShare a Scribd company logo
1 of 3
Download to read offline
An unusual bug in Lucene.Net
Author: Ilya Ivanov
Date: 14.03.2016
Listening to stories about static analysis, some programmers say that they don't really need it, as their
code is entirely covered by unit tests, and that's enough to catch all the bugs. Recently I have found a
bug that is theoretically possible to find using unit tests, but if you are not aware that it's there, it's
almost unreal to write such a test to check it.
Introduction
Lucene.Net is a port of the Lucene search engine library, written in C#, and targeted at .NET runtime
users. The source code is open and available on the project website https://lucenenet.apache.org/.
The analyzer managed to detect only 5 suspicious fragments due to the slow pace of development, small
size and the fact that the project is widely used in other projects for full-text search [1].
To be honest, I didn't expect to find more bugs. One of these errors seemed especially interesting to me,
so I decided to tell our readers about it in our blog.
About the bug found
We have a diagnostic, V3035, about an error when instead of += a programmer may mistakenly write
=+, where + is a unary plus. When I was writing it by analogy with the V588 diagnostic, designed for C++,
I was thinking - can a programmer really make the same error, coding in C#? It could be understandable
in C++ - people use various text editors instead of IDE, and a typo can be easily left unnoticed. But typing
text in Visual Studio, which automatically aligns the code once a semicolon is put, is it possible to
overlook the misprint? It turns out that it is. Such a bug was found in Lucene.Net. It is of great interest to
us, mostly because it's rather hard to detect it using means other than static analysis. Let's take a look at
the code:
protected virtual void Substitute( StringBuilder buffer )
{
substCount = 0;
for ( int c = 0; c < buffer.Length; c++ )
{
....
// Take care that at least one character
// is left left side from the current one
if ( c < buffer.Length - 1 )
{
// Masking several common character combinations
// with an token
if ( ( c < buffer.Length - 2 ) && buffer[c] == 's' &&
buffer[c + 1] == 'c' && buffer[c + 2] == 'h' )
{
buffer[c] = '$';
buffer.Remove(c + 1, 2);
substCount =+ 2;
}
....
else if ( buffer[c] == 's' && buffer[c + 1] == 't' )
{
buffer[c] = '!';
buffer.Remove(c + 1, 1);
substCount++;
}
....
}
}
}
There is also a class GermanStemmer, which cuts off suffixes of german words to mark out a common
root. It works in the following way: first, the Substitute method replaces different combinations of
letters with other symbols, so that they are not confused with a suffix. There are such substitutions as -
'sch' to '$', 'st' to '!' (you can see it in the code example). At the same time the number of characters by
which such changes will shorten the word, is stored in the substCount variable. Further on, the Strip
method cuts off extra suffixes and finally, the Resubstitute method does the reverse substitution: '$' to
'sch', '!' to 'st'. For instance, if we have a word "kapitalistischen" (capitalistic), the stemmer will do the
following: kapitalistischen => kapitali!i$en (Substitute) => kapitali!i$ (Strip) => kapitalistisch
(Resubstitute).
Because of this typo, during the substitution of 'sch' with '$', the substCount variable will be assigned
with 2, instead of adding 2 to substCount. This error is really hard to find using methods other than
static analysis. That's the answer to those who think "Do I need static analysis, if I have unit-tests?"
Thus, to catch such a bug with the help of unit tests one should test Lucene.Net on German texts, using
GermanStemmer; the tests should index a word containing the 'sch' combination, and one more letter
combination, for which the substitution will be performed. At the same time it should be present in the
word before 'sch', so that the substCount will be not zero by the time the expression substCount =+ 2 is
executed. Quite an unusual combination for a test, especially if you don't see the bug.
Conclusion:
Unit tests and static analysis need not exclude, but rather complement, each other as methods of
software development [2]. I suggest downloading PVS-Studio static analyzer, and finding those bugs that
weren't detected by means of unit-testing.
Additional links
1. Andrey Karpov. Reasons why the error density is low in small programs
2. Andrey Karpov. How to complement TDD with static analysis

More Related Content

Similar to An unusual bug in Lucene.Net

Searching for bugs in Mono: there are hundreds of them!
Searching for bugs in Mono: there are hundreds of them!Searching for bugs in Mono: there are hundreds of them!
Searching for bugs in Mono: there are hundreds of them!PVS-Studio
 
Re-analysis of Umbraco code
Re-analysis of Umbraco codeRe-analysis of Umbraco code
Re-analysis of Umbraco codePVS-Studio
 
Analysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-Studio
Analysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-StudioAnalysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-Studio
Analysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-StudioPVS-Studio
 
Accord.Net: Looking for a Bug that Could Help Machines Conquer Humankind
Accord.Net: Looking for a Bug that Could Help Machines Conquer HumankindAccord.Net: Looking for a Bug that Could Help Machines Conquer Humankind
Accord.Net: Looking for a Bug that Could Help Machines Conquer HumankindPVS-Studio
 
Date Processing Attracts Bugs or 77 Defects in Qt 6
Date Processing Attracts Bugs or 77 Defects in Qt 6Date Processing Attracts Bugs or 77 Defects in Qt 6
Date Processing Attracts Bugs or 77 Defects in Qt 6Andrey Karpov
 
Static and Dynamic Code Analysis
Static and Dynamic Code AnalysisStatic and Dynamic Code Analysis
Static and Dynamic Code AnalysisAndrey Karpov
 
Heading for a Record: Chromium, the 5th Check
Heading for a Record: Chromium, the 5th CheckHeading for a Record: Chromium, the 5th Check
Heading for a Record: Chromium, the 5th CheckPVS-Studio
 
Analysis of Godot Engine's Source Code
Analysis of Godot Engine's Source CodeAnalysis of Godot Engine's Source Code
Analysis of Godot Engine's Source CodePVS-Studio
 
PVS-Studio Has Finally Got to Boost
PVS-Studio Has Finally Got to BoostPVS-Studio Has Finally Got to Boost
PVS-Studio Has Finally Got to BoostAndrey Karpov
 
CppCat Static Analyzer Review
CppCat Static Analyzer ReviewCppCat Static Analyzer Review
CppCat Static Analyzer ReviewAndrey Karpov
 
PVS-Studio vs Clang
PVS-Studio vs ClangPVS-Studio vs Clang
PVS-Studio vs ClangPVS-Studio
 
Static code analysis and the new language standard C++0x
Static code analysis and the new language standard C++0xStatic code analysis and the new language standard C++0x
Static code analysis and the new language standard C++0xPVS-Studio
 
Static code analysis and the new language standard C++0x
Static code analysis and the new language standard C++0xStatic code analysis and the new language standard C++0x
Static code analysis and the new language standard C++0xAndrey Karpov
 
How to Improve Visual C++ 2017 Libraries Using PVS-Studio
How to Improve Visual C++ 2017 Libraries Using PVS-StudioHow to Improve Visual C++ 2017 Libraries Using PVS-Studio
How to Improve Visual C++ 2017 Libraries Using PVS-StudioPVS-Studio
 
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source CodeA Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source CodePVS-Studio
 
Why Windows 8 drivers are buggy
Why Windows 8 drivers are buggyWhy Windows 8 drivers are buggy
Why Windows 8 drivers are buggyPVS-Studio
 
Errors that static code analysis does not find because it is not used
Errors that static code analysis does not find because it is not usedErrors that static code analysis does not find because it is not used
Errors that static code analysis does not find because it is not usedAndrey Karpov
 
Finding bugs in the code of LLVM project with the help of PVS-Studio
Finding bugs in the code of LLVM project with the help of PVS-StudioFinding bugs in the code of LLVM project with the help of PVS-Studio
Finding bugs in the code of LLVM project with the help of PVS-StudioPVS-Studio
 
A Boring Article About a Check of the OpenSSL Project
A Boring Article About a Check of the OpenSSL ProjectA Boring Article About a Check of the OpenSSL Project
A Boring Article About a Check of the OpenSSL ProjectAndrey Karpov
 

Similar to An unusual bug in Lucene.Net (20)

Searching for bugs in Mono: there are hundreds of them!
Searching for bugs in Mono: there are hundreds of them!Searching for bugs in Mono: there are hundreds of them!
Searching for bugs in Mono: there are hundreds of them!
 
Re-analysis of Umbraco code
Re-analysis of Umbraco codeRe-analysis of Umbraco code
Re-analysis of Umbraco code
 
Analysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-Studio
Analysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-StudioAnalysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-Studio
Analysis of PascalABC.NET using SonarQube plugins: SonarC# and PVS-Studio
 
Accord.Net: Looking for a Bug that Could Help Machines Conquer Humankind
Accord.Net: Looking for a Bug that Could Help Machines Conquer HumankindAccord.Net: Looking for a Bug that Could Help Machines Conquer Humankind
Accord.Net: Looking for a Bug that Could Help Machines Conquer Humankind
 
Date Processing Attracts Bugs or 77 Defects in Qt 6
Date Processing Attracts Bugs or 77 Defects in Qt 6Date Processing Attracts Bugs or 77 Defects in Qt 6
Date Processing Attracts Bugs or 77 Defects in Qt 6
 
Static and Dynamic Code Analysis
Static and Dynamic Code AnalysisStatic and Dynamic Code Analysis
Static and Dynamic Code Analysis
 
Grounded Pointers
Grounded PointersGrounded Pointers
Grounded Pointers
 
Heading for a Record: Chromium, the 5th Check
Heading for a Record: Chromium, the 5th CheckHeading for a Record: Chromium, the 5th Check
Heading for a Record: Chromium, the 5th Check
 
Analysis of Godot Engine's Source Code
Analysis of Godot Engine's Source CodeAnalysis of Godot Engine's Source Code
Analysis of Godot Engine's Source Code
 
PVS-Studio Has Finally Got to Boost
PVS-Studio Has Finally Got to BoostPVS-Studio Has Finally Got to Boost
PVS-Studio Has Finally Got to Boost
 
CppCat Static Analyzer Review
CppCat Static Analyzer ReviewCppCat Static Analyzer Review
CppCat Static Analyzer Review
 
PVS-Studio vs Clang
PVS-Studio vs ClangPVS-Studio vs Clang
PVS-Studio vs Clang
 
Static code analysis and the new language standard C++0x
Static code analysis and the new language standard C++0xStatic code analysis and the new language standard C++0x
Static code analysis and the new language standard C++0x
 
Static code analysis and the new language standard C++0x
Static code analysis and the new language standard C++0xStatic code analysis and the new language standard C++0x
Static code analysis and the new language standard C++0x
 
How to Improve Visual C++ 2017 Libraries Using PVS-Studio
How to Improve Visual C++ 2017 Libraries Using PVS-StudioHow to Improve Visual C++ 2017 Libraries Using PVS-Studio
How to Improve Visual C++ 2017 Libraries Using PVS-Studio
 
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source CodeA Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
A Unicorn Seeking Extraterrestrial Life: Analyzing SETI@home's Source Code
 
Why Windows 8 drivers are buggy
Why Windows 8 drivers are buggyWhy Windows 8 drivers are buggy
Why Windows 8 drivers are buggy
 
Errors that static code analysis does not find because it is not used
Errors that static code analysis does not find because it is not usedErrors that static code analysis does not find because it is not used
Errors that static code analysis does not find because it is not used
 
Finding bugs in the code of LLVM project with the help of PVS-Studio
Finding bugs in the code of LLVM project with the help of PVS-StudioFinding bugs in the code of LLVM project with the help of PVS-Studio
Finding bugs in the code of LLVM project with the help of PVS-Studio
 
A Boring Article About a Check of the OpenSSL Project
A Boring Article About a Check of the OpenSSL ProjectA Boring Article About a Check of the OpenSSL Project
A Boring Article About a Check of the OpenSSL Project
 

Recently uploaded

Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 

Recently uploaded (20)

Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 

An unusual bug in Lucene.Net

  • 1. An unusual bug in Lucene.Net Author: Ilya Ivanov Date: 14.03.2016 Listening to stories about static analysis, some programmers say that they don't really need it, as their code is entirely covered by unit tests, and that's enough to catch all the bugs. Recently I have found a bug that is theoretically possible to find using unit tests, but if you are not aware that it's there, it's almost unreal to write such a test to check it. Introduction Lucene.Net is a port of the Lucene search engine library, written in C#, and targeted at .NET runtime users. The source code is open and available on the project website https://lucenenet.apache.org/. The analyzer managed to detect only 5 suspicious fragments due to the slow pace of development, small size and the fact that the project is widely used in other projects for full-text search [1]. To be honest, I didn't expect to find more bugs. One of these errors seemed especially interesting to me, so I decided to tell our readers about it in our blog. About the bug found We have a diagnostic, V3035, about an error when instead of += a programmer may mistakenly write =+, where + is a unary plus. When I was writing it by analogy with the V588 diagnostic, designed for C++, I was thinking - can a programmer really make the same error, coding in C#? It could be understandable in C++ - people use various text editors instead of IDE, and a typo can be easily left unnoticed. But typing text in Visual Studio, which automatically aligns the code once a semicolon is put, is it possible to overlook the misprint? It turns out that it is. Such a bug was found in Lucene.Net. It is of great interest to
  • 2. us, mostly because it's rather hard to detect it using means other than static analysis. Let's take a look at the code: protected virtual void Substitute( StringBuilder buffer ) { substCount = 0; for ( int c = 0; c < buffer.Length; c++ ) { .... // Take care that at least one character // is left left side from the current one if ( c < buffer.Length - 1 ) { // Masking several common character combinations // with an token if ( ( c < buffer.Length - 2 ) && buffer[c] == 's' && buffer[c + 1] == 'c' && buffer[c + 2] == 'h' ) { buffer[c] = '$'; buffer.Remove(c + 1, 2); substCount =+ 2; } .... else if ( buffer[c] == 's' && buffer[c + 1] == 't' ) { buffer[c] = '!'; buffer.Remove(c + 1, 1); substCount++; } .... } } }
  • 3. There is also a class GermanStemmer, which cuts off suffixes of german words to mark out a common root. It works in the following way: first, the Substitute method replaces different combinations of letters with other symbols, so that they are not confused with a suffix. There are such substitutions as - 'sch' to '$', 'st' to '!' (you can see it in the code example). At the same time the number of characters by which such changes will shorten the word, is stored in the substCount variable. Further on, the Strip method cuts off extra suffixes and finally, the Resubstitute method does the reverse substitution: '$' to 'sch', '!' to 'st'. For instance, if we have a word "kapitalistischen" (capitalistic), the stemmer will do the following: kapitalistischen => kapitali!i$en (Substitute) => kapitali!i$ (Strip) => kapitalistisch (Resubstitute). Because of this typo, during the substitution of 'sch' with '$', the substCount variable will be assigned with 2, instead of adding 2 to substCount. This error is really hard to find using methods other than static analysis. That's the answer to those who think "Do I need static analysis, if I have unit-tests?" Thus, to catch such a bug with the help of unit tests one should test Lucene.Net on German texts, using GermanStemmer; the tests should index a word containing the 'sch' combination, and one more letter combination, for which the substitution will be performed. At the same time it should be present in the word before 'sch', so that the substCount will be not zero by the time the expression substCount =+ 2 is executed. Quite an unusual combination for a test, especially if you don't see the bug. Conclusion: Unit tests and static analysis need not exclude, but rather complement, each other as methods of software development [2]. I suggest downloading PVS-Studio static analyzer, and finding those bugs that weren't detected by means of unit-testing. Additional links 1. Andrey Karpov. Reasons why the error density is low in small programs 2. Andrey Karpov. How to complement TDD with static analysis