A Case Study of a Reusable Component Collection
William B. Frakes
Computer Science Department
Virginia Tech, Falls Church
wfrakes@vt.edu
Abstract
This paper reports on practical issues in the
development, distribution, use, and evolution of a
reusable component collection in the domain of
information retrieval.
1. Introduction
Software reuse is the use of existing software
knowledge or artifacts to build new software. There are
many types of software reuse [9]. The reuse described in
this paper is ad hoc, black box, compositional code
reuse. Ad hoc means that the reuse is not part of a
repeatable, mandated organizational process. Ad hoc reuse
is far more common than systematic reuse, though the
latter is thought to be more powerful. Black box reuse is
reuse of a software item without modification.
Compositional reuse means that the software system was
built by a human programmer out of components, as
opposed to generating a system automatically from
specifications. The reuse described in this paper is
primarily vertical rather than horizontal since it is focused
in the domain of information retrieval, though some of
the components such as string searching might also be
considered horizontal.
One source of reusable software is the code that is
developed to accompany books. This paper concerns code
from a book on data structures and algorithms for
information retrieval (IR) systems [6]. Information
retrieval systems retrieve textual documents from
a database in response to queries submitted to
the system by users.
IR systems can be defined more formally using set and
function notation as follows.
D = set of textual documents
D’=subset of D
Q=set of queries
M=matching function
Systems in the domain of information retrieval can
now be specified as follows.
S : S computes D’=M(D,Q)
That is, all systems S such that S returns a subset of
documents D’ of D that match the set of queries Q are IR
systems.
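The definition above can be made concrete with a minimal sketch in C. It is only an illustration under stated assumptions: the function names retrieve and matches are hypothetical, documents and queries are plain strings, and M is reduced to a substring test, which is far simpler than the matching functions of real IR systems.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matching function M for a single document: the document
 * matches when it contains at least one query string as a substring.
 * This is an assumption made for illustration only. */
static int matches(const char *doc, const char *const *queries, size_t nq)
{
    for (size_t i = 0; i < nq; i++)
        if (strstr(doc, queries[i]) != NULL)
            return 1;
    return 0;
}

/* Computes D' = M(D, Q): writes the indices of the matching documents
 * into result (which must have room for nd entries) and returns the
 * number of indices written, i.e. the size of D'. */
size_t retrieve(const char *const *docs, size_t nd,
                const char *const *queries, size_t nq,
                size_t *result)
{
    assert(docs != NULL && result != NULL);
    assert(nq == 0 || queries != NULL);

    size_t n = 0;
    for (size_t d = 0; d < nd; d++)
        if (matches(docs[d], queries, nq))
            result[n++] = d;
    return n;
}
```

A caller passes the document set D and the query set Q as arrays of strings and receives D’ as a list of indices into D.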
One of the goals for the book was development of
reusable IR code. Authors were asked to develop software
components for their chapters in C following industrial
coding guidelines. This was partly successful, and with
some rework, the following components were developed
and tested:
• Lexical Analysis and Stop List operations - this code
breaks text into words and removes words considered
unimportant for indexing.
• Stemmer Code - implements the Porter stemming
algorithm. A stemmer conflates words by finding a
common root form of the words.
• Thesaurus Construction - supports the automatic
construction of thesauri from source text.
• Boolean Operations - implements standard Boolean
operations (AND, OR, NOT) on sets of documents.
• Hashing Algorithms - including an algorithm for
minimal perfect hashing.
• String searching - implementations for basic
algorithms for finding patterns in text strings.
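To give a flavor of the kind of component in this list, here is a minimal sketch of Boolean operations on sets of documents. It is not the code distributed with the book. It assumes that a document set is a sorted array of distinct integer document ids, and the names set_and, set_or, and set_not are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a document set is a sorted array of distinct integer ids. */

/* AND: ids present in both a and b. out needs room for min(na, nb) ids. */
size_t set_and(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    assert(out != NULL);
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {                 /* id is in both sets */
            out[n++] = a[i];
            i++;
            j++;
        }
    }
    return n;
}

/* OR: ids present in a or b (or both). out needs room for na + nb ids. */
size_t set_or(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    assert(out != NULL);
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            out[n++] = a[i++];
        } else if (a[i] > b[j]) {
            out[n++] = b[j++];
        } else {                 /* emit a shared id only once */
            out[n++] = a[i];
            i++;
            j++;
        }
    }
    while (i < na) out[n++] = a[i++];
    while (j < nb) out[n++] = b[j++];
    return n;
}

/* NOT: ids of the whole collection (all) that are absent from a.
 * out needs room for nall ids. */
size_t set_not(const int *all, size_t nall, const int *a, size_t na, int *out)
{
    assert(out != NULL);
    size_t j = 0, n = 0;
    for (size_t i = 0; i < nall; i++) {
        while (j < na && a[j] < all[i])
            j++;
        if (j == na || a[j] != all[i])
            out[n++] = all[i];
    }
    return n;
}
```

Keeping the arrays sorted lets each operation run in a single linear pass over its inputs, which is one common design choice for this kind of component.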
This paper is about practical issues encountered in the
creation, distribution, and use of the components. These
issues are not particular to the domain of information
retrieval, nor particular to C functions. They may well
arise in any domain and for any type of reusable asset.
0-7695-0559-7/00 $10.00 © 2000 IEEE
2. What is a component?
What is a component? The term is ambiguous. A
component can be any lifecycle object or part thereof.
Usually a code component is a subroutine (function or
subprogram), or an object or class, but it could also be
many other things like macros, header files, subsystems,
processes, or patterns. This paper discusses collections of
C functions. This simplifies things a little since this is a
kind of reuse familiar to many. Even this kind of reuse,
however, can still be complicated.
The 3 C’s model of reuse design [12] says that there
are three aspects of a reusable component--the concept, the
content, and the context. The concept corresponds to the
abstract functionality of a component such as might be
specified in an abstract data type or a formal algorithm
specification. The purpose of such abstractions is to focus
on the essence of the component, whatever that might be,
and ignore other details, usually implementation details.
The chapters in the book provide the specifications for the
concepts of the components.
The content is the implementation of the component.
This involves selection of a programming language and a
design. The implementations of the components in C are
the content. The transition from concept to content
involves moving from the problem, or domain, space to
the solution space. The problem space is only concerned
with the concepts and operations of the domain in
question--in this case information retrieval. The solution
space involves the concepts and operations of the
implementation environment--in this case the C language.
The context is the environment needed to use the
components. Context for code components might be the
required machine, operating system, compiler version,
and so on. The code for the IR components was
developed for and tested on a Unix system and certain
assumptions were made regarding implementation.
Porting the code to DOS, for example, required changes
to make filenames have the required length of no more
than eight characters.
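A context assumption of this kind can be checked mechanically. The sketch below is a hypothetical helper, not part of the IR components. It tests only the eight-character base name limit mentioned above and ignores path separators and the length of the extension.

```c
#include <assert.h>
#include <string.h>

#define DOS_MAX_BASE 8   /* base name limit mentioned in the text */

/* Returns 1 if the base name of filename (the part before the last '.')
 * has between 1 and DOS_MAX_BASE characters, and 0 otherwise.
 * Assumption: filename holds no directory part. */
int fits_dos_base(const char *filename)
{
    assert(filename != NULL);

    const char *dot = strrchr(filename, '.');
    size_t base_len = dot ? (size_t)(dot - filename) : strlen(filename);

    return base_len >= 1 && base_len <= DOS_MAX_BASE;
}
```

Running such a check over the file names of a collection before a port gives an early list of the files that would need renaming.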
3. Language
Software reuse is now generally regarded as a good
thing, and most modern languages make some claim for
their support of reuse. The C language, for example, was
designed for extensive reuse in the sense that it is a small
language extensively augmented by reusable function
libraries. Newer languages like C++ provide reuse of
higher level programming constructs such as objects,
classes, and templates, and directly support type
polymorphism via function overloading. A summary of
the reuse aspects of C++, for example, can be found in
[14].
I selected C as the component implementation
language because C was and is a widely known and used
language in both industry and academia. It is also the
programming language I know best, and the one I’ve used
to develop industrial software. There are also many good
free software engineering support tools for C, including
free compilers. Was C the best choice? This of course
opens the door to language lawyering. Let me just say
that the components got developed.
Some of the components have been rewritten in other
languages, sometimes with attribution of the source,
sometimes not. Versions of the stemmer, for example,
have appeared on the web in Perl and Java.
4. Source or Binary?
The argument is sometimes made that only the
executable code for reusable components should be
distributed, not the source code. The reasoning here is
that distributing source code means that it will be
modified which will break the design abstraction, thus
losing much of the reuse benefit. Executable distribution
could be done in C by making and distributing archive
files containing object code for the functions. This
assumes that all of the users will have an environment
where the archives can be used.
Distributing only executable code may be a good idea
if the user of the components can be assured that someone
is available to fix problems and make enhancements as
needed. With software such as the IR components there
was no readily available maintenance organization, so we
distributed the source.
5. Testing and Optimization
The quality assurance of software is important to its
reuse. Code that does not meet the software quality
standards of an organization will not be reused by the
organization. Inside an organization, thorough testing and
optimization of components can be justified since the
higher costs for these activities can be amortized across
the multiple reuses.
Before release, the code was inspected for conformance
to programming standards, such as the use of standard
headers on code modules, checked with lint, and tested
until 90% branch coverage was reached. Code
portability was checked by moving the code to another
environment.
A rule of thumb sometimes used by designers of
reusable components is that if a reusable component is
more than 25% slower than an equivalent one-use
component, it will not be reused. Optimization of code
components can, therefore, be important. Optimizations
must be done carefully, since increased optimization often
decreases code readability and maintainability. Bentley
provides a good summary of proper techniques [3]. For the
IR components, however, no systematic optimization was
done, nor have there been any requests for it from users.
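The 25% rule of thumb is simple arithmetic, sketched below. The function names are hypothetical, and the two timings are assumed to have been measured by the caller, in the same unit, on the same workload.

```c
#include <assert.h>

/* Relative slowdown of a reusable component compared with an equivalent
 * one-use component: 0.25 means 25% slower. Both times must be positive
 * and expressed in the same unit. */
double slowdown(double reusable_time, double one_use_time)
{
    assert(reusable_time > 0.0 && one_use_time > 0.0);
    return (reusable_time - one_use_time) / one_use_time;
}

/* Applies the rule of thumb: more than 25% slower counts as too slow. */
int too_slow_for_reuse(double reusable_time, double one_use_time)
{
    return slowdown(reusable_time, one_use_time) > 0.25;
}
```

For example, a reusable routine that takes 1.3 seconds where a one-use routine takes 1.0 second is 30% slower and would fail the rule.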
6. Delivery Methods
A key question with a component collection is how to
make the components available. The first plan for the IR
components was for a disk to be included with the book,
but for various logistical reasons that didn’t work. So,
plan two was to make the code available via ftp. I put the
code for each chapter in a separate directory at a Virginia
Tech ftp site (ftp.vt.edu). I then started
getting requests to put the code in a single file to make
downloading easier. I did that by creating a compressed
tar file and putting that on the site as well. Then I started
getting email from people outside the U.S. saying that
they couldn’t get into the ftp site. I referred them to the
ftp site technician. I think that people usually got the
code they wanted, but the problem persisted. I decided to
put the components into software repositories as well.
In the 1990s the U.S. government started supporting
research and development of reuse repositories. Two such
repositories were Asset and Mountainnet. I submitted the IR
components to both libraries. Submission of components
to the library required that I fill out a template describing
the components. The components were available in these
repositories for several years. Government funding for the
repositories was stopped in 1998, and the repositories are
now no longer available.
In 1994, Prentice-Hall licensed the book to Dr. Dobb's,
which created a CD-ROM containing the IR book and
several other algorithms books [5]. The text of the book,
code included, was put into a hypertext format and a
search engine was included.
Many other web sites now either reference the IR code
ftp site, or keep a copy of the code. There is, however, no
mechanism for keeping consistency among web sites
offering the code. This is a version control problem (see
maintenance section below).
GNU (GNU's Not Unix) is a collection of software
managed by the Free Software Foundation [10]. While
examining the holdings of the GNU library, I saw they
had nothing on IR. I contacted the Free Software
Foundation offering the IR components. After several
email exchanges, the following facts emerged.
1. GNU would like to have the code.
2. Some rework of the code would be necessary to put
the code into the GNU format.
3. Having the code in GNU would require a
commitment to long term maintenance (see
discussion of maintenance below).
4. Putting the code in GNU would require that the code
meet the GNU standards for free software. This
requires, among other things, that the code in the
GNU library make no reference to the book, and that
the code be freely available for modification by any
user. This raised many copyright and other legal
issues that have not yet been resolved.
7. Legal Ownership
Legal ownership of components is concerned with
three types of legal claims: copyright, patents, and trade
secrets[11]. A copyright protects the expression of an
idea. Copyright has traditionally been used to protect
books and other print material, and music. Current
copyright law allows copying of software for backup and
archival purposes. Copyright protection is relatively
inexpensive and easy to obtain. Copyright claims need
not be formally filed, though failure to do so may limit
legal claims.
There has been some work on assuring versions of
software using encryption methods [13]. In this scheme,
each component would be assigned a unique identifier.
Once published, the component could not be changed
even by the author without changing the identifier. This
method might also be used to protect copyrighted
software components. Collberg and Thomborson describe
a method called watermarking for embedding a secret tag
in a component that can be used to uniquely identify the
component, and therefore to tell if it has been stolen [4].
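The idea of an identifier that changes whenever the component changes can be illustrated with a content hash. The sketch below is not the scheme of [13]. It uses the non-cryptographic 64-bit FNV-1a hash purely for illustration, and the name component_id is hypothetical. A real integrity scheme would need a cryptographic hash or a signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Derives a 64-bit identifier from the bytes of a component using the
 * FNV-1a hash. Any change to the bytes very likely changes the result,
 * but FNV-1a is NOT cryptographic: it detects accidental changes and
 * offers no protection against deliberate tampering. */
uint64_t component_id(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];                        /* mix in the next byte */
        h *= 1099511628211ULL;               /* FNV 64-bit prime */
    }
    return h;
}
```

A user who has both a component and its published identifier can recompute the identifier and compare the two values to see whether the copy at hand has been altered.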
A patent protects an idea, rather than the expression of
the idea. Current patent law restricts others from using the
patented idea for seventeen years after the patent is
granted. Software, algorithms, and processes are typically
patented rather than copyrighted. Obtaining a patent is a
long expensive process, involving an extensive search to
determine if the patent is original. Patents are granted by
government agencies such as the U.S. Patent Office. Over
20,000 software patents were issued from 1994-96 [1].
Twenty-nine of ninety-two respondents to a survey on
software reuse agreed at least somewhat that they
were inhibited from reusing software by legal issues [7].
Legal issues, unfortunately, are likely to grow in
importance as reuse crosses organizational boundaries and
moves into the open marketplace. Our experience with
GNU and with the user who wanted a legal document
giving him the right to use the IR components (see usage
section below) reinforces this point.
8. Maintenance and Configuration
Management
Perhaps the most difficult problems about the
components concern maintenance and configuration
management. Maintenance is expensive. Maintenance
costs can easily exceed half of the total costs for a
software project, and numbers for reusable component
collections are probably similar. Code contributors
usually do not want to be responsible for maintenance, so
component collections like the IR components usually do
not have adequate maintenance support. In this section,
the main issues of maintenance are briefly reviewed.
Software configuration management is about how to
monitor and control changes to software, in this case
reusable software assets. Versions of reusable assets must
also be coordinated with other software lifecycle items to
produce correct and consistent product releases.
Configuration management has three major activities:
• Version control. Reusable software components, like
any software product, will have versions because of error
fixes and enhancements. To build a system using these
assets, one will need to know which version to use. Old
versions of assets must be recoverable for reference, and
so they can be used to make corrections and
enhancements. As software assets change, they form
successive versions. Version control is the activity of
keeping track of these versions. To handle this problem,
the IR components were put under change control using
SCCS (Source Code Control System). Since the code
appears in various places (the ftp site, the CD-ROM, and
various other web sites), keeping these versions current and
coordinated is a very hard problem. One way to know for sure
which version of a component you have is to use
encryption techniques on the component [13].
• Change control. Change control is the procedure for
requesting changes, deciding what changes to make,
making changes, and recording and verifying changes.
Changes to reusable assets in a library should not be
made haphazardly, but must be made under a controlled
process, though this is often not the case. Change
requests for the IR code generally come via email. I put
reports of known bugs at the ftp site, but reports of
the same bug keep coming in, in part because the code
appears in so many places.
• Build control. Keeping track of which versions of
work products go together to form a release, and
generating derived assets and systems correctly, is called
build control. Build control for reuse has two aspects.
One is the general specification of which versions of
assets to use in a system build. The other aspect is that
reusable assets may themselves be composites of other
items, so specifications of how to build assets may also
be required. Build control for the IR components is
handled with Make.
9. Searching and Understanding
Much early work on reuse focused on the building of
reuse libraries and methods for indexing components and
searching for them. Many researchers began to feel that
this aspect of reuse was sufficiently understood, and that
too much attention was given to it. The focus of reuse
research moved to design of reusable components, domain
analysis, and so on.
The internet is probably the main source now
consulted by software engineers looking for reusable
software outside their own development environment. The
main types of indexing used on the web are free text
keyword searching, and to a lesser degree enumerated
classifications. Searching on the web is made difficult by
the size and dynamics of the database, and by the fact that
different search engines will find different web pages
given the same query.
In teaching reuse courses to graduate students at
Virginia Tech, I found that they had difficulty finding
existing components on the web. For example, in one
course students needed to find stemmers on the web. I
had searched myself and knew that several different ones
could be found. Typical of their input was the following
email I received from the student who eventually received
the highest grade in the class.
"I'm still a little confused about what we should produce
for the code analysis part of the project. I know we will
try to come up with a generic architecture by looking for
similarities in the code. I think this will be hard,
considering the fact that I have only found code for one
algorithm (Porter). Are we supposed to compare different
implementations of the same algorithm?"
I found in working with the students that they did not
know how to formulate good search queries.
Another problem is helping users understand reusable
software components. This is important because if
software engineers cannot understand components, they
will not be able to reuse them. Current methods for
representing reusable components are inadequate. A study
of four common representation methods for reusable
software components showed that none of the methods
worked very well for helping users understand the
components [8].
We are currently doing research on visualization
techniques, such as hypertrees, hierarchical trees, and
tables, for helping users understand reusable software
components [2]. We are using the IR components as a
testbed for this research. Our visualizations are grounded
in reuse design principles, such as the 3 C's model, and
in general principles of information design such as those
of Tufte. We use an extension of XML as a modeling
language for the components.
10. Usage
Because of the different venues used to distribute the
IR components, usage data and user feedback come from
various sources. One source is email from users typically
asking where they can find copies of the code, reporting a
bug in the code, or occasionally asking if the code can be
included in a commercial application such as the
following one received recently.
"What is the status of your stemming code
(implementation of the Porter algorithm) located in
ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/irbook/stemmer/,
is it public domain or copyrighted? The reason that
I ask is I want to know if it is okay to use it in a
search engine I am creating for my commercial
website."
I typically pass these requests on to the editor of the
book at Prentice-Hall, who approves them and asks that
the source of the component be referenced in the code and
documentation of the system in which it will be used.
This time I got a follow-up message:
"The email address I used was from the read.me file
that came with the stemming code - the address is
frakes@sarvis.cs.vt.edu. My lawyer wants me to have
sign something confirming the info below - is your
address at Virginia Polytechnic Institute and
State University still valid?"
This message points to two problems--how to keep
information associated with the code, in this case my
email address, up to date, and how to handle legal
problems. I sent the message on to the editor at
Prentice-Hall.
The proliferation of the code on various websites is
also an indicator of usage, as are references to the code in
various web pages. Some of the web pages are papers that
reference the book or code from the book, some are
syllabi for courses, others contain variants of some of the
components written in different languages. Another source
of feedback from users can be found in reviews of the
book at websites like amazon.com.
11. Current Status and A Proposal
I am currently working with available personnel (i.e. a
graduate student) to address some of the problems
identified above. Specifically the student is doing a
semester project to:
• place the code, which now has two versions, under
change and version control using RCS.
• place the code on at least two ftp servers.
• convert the code to the GNU coding and “free
software” standards.
• create a web page for the code that provides
information and pointers to the distribution sites.
• create documentation that will allow continuity
in the maintenance of the software.
Experience with the IR code collection shows that
current methods of development, maintenance, and
distribution work, but need improvement. Some
recommendations follow.
There is much inefficiency in the development of
components that accompany texts. For example, there are
many books that provide code that implements the basic
data structures and algorithms of computer science such as
sorting, searching, lists, stacks, queues and so on. A
standard way of cataloging these data structures and
algorithms could be quite helpful. For example, each
unique algorithm or data structure specification might be
assigned a product number similar to an ISBN number for
a book. Implementations of these specifications might
also be assigned a number that references the number of
the implemented specification. Such components might
also include information on quality assurance, indexing
terms, repository locations, and so on.
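Such a catalog could be represented quite simply. The sketch below is one hypothetical layout, not an existing standard: the record fields, the identifier format, and the function find_impls are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry for one implementation of a specification. */
struct impl_record {
    const char *impl_id;    /* number assigned to this implementation */
    const char *spec_id;    /* number of the specification it implements */
    const char *language;   /* implementation language */
    const char *location;   /* repository location */
};

/* Collects pointers to the catalog entries that implement spec_id.
 * Stores at most max pointers in out and returns how many were stored. */
size_t find_impls(const struct impl_record *catalog, size_t n,
                  const char *spec_id,
                  const struct impl_record **out, size_t max)
{
    assert(spec_id != NULL);
    assert(n == 0 || catalog != NULL);
    assert(max == 0 || out != NULL);

    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        if (strcmp(catalog[i].spec_id, spec_id) == 0)
            out[found++] = &catalog[i];
    }
    return found;
}
```

Because each implementation record carries the number of its specification, a single lookup answers the question of which implementations of a given algorithm already exist and where they can be found.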
This will only happen if it makes legal and financial
sense, and the legal and financial issues are far from
solved. The case of GNU, for example, shows the
complexity of issues related to “free software”. The recent
trend towards patenting software algorithms also adds to
the difficulty of freely sharing and reusing software.
There is also the continuing question of who will provide
resources for long term maintenance tasks. These
important problems must be solved if we are to make
better use of existing reusable software sources.
References
[1] Aharonian, G., 1995 US Patent Statistics. 1995,
http://www.baker.com/grandunificationtheory/archive/199601/19960121.html.
[2] Alonso, O., & Frakes, W. B. (2000). Visualization of
Reusable Software Assets. In W. B. Frakes (Ed.), ICSR6 Sixth
International Conference on Software Reuse. Vienna,
Austria: Springer-Verlag.
[3] Bentley, J. (1982). Writing Efficient Programs.
Englewood Cliffs, NJ: Prentice-Hall.
[4] Collberg, C., & Thomborson, C. (1999). Software
watermarking: Models and dynamic embeddings. In
POPL’99, 26th Annual SIGPLAN–SIGACT Symposium on
Principles of Programming Languages, (pp. 311–324).
[5] Dr. Dobb's Essential Books on Algorithms and Data
Structures, 1999.
[6] Frakes, W., & Baeza-Yates, R. (Eds.). (1992). Information
Retrieval: Data Structures and Algorithms. Englewood Cliffs,
N.J.: Prentice-Hall.
[7] Frakes, W. B., & Fox, C. J. (1995). Sixteen
Questions about Software Reuse. CACM, 38(6), 75-87.
[8] Frakes, W., & Pole, T. (1994). An Empirical Study of
Representation Methods for Reusable Software Components.
IEEE Transactions on Software Engineering, 20(8),
617-630.
[9] Frakes, W., & Terry, C. (1996). Software Reuse and
Reusability Models and Metrics. ACM Computing Surveys,
28(2), 415-435.
[10] GNU Coding Standards Copyright 1998 Free Software
Foundation, Inc.
[11] Huber, T. (1994). Reducing Business and Legal Risks in
Software Reuse Libraries. In ICSR-3. Rio de Janeiro:
IEEE-CS Press.
[12] Latour, L., Wheeler, T., & Frakes, B. (1991). Descriptive
and Prescriptive Aspects of the 3 C's Model: SETA1 Working
Group Summary. Ada Letters, XI(3), 9-17.
[13] Moore, J. W. (1994). The Use of Encryption to Ensure the
Integrity of Reusable Software Components. Third
International Conference on Software Reuse, (pp. 118-125).
Rio de Janeiro: IEEE CS Press.
[14] Stroustrup, B. (1996). Language-technical Aspects of
Reuse. In Fourth International Conference on Software Reuse,
(pp. 11-19). Orlando, FL: IEEE CS Press.
0-7695-0559-7/00 $10.00 ĂŁ 2000 IEEE

More Related Content

Similar to A Case Study Of A Reusable Component Collection

Frequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answersFrequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answersnishajj
 
NamingConvention
NamingConventionNamingConvention
NamingConventionJabed Hossain
 
Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Improved Presentation and Facade Layer Operations for Software Engineering Pr...Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Improved Presentation and Facade Layer Operations for Software Engineering Pr...Dr. Amarjeet Singh
 
Cs121 Unit Test
Cs121 Unit TestCs121 Unit Test
Cs121 Unit TestJill Bell
 
CS8251_QB_answers.pdf
CS8251_QB_answers.pdfCS8251_QB_answers.pdf
CS8251_QB_answers.pdfvino108206
 
C programming interview questions
C programming interview questionsC programming interview questions
C programming interview questionsadarshynl
 
Software Engineering with Objects (M363) Final Revision By Kuwait10
Software Engineering with Objects (M363) Final Revision By Kuwait10Software Engineering with Objects (M363) Final Revision By Kuwait10
Software Engineering with Objects (M363) Final Revision By Kuwait10Kuwait10
 
An Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired FileAn Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired FileIDES Editor
 
Software_Engineering_Presentation (1).pptx
Software_Engineering_Presentation (1).pptxSoftware_Engineering_Presentation (1).pptx
Software_Engineering_Presentation (1).pptxArifaMehreen1
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET Journal
 
Reuse Software Components (IMS 2006)
Reuse Software Components (IMS 2006)Reuse Software Components (IMS 2006)
Reuse Software Components (IMS 2006)IT Industry
 
Introduction to object oriented language
Introduction to object oriented languageIntroduction to object oriented language
Introduction to object oriented languagefarhan amjad
 
Data structures and algorithms 2
Data structures and algorithms 2 Data structures and algorithms 2
Data structures and algorithms 2 Mark John Lado, MIT
 
Technical Interview
Technical InterviewTechnical Interview
Technical Interviewprashant patel
 
Paper review
Paper reviewPaper review
Paper reviewNadia Nahar
 
A Methodology To Manage Victim Components Using Cbo Measure
A Methodology To Manage Victim Components Using Cbo MeasureA Methodology To Manage Victim Components Using Cbo Measure
A Methodology To Manage Victim Components Using Cbo Measureijseajournal
 
Advanced Software Engineering.ppt
Advanced Software Engineering.pptAdvanced Software Engineering.ppt
Advanced Software Engineering.pptRvishnupriya2
 
Systematic software development using vdm by jones 2nd edition
Systematic software development using vdm by jones 2nd editionSystematic software development using vdm by jones 2nd edition
Systematic software development using vdm by jones 2nd editionYasir Raza Khan
 

Similar to A Case Study Of A Reusable Component Collection (20)

Frequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answersFrequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answers
 
NamingConvention
NamingConventionNamingConvention
NamingConvention
 
Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Improved Presentation and Facade Layer Operations for Software Engineering Pr...Improved Presentation and Facade Layer Operations for Software Engineering Pr...
Improved Presentation and Facade Layer Operations for Software Engineering Pr...
 
OOPS_Unit_1
OOPS_Unit_1OOPS_Unit_1
OOPS_Unit_1
 
Cs121 Unit Test
Cs121 Unit TestCs121 Unit Test
Cs121 Unit Test
 
CS8251_QB_answers.pdf
CS8251_QB_answers.pdfCS8251_QB_answers.pdf
CS8251_QB_answers.pdf
 
C programming interview questions
C programming interview questionsC programming interview questions
C programming interview questions
 
Software Engineering with Objects (M363) Final Revision By Kuwait10
Software Engineering with Objects (M363) Final Revision By Kuwait10Software Engineering with Objects (M363) Final Revision By Kuwait10
Software Engineering with Objects (M363) Final Revision By Kuwait10
 
An Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired FileAn Efficient Search Engine for Searching Desired File
An Efficient Search Engine for Searching Desired File
 
Chapter 1
Chapter 1Chapter 1
Chapter 1
 
Software_Engineering_Presentation (1).pptx
Software_Engineering_Presentation (1).pptxSoftware_Engineering_Presentation (1).pptx
Software_Engineering_Presentation (1).pptx
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine Learning
 
Reuse Software Components (IMS 2006)
Reuse Software Components (IMS 2006)Reuse Software Components (IMS 2006)
Reuse Software Components (IMS 2006)
 
Introduction to object oriented language
Introduction to object oriented languageIntroduction to object oriented language
Introduction to object oriented language
 
Data structures and algorithms 2
Data structures and algorithms 2 Data structures and algorithms 2
Data structures and algorithms 2
 
Technical Interview
Technical InterviewTechnical Interview
Technical Interview
 
Paper review
Paper reviewPaper review
Paper review
 
A Methodology To Manage Victim Components Using Cbo Measure
A Methodology To Manage Victim Components Using Cbo MeasureA Methodology To Manage Victim Components Using Cbo Measure
A Methodology To Manage Victim Components Using Cbo Measure
 
Advanced Software Engineering.ppt
Advanced Software Engineering.pptAdvanced Software Engineering.ppt
Advanced Software Engineering.ppt
 
Systematic software development using vdm by jones 2nd edition
Systematic software development using vdm by jones 2nd editionSystematic software development using vdm by jones 2nd edition
Systematic software development using vdm by jones 2nd edition
 

A Case Study Of A Reusable Component Collection

A Case Study of a Reusable Component Collection
William B. Frakes
Computer Science Department
Virginia Tech, Falls Church
wfrakes@vt.edu

Abstract
This paper reports on practical issues in the development, distribution, use, and evolution of a reusable component collection in the domain of information retrieval.

1. Introduction
Software reuse is the use of existing software knowledge or artifacts to build new software. There are many types of software reuse [9]. The reuse described in this paper is ad hoc, black box, compositional code reuse. Ad hoc means that the reuse is not part of a repeatable, mandated organizational process. Ad hoc reuse is far more common than systematic reuse, though the latter is thought to be more powerful. Black box reuse is reuse of a software item without modification. Compositional reuse means that the software system was built by a human programmer out of components, as opposed to being generated automatically from specifications. The reuse described in this paper is primarily vertical rather than horizontal, since it is focused on the domain of information retrieval, though some of the components, such as string searching, might also be considered horizontal.

One source of reusable software is the code that is developed to accompany books. This paper concerns code from a book on data structures and algorithms for information retrieval (IR) systems [6]. Information retrieval systems retrieve textual documents from a database in response to queries submitted to the system by users.

IR systems can be defined more formally using set and function notation as follows.
D = set of textual documents
D’ = subset of D
Q = set of queries
M = matching function
Systems in the domain of information retrieval can now be specified as follows.
S : S computes D’ = M(D, Q)
That is, all systems S such that S returns a subset D’ of D containing the documents that match the set of queries Q are IR systems.
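The specification D’ = M(D, Q) can be made concrete with a small sketch in C. This is only an illustration under stated assumptions: the function name match_documents is hypothetical, documents and query terms are plain C strings, and a document is taken to match when it contains every query term as a substring. The paper does not prescribe any particular matching function M.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of a matching function M for D' = M(D, Q).
 * docs holds the n_docs documents of D; terms holds the n_terms queries of Q.
 * Assumption: a document matches when it contains every query term as a
 * substring. Indices of matching documents are written to matched, which
 * must have room for n_docs entries. Returns the size of D'.
 */
size_t match_documents(const char *docs[], size_t n_docs,
                       const char *terms[], size_t n_terms,
                       size_t matched[])
{
    size_t n_matched = 0;

    for (size_t d = 0; d < n_docs; d++) {
        int all_found = 1;

        /* A single missing term is enough to reject the document. */
        for (size_t t = 0; t < n_terms; t++) {
            if (strstr(docs[d], terms[t]) == NULL) {
                all_found = 0;
                break;
            }
        }
        if (all_found)
            matched[n_matched++] = d;
    }
    return n_matched;
}
```

Any other choice of M (for example, ranked or stemmed matching) fits the same specification, since the definition only requires that the result be a subset of D.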
One of the goals for the book was development of reusable IR code. Authors were asked to develop software components for their chapters in C following industrial coding guidelines. This was partly successful, and with some rework, the following components were developed and tested:
• Lexical Analysis and Stop List operations - this code breaks text into words and removes words considered unimportant for indexing.
• Stemmer Code - implements the Porter stemming algorithm. A stemmer conflates words by finding a common root form of the words.
• Thesaurus Construction - supports the automatic construction of thesauri from source text.
• Boolean Operations - implements standard Boolean operations (AND, OR, NOT) on sets of documents.
• Hashing Algorithms - including an algorithm for minimal perfect hashing.
• String searching - implementations for basic algorithms for finding patterns in text strings.

This paper is about practical issues encountered in the creation, distribution, and use of the components. These issues are not particular to the domain of information retrieval, nor particular to C functions. They may well arise in any domain and for any type of reusable asset.

0-7695-0559-7/00 $10.00 © 2000 IEEE
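To give a feel for what one of the listed components does, the following is a minimal sketch of Boolean AND and OR over document sets. It is not the code distributed with the book: the function names docset_and and docset_or are hypothetical, and the representation (arrays of integer document ids, sorted ascending, without duplicates) is an assumption made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of Boolean AND over two document sets.
 * Assumption: each set is an array of document ids sorted in ascending
 * order with no duplicates. The intersection is written to out, which
 * needs room for min(na, nb) ids. Returns the number of ids written.
 */
size_t docset_and(const int a[], size_t na,
                  const int b[], size_t nb,
                  int out[])
{
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {
            out[n++] = a[i];   /* id present in both sets */
            i++;
            j++;
        }
    }
    return n;
}

/*
 * Hypothetical sketch of Boolean OR under the same assumptions.
 * The union is written to out, which needs room for na + nb ids.
 */
size_t docset_or(const int a[], size_t na,
                 const int b[], size_t nb,
                 int out[])
{
    size_t i = 0, j = 0, n = 0;

    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            out[n++] = a[i++];
        } else if (a[i] > b[j]) {
            out[n++] = b[j++];
        } else {
            out[n++] = a[i];   /* emit a shared id only once */
            i++;
            j++;
        }
    }
    while (i < na)
        out[n++] = a[i++];
    while (j < nb)
        out[n++] = b[j++];
    return n;
}
```

Keeping the sets sorted lets both operations run in a single linear pass, which is one common design choice for this kind of component; NOT would additionally need the full set D to complement against.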
2. What is a component?
What is a component? The term is ambiguous. A component can be any lifecycle object or part thereof. Usually a code component is a subroutine (function or subprogram), or an object or class, but it could also be many other things, such as macros, header files, subsystems, processes, or patterns. This paper discusses collections of C functions. This simplifies things a little, since this is a kind of reuse familiar to many. Even this kind of reuse, however, can still be complicated.

The 3 C’s model of reuse design [12] says that there are three aspects of a reusable component: the concept, the content, and the context.

The concept corresponds to the abstract functionality of a component such as might be specified in an abstract data type or a formal algorithm specification. The purpose of such abstractions is to focus on the essence of the component, whatever that might be, and ignore other details, usually implementation details. The chapters in the book provide the specifications for the concepts of the components.

The content is the implementation of the component. This involves selection of a programming language and a design. The implementations of the components in C are the content. The transition from concept to content involves moving from the problem, or domain, space to the solution space. The problem space is only concerned with the concepts and operations of the domain in question, in this case information retrieval. The solution space involves the concepts and operations of the implementation environment, in this case the C language.

The context is the environment needed to use the components. Context for code components might be the required machine, operating systems, compiler version, and so on. The code for the IR components was developed for and tested on a Unix system and certain assumptions were made regarding implementation.
Porting the code to DOS, for example, required changes to make filenames have the required length of no more than eight characters.

3. Language
Software reuse is now generally regarded to be a good thing, and most modern languages make some claim for their support of reuse. The C language, for example, was designed for extensive reuse in the sense that it is a small language extensively augmented by reusable function libraries. Newer languages like C++ provide reuse of higher level programming constructs such as objects, classes, and templates, and directly support type polymorphism via function overloading. A summary of the reuse aspects of C++ can be found in [14].

I selected C as the component implementation language because C was and is a widely known and used language in both industry and academia. It is also the programming language I know best, and the one I’ve used to develop industrial software. There are also many good free software engineering support tools for C, including free compilers.

Was C the best choice? This of course opens the door to language lawyering. Let me just say that the components got developed. Some of the components have been rewritten in other languages, sometimes with attribution of the source, sometimes not. Versions of the stemmer, for example, have appeared on the web in Perl and Java.

4. Source or Binary?
The argument is sometimes made that only the executable code for reusable components should be distributed, not the source code. The reasoning here is that distributing source code means that it will be modified, which will break the design abstraction, thus losing much of the reuse benefit. Executable distribution could be done in C by making and distributing archive files containing object code for the functions. This assumes that all of the users will have an environment where the archives can be used.
Distributing only executable code may be a good idea if the user of the components can be assured that someone is available to fix problems and make enhancements as needed. With software such as the IR components there was no readily available maintenance organization, so we distributed the source.

5. Testing and Optimization
The quality assurance of software is important to its reuse. Code that does not meet the software quality standards of an organization will not be reused by the organization. Inside an organization, thorough testing and optimization of components can be justified since the higher costs for these activities can be amortized across the multiple reuses.

Before release, the code was inspected for conformance to programming standards (such as the use of standard headers on code modules), run through lint, and tested to 90% branch coverage. Code portability was checked by moving the code to another environment.
A rule of thumb sometimes used by designers of reusable components is that if a reusable component is more than 25% slower than an equivalent one-use component, it will not be reused. Optimization of code components can, therefore, be important. Optimizations must be done carefully, since increased optimization often decreases code readability and maintainability. Bentley provides a good summary of proper techniques [3]. For the IR components, however, no systematic optimization was done, nor have there been any requests for it from users.

6. Delivery Methods
A key question with a component collection is how to make the components available. The first plan for the IR components was for a disk to be included with the book, but for various logistical reasons that didn’t work. So, plan two was to make the code available via ftp. I put the code for each chapter in a separate directory at a Virginia Tech ftp site (ftp.vt.edu). I originally stored the code for each sub-collection in a separate directory. I started getting requests to put the code in a single file to make downloading easier. I did that by creating a compressed tar file and putting that on the web site. Then I started getting email from people outside the U.S. saying that they couldn’t get into the ftp site. I referred them to the ftp site technician. I think that people usually got the code they wanted, but the problem persisted.

I decided to put the components into software repositories as well. In the 1990s the U.S. government started supporting research and development of reuse repositories. Two of these were Asset and Mountainnet. I submitted the IR components to both libraries. Submission of components to the library required that I fill out a template describing the components. The components were available in these repositories for several years. Government funding for the repositories was stopped in 1998, and the repositories are no longer available.
In 1994, Prentice-Hall licensed the book to Dr. Dobbs who created a CD-ROM containing the IR book, and several other algorithms books [5]. The text of the book, code included, was put into a hypertext format and a search engine was included.

Many other web sites now either reference the IR code ftp site, or keep a copy of the code. There is, however, no mechanism for keeping consistency among web sites offering the code. This is a version control problem (see maintenance section below).

GNU (GNU’s Not Unix) is a collection of software managed by the Free Software Foundation [10]. While examining the holdings of the GNU library, I saw they had nothing on IR. I contacted the Free Software Foundation offering the IR components. After several email exchanges, the following facts emerged.
1. GNU would like to have the code.
2. Some rework of the code would be necessary to put the code into the GNU format.
3. Having the code in GNU would require a commitment to long term maintenance (see discussion of maintenance below).
4. Putting the code in GNU would require that the code meet the GNU standards for free software. This requires, among other things, that the code in the GNU library make no reference to the book, and that the code be freely available for modification by any user. This raised many copyright and other legal issues that have not yet been resolved.

7. Legal Ownership
Legal ownership of components is concerned with three types of legal claims: copyright, patents, and trade secrets [11].

A copyright protects the expression of an idea. Copyright has traditionally been used to protect books and other print material, and music. Current copyright law allows copying of software for backup and archival purposes. Copyright protection is relatively inexpensive and easy to obtain. Copyright claims need not be formally filed, though failure to do so may limit legal claims. There has been some work on assuring versions of software using encryption methods [13].
In this scheme, each component would be assigned a unique identifier. Once published, the component could not be changed even by the author without changing the identifier. This method might also be used to protect copyrighted software components. Collberg and Thomborson describe a method called watermarking for embedding a secret tag in a component that can be used to uniquely identify the component, and therefore to tell if it has been stolen [4].

A patent protects an idea, rather than the expression of the idea. Current patent law restricts others from using the patented idea for seventeen years after the patent is granted. Software, algorithms, and processes are typically patented rather than copyrighted. Obtaining a patent is a long, expensive process, involving an extensive search to determine if the patent is original. Patents are granted by government agencies such as the U.S. patent office. Over 20,000 software patents were issued from 1994-96 [1].

Twenty-nine of ninety-two respondents to a survey on software reuse agreed at least somewhat that they were inhibited from reusing software by legal issues [7]. Legal issues, unfortunately, are likely to grow in importance as reuse crosses organizational boundaries and
moves into the open marketplace. Our experience with GNU and with the user who wanted a legal document giving him the right to use the IR components (see usage section below) reinforces this point.

8. Maintenance and Configuration Management
Perhaps the most difficult problems about the components concern maintenance and configuration management. Maintenance is expensive. Maintenance costs can easily exceed half of the total costs for a software project, and numbers for reusable component collections are probably similar. Code contributors usually do not want to be responsible for maintenance, so component collections like the IR components usually do not have adequate maintenance support. In this section, the main issues of maintenance are briefly reviewed.

Software configuration management is about how to monitor and control changes to software, in this case reusable software assets. Versions of reusable assets must also be coordinated with other software lifecycle items to produce correct and consistent product releases. Configuration management has three major activities:
• Version control. Reusable software components, like any software product, will have versions because of error fixes and enhancements. To build a system using these assets, one will need to know which version to use. Old versions of assets must be recoverable for reference, and so they can be used to make corrections and enhancements. As software assets change, they form successive versions. Version control is the activity of keeping track of these versions. To handle this problem, the IR components were put under change control using SCCS (source code control system). Since the code appears in various places (the ftp site, the CD-ROM, and various other web sites), keeping these versions current and coordinated is a very hard problem. One solution to knowing for sure which version of a component you have is to use encryption techniques on the component [13].
• Change control.
Change control is the procedure for requesting changes, deciding what changes to make, making changes, and recording and verifying changes. Changes to reusable assets in a library should not be made haphazardly, but must be made under a controlled process, though this is often not the case. Change requests for the IR code generally come via email. I put reports of known bugs at the ftp site, but reports of the same bug keep coming in, in part because the code appears in so many places.
• Build control. Keeping track of which versions of work products go together to form a release, and generating derived assets and systems correctly, is called build control. Build control for reuse has two aspects. One is the general specification of which versions of assets to use in a system build. The other aspect is that reusable assets may themselves be composites of other items, so specifications of how to build assets may also be required. Build control for the IR components is handled with Make.

9. Searching and Understanding
Much early work on reuse focused on the building of reuse libraries and methods for indexing components and searching for them. Many researchers began to feel that this aspect of reuse was sufficiently understood, and that too much attention was given to it. The focus of reuse research moved to design of reusable components, domain analysis, and so on.

The internet is probably the main source now consulted by software engineers looking for reusable software outside their own development environment. The main types of indexing used on the web are free text keyword searching, and to a lesser degree enumerated classifications. Searching on the web is made difficult by the size and dynamics of the database, and by the fact that different search engines will find different web pages given the same query. In teaching reuse courses to graduate students at Virginia Tech, I found that they had difficulty finding existing components on the web.
For example, in one course students needed to find stemmers on the web. I had searched myself and knew that several different ones could be found. Typical of their input was the following email I received from the student who eventually received the highest grade in the class.
"I'm still a little confused about what we should produce for the code analysis part of the project. I know we will try to come up with a generic architecture by looking for similarities in the code. I think this will be hard, considering the fact that I have only found code for one algorithm (Porter). Are we supposed to compare different implementations of the same algorithm?"
I found in working with the students that they did not know how to formulate good search queries.
Another problem is helping users understand reusable software components. This is important because if software engineers cannot understand components, they will not be able to reuse them. Current methods for representing reusable components are inadequate. A study
of four common representation methods for reusable software components showed that none of the methods worked very well for helping users understand the components [8]. We are currently doing research on visualization techniques, such as hypertrees, hierarchical trees, and tables, for helping users understand reusable software components [2]. We are using the IR components as a testbed for this research. Our visualizations are grounded in reuse design principles, such as the 3 C's model, and in general principles of information design such as those of Tufte. We use an extension of XML as a modeling language for the components.
10. Usage
Because of the different venues used to distribute the IR components, usage data and user feedback come from various sources. One source is email from users, typically asking where they can find copies of the code, reporting a bug in the code, or occasionally asking if the code can be included in a commercial application, as in the following message received recently.
"What is the status of your stemming code (implementation of the Porter algorithm) located in ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/irbook/stemmer/, is it public domain or copyrighted? The reason that I ask is I want to know if it is okay to use it in a search engine I am creating for my commercial website."
I typically pass these requests on to the editor of the book at Prentice-Hall, who approves them and asks that the source of the component be referenced in the code and documentation of the system in which it will be used. This time I got a followup message,
"The email address I used was from the read.me file that came with the stemming code - the address is frakes@sarvis.cs.vt.edu. My lawyer wants me to have sign something confirming the info below - is your address at Virginia Polytechnic Institute and State University still valid?"
This message points to two problems: how to keep information associated with the code, in this case my email address, up to date, and how to handle legal problems. I sent the message on to the editor at Prentice-Hall.
The proliferation of the code on various websites is also an indicator of usage, as are references to the code in various web pages. Some of the web pages are papers that reference the book or code from the book, some are syllabi for courses, and others contain variants of some of the components written in different languages. Another source of feedback from users can be found in reviews of the book at websites like amazon.com.
11. Current Status and A Proposal
I am currently working with available personnel (i.e., a graduate student) to address some of the problems identified above. Specifically, the student is doing a semester project to:
• place the code, which now has two versions, under change and version control using RCS.
• place the code on at least two ftp servers.
• convert the code to the GNU coding and “free software” standards.
• create a web page for the code that provides information and pointers to the distribution sites.
• create documentation that will allow continuity in the maintenance of the software.
Experience with the IR code collection shows that current methods of development, maintenance, and distribution work, but need improvement. Some recommendations follow.
There is much inefficiency in the development of components that accompany texts. For example, there are many books that provide code that implements the basic data structures and algorithms of computer science, such as sorting, searching, lists, stacks, queues, and so on. A standard way of cataloging these data structures and algorithms could be quite helpful. For example, each unique algorithm or data structure specification might be assigned a product number similar to an ISBN for a book.
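The cataloging proposal can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions, not a scheme this paper defines: the identifier formats (`SPEC-0001`, `IMPL-0001-C`), the class and field names, and the `example.org` repository locations are all hypothetical. It shows only the core relationship: a specification record with its own number, and implementation records that each carry the number of the specification they implement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specification:
    """A cataloged algorithm or data structure specification."""
    spec_id: str                 # hypothetical ISBN-like catalog number
    name: str
    index_terms: tuple = ()      # terms a searcher might use to find it


@dataclass(frozen=True)
class Implementation:
    """One implementation of a cataloged specification."""
    impl_id: str                 # hypothetical number for this implementation
    spec_id: str                 # number of the specification it implements
    language: str
    repository: str              # where the code can be obtained (placeholder URLs below)


class Catalog:
    """In-memory catalog linking implementations to specifications."""

    def __init__(self):
        self._specs = {}
        self._impls = {}

    def add_specification(self, spec):
        self._specs[spec.spec_id] = spec

    def add_implementation(self, impl):
        # Refuse implementations whose specification is not cataloged.
        if impl.spec_id not in self._specs:
            raise KeyError(f"unknown specification: {impl.spec_id}")
        self._impls[impl.impl_id] = impl

    def implementations_of(self, spec_id):
        """Return all implementations of a specification, ordered by number."""
        found = [i for i in self._impls.values() if i.spec_id == spec_id]
        return sorted(found, key=lambda i: i.impl_id)


# Hypothetical entries: one specification with two implementations.
catalog = Catalog()
catalog.add_specification(
    Specification("SPEC-0001", "Porter stemming algorithm", ("stemming", "conflation"))
)
catalog.add_implementation(
    Implementation("IMPL-0001-C", "SPEC-0001", "C", "ftp://example.org/stemmer/")
)
catalog.add_implementation(
    Implementation("IMPL-0001-PY", "SPEC-0001", "Python", "ftp://example.org/stemmer-py/")
)

for impl in catalog.implementations_of("SPEC-0001"):
    print(impl.impl_id, impl.language, impl.repository)
```

Under a scheme of this kind, someone who had found only one implementation of an algorithm could look up its specification number and list every other cataloged implementation, wherever the code is hosted.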
Implementations of these specifications might also be assigned a number that references the number of the implemented specification. Such components might also include information on quality assurance, indexing terms, repository locations, and so on.
This will only happen if it makes legal and financial sense, and the legal and financial issues are far from solved. The case of GNU, for example, shows the complexity of issues related to “free software”. The recent trend towards patenting software algorithms also adds to the difficulty of freely sharing and reusing software. There is also the continuing question of who will provide resources for long term maintenance tasks. These
important problems must be solved if we are to make better use of existing reusable software sources.
References
[1] Aharonian, G. (1995). 1995 US Patent Statistics. http://www.baker.com/grandunificationtheory/archive/199601/19960121.html.
[2] Alonso, O., & Frakes, W. B. (2000). Visualization of Reusable Software Assets. In W. B. Frakes (Ed.), ICSR6 Sixth International Conference on Software Reuse. Vienna, Austria: Springer-Verlag.
[3] Bentley, J. (1982). Writing Efficient Programs. Englewood Cliffs, NJ: Prentice-Hall.
[4] Collberg, C., & Thomborson, C. (1999). Software watermarking: Models and dynamic embeddings. In POPL’99, 26th Annual SIGPLAN–SIGACT Symposium on Principles of Programming Languages (pp. 311–324).
[5] Dr. Dobb's Essential Books on Algorithms and Data Structures. (1999).
[6] Frakes, W., & Baeza-Yates, R. (Eds.). (1992). Information Retrieval: Data Structures and Algorithms. Englewood Cliffs, NJ: Prentice-Hall.
[7] Frakes, W. B., & Fox, C. J. (1995). Sixteen Questions about Software Reuse. CACM, 38(6), 75-87.
[8] Frakes, W., & Pole, T. (1994). An Empirical Study of Representation Methods for Reusable Software Components. IEEE Transactions on Software Engineering, 20(8), 617-630.
[9] Frakes, W., & Terry, C. (1996). Software Reuse and Reusability Models and Metrics. ACM Computing Surveys, 28(2), 415-435.
[10] GNU Coding Standards. (1998). Free Software Foundation, Inc.
[11] Huber, T. (1994). Reducing Business and Legal Risks in Software Reuse Libraries. In ICSR-3. Rio de Janeiro: IEEE CS Press.
[12] Latour, L., Wheeler, T., & Frakes, B. (1991). Descriptive and Prescriptive Aspects of the 3 C's Model: SETA1 Working Group Summary. Ada Letters, XI(3), 9-17.
[13] Moore, J. W. (1994). The Use of Encryption to Ensure the Integrity of Reusable Software Components. In Third International Conference on Software Reuse (pp. 118-125). Rio de Janeiro: IEEE CS Press.
[14] Stroustrup, B. (1996). Language-technical Aspects of Reuse.
In Fourth International Conference on Software Reuse (pp. 11-19). Orlando, FL: IEEE CS Press.
0-7695-0559-7/00 $10.00 © 2000 IEEE