My last article [1] looked at container
indexing, arriving eventually at a class
template, sparse_vector. This had a
similar functional interface to a standard
vector but a representation that was node-based
rather than array-based. Rather than storing all
elements in a sequence consecutively, a sparse_vector
aims to store only elements that are not equal to a given
default filler value.
What use could such a container be? Saving
memory is one obvious application. If you have an
indexable container with a large number of elements
– the majority with the same value – some systems
will benefit from a sparse representation that avoids
holding redundant data instead of a potentially
memory-hungry contiguous representation.
Another use is for simplifying logic, in particular
replacing rambling and clumsy selection statements
with a lookup-based approach. The following
fragment is based on some code I saw recently:
std::string message;
switch(error)
{
case 0:
    message = "No error";
    break;
case 1:
    message = "Initialisation failed";
    break;
case 2:
    message = "Connection failed";
    break;
case 4:
    message = "Connection dropped";
    break;
case 12:
    message = "Unknown format";
    break;
case 27:
    message = "Memory exhaustion";
    break;
case 67:
    message = "Internal error";
    break;
default:
    message = "Unknown error";
    break;
}
Elegant? No. Wretched case-and-paste code seems
to litter many systems, fragmenting program flow into
a list of hardwired exceptional cases instead of… well,
something that flows, clearly and smoothly. The
antidote to most rambling switch statements or
cascading if else if statements is to adopt a table-driven
approach. This is more declarative, and as simple and
practical as it is elegant. The only problem in this case
is that the indices are not contiguous. If we were looking
up the number of days in each month, a simple
const array would suffice. But for a 16-bit error
code whose range is mostly unused, a single array
with 65,536 elements, most of which were initialised
to “Unknown error”, could be considered inept rather
than adept programming.
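For the contiguous case, the table-driven approach can be sketched as follows. This is only an illustration: the names days_per_month and days_in_month, the 1-based month numbering and the decision to ignore leap years are all assumptions, not code from the system described.

```cpp
#include <cstddef>

// Days in each month of a non-leap year, indexed from 0 (January).
const int days_per_month[] =
    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

// Hypothetical helper: month is 1-based; returns 0 for an out-of-range month.
int days_in_month(int month)
{
    const std::size_t count =
        sizeof(days_per_month) / sizeof(days_per_month[0]);
    if(month < 1 || static_cast<std::size_t>(month) > count)
        return 0;
    return days_per_month[month - 1];
}
```

The selection logic disappears into the data: adding or correcting an entry means editing the table, not the control flow.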
Using the map container
This is where the standard map container can help out
– hold the significant messages against their error
numbers. This leads to a simple initialisation that can
be kept separate from the error processing code:
std::map<error_code, std::string> messages;
messages[0] = "No error";
messages[1] = "Initialisation failed";
messages[2] = "Connection failed";
messages[4] = "Connection dropped";
messages[12] = "Unknown format";
messages[27] = "Memory exhaustion";
messages[67] = "Internal error";
The error processing code becomes a little more
direct, but is still hampered a little by the catch-all
for unknown error codes and some moderately
serious syntax:
std::map<error_code, std::string>::iterator found =
    messages.find(error);
std::string message =
    found != messages.end() ? found->second : "Unknown error";

C++ WORKSHOP
Look me up sometime
Kevlin Henney explores the relevance of his sparse_vector class template to
simplifying selection statements using lookup tables, and then takes things
further by encapsulating an efficient flat-file structure for lookups

FACTS AT A GLANCE
• Standard associative containers are node-based, sorted hierarchies of
  linked elements.
• Random access sequences can be used as associative containers when
  they are sorted.
• New container types can be written that offer different properties from
  the standard containers, but are built from and encapsulate them.
• The flat_set and flat_multiset class templates are useful non-standard
  commodity types.

Kevlin Henney is an independent software development consultant
and trainer. He can be reached at www.curbralan.com

58 APPLICATION DEVELOPMENT ADVISOR • www.appdevadvisor.co.uk
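The map lookup and its catch-all can be wrapped in a small function so that the iterator syntax appears only once. The following is a sketch under assumptions: lookup_message is a hypothetical name, and int stands in for error_code, whose definition is not shown in the article.

```cpp
#include <map>
#include <string>

// Hypothetical helper: returns the message held against error, or fallback
// if the table has no entry. int is assumed as the error code type.
std::string lookup_message(
    const std::map<int, std::string> &messages,
    int error,
    const std::string &fallback)
{
    std::map<int, std::string>::const_iterator found =
        messages.find(error);
    // An element of a map is a key-value pair, so the message is the
    // second member of whatever the iterator refers to.
    return found != messages.end() ? found->second : fallback;
}
```

Using find rather than operator[] matters here: indexing a std::map with a missing key would insert an empty string for that key rather than report its absence.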
This is where a sparse_vector can simplify things:
sparse_vector<std::string> messages("Unknown error");
messages.resize(65536);
messages[0] = "No error";
messages[1] = "Initialisation failed";
messages[2] = "Connection failed";
messages[4] = "Connection dropped";
messages[12] = "Unknown format";
messages[27] = "Memory exhaustion";
messages[67] = "Internal error";
Which reduces the original logic to the highly readable:
std::string message = messages[error];
Internally, a sparse_vector is based on a map, but all the special-
case logic is encapsulated within it rather than polluting the
application code. This highlights the fact that encapsulation is not
simply about making data private and is certainly not about
cluttering a class interface with dubious setters and getters. It is more
genuinely concerned with reducing the number of hoops a class
user has to jump through to get a job done.
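To give a rough idea of how a map plus a filler value can hide the special case, here is a much-reduced stand-in. It is not the sparse_vector from the earlier article: the name sparse_lookup, the set and stored members and the make_messages helper are assumptions made purely for illustration, and the vector-like interface (resize, assignment through operator[]) is left out.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Simplified stand-in for sparse_vector (an assumption, not the original
// class): only values that differ from the filler are held in the map.
template<typename value_type>
class sparse_lookup
{
public:
    explicit sparse_lookup(const value_type &filler)
      : filler(filler)
    {
    }
    // Store a value, or drop the entry if the value equals the filler.
    void set(std::size_t index, const value_type &value)
    {
        if(value == filler)
            elements.erase(index);
        else
            elements[index] = value;
    }
    // Read-only indexing: absent entries read as the filler value.
    const value_type &operator[](std::size_t index) const
    {
        typename std::map<std::size_t, value_type>::const_iterator found =
            elements.find(index);
        return found != elements.end() ? found->second : filler;
    }
    // Number of entries actually held in memory.
    std::size_t stored() const
    {
        return elements.size();
    }
private:
    value_type filler;
    std::map<std::size_t, value_type> elements;
};

// Usage in the style of the error table above.
sparse_lookup<std::string> make_messages()
{
    sparse_lookup<std::string> messages("Unknown error");
    messages.set(0, "No error");
    messages.set(12, "Unknown format");
    return messages;
}
```

The catch-all test lives in one place, inside operator[], so calling code can index without checking for absence.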
Sorted
What about the other way around? How about using a vector for
content-based rather than position-based lookup? Why might such
a need arise? Consider a large set of strings that is read-mostly. Insertions
and removals are not common or are normally grouped together,
and lookup is the most frequent action. For instance, a simple spelling
dictionary is a set that requires fast lookup but does not require
constant updating. This sounds like a job for std::set, which
holds its elements in a sorted tree of individually allocated nodes.
Both lookup and insertion take logarithmic time with respect to
the number of elements in the set. For a dictionary, the majority
of insertions take place during a single phase of the program:
std::ifstream in("words.txt");
std::istream_iterator<std::string> begin(in), end;
std::set<std::string> words(begin, end);
...
std::string word;
...
if(!words.count(word))
    ...
This model of usage means that in exchange for a lot of
dynamic memory activity, most of the benefits of a hierarchically
linked structure are never exercised. A common alternative is to
flatten out the data structure and operate on it manually: populate
a random access sequence, such as std::vector or std::deque, with
elements, and use standard algorithmic function templates to sort
and search it:
std::ifstream in("words.txt");
std::istream_iterator<std::string> begin(in), end;
std::vector<std::string> words(begin, end);
std::sort(words.begin(), words.end());
words.erase(
    std::unique(words.begin(), words.end()),
    words.end());
...
std::string word;
...
if(!std::binary_search(words.begin(), words.end(), word))
    ...
This is a little tedious because it exposes the workings of the data
structure to the whole program. The absence of encapsulation means
that logic will inevitably be duplicated and there is no simple way
to safeguard the invariant – to be a sorted sequence of unique entries
– of the data structure. Sorting is easy enough, but eliminating
duplicates requires the user to remember that the iterator range
is merely reorganised and nothing is actually removed from the
container until the container is told explicitly (this is the same idiom
that is used with std::remove [2]).
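The two halves of the idiom, and the lookup that depends on them, can be pulled into small functions as a first step towards encapsulation. This is a sketch only; the function names sort_and_deduplicate and contains are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Restores the invariant: sorted, with no duplicate entries.
void sort_and_deduplicate(std::vector<std::string> &words)
{
    std::sort(words.begin(), words.end());
    // unique only moves the kept elements to the front and returns the new
    // logical end; erase is what actually shrinks the container.
    words.erase(
        std::unique(words.begin(), words.end()),
        words.end());
}

// Lookup gives meaningful answers only while the invariant above holds.
bool contains(const std::vector<std::string> &words, const std::string &word)
{
    return std::binary_search(words.begin(), words.end(), word);
}
```

Nothing stops other code from appending to the vector and silently breaking the ordering, which is the weakness a proper class would close off.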
Flatland
A flattened data structure is useful, but it is crying out to be encapsulated.
A flat_set class template that supports the same function
interface as std::set, but with slightly different space and speed trade-offs,
is the design goal. The easiest way to score this is first to solve
a slightly simpler problem – that of flat_multiset. Like std::multiset
it is sorted, but may contain duplicate elements.
As well as offering a useful facility, the implementation of
flat_multiset is a useful exercise in working with the standard
algorithms and the received wisdom on their effective use [3]. I
won’t describe or define all the operations for flat_multiset or flat_set
– you can find guidance on what they should be and what is required
of them from the standard or any good STL reference [4, 5].
Here is a quick sketch of the outside and inside of flat_multiset:
template<
    typename value_type,
    typename key_compare = std::less<value_type> >
class flat_multiset
{
    ...
private:
    ...
    std::vector<value_type> elements;
    key_compare comparer;
};
We know the internal representation is going to be based on a
random access sequence and std::vector fulfils this need. The
standard associative containers all take a defaulted parameter
that defines their internal ordering. The default function object
type is std::less, which gives an ascending order, and an instance
is stored for use by the member functions.
Simple construction and assignment require no extra effort to
preserve a sorted order:
template<...>
class flat_multiset
{
public:
    explicit flat_multiset(
        const key_compare &comparer = key_compare())
      : comparer(comparer)
    {
    }
    flat_multiset(
        const flat_multiset &other,
        const key_compare &comparer = key_compare())
      : elements(other.elements), comparer(comparer)
    {
    }
    flat_multiset &operator=(const flat_multiset &rhs)
    {
        elements = rhs.elements;
        comparer = rhs.comparer;
        return *this;
    }
    ...
};
MARCH 2002 61
However, constructing a new flat_multiset from an iterator
range, such as input iterators from a file or inserting a range of elements
into an existing one, requires explicit sorting:
template<...>
class flat_multiset
{
public:
    ...
    template<typename input_iterator>
    flat_multiset(
        input_iterator begin, input_iterator end,
        const key_compare &comparer = key_compare())
      : elements(begin, end), comparer(comparer)
    {
        sort();
    }
    template<typename input_iterator>
    void insert(input_iterator begin, input_iterator end)
    {
        elements.insert(elements.end(), begin, end);
        sort();
    }
    ...
private:
    void sort()
    {
        std::sort(
            elements.begin(), elements.end(), comparer);
    }
    ...
};
The private sort member function has been factored out to make
the behaviour clearer, wrapping up the use of the standard sort and
the comparison object. In the case of the insert function, the first
step is to append the new elements following the end of the
existing sequence and then re-sort the whole lot. There are other
approaches that can be more efficient, but this one is the briefest.
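One such alternative, shown here as a sketch and not as the article's implementation, is to sort only the newly appended tail and then merge it with the already sorted front using std::inplace_merge. The free function insert_sorted_range is a hypothetical name, and it assumes the vector is already sorted by the same comparer.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>   // std::less, for callers that want the default order
#include <vector>

// Hypothetical free-function form of a range insert into a sorted vector.
// Assumes elements is already sorted according to comparer.
template<typename value_type, typename input_iterator, typename key_compare>
void insert_sorted_range(
    std::vector<value_type> &elements,
    input_iterator begin, input_iterator end,
    key_compare comparer)
{
    const std::size_t old_size = elements.size();
    elements.insert(elements.end(), begin, end);
    // Sort only the appended tail, then merge the two sorted runs.
    std::sort(elements.begin() + old_size, elements.end(), comparer);
    std::inplace_merge(
        elements.begin(), elements.begin() + old_size, elements.end(),
        comparer);
}
```

When a few elements are added to a large container this avoids re-sorting elements that are already in order, at the cost of a slightly longer function.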
Many of the smaller operations are trivial to implement and just
forward to the corresponding member of the contained std::vector:
template<...>
class flat_multiset
{
public:
    ...
    bool empty() const
    {
        return elements.empty();
    }
    void clear()
    {
        elements.clear();
    }
    ...
};
This simple delegation also applies to many of the typedefs that
the flat_multiset should export. Conveniently enough, this also holds
true for iteration. Many programmers find providing their own
iterator types a little daunting, although when taken one step at a time,
the task often turns out to be comparatively simple [6]. It doesn’t get
much simpler than for flat_multiset:
template<...>
class flat_multiset
{
public:
    ...
    // typename is required because the iterator type depends on value_type
    typedef typename std::vector<value_type>::const_iterator iterator;
    typedef iterator const_iterator;
    iterator begin() const
    {
        return elements.begin();
    }
    iterator end() const
    {
        return elements.end();
    }
    ...
};
The contained vector already has iterator types that have all the
right properties; flat_multiset need only reexport them. There is
one caveat, however: iterator access to the set must be const-
only, which is why both iterator and const_iterator are equivalent
to a vector’s const_iterator. A non-const iterator would allow
elements in a sequence to be modified, which could break the
guarantee of a sorted order.
Searching for a value is simple enough when you realise the most
important member function is equal_range, which returns a pair
of iterators, delimiting the lower and upper bounds of where the
value occurs or would occur in the sorted range:
template<...>
class flat_multiset
{
public:
    ...
    std::pair<iterator, iterator> equal_range(
        const value_type &to_find) const
    {
        return std::equal_range(
            elements.begin(), elements.end(),
            to_find, comparer);
    }
    iterator find(const value_type &to_find) const
    {
        std::pair<iterator, iterator> found =
            equal_range(to_find);
        return found.first != found.second ?
            found.first : elements.end();
    }
    std::size_t count(const value_type &to_find) const
    {
        std::pair<iterator, iterator> found =
            equal_range(to_find);
        return found.second - found.first;
    }
    ...
};
Iterator (pointer) arithmetic makes counting the number of times
a value occurs quite trivial.
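The behaviour of std::equal_range is easiest to see outside the class. The following standalone sketch, with the hypothetical name occurrences and int elements chosen for brevity, counts a value in a vector that is assumed to be sorted in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Counts occurrences of value in a vector already sorted in ascending order.
std::size_t occurrences(const std::vector<int> &sorted, int value)
{
    typedef std::vector<int>::const_iterator iterator;
    std::pair<iterator, iterator> found =
        std::equal_range(sorted.begin(), sorted.end(), value);
    // first is the lower bound and second the upper bound; an absent value
    // gives an empty range, so the difference is zero.
    return static_cast<std::size_t>(found.second - found.first);
}
```

For an absent value both iterators refer to the position at which it would be inserted, which is why find can treat an empty range as "not found".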
So far, so good. But now things become a little hairier. We need
to consider how to insert and erase a single value efficiently. A simplistic
approach to inserting a single element would be to use push_back
to append it to the range and then re-sort the whole lot. A more
efficient approach is to find the location that the element should
be placed in and shuffle up the elements that follow. Erasing a value
presents a minor challenge because it requires the use of modifiable
iterators to identify and remove the subrange in which a value
occurs. But these are only minor challenges:
template<...>
class flat_multiset
{
public:
    ...
    iterator insert(const value_type &new_value)
    {
        return elements.insert(
            std::upper_bound(
                elements.begin(), elements.end(),
                new_value, comparer),
            new_value);
    }
    std::size_t erase(const value_type &to_erase)
    {
        std::pair<muterator, muterator> found =
            std::equal_range(
                elements.begin(), elements.end(),
                to_erase, comparer);
        std::size_t result = found.second - found.first;
        elements.erase(found.first, found.second);
        return result;
    }
    ...
private:
    // typename is required because the iterator type depends on value_type
    typedef typename std::vector<value_type>::iterator muterator;
    ...
};
The muterator (mutable iterator) typedef is just there to simplify
the code in erase and other member functions that require such
a capability.
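As a standalone illustration of the single-element insertion, the same upper_bound technique can be written as a free function. The name insert_sorted is hypothetical, and the vector is assumed to be sorted by comparer before the call.

```cpp
#include <algorithm>
#include <functional>   // std::less, for callers that want the default order
#include <vector>

// Inserts new_value into a vector assumed to be sorted by comparer and
// returns an iterator to the inserted element.
template<typename value_type, typename key_compare>
typename std::vector<value_type>::iterator insert_sorted(
    std::vector<value_type> &elements,
    const value_type &new_value,
    key_compare comparer)
{
    // upper_bound places the new value after any equivalent elements.
    return elements.insert(
        std::upper_bound(
            elements.begin(), elements.end(), new_value, comparer),
        new_value);
}
```

Finding the position is logarithmic, but the insertion itself is linear because the vector has to shift the following elements, which is the trade-off accepted for read-mostly use.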
Regardless of language, all such algorithmic work is twiddly and
the major challenge sometimes seems to be to read LISP! For programmers
without any experience in functional programming, the
bracketing density appears a little high when using the STL.
That’s the hard work done. The standard tells you which functions
are expected of associative containers, the standard library
provides the relevant underlying mechanism and all the resulting
member functions are short, although sometimes a little fiddly.
So, other than the names, what is the minimum change needed
to turn a flat_multiset into a flat_set? Of the functions shown, only
two require changes – the private sort function, which must now
also eliminate duplicates, and the single value insert function, which
must now ensure that duplicates are not introduced.
The sort function sounds innocent enough. Looking back
through the code in the article, you will find a use of unique that
looks like it might solve the problem, leading to the following:
template<...>
class flat_set
{
    ...
private:
    void sort()
    {
        std::sort(
            elements.begin(), elements.end(), comparer);
        elements.erase(
            std::unique(elements.begin(), elements.end()),
            elements.end());
    }
    ...
};
This solution is simple, tempting and, alas, wrong. The out-of-the-box
version of unique uses straightforward equality to cut down
the range. This may or may not match up with the templated sorting
criteria that have been used with the container. In the case of
flat_set<std::string> it will do the right thing because the < and ==
operators will be used and are consistent with one another. However,
if a case-insensitive ordering has been provided as a parameter
to flat_set, equality checking will not eliminate duplicates with the
same spelling but different case.
It transpires that associative containers with unique elements do
not use equality to discard duplicates. They use equivalence, a subtly
different concept, which sometimes gives the same result and sometimes
doesn’t. If we consider the ordering of an associative
container to be less-than, then two elements are equivalent when
neither element is less than the other. To make sense of this
construct we can define a helper function object type to perform
this logic for us, using it with the overloaded form of unique that
permits you to pass your own filter function:
template<...>
class flat_set
{
    ...
private:
    class equivalent
    {
    public:
        explicit equivalent(
            const key_compare &comparer)
          : comparer(comparer)
        {
        }
        bool operator()(
            const value_type &lhs,
            const value_type &rhs) const
        {
            return !comparer(lhs, rhs) &&
                   !comparer(rhs, lhs);
        }
    private:
        key_compare comparer;
    };
    void sort()
    {
        std::sort(
            elements.begin(), elements.end(), comparer);
        elements.erase(
            std::unique(
                elements.begin(), elements.end(),
                equivalent(comparer)),
            elements.end());
    }
    ...
};
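The difference between equality and equivalence shows up as soon as the ordering ignores something that == does not. The sketch below uses a case-insensitive ordering as its example; ignore_case_less, ignore_case_equivalent and the two size_after functions are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumed example ordering: compares strings while ignoring case.
struct ignore_case_less
{
    bool operator()(const std::string &lhs, const std::string &rhs) const
    {
        return std::lexicographical_compare(
            lhs.begin(), lhs.end(), rhs.begin(), rhs.end(), less_char);
    }
    static bool less_char(char lhs, char rhs)
    {
        return std::tolower(static_cast<unsigned char>(lhs)) <
               std::tolower(static_cast<unsigned char>(rhs));
    }
};

// Equivalence derived from the ordering: neither is less than the other.
struct ignore_case_equivalent
{
    bool operator()(const std::string &lhs, const std::string &rhs) const
    {
        ignore_case_less before;
        return !before(lhs, rhs) && !before(rhs, lhs);
    }
};

// Sorts ignoring case, then removes duplicates by plain equality.
std::size_t size_after_equality_unique(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end(), ignore_case_less());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words.size();
}

// Sorts ignoring case, then removes duplicates by equivalence.
std::size_t size_after_equivalence_unique(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end(), ignore_case_less());
    words.erase(
        std::unique(words.begin(), words.end(), ignore_case_equivalent()),
        words.end());
    return words.size();
}
```

Given "Apple", "apple" and "APPLE", the equality version keeps all three because no two spellings compare equal, whereas the equivalence version keeps one.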
The last piece of the jigsaw is insertion of a single element. This
cannot be the same as the one for flat_multiset, or even a slightly
tweaked version of it, because its signature differs. The return value
for single-element insertion into a unique associative container must
hold both an indication of whether or not a new value was inserted – a new
element will not be inserted if the value already exists – and an iterator to
either the newly inserted value or the existing value:
template<...>
class flat_set
{
public:
    std::pair<iterator, bool> insert(
        const value_type &new_value)
    {
        std::pair<muterator, muterator> found =
            std::equal_range(
                elements.begin(), elements.end(),
                new_value, comparer);
        bool exists = found.first != found.second;
        iterator result = exists ?
            found.first :
            elements.insert(found.first, new_value);
        return std::make_pair(result, !exists);
    }
    ...
};
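The pair-returning insertion can also be tried out as a free function. Note that this sketch swaps in a variation: it uses std::lower_bound followed by one extra comparison in place of std::equal_range. The name insert_unique is hypothetical, and the vector is assumed to be sorted by comparer and free of equivalent duplicates.

```cpp
#include <algorithm>
#include <functional>   // std::less, for callers that want the default order
#include <utility>
#include <vector>

// Inserts new_value unless an equivalent element is already present.
// Returns the position of the new or existing element, plus a flag that is
// true only if an insertion took place.
template<typename value_type, typename key_compare>
std::pair<typename std::vector<value_type>::iterator, bool> insert_unique(
    std::vector<value_type> &elements,
    const value_type &new_value,
    key_compare comparer)
{
    typedef typename std::vector<value_type>::iterator iterator;
    // lower_bound finds the first element not ordered before new_value.
    iterator where = std::lower_bound(
        elements.begin(), elements.end(), new_value, comparer);
    // If new_value is not ordered before that element either, the two are
    // equivalent and nothing is inserted.
    if(where != elements.end() && !comparer(new_value, *where))
        return std::make_pair(where, false);
    return std::make_pair(elements.insert(where, new_value), true);
}
```

The test for an existing element is again expressed through the comparer alone, so it agrees with the equivalence used when sorting and removing duplicates.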
In creating a family of types, even one as small as flat_set and
flat_multiset, there are design considerations in addition to performance
and conformance to STL expectations. Copy and paste would be
an inappropriate way to share the common logic between the two
types, as would public inheritance. Private inheritance of flat_multiset
is a possibility but not a particularly fetching one. Factoring out
a common but private base to both flat_set and flat_multiset
would be a cleaner and more decoupled approach. Alternatively,
delegation can be used, either one to another or, better still,
both forwarding to a third implementation type.
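The last option, both types forwarding to a third implementation type, might be arranged along the following lines. This is a heavily reduced sketch under assumptions: flat_storage, insert_any, insert_if_absent and the _sketch class names are all hypothetical, and only insert and size are shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical shared implementation type: owns the sorted vector.
template<typename value_type, typename key_compare>
class flat_storage
{
public:
    explicit flat_storage(const key_compare &comparer)
      : comparer(comparer)
    {
    }
    // Inserts unconditionally (multiset behaviour).
    void insert_any(const value_type &value)
    {
        elements.insert(
            std::upper_bound(
                elements.begin(), elements.end(), value, comparer),
            value);
    }
    // Inserts only if no equivalent element exists (set behaviour).
    bool insert_if_absent(const value_type &value)
    {
        if(std::binary_search(
               elements.begin(), elements.end(), value, comparer))
            return false;
        insert_any(value);
        return true;
    }
    std::size_t size() const
    {
        return elements.size();
    }
private:
    std::vector<value_type> elements;
    key_compare comparer;
};

template<typename value_type, typename key_compare = std::less<value_type> >
class flat_multiset_sketch
{
public:
    explicit flat_multiset_sketch(const key_compare &comparer = key_compare())
      : storage(comparer)
    {
    }
    void insert(const value_type &value)
    {
        storage.insert_any(value);
    }
    std::size_t size() const
    {
        return storage.size();
    }
private:
    flat_storage<value_type, key_compare> storage;
};

template<typename value_type, typename key_compare = std::less<value_type> >
class flat_set_sketch
{
public:
    explicit flat_set_sketch(const key_compare &comparer = key_compare())
      : storage(comparer)
    {
    }
    bool insert(const value_type &value)
    {
        return storage.insert_if_absent(value);
    }
    std::size_t size() const
    {
        return storage.size();
    }
private:
    flat_storage<value_type, key_compare> storage;
};
```

Neither public type knows about the other, and the sorted-vector logic exists in one place only.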
The STL is an open framework based on requirements and not simply
a fixed collection of code – some containers and algorithms, plus
hangers on. It is an extensible model that allows you to customise and
introduce your own abstractions, which may be, as in the case of flat_set
and flat_multiset, hybrids of existing concepts and code – the interface
of existing types with the implementation of another type. The
flat associative containers have the benefit of space economy and, for
read-mostly use, speed. Balancing these trade-offs is the most common
reason for defining your own commodity types.
References
1. Kevlin Henney, “Bound and checked”, Application Development Advisor, 6(1), www.appdevadvisor.co.uk
2. Kevlin Henney, “If I had a hammer”, Application Development Advisor, 5(6), www.appdevadvisor.co.uk
3. Scott Meyers, Effective STL, Addison-Wesley, 2001
4. Matthew Austern, Generic Programming and the STL, Addison-Wesley, 1999
5. Nicolai Josuttis, The C++ Standard Library, Addison-Wesley, 1999
6. Kevlin Henney, “Promoting polymorphism”, Application Development Advisor, 5(8), www.appdevadvisor.co.uk
Aarisha Shaikh
 
Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech
 
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
87tomato
 
Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
simran hot girls
 
SAP implementation steps PDF - Zyple Software
SAP implementation steps PDF - Zyple SoftwareSAP implementation steps PDF - Zyple Software
SAP implementation steps PDF - Zyple Software
Zyple Software
 
welcome to presentation on Google Apps
welcome to   presentation on Google Appswelcome to   presentation on Google Apps
welcome to presentation on Google Apps
AsifKarimJim
 
Tour and travel website management in odoo,
Tour and travel website management in odoo,Tour and travel website management in odoo,
Tour and travel website management in odoo,
Axis Technolabs
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Fantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdfFantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdf
6m9p7qnjj8
 

Recently uploaded (20)

A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdfA Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
A Step-by-Step Guide to Selecting the Right Automated Software Testing Tools.pdf
 
ERP Software Solutions Provider in Coimbatore
ERP Software Solutions Provider in CoimbatoreERP Software Solutions Provider in Coimbatore
ERP Software Solutions Provider in Coimbatore
 
BATber53 AWS Modernize your applications with purpose-built AWS databases
BATber53 AWS Modernize your applications with purpose-built AWS databasesBATber53 AWS Modernize your applications with purpose-built AWS databases
BATber53 AWS Modernize your applications with purpose-built AWS databases
 
GT degree offer diploma Transcript
GT degree offer diploma TranscriptGT degree offer diploma Transcript
GT degree offer diploma Transcript
 
Amazon Music Spelling Correction - SIGIR 2024
Amazon Music Spelling Correction - SIGIR 2024Amazon Music Spelling Correction - SIGIR 2024
Amazon Music Spelling Correction - SIGIR 2024
 
To Avoid Mistakes When Using Online Attendance Sheets
To Avoid Mistakes When Using Online Attendance SheetsTo Avoid Mistakes When Using Online Attendance Sheets
To Avoid Mistakes When Using Online Attendance Sheets
 
UMiami degree offer diploma Transcript
UMiami degree offer diploma TranscriptUMiami degree offer diploma Transcript
UMiami degree offer diploma Transcript
 
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Mysore 000XX00000 Provide Best And Top Girl Service And No1 in City
 
Hotel Management Software Development Company
Hotel Management Software Development CompanyHotel Management Software Development Company
Hotel Management Software Development Company
 
bangalore Girls call 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
bangalore Girls call  👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Deliverybangalore Girls call  👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
bangalore Girls call 👀 XXXXXXXXXXX 👀 Rs.9.5 K Cash Payment With Room Delivery
 
SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdfSEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
SEO Cheat Sheet with Learning Resources by Balti Bloggers.pdf
 
Empowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - GrawlixEmpowering Businesses with Intelligent Software Solutions - Grawlix
Empowering Businesses with Intelligent Software Solutions - Grawlix
 
Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.Mobile App Development Company in Noida - Drona Infotech.
Mobile App Development Company in Noida - Drona Infotech.
 
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
Verified Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeli...
 
Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
Girls Call Jogeshwari 9967584737 Provide Best And Top Girl Service And No1 in...
 
SAP implementation steps PDF - Zyple Software
SAP implementation steps PDF - Zyple SoftwareSAP implementation steps PDF - Zyple Software
SAP implementation steps PDF - Zyple Software
 
welcome to presentation on Google Apps
welcome to   presentation on Google Appswelcome to   presentation on Google Apps
welcome to presentation on Google Apps
 
Tour and travel website management in odoo,
Tour and travel website management in odoo,Tour and travel website management in odoo,
Tour and travel website management in odoo,
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
 
Fantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdfFantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdf
 

Look Me Up Sometime

C++ WORKSHOP
Application Development Advisor, www.appdevadvisor.co.uk

Kevlin Henney explores the relevance of his sparse_vector class template to simplifying selection statements using lookup tables, and then takes things further by encapsulating an efficient flat-file structure for lookups.

Kevlin Henney is an independent software development consultant and trainer. He can be reached at www.curbralan.com

FACTS AT A GLANCE
  • Standard associative containers are node-based, sorted hierarchies of linked elements.
  • Random access sequences can be used as associative containers when they are sorted.
  • New container types can be written that offer different properties from the standard containers, but are built from and encapsulate them.
  • The flat_set and flat_multiset class templates are useful non-standard commodity types.

My last article [1] looked at container indexing, arriving eventually at a class template, sparse_vector. This had a similar functional interface to a standard vector but a representation that was node-based rather than array-based. Rather than storing all elements in a sequence consecutively, a sparse_vector aims to store only elements that are not equal to a given default filler value.

What use could such a container be? Saving memory is one obvious application. If you have an indexable container with a large number of elements – the majority with the same value – some systems will benefit from a sparse representation that avoids holding redundant data instead of a potentially memory-hungry contiguous representation.

Another use is for simplifying logic, in particular the replacing of rambling and clumsy selection statements with a lookup-based approach. The following fragment is based on some code I saw recently:

    std::string message;
    switch(error)
    {
    case 0:
        message = "No error";
        break;
    case 1:
        message = "Initialisation failed";
        break;
    case 2:
        message = "Connection failed";
        break;
    case 4:
        message = "Connection dropped";
        break;
    case 12:
        message = "Unknown format";
        break;
    case 27:
        message = "Memory exhaustion";
        break;
    case 67:
        message = "Internal error";
        break;
    default:
        message = "Unknown error";
        break;
    }

Elegant? No. Wretched case-and-paste code seems to litter many systems, fragmenting program flow into a list of hardwired exceptional cases instead of… well, something that flows, clearly and smoothly. The antidote to most rambling switch statements or cascading if else if statements is to adopt a table-driven approach. This is more declarative, and as simple and practical as it is elegant. The only problem in this case is that the indices are not contiguous. If we were looking up the number of days in each month, a simple const array would suffice. But for a 16-bit error code whose bandwidth is mostly unused, a single array with 65,536 elements, most of which were initialised to “Unknown error”, could be considered inept rather than adept programming.

Using the map container

This is where the standard map container can help out – hold the significant messages against their error numbers. This leads to a simple initialisation that can be kept separate from the error processing code:

    std::map<error_code, std::string> messages;
    messages[0] = "No error";
    messages[1] = "Initialisation failed";
    messages[2] = "Connection failed";
    messages[4] = "Connection dropped";
    messages[12] = "Unknown format";
    messages[27] = "Memory exhaustion";
    messages[67] = "Internal error";

The error processing code becomes a little more direct, but is still hampered a little by the catch-all for unknown error codes and some moderately serious syntax:

    std::map<error_code, std::string>::iterator found =
        messages.find(error);
    // a map iterator refers to a key-value pair, so the message is found->second
    std::string message =
        found != messages.end() ? found->second : "Unknown error";

This is where a sparse_vector can simplify things:

    sparse_vector<std::string> messages("Unknown error");
    messages.resize(65536);
    messages[0] = "No error";
    messages[1] = "Initialisation failed";
    messages[2] = "Connection failed";
    messages[4] = "Connection dropped";
    messages[12] = "Unknown format";
    messages[27] = "Memory exhaustion";
    messages[67] = "Internal error";

Which reduces the original logic to the highly readable:

    std::string message = messages[error];

Internally, a sparse_vector is based on a map, but all the special-case logic is encapsulated within it rather than polluting the application code. This highlights the fact that encapsulation is not simply about making data private and is certainly not about cluttering a class interface with dubious setters and getters. It is more genuinely concerned with reducing the number of hoops a class user has to jump through to get a job done.

Sorted

What about the other way around? How about using a vector for content-based rather than position-based lookup? Why might such a need arise? Consider a large set of strings that is read-mostly. Insertions and removals are not common or are normally grouped together, and lookup is the most frequent action. For instance, a simple spelling dictionary is a set that requires fast lookup but does not require constant updating.

This sounds like a job for std::set, which holds its elements in a sorted tree of individually allocated nodes. Both lookup and insertion take logarithmic time with respect to the number of elements in the set. For a dictionary, the majority of insertions take place during a single phase of the program:

    std::ifstream in("words.txt");
    std::istream_iterator<std::string> begin(in), end;
    std::set<std::string> words(begin, end);
    ...
    std::string word;
    ...
    if(!words.count(word))
        ...
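As a rough, self-contained sketch of the std::set dictionary above, the fragment below reads whitespace-separated words from any input stream rather than from words.txt. The helper names load_words and is_known are assumptions made for illustration and do not appear in the article.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: read whitespace-separated words from a stream
// into a set, which sorts them and discards repeats as it goes.
std::set<std::string> load_words(std::istream &in)
{
    std::istream_iterator<std::string> begin(in), end;
    return std::set<std::string>(begin, end);
}

// Hypothetical helper: logarithmic-time membership test.
bool is_known(const std::set<std::string> &words, const std::string &word)
{
    // a set holds each value at most once, so count is either 0 or 1
    assert(words.count(word) <= 1);
    return words.count(word) != 0;
}
```

Feeding load_words a std::istringstream that holds "pear apple fig apple" yields a set of three words, because the repeated word is dropped on insertion.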
This model of usage means that in exchange for a lot of dynamic memory activity, most of the benefits of a hierarchically linked structure are never exercised. A common alternative is to flatten out the data structure and operate on it manually: populate a random access sequence, such as std::vector or std::deque, with elements, and use standard algorithmic function templates to sort and search it:

    std::ifstream in("words.txt");
    std::istream_iterator<std::string> begin(in), end;
    std::vector<std::string> words(begin, end);
    std::sort(words.begin(), words.end());
    words.erase(
        std::unique(words.begin(), words.end()),
        words.end());
    ...
    std::string word;
    ...
    if(!std::binary_search(words.begin(), words.end(), word))
        ...

This is a little tedious because it exposes the workings of the data structure to the whole program. The absence of encapsulation means that logic will inevitably be duplicated and there is no simple way to safeguard the invariant of the data structure, which is to be a sorted sequence of unique entries. Sorting is easy enough, but eliminating duplicates requires the user to remember that the iterator range is merely reorganised and nothing is actually removed from the container until the container is told explicitly (this is the same idiom that is used with std::remove [2]).

Flatland

A flattened data structure is useful, but it is crying out to be encapsulated. A flat_set class template that supports the same function interface as std::set, but with slightly different space and speed trade-offs, is the design goal. The easiest way to score this is first to solve a slightly simpler problem – that of flat_multiset. Like std::multiset it is sorted, but may contain duplicate elements. As well as offering a useful facility, the implementation of flat_multiset is a useful exercise in working with the standard algorithms and the received wisdom on their effective use [3].
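Before any encapsulation, the hand-maintained invariant described above (sorted, no duplicates) can be seen in isolation. This is only a sketch: the function names sort_and_deduplicate and contains are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: establish the "sorted and unique" invariant by hand.
void sort_and_deduplicate(std::vector<std::string> &words)
{
    std::sort(words.begin(), words.end());
    // std::unique only shuffles the kept elements to the front and returns
    // the new logical end; the vector's size is unchanged until erase is called
    std::vector<std::string>::iterator new_end =
        std::unique(words.begin(), words.end());
    words.erase(new_end, words.end());
    // postcondition: no two adjacent elements are equal
    assert(std::adjacent_find(words.begin(), words.end()) == words.end());
}

// Hypothetical helper: binary search is only valid on a sorted range.
bool contains(
    const std::vector<std::string> &sorted_words, const std::string &word)
{
    return std::binary_search(sorted_words.begin(), sorted_words.end(), word);
}
```

Every caller that modifies the vector has to remember to restore the invariant before calling contains, which is the duplication of logic that a flat_set would remove.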
I won’t describe or define all the operations for flat_multiset or flat_set – you can find guidance on what they should be and what is required of them from the standard or any good STL reference [4, 5]. Here is a quick sketch of the outside and inside of flat_multiset:

    template<
        typename value_type,
        typename key_compare = std::less<value_type> >
    class flat_multiset
    {
        ...
    private:
        ...
        std::vector<value_type> elements;
        key_compare comparer;
    };

We know the internal representation is going to be based on a random access sequence and std::vector fulfils this need. The standard associative containers all take a defaulted parameter that defines their internal ordering. The default function object type is std::less, which gives an ascending order, and an instance is stored for use by the member functions.

Simple construction and assignment require no extra effort to preserve a sorted order:

    template<...>
    class flat_multiset
    {
    public:
        explicit flat_multiset(
            const key_compare &comparer = key_compare())
          : comparer(comparer)
        {
        }
        flat_multiset(
            const flat_multiset &other,
            const key_compare &comparer = key_compare())
          : elements(other.elements), comparer(comparer)
        {
        }
        flat_multiset &operator=(const flat_multiset &rhs)
        {
            elements = rhs.elements;
            comparer = rhs.comparer;
            return *this;
        }
        ...
    };

Application Development Advisor, March 2002

However, constructing a new flat_multiset from an iterator range, such as input iterators from a file, or inserting a range of elements into an existing one, requires explicit sorting:

    template<...>
    class flat_multiset
    {
    public:
        ...
        template<typename input_iterator>
        flat_multiset(
            input_iterator begin, input_iterator end,
            const key_compare &comparer = key_compare())
          : elements(begin, end), comparer(comparer)
        {
            sort();
        }
        template<typename input_iterator>
        void insert(input_iterator begin, input_iterator end)
        {
            elements.insert(elements.end(), begin, end);
            sort();
        }
        ...
    private:
        void sort()
        {
            std::sort(
                elements.begin(), elements.end(), comparer);
        }
        ...
    };

The private sort member function has been factored out to make the behaviour clearer, wrapping up the use of the standard sort and the comparison object. In the case of the insert function, the first step is to append the new elements following the end of the existing sequence and then re-sort the whole lot. There are other approaches that can be more efficient, but this one is the briefest.

Many of the smaller operations are trivial to implement and just forward to the corresponding member of the contained std::vector:

    template<...>
    class flat_multiset
    {
    public:
        ...
        bool empty() const
        {
            return elements.empty();
        }
        void clear()
        {
            elements.clear();
        }
        ...
    };

This simple delegation also applies to many of the typedefs that the flat_multiset should export. Conveniently enough, this also holds true for iteration. Many programmers find providing their own iterator types a little daunting, although when taken one step at a time, the task often turns out to be comparatively simple [6]. It doesn’t get much simpler than for flat_multiset:

    template<...>
    class flat_multiset
    {
    public:
        ...
        // typename is required because the vector type depends on value_type
        typedef typename std::vector<value_type>::const_iterator iterator;
        typedef iterator const_iterator;
        iterator begin() const
        {
            return elements.begin();
        }
        iterator end() const
        {
            return elements.end();
        }
        ...
    };

The contained vector already has iterator types that have all the right properties; flat_multiset need only re-export them. There is one caveat, however: iterator access to the set must be const-only, which is why both iterator and const_iterator are equivalent to a vector’s const_iterator. A non-const iterator would allow elements in a sequence to be modified, which could break the guarantee of a sorted order.

Searching for a value is simple enough when you realise the most important member function is equal_range, which returns a pair of iterators, delimiting the upper and lower bound of where the value occurs or would occur in the sorted range:

    template<...>
    class flat_multiset
    {
    public:
        ...
        std::pair<iterator, iterator> equal_range(
            const value_type &to_find) const
        {
            return std::equal_range(
                elements.begin(), elements.end(),
                to_find, comparer);
        }
        iterator find(const value_type &to_find) const
        {
            std::pair<iterator, iterator> found =
                equal_range(to_find);
            return found.first != found.second
                ? found.first : elements.end();
        }
        std::size_t count(const value_type &to_find) const
        {
            std::pair<iterator, iterator> found =
                equal_range(to_find);
            return found.second - found.first;
        }
        ...
    };
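The counting logic can be tried outside the class as a free function. The following is a minimal sketch, assuming a vector that is already sorted according to the same comparer; the name count_in_sorted is hypothetical and not part of the article's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical free-function counterpart of the count member:
// the vector must already be sorted according to comparer.
template<typename value_type, typename key_compare>
std::size_t count_in_sorted(
    const std::vector<value_type> &sorted,
    const value_type &to_find,
    key_compare comparer)
{
    typedef typename std::vector<value_type>::const_iterator iterator;
    std::pair<iterator, iterator> found =
        std::equal_range(sorted.begin(), sorted.end(), to_find, comparer);
    // the range [first, second) holds every element equivalent to to_find
    assert(found.first <= found.second);
    return static_cast<std::size_t>(found.second - found.first);
}
```

For a vector holding 1, 2, 2, 2, 5 and std::less<int>(), looking for 2 gives a range of three elements, while looking for 3 gives an empty range positioned where 3 would be inserted.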
Iterator (pointer) arithmetic makes counting the number of times a value occurs quite trivial. So far, so good. But now things become a little hairier. We need to consider how to insert and erase a single value efficiently. A simplistic approach to inserting a single element would be to use push_back to append it to the range and then re-sort the whole lot. A more efficient approach is to find the location that the element should be placed in and shuffle up the elements that follow. Erasing a value presents a minor challenge because it requires the use of modifiable iterators to identify and remove the subrange in which a value occurs. But these are only minor challenges:

    template<...>
    class flat_multiset
    {
    public:
        ...
        iterator insert(const value_type &new_value)
        {
            return elements.insert(
                std::upper_bound(
                    elements.begin(), elements.end(),
                    new_value, comparer),
                new_value);
        }
        std::size_t erase(const value_type &to_erase)
        {
            std::pair<muterator, muterator> found =
                std::equal_range(
                    elements.begin(), elements.end(),
                    to_erase, comparer);
            std::size_t result = found.second - found.first;
            elements.erase(found.first, found.second);
            return result;
        }
        ...
    private:
        // typename is required because the vector type depends on value_type
        typedef typename std::vector<value_type>::iterator muterator;
        ...
    };

The muterator (mutable iterator) typedef is just there to simplify the code in erase and other member functions that require such a capability. Regardless of language, all such algorithmic work is twiddly and the major challenge sometimes seems to be to read LISP! For programmers without any experience in functional programming, the bracketing density appears a little high when using the STL.

That’s the hard work done. The standard tells you which functions are expected of associative containers, the standard library provides the relevant underlying mechanism and all the resulting member functions are short, although sometimes a little fiddly.
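The single-value insert and erase can also be exercised on a bare vector. This is a sketch under the assumption that the vector is sorted by the same comparer before each call; the names insert_sorted and erase_all are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: insert after any equivalent elements, so the
// vector stays sorted without re-sorting the whole range.
template<typename value_type, typename key_compare>
typename std::vector<value_type>::iterator insert_sorted(
    std::vector<value_type> &sorted,
    const value_type &new_value,
    key_compare comparer)
{
    return sorted.insert(
        std::upper_bound(sorted.begin(), sorted.end(), new_value, comparer),
        new_value);
}

// Hypothetical helper: erase every element equivalent to to_erase
// and report how many were removed.
template<typename value_type, typename key_compare>
std::size_t erase_all(
    std::vector<value_type> &sorted,
    const value_type &to_erase,
    key_compare comparer)
{
    typedef typename std::vector<value_type>::iterator muterator;
    std::pair<muterator, muterator> found =
        std::equal_range(sorted.begin(), sorted.end(), to_erase, comparer);
    assert(found.first <= found.second);
    std::size_t result = static_cast<std::size_t>(found.second - found.first);
    sorted.erase(found.first, found.second);
    return result;
}
```

Locating the insertion point takes logarithmic time, but shuffling up the elements that follow is linear, which is the price a flat representation pays for each individual insertion.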
So, other than the names, what is the minimum change needed to turn a flat_multiset into a flat_set? Of the functions shown, only two require changes – the private sort function, which must now also eliminate duplicates, and the single value insert function, which must now ensure that duplicates are not introduced.

The sort function sounds innocent enough. Looking back through the code in the article, you will find a use of unique that looks like it might solve the problem, leading to the following:

    template<...>
    class flat_set
    {
        ...
    private:
        void sort()
        {
            std::sort(
                elements.begin(), elements.end(), comparer);
            elements.erase(
                std::unique(elements.begin(), elements.end()),
                elements.end());
        }
        ...
    };

This solution is simple, tempting and, alas, wrong. The out-of-the-box version of unique uses straightforward equality to cut down the range. This may or may not match up with the templated sorting criteria that have been used with the container. In the case of flat_set<std::string> it will do the right thing because the < and == operators will be used and are consistent with one another. However, if a case-insensitive ordering has been provided as a parameter to flat_set, equality checking will not eliminate duplicates with the same spelling but different case.

It transpires that associative containers with unique elements do not use equality to discard duplicates. They use equivalence, a subtly different concept, which sometimes gives the same result and sometimes doesn’t. If we consider the ordering of an associative container to be less-than, then two elements are equivalent when neither element is less than the other. To make sense of this construct we can define a helper function object type to perform this logic for us, using it with the overloaded form of unique that permits you to pass your own filter function:

    template<...>
    class flat_set
    {
        ...
    private:
        class equivalent
        {
        public:
            explicit equivalent(
                const key_compare &comparer)
              : comparer(comparer)
            {
            }
            bool operator()(
                const value_type &lhs,
                const value_type &rhs) const
            {
                return !comparer(lhs, rhs) &&
                       !comparer(rhs, lhs);
            }
        private:
            key_compare comparer;
        };
        void sort()
        {
            std::sort(
                elements.begin(), elements.end(), comparer);
            elements.erase(
                std::unique(
                    elements.begin(), elements.end(),
                    equivalent(comparer)),
                elements.end());
        }
        ...
    };

The last piece of the jigsaw is insertion of a single element. This cannot be the same as the one for flat_multiset, or even a slightly tweaked version of it, because its signature differs. The return value for single-element insertion into a unique associative container must indicate whether or not a new value was inserted – a new element will not be inserted if the value already exists – and must provide an iterator to either the newly inserted value or the existing value:

    template<...>
    class flat_set
    {
    public:
        std::pair<iterator, bool> insert(
            const value_type &new_value)
        {
            std::pair<muterator, muterator> found =
                std::equal_range(
                    elements.begin(), elements.end(),
                    new_value, comparer);
            bool exists = found.first != found.second;
            iterator result = exists
                ? found.first
                : elements.insert(found.first, new_value);
            return std::make_pair(result, !exists);
        }
        ...
    };

In creating a family of types, even one as small as flat_set and flat_multiset, there are design considerations in addition to performance and conformance to STL expectations. Copy and paste would be an inappropriate way to share the common logic between the two types, as would public inheritance. Private inheritance of flat_multiset is a possibility but not a particularly fetching one. Factoring out a common but private base to both flat_set and flat_multiset would be a cleaner and more decoupled approach. Alternatively, delegation can be used, either one to another or, better still, both forwarding to a third implementation type.

The STL is an open framework based on requirements and not simply a fixed collection of code – some containers and algorithms, plus hangers-on. It is an extensible model that allows you to customise and introduce your own abstractions, which may be, as in the case of flat_set and flat_multiset, hybrids of existing concepts and code – the interface of existing types with the implementation of another type. The flat associative containers have the benefit of space economy and, for read-mostly use, speed. Balancing these trade-offs is the most common reason for defining your own commodity types.

References
1. Kevlin Henney, “Bound and checked”, Application Development Advisor, 6(1), www.appdevadvisor.co.uk
2. Kevlin Henney, “If I had a hammer”, Application Development Advisor, 5(6), www.appdevadvisor.co.uk
3. Scott Meyers, Effective STL, Addison-Wesley, 2001
4. Matthew Austern, Generic Programming and the STL, Addison-Wesley, 1999
5. Nicolai Josuttis, The C++ Standard Library, Addison-Wesley, 1999
6. Kevlin Henney, “Promoting polymorphism”, Application Development Advisor, 5(8), www.appdevadvisor.co.uk