2. As i was trying to recapitulate several important concepts of OOAD, naturally COW or
Copy-On-Write got my attention for sometime. COW is a special technique used for efficient
resource management in computer science. In C++ ‘98 standard, COW was used to define
the std::string class. Although in C++ ‘11 standard, COW is not used to define the string
class of the standard C++ library, but it is worth to have a look at the earlier implementation
of the string class in GCC’s standard library. In COW, we defer creating of a new modifiable
resource of a class in the case of copying if we keep the copy unchanged. We just refer to
the old resource in the new copy. We write only when the copy changes that modifiable
resource.
The following code is my effort to explain the COW concept in a String class. Let me explain
some of the parts of this working code.
#include <string.h>
#include <cstddef>
//This is using default constructor
//and copy constructor. Hence while copying it will
//use shallow copy.
class StringHolder{
public:
char* data;
int refCount;
virtual ~StringHolder(){
delete[] data;
}
};
class MyString{
private:
StringHolder* mStrHolder;
int length;
public:
//default constructor
MyString(){
mStrHolder = new StringHolder();
length = 0;
}
MyString(const char* inStr){
mStrHolder = new StringHolder();
unsigned l=strlen(inStr);
mStrHolder->data=new char[l+1];
memcpy(mStrHolder->data,inStr,l+1);
mStrHolder->refCount = 1;
3. length = l;
}
MyString(const MyString& inStr){
mStrHolder = inStr.mStrHolder;
mStrHolder->refCount++;
length = inStr.length;
}
//Reference counted assignment operator
MyString& operator=(const MyString& inStr){
//check against self assignment
if(this == &inStr)
return *this;
mStrHolder->refCount--;//the data of the lhs string
//does no longer point to the old char
//array
if(mStrHolder->refCount == 0)
delete mStrHolder;
mStrHolder = inStr.mStrHolder;//the lhs and rhs data
//both are pointing to the RHS data.
mStrHolder->refCount++;
length = inStr.length;
return *this;
}
//mutator service
MyString& operator+=(const MyString& inStr){
Detach();
const char* cStr = inStr.c_str();
int l = inStr.length;
const char* oldData = mStrHolder->data;
//deep copy
char* tmp=new char[length + l];
mStrHolder->data = tmp;
memcpy(mStrHolder->data, oldData, length);
strcat(mStrHolder->data,cStr);
mStrHolder->refCount=1;
length+= l;
return *this;
}
4. //Destructor. The virtual term is important
virtual ~MyString(){
mStrHolder->refCount--;
if(mStrHolder->refCount == 0)
delete mStrHolder;
}
//Call this function first for all the
//mutator services of this class
void Detach(){
if (mStrHolder->refCount == 1)
return;
StringHolder* temp = new StringHolder();
temp->data = mStrHolder->data;
temp->refCount = 1;
mStrHolder->refCount--;
mStrHolder = temp;
}
inline char* c_str()
{
return (mStrHolder->data);
}
inline const char* c_str() const
{
return (mStrHolder->data);
}
int getRefCount(){
return mStrHolder->refCount;
}
};
As you can see, there is a class called StringHolder which is responsible to store the actual
data or the character array. As I have not used any constructor or copy constructor, naturally
this class will take the help of compiler generated default constructor and copy constructor.
And as the compiler generated copy constructor does shallow copy, hence this class will
also use shallow copy in case of copy constructor. Beside the character array, this class has
a reference count member variable which actually determines how many of our user defined
String class actually are holding a reference to the character array at any particular time.
Now let us turn our attention to the MyString class. As you can see, there is a default
constructor in this class, one copy constructor, one assignment operator and a destructor.
These kinds of classes are called Canonical Classes in C++.
5. The class also has an one argument constructor which creates the MyString objects from a
character array. Now let us examine the copy constructor. As you can see, inside the copy
constructor we are simply taking the help of the shallow copy of the MyString’s member
variable mStrHolder. Hence whenever we call a copy constructor, an extra reference is
added to the character array of the StringHolder class. So we need to increase the refCount
member variable as well.
Now let us examine the assignment operator. The first line checks for the self assignment
that probably i don’t need to explain here. Next as we are assigning a new value to the LHS,
the original LHS will no longer refer to the old character array of the StringHolder class.
Hence we have decremented the reference count by one. Next we are doing a shallow copy
of the mStrHolder member variable of the MyString class, and as a result we are just
incrementing the reference count.
Now let us pay attention to a mutator member functions of this class. We have implemented
only one of such functions and that is we have defined the += operator overloading. The
other mutator functions can be implemented in the same fashion.
We have implemented a Detach() method to help us define the mutator methods. We have
to call this Detach() method at the beginning of any mutator functions of the MyString class.
Let us examine what this Detach method is doing.
//Call this function first for all the
//mutator services of this class
void Detach(){
if (mStrHolder->refCount == 1)
return;
StringHolder* temp = new StringHolder();
temp->data = mStrHolder->data;
temp->refCount = 1;
mStrHolder->refCount--;
mStrHolder = temp;
}
As you can see, whenever we call the Detach() method, it reduces the old mStringHolder’s
refCount by one and creates a new mStrHolder object with a reference count 1.
Having explained the Detach() method, now let us focus on += mutator method of the class.
As you can see from the definition of the += operator overloading method that we have first
called the Detach() method. As a result the LHS will have a brand new mStrHolder object
with a reference count of 1. Then later as you can see that we have used a deep copy of the
mStrHolder class for the output result. Hence the output of this += operator overloading will
be a brand new object having a single reference to the character array. At the same time,
the original reference count of the LHS will be reduced by one.
6. That’s it.
So let me explain from an user’s perspective of what is happening in this COW based string
class. To experiment with this class i have created few lines of code as follow.
MyString str1("Loves to study CPP...");
MyString str2("How are you?");
MyString str3("Somenath Mukhopadhyay");
Naturally all of the above strings will have a reference count of 1.
Now if we do str2 = str1, what will happen. As you can understand that now both str1 and
str2 will point to the same character array. Hence the reference count will be 2. We can
verify that by the following piece of code.
char* data1 = str1.c_str();
char* data2 = str2.c_str();
printf("%pn",&data1[0]);
printf("%pn",&data2[0]);
We will see that both of the outputs of the above code are same which implies that both str1
and str2 are referring to the same character array.
Now what will happen if we write MyString str4(str2). So now both str1, str2 and str4 are
pointing to the same character array. Hence the reference count will be 3. We can again
verify that by using the following code.
char* data1 = str1.c_str();
char* data2 = str2.c_str();
char* data4 = str4.c_str();
printf("%pn",&data1[0]);
printf("%pn",&data2[0]);
printf("%pn",&data4[0]);
As you can see in all of the above functions we are actually not creating any new strings
except for the first time. We are just incrementing the reference count.
Now the interesting part. What will happen if we do str1+=str2. In this case a brand new str1
will be created and the old str1’s reference to the character array will be lost. Hence after this
operation only str2 and str4 will hold a pointer to the character array. We can again verify
this by the following code.
7. char* data1 = str1.c_str();
char* data2 = str2.c_str();
char* data4 = str4.c_str();
printf("%pn",&data1[0]);
printf("%pn",&data2[0]);
printf("%pn",&data4[0]);
All of the above steps have been pictorially described in the following diagram.