SOEN 423: Project Report
In fulfillment of SOEN 423, Fall 2009 – Ver. 1.2
Project ID: 3
12/11/2009
Team Members
Date         Rev.   Description        Author(s)    Contributor(s)
10/12/2009   1.0    First Draft        Ali Ahmed    The Team
11/12/2009   1.2    Document Review    Ali Ahmed    The Team
Concordia University, Montreal
Fall 2009
Table of Contents
1. Introduction
2. Problem Statement
3. Design Description
4. Implementation Details
   4.1. CORBA
   4.2. Client
   4.3. Front End
   4.4. Replica Manager
   4.5. Branch Servers / Replicas
   4.6. Byzantine Scenarios
   4.7. Reliable FIFO communication via UDP
   4.8. Synchronization
   4.9. Asynchronous callback in CORBA
5. Test cases and overview
6. Team Organization and Contribution
7. Conclusion
1. Introduction
This report is in fulfillment of the requirements of the SOEN 423 Distributed System Programming project for Fall 2009. It describes the problem specified, the design and implementation of our solution, the resulting output from the system, and its verification by test cases.
2. Problem Statement
We were required to implement a Distributed Banking System (DBS), extending the core idea of the individual assignments. Our project (a group of 3) was to have the following features:

A failure-free front end (FE) which receives requests from the clients as CORBA invocations, atomically broadcasts the requests to the server replicas, and sends a single correct result back to the client by properly combining the results received from the replicas. The FE should also keep track of the replica which produced an incorrect result (if any) and inform the replica manager (RM) to replace the failed replica. The FE should also be multi-threaded so that it can handle multiple concurrent client requests using one thread per request.

A replica manager (RM) which creates and initializes the actively replicated server subsystem. The RM also manages the server replica group information (which the FE uses for request forwarding) and replaces a failed replica with another one when requested by the FE.

A reliable FIFO communication subsystem over the unreliable UDP layer for the communication between replicas.
3. Design Description
Based on the requirements we saw a possible bottleneck: the methods in our assignment code had return types, which would result in blocking and hence low performance. Additionally, branch servers had to be destroyed and re-instantiated, hence we reference them through a Branch Server Proxy object. Transfer operations were delegated to the FEs, which split them into deposit and withdraw requests, since they may need to contact other FEs. Each FE and its associated group of replicas deals with a subset of accounts. In a transfer, the source account is first resolved to the FE that owns it, which performs the withdraw; a deposit message is then sent to the FE owning the destination account, which may be the same FE.
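To make the split concrete, the following is a minimal Java sketch of the idea; the class name and the two helper methods are hypothetical stand-ins, not taken from the project code.
//---------------
public class TransferSplitter {

    /** Splits a transfer into a local withdraw and a forwarded deposit. */
    public void transfer(long srcAccountNum, long destAccountNum, float amount) {
        withdraw(srcAccountNum, amount); // this FE owns the source account
        // The first two digits of the account number identify the owning FE.
        String destFeId = String.valueOf(destAccountNum).substring(0, 2);
        forwardDeposit(destFeId, destAccountNum, amount); // may be this same FE
    }

    private void withdraw(long accountNum, float amount) {
        // broadcast the withdraw to this FE's own replicas (omitted)
    }

    private void forwardDeposit(String feId, long accountNum, float amount) {
        // send a deposit message to the FE identified by feId (omitted)
    }
}
//---------------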
Fig. High-level design overview of the implementation and deployment.
4. Implementation Details
4.1. CORBA
All CORBA elements were defined in our IDL file, which is listed below. Note the callback object and the void method declarations.
Our IDL File
//---------------
module dbs
{
    module corba
    {
        interface CallBack
        {
            void responseMessage(in string message);
        };

        interface FailureFreeFE
        {
            string sayHello();
            void deposit(in long long accountNum, in float amount);
            void withdraw(in long long accountNum, in float amount);
            void balance(in long long accountNum);
            void transfer(in long long src_accountNum,
                          in long long dest_accountNum, in float amount);
            void requestResponse(in CallBack cb);
            oneway void shutdown();
        };
    };
};
//---------------
4.2. Client
The client simply registers itself with the ORB daemon; it maps references of front ends to the branch ID of the requested accounts (the first two digits of the account number). It also registers the callback object with the FE when making a request. There is no blocking at any stage, and the FE response is asynchronous, so multiple requests can be sent and the responses arrive later.
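The following is a minimal client sketch under the IDL above, assuming idlj-generated stubs in package dbs.corba and a front end registered with the naming service under the hypothetical name "FE30"; error handling is trimmed.
//---------------
import org.omg.CORBA.ORB;
import org.omg.CosNaming.NamingContextExt;
import org.omg.CosNaming.NamingContextExtHelper;
import org.omg.PortableServer.POA;
import org.omg.PortableServer.POAHelper;
import dbs.corba.CallBack;
import dbs.corba.CallBackHelper;
import dbs.corba.CallBackPOA;
import dbs.corba.FailureFreeFE;
import dbs.corba.FailureFreeFEHelper;

// Callback servant: the FE invokes responseMessage() when a result is ready.
class CallBackImpl extends CallBackPOA {
    public void responseMessage(String message) {
        System.out.println("FE replied: " + message);
    }
}

public class Client {
    public static void main(String[] args) throws Exception {
        ORB orb = ORB.init(args, null);

        // Activate the POA so our callback servant can receive invocations.
        POA rootPoa = POAHelper.narrow(orb.resolve_initial_references("RootPOA"));
        rootPoa.the_POAManager().activate();
        CallBack cb = CallBackHelper.narrow(
                rootPoa.servant_to_reference(new CallBackImpl()));

        // Resolve the FE owning the account (name keyed on the first two digits).
        NamingContextExt nc = NamingContextExtHelper.narrow(
                orb.resolve_initial_references("NameService"));
        FailureFreeFE fe = FailureFreeFEHelper.narrow(nc.resolve_str("FE30"));

        fe.requestResponse(cb);      // register the callback object
        fe.deposit(300001L, 100.0f); // void method: returns immediately

        orb.run(); // block this thread, waiting for asynchronous responses
    }
}
//---------------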
4.3. Front End
Fig. High-level view of front end components.
The front end manages the communication between the clients and the branch server replicas. It provides clients with a failure-free interface to the branch servers, allowing them to perform the basic banking operations (deposit, withdraw, balance, transfer).

Clients are only required to know the location of the server running the ORB. They then obtain a reference to the front end hosting the account they wish to manipulate. Requests are sent via CORBA invocations to the front end, which passes them to the FIFO UDP subsystem for broadcast to each of its branch server replicas. Each replica performs the requested operation on the account and returns a result in the form of an account balance. The results are all compared, and the response that reflects the correct result is sent back to the client.
The CORBA middleware provides us with transparent threading and concurrency control. The UDP server also runs in its own thread, so all of its operations are asynchronous. Spawning an additional thread per request was therefore no longer needed.
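As an illustration of the result comparison, here is a minimal majority-vote sketch; the class and method names are assumptions of ours, not the project's. With three replicas and at most one faulty, the value reported by at least two replicas is taken as correct.
//---------------
import java.util.HashMap;
import java.util.Map;

public class ResultCombiner {

    /**
     * Returns the balance reported by a strict majority of replicas,
     * or Float.NaN if no majority exists. Exact float equality is fine
     * here because correct replicas compute the same deterministic value.
     */
    public static float combine(float[] replicaBalances) {
        Map<Float, Integer> votes = new HashMap<>();
        for (float balance : replicaBalances) {
            votes.merge(balance, 1, Integer::sum);
        }
        for (Map.Entry<Float, Integer> entry : votes.entrySet()) {
            if (entry.getValue() > replicaBalances.length / 2) {
                return entry.getKey();
            }
        }
        return Float.NaN; // no consensus: a replica must be recovered
    }
}
//---------------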
4.4. Replica Manager
Why replication is needed
Process failure should not imply the failure of the whole system: a distributed system needs to implement techniques guaranteeing the availability and the correctness of the result provided to the client. We use active replication, as per the specification, so that the system provides a collection of results representing a consensus that the client can rely on without being aware of the inner workings of the replication.

A replica is a server process that carries out the execution of a client request alongside its peers, each processing the request in its own memory space. This way we guarantee that a replicated process does not alter the state of another replica.
But replication comes at a cost. True, or most up-to-date, correct results must be maintained by proper message ordering, or simply by avoidance of byzantine failure (malicious or non-malicious). In the best case all replicas hold the same correct value. In the worst case we expect the returned value to be decided by a majority vote.
We may expect the replicas to fail for arbitrary reasons; hardware failure can be among the most dramatic. If all replicas run on the same host and that host fails, the whole system fails unless another group can take its place. In this project we do not assume that such "replica group replacement" exists or that replicas run on different hosts, but we keep in mind that it can exist in some distributed systems.

To prevent failure we implemented replicas as entities that relieve some of the work from the front end and ensure that even if a replicated process fails, the front end can still manage to send a correct answer to the client.
Inaccurate responses should not be sent back to the client
Now we need to discuss why replication in general provides a good guarantee that no incorrect answer will be sent to the client. Suppose a single process handles a client's request, and suppose its answer is incorrect. The front end cannot know that the answer is incorrect, and failure detection is absent.
Now suppose we replicate the server process. In some cases we may expect the replicas to send different answers to the front end. We assume that a consensus can be reached by a majority vote; otherwise the server (replicas and front end) cannot proceed further, and a replica must be rebuilt or recovered to the last known good state.

In the project we make the assumption that the replies sent by the replicas always reach a consensus and that this consensus is truthful (not altered by some byzantine general).
The place of the replica and the replica manager within the system
Replica and branch server
The BranchServerProxy is the instance that holds the reference to the replica (BranchServerImpl). A replica is a copy of a branch server holding a private copy of the set of accounts within the branch server. Under our assumptions, two replicas will return the correct result while a faulty replica will "lie" when it returns its result to the front end. For the sake of simplicity we will treat a replica as an instance of a branch server.
Life of a replica
A branch server is expected to return correct results unless otherwise instructed. To keep the implementation general, a branch server can be created, killed, and synched with a trusted replica. A replica is kept alive as long as it has not generated three consecutive errors; once it has, it is signaled to stop and is synched with a trusted replica's data.

Killing a replica means that the reference to the implementation, along with its private accounts, is released for garbage collection. This would not be a good way to proceed in a real distributed system, but since the set of accounts is never larger than ten at any time, this factor does not alter the performance of the system in any way.

Once the reference to the implementation has been collected, the new instance is synched with a trusted branch server. We define synching as requesting a trusted replica's account data set.
The replicas do not communicate directly: messages must be sent through a UDP server contained in the front end, which relays the response back to the failed replica. Accounts in the faulty replica are updated one by one until all accounts hold the correct values.
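The copy itself amounts to the following sketch (names are ours); in the real system each account travels as a UDP message through the front end rather than by direct reference.
//---------------
import java.util.Hashtable;
import java.util.Map;

public class ReplicaSync {

    /** Rebuilds a replica's account set from a trusted replica's data,
     *  one account at a time. */
    public static void syncFromTrusted(Hashtable<Long, Float> trusted,
                                       Hashtable<Long, Float> rebuilt) {
        rebuilt.clear();
        for (Map.Entry<Long, Float> account : trusted.entrySet()) {
            rebuilt.put(account.getKey(), account.getValue());
        }
    }
}
//---------------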
The replica manager
A replica manager is an entity in the system that manages the life of the replicas. We expect the replica manager to signal a server holding replicated data to stop referencing incorrect key/value pairs (id, amount) defining bank accounts. Its job is simply to send a UDP message to the failed branch server proxy to eliminate the reference to the old replicated data.

The replica manager does not play a bigger role in the system than initiating the replacement of replicated bank accounts; we have decided to integrate its functionality into other subsystems.
4.5. Branch Servers / Replicas
Branch servers exist with their own repository, simulated by a synchronized hash table, and have methods to conduct operations on it. They do not have a transfer operation, as they cannot call methods on a replica server holding a different group of accounts. Each exists as a reference held by a Branch Server Proxy object, which handles UDP messaging via its UDP server and passes the respective messages along. The proxy object also handles the replica failure scenario by reinitializing the object.
4.6. Byzantine Scenarios
The project requirements state that our failure-free front end must detect a single non-malicious byzantine failure and correct it using active replication. To achieve this we first had to make one of the replicas return incorrect results, which we accomplished via a command line flag that, when activated, gives the replica a high probability of producing incorrect results. The front end also keeps a hash table recording, for each replica, the number of times it has consecutively sent false data, to determine when it is time to replace it.
When the front end examines the replica responses for a given client request, it detects the single byzantine failure by finding the two results that agree; the one that does not agree is obviously the one that produced an incorrect result and must be dealt with. The replica's entry in the error hash table is checked, and if its error count has reached 3 the replica must be replaced and synced with clean data. The tasks of terminating the faulty replica, instantiating a new one, and repopulating it with known good data from another replica are then delegated to the Replica Manager. Finally, the front end resets the error count of the new replica.
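A minimal sketch of this consecutive-error bookkeeping follows; the class and method names are assumed, and ReplicaLink stands in for whatever call actually asks the replica manager to rebuild a replica.
//---------------
import java.util.Hashtable;

public class FaultTracker {

    /** Hypothetical hook into the replica manager. */
    interface ReplicaLink { void replace(String replicaId); }

    private static final int MAX_CONSECUTIVE_ERRORS = 3;
    private final Hashtable<String, Integer> errorCounts = new Hashtable<>();
    private final ReplicaLink rm;

    public FaultTracker(ReplicaLink rm) { this.rm = rm; }

    /** Records one replica's reply against the agreed majority value. */
    public void record(String replicaId, float reply, float majority) {
        if (reply != majority) {
            int count = errorCounts.merge(replicaId, 1, Integer::sum);
            if (count >= MAX_CONSECUTIVE_ERRORS) {
                rm.replace(replicaId);         // delegate the rebuild to the RM
                errorCounts.put(replicaId, 0); // reset for the fresh replica
            }
        } else {
            errorCounts.put(replicaId, 0);     // errors must be consecutive
        }
    }
}
//---------------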
How we simulate byzantine failure
Under the conditions of the network where the software was tested, no byzantine failure (malicious or not) was expected to happen, but in real conditions this has to be taken into account. Moreover, if the network is unreliable we can expect byzantine failure due to message loss. To handle byzantine failure we therefore had to simulate a replica sending inaccurate data.
We were asked to implement a system where only one byzantine failure will occur at any time. Under this condition we assume only one replica will generate incorrect results. To implement this we added an extra parameter to the replica indicating whether it will generate incorrect results or not.

This way we know that only one byzantine replica will fail, and we know we can determine which one it is. This approach has the limitation of testing the system under contrived circumstances, but it is sufficient for the scope of the assignment.
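The faulty behaviour can be sketched as follows; the flag name -byzantine and the 90% probability are assumptions for illustration, not the project's actual values.
//---------------
import java.util.Random;

public class ByzantineBehaviour {

    private final boolean byzantine;
    private final Random random = new Random();

    public ByzantineBehaviour(String[] args) {
        // e.g. started as: java BranchServer ... -byzantine
        this.byzantine = args.length > 0
                && "-byzantine".equals(args[args.length - 1]);
    }

    /** Returns the true balance, or a corrupted one when this replica
     *  simulates a non-malicious byzantine failure. */
    public float report(float trueBalance) {
        if (byzantine && random.nextFloat() < 0.9f) {
            return trueBalance + 1.0f; // deliberately wrong result
        }
        return trueBalance;
    }
}
//---------------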
Fig. Sequence diagram of the fail case scenario and the respective action taken by the replica manager to handle the response.
4.7. Reliable FIFO communication via UDP
One of the requirements of the project was to have reliable communication in FIFO order implemented over the unreliable UDP protocol. As per our design, every entity requiring this network access holds an object reference to a UDP server; in our project those entities are the branch proxy and the front end. The UDP server object runs in a thread and also maintains a concurrent retransmit thread object which runs in parallel.

To maintain FIFO ordering, messages are held in a queue and are numbered. Sequence number pairs are held uniquely for each destination, and correspondingly at the UDP receiver object. Once a packet is sent from the queue, we wait for the corresponding acknowledgment for that message; once it is received, the next message is acted upon. Message numbering allows us to discard messages arriving out of order as well as duplicate messages. The retransmit thread, which runs a check method at a specified interval, monitors whether the current message has had its acknowledgment returned; if not, the same message is broadcast again. If the total number of retries exceeds a threshold, a system exception is thrown.
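The core stop-and-wait logic can be sketched as follows; all names are assumed, and the retry loop here stands in for the separate retransmit thread (the real subsystem also queues outgoing messages and keeps one sequence-number pair per destination).
//---------------
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class FifoUdpSender {

    private static final int MAX_RETRIES = 5;       // assumed retry limit
    private static final int ACK_TIMEOUT_MS = 1000; // assumed check interval

    private final DatagramSocket socket;
    private final InetAddress dest;
    private final int destPort;
    private int nextSeq = 0; // sequence number for this destination

    public FifoUdpSender(InetAddress dest, int destPort) throws Exception {
        this.socket = new DatagramSocket();
        this.socket.setSoTimeout(ACK_TIMEOUT_MS);
        this.dest = dest;
        this.destPort = destPort;
    }

    /** Sends one message and blocks until it is acknowledged, which
     *  preserves FIFO order: message n+1 never leaves before n is acked. */
    public void send(String payload) throws Exception {
        int seq = nextSeq++;
        byte[] data = (seq + ":" + payload).getBytes(StandardCharsets.UTF_8);
        DatagramPacket packet = new DatagramPacket(data, data.length, dest, destPort);

        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            socket.send(packet);
            try {
                byte[] buf = new byte[64];
                DatagramPacket ack = new DatagramPacket(buf, buf.length);
                socket.receive(ack); // waits up to ACK_TIMEOUT_MS
                String text = new String(ack.getData(), 0, ack.getLength(),
                        StandardCharsets.UTF_8);
                if (text.equals("ACK:" + seq)) {
                    return; // acknowledged: the next queued message may go
                }
                // a stale or foreign ACK is ignored and the message re-sent
            } catch (SocketTimeoutException e) {
                // no ACK in time: fall through and retransmit
            }
        }
        throw new RuntimeException("message " + seq + " was never acknowledged");
    }
}
//---------------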
Fig. Sequence diagram of a sample operation sequence for the FIFO UDP subsystem.
4.8. Synchronization
This is an extension of the process implemented in the assignments. The front end uses synchronized blocks to handle concurrency and to obtain fine-grained control over the locking mechanism for maximum performance. Additionally, concurrent data structures are used where appropriate, e.g. concurrent queues in the UDP server and hash tables for the account information.
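For illustration, here is a minimal sketch of both techniques (field and class names assumed): a concurrent queue for outgoing UDP messages and a synchronized block that makes an account's read-modify-write atomic.
//---------------
import java.util.Hashtable;
import java.util.concurrent.ConcurrentLinkedQueue;

public class AccountStore {

    private final Hashtable<Long, Float> accounts = new Hashtable<>();
    private final ConcurrentLinkedQueue<String> outgoing =
            new ConcurrentLinkedQueue<>();

    public void deposit(long accountNum, float amount) {
        // Hashtable only synchronizes individual calls; the block below
        // makes the get-then-put sequence atomic while locking only the
        // account table, not the whole servant.
        synchronized (accounts) {
            float balance = accounts.getOrDefault(accountNum, 0.0f);
            accounts.put(accountNum, balance + amount);
        }
        // The lock-free queue is safely shared with the UDP sender thread.
        outgoing.add("deposit " + accountNum + " " + amount);
    }
}
//---------------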
4.9. Asynchronous callback in CORBA
Fig. A simple setup with return types specified in CORBA methods.
To avoid the above scenario we use void methods in our interface definitions, but for each operation we register a callback object. The FE can use it to respond asynchronously without blocking the client, and it can keep processing additional requests from the client in the meantime.
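On the FE side the pattern reduces to storing the registered CallBack reference and invoking it once the combined result is known; this is a sketch with assumed names, using only the responseMessage() operation from the IDL above.
//---------------
import dbs.corba.CallBack;

public class FrontEndCallback {

    private volatile CallBack clientCallback;

    // Invoked by the client through the IDL operation requestResponse().
    public void requestResponse(CallBack cb) {
        this.clientCallback = cb;
    }

    // Called once the majority result is known; the client never blocks.
    public void respond(String result) {
        CallBack cb = clientCallback;
        if (cb != null) {
            cb.responseMessage(result); // asynchronous from the client's view
        }
    }
}
//---------------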
5. Test cases and overview
Fig. Terminal screens of the project being initialized normally; this instance shows three front ends and the associated nine replica servers.
Fig. Source tree overview of the project.
Fig. Output consoles for the clients and replica servers running Driver Test1 (single thread test).
Fig. Double-threaded client (Branch 30).
Fig. Double-threaded client (Branch 20).
Fig. FIFO UDP testing.
6. Team Organization and Contribution
As per the project document, roles were already specified for the three team members, and that is primarily how the team was organized; there was some overlap in areas in the later stages, and the documentation was done together in a team setting.

Ali Ahmed - Primary responsibility for the design and implementation of the FIFO UDP message passing system, the asynchronous callback in CORBA, and the design of the front end and replica servers.

*********** - Handling the Replica Manager and delegating responsibilities to the relevant classes for the appropriate actions.
7. Conclusion
Our implementation correctly implements the specifications and the results are as expected; changing variable values produces output variations in line with manual calculations. Hence we feel the solution satisfies all the criteria required in the scope of the assignment.