This document proposes an application-level, checkpoint-based approach to fault tolerance in distributed systems. The system combines coordinated checkpointing with systematic process logging to monitor nodes, so that if a node fails, its state can be reconstructed from the checkpoint information. The approach is implemented for a distributed multiple sequence alignment application based on genetic algorithms. Each worker node takes local checkpoints, and the head node takes global checkpoints to track node status and detect failures.
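The split between local (worker) and global (head node) checkpoints can be illustrated with a minimal sketch. This is not the implementation described in the document. The function and class names (`save_local_checkpoint`, `load_local_checkpoint`, `HeadNode`), the JSON file layout, the fields of the worker state, and the timeout-based failure check are all assumptions made for illustration.

```python
import json
import os
import tempfile
import time


def save_local_checkpoint(directory, worker_id, state):
    """Write a worker's state to its local checkpoint file (hypothetical layout)."""
    path = os.path.join(directory, f"worker_{worker_id}.json")
    # Write to a temporary file first, then rename, so that a crash
    # mid-write never leaves a truncated checkpoint behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)
    return path


def load_local_checkpoint(directory, worker_id):
    """Reconstruct a worker's state from its last local checkpoint."""
    path = os.path.join(directory, f"worker_{worker_id}.json")
    with open(path) as f:
        return json.load(f)


class HeadNode:
    """Tracks the last reported progress of each worker (assumed design)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.status = {}  # worker_id -> (generation, time of last report)

    def report(self, worker_id, generation, now=None):
        """Record that a worker completed a checkpoint at `generation`."""
        now = time.monotonic() if now is None else now
        self.status[worker_id] = (generation, now)

    def failed_workers(self, now=None):
        """Return workers whose last report is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(
            w for w, (_, t) in self.status.items() if now - t > self.timeout_s
        )

    def global_checkpoint(self):
        """Snapshot of the last checkpointed generation of every worker."""
        return {w: g for w, (g, _) in self.status.items()}
```

In this sketch, a worker would call `save_local_checkpoint` after each generation of the genetic algorithm and then notify the head node through `report`. When `failed_workers` flags a node, a replacement could resume from the state returned by `load_local_checkpoint`, assuming the checkpoint directory is still reachable after the failure.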