GRID COMPUTING FRAMEWORK
ANIL HARWANI, KALPESH KAGRESHA, YASH LONDHE, GAURAV MENGHANI (Group No. 33)
Under the guidance of Ms. Sakshi Surve, Assistant Professor, Computer Engineering Department
Grid computing (or the use of computational grids) is the combination of computer resources from multiple administrative domains applied to a common task, usually to a scientific, technical or business problem that requires a great number of computer processing cycles or the need to process large amounts of data.
The primary goal of a Grid is to form a loosely coupled system of computers (clients), connected over a LAN or the Internet, that are capable of performing tasks issued by the server. Clients can join or leave the Grid at any time.
Applications & Benefits
Computationally intensive tasks such as brute-forcing over a symmetric encryption key space, simulation of natural forces, prediction of cyclones, etc.
If the problem to be solved is inherently parallel in nature, the scaling provided by Grids can yield a speed-up that is roughly proportional to the number of clients participating in the Grid.
The performance of some large Grids is comparable to that of some of the fastest supercomputers, which makes Grids a feasible, cheaper substitute.
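To illustrate why an inherently parallel problem such as the key-space brute force scales with the number of clients, here is a minimal C sketch that splits a key space into near-equal contiguous ranges, one per client. The type key_range and the function range_for_client are hypothetical names introduced for this example and are not part of the Framework.

```c
#include <stdint.h>

/* Hypothetical helper type: a half-open range [start, end) of candidate keys. */
typedef struct {
    uint64_t start;
    uint64_t end;
} key_range;

/* Split a key space of `total` keys into `clients` near-equal contiguous
 * ranges and return the range for client `index` (0-based).
 * The first (total % clients) clients receive one extra key.
 * Requires clients >= 1 and index < clients. */
key_range range_for_client(uint64_t total, uint64_t clients, uint64_t index)
{
    uint64_t base  = total / clients;   /* keys every client gets */
    uint64_t extra = total % clients;   /* leftover keys to spread out */
    key_range r;

    r.start = index * base + (index < extra ? index : extra);
    r.end   = r.start + base + (index < extra ? 1 : 0);
    return r;
}
```

Because the ranges do not overlap and no client needs another client's result, each client can search its range independently. With N clients each range is about 1/N of the key space, which is where the roughly proportional speed-up comes from.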
Setting up a grid is a complicated process, so a grid is often not considered a serious option.
Almost all grid computing middleware has a complicated structure and uses the resources of computers spread around the globe, and hence depends on the voluntary commitment of resources by unknown machines. This might not always be suitable.
Academic institutions don’t have access to easy-to-deploy grid computing middleware.
Grid Computing Framework
These concerns would be addressed in our project, Grid Computing Framework.
This Framework is a third-party application that helps the developer rapidly deploy a flexible, reliable and efficient Grid.
To create an open-source, Linux-based Grid Computing Framework that works on a moderately sized LAN and is:
Easy to Deploy
Easy to Use
Easy to Maintain
Efficient and reliable with good performance scaling
Plan of Action
Accept the problem to be solved from the user, consisting of parallel code units called Tasks, a dependency matrix of tasks, etc.
Distribute these tasks using a load-balancing algorithm, taking into consideration the inter-dependency of tasks.
Solve tasks at clients; record the output and errors (if any). Send the output and the error and performance logs to the server.
Collect outputs and logs from clients. Update client performance statistics.
Arrange the outputs as desired by the user and present them to the user.
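The distribution step above can be sketched in C as follows. This is only an illustrative sketch, not the Framework's actual scheduler. The function names, the row-major layout of the dependency matrix, and the "fewest assigned tasks" rule used as a stand-in for the load-balancing algorithm are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: dep is an n x n dependency matrix stored row by row.
 * dep[i * n + j] == true means task i cannot start until task j is done. */
bool task_is_ready(size_t i, size_t n, const bool *dep, const bool *done)
{
    for (size_t j = 0; j < n; j++) {
        if (dep[i * n + j] && !done[j])
            return false;   /* an unfinished prerequisite blocks task i */
    }
    return true;
}

/* Stand-in for a load-balancing rule (an assumption, not the project's
 * algorithm): pick the client with the fewest tasks currently assigned.
 * Ties go to the lowest index. Requires clients >= 1. */
size_t least_loaded_client(const unsigned *load, size_t clients)
{
    size_t best = 0;
    for (size_t c = 1; c < clients; c++) {
        if (load[c] < load[best])
            best = c;
    }
    return best;
}
```

A server loop built on this sketch would repeatedly scan for tasks for which task_is_ready() returns true, hand each one to the client returned by least_loaded_client(), and mark the task as done once its result arrives.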
Submission of the Problem
The user submits the Problem at the server. A problem is described using:
Problem Solving Schema (PSS)
Task File Input Set(s)
Result Compilation Program (RCP)
Division of Tasks
The server apportions tasks to the clients using a load balancing algorithm. Each Task has the following:
Execution at Client-side
The client-side module parses the tasks given to it, executes them and sends back a packet of information called the Task Execution Result. It comprises:
Task Execution Results are received by the Server and are processed by the Result Compilation Program. Finally, the following are presented to the user:
Problem Output (Generated by RCP)
Task Execution Results
Client-side State Transition Diagram
Server-side State Transition Diagram
Open Source Technologies
What is Linux?
Ubuntu - a Debian-based Linux distribution
Programming on Linux
A free software project started in 1983
Provides tools for: development (GCC), graphical desktop (GTK+), applications and utilities (GNUzilla)
Tool for writing, compiling and executing code
Supports various programming languages like C, C++, etc.
Tool for designing a GUI (Graphical User Interface)
Compiling a Single Source File
Example: source file name: main.c
gcc -c main.c (to compile main.c and create an object file)
gcc -o main1 main.o (to link the object file and create an executable file)
Both the above tasks can be done in a single step: gcc -o main1 main.c
./main1 (to run the executable file)
A running instance of a program is called a process
Two or more concurrently running tasks spawned by a process
A process and its thread(s) share the same memory space and address space
Context switching between threads is faster than between processes
Each thread in a process is identified by a thread ID
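The points above can be illustrated with a short POSIX threads sketch in C. It is an example written for these notes, not code from the project: the names worker and run_workers and the thread count are assumptions. On Linux it is typically compiled with gcc -pthread.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* arbitrary count chosen for this sketch */

/* Shared by all threads of the process: they live in one address space. */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: add `increments` to the shared counter, one step at a time. */
static void *worker(void *arg)
{
    long increments = *(long *)arg;
    for (long i = 0; i < increments; i++) {
        pthread_mutex_lock(&lock);     /* protect the shared variable */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Spawn NUM_THREADS threads that each add `increments` to the counter.
 * Returns the final counter value, or -1 if a thread could not be created. */
long run_workers(long increments)
{
    pthread_t ids[NUM_THREADS];        /* thread IDs set by pthread_create */
    size_t started = 0;

    counter = 0;
    for (; started < NUM_THREADS; started++) {
        if (pthread_create(&ids[started], NULL, worker, &increments) != 0)
            break;
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(ids[i], NULL);    /* wait for each thread by its ID */

    return started == NUM_THREADS ? counter : -1;
}
```

Every thread updates the same counter variable directly because threads share the memory of their process. The mutex is needed precisely because of that sharing. Each pthread_t value stored in ids is the thread ID used later to join that thread.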