50120140503017 2
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

50120140503017 2

on

  • 190 views

 

Statistics

Views

Total Views
190
Views on SlideShare
190
Embed Views
0

Actions

Likes
0
Downloads
1
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

50120140503017 2 Document Transcript

  • 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 148 A NEW MODEL OF A MASSIVELY PARALLEL VIRTUAL COMPUTER IN A DISTRIBUTED SYSTEM M. YOUSSFI1 , O. BOUATTANE1 , H. FAKHI1 , M.O. BENSALAH2 , B. CHERRADI1,3 1 Laboratoire SSDIA, ENSET, University Hassan II Mohammedia Casablanca, Morocco 2 FSR, University Mohammed V AGDAL Rabat, Morocco 3 CRMEF El Jadida, Morocco ABSTRACT In this paper, we present a new model of a massively parallel single Instruction Multiple Data (SIMD) structure machines in a distributed system. Among the modelled machines, we distinguish the linear, 2D, 3D meshes, pyramidal structures and GPU structure. All these computers are based physically on a multitude of fine grained processing elements (PE) arranged and coupled according to their associated topological pattern. In this model the host is represented by a distributed agent. Each virtual host agent, deployed in a physical computer, manages a local parallel virtual computer composed by a set of virtual processing elements (VPE). Each VPE is represented by a self threaded object. The distributed virtual host agents are interconnected throw a multi agent system platform. This feature allows combining the performances of a set of physical computers to build a high performance massively parallel virtual machine. The developed software is based on a hard kernel of a parallel virtual machine in which we translate all the physical properties of its different components. This kernel is based on an abstract layer which can be easily extended to the other topological structures. To implement a parallel program in the proposed platform, we have developed a new parallel programming language based on XML and its compiler which allow editing, compiling and running parallel programs. To illustrate the performance of this model, we present an example of a parallel program implementation for the edge detection of a brain MRI image presenting pathology. Keywords: Parallel Computing, Parallel Programming, SIMD, Emulation, Parallel Virtual Machine, Parallel Programming Language. INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME: www.iaeme.com/ijcet.asp Journal Impact Factor (2014): 8.5328 (Calculated by GISI) www.jifactor.com IJCET © I A E M E
  • 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 149 I. INTRODUCTION Compute-intensive applications, such as data analysis, financial forecasts or scientific simulation and applications with high volume of data as image processing, multimedia, online video gaming and the social networking, need more processing power and availability of storage resources and computing. Analysis tools, the computation methods and their technological computational models, have known a very high level of progress. This progress has oriented the scientists toward new computation strategies based on parallel approaches. Due to the large volume of data to be processed and to the large amount of computations needed to solve a given problem, the basic idea is to split tasks and data so that we can easily perform their corresponding algorithms concurrently on different physical computational units. Naturally, the use of the parallel approaches implies important data exchange between computational units. Subsequently, this generates new problems of data exchange and communications. Naturally, the obvious need allow appears new computation strategies based on parallel approaches. Compared to the sequential approaches, the parallelism is absolutely more efficient. The highlight of this paradigm is to reduce the execution time for solving a given problems, it is extensively implemented on image processing [1][2][3], on Tomography[4], and also on the research area of Multimedia Content Analysis (MMCA)[5]. The need of the new architectures and the processor efficiency improvement has been excited and encouraged by the VLSI development. As a result, we have seen the new processor technologies (e.g. Reduced Instruction Set Computer “RISC”, Transputer, Digital Signal Processor “DSP”, Cellular automata etc.) and the parallel interconnection of fine grained networks (e.g. Linear, 2-D grid of processors, pyramidal architectures, cubic and hyper cubic connection machines, etc.). Several parallel architectures were created to solve parallel problems. In the past decade some simple grids of cellular automata have been created. Due to some technological enhancements, the cellular automaton became a fine grained processing element and the resulted grid became the well known mesh connected computer MCC [6]. Using some additional communication Buses, the MCC became the mesh with multiple broadcast [7] and polymorphic torus [8]. Finally, the reconfigurable mesh computer (RMC) integrates a reconfiguration network at each processing element [9, 10]. Recently, the graphical processing unit GPU architectures [11, 12, 13] that are largely used for graphical parallel computing, are introduced as the most feasible structure to represent some parallel architectures for more complicated problems. The theorists innovate more and more by imagining new parallel machines having well adapted topologies to specific problems. These innovations lead to new algorithms for high performance calculation. Unfortunately, the technology is not able to follow the scientist’s imaginations. Due to the unavailability or to the high cost of this kind of real parallel machines, we conclude that creating emulators for these architectures is the first good way to test and validate the parallel algorithms on serial machines. In this context, the realization of an emulator for SIMD parallel machines has been the first exploited model in our research works [14], [15], [16] and [17]. This emulator has been of great use to elaborate, validate and test new parallel algorithms without need the real parallel machines. However, emulating a parallel machine on a single processor machine do not introduce any enhancement in terms of performance at runtime. It is therefore necessary to look for other ways to achieve these emulated parallel applications in a real environment that offers a high performance gain. The parallel and distributed systems have become a natural solution for different types of environments [18]. In such systems many middleware have been designed and used to build grid computing. Subsequently, new challenges appear, such as load balancing and fault tolerances. The quality and performance of a grid computing depend on the performance of the machines used in its nodes, the interconnection networks and mainly the quality of the associated middleware. Researchers intensify their efforts to find new solutions and new high-performance middleware
  • 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 150 having new innovative features to improve the performance of parallel virtual machines based on grid computing. In this paper, we present a new model to describe and emulate a polymorphic massively parallel single Instruction Multiple Data (SIMD) structure machines in a distributed system. Among the modelled machines, we distinguish the linear, 2D, 3D meshes, pyramidal structures and GPU structure. All these computers are based physically on a multitude of fine grained processing elements (PE) arranged and coupled according to their associated topological pattern. In this model the host is represented by a distributed agent. Each virtual host agent, deployed in a physical computer, manages a local parallel virtual computer composed by a set of virtual processing elements (VPE). Each VPE is represented by a self threaded object. The distributed virtual host agents are interconnected throw a multi agent system platform. This feature allows combining the performances of a set of physical computers to build a high performance massively parallel virtual machine. The developed software is based on a hard kernel of a parallel virtual machine in which we translate all the physical properties of its different components. This kernel can be easily extended to the other mentioned topological structures. To implement a parallel program in the proposed platform, we have developed a new parallel programming language based on XML and its compiler which allow editing, compiling and running parallel programs. The rest of this paper is organized as follows. In Section 2, we will present the proposed massively parallel computational model. In the section 3, we present the architecture and the design this model. In the newt section, we illustrate the performance of this model by presenting an example of a parallel program implementation for the edge detection of a brain MRI image presenting pathology. Finally, the last section gives some concluding remarks and some exploiting perspectives. II. THE PROPOSED PARALLEL COMPUTATIONAL MODEL A. The massively virtual computer structure As it mentioned previously, one of the clear trends of high performance is the Parallel Computing, aimed to substitute and distribute task and data between elementary computers. Consequently, computer architects have always strived to increase the performance of their computer architectures, allowing appears several architectures in scientific literature such as the linear, 2D, 3D meshes, pyramidal structures and GPU structure. In this work, our virtual Reconfigurable Parallel Computer (PRC) is based on the reconfigurable 3D mesh easily extensible to the other mentioned architectures. A 3D Reconfigurable Mesh Computer (3D-RMC) of size M x N x K, is a massively parallel machine having M x N x K Processing elements (PEs) arranged on a 3-D matrix. The figure 1 shows a 3-D MCR of size 3 x 3 x 3. It is a Single Instruction Multiple Data (SIMD) structure, in which each PE (i, j, k) is localized in Cartesian coordinates (i, j, k) and has an identifier defined by ID = M x i + N x j+ k. A 2D RMC of size n x n has a finite number of registers of size (log2 n) bits. The PEs can carry out arithmetic and logical operations. They can also carry out reconfiguration operations to exchange data over the mesh. The PEs can use a shared a memory. Each PE is a self Thread autonomous component. All the PEs are managed by a virtual host manager represented by a distributed agent that is designed to load parallel program and global data to distribute them over the mesh of PEs for any execution. A virtual host agent can communicate with other host agents deployed in the distributed system. This feature gives the possibility to combine the performances of a set of distributed physical computers in order to build a massively parallel virtual machine.
  • 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 151 Figure 1: A Virtual host Agent representing a 3D Reconfigurable Mesh Computer of size 3 x 3 x 3 B. Processing Element (PE) Model In this part, we have translated all the physical components of a PE as shown in figure 2, and its functionalities to the virtual processor model. Thus, the state of the virtual PE is defined by the following components: • Identifier registers (iReg, jReg, kReg): Used to identify the PE in virtual PE topology. • Data registers (Reg[]): Used to load the single data to be performed by the PE. For example, the data may be a set of pixels representing a part of an image to be processed by the virtual computer. • Flags: Each of its flag bits will indicate the PE state related to any instruction. • Communication Ports: Defined variables: p1, p2, …, pn. • ALUnit: To perform logical and arithmetical operations, • Other components: Notice that the PRC machine emulator is an open system. It can be dynamically and easily extended in terms of register memories, communication bus width, and special functions according to the problem in query. In such cases of special advanced programming applications, some algorithms require special additional registers. The programmer has the ability to extend easily its computational model. The figure 2 shows a simplified model of two connected virtual processing elements used in our model. Figure 2. The main virtual PE components Shared Memory PE 000 H O S T P7 P5 P6 P4 P3 P2 P1 P0 iReg iReg iReg ALU Reg[0] Reg[1] Reg[..] Flags Bridge idReg Other PE 000 P7 P5 P6 P4 P3 P2 P1 P0 iReg iReg iReg ALU Reg[0] Reg[1] Reg[..] Flags Bridge idReg Other PE 010
  • 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 152 The behavior of the virtual VE is defined by the following operations: • Basic PE operations Like any processor, each processing element (PE) of the RMC can execute a set of instructions relating to the arithmetic and logical operations. The operands concerned can be the local data of a PE or the data arising on its communication channels (Bit-Model or Word-model) after any data exchange operation between the PEs. The PE can be configured as a marked PE to participate in performing parallel operations. The PE can also carry out the bridge configurations in order to establish connections between two or more communication channels. When the PE of the RMC is in a Simple Bridge (SB) state, it establishes connections between two of its communication channels. This PE can connect itself to each bit of its channels, either in transmitting mode, or in receiving mode, as it can be isolated from some of its bits (i.e. neither transmitter, nor receiver). A PE is in a Double Bridge (DB) state when it carries out a configuration having two independent buses. A PE is in Crossed Bridge (CB) State when it connects all its active communication channels in only one; each bit with its correspondent. This operation is generally used when we want to transmit information to a set of PEs at the same time. III. ARCHITECTURE AND DESIGN OF THE PROPOSED MODEL The proposed parallel virtual machine is represented by a set of distributed agents. Each agent represents the host of its local virtual machine. Each host has a set of virtual processors arranged according to the topology of the virtual machine (2D, 3D pyramidal, etc...). The virtual PE is a self threaded objet which is ready to run the basic instructions of the parallel program broadcasted by the virtual host. In the proposed model, we have defined an abstract layer of the virtual machine which represents the core of this framework. We have defined some concrete implementations of the virtual machine, but the programmer can add other implementations extending the features of the model. The UML class diagram of Figure 3 shows the main components of the proposed model. In the realized framework, the parallel programmers must edit or open an edited parallel program and compile it before its execution. In the developed emulator, we have proposed a new parallel programming language based on XML language according to a specific developed XML schema. Figure 3: The UML class diagram of the main components of the parallel virtual computer In figure 4, we represent the architecture of the parallel virtual computer. In this architecture, the used components are configured using an xml configuration file. The parallel xml program is Agent AbstractParallelVirtualComputer Mesh3 GPU ALU Port * * Other Thread VPEImpl OtherVPEImpl AbstractVirtualPE
  • 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 ISSN 0976 - 6375(Online), Volume 5, Issue parsed and compiled to a parallel object program. This later is executed by the host agent by broadcasting and synchronizing data and instructions flows to the marked PEs of the parallel virtual machine. Figure 4: Reconfigurable Parallel Computer architecture A parallel program loaded in the platform is composed by a set of instructions. Each instruction is defined by zero a set attributes. An instruction contains other child instructions. The UML class diagram of the figure 5 represents the structure of parallel object program. Figure 5: The UML class diagram of the XML Parallel progra IV. APPLICATION In this section, we present an example of a parallel algorithm implementation for contour detection of a gray levelled image using Sobel operator [ The Sobel operator is used in image processing, particularly within edge detection algor Technically, it is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Sobel operator is either the corresponding gradient vector or the norm the image with a small, separable, and integer valued filter in horizontal and vertical direction and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient approximation that it produces is relatively crude, in particular for high frequency variations in the image. The operator uses two 3×3 kernels which are convolved with the original image to calculate approximations of the derivatives: one for horizontal c the source image, and Gx and Gy are two images which at each point contain the horizontal and vertical derivative approximations, the computations are as follows [20] ParallelProgram -fileName : String +compile () :void +run() Computer Engineering and Technology (IJCET), ISSN 0976 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 153 parsed and compiled to a parallel object program. This later is executed by the host agent by ing and synchronizing data and instructions flows to the marked PEs of the parallel virtual Reconfigurable Parallel Computer architecture A parallel program loaded in the platform is composed by a set of instructions. Each ruction is defined by zero a set attributes. An instruction contains other child instructions. The UML class diagram of the figure 5 represents the structure of parallel object program. The UML class diagram of the XML Parallel program In this section, we present an example of a parallel algorithm implementation for contour detection of a gray levelled image using Sobel operator [19]. The Sobel operator is used in image processing, particularly within edge detection algor Technically, it is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Sobel operator is either the corresponding gradient vector or the norm of this vector. The Sobel operator is based on convolving the image with a small, separable, and integer valued filter in horizontal and vertical direction and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient proximation that it produces is relatively crude, in particular for high frequency variations in the The operator uses two 3×3 kernels which are convolved with the original image to calculate approximations of the derivatives: one for horizontal changes, and one for vertical. If we define A as the source image, and Gx and Gy are two images which at each point contain the horizontal and vertical derivative approximations, the computations are as follows [20]: Instruction - name : String +execute () :void * * Attribute - name : String - value : String +execute () :void * Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), © IAEME parsed and compiled to a parallel object program. This later is executed by the host agent by ing and synchronizing data and instructions flows to the marked PEs of the parallel virtual A parallel program loaded in the platform is composed by a set of instructions. Each ruction is defined by zero a set attributes. An instruction contains other child instructions. The UML class diagram of the figure 5 represents the structure of parallel object program. In this section, we present an example of a parallel algorithm implementation for contour The Sobel operator is used in image processing, particularly within edge detection algorithms. Technically, it is a discrete differentiation operator, computing an approximation of the gradient of the image intensity function. At each point in the image, the result of the Sobel operator is either the of this vector. The Sobel operator is based on convolving the image with a small, separable, and integer valued filter in horizontal and vertical direction and is therefore relatively inexpensive in terms of computations. On the other hand, the gradient proximation that it produces is relatively crude, in particular for high frequency variations in the The operator uses two 3×3 kernels which are convolved with the original image to calculate hanges, and one for vertical. If we define A as the source image, and Gx and Gy are two images which at each point contain the horizontal and Attribute : String : String :void
  • 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 154 Gx ൌ ൥ െ1 0 ൅1 െ2 0 ൅2 െ1 0 ൅1 ൩ ‫כ‬ A and Gy ൌ ൥ ൅1 ൅2 ൅1 0 0 0 െ1 െ2 െ1 ൩ ‫כ‬ A Where * here denotes the 2-dimensional convolution operation. The x-coordinate is defined here as increasing in the right direction, and the y-coordinate is defined as increasing in the down direction. At each point in the image, the resulting gradient approximations can be combined to give the gradient magnitude, using: G ൌ ඥGxଶ ൅ Gyଶ In the sequential algorithm the complexity of this algorithm is evaluated to N. N represent the pixel count of the image. We will show that in this parallel implementation the complexity is Ɵ One time. The structure of a parallel program implemented in the defined XML language is described as bellow: <parallelProgram> <loadImage file="brain1.jpg" reg="0" codage="8"/> <markAllPEs/> <exchangeData reg="0" /> <compute expression="reg[9]=Math.abs(-reg[8]-2*reg[1]- reg[2]+reg[4]+2*reg[3]+reg[6])"/> <compute expression="reg[10]=Math.abs(reg[8]+2*reg[7]-reg[2]+reg[4]-2*reg[5]- reg[6])"/> <compute expression="reg[1]=reg[9]+reg[10]"/> </parallelProgram> • Step 1 : Loading the image the massively virtual machine : <loadImage file="brain1.jpg" reg="0" codage="8"/> As shown in figure 6, in this step, the gray level value of each pixel of the image is loaded in the register 0 of each PE of the virtual computer. Figure 6: Data registers contents of 9 PEs after loading the image 20 0 0 00 PE[i,j] 0 00 0 11 0 0 00 PE[i-1,j] 0 00 0 4 0 0 00 PE[i,j+1] 0 00 0 33 0 0 00 PE[i-1,j+1] 0 00 0 8 0 0 00 PE[i+1,j+1] 0 00 0 44 0 0 00 PE[i+1,j] 0 00 0 22 0 0 00 PE[i+1,j-1] 0 00 0 2 0 0 00 PE[i,j-1] 0 00 0 10 0 0 00 PE[-1,j-1] 0 00 0
  • 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 155 • Step 2 : Marking all PEs: The statement <markAllPEs/> is used by the host agent to mark all the virtual PEs which are designated to participate to the parallel computing. • Step 3 : Exchanging data between the PE and its 8 neighbours: <exchangeData reg="0"/> As shown in figure 7, in this step each PE will exchange the data stored in register 0 with its 8 neighbours, if they exist. The received data are stored in other registers reg[1], reg[2], reg[3], reg[4], reg[5], reg[6], reg[7] and reg[8]. Figure 7: Data registers contents of 9 PEs after the exchanging data operation • Step 3 : Applying Sobel operator : Each PE compute Sobel operator Gx, Gy and G, then save them, respectively, in reg[9], reg[10] and reg[1]. <compute expression="reg[9]=Math.abs(-reg[8]-2*reg[1] reg[2] + reg[4]+2*reg[3]+reg[6])"/> <compute expression="reg[10]=Math.abs(reg[8]+2*reg[7]-reg[2] +reg[4]-2*reg[5]-reg[6])"/> <compute expression ="reg[1]=Math.sqrt(reg[9]*reg[9]+reg[10] *reg[10])"/> Figure 8 shows the parallel program result of a component contour detection of a gray levelled image using Sobel operator. The image in figure 8.a represents the input image representing a brain magnetic resonance image. The resulted output image after the parallel Sobel algorithm is shown in figure 8.b. 20 44 11 42 PE[i,j] 33 822 10 11 20 0 3310 PE[i-1,j] 0 42 0 4 8 0 020 PE[i,j+1] 0 044 0 33 4 0 011 PE[i-1,j+1] 0 020 0 8 0 4 044 PE[i+1,j+1] 0 00 20 44 0 20 822 PE[i+1,j] 4 00 2 22 0 2 440 PE[i+1,j-1] 20 00 0 2 22 10 200 PE[i,j-1] 11 440 0 10 2 0 110 PE[-1,j-1] 0 200 0
  • 9. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 156 Figure 8: Results of the parallel program for contour detection using parallel Sobel operator V. CONCLUSION In this paper, we have presented a new model of a polymorphic massively parallel single Instruction Multiple Data (SIMD) structure machines in a distributed system. Among the modelled machines, we distinguish the linear, 2D, 3D meshes, pyramidal structures and GPU structure. In this model the host is represented by a distributed agent. Each virtual host agent, deployed in a physical computer, manages a local parallel virtual computer composed by a set of virtual processing elements (VPE). Each VPE is represented by a self threaded object. The distributed virtual host agents are interconnected throw a multi agent system platform. The obtained parallel virtual machine and its programming language compiler can be used as a high performance computing system. Actually, we have used this tool to resolve problems related to image processing domain using parallel approach and specially SIMD architecture. However, this platform can be used to resolve massively parallel applications related to other domains. In this model, we have proposed an abstract layer representing the core of the framework to allow other developers adding new extensions for new parallel virtual machine structures. Using a new parallel programming language based on XML is a good solution for this platform. However, we are working to develop other modules which allow writing parallel programs using the scientific programming languages like C++, Java or Fortran. REFERENCES [1]. AWG Duller, RH Storer, AR Thomson, EL Dagless « An associative processor array for image processing» Image and Vision Computing Volume 7, Issue 2, May 1989, Pages 151–158. [2]. O.Bouattane, J.Elmesbahi, and A.Rami, « A fast parrallel algorithm for convex hull problem of multi-leveled images » journal of Intelligent and Robotic Systems, Kluwer academic Publisher, Vol.33, pp. 285-299, 2002. [3]. A.Errami, M.Khaldoun, J.Elmesbahi, and O.Bouattane, « O(1) time algorithm for structural characterization of multi-leveled images and its applications on a re-configurable mesh computer » journal of Intelligent and Robotic Systems, Springer, Vol .44, pp. 277-290, 2006. [4]. F.Vázquez, E.M.Garzón, J.J.Fernández « A matrix approach to tomographic reconstruction and its implementation on GPUs » Journal of Structural Biology Volume 170, Issue 1, April 2010, pp. 146–151. a) b)
  • 10. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 3, March (2014), pp. 148-157 © IAEME 157 [5]. Timo van Kessela, Ben van Werkhovena, Niels Drostb, Jason Maassenb, Henri E. Bala, Frank J. Seinstrab, « User transparent data and task parallel multimedia computing with Pyxis-DT » Future Generation Computer Systems Volume 29, Issue 8, October 2013, Pages 2252–2261 Including Special sections: Advanced Cloud Monitoring Systems & The fourth IEEE International Conference on e-Science 2011 — e-Science Applications and Tools & Cluster, Grid, and Cloud Computing. [6]. R. Miller. et al, “Geometric algorithms for digitized pictures on a mesh connected computer,” IEEE Transactions on PAMI, Vol. 7, No. 2, pp. 216–228, 1985. [7]. V. K. Prasanna and D. I. Reisis, “Image computation on meshes with multiple broadcast,” IEEE Transactions on PAMI, Vol. 11, No. 11, pp. 1194–1201, 1989. [8]. H. LI, M. Maresca, “Polymorphic torus network,” IEEE Transaction on Computer, Vol. C-38, No. 9, pp. 1345– 1351, 1989. [9]. T. Hayachi, K. Nakano, and S. Olariu, “An O ((log log n)²) time algorithm to compute the convex hull of sorted points on re-configurable meshes,” IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 12, 1167– 1179, 1998. [10]. R. Miller, V. K. Prasanna-Kummar, D. I. Reisis, and Q. F. Stout, “Parallel computation on re- configurable meshes,” IEEE Transactions on Computer, Vol. 42, No. 6, pp. 678–692, 1993. [11]. Yue Zhao, Francis C.M. Lau, "Implementation of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 663-672, March 2014, doi:10.1109/TPDS.2013.52 [12]. Jing Wu, Joseph JaJa, Elias Balaras, "An Optimized FFT-Based Direct Poisson Solver on CUDA GPUs," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 3, pp. 550-559, March 2014, doi:10.1109/TPDS.2013.53 [13]. Abid Rafique, George A. Constantinides, Nachiket Kapre, "Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs," IEEE Transactions on Parallel and Distributed Systems, 28 Feb. 2014. IEEE computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TPDS.2014.6> [14]. M. Youssfi, O. Bouattane and M. O. Bensalah “A Massively Parallel Re-Configurable Mesh Computer Emulator: Design, Modeling and Realization” J. Software Engineering & Applications, 2010, 3: 11-26 doi:10.4236/jsea.2010.31002 Published Online January 2010. [15]. Bouattane, B. Cherradi, M. Youssfi and M.O. Bensalah « Parallel c-means algorithm for image segmentation on a reconfigurable mesh computer » ELSEVIER. Parallel computing, 37 (2011), pp 230-243. [16]. M. Youssfi, O. Bouattane, and M.O. Bensalah “ On the Object Modelling of the Massively Parallel Architecture Computers” Proceedings of the IASTED Inter. Conf. Software engineering, February 16 - 18, 2010, Innsbruck, AUSTRIA. Pp 71-78. [17]. M. Youssfi, O. Bouattane, and M.O. Bensalah “Parallelization of the local image processing operators. Application on the emulating framework of the reconfigurable mesh connected computer”. Proceeding of scientists meeting in Information Technology and Communication JOSTIC. Rabat, November 3-4, 2008. pp 81-83. [18]. C. Kesselman, S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” International Journal of High Performance Computing Applications, vol. 15, issue3, pp 200–222, 2001. [19]. SOBEL, I., An Isotropic 3×3 Gradient Operator, Machine Vision for Three – Dimensional Scenes, Freeman, H., Academic Pres, NY, 376-379, 1990 [20]. R. Gonzalez and R. Woods Digital Image Processing, Addison Wesley, 1992, pp 414 – 428. [21]. Arup Kumar Bhattacharjee and Soumen Mukherjee, “Object Oriented Design for Sequential and Parallel Software Components”, International Journal of Information Technology and Management Information Systems (IJITMIS), Volume 1, Issue 1, 2010, pp. 32 - 44, ISSN Print: 0976 – 6405, ISSN Online: 0976 – 6413.