An Opportunistic Storage System for UnaGrid

UnaGrid is an opportunistic virtual grid infrastructure that takes advantage of the idle processing capabilities of conventional desktop machines in computer rooms through the use of Customizable Processing Virtual Clusters (CPVCs); these capabilities are used in the development of e-Science projects. This paper presents the design, implementation, and assessment of a virtual storage system that allows UnaGrid to simultaneously take advantage of the storage and processing capabilities available in tens or hundreds of desktop machines. Initial tests show that this system attains large storage capacity at low cost, with better performance than a dedicated NFS-NAS solution.


    Presentation Transcript

    • An Opportunistic Storage System for UnaGrid. Harold Castro, Mario Villamizar, Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
    • Introduction: Grid Computing encompasses Service Grids and Opportunistic Grids.
    • Introduction: opportunistic grid projects include GIMPS, Distributed.Net, SETI@home, BOINC, Condor, XtremWeb, Javelin, Bayanihan, Entropia, OurGrid, InteGrade, and UnaGrid.
    • Project: Campus Grid Uniandes (UnaGrid). Goals: take advantage of the idle processing capabilities available in conventional computer labs, and support the development of e-Science projects.
    • UnaGrid. [Figure: a Processing Virtual Machine (X cores, Linux) on a physical machine of a computer room; (a) when an end user is using the physical machine, (b) when there is no end user.] A Processing Virtual Machine (PVM) is executed on each computer of a lab, running in the background as a low-priority process. The PVM executes transparently while users carry out their daily activities.
    • UnaGrid – Customizable Processing Virtual Cluster (CPVC). A CPVC is composed of the PVMs executed on each computer of a lab (the cluster slaves) and a dedicated machine outside the computer lab (the cluster master). Each research group can define its own CPVCs with custom application environments (middleware, applications, etc.).
    • UnaGrid – An Opportunistic Grid Infrastructure
    • UnaGrid – Current Storage System. A dedicated NFS-NAS storage solution has been used, in which all CPVCs store their data.
    • Problem and Motivation. Disk space is available but unused in the computer labs. The goal is a strategy to implement a virtual distributed storage system that takes advantage of these idle storage capabilities, remains transparent to users and applications, and provides new storage capacity to the UnaGrid infrastructure.
    • Possible Solutions. Build a new file system or opportunistic system, use an existing opportunistic system, or use a distributed or parallel file system. The UnaGrid requirements call for another approach.
    • UnaGrid Requirements. The desktop machines in the computer labs run Windows, Linux, or Mac as their base operating system, while the CPVCs operate with Linux. The virtual distributed storage system must therefore be executed from Windows, Linux, or Mac desktops and used from Linux CPVCs. Solution: virtualization technologies.
    • Strategy Proposed. [Figure: a Customizable Storage Virtual Cluster (CSVC) built from storage servers of n gigabytes each in a computer lab, with a metadata server on a computer outside the lab.] A VM is executed on each computer of a lab and operates as a storage server. An additional VM is needed as a metadata server.
    • Two Virtual Machines on Each Computer. [Figure: a storage VM (X gigabytes, Linux) and a processing VM (X cores, Linux) on a Windows physical lab machine; (a) when no user is using the physical machine, (b) when a user is.] Open issues: the intrusion level on the end user, the priorities and resources assigned to the VMs, and the resource competition between the VMs.
    • Solution Strategy. Define one virtual storage cluster per computer lab, executed concurrently with the CPVCs, to take advantage of the idle processing and storage capabilities of each computer.
    • Solution Strategy. Any opportunistic system or distributed file system may be executed on a Customizable Storage Virtual Cluster (CSVC); this strategy must be validated. Current opportunistic solutions do not meet the UnaGrid requirements, so parallel and distributed file systems are used to validate the proposed strategy.
    • Methodology. Three aspects were evaluated: the intrusion level on the end user, the resource competition between virtual machines, and the performance of the proposed strategy.
    • Intrusion level on the end user. Several tests were conducted to determine the best resource assignment to the two virtual machines (storage VM and processing VM) so that they execute in a non-intrusive manner, varying: the VMs executed in the Windows background, the resources assigned to the VMs, the tasks executed by the end user, and the tasks executed by the two virtual machines.
    • Intrusion level on the end user. Four types of tests were executed while the end user runs one or two intensive processing tasks, or one or two intensive storage tasks, with a processing VM (1 or 2 cores) and a storage VM also active. We configured 8 execution environments.
    • Intrusion level on the end user. Results when the end user executes one intensive processing task. Evaluation of the performance degradation (average execution time over 4 tests, in seconds, per workload size):

      ID | Processing VM (cores / tasks / RAM) | Storage VM (cores / tasks / RAM) | 100000 | 200000 | 300000 | 400000 | 500000
      A1 | not running                         | not running                      |  22.07 |  44.09 |  66.14 |  88.20 | 110.24
      A2 | 1 / 1 task / 1 GB                   | not running                      |  22.15 |  44.27 |  66.41 |  88.53 | 110.66
      A3 | 2 / 2 tasks / 1 GB                  | not running                      |  22.19 |  44.32 |  66.48 |  88.65 | 110.82
      A4 | not running                         | 1 / 1 task / 1 GB                |  22.14 |  44.27 |  66.39 |  88.54 | 110.66
      A5 | not running                         | 2 / 2 tasks / 1 GB               |  22.21 |  44.38 |  66.56 |  88.73 | 110.94
      A6 | 1 / 1 task / 1 GB                   | 1 / 1 task / 1 GB                |  22.15 |  44.29 |  66.42 |  88.67 | 110.70
      A7 | 2 / 2 tasks / 1 GB                  | 1 / 1 task / 1 GB                |  22.19 |  44.35 |  66.52 |  88.71 | 110.89
      A8 | 2 / 2 tasks / 1 GB                  | 2 / 2 tasks / 1 GB               |  22.18 |  44.37 |  66.55 |  88.71 | 110.89
      Maximum performance degradation (%):                                           0.65     0.66     0.64     0.61     0.63
    • Intrusion level on the end user. The execution of the two virtual machines in the background decreases the QoS perceived by the end user by less than 4%: one intensive processing task, 0.66%; two intensive processing tasks, 1.24%; one intensive storage task, 2.45%; two intensive storage tasks, 3.35%. We executed 640 tests using an application called UnaGridLoadSimulator.
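    The degradation percentages can be recomputed from the timing table on the previous slide with simple arithmetic; a minimal sketch using the 500,000-workload column:

    ```python
    # Compute the QoS degradation perceived by the end user when the two
    # background VMs run alongside an intensive processing task.
    # Times (seconds) are the averages reported for the 500000 workload.
    baseline = 110.24        # A1: no virtual machine running
    with_vms = {
        "A2": 110.66, "A3": 110.82, "A4": 110.66,
        "A5": 110.94, "A6": 110.70, "A7": 110.89, "A8": 110.89,
    }

    def degradation_pct(loaded, base):
        """Percentage slowdown relative to the idle baseline."""
        return (loaded - base) / base * 100

    worst = max(degradation_pct(t, baseline) for t in with_vms.values())
    print(f"maximum degradation: {worst:.2f}%")  # ~0.63%, matching the table
    ```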
    • Resource competition between virtual machines. [Chart: % CPU usage over time for the end-user process, the processing VM, and the storage VM.] The VMs only use the processing capacity not used by the end user, and that capacity is divided equitably between the two VMs.
    • Performance evaluation of the proposed strategy. We evaluated 4 distributed file systems and the NFS-NAS solution, on a computer lab with 31 computers. 62 VMs were configured for each file system, with Condor as the scheduler.
    • Performance evaluation of the proposed strategy.

      File system | Version  | Operating System | Aggregated Capacity
      PVFS        | 2.8.1    | Debian 4.0       | 344 GB
      Gfarm       | 2.3.0    | Debian 4.0       | 346 GB
      Lustre      | 1.8.1    | RHEL 5           | 300 GB
      GPFS        | 3.2.1-13 | RHEL 5           | 300 GB
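    As a rough illustration of what each desktop contributes (assuming the aggregated capacity is spread evenly over the 31 lab machines, which the slides do not state explicitly):

    ```python
    # Estimate the disk space each storage virtual machine contributes,
    # assuming the aggregated capacity is divided evenly over 31 machines.
    lab_machines = 31
    aggregated_gb = {"PVFS": 344, "Gfarm": 346, "Lustre": 300, "GPFS": 300}

    per_node = {fs: gb / lab_machines for fs, gb in aggregated_gb.items()}
    for fs, gb in per_node.items():
        print(f"{fs}: ~{gb:.1f} GB per storage VM")
    ```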
    • Performance Evaluation – From one client. [Charts: write and read bandwidth (MB/s) versus I/O request size (KB) for NFS, PVFS, Gfarm, Lustre, and GPFS.] The performance of the file systems was tested from one client (one PVM of the CPVC), varying the size of the I/O requests on a 1 GB file. We used the IOzone tool.
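    The methodology (fixed file size, varying I/O request size) can be mimicked with a toy sequential-write benchmark; this is an illustrative sketch, not IOzone itself, and the file size is shrunk so it runs quickly:

    ```python
    import os
    import tempfile
    import time

    # Toy analogue of the IOzone methodology: write a fixed-size file with
    # different I/O request sizes and report the resulting bandwidth.
    # (The presentation used the real IOzone tool on a 1 GB file; 16 MB here.)
    FILE_SIZE = 16 * 1024 * 1024

    def write_bandwidth(request_kb):
        """Sequential write bandwidth (MB/s) for a given request size."""
        block = b"\0" * (request_kb * 1024)
        with tempfile.NamedTemporaryFile(delete=False) as f:
            start = time.perf_counter()
            written = 0
            while written < FILE_SIZE:
                f.write(block)
                written += len(block)
            f.flush()
            os.fsync(f.fileno())          # force data to disk before timing stops
            elapsed = time.perf_counter() - start
        os.unlink(f.name)
        return written / elapsed / (1024 * 1024)

    for kb in (4, 64, 1024):
        print(f"{kb:5d} KB requests: {write_bandwidth(kb):8.1f} MB/s")
    ```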
    • Performance Evaluation – Read from several clients. [Chart: average aggregate read rate (MB/s) versus number of concurrent clients, from 1 to 30, for NFS, PVFS, Gfarm, Lustre, and GPFS.] As the number of clients increases, the average bandwidth per client decreases: with more file system clients (PVMs) there is a higher probability that the two VMs executed on a given physical machine operate simultaneously as a client (PVM) and a server (SVM) of the file system.
    • Performance Evaluation – Read from several clients. [Chart: aggregate read rate (MB/s) versus number of concurrent clients.] As the number of clients increases, the global performance of the file systems also goes up: GPFS = 580.79 MB/s, Lustre = 425.17 MB/s, Gfarm = 310.88 MB/s, PVFS = 244.73 MB/s, NFS = 18.61 MB/s.
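    Assuming the quoted aggregates correspond to the 30-client runs (the largest point on the chart; the slide does not say so explicitly), the average per-client read bandwidth follows directly:

    ```python
    # Average per-client read bandwidth, derived from the aggregate rates
    # quoted on the slide and an assumed 30 concurrent clients.
    aggregate_read = {"GPFS": 580.79, "Lustre": 425.17,
                      "Gfarm": 310.88, "PVFS": 244.73, "NFS": 18.61}
    clients = 30

    per_client = {fs: rate / clients for fs, rate in aggregate_read.items()}
    for fs, rate in per_client.items():
        print(f"{fs}: {rate:.2f} MB/s per client")
    ```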
    • Performance Evaluation – Write from several clients. [Chart: average aggregate write rate (MB/s) versus number of concurrent clients for NFS, PVFS, Gfarm, Lustre, and GPFS.] As the number of clients increases, the average bandwidth per client decreases.
    • Performance Evaluation – Write from several clients. [Chart: aggregate write rate (MB/s) versus number of concurrent clients.] Global performance for the file systems increases up to a certain number of clients (15 or 18) and then begins to decrease. Lustre = 270.01 MB/s, GPFS = 246.88 MB/s, Gfarm = 211.47 MB/s, PVFS = 116.15 MB/s, NFS = 5.93 MB/s.
    • Performance Evaluation Analysis. When the CPVCs execute intensive processing tasks, the performance of the file systems (CSVCs) is affected by less than 4%. With a CSVC it is possible to achieve read bandwidths of 4.5 Gbps and write bandwidths of 2.2 Gbps. Several terabytes may be aggregated through the proposed strategy. With a CSVC, bandwidths higher than 1 Gbps are attained without laying new cabling.
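    The Gbps figures can be cross-checked against the peak aggregate rates reported on the earlier slides; a minimal sketch (treating MB as 10^6 bytes, an assumption):

    ```python
    # Sanity-check the Gbps claims against the peak aggregate rates reported
    # earlier (MB/s * 8 bits / 1000 = Gbps, treating MB as 10^6 bytes).
    def mbs_to_gbps(mb_per_s):
        return mb_per_s * 8 / 1000

    peak_read_mbs = 580.79    # GPFS, read, many concurrent clients
    peak_write_mbs = 270.01   # Lustre, write, at its peak client count

    print(f"read:  {mbs_to_gbps(peak_read_mbs):.2f} Gbps")   # ~4.6 Gbps
    print(f"write: {mbs_to_gbps(peak_write_mbs):.2f} Gbps")  # ~2.2 Gbps
    ```

    Both values land close to the 4.5 Gbps read and 2.2 Gbps write figures quoted on the slide.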
    • Conclusions. Using CPVCs and CSVCs in computer labs concurrently and transparently allows taking advantage of unused processing and storage capacity. Hundreds of processing cores and several terabytes may be aggregated through the proposed strategy for the development of e-Science projects. The strategy allows customizing the tools, middleware, applications, and configurations of the CPVCs and CSVCs, guaranteeing the usability of the UnaGrid infrastructure.
    • Future Work. Assessment of the proposed strategy in a production environment, and of its scalability. Assessment of the performance of applications that use CSVCs. The use of redundancy policies and mechanisms in the CSVCs. The use of strategies for data placement. Performance evaluation with other opportunistic systems and file systems.
    • Thanks for your attention! Questions?
    • UnaGrid – Implementation. Three computer rooms (with 35 computers each). Core 2 Duo processors and 4 GB of RAM. Three CPVCs. Condor, VMware, Globus.
    • A new file system or opportunistic system would require a long development time: data and metadata distribution, metadata management, cache management, implementation (kernel or user space), storage media, communication protocols, user management, scalability, POSIX semantics, replication tools, and more.
    • Related Work – Opportunistic Systems: FarSite, FreeLoader, OppStore, and Desktop Data Grid (DDG). What features must the UnaGrid storage system have?
    • Related Work – Opportunistic Systems.

      Property / System                    | Farsite | FreeLoader | OppStore | DDG
      Application modification required    | no      | yes        | yes      | no
      Operation in Linux environments      | no      | yes        | yes      | yes
      Natively integrated with the OS      | yes     | no         | no       | yes
      Data redundancy support by software  | yes     | yes        | yes      | yes
      Installation on PC desktops          | yes     | yes        | yes      | yes
      Non-intrusive operation              | no      | yes        | yes      | yes
      Available for installation           | no      | no         | yes      | no
    • Related Work – Opportunistic Systems (continued).

      Property / System                 | Farsite | FreeLoader | OppStore | DDG
      Designed for HPC                  | no      | yes        | yes      | yes
      Security mechanisms               | yes     | no         | NA       | NA
      License type                      | Pr      | OS         | OS       | OS
      Data striping support             | no      | yes        | yes      | no
      Model for metadata management     | P2P     | C/S        | C/S      | P2P
      Model for file/fragment transfer  | P2P     | P2P        | P2P      | P2P

      (Pr = proprietary, OS = open source, C/S = client/server)
    • Related Work – Distributed File Systems. [Figure: data and data/metadata servers connected to clients over a network.] Parallel Virtual File System (PVFS), General Parallel File System (GPFS), Grid Datafarm (Gfarm), and Lustre (Sun Microsystems).
    • Related Work – Distributed File Systems.

      Property / System                    | Lustre | Gfarm | GPFS | PVFS
      Application modification required    | no     | no    | no   | no
      Operation in Linux environments      | yes    | yes   | yes  | yes
      Natively integrated with the OS      | yes    | yes   | yes  | yes
      Data redundancy support by software  | no     | yes   | no   | yes
      Installation on PC desktops          | yes    | yes   | yes  | yes
      Non-intrusive operation              | no     | no    | no   | no
      Available for installation           | yes    | yes   | yes  | yes
    • Related Work – Distributed File Systems (continued).

      Property / System                 | Lustre | Gfarm | GPFS | PVFS
      Designed for HPC                  | yes    | yes   | yes  | yes
      Security mechanisms               | yes    | yes   | yes  | yes
      License type                      | OS     | OS    | Pr   | OS
      Data striping support             | yes    | no    | yes  | yes
      Model for metadata management     | C/S    | C/S   | C/S  | C/S
      Model for file/fragment transfer  | P2P    | P2P   | P2P  | P2P

      (Pr = proprietary, OS = open source, C/S = client/server)