The International Journal of Engineering and Science


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

The International Journal of Engineering and Science

The International Journal of Engineering And Science (IJES) || Volume 1 || Issue 1 || Pages 13-17 || 2012 || ISSN: 2319-1813 ISBN: 2319-1805

Research of elastic management strategy for cloud storage

1 SHAO Bi-lin, 2 BIAN Gen-qing, 3 ZHU Xu-dong
1 First Author, School of Management, Xi'an University of Architecture and Technology
*2 Corresponding Author, School of Information and Control Engineering, Xi'an University of Architecture and Technology
3 School of Information and Control Engineering, Xi'an University of Architecture and Technology

Abstract

Traditional HDFS suffers from limited storage capacity, the high cost of storage and the high cost of fault recovery. Cloud storage built on HDFS and on the virtual resources of an IaaS platform can effectively solve these problems. However, it does not ensure that cloud storage uses those virtual resources efficiently. To address this, the paper proposes an elastic cloud storage framework based on HDFS and introduces the idea of virtual resource management into the framework. On the basis of this framework, it then proposes an elastic management strategy for virtual resource allocation and scheduling. The simulation experiment shows that the strategy effectively improves the efficiency with which cloud storage uses virtual resources.

Keywords: HDFS; Elastic Cloud Storage; Virtual Resource; Resource Allocation and Scheduling; Feedback Control Theory

Date of Submission: 19 October 2012   Date of Publication: 10 November 2012

I Introduction

With the development of Web 2.0 technology, and especially the popularity of social networks, traditional storage technologies (network storage or distributed storage) struggle to meet the demand for storing vast amounts of unstructured data [1]. More and more commercial websites, such as Google, Amazon and Facebook, have therefore begun to use cloud storage architectures based on HDFS, and this trend is gathering momentum.

The design of HDFS-based cloud storage derives from virtual cluster computing. This approach extends a single virtual storage server to a cluster of virtual storage servers, i.e., a Hadoop virtual cluster. A Hadoop virtual cluster combines multiple virtual storage servers so that they behave like one storage server with great computing power and high throughput, and it provides users with a transparent access interface [2].

However, existing research on and implementations of cloud storage have certain limitations. They focus excessively on high throughput and high reliability and ignore how efficiently cloud storage actually uses its virtual resources, which seriously hinders the application and adoption of cloud storage. Load-handling capability and disturbance-switching capability are important for cloud storage, but we should be even more concerned with how efficiently it uses virtual resources. Therefore, by introducing the idea of virtual resource management, this article proposes an elastic storage architecture and an elastic management strategy for virtual resources. The simulation results show that, with this strategy, cloud storage uses virtual resources more efficiently.

II HDFS on Elastic Cloud Storage

The paper takes the Slot as the unit of virtual resource scheduling. A Slot characterizes the computing capability of a virtual resource, which is determined by the resource's CPU and memory size. Let cpu denote the unit CPU capacity, taken as 1 GHz, and mem the unit memory capacity, taken as 1 GB. The number of Slots that a virtual server $P_i$ can provide is then

$$ \mathrm{Slot}_i = \min\left( \left\lfloor \frac{cpu_i}{cpu} \right\rfloor, \left\lfloor \frac{mem_i}{mem} \right\rfloor \right) \tag{1} $$
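The following minimal Python sketch illustrates equation (1). It assumes CPU sizes are given in GHz and memory sizes in GB, matching the unit capacities chosen above. The function and constant names are hypothetical.

```python
import math

# Unit capacities from the text: one Slot corresponds to 1 GHz of CPU and 1 GB of memory.
UNIT_CPU_GHZ = 1.0
UNIT_MEM_GB = 1.0


def slots_of_server(cpu_ghz: float, mem_gb: float) -> int:
    """Equation (1): Slot_i = min(floor(cpu_i / cpu), floor(mem_i / mem)).

    Whichever resource is scarcer bounds the number of Slots the server offers.
    """
    return min(math.floor(cpu_ghz / UNIT_CPU_GHZ), math.floor(mem_gb / UNIT_MEM_GB))


# A hypothetical server with 2.4 GHz of CPU and 8 GB of memory is CPU-bound.
print(slots_of_server(2.4, 8.0))  # 2
```

Rounding both ratios down means a partially available unit of CPU or memory never counts as a Slot.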
Here ⌊·⌋ denotes rounding down, $cpu_i$ is the CPU size of virtual server $P_i$, and $mem_i$ is its memory size. Suppose that at time t the HDFS load is $load_t$ and there are N virtual servers. The available virtual resources are then

$$ \mathrm{Slot}_{total} = \sum_{i=1}^{N} \mathrm{Slot}_i \tag{2} $$

and the HDFS virtual resource usage rate is

$$ u = \frac{load_t}{\mathrm{Slot}_{total}} = \frac{load_t}{\sum_{i=1}^{N} \min\left( \left\lfloor \frac{cpu_i}{cpu} \right\rfloor, \left\lfloor \frac{mem_i}{mem} \right\rfloor \right)} \tag{3} $$

If the HDFS virtual resource usage rate lies between an expected low and high water level, the resources are judged to be used effectively. The problem to be solved is therefore to determine the scheduling rule R dynamically so that, for a given load $load_t$, $0 \le u_{low} \le u \le u_{high} \le 1$.

For HDFS to schedule resources elastically according to its real-time load and resource usage rate, a flexible virtual resource management strategy must be introduced. This strategy consists of a flexible resource allocation strategy and a flexible resource scheduling strategy. However, adding a new resource management component or resource scheduling strategy to the existing HDFS would increase the complexity of HDFS. It would also add overhead and load to the HDFS manager, which would affect the overall load capacity and stability of HDFS.

This paper therefore introduces a virtual resource management platform and puts forward an elastic cloud storage system based on HDFS.

First, the HDFS cluster is encapsulated as a service on the virtual resource management platform, with each service instance representing one HDFS data node. The platform's lifetime management of service instances then provides lifetime management of HDFS data nodes. Dynamically starting, stopping or migrating a service instance is equivalent to dynamically starting, stopping or migrating an HDFS data node.

Second, the platform's virtual resource management capabilities provide a flexible virtual resource allocation strategy for HDFS data nodes. This strategy dynamically adjusts the resources pre-allocated to each Hadoop virtual cluster and allows multiple HDFS clusters to share resources.

At the same time, an encapsulated virtual resource scheduling service monitors HDFS and virtual resource load information in real time. It elastically schedules HDFS data nodes according to this information and the elastic scheduling strategy. In this way, the problems of resource waste and resource sharing are solved without changing the existing HDFS structure and functions.

Many existing virtual resource management platforms can implement this approach, such as OpenNebula [3][4] and Platform [5]. The simulation test uses Platform ISF as the virtual resource management platform.

Fig 1. Elastic Cloud Storage based on HDFS (layers, top to bottom: application programs with HDFS clients; Internet/Intranet; elastic scheduling service and Hadoop cluster services HDFS1 and HDFS2; Virtual Resource Management Platform (VM Orchestra) running service instances on VMs; Xen hypervisors and Amazon EC2; physical resource pool)

As shown in Figure 1, the elastic cloud storage based on HDFS consists of four parts:
* virtual resources;
* the virtual resource management platform (VM Orchestra);
* the Hadoop virtual cluster service;
* the Hadoop virtual cluster elastic scheduling service.

Virtual resources are the carriers of the HDFS data nodes and are managed by the virtual resource management platform.
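Equations (2) and (3) and the water-level band can be combined into a short check, sketched here in Python. The sketch makes three assumptions:
* $load_t$ is expressed in Slots, so that u is dimensionless (the paper does not state the unit of load).
* The 85% and 40% water levels from the simulation in Section V serve as defaults.
* The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

UNIT_CPU_GHZ = 1.0  # cpu: unit CPU capacity (1 GHz)
UNIT_MEM_GB = 1.0   # mem: unit memory capacity (1 GB)


def slot_total(servers: Iterable[Tuple[float, float]]) -> int:
    """Equation (2): sum of Slot_i from equation (1) over the N virtual servers."""
    return sum(
        min(math.floor(cpu / UNIT_CPU_GHZ), math.floor(mem / UNIT_MEM_GB))
        for cpu, mem in servers
    )


def usage_rate(load_t: float, servers: Iterable[Tuple[float, float]]) -> float:
    """Equation (3): u = load_t / Slot_total (load_t assumed to be in Slots)."""
    total = slot_total(servers)
    if total == 0:
        raise ValueError("no Slots available")
    return load_t / total


def is_effective(u: float, u_low: float = 0.40, u_high: float = 0.85) -> bool:
    """True when u lies inside the band 0 <= u_low <= u <= u_high <= 1."""
    return u_low <= u <= u_high


servers = [(2.0, 4.0), (4.0, 2.0), (3.0, 8.0)]  # hypothetical (GHz, GB) pairs
u = usage_rate(4.5, servers)  # Slot_total = 2 + 2 + 3 = 7
print(round(u, 3), is_effective(u))  # 0.643 True
```

A value of u above $u_{high}$ or below $u_{low}$ is what the scheduling rules R of Section 3.2 react to.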
The virtual resource management platform can deploy and run a range of services. For each service configuration, the platform can assign virtual resources and start one or more service instances.

The Hadoop virtual cluster service is called HVCS. Each HVCS service represents a particular HDFS Hadoop cluster and consists of a configuration and one or more instances. In this paper, each HVCS instance represents one HDFS data node. The HVCS configuration specifies the information HDFS needs at run time and a set of flexible resource allocation policies, namely the flexible resource allocation strategy for HDFS data nodes.

The Hadoop virtual cluster elastic scheduling service is called HVRS. HVRS adds data nodes to an HDFS cluster, removes them, or migrates them, based on the current HDFS resource information and load information. To perform this elastic scheduling of HDFS data nodes, it uses the HDFS data node lifetime control and service scheduling interface and follows the elasticity rules. HVCS resource usage is obtained through an interface provided by the virtual resource management platform, while HDFS load information is provided by HDFS itself.

III Virtual Resource Elastic Management Strategy of Elastic Cloud Storage

The virtual resource elastic management strategy of elastic cloud storage has two parts: a flexible resource allocation strategy and a flexible resource scheduling strategy. The flexible resource allocation strategy specifies a set of virtual resources for HDFS, which ensures that the objects of elastic scheduling are available. The elastic scheduling strategy elastically schedules HDFS data nodes according to virtual resource usage, so that virtual resources are used effectively.

3.1 Flexible Allocation Strategy on Virtual Resource

The virtual resource allocation strategy of elastic cloud storage consists of four strategies: reservation, sharing, borrowing and lending, and recycling.

The reservation and sharing strategies provide an HCS with a group of static exclusive resources and a group of shared resources, respectively. The reservation strategy allocates reserved resources directly to an HCS, whether or not the HCS uses them and even when its load is small. The sharing strategy provides shared resources for an HCS in two parts: shared resources belonging to the HCS, and shared resources used on overload. Shared resources are not allocated to an HCS immediately. The HCS may use them, according to its configured sharing proportion, only when it has more demand, i.e., needs more virtual nodes, and its reserved resources have been exhausted. When reserved and shared resources still cannot meet an HCS's demand for virtual nodes, the sharing strategy allows the HCS to use the idle shared resources of other HCSs.

The borrowing and lending strategy lets HCSs share their reserved resources and takes effect only when shared resources are overloaded. By borrowing and lending idle reserved resources, an HCS can make effective use of the idle reserved resources in the Hadoop cluster, which improves the overall resource usage rate.

The recycling strategy reclaims an HCS's idle resources that have been shared or lent out. While the HCS is idle, other HCSs can use the resources that belong to it; when its load increases and it needs more virtual nodes, those resources are taken back.

The study found that introducing the virtual resource allocation strategy lets HDFS-based cloud storage regulate its available resources dynamically according to load. Conventional HDFS, by contrast, uses a fixed group of allocated resources.
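The four allocation policies can be sketched as an ordered search over a pool of Slots. This is a minimal, hypothetical Python model, not the paper's implementation. Its assumptions are:
* The class and method names are invented.
* Each HCS is described by a (reserved, shared) Slot quota, as in the Section V setup.
* A request is served from the following sources, in order:
  1. the HCS's own idle reserved Slots;
  2. its own idle shared Slots;
  3. other HCSs' idle shared Slots;
  4. other HCSs' idle reserved Slots (borrowing);
  5. finally, by reclaiming one of its own Slots that another HCS is using (recycling).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Slot:
    owner: str                    # HCS the Slot belongs to
    kind: str                     # "reserved" or "shared"
    holder: Optional[str] = None  # HCS currently using it; None means idle


class SlotPool:
    """Hypothetical model of the reservation, sharing, borrowing/lending and
    recycling policies described in Section 3.1."""

    def __init__(self, quotas: Dict[str, Tuple[int, int]]) -> None:
        # quotas maps an HCS name to its (reserved, shared) Slot counts.
        self.slots: List[Slot] = []
        for name, (reserved, shared) in quotas.items():
            self.slots += [Slot(name, "reserved") for _ in range(reserved)]
            self.slots += [Slot(name, "shared") for _ in range(shared)]

    def _first_idle(self, pred: Callable[[Slot], bool]) -> Optional[Slot]:
        return next((s for s in self.slots if s.holder is None and pred(s)), None)

    def acquire(self, name: str) -> Optional[str]:
        """Give HCS `name` one more Slot and report which policy supplied it."""
        order = [
            ("reserved", lambda s: s.owner == name and s.kind == "reserved"),
            ("own shared", lambda s: s.owner == name and s.kind == "shared"),
            ("other shared", lambda s: s.owner != name and s.kind == "shared"),
            # Borrowing is only reached once every shared Slot is in use.
            ("borrowed", lambda s: s.owner != name and s.kind == "reserved"),
        ]
        for label, pred in order:
            slot = self._first_idle(pred)
            if slot is not None:
                slot.holder = name
                return label
        # Recycling: take back one of this HCS's own Slots from another HCS.
        # (Here the previous holder simply loses it; a real HVRS would first
        # migrate or stop the data node running on it.)
        for slot in self.slots:
            if slot.owner == name and slot.holder not in (None, name):
                slot.holder = name
                return "recycled"
        return None  # no Slot can be provided

    def release(self, name: str) -> None:
        """Free one Slot held by `name`, giving back foreign Slots first."""
        held = sorted((s for s in self.slots if s.holder == name),
                      key=lambda s: s.owner == name)
        if held:
            held[0].holder = None

    def held_by(self, name: str) -> int:
        return sum(1 for s in self.slots if s.holder == name)


pool = SlotPool({"HDFS1": (2, 1), "HDFS2": (1, 1)})
print([pool.acquire("HDFS2") for _ in range(4)])
# ['reserved', 'own shared', 'other shared', 'borrowed']
print(pool.acquire("HDFS1"), pool.acquire("HDFS1"))  # reserved recycled
print(pool.held_by("HDFS1"), pool.held_by("HDFS2"))  # 2 3
```

The example uses the quotas from the simulation setup (HDFS1: 2 reserved + 1 shared; HDFS2: 1 reserved + 1 shared). A heavily loaded HDFS2 ends up borrowing a reserved Slot from HDFS1, and HDFS1 takes it back once its own demand grows. This is qualitatively similar to the behaviour reported for Figure 2, though the model is only an illustration.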
3.2 Virtual Resource Elastic Scheduling Strategy

For every HDFS, the data node elastic scheduling strategy is a three-tuple P(S, A, R), where S is the sampling space (Sampling), A is the action space (Actions), and R is the set of scheduling rules (Rule).

The sampling space S consists of the virtual node load information $\{u(l(h_i))\}_{i=1}^{N}$. Here i indexes a virtual node, $h_i$ is the load capacity of that node, $l(h_i)$ is its load, and $u(l(h_i))$ is its resource consumption. Then

$$ S = \{ S(h_i) \mid \{ u(l(h_i)) \mid 1 \le i \le N \} \} \tag{4} $$

A is the set of elastic scheduling actions, $A = (A_{over}, A_{wasted})$. $A_{over}$ is the action that handles resource overload, and $A_{wasted}$ is the action that handles resource waste. Based on equation (3), the scheduling rules R are defined as follows:
* If the current HDFS resource usage rate u is higher than $u_{high}$, resource usage is judged to be overloaded, and HCRS performs the overload action $A_{over}$ on the HDFS data nodes; that is, if $u > u_{high}$, then $A_{over}$.
* If the current HDFS resource usage rate u is lower than $u_{low}$, HDFS is judged to be wasting resources, and HCRS performs $A_{wasted}$ to handle the wasted HDFS nodes; that is, if $u < u_{low}$, then $A_{wasted}$.

Elastic scheduling is the process of tracking the HDFS load information $\{l(h_i)\}_{i=1}^{N}$ and adjusting the $\mathrm{Slot}_{total}$ of each HDFS in time, based on the scheduling rules R. This is how virtual resources are dispatched. To automate the flexible scheduling of virtual resources, classical feedback control theory [6][7] is introduced. HDFS is modeled by the relation between the load information $L = \{l(h_i)\}_{i=1}^{N}$ and $\mathrm{Slot}_{total}$, i.e.:

$$ L(t) = s_1 L(t-1) + r_1 \mathrm{Slot}_{total}(t-1) + r_2 \mathrm{Slot}_{total}(t-2) \tag{5} $$

The model parameters $s_1$, $r_1$ and $r_2$ can be estimated with the recursive least-squares (RLS) method. To realize flexible control based on this model, equation (5) is shifted forward one step and solved for $\mathrm{Slot}_{total}(t)$, which gives the control equation (6):

$$ \mathrm{Slot}_{total}(t) = \frac{1}{r_1} L_{ref} - \frac{s_1}{r_1} L(t) - \frac{r_2}{r_1} \mathrm{Slot}_{total}(t-1) \tag{6} $$

where $L_{ref} = L(t+1)$ is the expected HDFS load at the next time step.

Introducing the scheduling rule R into control equation (6) gives the elastic control equation (7):

$$ \mathrm{Slot}_{total}(t) = \begin{cases} \frac{1}{r_1} L_{high} - \frac{s_1}{r_1} L(t) - \frac{r_2}{r_1} \mathrm{Slot}_{total}(t-1) & \text{if } L_{high} < L(t) \\ \frac{1}{r_1} L_{low} - \frac{s_1}{r_1} L(t) - \frac{r_2}{r_1} \mathrm{Slot}_{total}(t-1) & \text{if } L_{low} > L(t) \\ \mathrm{Slot}_{total}(t-1) & \text{otherwise} \end{cases} \tag{7} $$

That is, the HSC acts only when $L_{high} < L(t)$ (demand exceeds supply) or $L_{low} > L(t)$ (demand falls short of supply). In either case it sets a new $\mathrm{Slot}_{total}(t)$, i.e., it executes $A_{over}$ or $A_{wasted}$.
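The control loop of equations (5) to (7) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' code:
* It estimates $s_1$, $r_1$ and $r_2$ with a textbook recursive least-squares update. The initial covariance `delta` and forgetting factor `lam` are hypothetical choices.
* It then applies equation (7).
* The class and function names, the "true" parameters and the synthetic load history are all invented for the example.
* The paper gives no numeric values for $L_{low}$ and $L_{high}$, so the thresholds below are placeholders.

```python
from typing import List, Sequence


class RLS:
    """Recursive least squares for the model of equation (5):
    L(t) = s1*L(t-1) + r1*Slot_total(t-1) + r2*Slot_total(t-2).
    `theta` holds the running estimates [s1, r1, r2]."""

    def __init__(self, n: int = 3, delta: float = 1e6, lam: float = 1.0) -> None:
        self.n = n
        self.lam = lam  # forgetting factor; 1.0 weights all samples equally
        self.theta: List[float] = [0.0] * n
        # A large initial covariance means only a weak prior pull toward theta = 0.
        self.P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]

    def update(self, phi: Sequence[float], y: float) -> List[float]:
        n = self.n
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]
        error = y - sum(t * f for t, f in zip(self.theta, phi))
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P stays symmetric, so phi^T P is the transpose of p_phi.
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return self.theta


def elastic_slot_total(L_t: float, slot_prev: float, s1: float, r1: float,
                       r2: float, L_low: float, L_high: float) -> float:
    """Equation (7): act only outside [L_low, L_high], otherwise hold."""
    if L_t > L_high:     # L_high < L(t): demand exceeds supply (A_over)
        L_ref = L_high
    elif L_t < L_low:    # L_low > L(t): demand falls short (A_wasted)
        L_ref = L_low
    else:
        return slot_prev  # Slot_total(t) = Slot_total(t-1)
    # Equation (6) with L_ref substituted; the result is real-valued.
    return (L_ref - s1 * L_t - r2 * slot_prev) / r1


# Synthetic history from invented "true" parameters s1=0.5, r1=2.0, r2=0.5.
slots = [3, 4, 2, 5, 3, 6, 4, 2, 5, 3, 4, 6, 2, 3, 5, 4, 2, 6, 3, 5]
loads = [10.0, 10.0]
for t in range(2, len(slots)):
    loads.append(0.5 * loads[t - 1] + 2.0 * slots[t - 1] + 0.5 * slots[t - 2])

est = RLS()
for t in range(2, len(slots)):
    est.update([loads[t - 1], slots[t - 1], slots[t - 2]], loads[t])
s1, r1, r2 = est.theta

# One control step with placeholder thresholds L_low=10, L_high=25.
new_total = elastic_slot_total(30.0, 4, s1, r1, r2, 10.0, 25.0)
print(round(s1 * 30.0 + r1 * new_total + r2 * 4, 6))  # 25.0
```

The last line substitutes the returned $\mathrm{Slot}_{total}(t)$ back into equation (5), shifted one step. It recovers $L_{ref}$ ($L_{high}$ here), which is exactly the property equation (6) was derived for. The controller output is a real number. Converting it to whole Slots (for example, adding or removing one Slot per $A_{over}$ or $A_{wasted}$, as in the simulation) is left to the caller in this sketch.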
V Simulation Test

The virtual resource elastic management strategy was tested with the following setup:
* Virtual resource management platform: the virtual resource management product Platform ISF, with 8 adjustable virtual resource Slots.
* The HDFS1 cluster has 2 reserved Slots and 1 shared Slot; the HDFS2 cluster has 1 reserved Slot and 1 shared Slot.
* Load is sampled by measuring CPU usage.
* The high water level $u_{high}$ and low water level $u_{low}$ are set to 85% and 40%, respectively.
* $A_{over}$ adds one Slot and $A_{wasted}$ removes one Slot.
* The HDFS testing tool is HDFS Bench.

The test tool was run against each of the two HDFS clusters over a period of time, and the Slot usage of the two virtual clusters was observed as their loads changed. The test results are shown in Figure 2.

Fig 2. Simulation Result

The results show that when the load on HDFS1 is small, it uses only one Slot, while the remaining resources go to the heavily loaded HDFS2. As the load on HDFS1 increases, the shared Slots used by HDFS2 are gradually returned to HDFS1, which then uses Slots up to its maximum load. As the load on HDFS2 gradually decreases, HDFS2 uses fewer Slots, and the released Slots are gradually taken up by the load of HDFS1. By sharing and borrowing Slots from HDFS2, HDFS1 gradually reduces its load. As its own load decreases, HDFS2 gradually reduces its Slot usage, which then levels off.

VI Conclusion

Studying existing HDFS construction methods and data node control modes shows why HDFS makes low use of system resources. The causes are wasted resources and the difficulty of sharing resources among multiple HDFS clusters, and both stem from the lack of a flexible resource management pattern. To address this, the paper proposes an elastic cloud storage framework based on HDFS. According to the characteristics of this framework, it puts forward a flexible resource allocation strategy and an elastic scheduling strategy for HDFS data nodes. The simulation experiment shows that the elastic management strategy alleviates these problems of HDFS and effectively improves the efficiency in the use of virtual resources.

Acknowledgment

This research is supported by the National Natural Science Foundation of China (61073196 & 61272458) and the Natural Science Research Foundation of Shanxi Province, China (2011JM8026).

References

[1] M. Armbrust, A. Fox, D. A. Patterson, N. Lanham, B. Trushkowsky, J. Trutna, and H. Oh. SCADS: Scale-independent storage for social computing applications. In Proc. of CIDR, 2009.
[2] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In Proc. of SOSP, 2003. Volume 37, Issue 5, December 2003, pages 29-43.
[3] OpenNebula.
[4] B. Sotomayor, R. Montero, I. Llorente, and I. Foster. Capacity leasing in cloud systems using the OpenNebula engine. In Workshop on Cloud Computing and its Applications (CCA08), 2009.
[5] Platform.
[6] X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, P. Padala, and K. Shin. What does control theory bring to systems research? SIGOPS Operating Systems Review, 2009, 43(1):62-69.
[7] H. C. Lim, S. Babu, J. S. Chase, and S. S. Parekh. Automated control in cloud computing: Challenges and opportunities. In ACDC '09: Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds, ACM, pages 13-18.