Automatic Undo for Cloud Management via AI Planning

presented at Berkeley AMPLab Seminar on Oct 17, 2012
by Ingo Weber, Hiroshi Wada, Alan Fekete, Anna Liu, Len Bass


  1. NICTA Copyright 2012. From imagination to impact.
     Automatic Undo for Cloud Management via AI Planning
     Ingo Weber, Hiroshi Wada, Alan Fekete, Anna Liu and Len Bass
     Based on a presentation at USENIX HotDep ‘12
  2. NICTA in Brief
     • Australia’s National Centre of Excellence in Information and Communication Technology
     • Five research labs:
       – ATP: Australian Technology Park, Sydney
       – NRL: UNSW, Sydney
       – CRL: ANU, Canberra
       – VRL: Uni. Melbourne
       – QRL: Uni. Queensland and QUT
     • 700 staff, including 270 PhD students
     • Budget: ~$90M/yr from federal and state governments and industry
     • ~600 research papers/year, ~150 patents total
  3. Our spin-out
  4. Motivation of this work
     • Yuruware Bolt: site replication across regions
       – Executes long-running operations that create, delete and update a variety of resources via the AWS API
     • Issues we face
       – High cost of writing unit tests
         • Preparing a test bed, resetting it after each test, and recovering from errors
         • “Why is there no DBUnit for the cloud?”
       – High cost of error handling
         • Resources often get stuck in an unexpected state
         • Each case needs well-coordinated, tailored cleanup
  5. Our Goal
     • Provide an “undo” button to cloud users
       – Allow rolling back to a previous state
       – e.g., undelete deleted resources and reconstruct the relations among resources
     • We’re users: we cannot alter the API or its implementation
       – Some operations are undoable, e.g., detach a VIP
       – Some operations are semi-undoable, e.g., stop a VM
       – Some operations are not undoable, e.g., delete a disk
     • Less invasive
       – Minimal changes to existing code or scripts
  6. Status quo
     [Diagram: an administrator/script performs operations (do, do) as API calls against cloud resources]
     • Administrators and/or scripts talk to the cloud via its API
  7. Goal
     [Diagram: an administrator/script issues a checkpoint, then operations (do, do) as API calls against cloud resources, followed by either rollback or commit]
     • Provide the ability to go back to a checkpoint
     • No change in scripts except adding checkpoint and commit
  8. Our approach
     • “Undo one by one in reverse order” does not always work
       – No undo action is available
         • e.g., there is no way to undelete a deleted resource
       – Undo requires a different sequence of API calls
         • e.g., simply “deleting an auto-scaling group” does not work
       – Simple undo could result in a different state
         • e.g., undoing “stopping an instance with a VIP”
       – An undo operation may fail
         • e.g., detaching a volume could fail, so an alternative is needed
       – Not optimal
         • e.g., Bolt’s operations (which create many temporary resources)
     • Tedious to hand-code for all possible cases!
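The first failure mode on this slide can be illustrated with a minimal Python sketch. The inverse table and operation names below are hypothetical and chosen only for illustration; they are not taken from the authors' system or from the AWS API.

```python
# Hypothetical inverse table: operation name -> compensating operation.
INVERSES = {
    "attach-volume": "detach-volume",
    "detach-volume": "attach-volume",
    "start-instance": "stop-instance",
    "stop-instance": "start-instance",  # may still restore a different state
    "create-volume": "delete-volume",
    # "delete-volume" has no entry: a deleted resource cannot be undeleted.
}


def naive_undo(log):
    """Walk the operation log backwards and look up each inverse, or fail."""
    steps = []
    for op in reversed(log):
        if op not in INVERSES:
            raise LookupError(f"no undo action for {op!r}")
        steps.append(INVERSES[op])
    return steps
```

Calling `naive_undo(["create-volume", "attach-volume"])` yields a usable reverse sequence, while any log containing `"delete-volume"` raises, which is the case the pseudo-delete wrapper is meant to cover. The table also cannot express the other cases on the slide, such as an inverse that fails or that restores a different state.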
  9. Pseudo-delete for not-undoable ops
     [Diagram: an API wrapper sits between the administrator/script and the cloud resources. Wrapped API calls go to the wrapper, which keeps logical state (e.g., pseudo-delete flags), applies changes as API calls, and executes deletes on commit. Checkpoint, rollback and commit surround the operations.]
     • Execute API calls if they are undoable
     • Defer the execution of non-undoable calls until commit
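A minimal Python sketch of the wrapper idea on this slide, assuming an in-memory `FakeCloud` in place of a real cloud API. All class and method names are hypothetical and are not the authors' implementation.

```python
class FakeCloud:
    """In-memory stand-in for a cloud API (hypothetical, not the AWS SDK)."""

    def __init__(self):
        self.volumes = set()

    def create_volume(self, vol_id):
        self.volumes.add(vol_id)

    def delete_volume(self, vol_id):
        self.volumes.remove(vol_id)  # irreversible in a real cloud


class UndoWrapper:
    """Defers non-undoable deletes until commit using pseudo-delete flags."""

    def __init__(self, cloud):
        self.cloud = cloud
        self.pseudo_deleted = set()  # logical state: flagged but still present

    def create_volume(self, vol_id):
        self.cloud.create_volume(vol_id)  # undoable, so execute right away

    def delete_volume(self, vol_id):
        if vol_id not in self.visible_volumes():
            raise KeyError(vol_id)
        self.pseudo_deleted.add(vol_id)  # only flag it; the resource survives

    def visible_volumes(self):
        # What the administrator or script sees through the wrapper.
        return self.cloud.volumes - self.pseudo_deleted

    def rollback(self):
        self.pseudo_deleted.clear()  # "undelete" by dropping the flags

    def commit(self):
        for vol_id in sorted(self.pseudo_deleted):
            self.cloud.delete_volume(vol_id)  # now run the real deletes
        self.pseudo_deleted.clear()
```

In this sketch `rollback` only clears the pseudo-delete flags. Undoing the calls that were executed immediately (such as `create_volume`) is left to a separate compensation step.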
  10. AI planning for the rest of the cases
      [Diagram: the undo system. At checkpoint and at rollback it senses the cloud resource states and generates a resource state in PDDL for each. The checkpoint state is input to the AI planner as the goal state and the rollback state as the initial state, together with a domain model (PDDL). The planner generates a compensation plan, and a code generator turns it into a compensation script that executes the rollback. The API wrapper with its logical state (e.g., pseudo-delete flags) remains between the administrator/script and the cloud resources.]
      • Changes made by (semi-)undoable API calls are compensated by an AI planner
      • The AI planner also finds ways to handle errors that may occur during undo
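The "generate resource state (PDDL)" step in the diagram could look roughly like the Python sketch below, which renders sensed facts as a PDDL problem. The function name, the fact encoding, and the `volumeInUse` predicate are assumptions for illustration; only `volumeAvailable` and `tVolume` appear in the slides.

```python
def to_pddl_problem(name, domain, objects, init, goal):
    """Render a PDDL problem from sensed resource states.

    objects: dict mapping object name -> PDDL type
    init, goal: lists of facts, each a tuple such as ("volumeAvailable", "vol1")
    """
    def fact(f):
        return "(" + " ".join(f) + ")"

    obj_text = " ".join(f"{o} - {t}" for o, t in sorted(objects.items()))
    init_text = " ".join(fact(f) for f in init)
    goal_text = " ".join(fact(f) for f in goal)
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {obj_text})\n"
        f"  (:init {init_text})\n"
        f"  (:goal (and {goal_text})))"
    )


# State sensed at rollback becomes :init, state sensed at checkpoint becomes :goal.
print(to_pddl_problem(
    "rollback", "cloud",
    {"vol1": "tVolume"},
    init=[("volumeInUse", "vol1")],      # hypothetical predicate
    goal=[("volumeAvailable", "vol1")],
))
```

The resulting text, together with the domain model, would be what a planner receives as input.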
  11. AI Planning 101
      • Given the initial state of the world, a goal state, and a set of available actions, find a sequence of actions that leads from the initial state to the goal
      • Highly optimized heuristics tame the problem for practical purposes
      [Figure: Tower of Hanoi, from http://en.wikipedia.org/wiki/Tower_of_Hanoi]
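The planning problem defined on this slide can be sketched in a few lines of Python. This is a brute-force breadth-first search over sets of facts, not the heuristic search that real planners use, and the two toy actions are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


def plan(initial, goal, actions):
    """Breadth-first search for a shortest action sequence reaching the goal."""
    start, goal = frozenset(initial), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:  # every goal fact holds
            return steps
        for a in actions:
            if a.pre <= state:  # action is applicable
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [a.name]))
    return None  # no plan exists


# Toy domain (hypothetical): a volume must be detached before it is deleted.
ACTIONS = [
    Action("detach-volume", frozenset({"attached"}),
           frozenset({"available"}), frozenset({"attached"})),
    Action("delete-volume", frozenset({"available"}),
           frozenset({"deleted"}), frozenset({"available"})),
]

print(plan({"attached"}, {"deleted"}, ACTIONS))
```

Breadth-first search visits states level by level, so the first plan it returns is a shortest one, but the number of states grows quickly with the number of facts. That growth is what the heuristics mentioned on the slide are meant to tame.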
  12. Planning under uncertainty
      • In our problem
        – Initial state: state at rollback
        – Goal state: state at checkpoint
        – Actions: AWS APIs
      • We use FF [*] with an extension to handle actions with alternative outcomes
      • Finds “maximal” contingency plans
        – e.g., if detaching a volume fails, stop the attached instance if possible. If the planner cannot solve a case, ask for human intervention
      [*] Hoffmann, J., and Nebel, B. “The FF planning system: Fast plan generation through heuristic search.” Journal of Artificial Intelligence Research, 14 (2001).
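To make the idea of a contingency plan concrete, here is a small Python sketch of a depth-bounded search over actions with alternative outcomes. It is not FF or the authors' extension. The toy domain, the `untried` bookkeeping fact, and the rule of preferring the tree with the fewest `ask-human` leaves are all assumptions made for illustration.

```python
from typing import NamedTuple

DONE, ASK_HUMAN = "done", "ask-human"
F = frozenset


class Outcome(NamedTuple):
    add: frozenset
    delete: frozenset


class Action(NamedTuple):
    name: str
    pre: frozenset
    outcomes: tuple  # one Outcome per alternative result


def count_leaves(tree, kind):
    """Count the leaves of a plan tree that equal `kind`."""
    if isinstance(tree, str):
        return int(tree == kind)
    return sum(count_leaves(t, kind) for t in tree["outcomes"])


def contingency_plan(state, goal, actions, depth=4):
    """Return a plan tree with one branch per outcome of each chosen action."""
    if goal <= state:
        return DONE
    if depth == 0:
        return ASK_HUMAN
    best = ASK_HUMAN
    for a in actions:
        if not a.pre <= state:
            continue
        tree = {"action": a.name, "outcomes": [
            contingency_plan((state - o.delete) | o.add, goal, actions, depth - 1)
            for o in a.outcomes]}
        if count_leaves(tree, DONE) == 0:
            continue  # no outcome of this action leads to the goal
        if best == ASK_HUMAN or (
                count_leaves(tree, ASK_HUMAN) < count_leaves(best, ASK_HUMAN)):
            best = tree
    return best


# Toy domain (hypothetical): detaching may fail; stopping the instance helps.
DOMAIN = [
    Action("detach-volume", F({"attached", "untried"}), (
        Outcome(F({"detached"}), F({"attached", "untried"})),  # success
        Outcome(F(), F({"untried"})),                           # failure
    )),
    Action("stop-instance", F({"running"}), (
        Outcome(F({"stopped"}), F({"running"})),
    )),
    Action("detach-after-stop", F({"attached", "stopped"}), (
        Outcome(F({"detached"}), F({"attached"})),
    )),
]

tree = contingency_plan(F({"attached", "running", "untried"}), F({"detached"}), DOMAIN)
```

With the full toy domain, the failure branch of `detach-volume` continues with `stop-instance` and then `detach-after-stop`. If `stop-instance` is removed from the domain, that branch becomes `ask-human` while the success branch is still planned, which mirrors the fallback to human intervention described on the slide.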
  13. Domain model: example
      • Action to delete a disk volume in PDDL
        (:action Delete-Volume
          :parameters (?vol - tVolume)
          :precondition
            (and
              (volumeAvailable ?vol)
              (not (unrecoverableFailure ?vol)))
          :effect
            (oneof
              ; outcome 1: the volume is deleted
              (and
                (deleted ?vol)
                (not (volumeAvailable ?vol)))
              ; outcome 2: the call fails unrecoverably
              (unrecoverableFailure ?vol)))
  14. Domain model: actions
      Resource type               Actions
      Virtual machine             launch, terminate, start, stop, change VM size
      Disk volume                 create, delete, create-from-snapshot, attach, detach
      Disk snapshot               create, delete
      Elastic IP address          allocate, release, associate, disassociate
      Security group              create, delete
      Auto-scaling group          create, delete, change-sizes, change-launch-config
      Auto-scaling launch config  create, delete
      Tag                         create, delete
      Others                      AZ, cluster online/offline, ...
  15. Evaluation
      • Scalability of the planner, based on an internally released prototype
        – AWS command-line tool replacement
      • Use cases: ~70 different planning settings tested
      • Evaluation 1: against plan length
      • Evaluation 2: against # of unrelated resources
  16. Evaluation 1: vs plan length
      [Plot: planning time in seconds (0 to 2.5) against plan length (0 to 70)]
      • A plan length of 20 is the maximum we need in our problem
      • Executing a plan with 10 steps takes ~145 sec
  17. Evaluation 2: vs # of unrelated resources
      [Plot: planning time in seconds (0 to 500) against facts + actions in 1000s (35, 350, 3500); annotations: “~50 related resources” and “+ 500 unrelated instances and volumes”]
      • Basis: the most difficult problem from the previous slide
      • The planner’s cost is small unless there are 1000s of resources
  18. Conclusion & future work
      • Rollback in the cloud using an AI planner
        – Formalized part of the AWS APIs in a planning domain
        – Planning execution time is marginal for practical system sizes
      • Future work
        – Extending checkpoints to capture internal resource state
        – Parallelizing plans
        – Finding forward plans with constraints
  19. Questions?
      The paper is available at www.nicta.com.au/pub?doc=5994
