Matchmaking in glideinWMS in CMS

This document provides a high-level overview of how glideinWMS-based instances do matchmaking in CMS (a High Energy Physics experiment). The information is accurate as of early Dec 2012.

  1. 1. glideinWMS for users: Matchmaking in glideinWMS in CMS, by Igor Sfiligoi (UCSD). CERN, Dec 2012, glideinWMS matchmaking
  2. 2. Scope of this talk
     This talk provides a high-level description of how glideinWMS matchmaking works in CMS. The reader is expected to be familiar with the CMS experiment environment: http://cms.web.cern.ch/
  3. 3. glideinWMS architecture
     ● A reminder
     [Architecture diagram; labeled components: VO FE, G.F., Grid, Central manager (Negotiator), Submit nodes (Schedd), Execute nodes, Condor]
  4. 4. Two levels of matchmaking
     ● First in the VO Frontend
       ● To decide where to provision resources
       ● i.e. where to send glideins
     ● Then in the HTCondor Negotiator
       ● To decide which Job gets the glidein Slot
     ● The two must have compatible policies
  5. 5. Defining the policy
     ● The VO FE configures the glideins
       ● So it can define the Slot Requirements
     ● The preferred strategy is to leave all policy decisions in the VO FE's hands, i.e. both
       ● VO FE matchmaking policy
       ● HTCondor matchmaking policy
       ● It is easier to keep them in sync this way
     ● This implies
       ● Users should not define Job Requirements
       ● Instead, they publish attributes describing their requirements
     http://www.slideshare.net/igor_sfiligoi/condor-week-12-attribute-matchmaking-move-req-out-of-user-hands
  6. 6. CMS Production @ CERN Policies
  7. 7. Description
     ● The VO FE @ CERN serves the production needs
       ● i.e. Reconstruction and MC production
     ● Job submission is regulated by a service managed by a dedicated team, so jobs are
       ● Targeted
       ● Well behaved (at least by and large)
  8. 8. Matchmaking policy
     ● Two dimensions
       ● Grid Site
       ● Single CPU vs HTPC
     ● The actual policy is the AND of both
     ● Both the VO FE policy and the HTCondor policy are defined in the VO FE instance configuration
  9. 9. Matching on Grid site name
     ● User Jobs are expected to publish the attribute DESIRED_Sites (a string list)
       ● e.g. +DESIRED_Sites = "T2_DE_DESY,T2_US_UCSD"
     ● The G.F. and the glideins advertise GLIDEIN_CMSSite
     ● The matchmaking policy is GLIDEIN_CMSSite ∈ DESIRED_Sites
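The site rule on this slide is a set-membership test. A minimal Python sketch of that test follows; it only models the logic and is not the actual VO FE or HTCondor expression. The function names are hypothetical, and the assumption is that DESIRED_Sites is a comma-separated string, as in the slide's example.

```python
def parse_string_list(value):
    """Split a comma-separated attribute value into a set of names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def site_matches(glidein_cmssite, desired_sites):
    """True if the glidein's site is one of the job's desired sites."""
    return glidein_cmssite in parse_string_list(desired_sites)


job_desired_sites = "T2_DE_DESY,T2_US_UCSD"  # value from the slide's example
print(site_matches("T2_US_UCSD", job_desired_sites))
print(site_matches("T2_XX_Example", job_desired_sites))  # hypothetical site name
```

Parsing into a set first makes the comparison insensitive to stray whitespace around the commas.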
  10. 10. Matching on Job Type
      ● User Jobs can publish the attribute DESIRES_HTPC (integer representation of Boolean values)
        ● e.g. +DESIRES_HTPC = 1
        ● If not defined, defaults to 0
      ● The G.F. and the glideins may advertise GLIDEIN_Is_HTPC (Boolean value)
        ● If not defined, defaults to False
      ● The matchmaking policy is (GLIDEIN_Is_HTPC==True)==(DESIRES_HTPC==1)
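The HTPC rule, including both defaults, can be modeled in a few lines of Python. This is a sketch of the logic only: representing the job and glidein as plain dictionaries and the function name `htpc_matches` are assumptions made for illustration.

```python
def htpc_matches(glidein_ad, job_ad):
    """Model of (GLIDEIN_Is_HTPC==True)==(DESIRES_HTPC==1) with the slide's defaults."""
    glidein_is_htpc = glidein_ad.get("GLIDEIN_Is_HTPC", False)  # default: False
    desires_htpc = job_ad.get("DESIRES_HTPC", 0)                # default: 0
    return (glidein_is_htpc is True) == (desires_htpc == 1)


# Single-CPU job on a single-CPU glidein: both sides fall back to their defaults.
print(htpc_matches({}, {}))
# HTPC job on a glidein that does not advertise HTPC support.
print(htpc_matches({}, {"DESIRES_HTPC": 1}))
```

Because the rule is an equality of two Booleans, HTPC jobs only land on HTPC glideins and single-CPU jobs only on non-HTPC glideins.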
  11. 11. Example submit file
      Universe = vanilla
      Executable = mcgen
      Arguments = -k 1543.3
      Output = mcgen.out
      Error = mcgen.err
      Log = mcgen.log
      +DESIRED_Sites = "T2_DE_DESY,T2_US_UCSD"
      +DESIRES_HTPC = 0
      Requirements = True
      Queue 1
  12. 12. CMS AnaOps @ UCSD Policies
  13. 13. Description
      ● The VO FE @ UCSD serves CMS analysis users
      ● User Jobs are much more chaotic
        ● Most users don't really understand their needs
        ● Must protect from accidental errors
        ● Yet keep the system flexible
      ● Net result
        ● More complex policy
  14. 14. Two different policies
      ● The AnaOps FE actually has two policies
        ● The Regular policy
        ● The Overflow policy
      ● The Regular policy tries to match resources
        ● Based on User desires
      ● The Overflow policy "outsmarts" the Users
        ● Will violate User desires without breaking the Jobs
        ● The aim is to finish user jobs sooner
        ● Users can opt out if they wish
  15. 15. The Regular M.M. policy
      ● Four+one dimensions
        ● Grid Site
        ● Single CPU vs HTPC
        ● Memory usage
        ● Job duration (due to preemption)
        ● Number of Job Starts
      ● The actual policy is the AND of all of them
      ● Both the VO FE policy and the HTCondor policy are defined in the VO FE instance configuration
  16. 16. Grid site selection
      ● This is both similar and different compared to the Production FE @ CERN
        ● Serves the same purpose, but supports three different ways to select a site
          – Due to historical evolution
      ● The three options are
        ● GLIDEIN_CMSSite ∈ DESIRED_Sites
        ● GLIDEIN_SEs ∈ DESIRED_SEs (planning to extend to (GLIDEIN_SEs ∩ DESIRED_SEs) ≠ ∅)
        ● GLIDEIN_Gatekeeper ∈ DESIRED_Gatekeepers
      ● The actual policy is the OR of the three
  17. 17. Job type selection
      ● Just like @ CERN
  18. 18. Memory Usage
      ● Most Grid sites put strict limits on the amount of memory that can be used
        ● Will kill glideins if they exceed the limit
      ● G.F. and glideins advertise the Entry-specific limit GLIDEIN_MaxMemMBs
      ● Jobs can explicitly declare the needed memory with request_memory (native Condor attribute, no + needed)
        ● Condor will also measure it at run time
          – ImageSize – Virtual memory used
          – ResidentSetSize – True memory usage
        ● Use a combination of these to calculate the actual JobMemory
      ● Policy: JobMemory <= GLIDEIN_MaxMemMBs
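The slide does not say how the declared and measured values are combined into JobMemory, so the Python sketch below picks one plausible combination purely for illustration: the larger of request_memory and the measured ResidentSetSize, with a fallback default. The max rule, the default value, the function names, and the assumption that ResidentSetSize is reported in KiB while the other values are in MB are all assumptions, not the FE's actual formula.

```python
DEFAULT_JOB_MEMORY_MB = 1024  # hypothetical default, not taken from the slides


def job_memory_mb(job_ad):
    """Assumed combination: the larger of declared and measured memory, in MB."""
    estimates = []
    if "request_memory" in job_ad:
        estimates.append(job_ad["request_memory"])
    if "ResidentSetSize" in job_ad:
        # Assumption: the measured value is in KiB, so convert it to MB.
        estimates.append(job_ad["ResidentSetSize"] / 1024)
    return max(estimates) if estimates else DEFAULT_JOB_MEMORY_MB


def memory_matches(glidein_ad, job_ad):
    """Policy from the slide: JobMemory <= GLIDEIN_MaxMemMBs."""
    return job_memory_mb(job_ad) <= glidein_ad["GLIDEIN_MaxMemMBs"]


print(memory_matches({"GLIDEIN_MaxMemMBs": 2000}, {"request_memory": 1500}))
```

Taking the larger estimate is the conservative choice here: a job that under-declared its memory but was measured using more will not be sent to an Entry with a limit it would exceed.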
  19. 19. Job Duration 1/2
      ● Glideins have a limited lifetime
        ● Must fit within the limits of the Grid site's queue
        ● Glideins publish the deadline GLIDEIN_ToDie
          – Jobs must finish before reaching the deadline
      ● Final user job lifetime is unpredictable
        ● Depends on the type of computing done
        ● Users should indicate the expected job lifetime
          – Else we have to assume reasonable defaults
          – Not many users set these values right now
  20. 20. Job Duration 2/2
      ● The same type of computation may take different amounts of time
        ● e.g. based on the type of input
      ● Jobs can declare two attributes
        ● NormMaxWallTimeMins – Expected limit
        ● MaxWallTimeMins – Absolute max limit
      ● The matchmaking logic is
        ● Use NormMaxWallTimeMins for the first job startup
        ● Use MaxWallTimeMins for all others (based on the simple assumption that the job was killed for hitting the deadline)
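The two-limit logic can be sketched in Python as follows. This is a model under stated assumptions: GLIDEIN_ToDie is treated as a Unix timestamp in seconds, the job's start count is read from HTCondor's NumJobStarts job attribute, and the default wall time and function names are hypothetical.

```python
DEFAULT_WALLTIME_MINS = 8 * 60  # hypothetical default, not taken from the slides


def expected_walltime_mins(job_ad):
    """Pick the wall-time limit to match on, depending on prior starts."""
    max_mins = job_ad.get("MaxWallTimeMins", DEFAULT_WALLTIME_MINS)
    if job_ad.get("NumJobStarts", 0) == 0:
        # First startup: use the expected limit, falling back to the absolute one.
        return job_ad.get("NormMaxWallTimeMins", max_mins)
    # Restart: assume the job was killed at the deadline, so use the absolute limit.
    return max_mins


def duration_matches(glidein_ad, job_ad, now):
    """True if the job is expected to finish before the glidein's deadline."""
    seconds_left = glidein_ad["GLIDEIN_ToDie"] - now
    return expected_walltime_mins(job_ad) * 60 <= seconds_left


job = {"NormMaxWallTimeMins": 7200, "MaxWallTimeMins": 14400}
glidein = {"GLIDEIN_ToDie": 7200 * 60}
print(duration_matches(glidein, job, now=0))
```

After the first start, the same job needs a glidein with at least MaxWallTimeMins of lifetime left, so a job that was cut off once is steered toward longer-lived glideins.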
  21. 21. Cut on number of re-starts
      ● Not really a user-configurable property
        ● More an emergency brake
      ● In a properly configured system, it should never be triggered
        ● But unexpected problems happen
        ● So better limit the damage
  22. 22. The Overflow Use case
      ● User Jobs specify a list of sites, because the data they need is there
      ● With recent versions of CMSSW, jobs can access the data remotely
        ● With a small performance penalty
      ● We can thus schedule jobs “anywhere”
        ● As long as the needed data is at a Site that has joined the xrootd federation
        ● But only if no CPU is available “close to the data”
          – And not too far, either
      http://indico.cern.ch/contributionDisplay.py?contribId=381&sessionId=5&confId=149557
      http://indico.cern.ch/contributionDisplay.py?contribId=232&sessionId=8&confId=149557
  23. 23. The Overflow M.M. policy
      ● Violate only the “Site selection” rule
        ● Keep all the others
      ● Plus, add one+one more:
        ● An opt-out mechanism
        ● Delayed matching
  24. 24. New Site M.M. policy
      ● The user-specified attribute is used to flag the job as “Overflowable”
        ● i.e. the job will match if and only if (DESIRED_<site>s ∩ SUPPORTED_<site>s) ≠ ∅
        ● Still support all 3 types of site identification
      ● Matching jobs can then run on any glidein
        ● Additional limits can be put in place by the FE, but these are mostly invisible to the user
  25. 25. The opt-out mechanism
      ● The Overflow policy considers all jobs by default
      ● But Users may want to opt out some of the Jobs
        – Sometimes it is simply necessary (to get deterministic results, e.g. for testing a site)
      ● To opt out, the user defines +CMS_ALLOW_OVERFLOW = False
        ● The FE will not consider such jobs for Overflowing
  26. 26. Delayed matching
      ● As said initially, Jobs should preferentially run close to the data
        ● Overflow should only consider jobs “that cannot find resources close to the data”
      ● We implemented it based on time
        ● Jobs are matched only if waiting in the queue for more than 6 hours
        ● Users cannot influence it
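The three Overflow-specific conditions (the "Overflowable" flag, the opt-out, and the 6-hour delay) can be combined in one Python sketch. It models only the DESIRED_Sites variant; the other two site identification types would be handled the same way. The function name, the dictionary representation, passing the supported sites as a set, and reading the submission time from HTCondor's QDate job attribute (assumed to be a Unix timestamp) are assumptions for illustration.

```python
OVERFLOW_DELAY_SECS = 6 * 3600  # 6 hours, as stated on the slide


def overflow_considers(job_ad, supported_sites, now):
    """True if the Overflow policy would consider this job (Sites variant only)."""
    # Opt-out: jobs are considered by default unless the user sets False.
    if job_ad.get("CMS_ALLOW_OVERFLOW", True) is False:
        return False
    # Delayed matching: only jobs queued for more than 6 hours.
    if now - job_ad["QDate"] <= OVERFLOW_DELAY_SECS:
        return False
    # Overflowable: at least one desired site is a supported one.
    desired = {s.strip() for s in job_ad.get("DESIRED_Sites", "").split(",") if s.strip()}
    return bool(desired & supported_sites)


job = {"DESIRED_Sites": "T2_DE_DESY,T2_US_UCSD", "QDate": 0}
print(overflow_considers(job, {"T2_US_UCSD"}, now=10 * 3600))
```

A job that passes this check may then run on any glidein, subject to the remaining Regular rules (job type, memory, duration, number of starts).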
  27. 27. Example submit file
      Universe = vanilla
      Executable = myana
      Arguments = -k 1543.3
      Output = myana.out
      Error = myana.err
      Log = myana.log
      request_memory = 1500
      +DESIRED_SEs = "dc2-grid-64.brunel.ac.uk,stormfe1.pi.infn.it"
      +NormMaxWallTimeMins = 7200
      +MaxWallTimeMins = 14400
      +DESIRES_HTPC = 0
      +CMS_ALLOW_OVERFLOW = True
      Requirements = True
      Queue 1
  28. 28. The End
  29. 29. Pointers
      ● glideinWMS Home Page: http://tinyurl.com/glideinWMS
      ● HTCondor Home Page: http://research.cs.wisc.edu/htcondor/
      ● HTCondor support: htcondor-users@cs.wisc.edu, htcondor-admin@cs.wisc.edu
      ● glideinWMS support: glideinwms-support@fnal.gov
  30. 30. Acknowledgments
      ● The creation of this document was sponsored by grants from the US NSF and US DOE, and by the University of California system
