glideinWMS Training Jan 2012 - Condor tuning

This talk walks you through the various knobs that need to be tuned to get Condor to work with glideinWMS.
Part of the glideinWMS Training session held in Jan 2012 at UCSD.


  1. glideinWMS Training @ UCSD: Condor tuning, by Igor Sfiligoi (UCSD). UCSD, Jan 18th 2012.
  2. Regulating User Priorities
  3. User priorities
     ● By default, the Negotiator treats all users equally
       ● You get fair share out of the box: if you have two users, on average each gets ½ of the slots
     ● If you have different needs, the Negotiator supports
       ● Priority factors
         – A PF reduces user priority (a larger PF means a smaller share of the slots)
         – If users X and Y have PFX=(N-1)*PFY, on average user X gets 1/N of the slots (with user Y getting the rest)
       ● Accounting groups (see next slide)
     http://research.cs.wisc.edu/condor/manual/v7.6/3_4User_Priorities.html
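The 1/N rule above can be checked with a little arithmetic. The following is a minimal sketch, assuming that in steady state (every user always has idle jobs) each user's share of slots is proportional to 1/PF; the function name `expected_shares` is hypothetical and is not part of Condor.

```python
from fractions import Fraction

def expected_shares(priority_factors):
    """Long-run slot share per user, assuming share is proportional to 1/PF."""
    weights = {user: Fraction(1) / Fraction(pf) for user, pf in priority_factors.items()}
    total = sum(weights.values())
    return {user: weight / total for user, weight in weights.items()}

# Slide example with N = 4: PF_X = (N - 1) * PF_Y
N = 4
shares = expected_shares({"X": (N - 1) * 10, "Y": 10})
print(shares["X"], shares["Y"])  # 1/4 3/4
```

With equal factors the same function gives each of two users ½, which matches the default fair-share behavior.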
  4. Accounting groups
     ● Users can be joined in accounting groups
       ● The Negotiator defines the groups, but jobs specify which group they belong to
     ● Each group can be given a quota
       ● Can be relative to the size of the pool
       ● Sum of group jobs cannot exceed it
     ● If the quotas add up to more than 100%, they can be used for relative priority
       ● Here higher is better
       ● Each group will be given, on average, quota/sum(quotas) of the slots
       ● Jobs without any group may never get anything
  5. Configuring AC (accounting groups)
     ● To enable accounting groups, just define the quotas in the Negotiator config file
       ● Which jobs go into which group is not defined here
       # condor_config.local
       # 1.0 is 100%
       GROUP_QUOTA_DYNAMIC_group_higgs = 3.2
       GROUP_QUOTA_DYNAMIC_group_b = 2.2
       GROUP_QUOTA_DYNAMIC_group_k = 6.5
       GROUP_QUOTA_DYNAMIC_group_susy = 8.8
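To see what these quotas mean in practice, the quota/sum(quotas) rule from the previous slide can be applied to the four example groups. This is only a sketch of the arithmetic; `group_shares` is a hypothetical helper, not a Condor tool.

```python
def group_shares(quotas):
    """Average fraction of slots per group: quota / sum(quotas)."""
    total = sum(quotas.values())
    return {group: quota / total for group, quota in quotas.items()}

# Values taken from the example configuration
quotas = {"group_higgs": 3.2, "group_b": 2.2, "group_k": 6.5, "group_susy": 8.8}

for group, share in group_shares(quotas).items():
    print(f"{group}: {share:.1%}")
```

The quotas sum to 20.7, so the groups end up with roughly 15%, 11%, 31% and 43% of the slots.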
  6. Using the AC
     ● Users must specify which group they belong to
       ● No automatic mapping or validation in Condor
       ● Based on trust
     ● Jobs must add to their submit file
         +AccountingGroup = "<group>.<user>"
       Universe   = vanilla
       Executable = cosmos
       Arguments  = -k 1543.3
       Output     = cosmos.out
       Input      = cosmos.in
       Log        = cosmos.log
       +AccountingGroup = "group_higgs.frieda"
       Queue 1
  7. Configuring PF (priority factors)
     ● Priority factors are a dynamic property
       ● Set by the command-line tool condor_userprio
       ● Anyone can read them, but they can only be set by the Negotiator administrator
       $ condor_userprio -all -allusers | grep user1@node1
       user1@node1 8016.22 8.02 10.00 0 15780.63 11/23/2011 05:59 12/30/2011 20:37
       $ condor_userprio -setfactor user1@node1 1000
       The priority factor of user1@node1 was set to 1000.000000
       $ condor_userprio -all -allusers | grep user1@node1
       user1@node1 8016.22 8.02 1000.00 0 15780.63 11/23/2011 05:59 12/30/2011 20:37
     ● Although the factors persist between Negotiator restarts
       ● You may want to set a higher default PF (e.g. 1000)
         – The default is 1, and a user PF cannot go below that
         – Attribute regulated by DEFAULT_PRIO_FACTOR
     http://research.cs.wisc.edu/condor/manual/v7.6/2_7Priorities_Preemption.html#sec:user-priority-explained
  8. Negotiation cycle optimization
  9. Negotiation internals
     ● The Negotiator does the matchmaking in loops
       ● Get all Machine ClassAds (from the Collector)
       ● Get Job ClassAds (from the Schedds) one by one, prioritized by user
         – Tell the Schedd if a match has been found
       ● Sleep
       ● Repeat
  10. Sleep time
      ● The Negotiator sleeps between cycles to save CPU
        ● But will be interrupted if new jobs are submitted
      ● Makes sense for static pools
        ● But can be a problem for dynamic ones like glideins
          – The sleep will not be interrupted if a new glidein joins, so glideins can sit unmatched
      ● Keep the sleep as low as feasible
        ● NEGOTIATOR_INTERVAL
        ● The installer sets it to 60s
        NEGOTIATOR_INTERVAL = 60
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorInterval
  11. Preemption
      ● By default, Condor will preempt a user as soon as jobs from a higher-priority user are available
        ● Preemption == killing jobs
        ● Preemption == wasted CPU
      ● The glideinWMS installer disables preemption
        ● Once a glidein is given to a user, they can keep it until the glidein dies
        ● Glidein lifetime is limited, so fair share is still enforced long term
        # Prevent preemption
        PREEMPTION_REQUIREMENTS = False
        # Speed up matchmaking
        NEGOTIATOR_CONSIDER_PREEMPTION = False
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorConsiderPreemption
  12. Protecting against dead glideins
      ● Glideins can die
        ● Just a fact of life on the Grid
      ● But their ClassAds will stay in the Collector for about 20 mins
        ● If they were Idle at the time, they will match jobs!
      ● Make sure it is the lowest-priority user who gets stuck with them
        ● Order Machine ClassAds by freshness
        NEGOTIATOR_POST_JOB_RANK = MY.LastHeardFrom
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorPostJobRank
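The idea of ordering by freshness can be illustrated outside of Condor. The sketch below sorts a few made-up machine ads so that the most recently heard-from ones come first, which leaves stale (possibly dead) glideins for whoever is matched last. The ad contents and the helper name `order_by_freshness` are assumptions for illustration; this is not how the Negotiator evaluates the rank expression internally.

```python
def order_by_freshness(machine_ads):
    """Return machine ads sorted so the most recently heard-from come first."""
    return sorted(machine_ads, key=lambda ad: ad["LastHeardFrom"], reverse=True)

# Hypothetical ads; LastHeardFrom is a Unix timestamp
ads = [
    {"Name": "glidein_a", "LastHeardFrom": 1326900000},
    {"Name": "glidein_b", "LastHeardFrom": 1326901100},  # freshest
    {"Name": "glidein_c", "LastHeardFrom": 1326899000},  # stalest, possibly dead
]

print([ad["Name"] for ad in order_by_freshness(ads)])
# ['glidein_b', 'glidein_a', 'glidein_c']
```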
  13. Protecting against slow schedds
      ● If one schedd gets stuck, you want to limit the damage, and match jobs from other schedds
        ● The Negotiator can limit the time spent on each schedd
        ● Was really important before 7.6.X... less of an issue now
      ● The glideinWMS installer sets these limits
        ● See the Condor manual for details
        NEGOTIATOR_MAX_TIME_PER_SUBMITTER=60
        NEGOTIATOR_MAX_TIME_PER_PIESPIN=20
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:NegotiatorMaxTimePerSubmitter
  14. One way matchmaking
      ● The Negotiator normally sends the match to both the Schedd and the Startd
        ● Only the Schedd is strictly needed
      ● Given that the glideins can be behind a firewall, it is a good idea to disable this behavior
        NEGOTIATOR_INFORM_STARTD = False
  15. Collector tuning
  16. Collector tuning
      ● The major tuning is the Collector tree
        ● Already discussed in the Installation section
      ● Three more optional tunings
        ● Security session lifetime
        ● File descriptor limits
        ● Client handling parameters
  17. Security session lifetime
      ● All communication to the Collector (leafs) is GSI based
        ● But GSI handshakes are expensive!
      ● Condor thus uses it just to exchange a shared secret (i.e. password) with the remote glidein
        ● Will use this password for all further communications
      ● But the password also has a limited lifetime, to limit the damage in case it is stolen
        ● The default lifetime is quite long (it was months in old versions of Condor!)
        ● The glideinWMS installer limits it to 12h
        SEC_DAEMON_SESSION_DURATION = 50000
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SecDefaultSessionDuration
      http://iopscience.iop.org/1742-6596/219/4/042017
  18. File descriptor limit
      ● The Collector (leafs) also function as CCB
        ● Will use 5+ FDs for every glidein they serve!
      ● By default, the OS only gives 1024 FDs to each process
        ● But Condor will limit itself to ~80% of that
        ● So it has a few FDs left for internal needs (e.g. logs)
      ● You may want to increase this limit (not done by the glideinWMS installer)
        ● Must be done as root (and the Collector is usually not run as root)
        ● The alternative is to just run more secondary collectors (recommended)
      ● And don't forget the system-wide limits
        /proc/sys/fs/file-max
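The numbers on this slide imply a rough ceiling on how many glideins one collector can serve. The following sketch works that out, assuming exactly 5 FDs per glidein (the slide says "5+", so the real ceiling is lower) and an 80% usable fraction; the function name is hypothetical.

```python
def max_glideins_per_collector(fd_limit=1024, usable_fraction=0.8, fds_per_glidein=5):
    """Upper bound on glideins served by one collector under the given FD budget."""
    usable_fds = int(fd_limit * usable_fraction)
    return usable_fds // fds_per_glidein

print(max_glideins_per_collector())  # 163
```

With the default limit of 1024 FDs this comes to about 160 glideins per collector, which is why either raising the limit or running more secondary collectors is needed for larger pools.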
  19. Client handling limits
      ● Every time someone runs condor_status, the Collector must act on it
        ● condor_status is also used by the Frontend processes
      ● A query is also done implicitly when you run condor_q -name <schedd_name>
      ● Since the Collector is a busy process, the Condor team added the option to fork on query
        ● This will use up RAM!
          – It only uses RAM if the parent memory pages change, but that is not unusual on a busy Collector
      ● So there is a limit: COLLECTOR_QUERY_WORKERS
        ● Defaults to 16
        ● You may want to either lower or increase it
  20. Firewall tuning
      ● If you decided you want to keep the firewall, you may need to tune it
        ● Open ports for the Collector tree
        ● And possibly the Negotiator (if Schedds are outside the perimeter)
          – The Negotiator port is usually dynamic, so you will need to fix it with
            NEGOTIATOR_ARGS = -f -p 9617
      ● You may also need to tune the firewall scalability knobs
        ● Stateful firewalls will have to track 5+ TCP connections per glidein
        # for iptables
        sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552;
        sysctl -p;
        echo 48194 > /sys/module/ip_conntrack/parameters/hashsize
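The two values in the example are related: 385552 is exactly 8 × 48194. The sketch below sizes the connection-tracking table for a given pool under that same ratio. The `headroom` multiplier and the function name are assumptions added for illustration; only the 5 connections per glidein and the ratio of 8 come from the slide's numbers.

```python
def conntrack_settings(glideins, conns_per_glidein=5, headroom=2.0, ratio=8):
    """Suggest ip_conntrack_max and hashsize, keeping max = ratio * hashsize."""
    needed = int(glideins * conns_per_glidein * headroom)
    hashsize = -(-needed // ratio)  # ceiling division
    return {"ip_conntrack_max": hashsize * ratio, "hashsize": hashsize}

# Hypothetical pool of 10,000 glideins
print(conntrack_settings(10000))
# {'ip_conntrack_max': 100000, 'hashsize': 12500}
```

The same helper can be reused on the submit node by passing the number of running jobs and `conns_per_glidein=2`.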
  21. Schedd tuning
  22. Schedd tuning
      ● The following knobs may need to be tuned
        ● Job limits
        ● Startup/Termination limits
        ● Client handling limits
        ● Match attributes
        ● Requirements defaults
        ● Periodic expressions
      ● You also may need to tune the system settings
  23. Job limits
      ● As you may remember, each running job uses a non-negligible amount of resources
      ● Thus the Schedd has a limit on them: MAX_JOBS_RUNNING
        ● Set it to a value that is right for your HW
        ● You may waste some CPU on glideins, but that is better than crashing the submit node
        # glideinWMS installer provided value
        MAX_JOBS_RUNNING = 6000
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxJobRunning
  24. Startup limits
      ● Each time a job is started
        ● The Schedd must create a shadow
        ● The Shadow must load the input files from disk
        ● The Shadow must transfer the files over the network
      ● If too many start at once, they may overload the node
      ● Two sets of limits:
        ● JOB_START_DELAY/JOB_START_COUNT (limit at the Schedd)
        ● MAX_CONCURRENT_UPLOADS (limit on the network)
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:JobStartCount
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxConcurrentUploads
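The start throttle trades startup load for ramp-up time. As a rough sketch, assume the Schedd starts JOB_START_COUNT jobs, waits JOB_START_DELAY seconds, and repeats; the time until the last batch is launched can then be estimated as below. The concrete values (10 jobs every 2 seconds) are made up for illustration and are not installer defaults.

```python
import math

def ramp_up_seconds(jobs, start_count, start_delay):
    """Seconds until the last batch starts, with start_count jobs per start_delay."""
    batches = math.ceil(jobs / start_count)
    return max(batches - 1, 0) * start_delay

# Hypothetical throttle: 10 jobs every 2 seconds, 6000 jobs to start
print(ramp_up_seconds(6000, start_count=10, start_delay=2))  # 1198
```

So filling 6000 slots under this throttle would take about 20 minutes, which is worth weighing against glidein lifetime when picking the values.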
  25. Termination limits
      ● Termination can be problematic as well
        ● Output files must be transferred back
          – And you do not know how big they will be!
          – Can easily overload the submit node
        ● If using glexec, an authentication call will be made on each glidein
          – Each one calls the site GUMS. Imagine O(1k) within a second!
      ● Similar limits:
        ● JOB_STOP_DELAY/JOB_STOP_COUNT (limit at the Schedd)
        ● MAX_CONCURRENT_DOWNLOADS (limit on the network)
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:JobStopCount
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:MaxConcurrentDownloads
  26. Client handling limits
      ● Every time someone runs condor_q, the Schedd must act on it
        ● condor_q is also used by the Frontend processes
      ● Since the Schedd is a busy process, the Condor team added the option to fork on query
        ● This will use up RAM!
          – It only uses RAM if the parent memory pages change, but that is not unusual on a busy Schedd
      ● So there is a limit: SCHEDD_QUERY_WORKERS
        ● Defaults to 3
        ● You will want to increase it
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:ScheddQueryWorkers
  27. Match attributes for history
      ● It is often useful to know where the jobs were running
        ● By default, you get just the host name (LastRemoteHost)
        ● It may be useful to know which glidein factory it came from, too (etc.)
      ● But you can get match info easily if you add to the job submit files
          job_attribute = $$(glidein attribute)
        ● After matching, the Schedd adds MATCH_EXP_job_attribute to the job ClassAd
  28. System level addition
      ● You can add them in the system config
        # condor_config.local
        # Factory identification
        JOB_GLIDEIN_Factory = "$$(GLIDEIN_Factory:Unknown)"
        JOB_GLIDEIN_Name = "$$(GLIDEIN_Name:Unknown)"
        JOB_GLIDEIN_Entry_Name = "$$(GLIDEIN_Entry_Name:Unknown)"
        # Glidein identification (to give the Factory admins)
        JOB_GLIDEIN_ClusterId = "$$(GLIDEIN_ClusterId:Unknown)"
        JOB_GLIDEIN_ProcId = "$$(GLIDEIN_ProcId:Unknown)"
        # Frontend identification
        JOB_GLIDECLIENT_Name = "$$(GLIDECLIENT_Name:Unknown)"
        JOB_GLIDECLIENT_Group = "$$(GLIDECLIENT_Group:Unknown)"
        # Any other attribute
        JOB_Site = "$$(GLIDEIN_Site:Unknown)"
        # This is how you push them into the user job ClassAds
        SUBMIT_EXPRS = $(SUBMIT_EXPRS) JOB_GLIDEIN_Factory JOB_GLIDEIN_Name \
                       JOB_GLIDEIN_Entry_Name JOB_GLIDEIN_ClusterId JOB_GLIDEIN_ProcId \
                       JOB_GLIDECLIENT_Name JOB_GLIDECLIENT_Group JOB_Site
  29. Additional job defaults
      ● With glideins, we often want empty user job requirements
      ● But Condor does not allow it
        ● It will forcibly add something about
          – Arch, e.g. (Arch == "INTEL")
          – Disk, e.g. (Disk>DiskUsage)
          – Memory, e.g. (Memory>ImageSize)
        ● Unless they are already there... so let's add them
        # condor_config.local
        # Just to make sure it always evaluates to True
        APPEND_REQ_VANILLA = (Memory>=1)&&(Disk>=1)&&(Arch=!="fake")
  30. Periodic expressions
      ● Unless you have very well behaved users, some of the jobs will be pathologically broken
        ● They will be wasting CPU cycles at best
      ● You want to detect them and remove them from the system
      ● Condor allows for periodic expressions
        ● SYSTEM_PERIODIC_REMOVE: just remove
        ● SYSTEM_PERIODIC_HOLD: leave in the queue, but do not match
      ● Each is an arbitrary boolean expression
        ● What to look for comes with experience
  31. Example periodic expression
        # condor_config.local
        # Hold a job that uses too much memory, uses too much disk space,
        # took too long, or was tried too many times
        SYSTEM_PERIODIC_HOLD = ((ImageSize>2000000) || \
                                (DiskSize>1000000) || \
                                (( JobStatus == 2 ) && ((CurrentTime-JobCurrentStartDate) > 100000)) || \
                                (JobRunCount>5))
        # In Condor 7.7.X you can also provide a human readable reason string
        SYSTEM_PERIODIC_HOLD_REASON = ifThenElse((ImageSize>2000000), "Used too much memory", \
                                      … \
                                      ifThenElse((JobRunCount>5), "Tried too many times", \
                                      "Reason unknown")...)
      http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SystemPeriodicHold
      http://www.cs.wisc.edu/condor/manual/v7.7/3_3Configuration.html#param:SystemPeriodicHoldReason
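The logic of the hold expression is easier to follow when written as ordinary code. The sketch below mirrors the four conditions and their thresholds on a plain dictionary standing in for a job ClassAd. It is an illustration only: ClassAd evaluation of undefined attributes differs from Python's `dict.get` defaults, and the two middle reason strings are assumed wording, since the slide elides that part of the reason expression.

```python
import time

RUNNING = 2  # JobStatus value tested in the slide's expression

def hold_reason(job, now=None):
    """Return a hold reason for the job, or None if it should be left alone."""
    now = time.time() if now is None else now
    if job.get("ImageSize", 0) > 2000000:
        return "Used too much memory"
    if job.get("DiskSize", 0) > 1000000:
        return "Used too much disk space"  # assumed wording, elided on the slide
    started = job.get("JobCurrentStartDate", now)
    if job.get("JobStatus") == RUNNING and now - started > 100000:
        return "Took too long"  # assumed wording, elided on the slide
    if job.get("JobRunCount", 0) > 5:
        return "Tried too many times"
    return None

# Hypothetical job that has been restarted 7 times
print(hold_reason({"ImageSize": 500000, "JobRunCount": 7}))  # Tried too many times
```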
  32. File descriptors and process IDs
      ● Each Shadow keeps open O(10) FDs
        ● And we have a shadow per running job
      ● Make sure you have enough system-wide file descriptors
      ● Also make sure the system can support all the shadow PIDs
        ~$ echo 4849118 > /proc/sys/fs/file-max
        ~$ cat /proc/sys/fs/file-nr
        59160 0 4849118
        ~$ echo 128000 > /proc/sys/kernel/pid_max
      http://www.cs.wisc.edu/condor/condorg/linux_scalability.html
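A quick estimate shows why these system limits matter on a busy submit node. This sketch assumes exactly 10 FDs per shadow (the slide only says O(10)) and one PID per shadow; the helper name is hypothetical.

```python
def schedd_node_needs(max_jobs_running, fds_per_shadow=10):
    """Rough count of FDs and PIDs consumed by shadows on the submit node."""
    return {
        "file_descriptors": max_jobs_running * fds_per_shadow,
        "shadow_pids": max_jobs_running,
    }

# Using the installer-provided MAX_JOBS_RUNNING value
print(schedd_node_needs(6000))
# {'file_descriptors': 60000, 'shadow_pids': 6000}
```

These figures cover the shadows alone, so the system-wide limits should leave generous room for everything else running on the node.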
  33. Firewall tuning
      ● If you decided you want to keep the firewall, you may need to tune it
        ● At minimum, open the port of the Shared Port daemon
      ● You may also need to tune the firewall scalability knobs
        ● Stateful firewalls will have to track 2 connections per running job
        # for iptables
        sysctl -w net.ipv4.netfilter.ip_conntrack_max=385552;
        sysctl -p;
        echo 48194 > /sys/module/ip_conntrack/parameters/hashsize
  34. The End
  35. Pointers
      ● The official project Web page is http://tinyurl.com/glideinWMS
      ● The glideinWMS development team is reachable at glideinwms-support@fnal.gov
      ● Condor Home Page: http://www.cs.wisc.edu/condor/
      ● Condor support: condor-user@cs.wisc.edu, condor-admin@cs.wisc.edu
  36. Acknowledgments
      ● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI
      ● The glideinWMS factory operations at UCSD are sponsored by OSG
      ● The funding comes from NSF, DOE and the UC system
