glideinWMS Training @ UCSD


                glideinWMS Frontend
                      Installation
                     Part 1 – Condor Installation
                           by Igor Sfiligoi (UCSD)




UCSD Jan 17th 2012                Condor Install     1
Overview


                     ●   Introduction
                     ●   Planning and Common setup
                     ●   Central Manager Installation
                     ●   Submit node Installation




Refresher - Glideins
 ●   A glidein is just a properly configured Condor
     execution node submitted as a Grid job
      ●    glideinWMS provides automation
 [Figure: glideinWMS submits glideins to Grid resources (via CREAM or Globus); each glidein runs a Condor Startd (execution node) that joins the Central manager (Collector, Negotiator) and runs jobs from the Schedds on the Submit nodes]


Refresher - Glideins
 ●   The glideinWMS triggers glidein submission
      ●    The “regular” negotiator matches jobs to glideins
 [Figure: Central manager (Collector, Negotiator), Submit nodes (Schedd) and glidein execution nodes (Startd) started via CREAM or Globus by glideinWMS; the Negotiator matches jobs to the glideins]


Bottom line




              Condor is king!
             (glideinWMS is just a small layer on top)




Condor installation
 ●   Proper Condor installation and configuration
     is the most important task
      ●   Condor will do most of the work
      ●   … and is thus the most resource hungry
 ●   GlideinWMS installation is almost an afterthought
      ●   Although it does require proper
          security configuration of Condor
      ●   The glideinWMS installation proper will be described
          in a separate talk


Planning
                         and
                     Common setup



Refresher - Condor
 ●   Two main node types
      ●   Submit node(s)
      ●   Central manager
      ●   (execute nodes are dynamic – glideins)
 ●   Public TCP/IP networking needed
 ●   GSI used for network security
 [Figure: Central manager (Collector, Negotiator), several Submit nodes (Schedd) and a glidein]


Planning the setup
 ●   In theory, all Condor daemons can be installed
     on a single node
 ●   However, if at all possible, put the
     Central Manager on a dedicated node
      ●   i.e. do not use it as a submit node, too
      ●   Both for security and stability reasons
 ●   You may want/need more than one submit node
      ●   Depends on expected use and available HW
      ●   You do need at least one, though

Common system considerations
 ●   Condor is supported on a wide variety of platforms
     ●   Including Linux (e.g. RHEL5), MacOS and Windows
     ●   Linux recommended in OSG (and assumed in the rest of the talk)
 ●   GSI security requires
     ●   Host or service certificate
     ●   CAs & CRLs
           –   Typically delivered via OSG RPMs (but other means acceptable)
               https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallCertAuth

     ●   Full Grid Client software recommended (for ease of ops)
         https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallOSGClient




OSG Grid Client
 ●   Requires RHEL5-compatible Linux
      ●   RHEL6 support promised for early 2012
 ●   Procedure in a nutshell
      ●   Add the EPEL and OSG RPM repositories to the system configuration
      ●   yum install osg-ca-certs
      ●   yum install osg-client
      ●   Enable the CRL fetching crontab
 ●   Other Grid clients (e.g. EGI/gLite) will work just as well

https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallOSGClient




Requesting a host certificate
 ●   OSG provides a script to talk to DOEGrids
     https://twiki.grid.iu.edu/bin/view/Documentation/Release3/GetHostServiceCertificates

 ●   Procedure in a nutshell
      ●   Install OSG client
      ●   yum install osg-cert-scripts
      ●   cert-request …
      ●   Wait for email
      ●   cert-retrieve …
      ●   cp into /etc/grid-security/
 ●   If you have other ways to obtain a host cert, feel free to use them
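The final copy step is where permissions most often go wrong. A minimal Python sketch, assuming the target names hostcert.pem/hostkey.pem used later in these slides and the usual Globus convention (cert mode 0644, key mode 0400, i.e. readable only by its owner); setting the file owner (root on a submit node, the Condor user on a non-root CM) is left to the caller.

```python
import os
import shutil
import stat

# Conventional Globus permissions (assumption): the certificate may be
# world-readable, the private key must be readable only by its owner.
CERT_MODE = 0o644
KEY_MODE = 0o400


def _copy_with_mode(src, dst, mode):
    """Copy src to dst, creating dst with its final mode from the start."""
    if os.path.exists(dst):
        os.remove(dst)  # a 0400 file cannot be reopened for writing by non-root
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)
    os.chmod(dst, mode)  # the umask may have dropped bits; set them exactly


def install_host_cert(cert_src, key_src, dest_dir="/etc/grid-security"):
    """Install a retrieved host cert/key pair; return the installed paths."""
    os.makedirs(dest_dir, exist_ok=True)
    cert_dst = os.path.join(dest_dir, "hostcert.pem")
    key_dst = os.path.join(dest_dir, "hostkey.pem")
    _copy_with_mode(cert_src, cert_dst, CERT_MODE)
    _copy_with_mode(key_src, key_dst, KEY_MODE)
    return cert_dst, key_dst


def mode_of(path):
    """Permission bits of path, e.g. 0o400."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

Creating the key file with its final mode (instead of copying and then calling chmod) avoids a short window in which the key would be readable by other users.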

Condor
                     Central Manager




Refresher - Central Manager
 ●   Two (groups of) processes
      ●   Collector
      ●   Negotiator
 ●   The Collector defines the Condor pool
      ●   Knows about all the glideins it owns
      ●   Knows about all the schedds
 ●   The Negotiator does the matchmaking
      ●   Decides who gets what resources
 [Figure: Central manager running the Collector and the Negotiator]


Condor Collector – considerations
 ●   The Collector is the repository of all knowledge
     ●   All other daemons report to it
     ●   Including the glideins, which get its address at run-time
 ●   Must process lots of info
     ●   One update every 5 mins from each and every daemon
     ●   With strong security → expensive
 ●   Typically deployed as a tree of collectors
     ●   All security handled in the leaves
     ●   Top one still has the complete picture
 [Figure: Central manager with the Negotiator and a top Collector fed by leaf Collectors, to which the glideins report]
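The load on the collectors follows directly from the one-update-every-5-minutes figure. A small Python sketch; spreading the daemons evenly over the leaf collectors is an assumption, and the 25,000 glidein count is the CMS example quoted on the hardware slide.

```python
# One ClassAd update every 5 minutes from each daemon (figure from the slide).
UPDATE_INTERVAL_S = 5 * 60


def updates_per_second(n_daemons, interval_s=UPDATE_INTERVAL_S):
    """Average rate of ClassAd updates the pool as a whole generates."""
    return n_daemons / interval_s


def per_leaf_rate(n_daemons, n_leaves, interval_s=UPDATE_INTERVAL_S):
    """Average rate each leaf collector must authenticate, assuming the
    daemons are spread evenly over the leaves (assumption)."""
    return updates_per_second(n_daemons, interval_s) / n_leaves


if __name__ == "__main__":
    print(f"pool: {updates_per_second(25_000):.1f} updates/s")
    print(f"per leaf (200 leaves): {per_leaf_rate(25_000, 200):.2f} updates/s")
```

With 25,000 glideins a single collector would have to authenticate about 83 updates per second; with 200 leaves each one handles well under one per second, while the top collector still sees every ad, only without the security overhead.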


CCB – An additional cost
 ●   The Condor collectors are also acting as CCBs
      ●   Each glidein will open 5+ long-lived TCP sockets
 ●   Make sure you have enough file descriptors
      ●   Default OS limit is 1024 per process
 ●   Plan on having one CCB per 100 glideins
 [Figure: the CCBs are the leaves in the tree of collectors; a party that wants to connect to the execute node (e.g. to transfer files) asks the CCB, which asks the glidein to "call me back"]
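The CCB rule of thumb can be checked with simple arithmetic: at 100 glideins per CCB and 5 sockets each, one CCB holds about 500 long-lived sockets, which leaves headroom under the 1024 default. A Python sketch using only the numbers on this slide; treating 5 as the per-glidein count is an assumption, since the slide says 5+.

```python
import math

SOCKETS_PER_GLIDEIN = 5   # the slide says "5+"; treat 5 as a lower bound
FD_LIMIT = 1024           # default per-process OS limit quoted on the slide
GLIDEINS_PER_CCB = 100    # planning rule of thumb from the slide


def ccb_sockets(n_glideins, sockets_per_glidein=SOCKETS_PER_GLIDEIN):
    """Long-lived TCP sockets held open by one CCB serving n_glideins."""
    return n_glideins * sockets_per_glidein


def ccbs_needed(n_glideins, glideins_per_ccb=GLIDEINS_PER_CCB):
    """Number of CCB (leaf) collectors suggested by the rule of thumb."""
    return math.ceil(n_glideins / glideins_per_ccb)


def fits_fd_limit(glideins_per_ccb=GLIDEINS_PER_CCB, fd_limit=FD_LIMIT):
    """True if one CCB's sockets stay below the per-process fd limit."""
    return ccb_sockets(glideins_per_ccb) < fd_limit
```

By this rule, the 200 secondary collectors chosen in the installer example later on would serve roughly 20,000 glideins (ccbs_needed(20_000) is 200), while fits_fd_limit(300) is False because 1,500 sockets exceed the 1,024 default.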


High availability (theory)
 ●   Central manager can be a single point of failure
      ●   If it dies, the Condor pool dies with it!
 ●   To avoid this, one can deploy multiple CMs
      ●   All daemons will advertise to 2 (or more) Collectors
      ●   All CMs will have the same view of the world
      ●   Currently not supported by glideinWMS
 ●   There can only be one Negotiator, though
      ●   One Negotiator will be active, all others in standby
      ●   More details in the Condor manual
          http://www.cs.wisc.edu/condor/manual/v7.6/3_11High_Availability.html#SECTION004112000000000000000



Hardware needs
 ●   Tree of collectors spreads the load over
     multiple processes
      ●   So several CPUs come in handy
 ●   Negotiator is single threaded
      ●   Will benefit from a fast CPU
 ●   Memory usage not terrible
      ●   O(100k) per glidein to store ClassAds
      ●   Concrete CMS example: 25k glideins ~ 6G memory
      ●   Exact footprint depends on how many additional attributes the VO defines
 ●   Negligible disk IO
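The two memory figures can be cross-checked: the O(100k) rule gives 2.5 GB for 25k glideins, while the CMS example implies about 240 kB per glidein, still O(100k) but 2.4 times the round number, the kind of difference the slide attributes to VO-defined attributes. A Python sketch; the per-glidein size is a parameter, and decimal units (1 GB = 10^9 bytes) are an assumption.

```python
def cm_memory_bytes(n_glideins, bytes_per_glidein=100_000):
    """Collector memory estimate: one ClassAd of bytes_per_glidein per glidein."""
    return n_glideins * bytes_per_glidein


def implied_bytes_per_glidein(total_bytes, n_glideins):
    """Back out the per-glidein footprint from an observed total."""
    return total_bytes / n_glideins


if __name__ == "__main__":
    print(cm_memory_bytes(25_000) / 1e9, "GB with the O(100k) rule")
    print(implied_bytes_per_glidein(6e9, 25_000) / 1e3, "kB per glidein (CMS example)")
```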
System considerations
 ●   Does not need to run as root (although it can)
      ●   Running as non-root minimizes risk due to Condor bugs
      ●   Make sure the host cert is readable by that user
 ●   Must be on the public IP network
      ●   Each collector listens on its own well-defined port,
          which must be reachable by all glideins (WAN)
      ●   The Negotiator has a dynamic listen port,
          which must be reachable by the submit nodes (schedds)
      ●   Must open the firewall at least for these
 ●   Will use a large number of network sockets
      ●   Will overwhelm most firewalls
      ●   Consider disabling stateful firewalls (e.g. iptables)




Security considerations
 ●   Cannot be firewalled → endpoint security
      ●   GSI security used (i.e. x509 certs) for networking
      ●   Limit administrative rights to local users (FS auth)
 ●   The Collector is the central trust point of the pool
      ●   The DNs of all other daemons are whitelisted here,
          including:
           –   Schedds
           –   Glideins (i.e. pilot proxies)
           –   Clients (e.g. glideinWMS Frontend)

Installing the CM
 ●   Two major burdens (for basic install)
     ●   Collector tree
     ●   Security setup
 ●   The glideinWMS installer helps with both
     ●   Starting from Condor tarball
     ●   As any user (e.g. as non-root)
     ●   Highly recommended
     ●   Easy-to-use update cmdline tool available, too
 ●   RPM install also an option
     ●   Easy to keep up-to-date (i.e. yum update)
     ●   But you will need to configure by hand
     ●   And will run as root (unless you hack the startup script)

Collector tree setup
 ●   In a nutshell
       ●   For each secondary collector:
             –   Tell the Master to start a collector on a different port
             –   Repeat (xN)
       ●   Forward ClassAds to the main Collector

     ...
     COLLECTORXXX = $(COLLECTOR)
     COLLECTORXXX_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/CollectorXXXLog"
     COLLECTORXXX_ARGS = -f -p YYYY
     DAEMON_LIST = $(DAEMON_LIST) COLLECTORXXX
     # repeat the 4 lines above for each secondary collector (xN)
     …
     # forward ads to the main collector
     # (this is ignored by the main collector, since the address matches itself)
     CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
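Writing N near-identical blocks by hand is error-prone, which is one reason to let the installer do it. As an illustration only, a Python sketch that expands the template above; the COLLECTOR1..COLLECTORn names and the consecutive ports starting at first_port are hypothetical stand-ins for the slide's XXX and YYYY placeholders.

```python
def collector_tree_config(n_secondary, first_port):
    """Expand the secondary-collector template into condor_config lines.

    Collector i (1-based) is named COLLECTOR<i> and listens on
    first_port + i - 1; both choices are illustrative assumptions.
    """
    lines = []
    for i in range(1, n_secondary + 1):
        name = f"COLLECTOR{i}"
        lines += [
            f"{name} = $(COLLECTOR)",
            f'{name}_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector{i}Log"',
            f"{name}_ARGS = -f -p {first_port + i - 1}",
            f"DAEMON_LIST = $(DAEMON_LIST) {name}",
        ]
    lines += [
        "# forward ads to the main collector",
        "# (this is ignored by the main collector, since the address matches itself)",
        "CONDOR_VIEW_HOST = $(COLLECTOR_HOST)",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: 3 secondary collectors on ports 9620-9622.
    print(collector_tree_config(3, 9620), end="")
```

Every port generated this way must also be reachable by the glideins, as noted on the System considerations slide.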


Security setup               (1)




 ●   In a nutshell
      ●   Configure basic GSI (i.e. point to CAs and host cert)
      ●   Set up authorization (i.e. switch to whitelist)
      ●   Whitelist all DNs
      ●   Enable GSI
 ●   DN whitelisting a bit annoying
      ●   Must be done in two places
           –   in condor_config, and
           –   in condor_mapfile (and there it is a regexp!)
      ●   glideinWMS provides a cmdline tool

Security setup                           (2)

# condor_config.local
# Configure GSI
CERTIFICATE_MAPFILE=/home/condor/glidecondor/certs/condor_mapfile
GSI_DAEMON_TRUSTED_CA_DIR=/etc/grid-security/certificates
GSI_DAEMON_CERT = /home/condor/.globus/hostcert.pem
GSI_DAEMON_KEY = /home/condor/.globus/hostkey.pem
# Force whitelisting
DENY_WRITE = anonymous@*
DENY_ADMINISTRATOR = anonymous@*
DENY_DAEMON = anonymous@*
DENY_NEGOTIATOR = anonymous@*
DENY_CLIENT = anonymous@*
ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
ALLOW_WRITE = *
# use only pilot DN, not FQAN
USE_VOMS_ATTRIBUTES = False
# list all DNs (one line per DN, xN)
...
GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),DNXXX
...
# enable GSI (FS is also listed, to enable local auth)
SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_ENCRYPTION = OPTIONAL
SEC_DEFAULT_INTEGRITY = REQUIRED
# optionally, relax client and read settings

# condor_mapfile
...
GSI "^DNXXX$" UIDXXX
...
GSI (.*) anonymous
FS (.*) \1
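Because every DN must appear twice, once verbatim in GSI_DAEMON_NAME and once as an anchored regexp in condor_mapfile, the two lists easily drift apart. A minimal Python sketch that derives both from a single DN-to-UID table; escaping regexp metacharacters (e.g. the dots in host names) is the sketch's own addition, not something shown on the slide, and the example DN and UID are hypothetical.

```python
# Characters with a special meaning inside condor_mapfile regexps.
_REGEX_META = set("\\.^$|?*+()[]{}")


def regex_escape(text):
    """Backslash-escape regexp metacharacters so a DN matches literally."""
    return "".join("\\" + c if c in _REGEX_META else c for c in text)


def whitelist_files(dn_to_uid):
    """Return (condor_config fragment, condor_mapfile text) for a DN->UID dict."""
    config = [f"GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),{dn}" for dn in dn_to_uid]
    mapfile = [f'GSI "^{regex_escape(dn)}$" {uid}' for dn, uid in dn_to_uid.items()]
    # Catch-all rules go last: any other GSI identity maps to "anonymous"
    # (rejected by the DENY_* settings), local FS users keep their own name.
    mapfile += ["GSI (.*) anonymous", "FS (.*) \\1"]
    return "\n".join(config) + "\n", "\n".join(mapfile) + "\n"


if __name__ == "__main__":
    # Hypothetical DN and UID, for illustration only.
    cfg, mapf = whitelist_files({"/DC=org/DC=example/CN=schedd.example.edu": "condor001"})
    print(cfg + mapf, end="")
```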
Installing with Q&A installer
 ~/glideinWMS/install$ ./glideinWMS_install
 ...
 Please select: 4
 [4] User Pool Collector
 ...
 Where do you have the Condor tarball? /home/condor/Downloads/condor-7.6.4-x86_rhap_5-stripped.tar.gz
 Where do you want to install it?: [/home/condor/glidecondor] /home/condor/glidecondor
 If something goes wrong with Condor, who should get email about it?: me@myemail
 Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
 ...
 Do you want to get it from VDT?: (y/n) y
 Do you have already a VDT installation?: (y/n) y
 Where is the VDT installed?: /etc/osg/wn-client
 ...
 Will you be using a proxy or a cert? (proxy/cert) cert
 Where is your certificate located?: /home/condor/.globus/hostcert.pem
 Where is your certificate key located?: /home/condor/.globus/hostkey.pem
 My DN = 'DN1'
 ...
 DN: DNXXX
 nickname: [condor001] uidXXX
 Is this a trusted Condor daemon?: (y/n) y
 ...
 DN:
 How many slave collectors do you want?: [5] 200
 What name would you like to use for this pool?: [My pool] MyVO
 What port should the collector be running?: [9618] 9618

 ●   The DN / nickname / trusted-daemon questions repeat once per DN (xN)
 ●   You can also add the DNs as an independent step
Maintenance
 ●   If you need to add more DNs, use
      ●   cmdline tool glidecondor_addDN
            ~/glideinWMS/install$ ./glidecondor_addDN -daemon "DN of Schedd A" "DNA" UIDA
            Configuration files changed.
            Remember to reconfig the affected Condor daemons.
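When many DNs have to be added at once, the tool can be driven from a list. A Python sketch that only assembles the command lines, copying the argument order from the example above (-daemon, a descriptive label, the DN, the UID); that reading of the arguments is an assumption, the second entry is hypothetical, and condor_reconfig still has to be run afterwards, as the tool reminds you.

```python
import shlex
import subprocess


def add_dn_commands(entries, tool="./glidecondor_addDN"):
    """One glidecondor_addDN command per (label, dn, uid) tuple.

    Argument order copied from the slide's example; -daemon is assumed
    to mark a trusted daemon DN, as in that example.
    """
    return [[tool, "-daemon", label, dn, uid] for label, dn, uid in entries]


def run_commands(commands, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_commands(add_dn_commands([
        ("DN of Schedd A", "DNA", "UIDA"),   # the slide's own example
        ("DN of Schedd B", "DNB", "UIDB"),   # hypothetical second entry
    ]))
```

The dry-run default lets you review the generated commands before running them from ~/glideinWMS/install.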


 ●   To upgrade the Condor binaries, use
      ●   cmdline tool glidecondor_upgrade
      ~/glideinWMS/install$ ./glidecondor_upgrade ~/Downloads/condor-7.6.5-x86_rhap_5-stripped.tar.gz
      Will update Condor in /home/condor/glidecondor
      ..
      Creating backup dir
      Putting new binaries in place
      Finished successfully
      Old binaries can be found in /home/condor/glidecondor/old.120102_13
Starting Condor
 ●   The installer will start Condor for you, but you
     should still know how to stop and start it by hand
 ●   To start Condor, run:
     ~/glidecondor/start_condor.sh
 ●   To stop Condor, use
     condor_off -daemon master
 ●   Finally, to force Condor to re-read the config:
     ~/glidecondor/sbin/condor_reconfig


Condor
                     Submit node(s)




Refresher - Submit node(s)
 ●   Submit node defined by the schedd
      ●   Which holds user jobs
 ●   Shadows will be started as the
     jobs are matched to glideins
      ●   One per running job
 ●   At least one submit node is needed
      ●   But there may be many
 [Figure: Submit node running the Schedd and one Shadow per running job]


Network use
 ●   Glideins must contact the submit node
     in order to run jobs
     ●   Both with standard protocol and CCB
 ●   Each shadow normally uses 2 random ports
     ●   Not firewall friendly
         (although firewalls can get overwhelmed anyhow, see CM slides)
     ●   Can be a problem over O(10k) jobs
 ●   Newer versions of Condor support the
     “shared port daemon”
     ●   Listens on a single port
     ●   Forwards the sockets to the appropriate local process
     ●   Does not reduce the number of sockets
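The O(10k) threshold is easy to motivate: at 2 ports per shadow, 10,000 running jobs need about 20,000 ports. A Python sketch comparing that with the Linux ephemeral port range; the 32768-60999 default is an assumption (it differs between kernels and distributions), so the sketch can also read the actual range from /proc.

```python
# Common Linux default ephemeral range (assumption: it differs between kernels
# and distributions; check /proc/sys/net/ipv4/ip_local_port_range).
DEFAULT_PORT_RANGE = (32768, 60999)
PORTS_PER_SHADOW = 2  # "normally uses 2 random ports", per the slide


def ephemeral_ports(port_range=DEFAULT_PORT_RANGE):
    """Number of ports in an inclusive (low, high) range."""
    low, high = port_range
    return high - low + 1


def shadow_port_fraction(running_jobs, port_range=DEFAULT_PORT_RANGE):
    """Fraction of the ephemeral range the shadows alone would occupy."""
    return running_jobs * PORTS_PER_SHADOW / ephemeral_ports(port_range)


def read_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the actual (low, high) range on a Linux node."""
    with open(path) as f:
        low, high = f.read().split()
    return int(low), int(high)
```

With the assumed default range, shadow_port_fraction(10_000) is about 0.71, before counting any other connection the node makes.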


Security considerations
 ●   Like with CM, must use endpoint security
 ●   Schedd and CM must whitelist each other
     ●   Certificate DN based
 ●   AuthZ with glideins is indirect
     ●   No need to whitelist glidein DN(s)
     ●   Collector trusts glidein,
         Schedd trusts Collector
 ●   Schedd also must
     whitelist any clients
     (e.g. VO Frontend)
     ●   Only startds can use
         indirect AuthZ
               http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SecEnableMatchPasswordAuthentication
 ●   Local users use FS auth (i.e. UID based)
 [Figure: Submit node (Schedd), Central manager (Collector, Negotiator) and glidein]
Hardware needs
 ●   Submit node is memory hungry
      ●   1M per running job due to shadows
      ●   O(10k) per job in queue for ClassAds
      ●   Actual need depends on how many additional VO attributes are used
 ●   Schedd can use a fast CPU (single threaded)
      ●   Shadows are very light CPU users
 ●   Jobs may put substantial IO load on HDD
      ●   Depends on how much data is being produced
      ●   Depends on how short the jobs are
 ●   And the above is just for Condor
      ●   VO may have portal software
      ●   or actual interactive users
      ●   Make sure the remaining HW is adequate for these
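Plugging numbers into the two memory figures shows that the shadows dominate. A Python sketch using the per-job sizes from this slide; the job counts in the example are hypothetical, and decimal units are assumed.

```python
SHADOW_BYTES = 1_000_000    # ~1 MB per running job (one shadow each), per the slide
JOB_AD_BYTES = 10_000       # O(10 kB) per job in the queue, per the slide


def schedd_memory_bytes(running, jobs_in_queue):
    """Rough submit-node memory for Condor alone.

    jobs_in_queue counts every job the schedd holds, running ones included.
    """
    return running * SHADOW_BYTES + jobs_in_queue * JOB_AD_BYTES


if __name__ == "__main__":
    # Hypothetical load: 10k running jobs out of 50k queued.
    print(schedd_memory_bytes(10_000, 50_000) / 1e9, "GB")
```

For this hypothetical load the estimate is about 10.5 GB, of which 10 GB is due to the shadows.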

User account considerations
 ●   Users must be able to launch
     condor_submit
     locally on the submit node
     ●   Remote submission not recommended
         (and disabled by default)
 ●   VO must decide how to do it
     ●   SSHd (i.e. interactive use)
     ●   Portal (e.g. CMS CRABServer)
     ●   Both are still local from the Condor point of view
 ●   Will need one UID per user
     ●   Non-UID based auth possible,
         but not recommended
         (and not supported out of the box)
     ●   No need to create user accounts before
         installing Condor, but do plan for it
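Since every user needs a local UID, a VO may want to check its planned accounts before opening the submit node. A small Python sketch using the standard pwd module; the user list is a hypothetical input (e.g. the people expected to log in, or the identities a portal submits for).

```python
import pwd


def missing_accounts(usernames):
    """Planned user names that do not yet have a local account (UID)."""
    missing = []
    for name in usernames:
        try:
            pwd.getpwnam(name)
        except KeyError:
            missing.append(name)
    return missing


def shared_uids(usernames):
    """UIDs that more than one of the given (existing) names map to."""
    by_uid = {}
    for name in usernames:
        try:
            uid = pwd.getpwnam(name).pw_uid
        except KeyError:
            continue  # reported by missing_accounts instead
        by_uid.setdefault(uid, []).append(name)
    return {uid: names for uid, names in by_uid.items() if len(names) > 1}
```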

Schedd is a superuser

               ●     Schedd must run as root
                     (euid==0, even as it drops ruid to “condor”)
                     ●   So it can switch UID as needed
                     ●   To access user files
                     ●   Same for shadows (but ruid set to job user)
               ●     Host cert thus must be owned by root
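These requirements can be checked before starting the schedd. A Python sketch that verifies the cert and key are owned by the expected UID (root by default) and that the key is not accessible by group or others; the second check follows the usual Globus convention rather than anything stated on this slide.

```python
import os
import stat


def check_host_credentials(cert, key, owner_uid=0):
    """Return a list of problems; an empty list means all checks passed.

    owner_uid defaults to 0 (root), as required on the submit node.
    """
    problems = []
    for path in (cert, key):
        uid = os.stat(path).st_uid
        if uid != owner_uid:
            problems.append(f"{path}: owned by uid {uid}, expected {owner_uid}")
    key_mode = stat.S_IMODE(os.stat(key).st_mode)
    if key_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{key}: mode {oct(key_mode)} gives access to group/others")
    return problems
```

For example, check_host_credentials("/etc/grid-security/hostcert.pem", "/etc/grid-security/hostkey.pem") returns an empty list on a correctly prepared node.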



Installing the submit node
 ●   Two major burdens (for basic install)
      ●   Shared port daemon
      ●   Security setup
 ●   The glideinWMS installer helps with both
      ●   Starting from Condor tarball
      ●   Should be run as root
      ●   Highly recommended
      ●   Easy-to-use update cmdline tool available, too
 ●   RPM install also an option
      ●   Easy to keep up-to-date (i.e. yum update)
      ●   But you will need to configure by hand
Shared port daemon
 ●   Not enabled by default in Condor
 ●   In a nutshell
      ●   Pick a port for it
      ●   Enable it
      ●   Add it to the list of daemons to start

                     # condor_config.local
                     # Enable shared_port_daemon
                     SHARED_PORT_ARGS = -p 9615
                     USE_SHARED_PORT = True
                     DAEMON_LIST = $(DAEMON_LIST) SHARED_PORT


Security setup               (1)




 ●   In a nutshell
      ●   Configure basic GSI (i.e. point to CAs and host cert)
      ●   Enable match authentication
      ●   Set up authorization (i.e. switch to whitelist)
      ●   Whitelist all DNs
      ●   Enable GSI
 ●   DN whitelisting a bit annoying
      ●   Must be done in two places
           –   in condor_config, and
           –   in condor_mapfile (and there it is a regexp!)
      ●   glideinWMS provides a cmdline tool

Security setup                           (2)



# condor_config.local
# Configure GSI
CERTIFICATE_MAPFILE=/opt/glidecondor/certs/condor_mapfile
GSI_DAEMON_TRUSTED_CA_DIR=/etc/grid-security/certificates
GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem
GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem
# Enable match authentication
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
# Force whitelisting
DENY_WRITE = anonymous@*
# (remaining settings: see CM slides for details)
…
# list all DNs (one line per DN, xN)
...
GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),DNXXX
...
# enable GSI (FS is also listed, to enable local auth)
SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI
SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_ENCRYPTION = OPTIONAL
SEC_DEFAULT_INTEGRITY = REQUIRED
# optionally, relax client and read settings

# condor_mapfile
...
GSI "^DNXXX$" UIDXXX
...
GSI (.*) anonymous
FS (.*) \1


Network optimization settings
 ●   Since glideins are often behind firewalls
      ●   The glidein Startd setup is optimized to avoid
          incoming connections and UDP
 ●   The Schedd must also play along

                     # condor_config.local
                     # Reverse protocol direction
                     STARTD_SENDS_ALIVES = True
                     # Avoid UDP
                     SCHEDD_SEND_VACATE_VIA_TCP = True




Installing with Q&A installer
 ~/glideinWMS/install$ ./glideinWMS_install
 ...
 Please select: 5
 [5] User Schedd
 …
 Which user should Condor run under?: [condor] condor
 Where do you have the Condor tarball? /root/condor-7.6.4-x86_rhap_5-stripped.tar.gz
 Where do you want to install it?: [/home/condor/glidecondor] /opt/glidecondor
 If something goes wrong with Condor, who should get email about it?: me@myemail
 Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
 ...
 Do you want to get it from VDT?: (y/n) y
 Do you have already a VDT installation?: (y/n) y
 Where is the VDT installed?: /etc/osg/wn-client
 Will you be using a proxy or a cert? (proxy/cert) cert
 Where is your certificate located?: /etc/grid-security/hostcert.pem
 Where is your certificate key located?: /etc/grid-security/hostkey.pem
 My DN = 'DN1'
 ...
 DN: DNXXX
 nickname: [condor001] uidXXX
 Is this a trusted Condor daemon?: (y/n) y
 ...
 DN:
 What node is the collector running (i.e. CONDOR_HOST)?: collectornode.mydomain
 Do you want to enable the shared_port_daemon?: (y/n) y
 What port should it use?: [9615] 9615
 How many secondary schedds do you want?: [9] 0

 (The DN / nickname / trusted daemon questions repeat xN, once per DN.
  You can also add the DNs as an independent step.)
Maintenance
 ●   If you need to add more DNs, use
      ●   the command-line tool glidecondor_addDN
            ~/glideinWMS/install$ ./glidecondor_addDN -daemon "DN of Schedd A" "DNA" UIDA
            Configuration files changed.
            Remember to reconfig the affected Condor daemons.
      ●   Do not use -daemon for a client's DN
 ●   To upgrade the Condor binaries, use
      ●   the command-line tool glidecondor_upgrade
      ~/glideinWMS/install$ ./glidecondor_upgrade ~/Downloads/condor-7.6.5-x86_rhap_5-stripped.tar.gz
      Will update Condor in /home/condor/glidecondor
      ..
      Creating backup dir
      Putting new binaries in place
      Finished successfully
      Old binaries can be found in /home/condor/glidecondor/old.120102_13
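When several schedd DNs need to be added at once, the glidecondor_addDN calls can be scripted. The following is a minimal Python sketch, not part of glideinWMS: the function names are hypothetical, it only builds the `-daemon` form shown above (the slide warns against `-daemon` for a client's DN), and it assumes the tools live in `~/glideinWMS/install` and Condor in `/opt/glidecondor`, as in these slides. It ends with a single `condor_reconfig`, as the tool's output reminds you to; note that this only reconfigures the local daemons.

```python
from __future__ import annotations

import os
import subprocess

INSTALL_DIR = os.path.expanduser("~/glideinWMS/install")  # as in the slides
RECONFIG = "/opt/glidecondor/sbin/condor_reconfig"        # install path used in the slides


def add_daemon_dns_plan(daemons: list[tuple[str, str, str]]) -> list[list[str]]:
    """One glidecondor_addDN call per (comment, DN, UID) triple, then one reconfig."""
    plan = [["./glidecondor_addDN", "-daemon", comment, dn, uid]
            for comment, dn, uid in daemons]
    plan.append([RECONFIG])  # reload config once, after all DNs are added
    return plan


def run_plan(plan: list[list[str]]) -> None:
    """Run each command in order from the install directory; stop on the first failure."""
    for cmd in plan:
        subprocess.run(cmd, cwd=INSTALL_DIR, check=True)
```

Building the plan separately from running it makes it easy to print and review the commands before executing them.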
Starting Condor
 ●   The installer will start Condor for you, but you should
     still know how to stop and start it by hand
 ●   The installer has created an init.d script for you
     /etc/init.d/condor start|stop
 ●   To force Condor to reload its config, use
     /opt/glidecondor/sbin/condor_reconfig
 ●   All of the above must be run as root
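A small Python sketch that wraps the commands above and refuses to run unless it is root. The function names are hypothetical and the paths are simply the ones used in these slides; `os.geteuid` is POSIX-only.

```python
from __future__ import annotations

import os
import subprocess

INIT_SCRIPT = "/etc/init.d/condor"                   # created by the installer
RECONFIG = "/opt/glidecondor/sbin/condor_reconfig"   # install path used in the slides


def condor_cmd(action: str) -> list[str]:
    """Map an action to its command: start/stop via init.d, reload via condor_reconfig."""
    if action in ("start", "stop"):
        return [INIT_SCRIPT, action]
    if action == "reconfig":
        return [RECONFIG]
    raise ValueError(f"unknown action: {action!r}")


def condor_control(action: str) -> None:
    """Run the command for the given action, but only as root."""
    if os.geteuid() != 0:
        raise PermissionError("these commands must be run as root")
    subprocess.run(condor_cmd(action), check=True)
```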


Fine tuning




Fine tuning
 ●   The previous slides provide only a basic setup
      ●   Although glideinWMS does some basic tuning
 ●   You will likely want to tune the system further
      ●   Proper limits on the submit node
      ●   Default job attributes
      ●   Sanity checks
      ●   Priority tuning
 ●   Not part of this talk
      ●   Will be covered in detail tomorrow


Integration with OSG Accounting




OSG Accounting
 ●   OSG tries to keep accurate accounting
     information of who used what resources
      ●    Using GRATIA
     https://twiki.grid.iu.edu/twiki/bin/view/Accounting/WebHome
     http://gratia-osg-prod-reports.opensciencegrid.org/gratia-reporting/




Per-user accounting
 ●   OSG has per-user accounting, too
      ●   With glideins, this level of detail is lost
      ●   Only the pilot proxy is seen by OSG (sites)




The glidein GRATIA probe
 ●   OSG thus asks glidein operators to install a
     dedicated probe alongside the glidein schedd(s)
      ●   Which will provide per-user accounting info
          to the OSG GRATIA server
      ●   Optimized for use with the OSG glidein factory
     https://twiki.grid.iu.edu/bin/view/Accounting/ProbeConfigGlideinWMS


     [Diagram: on the submit node, the GRATIA Probe runs alongside the Schedd
      and reports to the OSG GRATIA Server]




Installing the GRATIA probe
 ●   In a nutshell
      ●   Register submit node with GOC
      ●   Tweak condor config
      ●   yum install gratia-probe-condor
      ●   Configure GRATIA
     https://twiki.grid.iu.edu/bin/view/Accounting/ProbeConfigGlideinWMS




Condor changes for GRATIA
 ●   GRATIA gets information from history logs
      ●    Requires one file per terminated job for efficiency
 ●   GRATIA needs to know where the job ran
      ●    Additional attribute added to the job ClassAd
           (more general details on this tomorrow)


          ## condor_config.local
          PER_JOB_HISTORY_DIR = /var/lib/gratia/data
          JOBGLIDEIN_ResourceName = \
           "$$([IfThenElse(IsUndefined(TARGET.GLIDEIN_ResourceName), \
                 IfThenElse(IsUndefined(TARGET.GLIDEIN_Site), \
                            FileSystemDomain, TARGET.GLIDEIN_Site), \
                 TARGET.GLIDEIN_ResourceName)])"
          SUBMIT_EXPRS = $(SUBMIT_EXPRS) JOBGLIDEIN_ResourceName
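The nested IfThenElse above can be hard to read. The following Python function only illustrates its fallback order; it is not how Condor evaluates the expression. It prefers the glidein's GLIDEIN_ResourceName, then GLIDEIN_Site, and finally FileSystemDomain. Here a missing dictionary key stands in for a ClassAd UNDEFINED attribute, and unlike ClassAds the lookup is case-sensitive; the function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def job_glidein_resource_name(machine_ad: dict[str, str]) -> Optional[str]:
    """Mirror of the nested IfThenElse: ResourceName, else Site, else FileSystemDomain."""
    if machine_ad.get("GLIDEIN_ResourceName") is not None:
        return machine_ad["GLIDEIN_ResourceName"]
    if machine_ad.get("GLIDEIN_Site") is not None:
        return machine_ad["GLIDEIN_Site"]
    # Neither glidein attribute is defined: fall back to the machine's FileSystemDomain.
    return machine_ad.get("FileSystemDomain")
```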
GRATIA configuration
 ●   Essentially just tell GRATIA what name you
     have registered with the GOC
      ●   Then enable it

              ## /etc/gratia/condor/ProbeConfig
              SiteName="VOX_glidein_node1"
              EnableProbe="1"
              ## add this line to allow user jobs
              ## without a proxy
              MapUnknownToGroup="1"

 ●   You also need to tell it where to find Condor

              ## /root/setup.sh
              source /etc/profile.d/condor.sh
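If you prefer to script the ProbeConfig change, here is a minimal Python sketch. It assumes ProbeConfig is an XML file whose settings are attributes of its single root element; check your copy before relying on this. The function name is hypothetical, and the values are the example ones from the slide. ElementTree does not preserve XML comments, so keep a backup of the original file.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SETTINGS = {
    "SiteName": "VOX_glidein_node1",  # example: the name registered with the GOC
    "EnableProbe": "1",
    "MapUnknownToGroup": "1",         # allow user jobs without a proxy
}


def update_probe_config(xml_text: str, settings: dict[str, str]) -> str:
    """Set (or overwrite) the given attributes on the root element and return the new XML."""
    root = ET.fromstring(xml_text)
    for name, value in settings.items():
        root.set(name, value)
    return ET.tostring(root, encoding="unicode")
```

Typical use is to read /etc/gratia/condor/ProbeConfig, pass its text through `update_probe_config(text, SETTINGS)`, review the result, and write it back.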




The End




Pointers
 ●   The official glideinWMS project Web page is
     http://tinyurl.com/glideinWMS
 ●   glideinWMS development team is reachable at
     glideinwms-support@fnal.gov
 ●   Condor Home Page
     http://www.cs.wisc.edu/condor/
 ●   Condor support
     condor-user@cs.wisc.edu
     condor-admin@cs.wisc.edu

Acknowledgments
 ●   The glideinWMS is a CMS-led project
     developed mostly at FNAL, with contributions
     from UCSD and ISI
 ●   The glideinWMS factory operations at UCSD are
     sponsored by OSG
 ●   The funding comes from NSF, DOE and the
     UC system




Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

glideinWMS Frontend Installation - Part 1 - Condor Installation - glideinWMS Training Jan 2012

  • 1. glideinWMS Training @ UCSD
      glideinWMS Frontend Installation
      Part 1 – Condor Installation
      by Igor Sfiligoi (UCSD)
      UCSD Jan 17th 2012 Condor Install 1
  • 2. Overview
      ● Introduction
      ● Planning and Common setup
      ● Central Manager Installation
      ● Submit node Installation
  • 3. Refresher - Glideins
      ● A glidein is just a properly configured Condor execution node submitted as a Grid job
        ● glideinWMS provides the automation
      [Figure: submit nodes (Schedd) and a central manager (Collector, Negotiator); glideinWMS submits glideins through Globus and CREAM, and each glidein becomes an execution node whose Startd runs user jobs]
  • 4. Refresher - Glideins
      ● glideinWMS triggers glidein submission
        ● The “regular” negotiator matches jobs to glideins
      [Figure: same pool diagram as the previous slide]
  • 5. Bottom line
      Condor is king!
      (glideinWMS is just a small layer on top)
  • 6. Condor installation
      ● Proper Condor installation and configuration is the most important task
        ● Condor will do most of the work
        ● … and is thus the most resource hungry
      ● GlideinWMS installation is almost an afterthought
        ● Although it does require proper security configuration of Condor
        ● GlideinWMS installation proper will be described in a separate talk
  • 7. Planning and Common setup
  • 8. Refresher - Condor
      ● Two main node types
        ● Submit node(s)
        ● Central manager
        ● (execute nodes are dynamic – glideins)
      ● Public TCP/IP networking needed
        ● GSI used for network security
      [Figure: submit nodes (Schedd) and the central manager (Collector, Negotiator) communicating with a glidein]
  • 9. Planning the setup
      ● In theory, all Condor daemons can be installed on a single node
      ● However, if at all possible, put the Central Manager on a dedicated node
        ● i.e. do not use it as a submit node, too
        ● Both for security and stability reasons
      ● You may want/need more than one submit node
        ● Depends on expected use and available HW
        ● You do need at least one, though
  • 10. Common system considerations
      ● Condor is supported on a wide variety of platforms
        ● Including Linux (e.g. RHEL5), MacOS and Windows
        ● Linux is recommended in OSG (and assumed in the rest of the talk)
      ● GSI security requires
        ● A host or service certificate
        ● CAs & CRLs
          – Typically delivered via OSG RPMs (but other means are acceptable)
          https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallCertAuth
      ● Full Grid Client software recommended (for ease of ops)
        https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallOSGClient
  • 11. OSG Grid Client
      ● Requires RHEL5-compatible Linux
        ● RHEL6 support promised for early 2012
      ● Procedure in a nutshell
        ● Add the EPEL and OSG RPM repositories to the system configuration
        ● yum install osg-ca-certs
        ● yum install osg-client
        ● Enable the CRL fetching crontab
      ● Other Grid clients (e.g. EGI/glite) will work just as well
      https://twiki.grid.iu.edu/bin/view/Documentation/Release3/InstallOSGClient
  • 12. Requesting a host certificate
      ● OSG provides a script to talk to DOEGrids
        https://twiki.grid.iu.edu/bin/view/Documentation/Release3/GetHostServiceCertificates
      ● Procedure in a nutshell
        ● Install the OSG client
        ● yum install osg-cert-scripts
        ● cert-request …
        ● Wait for email
        ● cert-retrieve …
        ● cp into /etc/grid-security/
      ● If you have other ways to obtain a host cert, feel free to use them
  • 13. Condor Central Manager
  • 14. Refresher - Central Manager
      ● Two (groups of) processes
        ● Collector
        ● Negotiator
      ● The Collector defines the Condor pool
        ● Knows about all the glideins it owns
        ● Knows about all the schedds
      ● The Negotiator does the matchmaking
        ● Decides who gets what resources
  • 15. Condor Collector – considerations
      ● The Collector is the repository of all knowledge
        ● All other daemons report to it
        ● Including the glideins, which get its address at run-time
      ● Must process lots of info
        ● One update every 5 mins from each and every daemon
        ● With strong security → expensive
      ● Typically deployed as a tree of collectors
        ● All security handled in the leaves
        ● Top one still has the complete picture
      [Figure: central manager with the Negotiator, the top Collector and several secondary Collectors; glideins report to the secondary Collectors]
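To get a feel for why the load is split across a tree, the update rate can be estimated directly from the numbers on this slide. The Python sketch below is a back-of-envelope calculation; the even spread of updates across the leaf collectors and the function names are assumptions made for illustration, not measured Condor behavior.

```python
# Back-of-envelope sketch (assumptions, not measured Condor behavior):
# every daemon sends one ClassAd update per interval, and the updates are
# spread evenly over the secondary ("leaf") collectors of the tree.

UPDATE_INTERVAL_S = 5 * 60  # "one update every 5 mins" per daemon


def updates_per_second(num_daemons: int, interval_s: int = UPDATE_INTERVAL_S) -> float:
    """Aggregate update rate arriving at the pool as a whole."""
    return num_daemons / interval_s


def per_leaf_rate(num_daemons: int, num_leaves: int,
                  interval_s: int = UPDATE_INTERVAL_S) -> float:
    """Updates per second each leaf must authenticate (even spread assumed)."""
    if num_leaves < 1:
        raise ValueError("need at least one leaf collector")
    return updates_per_second(num_daemons, interval_s) / num_leaves


if __name__ == "__main__":
    glideins = 25_000  # same size as the CMS example on the hardware slide
    print(f"whole pool: {updates_per_second(glideins):.1f} updates/s")
    for leaves in (1, 50, 200):
        rate = per_leaf_rate(glideins, leaves)
        print(f"{leaves:>3} leaf collectors: {rate:.2f} authenticated updates/s each")
```

The top collector still receives every ad, but only the leaves pay the cost of strong authentication, which is what makes the tree scale.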
  • 16. CCB – An additional cost
      ● The Condor collectors also act as CCBs
        ● Each glidein will open 5+ long-lived TCP sockets
      ● Make sure you have enough file descriptors
        ● Default OS limit is 1024 per process
        ● Plan on having one CCB per 100 glideins
      [Figure: the leaves in the tree of collectors act as CCBs; when a submit node wants to connect to an execute node (e.g. to transfer files), it asks the CCB, which tells the glidein to call back]
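The CCB rule of thumb can be turned into a quick sizing check. This Python sketch assumes, pessimistically, that all of a glidein's 5+ sockets are held by the one CCB (leaf collector) it talks to; under that assumption 100 glideins need about 500 descriptors, comfortably below the default limit of 1024. The constants come from the slide; the helper names are hypothetical.

```python
import math

# Numbers taken from the slide; treating them as exact is an assumption.
SOCKETS_PER_GLIDEIN = 5    # "5+" long-lived TCP sockets, i.e. a lower bound
GLIDEINS_PER_CCB = 100     # "one CCB per 100 glideins"
DEFAULT_FD_LIMIT = 1024    # default per-process OS limit


def ccbs_needed(num_glideins: int, glideins_per_ccb: int = GLIDEINS_PER_CCB) -> int:
    """Number of CCB (leaf collector) processes for a given pool size."""
    return math.ceil(num_glideins / glideins_per_ccb)


def fds_per_ccb(glideins_per_ccb: int = GLIDEINS_PER_CCB,
                sockets_per_glidein: int = SOCKETS_PER_GLIDEIN) -> int:
    """Pessimistic descriptor count: all of a glidein's sockets land on its CCB."""
    return glideins_per_ccb * sockets_per_glidein


def fits_fd_limit(limit: int = DEFAULT_FD_LIMIT, **kwargs: int) -> bool:
    """True if one CCB stays below the per-process descriptor limit."""
    return fds_per_ccb(**kwargs) < limit


if __name__ == "__main__":
    print("CCBs for 25k glideins:", ccbs_needed(25_000))
    print("descriptors per CCB:", fds_per_ccb(), "| fits 1024:", fits_fd_limit())
```

Raising `glideins_per_ccb` to 300 makes `fits_fd_limit` return False, which is one way to see why the per-CCB load has to stay small unless the descriptor limit is raised.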
  • 17. High availability (theory)
      ● Central manager can be a single point of failure
        ● If it dies, the Condor pool dies with it!
      ● To avoid this, one can deploy multiple CMs
        ● All daemons will advertise to 2 (or more) Collectors
        ● All CMs will have the same view of the world
      ● There can only be one Negotiator, though
        ● One Negotiator will be active, all others in standby
      ● More details in the Condor manual
        http://www.cs.wisc.edu/condor/manual/v7.6/3_11High_Availability.html#SECTION004112000000000000000
      Note: currently not supported by glideinWMS
  • 18. Hardware needs
      ● Tree of collectors spreads the load over multiple processes
        ● So several CPUs come in handy
      ● Negotiator is single threaded
        ● Will benefit from a fast CPU
      ● Memory usage not terrible
        ● O(100 KB) per glidein to store ClassAds
        ● Concrete CMS example: 25k glideins ~ 6 GB memory
        ● Exact footprint depends on how many additional attributes the VO defines
      ● Negligible disk IO
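Since memory grows roughly linearly with the number of glidein ClassAds, the CMS data point can be turned into a per-glidein figure and scaled. This is a rough Python sketch that reads "6 GB" as 6 GiB (an assumption); the result, roughly 250 KB per glidein, is in line with the O(100 KB) order of magnitude quoted above, but a VO with many extra attributes should measure its own numbers.

```python
KB = 1024
GB = 1024 ** 3


def effective_bytes_per_glidein(total_bytes: float, num_glideins: int) -> float:
    """Back out the per-glidein footprint from an observed total."""
    return total_bytes / num_glideins


def collector_memory(num_glideins: int, bytes_per_glidein: float) -> float:
    """Linear estimate: memory grows with the number of glidein ClassAds."""
    return num_glideins * bytes_per_glidein


if __name__ == "__main__":
    # CMS data point from the slide, reading "6 GB" as 6 GiB (an assumption)
    per_glidein = effective_bytes_per_glidein(6 * GB, 25_000)
    print(f"CMS example: ~{per_glidein / KB:.0f} KB per glidein")
    for n in (1_000, 10_000, 50_000):
        print(f"{n:>6} glideins -> ~{collector_memory(n, per_glidein) / GB:.1f} GiB")
```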
  • 19. System considerations
      ● Does not need to run as root (although it can)
        ● Minimizes the risk due to Condor bugs
        ● Make sure the host cert is readable by that user
      ● Must be on the public IP network
        ● Each collector listens on its own well-defined port, which must be reachable by all glideins (WAN)
        ● The Negotiator has a dynamic port, which must be reachable by the submit nodes (schedds)
        ● Must open the firewall at least for these
      ● Will use a large number of network sockets
        ● Will overwhelm most firewalls
        ● Consider disabling stateful firewalls (e.g. iptables)
  • 20. Security considerations
      ● Cannot be firewalled → endpoint security
        ● GSI security (i.e. x509 certs) used for networking
        ● Limit administrative rights to local users (FS auth)
      ● The Collector is the central trust point of the pool
        ● The DNs of all other daemons are whitelisted here, including:
          – Schedds
          – Glideins (i.e. pilot proxies)
          – Clients (e.g. glideinWMS Frontend)
  • 21. Installing the CM
      ● Two major burdens (for basic install)
        ● Collector tree
        ● Security setup
      ● The glideinWMS installer helps with both
        ● Starting from the Condor tarball
        ● As any user (e.g. as non-root)
        ● Highly recommended
        ● An easy-to-use cmdline update tool is available, too
      ● RPM install also an option
        ● Easy to keep up-to-date (i.e. yum update)
        ● But you will need to configure by hand
        ● And will run as root (unless you hack the startup script)
  • 22. Collector tree setup
      ● In a nutshell
        ● For each secondary collector:
          – Tell the Master to start a collector on a different port
          – Repeat (xN)
        ● Forward ClassAds to the main Collector
        ...
        # repeat this block xN, once per secondary collector
        COLLECTORXXX = $(COLLECTOR)
        COLLECTORXXX_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/CollectorXXXLog"
        COLLECTORXXX_ARGS = -f -p YYYY
        DAEMON_LIST = $(DAEMON_LIST) COLLECTORXXX
        …
        # forward ads to the main collector
        # (this is ignored by the main collector, since the address matches itself)
        CONDOR_VIEW_HOST = $(COLLECTOR_HOST)
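Writing out the four-line block by hand for, say, 200 secondary collectors is tedious, so it can be generated. The Python sketch below is hypothetical: the COLLECTOR1…COLLECTORn names and the consecutive port numbering starting at 9620 are assumptions chosen for illustration (they only need to be unique and not clash with the main collector's port), not necessarily what the glideinWMS installer produces.

```python
def secondary_collector_config(count: int, first_port: int = 9620) -> str:
    """Render the per-collector block once per secondary collector,
    followed by the CONDOR_VIEW_HOST forwarding line."""
    if count < 0:
        raise ValueError("count must be non-negative")
    lines = []
    for i in range(1, count + 1):
        name = f"COLLECTOR{i}"         # hypothetical naming scheme
        port = first_port + i - 1      # consecutive ports (assumption)
        lines += [
            f"{name} = $(COLLECTOR)",
            f'{name}_ENVIRONMENT = "_CONDOR_COLLECTOR_LOG=$(LOG)/Collector{i}Log"',
            f"{name}_ARGS = -f -p {port}",
            f"DAEMON_LIST = $(DAEMON_LIST) {name}",
        ]
    lines += [
        "# forward ads to the main collector",
        "# (this is ignored by the main collector, since the address matches itself)",
        "CONDOR_VIEW_HOST = $(COLLECTOR_HOST)",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a fragment suitable for appending to condor_config.local
    print(secondary_collector_config(3), end="")
```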
  • 23. Security setup (1)
      ● In a nutshell
        ● Configure basic GSI (i.e. point to CAs and host cert)
        ● Set up authorization (i.e. switch to whitelist)
        ● Whitelist all DNs
        ● Enable GSI
      ● DN whitelisting is a bit annoying
        ● Must be done in two places
          – in condor_config, and
          – in condor_mapfile (and there it is a regexp!)
        ● glideinWMS provides a cmdline tool
  • 24. Security setup (2)
      condor_config.local:
        # Configure GSI
        CERTIFICATE_MAPFILE=/home/condor/glidecondor/certs/condor_mapfile
        GSI_DAEMON_TRUSTED_CA_DIR=/etc/grid-security/certificates
        GSI_DAEMON_CERT = /home/condor/.globus/hostcert.pem
        GSI_DAEMON_KEY = /home/condor/.globus/hostkey.pem
        # Force whitelisting
        DENY_WRITE = anonymous@*
        DENY_ADMINISTRATOR = anonymous@*
        DENY_DAEMON = anonymous@*
        DENY_NEGOTIATOR = anonymous@*
        DENY_CLIENT = anonymous@*
        ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
        ALLOW_WRITE = *
        # use only pilot DN, not FQAN
        USE_VOMS_ATTRIBUTES = False
        # list all DNs (one line per DN, xN)
        ...
        GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),DNXXX
        ...
        # enable GSI
        SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI
        SEC_DEFAULT_AUTHENTICATION = REQUIRED
        SEC_DEFAULT_ENCRYPTION = OPTIONAL
        SEC_DEFAULT_INTEGRITY = REQUIRED
        # optionally, relax client and read settings
      condor_mapfile (one GSI line per whitelisted DN, xN; the FS line also enables local auth):
        ...
        GSI "^DNXXX$" UIDXXX
        ...
        GSI (.*) anonymous
        FS (.*) 1
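The two-place DN bookkeeping can be sketched as a small generator that, for each DN, emits the GSI_DAEMON_NAME line for condor_config.local and the anchored regexp line for condor_mapfile. This is a hypothetical helper, not the glidecondor_addDN implementation. It escapes every non-alphanumeric character with a backslash, which PCRE always treats as a literal, so a "." in a hostname cannot match an arbitrary character. The catch-all lines are kept last, assuming the mapfile is scanned top to bottom and the first match wins.

```python
import re
from typing import Dict, List, Tuple


def mapfile_regex(dn: str) -> str:
    """Anchor the DN and backslash-escape every non-alphanumeric character,
    so the pattern matches exactly this DN and nothing else."""
    return "^" + re.sub(r"([^A-Za-z0-9])", r"\\\1", dn) + "$"


def whitelist_entries(dn_to_uid: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (condor_config.local lines, condor_mapfile lines)."""
    config: List[str] = []
    mapfile: List[str] = []
    for dn, uid in dn_to_uid.items():
        config.append(f"GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),{dn}")
        mapfile.append(f'GSI "{mapfile_regex(dn)}" {uid}')
    # Catch-all rules go last, as on the slide: any other GSI identity maps to
    # "anonymous" (denied by the DENY_* settings), local FS users are accepted.
    mapfile += ["GSI (.*) anonymous", "FS (.*) 1"]
    return config, mapfile


if __name__ == "__main__":
    # Made-up DN and UID, for illustration only
    cfg, mp = whitelist_entries(
        {"/DC=org/DC=example/OU=Services/CN=schedd.example.org": "schedd1"}
    )
    print("\n".join(cfg + [""] + mp))
```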
  • 25. Installing with Q&A installer
        ~/glideinWMS/install$ ./glideinWMS_install
        ...
        [4] User Pool Collector
        ...
        Please select: 4
        ...
        Where do you have the Condor tarball? /home/condor/Downloads/condor-7.6.4-x86_rhap_5-stripped.tar.gz
        Where do you want to install it?: [/home/condor/glidecondor] /home/condor/glidecondor
        If something goes wrong with Condor, who should get email about it?: me@myemail
        Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
        ...
        Do you want to get it from VDT?: (y/n) y
        Do you have already a VDT installation?: (y/n) y
        Where is the VDT installed?: /etc/osg/wn-client
        ...
        Will you be using a proxy or a cert? (proxy/cert) cert
        Where is your certificate located?: /home/condor/.globus/hostcert.pem
        Where is your certificate key located?: /home/condor/.globus/hostkey.pem
        My DN = 'DN1'
        ...
        DN: DNXXX
        nickname: [condor001] uidXXX
        Is this a trusted Condor daemon?: (y/n) y
        ...
        DN:
        How many slave collectors do you want?: [5] 200
        What name would you like to use for this pool?: [My pool] MyVO
        What port should the collector be running?: [9618] 9618
      ● The DN / nickname / trusted questions repeat once per DN (xN); you can also add the DNs as an independent step
  • 26. Maintenance
      ● If you need to add more DNs, use the cmdline tool glidecondor_addDN
        ~/glideinWMS/install$ ./glidecondor_addDN -daemon "DN of Schedd A" "DNA" UIDA
        Configuration files changed.
        Remember to reconfig the affected Condor daemons.
      ● To upgrade the Condor binaries, use the cmdline tool glidecondor_upgrade
        ~/glideinWMS/install$ ./glidecondor_upgrade ~/Downloads/condor-7.6.5-x86_rhap_5-stripped.tar.gz
        Will update Condor in /home/condor/glidecondor
        ..
        Creating backup dir
        Putting new binaries in place
        Finished successfully
        Old binaries can be found in /home/condor/glidecondor/old.120102_13
  • 27. Starting Condor
      ● The installer will start Condor for you, but you should still know how to stop and start it by hand
      ● To start Condor, run:
        ~/glidecondor/start_condor.sh
      ● To stop Condor, use:
        condor_off -daemon master
      ● Finally, to force Condor to re-read the config:
        ~/glidecondor/sbin/condor_reconfig
  • 28. Condor Submit node(s)
  • 29. Refresher - Submit node(s)
      ● Submit node defined by the schedd
        ● Which holds user jobs
      ● Shadows will be started as the jobs are matched to glideins
        ● One per running job
      ● At least one submit node is needed
        ● But there may be many
      [Figure: a submit node running one Schedd and many Shadows]
  • 30. Network use
      ● Glideins must contact the submit node in order to run jobs
        ● Both with the standard protocol and CCB
      ● Each shadow normally uses 2 random ports
        ● Not firewall friendly (although firewalls can get overwhelmed anyhow)
        ● Can be a problem over O(10k) jobs (see CM slides)
      ● Newer versions of Condor support a “shared port daemon”
        ● Listens on a single port
        ● Forwards the sockets to the appropriate local process
        ● Does not reduce the number of sockets
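One possible way to see why O(10k) running jobs can become a problem: assuming the shadows' random ports come from the kernel's ephemeral range (i.e. no LOWPORT/HIGHPORT restriction is configured), two ports per shadow eat into a range of only about 28k ports under a common Linux default. The Python sketch below does that arithmetic; the default range is an assumption, and the helper reads the real value from /proc when it is available. With the shared port daemon, inbound connections arrive on its single port instead.

```python
from typing import Tuple

PORTS_PER_SHADOW = 2  # "each shadow normally uses 2 random ports"
# Common default of /proc/sys/net/ipv4/ip_local_port_range on recent Linux
# kernels; an assumption here, so prefer reading the real value.
DEFAULT_EPHEMERAL_RANGE: Tuple[int, int] = (32768, 60999)


def random_ports(running_jobs: int) -> int:
    """Ports held by shadows when no shared port daemon is used."""
    return running_jobs * PORTS_PER_SHADOW


def range_size(port_range: Tuple[int, int]) -> int:
    low, high = port_range
    return high - low + 1


def fraction_of_range(ports: int,
                      port_range: Tuple[int, int] = DEFAULT_EPHEMERAL_RANGE) -> float:
    """Share of the ephemeral range consumed by the given number of ports."""
    return ports / range_size(port_range)


def read_linux_range(path: str = "/proc/sys/net/ipv4/ip_local_port_range") -> Tuple[int, int]:
    """Read the kernel's ephemeral range, falling back to the default elsewhere."""
    try:
        with open(path) as fh:
            low, high = (int(x) for x in fh.read().split())
        return low, high
    except (OSError, ValueError):
        return DEFAULT_EPHEMERAL_RANGE


if __name__ == "__main__":
    rng = read_linux_range()
    for jobs in (1_000, 5_000, 10_000):
        used = random_ports(jobs)
        print(f"{jobs:>6} running jobs -> {used} ports, "
              f"{fraction_of_range(used, rng):.0%} of range {rng}")
```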
  • 31. Security considerations
      ● Like with the CM, must use endpoint security
      ● Schedd and CM must whitelist each other
        ● Certificate DN based
      ● AuthZ with glideins is indirect
        ● No need to whitelist glidein DN(s)
        ● Collector trusts glidein, Schedd trusts Collector
        ● Only startds are authorized this way
          http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:SecEnableMatchPasswordAuthentication
      ● Schedd also must whitelist any clients (e.g. VO Frontend)
        ● Local users can use FS auth (i.e. UID based)
      [Figure: central manager (Collector, Negotiator), submit node (Schedd) and glidein, with indirect AuthZ between the Schedd and the glidein]
  • 32. Hardware needs
      ● Submit node is memory hungry
        ● 1 MB per running job due to shadows
        ● O(10 KB) per job in queue for ClassAds
        ● Actual need depends on how many additional VO attributes are used
      ● Schedd can use a fast CPU (single threaded)
        ● Shadows are very light CPU users
      ● Jobs may put substantial IO load on HDD
        ● Depends on how much data is being produced
        ● Depends on how short the jobs are
      ● And the above is just for Condor
        ● VO may have portal software
        ● or actual interactive users
        ● Make sure the remaining HW is adequate for these
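The per-job figures above lend themselves to a quick sizing estimate. This Python sketch simply adds 1 MB per running job (shadows) and 10 KB per queued job (ClassAds); taking the O(10 KB) figure literally and counting running jobs as part of the queue are assumptions. It covers Condor only, not the VO's portal software or interactive users.

```python
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3


def submit_node_memory(running: int, in_queue: int,
                       per_shadow: int = 1 * MB, per_job_ad: int = 10 * KB) -> int:
    """Rough Condor-only footprint: one shadow per running job plus one
    ClassAd per job in the queue (running jobs are counted in both)."""
    if running > in_queue:
        raise ValueError("running jobs are a subset of the queue")
    return running * per_shadow + in_queue * per_job_ad


if __name__ == "__main__":
    for running, queued in ((1_000, 5_000), (10_000, 50_000)):
        mem = submit_node_memory(running, queued)
        print(f"{running:>6} running / {queued:>6} queued -> ~{mem / GB:.1f} GiB")
```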
  • 33. User account considerations
      ● Users must be able to launch condor_submit locally on the submit node
        ● Remote submission not recommended (and disabled by default)
      ● VO must decide how to do it
        ● SSHd (i.e. interactive use)
        ● Portal (e.g. CMS CRABServer), which is still local from the Condor point of view
      ● Will need one UID per user
        ● No need to create user accounts before installing Condor, but do plan for it
        ● Non-UID based auth possible, but not recommended (and not supported out of the box)
  • 34. Schedd is a superuser
      ● Schedd must run as root (euid==0, even as it drops ruid to “condor”)
        ● So it can switch UID as needed
        ● To access user files
      ● Same for shadows (but ruid set to job user)
      ● Host cert thus must be owned by root
  • 35. Installing the submit node
      ● Two major burdens (for basic install)
        ● Shared port daemon
        ● Security setup
      ● The glideinWMS installer helps with both
        ● Starting from the Condor tarball
        ● Should be run as root
        ● Highly recommended
        ● An easy-to-use cmdline update tool is available, too
      ● RPM install also an option
        ● Easy to keep up-to-date (i.e. yum update)
        ● But you will need to configure by hand
  • 36. Shared port daemon
      ● Not enabled by default in Condor
      ● In a nutshell
        ● Pick a port for it
        ● Enable it
        ● Add it to the list of daemons to start
        # condor_config.local
        # Enable shared_port_daemon
        SHARED_PORT_ARGS = -p 9615
        USE_SHARED_PORT = True
        DAEMON_LIST = $(DAEMON_LIST) SHARED_PORT
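The `DAEMON_LIST = $(DAEMON_LIST) SHARED_PORT` idiom, used here and for the secondary collectors, appends to the value defined earlier in the configuration rather than recursing forever. The toy Python parser below models only that self-reference rule; it is a simplified illustration, not HTCondor's real parser (which, among other things, treats macro names case-insensitively and expands other macro references too). The starting DAEMON_LIST value in the example is made up.

```python
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Toy parser for 'NAME = value' lines. Only self-references are
    expanded, using the value NAME had before this line."""
    macros: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignored in this toy model
        name = name.strip()
        value = value.strip().replace(f"$({name})", macros.get(name, ""))
        macros[name] = value.strip()
    return macros


if __name__ == "__main__":
    example = """
    DAEMON_LIST = MASTER, SCHEDD
    # Enable shared_port_daemon
    SHARED_PORT_ARGS = -p 9615
    USE_SHARED_PORT = True
    DAEMON_LIST = $(DAEMON_LIST) SHARED_PORT
    """
    for key, val in parse_config(example).items():
        print(f"{key} = {val}")
```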
  • 37. Security setup (1)
      ● In a nutshell
        ● Configure basic GSI (i.e. point to CAs and host cert)
        ● Enable match authentication
        ● Set up authorization (i.e. switch to whitelist)
        ● Whitelist all DNs
        ● Enable GSI
      ● DN whitelisting is a bit annoying
        ● Must be done in two places
          – in condor_config, and
          – in condor_mapfile (and there it is a regexp!)
        ● glideinWMS provides a cmdline tool
  • 38. Security setup (2)
      condor_config.local:
        # Configure GSI
        CERTIFICATE_MAPFILE=/opt/glidecondor/certs/condor_mapfile
        GSI_DAEMON_TRUSTED_CA_DIR=/etc/grid-security/certificates
        GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem
        GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem
        # Enable match authentication
        SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = TRUE
        # Force whitelisting (see CM slides for details)
        DENY_WRITE = anonymous@*
        …
        # list all DNs (one line per DN, xN)
        ...
        GSI_DAEMON_NAME=$(GSI_DAEMON_NAME),DNXXX
        ...
        # enable GSI
        SEC_DEFAULT_AUTHENTICATION_METHODS = FS,GSI
        SEC_DEFAULT_AUTHENTICATION = REQUIRED
        SEC_DEFAULT_ENCRYPTION = OPTIONAL
        SEC_DEFAULT_INTEGRITY = REQUIRED
        # optionally, relax client and read settings
      condor_mapfile (one GSI line per whitelisted DN, xN; the FS line also enables local auth):
        ...
        GSI "^DNXXX$" UIDXXX
        ...
        GSI (.*) anonymous
        FS (.*) 1
  • 39. Network optimization settings
      ● Since glideins are often behind firewalls
        ● The glidein Startd setup is optimized to avoid incoming connections and UDP
        ● The Schedd must also play along
        # condor_config.local
        # Reverse protocol direction
        STARTD_SENDS_ALIVES = True
        # Avoid UDP
        SCHEDD_SEND_VACATE_VIA_TCP = True
  • 40. Installing with Q&A installer
        ~/glideinWMS/install$ ./glideinWMS_install
        ...
        [5] User Schedd
        ...
        Please select: 5
        …
        Which user should Condor run under?: [condor] condor
        Where do you have the Condor tarball? /root/condor-7.6.4-x86_rhap_5-stripped.tar.gz
        Where do you want to install it?: [/home/condor/glidecondor] /opt/glidecondor
        If something goes wrong with Condor, who should get email about it?: me@myemail
        Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y
        ...
        Do you want to get it from VDT?: (y/n) y
        Do you have already a VDT installation?: (y/n) y
        Where is the VDT installed?: /etc/osg/wn-client
        Will you be using a proxy or a cert? (proxy/cert) cert
        Where is your certificate located?: /etc/grid-security/hostcert.pem
        Where is your certificate key located?: /etc/grid-security/hostkey.pem
        My DN = 'DN1'
        ...
        DN: DNXXX
        nickname: [condor001] uidXXX
        Is this a trusted Condor daemon?: (y/n) y
        ...
        DN:
        What node is the collector running (i.e. CONDOR_HOST)?: collectornode.mydomain
        Do you want to enable the shared_port_daemon?: (y/n) y
        What port should it use?: [9615] 9615
        How many secondary schedds do you want?: [9] 0
      ● The DN / nickname / trusted questions repeat once per DN (xN); you can also add the DNs as an independent step
  • 41. Maintenance
      ● If you need to add more DNs, use the cmdline tool glidecondor_addDN
        ~/glideinWMS/install$ ./glidecondor_addDN -daemon "DN of Schedd A" "DNA" UIDA
        Configuration files changed.
        Remember to reconfig the affected Condor daemons.
        ● Do not use -daemon for a client's DN
      ● To upgrade the Condor binaries, use the cmdline tool glidecondor_upgrade
        ~/glideinWMS/install$ ./glidecondor_upgrade ~/Downloads/condor-7.6.5-x86_rhap_5-stripped.tar.gz
        Will update Condor in /home/condor/glidecondor
        ..
        Creating backup dir
        Putting new binaries in place
        Finished successfully
        Old binaries can be found in /home/condor/glidecondor/old.120102_13
  • 42. Starting Condor
      ● The installer will start Condor for you, but you should still know how to stop and start it by hand
      ● The installer has created an init.d script for you:
        /etc/init.d/condor start|stop
      ● To force Condor to reload its config, still use:
        /opt/glidecondor/sbin/condor_reconfig
      ● All as root
  • 43. Fine tuning
  • 44. Fine tuning
      ● The previous slides provide only a basic setup
        ● Although glideinWMS does some basic tuning
      ● You will likely want to tune the system further
        ● Proper limits in the submit node
        ● Default job attributes
        ● Sanity checks
        ● Priority tuning
      ● Not part of this talk
        ● Will go into details tomorrow
  • 45. Integration with OSG Accounting
  • 46. OSG Accounting
      ● OSG tries to keep accurate accounting information of who used what resources
        ● Using GRATIA
        https://twiki.grid.iu.edu/twiki/bin/view/Accounting/WebHome
        http://gratia-osg-prod-reports.opensciencegrid.org/gratia-reporting/
  • 47. Per-user accounting
      ● OSG has per-user accounting, too
      ● With glideins, this level of detail is lost
        ● Only the pilot proxy is seen by OSG (sites)
  • 48. The glidein GRATIA probe
      ● OSG thus asks glidein operators to install a dedicated probe alongside the glidein schedd(s)
        ● Which will provide per-user accounting info to the OSG GRATIA server
      ● Optimized for use with the OSG glidein factory
        https://twiki.grid.iu.edu/bin/view/Accounting/ProbeConfigGlideinWMS
      [Figure: a submit node running the Schedd and the GRATIA probe, which reports to the OSG GRATIA server]
  • 49. Installing the GRATIA probe
      ● In a nutshell
        ● Register the submit node with the GOC
        ● Tweak the Condor config
        ● yum install gratia-probe-condor
        ● Configure GRATIA
      https://twiki.grid.iu.edu/bin/view/Accounting/ProbeConfigGlideinWMS
  • 50. Condor changes for GRATIA
      ● GRATIA gets information from history logs
        ● Requires one file per terminated job for efficiency
      ● GRATIA needs to know where the job ran
        ● An additional attribute is added to the job ClassAd (more general details on this tomorrow)
        # condor_config.local
        PER_JOB_HISTORY_DIR = /var/lib/gratia/data
        JOBGLIDEIN_ResourceName = "$$([IfThenElse(IsUndefined(TARGET.GLIDEIN_ResourceName), IfThenElse(IsUndefined(TARGET.GLIDEIN_Site), FileSystemDomain, TARGET.GLIDEIN_Site), TARGET.GLIDEIN_ResourceName)])"
        SUBMIT_EXPRS = $(SUBMIT_EXPRS) JOBGLIDEIN_ResourceName
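The nested IfThenElse is easier to read as a plain fallback chain. The Python sketch below mirrors its order (GLIDEIN_ResourceName, then GLIDEIN_Site, then FileSystemDomain), modeling a ClassAd as a dictionary where a missing key or None stands for UNDEFINED. Reading all three attributes from one dictionary is a simplification; in the real expression the first two are explicitly taken from the TARGET (machine) ad.

```python
from typing import Mapping, Optional


def job_glidein_resource_name(ad: Mapping[str, Optional[str]]) -> Optional[str]:
    """Same fallback order as the nested IfThenElse on the slide:
    GLIDEIN_ResourceName, else GLIDEIN_Site, else FileSystemDomain.
    A missing key or None stands in for ClassAd UNDEFINED."""
    for attr in ("GLIDEIN_ResourceName", "GLIDEIN_Site"):
        value = ad.get(attr)
        if value is not None:
            return value
    return ad.get("FileSystemDomain")


if __name__ == "__main__":
    # Made-up attribute values, for illustration only
    print(job_glidein_resource_name({"GLIDEIN_Site": "Site_X", "FileSystemDomain": "example.org"}))
```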
  • 51. GRATIA configuration
      ● Essentially just tell GRATIA what name you have registered with the GOC
      ● Then enable it
        # /etc/gratia/condor/ProbeConfig
        SiteName="VOX_glidein_node1"
        EnableProbe="1"
        # add this line to allow user jobs
        # without a proxy
        MapUnknownToGroup="1"
      ● You also need to tell it where to find Condor
        # /root/setup.sh
        source /etc/profile.d/condor.sh
  • 52. The End
  • 53. Pointers
      ● The official glideinWMS project Web page is http://tinyurl.com/glideinWMS
      ● The glideinWMS development team is reachable at glideinwms-support@fnal.gov
      ● Condor Home Page: http://www.cs.wisc.edu/condor/
      ● Condor support: condor-user@cs.wisc.edu, condor-admin@cs.wisc.edu
  • 54. Acknowledgments
      ● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI
      ● The glideinWMS factory operations at UCSD are sponsored by OSG
      ● The funding comes from NSF, DOE and the UC system