glideinWMS Training @ UCSD



                       GlideinWMS
                     Validation scripts
                        by Igor Sfiligoi (UCSD)




UCSD Jan 18th 2012            Validation Scripts   1
Overview
 ●   Why validation scripts
 ●   Anatomy of validation scripts
 ●   Types of validation scripts




Reminder - Glideins
 ●   A glidein is just a properly configured Condor
     execution node submitted as a Grid job
      ●   glideinWMS provides automation

 [Figure: central manager (Collector, Negotiator), submit nodes (Schedd), glidein execution nodes (Startd, Job) reached through CREAM and Globus, and glideinWMS]


Reminder – Glidein script
 ●   The glidein startup script is just an empty shell that:
      ●   Downloads scripts, parameters and Condor bins
      ●   Runs the scripts in order
      ●   Does the final cleanup
 ●   Two types of script:
      ●   Node validation
      ●   Condor configuration and startup
 ●   If any of these fail, Condor will never be started
 ●   Once Condor starts, glideinWMS is out of the way
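
The "run the scripts in order" step can be sketched as a small loop. This is only an illustration under stated assumptions, not the actual glidein startup code: the function name run_in_order and the two demo scripts are hypothetical, and the download and cleanup steps are reduced to a temporary directory.

```shell
#!/bin/bash
# Hypothetical sketch of the "run the scripts in order" step.
# run_in_order and the demo scripts are illustrative names,
# not part of glideinWMS.

# Run each script with the dashboard file as its only argument;
# stop at the first non-zero exit code.
run_in_order() {
    local config="$1"; shift
    local script
    for script in "$@"; do
        if ! "$script" "$config"; then
            echo "Validation failed in $script, Condor will not be started" 1>&2
            return 1
        fi
    done
    return 0
}

# Demo: two trivial scripts standing in for the downloaded ones.
workdir=$(mktemp -d)
touch "$workdir/glidein_config"
printf '#!/bin/sh\nexit 0\n' > "$workdir/good.sh"
printf '#!/bin/sh\necho "simulated bad node" 1>&2\nexit 1\n' > "$workdir/bad.sh"
chmod +x "$workdir/good.sh" "$workdir/bad.sh"

all_good=0
run_in_order "$workdir/glidein_config" "$workdir/good.sh" || all_good=$?
with_bad=0
run_in_order "$workdir/glidein_config" "$workdir/good.sh" "$workdir/bad.sh" 2>/dev/null || with_bad=$?

rm -rf "$workdir"   # final cleanup
```

Here all_good stays 0, while with_bad becomes non-zero because the second script reports a failure.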



As a consequence

   If a validation script finds a bad worker node (WN)

                → Condor will not be started

                → No user jobs will ever fail here

Is validating at glidein startup a good idea?
 ●   Advantages:
      ●   User jobs never land on “broken” nodes (users happy)
      ●   Failures are logged
           –   Factory admins can act on this info,
               notifying sites (who can fix the problem)
 ●   Limitations:
      ●   Tested only at glidein startup
           –   If a node “goes bad” after Condor startup,
               user jobs will still be fetched and will fail
           –   Condor provides cron-like capabilities for this
 ●   Problems:
      ●   Failed validation → wasted CPU
           –   Some jobs may still succeed,
               even if validation failed
           –   Can be solved by passing the test and setting attributes
           –   But this will hide the problem from the Factory

Anatomy of a validation script




Validation scripts 101
 ●   Any executable will do!
      ●   There are no restrictions
      ●   Can be compiled binary or a shell script
 ●   Exit code is checked
      ●   ==0 - Success
      ●   !=0 - Failure
 ●   And, to a first approximation, that is all
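
A minimal sketch of the exit-code convention, assuming a hypothetical check that a directory is writable. The name check_writable_dir and the probe file are illustrative only.

```shell
#!/bin/bash
# Hypothetical validation check: succeed (0) if a directory is
# writable, fail (non-zero) otherwise.
check_writable_dir() {
    local dir="$1"
    local probe="$dir/.probe_$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0    # success: the caller sees exit code 0
    fi
    return 1        # failure: any non-zero exit code
}

scratch=$(mktemp -d)
ok_rc=0
check_writable_dir "$scratch" || ok_rc=$?
bad_rc=0
check_writable_dir "$scratch/does_not_exist" || bad_rc=$?
rm -rf "$scratch"
```

In a stand-alone validation script, the same result would simply be passed to exit.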


Validation scripts - I/O
 ●   You may want to:
      ●   Get some input
      ●   Have some output
 ●   Both are handled through a dashboard file
      ●   Its filename is passed as the only argument
          to the validation script




Dashboard file
 ●   Simple list of (key, value) pairs
      ●   One per line (newline not allowed in either key or value)
      ●   Space separated (space not allowed in the key)
 ●   Hash (#) can be used for comments

      GLIDEIN_Factory UCSD
      GLIDEIN_Name Production_v4_2
      GLIDEIN_Entry_Name CMS_T2_US_UCSD_gw2
      GLIDECLIENT_Name UCSD-v5_3.main
      GLIDEIN_WORK_DIR /data10/condor_local/execute/dir_22668/glide_B22745/main
      GLIDEIN_Glexec_Use OPTIONAL
      X509_CERT_DIR            /wn-client/globus/TRUSTED_CA
      GLIDEIN_Site UCSD
      # This was calculated on the fly
      CCB_ADDRESS glidein-collector.t2.ucsd.edu:9822

 http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html#glidein_config


Reading input
 ●   Dashboard file as the first argument
 ●   Then just look for the key and split on space

          # here is my dashboard file
          glidein_config=$1
          # I expect only one key and no space in the value
          glexec_bin=`awk '/^GLEXEC_BIN /{print $2}' "$glidein_config"`
          if [ -z "$glexec_bin" ]; then
              # explain the failure before exiting
              echo "GLEXEC_BIN not defined in $glidein_config" 1>&2
              exit 1
          fi
          …
          exit 0
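
The lookup above keeps only the second field, so it assumes a value without spaces. Since the format only forbids spaces in the key, a value may contain them. A hedged sketch of a lookup that keeps the whole value follows; get_config_value and the MY_MESSAGE key are hypothetical names, not glideinWMS helpers.

```shell
#!/bin/bash
# Hypothetical helper: print the full value of a key from a dashboard
# file, keeping any spaces inside the value. If the key appears more
# than once, the last definition wins (an assumption of this sketch).
get_config_value() {
    local key="$1" file="$2"
    awk -v k="$key" '
        $1 == k {
            v = $0
            # drop leading blanks, the key, and the separator
            sub(/^[ \t]*[^ \t]+[ \t]*/, "", v)
        }
        END { print v }' "$file"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
GLIDEIN_Site UCSD
# a comment line
MY_MESSAGE hello from the worker node
EOF

site=$(get_config_value GLIDEIN_Site "$demo")
message=$(get_config_value MY_MESSAGE "$demo")
missing=$(get_config_value NOT_THERE "$demo")
rm -f "$demo"
```

A key that is not present yields an empty string, which the caller can test with -z as in the slide.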




Writing output
 ●   You can just append to the file
      ●   Just make sure it is properly formatted
          # here is my dashboard file
          glidein_config=$1
          …
          # tell condor to use glexec
          echo 'GLEXEC_JOB True' >> "$glidein_config"
          exit 0

 ●   You should also make sure
     the same key is not already defined
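
One way to make sure a key is defined only once is to drop any earlier definition before appending. The sketch below is an assumption about how that could be done by hand; set_config_key is a hypothetical name and is not the helper shipped with glideinWMS.

```shell
#!/bin/bash
# Hypothetical helper: set KEY to VALUE in a dashboard file, removing
# any earlier definition of the same key first.
set_config_key() {
    local file="$1" key="$2" value="$3"
    local tmp="$file.tmp.$$"
    # keep every line whose first field is not the key
    awk -v k="$key" '$1 != k' "$file" > "$tmp"
    echo "$key $value" >> "$tmp"
    mv "$tmp" "$file"
}

demo=$(mktemp)
echo 'GLEXEC_JOB False' > "$demo"
set_config_key "$demo" GLEXEC_JOB True
set_config_key "$demo" GLEXEC_JOB True   # repeating it adds no duplicate
count=$(grep -c '^GLEXEC_JOB ' "$demo")
value=$(awk '/^GLEXEC_JOB /{print $2}' "$demo")
rm -f "$demo"
```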

Helper function
 ●   glideinWMS provides a helper BASH function to
     avoid duplicate keys
      ●    External SH file, referenced as
           ADD_CONFIG_LINE_SOURCE
      ●    The function name inside is
           add_config_line
          # here is my dashboard file (MUST be called glidein_config)
          glidein_config=$1
          # get helper function
          add_config_line_source=`awk '/^ADD_CONFIG_LINE_SOURCE /{print $2}' "$glidein_config"`
          source "$add_config_line_source"
          …
          # tell condor to use glexec
          add_config_line 'GLEXEC_JOB' 'True'

Influencing Condor behavior
 ●   By default, keys in the dashboard file are ignored by
     the Condor startup/configuration script
      ●   Anything you write into it is just for your own
          consumption (e.g. for other scripts of yours)
 ●   A special whitelist file lists the keys
     that should be passed to Condor
      ●   Referenced as
          CONDOR_VARS_FILE
      ●   Helper function available
          add_condor_vars_line
           –   Again, source ADD_CONFIG_LINE_SOURCE

Condor Vars file
 ●   Each line describes one key
 ●   Seven fields, space (or tab) separated
      ●   Key
      ●   Type: I = Integer, S = String, C = Expr.
      ●   Default value: “-” for no default
      ●   Condor name: “+” = same as the key name
          (Callout: useful when others have to define it)
      ●   Is it required? Y|N
      ●   Should it be exported to the ClassAd? Y|N
      ●   Should it be exported to the job environment?
          “-” = no, “+” = key name, “@” = Condor name

 http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html#condor_vars
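
To illustrate the seven-field layout, the sketch below splits one line with the shell's own word splitting. The sample line and the variable names are assumptions made for this example; the exact on-disk quoting used by the real file is not shown on the slide.

```shell
#!/bin/bash
# Split one condor vars line into its seven fields.
# The sample line is illustrative, not copied from a real file.
line='GLEXEC_TMP S - + Y Y +'

read -r key type default condor_name required export_classad export_env <<EOF
$line
EOF

# "+" in the fourth field means "same as the key name"
if [ "$condor_name" = "+" ]; then
    condor_name=$key
fi
```

After this runs, condor_name holds GLEXEC_TMP and export_env holds "+", meaning the variable would be exported to the job environment under its key name.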

Example

       # here is my dashboard file (MUST be called glidein_config)
       glidein_config=$1
       # extract where to find the vars file
       # (MUST be called condor_vars_file)
       condor_vars_file=`awk '/^CONDOR_VARS_FILE /{print $2}' "$glidein_config"`
       # get helper function
       add_config_line_source=`awk '/^ADD_CONFIG_LINE_SOURCE /{print $2}' "$glidein_config"`
       source "$add_config_line_source"
       …
       # This should already have been set
       add_condor_vars_line "GLEXEC_BIN" "C" "-" "GLEXEC" "Y" "N" "-"
       # tell condor to use glexec
       add_config_line 'GLEXEC_JOB' 'True'
       add_condor_vars_line "GLEXEC_JOB" "C" "True" "+" "Y" "Y" "-"
       # tell user where is the TMPDIR
       add_config_line 'GLEXEC_TMP' "$TMPDIR"
       add_condor_vars_line "GLEXEC_TMP" "S" "-" "+" "Y" "Y" "+"



Error messages
 ●   Your script found a problem
      ●   Now what?
 ●   You definitely want to exit with a non-zero exit code
 ●   But, please, also print an error message!
      ●   With enough information to understand
          why the script failed
      ●   This will allow the Factory admins to act on it
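
A hedged sketch of what "fail with a useful message" could look like. The helper names fail and check_glexec are hypothetical, and the wording of the messages is only an example of including enough context (what was checked, where, and on which host).

```shell
#!/bin/bash
# Hypothetical helper: print a descriptive message to stderr and
# return a non-zero code that the script can pass to exit.
fail() {
    echo "ERROR: $*" 1>&2
    return 1
}

# Example check with messages that say what was wrong and where.
check_glexec() {
    local glexec_bin="$1"
    if [ -z "$glexec_bin" ]; then
        fail "GLEXEC_BIN not defined in the dashboard file"
        return 1
    fi
    if [ ! -x "$glexec_bin" ]; then
        fail "glexec not executable at $glexec_bin on host $(uname -n)"
        return 1
    fi
    return 0
}

# Demo with a path that does not exist.
rc=0
msg=$(check_glexec /no/such/glexec 2>&1) || rc=$?
```

In a real validation script the call would be followed by exit 1 on failure, so that both the message and the exit code reach the caller.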




Planned improvements
(still speculation at this point)

 ●   Current error codes and messages are arbitrary
      ●   Mostly good enough for manual debugging
      ●   But cannot really be acted on automatically
 ●   Want to add some more structure
      ●   Based on the OSG Common Output Format proposal
          https://twiki.grid.iu.edu/bin/view/SoftwareTools/CommonTestFormat#Alain_s_proposal_Version_4_evolu
 ●   In addition to the exit code,
     scripts are expected to write a status file
      ●   Which will be read and interpreted by the caller
          and propagated to the Factory
      ●   If the file is not present, “Error unknown” will be assumed
      ●   Now we can start thinking about
          automatically acting on errors!

Standardized error reasons
                     (preliminary - still speculation at this point)

 ●   To allow for automated feedback, need
     standardized error reasons
 ●   This is what I currently envision:
      ●   Config                    - e.g. Impossible combinations
      ●   Corruption                - e.g. SHA1 check failed
      ●   WN Resource               - e.g. Disk full or glexec not found
      ●   Network                   - e.g. Cannot talk to VO Collector
      ●   VO Proxy                  - e.g. Proxy too short
      ●   VO Data                   - e.g. VO SW not installed

Examples
                      (preliminary - still speculation at this point)

<?xml version="1.0"?>
<OSGTestResult id="glideinWMS.check_disk" version="7.5.4">
  <result>
    <status>OK</status>
    <metric name="diskspace" ts="2012-01-12T15:02:20"
            uri="local">/tmp/glidein_15432/</metric>
  </result>
  <detail>Enough disk space found.</detail>
</OSGTestResult>

<?xml version="1.0"?>
<OSGTestResult id="glideinWMS.check_proxy" version="7.5.4">
  <result>
    <status>FAILED</status>
    <metric name="failure" ts="..." uri="local">VO Proxy</metric>
    <metric name="proxy" ts="2012-01-12T15:02:21"
            uri="local">/tmp/glidein_15432/proxy/a.proxy</metric>
  </result>
  <detail>Proxy had less than 12h left.</detail>
</OSGTestResult>
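
A script could produce a document of this shape with a here-document. This is only a sketch: the format was still speculative at the time of the talk, print_test_result is a hypothetical name, the version string is copied from the examples on this slide, and the values are not XML-escaped.

```shell
#!/bin/bash
# Hypothetical sketch: emit a result document shaped like the
# preliminary examples. Element names are provisional, and the
# arguments are inserted without XML escaping.
print_test_result() {
    local id="$1" status="$2" detail="$3"
    cat <<EOF
<?xml version="1.0"?>
<OSGTestResult id="$id" version="7.5.4">
  <result>
    <status>$status</status>
  </result>
  <detail>$detail</detail>
</OSGTestResult>
EOF
}

result=$(print_test_result "glideinWMS.check_disk" "OK" "Enough disk space found.")
```

Where the status file should be written is not specified on the slide, so the sketch only builds the text.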

Validation script types




Why should you use validation scripts?
 ●   Of course:
      ●   Check for obviously broken nodes (what we discussed until now)
 ●   But also:
      ●   To discover and advertise dynamic information
      ●   Non-trivial configuration
      ●   Site-specific customizations




Dynamic information
 ●   Some information is dynamic by nature
      ●   E.g. location of VO software
 ●   You want to discover at run-time where
     it is located
      ●   And fail, if you cannot find it!
      ●   Makes life easier for the users
 ●   Once discovered, it is good practice to advertise it
      ●   In the ClassAd, the job environment, or both


Example
           # check if CMSSW installed locally
           if [ -f "$CMSSW" ]; then
               source "$CMSSW"
               if [ -z "$CMSSW_LIST" -o -z "$CMSSW_LOC" ]; then
                   echo "Corrupted CMSSW at $CMSSW!" 1>&2
                   exit 1
               fi
           else
               echo "CMSSW not found!" 1>&2
               exit 1
           fi
           # publish to user job env
           add_config_line "CMSSW_LOC" "$CMSSW_LOC"
           add_condor_vars_line "CMSSW_LOC" "S" "-" "+" "Y" "N" "+"
           # publish to Condor
           add_config_line "CMSSW_LIST" "$CMSSW_LIST"
           add_condor_vars_line "CMSSW_LIST" "S" "-" "+" "Y" "Y" "-"
           exit 0




Non-trivial configuration
                            (Not really a “validation” script)

 ●   You may want to generate some data on the fly
      ●   e.g. a random seed
             let s=$RANDOM%123+17
             add_config_line "MY_SEED" "$s"
             add_condor_vars_line "MY_SEED" "I" "-" "+" "Y" "N" "+"

 ●   And sometimes it is just inconvenient to specify
     some values in the frontend XML file
      ●   e.g. a long list
             l="1"
             for ((i=2; i<100; i++)); do
               l="$l:$i"
             done
             add_config_line "MY_LIST" "$l"
             add_condor_vars_line "MY_LIST" "S" "-" "+" "Y" "N" "+"

Site specific customization
 ●   Currently, the frontend XML file does not allow
     site-specific customizations
      ●   Unless you want to have a group per site!
           –   Limiting, since there is only one level of groups
      ●   And there is the option for you to arrange for
          the Factory to provide it for you
           –   Maintenance will be a mess
 ●   You can code the per-site config
     into a “validation script”
      ●   Still not ideal, but may be better than the alternative
      ●   Especially if you can apply a rule with few exceptions

Example
          glidein_config=$1
          site=`awk '/^GLIDEIN_CMSSITE /{print $2}' "$glidein_config"`
          # take characters 8-9 of the site name as the country code
          country=`echo "$site" | awk '{print substr($1,8,2)}'`
          if [ "$country" == "US" ]; then
             myvar="OSG"
          elif [ "$country" == "IT" -o "$country" == "FR" ]; then
             myvar="EGI"
          else
             echo "Cannot run in $country" 1>&2
             exit 1
          fi
          add_config_line "MY_VAR" "$myvar"
          # the value is a string, so the type is S
          add_condor_vars_line "MY_VAR" "S" "-" "+" "Y" "N" "+"




Limitations




Limits of validation scripts
 ●   Whatever is discovered on the WN is
      ●   Used by the script for its own testing
      ●   At best, propagated to glidein ClassAd or job env
 ●   The discovered info cannot be used for
     Frontend matchmaking purposes!
      ●   At best, for Negotiator matchmaking
 ●   As a result, you may be requesting glideins that
     will never run any user jobs
      ●   If the condition is common to all WNs
      ●   They either fail validation or do not match


What can you do?
 ●   How do you notice it?
      ●   If validation errors
           –   The Factory admins will likely contact you
      ●   If Negotiator not matching jobs
           –   You will need to discover it yourself
 ●   What to do afterwards?
      ●   Tune the script
           –   Maybe you were just too aggressive?
      ●   Manually blacklist a site in your frontend XML
           –   Pretty much a hack!
      ●   Or convince the Factory admins to advertise
          VO specific info
           –   Can be hard to maintain long term

The End




Pointers
 ●   The official glideinWMS project Web page is
     http://tinyurl.com/glideinWMS
 ●   glideinWMS development team is reachable at
     glideinwms-support@fnal.gov
 ●   The OSG glidein factory is reachable at
     osg-gfactory-support@physics.ucsd.edu




Acknowledgments
 ●   The glideinWMS is a CMS-led project
     developed mostly at FNAL, with contributions
     from UCSD and ISI
 ●   The glideinWMS factory operations at UCSD are
     sponsored by OSG
 ●   The funding comes from NSF, DOE and the
     UC system




Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

glideinWMS validation scripts - glideinWMS Training Jan 2012

  • 1. glideinWMS Training @ UCSD
     GlideinWMS Validation scripts
     by Igor Sfiligoi (UCSD)
     UCSD Jan 18th 2012
  • 2. Overview
     ● Why validation scripts
     ● Anatomy of validation scripts
     ● Types of validation scripts
  • 3. Reminder - Glideins
     ● A glidein is just a properly configured Condor execution node submitted as a Grid job
        ● glideinWMS provides automation
     [Diagram: submit nodes (Schedd), the central manager (Collector, Negotiator), glideinWMS, and glidein execution nodes (Startd running a Job) reached through CREAM and Globus]
  • 4. Reminder – Glidein script
     ● The glidein startup script is just an empty shell that:
        ● Downloads scripts, parameters and Condor bins
        ● Runs the scripts in order
        ● Does the final cleanup
     ● Two types of script:
        ● Node validation
        ● Condor configuration and startup
       If any of these fail, Condor will never be started
     ● Once Condor starts, glideinWMS is out of the way
  • 5. As a consequence
     ● If a validation script finds a bad WN (worker node), Condor will not be started
     ● No user jobs will ever fail there
  • 6. Is validating at glidein startup a good idea?
     ● Advantages:
        ● User jobs never land on “broken” nodes (users happy)
        ● Failures logged (Factory admins can act on this info, notifying sites, who can fix the problem)
     ● Limitations:
        ● Tested only at glidein startup (Condor provides cron-like capabilities for this)
           – If a node “goes bad” after Condor startup, user jobs will still be fetched and will fail
     ● Problems:
        ● Failed validation → wasted CPU
           – Some jobs may still succeed, even if validation failed
           – Can be solved by passing the test and setting attributes, but this will hide the problem from the Factory
  • 7. Anatomy of a validation script
  • 8. Validation scripts 101
     ● Any executable will do!
        ● There are no restrictions
        ● Can be a compiled binary or a shell script
     ● Exit code checked
        ● ==0 - Success
        ● !=0 - Failure
     ● And, to a first approximation, this is all
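
The exit-code contract above can be sketched with a very small check. This is only an illustration: the function name `require_command` and the tool it looks for are assumptions, not part of glideinWMS.

```shell
#!/bin/bash
# Hypothetical validation check: is a required tool available on the node?
# The function name and the tool passed to it are illustrative assumptions.
require_command() {
    # command -v returns 0 only if the named command can be found
    command -v "$1" > /dev/null 2>&1
}

# A complete validation script would turn the result into its exit code:
#   require_command tar || exit 1   # non-zero: validation failed
#   exit 0                          # zero: validation passed
```

Keeping the check in a function and deciding the exit code in one place at the end makes it easier to add further checks later without changing the contract.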
  • 9. Validation scripts - I/O
     ● You may want to:
        ● Get some input
        ● Have some output
     ● Both handled through a dashboard file
        ● Filename passed as the only argument to the validation scripts
  • 10. Dashboard file
     ● Simple list of (key, value) pairs
        ● One per line (newline not allowed in either key or value)
        ● Space separated (space not allowed in the key)
     ● Hash (#) can be used for comments

       GLIDEIN_Factory UCSD
       GLIDEIN_Name Production_v4_2
       GLIDEIN_Entry_Name CMS_T2_US_UCSD_gw2
       GLIDECLIENT_Name UCSD-v5_3.main
       GLIDEIN_WORK_DIR /data10/condor_local/execute/dir_22668/glide_B22745/main
       GLIDEIN_Glexec_Use OPTIONAL
       X509_CERT_DIR /wn-client/globus/TRUSTED_CA
       GLIDEIN_Site UCSD
       # This was calculated on the fly
       CCB_ADDRESS glidein-collector.t2.ucsd.edu:9822

     http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html#glidein_config
  • 11. Reading input
     ● Dashboard file as the first argument
     ● Then just look for the key and split on space

       # here is my dashboard file
       glidein_config=$1
       # I expect only one key and no space in the value
       glexec_bin=$(awk '/^GLEXEC_BIN /{print $2}' "$glidein_config")
       if [ -z "$glexec_bin" ]; then
         # the key is missing or empty: fail validation
         exit 1
       fi
       …
       exit 0
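
The lookup above assumes the value contains no spaces. Since the dashboard format only forbids spaces in the key, a value with spaces needs a slightly different read. The sketch below is one way to do it; the helper name `get_config_value` is hypothetical, and letting the last definition win when a key is repeated is an assumption, not documented glideinWMS behavior.

```shell
#!/bin/bash
# Hypothetical helper: print the value stored under a key, spaces included.
# Assumption: if the key appears more than once, the last definition wins.
get_config_value() {
    local key="$1" file="$2"
    awk -v key="$key" '
        # a matching line starts with the key followed by exactly one space
        index($0, key " ") == 1 {
            val = substr($0, length(key) + 2)   # everything after that space
            found = 1
        }
        END {
            if (found) print val
            exit !found                         # non-zero when the key is absent
        }
    ' "$file"
}

# Possible use inside a validation script:
#   work_dir=$(get_config_value GLIDEIN_WORK_DIR "$glidein_config") || exit 1
```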
  • 12. Writing output
     ● You can just append to the file
        ● Just make sure it is properly formatted

       # here is my dashboard file
       glidein_config=$1
       …
       # tell condor to use glexec
       echo 'GLEXEC_JOB True' >> "$glidein_config"
       exit 0

     ● You should also make sure the same key is not already defined
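
To make the duplicate-key concern concrete, here is a rough sketch of a write that first drops any earlier definition of the key and then appends the new one. The name `set_config_value` and the temporary-file approach are assumptions made for illustration; this is not the helper that glideinWMS itself provides.

```shell
#!/bin/bash
# Hypothetical stand-in for a duplicate-safe write; not the glideinWMS helper.
set_config_value() {
    local key="$1" value="$2" file="$3"
    local tmp="$file.$$.tmp"
    # keep every line that does not already define this key
    awk -v key="$key" 'index($0, key " ") != 1' "$file" > "$tmp" || return 1
    # append the new definition in "key value" form
    echo "$key $value" >> "$tmp"
    mv "$tmp" "$file"
}

# Possible use inside a validation script:
#   set_config_value GLEXEC_JOB True "$glidein_config"
```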
  • 13. Helper function
     ● glideinWMS provides a helper BASH function to avoid duplicate keys
        ● External SH file, referenced as ADD_CONFIG_LINE_SOURCE
        ● The function name inside is add_config_line

       # here is my dashboard file (MUST be called glidein_config)
       glidein_config=$1
       # get helper function (the assignment must stay on a single line)
       add_config_line_source=$(awk '/^ADD_CONFIG_LINE_SOURCE /{print $2}' "$glidein_config")
       source "$add_config_line_source"
       …
       # tell condor to use glexec
       add_config_line 'GLEXEC_JOB' 'True'
  • 14. Influencing Condor behavior
     ● By default, keys in the dashboard file are ignored by the Condor startup/configuration script
        ● Anything you write into it is just for your own consumption (e.g. for other scripts of yours)
     ● A special whitelist file lists the keys that should be passed to Condor
        ● Referenced as CONDOR_VARS_FILE
        ● Helper function available: add_condor_vars_line (again, source ADD_CONFIG_LINE_SOURCE)
  • 15. Condor Vars file
     ● Each line contains a key
     ● Seven fields, space (or tab) separated
        ● Key
        ● Type - I (Integer), S (String), C (Expr.)
        ● Default value - “-” for no default
        ● Condor Name - “+” = Key name
          (Side note: useful when others have to define it)
        ● Is it required? - Y|N
        ● Should be exported to ClassAd? - Y|N
        ● Should be exported to job environment? - “-” no, “+” Key name, “@” Condor Name
     http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html#condor_vars
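
The seven fields can be easier to keep straight with a small checker. The sketch below validates the fields as listed above and prints one space-separated line. The function `format_vars_line` is hypothetical and is not the real add_condor_vars_line helper; it assumes that no field contains spaces.

```shell
#!/bin/bash
# Hypothetical formatter for one vars-file line; not the glideinWMS helper.
# Assumption: none of the seven fields contains a space.
format_vars_line() {
    if [ "$#" -ne 7 ]; then
        echo "expected 7 fields, got $#" 1>&2
        return 1
    fi
    local key="$1" type="$2" default="$3" condor_name="$4"
    local required="$5" to_classad="$6" to_env="$7"
    # Type: I (Integer), S (String) or C (Expr.)
    case "$type" in I|S|C) ;; *) echo "bad type: $type" 1>&2; return 1 ;; esac
    # Required and ClassAd export are Y|N flags
    case "$required" in Y|N) ;; *) echo "bad required flag: $required" 1>&2; return 1 ;; esac
    case "$to_classad" in Y|N) ;; *) echo "bad ClassAd flag: $to_classad" 1>&2; return 1 ;; esac
    # Job environment export: "-" no, "+" Key name, "@" Condor Name
    case "$to_env" in "-"|"+"|"@") ;; *) echo "bad env flag: $to_env" 1>&2; return 1 ;; esac
    printf '%s %s %s %s %s %s %s\n' \
        "$key" "$type" "$default" "$condor_name" "$required" "$to_classad" "$to_env"
}

# Example: a required string, no default, exported to ClassAd and job environment
#   format_vars_line GLEXEC_TMP S - + Y Y +
```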
  • 16. Example

       # here is my dashboard file (MUST be called glidein_config)
       glidein_config=$1
       # extract where to find the vars file
       # (MUST be called condor_vars_file)
       condor_vars_file=$(awk '/^CONDOR_VARS_FILE /{print $2}' "$glidein_config")
       # get helper function
       add_config_line_source=$(awk '/^ADD_CONFIG_LINE_SOURCE /{print $2}' "$glidein_config")
       source "$add_config_line_source"
       …
       # This should already have been set
       add_condor_vars_line "GLEXEC_BIN" "C" "-" "GLEXEC" "Y" "N" "-"
       # tell condor to use glexec
       add_config_line 'GLEXEC_JOB' 'True'
       add_condor_vars_line "GLEXEC_JOB" "C" "True" "+" "Y" "Y" "-"
       # tell user where is the TMPDIR (quoted, in case the path is empty or has spaces)
       add_config_line 'GLEXEC_TMP' "$TMPDIR"
       add_condor_vars_line "GLEXEC_TMP" "S" "-" "+" "Y" "Y" "+"
  • 17. Error messages
     ● Your script found a problem
        ● Now what?
     ● You definitely want to exit with errno !=0
     ● But, please, also print an error message!
        ● With enough information to understand why the script failed
        ● Will allow the Factory admins to act on it
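
As a sketch of what an informative failure could look like, the check below names the script, the host and the offending path on stderr before returning non-zero. The names `script_name`, `report_failure` and `check_scratch_dir`, and the message layout, are assumptions for illustration only.

```shell
#!/bin/bash
script_name="check_scratch"   # hypothetical name, used only as a message prefix

# Print a descriptive message on stderr and return non-zero
report_failure() {
    echo "ERROR [$script_name] on $(uname -n): $*" 1>&2
    return 1
}

# Hypothetical check: the scratch directory must exist and be writable
check_scratch_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        report_failure "scratch directory '$dir' does not exist"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        report_failure "scratch directory '$dir' is not writable"
        return 1
    fi
    return 0
}

# In a real validation script:
#   check_scratch_dir "$PWD" || exit 1
#   exit 0
```

Stating which path was tested and on which host gives whoever reads the log something concrete to pass on to the site.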
  • 18. Planned improvements (still speculation at this point)
     ● Current error codes and messages are arbitrary
        ● Mostly good enough for manual debugging
        ● But cannot really automatically act on them
     ● Want to add some more structure
        ● Based on OSG Common Output Format proposal
          https://twiki.grid.iu.edu/bin/view/SoftwareTools/CommonTestFormat#Alain_s_proposal_Version_4_evolu
     ● In addition to exit code, scripts expected to write a status file
        ● If file not present, will assume “Error unknown”
        ● Which will be read and interpreted by the caller and propagated to the Factory
     Now we can start thinking about automatically acting on errors!
  • 19. Standardized error reasons (preliminary - still speculation at this point)
     ● To allow for automated feedback, need standardized error reasons
     ● This is what I currently envision:
        ● Config - e.g. Impossible combinations
        ● Corruption - e.g. SHA1 check failed
        ● WN Resource - e.g. Disk full or glexec not found
        ● Network - e.g. Cannot talk to VO Collector
        ● VO Proxy - e.g. Proxy too short
        ● VO Data - e.g. VO SW not installed
  • 20. Examples (preliminary - still speculation at this point)

       <?xml version="1.0"?>
       <OSGTestResult id="glideinWMS.check_disk" version="7.5.4">
         <result>
           <status>OK</status>
           <metric name="diskspace" ts="2012-01-12T15:02:20" uri="local">/tmp/glidein_15432/</metric>
         </result>
         <detail>Enough disk space found.</detail>
       </OSGTestResult>

       <?xml version="1.0"?>
       <OSGTestResult id="glideinWMS.check_proxy" version="7.5.4">
         <result>
           <status>FAILED</status>
           <metric name="failure" ts="..." uri="local">VO Proxy</metric>
           <metric name="proxy" ts="2012-01-12T15:02:21" uri="local">/tmp/glidein_15432/proxy/a.proxy</metric>
         </result>
         <detail>Proxy had less than 12h left.</detail>
       </OSGTestResult>
  • 21. Validation script types
  • 22. Why should you use VS (validation scripts)?
     ● Of course (what we discussed until now):
        ● Check for obviously broken nodes
     ● But also:
        ● To discover and advertise dynamic information
        ● Non-trivial configuration
        ● Site-specific customizations
  • 23. Dynamic information
     ● Some information is dynamic by nature
        ● E.g. location of VO software
     ● You want to discover at run-time where it is located
        ● And fail, if you cannot find it!
        ● Makes life easier for the users
     ● Once discovered, good practice to advertise it
        ● In the ClassAd, the job environment, or both
  • 24. Example

       # check if CMSSW installed locally
       if [ -f "$CMSSW" ]; then
         source "$CMSSW"
         if [ -z "$CMSSW_LIST" -o -z "$CMSSW_LOC" ]; then
           echo "Corrupted CMSSW at $CMSSW!" 1>&2
           exit 1
         fi
       else
         echo "CMSSW not found!" 1>&2
         exit 1
       fi
       # publish to user job env
       add_config_line "CMSSW_LOC" "$CMSSW_LOC"
       add_condor_vars_line "CMSSW_LOC" "S" "-" "+" "Y" "N" "+"
       # publish to Condor
       add_config_line "CMSSW_LIST" "$CMSSW_LIST"
       add_condor_vars_line "CMSSW_LIST" "S" "-" "+" "Y" "Y" "-"
       exit 0
  • 25. Non-trivial configuration (Not really a “validation” script)
     ● You may want to generate some data on the fly
        ● e.g. a random seed

       # random integer between 17 and 139
       let s=$RANDOM%123+17
       add_config_line "MY_SEED" "$s"
       add_condor_vars_line "MY_SEED" "I" "-" "+" "Y" "N" "+"

     ● And sometimes it is just inconvenient to specify some values in the frontend XML file
        ● e.g. a long list

       # build the colon-separated list 1:2:...:99
       l="1"
       for ((i=2; i<100; i++)); do
         l="$l:$i"
       done
       add_config_line "MY_LIST" "$l"
       add_condor_vars_line "MY_LIST" "S" "-" "+" "Y" "N" "+"
  • 26. Site specific customization
     ● Currently, the frontend XML file does not allow site-specific customizations
        ● Unless you want to have a group per site! (Limiting, since there is only one level of groups)
        ● And there is the option for you to arrange for the Factory to provide it for you (Maintenance will be a mess)
     ● You can code the per-site config into a “validation script”
        ● Still not ideal, but may be better than the alternative
        ● Especially if you can apply a rule with few exceptions
  • 27. Example

       glidein_config=$1
       site=$(awk '/^GLIDEIN_CMSSITE /{print $2}' "$glidein_config")
       # characters 8-9 of the site name are taken as the country code
       country=$(echo "$site" | awk '{print substr($1,8,2)}')
       if [ "$country" == "US" ]; then
         myvar="OSG"
       elif [ "$country" == "IT" -o "$country" == "FR" ]; then
         myvar="EGI"
       else
         echo "Cannot run in $country" 1>&2
         exit 1
       fi
       add_config_line "MY_VAR" "$myvar"
       # the value is a string (OSG or EGI), so the type is S rather than I
       add_condor_vars_line "MY_VAR" "S" "-" "+" "Y" "N" "+"
  • 28. Limitations
  • 29. Limits of validation scripts
     ● Whatever is discovered on the WN is
        ● Used by the script for its own testing
        ● At best, propagated to glidein ClassAd or job env
     ● The discovered info cannot be used for Frontend matchmaking purposes!
        ● At best, for Negotiator matchmaking
     ● As a result, you may be requesting glideins that will never run any user jobs (if the condition is common to all WNs)
        ● Either fail validation or do not match
  • 30. What can you do?
     ● How do you notice it?
        ● If validation errors
           – The Factory admins will likely contact you
        ● If Negotiator not matching jobs
           – You will need to discover it yourself
     ● What to do afterwards?
        ● Tune the script (Maybe you were just too aggressive?)
        ● Manually blacklist a site in your frontend XML (Pretty much a hack!)
        ● Or convince the Factory admins to advertise VO specific info (Can be hard to maintain long term)
  • 31. The End
  • 32. Pointers
     ● The official glideinWMS project Web page is http://tinyurl.com/glideinWMS
     ● glideinWMS development team is reachable at glideinwms-support@fnal.gov
     ● The OSG glidein factory is reachable at osg-gfactory-support@physics.ucsd.edu
  • 33. Acknowledgments
     ● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI
     ● The glideinWMS factory operations at UCSD are sponsored by OSG
     ● The funding comes from NSF, DOE and the UC system