
Wedding convenience and control with RemoteCondor


  1. UCSD HEP Group Trainings: Wedding convenience and control with RemoteCondor, by Igor Sfiligoi. RemoteCondor co-developed with J. Dost. UC San Diego, Apr 2012
  2. The Condor Batch System ● Condor is a Workload Management System, i.e. a batch system ● Strong points: fault tolerant, robust feature set, flexible ● Large community base, both commercial and scientific ● http://research.cs.wisc.edu/condor/
  3. Condor Architecture ● Clearly separates resource providers from resource consumers ● Resource providers: machines (aka worker nodes) with CPUs, memory, IO, ... ● Resource consumers: job queues (aka submit nodes) holding jobs submitted by users ● Each has a daemon process to represent it ● Startd for resource providers ● Schedd for resource consumers ● A central service connects them all ● Managed by a Collector/Negotiator pair
  4. Condor Architecture in a picture [Diagram: several Schedds and several Startds, all connected through a central Collector and Negotiator]
  5. The truth about submit nodes ● Corollary ● The submit node is a server! ● There is no real “Condor client” ● The cmdline tools are just a convenience to talk to the daemon process [Diagram: condor_submit and condor_q run on the submit node next to the Schedd, which talks to the Collector, Negotiator and Startd]
  6. Implications ● Being a server has several implications ● Security implications ● Will have incoming connectivity ● All security configuration is on the submit node ● Submit node controls user authentication and authorization ● Unfriendly to non-dedicated hardware ● Requires always-on operation ● Must be on a public, static IP address
  7. Implications ● Being a server has several implications ● Security implications ● Will have incoming connectivity: high exploit risk ● All security configuration is on the submit node, and the submit node controls user authentication and authorization: requires high trust between all nodes in the cluster ● Unfriendly to non-dedicated hardware ● Requires always-on operation and a public, static IP address: impossible to use on a laptop
  8. Implications ● Being a server has several implications ● Security implications ● Will have incoming connectivity: high exploit risk ● All security configuration is on the submit node, and the submit node controls user authentication and authorization: requires high trust between all nodes in the cluster ● Unfriendly to non-dedicated hardware ● Requires always-on operation and a public, static IP address: impossible to use on a laptop ● Conclusion: not suitable for an unmanaged user machine
  9. What are the alternatives? ● Out of the box, Condor provides ● Remote submission ● Condor-C ● In the contrib section, you can find ● RemoteCondor
  10. What are the alternatives? ● Out of the box, Condor provides ● Remote submission ● Condor-C ● In the contrib section, you can find ● RemoteCondor (this presentation argues that this is the best solution)
  11. What are the alternatives? ● Out of the box, Condor provides ● Remote submission ● Condor-C ● So what is wrong with these? ● In the contrib section, you can find ● RemoteCondor (this presentation argues that this is the best solution)
  12. Remote submission ● Essentially, connecting to a remote Schedd ● condor_submit -remote … + condor_transfer_data ● condor_q -name ..., condor_rm -name ..., … ● So no daemon processes on the submit node ● A true client solution! [Diagram: condor_submit, condor_q and condor_transfer_data on the submit node authenticate to the Schedd on a separate Schedd node, which works with the Collector, Negotiator and Startd] ● http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html ● http://research.cs.wisc.edu/condor/manual/v7.6/condor_transfer_data.html
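
The remote submission workflow on slide 12 can be sketched as a sequence of client-side command lines. This is only an illustration: the Schedd name `schedd.example.org`, the file `job.sub`, and the cluster id are placeholders, and the helper `remote_submission_commands` is hypothetical. The sketch builds the commands without running them, so it does not need a Condor installation.

```python
import shlex
from typing import List


def remote_submission_commands(
    schedd_name: str, submit_file: str, cluster_id: int
) -> List[List[str]]:
    """Return client-side command lines for one submit/monitor/fetch cycle.

    Hypothetical helper: it only assembles argument lists, it runs nothing.
    """
    cluster = str(cluster_id)
    return [
        # Spool the job and its input files to the remote Schedd.
        ["condor_submit", "-remote", schedd_name, submit_file],
        # Poll the remote queue; there is no local user log to watch.
        ["condor_q", "-name", schedd_name, cluster],
        # Fetch the spooled output back once the job has completed.
        ["condor_transfer_data", "-name", schedd_name, cluster],
    ]


if __name__ == "__main__":
    # Placeholder values; replace with a real Schedd name and cluster id.
    for cmd in remote_submission_commands("schedd.example.org", "job.sub", 42):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Every step has to name the remote Schedd explicitly, and progress can only be checked by polling with condor_q.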
  13. So, what's the problem? ● No local user log file ● Annoying at best ● Must use condor_q to monitor progress (high monitoring load) ● And it does not work with DAGMan ● Fully Condor-based user authentication ● While rich, not what users expect (e.g. no user/password) ● Hard to tie into campus-wide auth ● Staged input data not shared ● Could be a problem with large datasets
  14. Condor-C ● Based on the Grid paradigm ● Submit locally, then delegate to remote Schedd ● Still running a daemon process ● But requires no incoming connections: secure and laptop friendly [Diagram: condor_submit and condor_q talk to a local Schedd on the submit node, which authenticates to the Schedd on the Schedd node; the Collector, Negotiator and Startd sit behind it] ● http://research.cs.wisc.edu/condor/manual/v7.6/5_3Grid_Universe.html#sec:Condor-C
  15. What are the drawbacks? ● Awkward syntax ● At least compared to Vanilla universe ● See the Condor manual for examples ● Can be mitigated with Job Router (but adds another layer of complexity) ● Has scalability problems ● Could likely be improved, but this is the current state-of-the-art ● Fully Condor-based user authentication (same as remote submission) ● Staged input data not shared (same as remote submission)
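
To make the "awkward syntax" point concrete, here is a rough sketch that generates a vanilla universe submit description next to a Condor-C one. The Condor-C lines (`universe = grid`, `grid_resource = condor ...`, `+remote_jobuniverse = 5`) are written from memory of the Condor-C section of the manual linked on slide 14 and should be checked against it. The host names, file names and both helper functions are hypothetical.

```python
def vanilla_description(executable: str) -> str:
    """Submit description for a plain vanilla universe job (illustrative)."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def condor_c_description(
    executable: str, remote_schedd: str, remote_collector: str
) -> str:
    """Same job delegated to a remote Schedd via Condor-C (assumed syntax)."""
    lines = [
        # Condor-C jobs are grid universe jobs of grid type "condor".
        "universe = grid",
        f"grid_resource = condor {remote_schedd} {remote_collector}",
        # Ask the remote Schedd to run the job as vanilla (universe 5).
        "+remote_jobuniverse = 5",
        f"executable = {executable}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(vanilla_description("analyze.sh"))
    # Placeholder host names for the remote Schedd and its Collector.
    print(condor_c_description(
        "analyze.sh", "schedd.example.org", "collector.example.org"
    ))
```

The Condor-C version has to name both the remote Schedd and its Collector, and any attribute meant for the remote side needs a `+remote_` prefix.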
  16. Introducing RemoteCondor
  17. What's the big idea? ● Let the users log in to a remote machine ● And run the cmdline tools there ● True client approach
  18. What's the big idea? ● Let the users log in to a remote machine ● And run the cmdline tools there ● Advantages: ● True local Condor experience ● No exceptions ● Standard system authentication and authorization ● Familiar to users ● Minimize security risk ● No admin privileges for the users ● Trust based on “central” Schedd admin skills ● Central handling ● Can regulate and transform Condor submissions
  19. What's the big idea? ● Let the users log in to a remote machine ● And run the cmdline tools there ● Advantages: ● True local Condor experience ● No exceptions ● Standard system authentication and authorization ● Familiar to users ● Minimize security risk ● No admin privileges for the users ● Trust based on “central” Schedd admin skills ● Central handling ● Can regulate and transform Condor submissions ● Big deal! Where's the news?
  20. What's the big idea? ● Let the users log in to a remote machine ● And run the cmdline tools there ● … while preserving the local look-and-feel ● RemoteCondor provides ● Wrappers around major Condor cmdline tools ● Integration with sshfs ● https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
  21. RemoteCondor wrappers ● Provide wrappers that use ssh under the hood ● Users (almost) unaware of the trick ● But may be prompted for a password ● Works best with public key authentication [Diagram: the condor_submit and condor_q wrappers on the submit node authenticate to sshd on the Schedd node, where the real condor_submit and condor_q talk to the Schedd]
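
The wrapper idea on slide 21 can be illustrated with a few lines of Python. This is a minimal sketch of the general technique (forwarding a command line over ssh), not RemoteCondor's actual implementation. The function name, user name and host name are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_ssh_command(
    user: str, host: str, tool: str, args: Sequence[str]
) -> List[str]:
    """Build the ssh invocation that runs a Condor tool on the Schedd node.

    The remote command is passed as one string, so each argument is
    shell-quoted to survive the remote shell unchanged.
    """
    remote_command = " ".join(shlex.quote(part) for part in [tool, *args])
    return ["ssh", f"{user}@{host}", remote_command]


def run_remote(user: str, host: str, tool: str, args: Sequence[str]) -> int:
    """Run the tool remotely and return its exit code (needs ssh access)."""
    return subprocess.call(build_ssh_command(user, host, tool, args))


if __name__ == "__main__":
    # Placeholder account and host; prints the command instead of running it.
    print(build_ssh_command("alice", "schedd.example.org", "condor_q", ["-long"]))
```

A local script named like the Condor tool could call `run_remote` with its own arguments, so that the user types the familiar command. With public key authentication no password prompt gets in the way.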
  22. RemoteCondor and sshfs ● But being able to talk to Condor is not enough ● Users must be able to create and read data! ● Using sshfs solves the problem ● Schedd-local disk mounted on submit node (disk local to Schedd for maximum performance) ● Using ssh as a tunnel ● All in user space (FUSE) ● RemoteCondor will properly convert paths (within certain limits) ● http://fuse.sourceforge.net/sshfs.html
  23. RemoteCondor and sshfs ● But being able to talk to Condor is not enough ● Users must be able to create and read data! ● Using sshfs solves the problem ● Schedd-local disk mounted on submit node [Diagram: sshfs on the submit node authenticates to sshd on the Schedd node and exposes the real disk located there]
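
The slides do not say how RemoteCondor converts paths, so the following is only a sketch of one plausible approach: map a path under the local sshfs mount point to the matching path on the Schedd node, and refuse anything outside the mount. All names (`to_remote_path`, `sshfs_mount_command`, the directories and the host) are assumptions made for illustration.

```python
import posixpath
from typing import List


def sshfs_mount_command(
    user: str, host: str, remote_root: str, mount_point: str
) -> List[str]:
    """Command line that mounts the Schedd-local directory via sshfs."""
    return ["sshfs", f"{user}@{host}:{remote_root}", mount_point]


def to_remote_path(local_path: str, mount_point: str, remote_root: str) -> str:
    """Translate a path under the local mount into the Schedd-side path.

    Raises ValueError for paths outside the mount, since those files do
    not exist on the Schedd node.
    """
    local = posixpath.normpath(local_path)
    mount = posixpath.normpath(mount_point)
    remote = posixpath.normpath(remote_root)
    if local == mount:
        return remote
    prefix = mount + "/"
    if not local.startswith(prefix):
        raise ValueError(f"{local_path} is not under the mount {mount_point}")
    return posixpath.join(remote, local[len(prefix):])


if __name__ == "__main__":
    # Placeholder directories on the submit node and on the Schedd node.
    print(sshfs_mount_command(
        "alice", "schedd.example.org", "/data/alice", "/home/alice/condor_mnt"
    ))
    print(to_remote_path(
        "/home/alice/condor_mnt/run1/job.sub",
        "/home/alice/condor_mnt",
        "/data/alice",
    ))
```

Rejecting paths outside the mount is one way to read the "within certain limits" caveat: a file that is not on the shared disk cannot be referenced by a job submitted on the Schedd node.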
  24. Using RemoteCondor ● Distributed in the Condor src tarball ● In the Contrib section ● Requires a “make install” ● To put the proper files in place ● Plus minimal configuration ● Where is the remote Schedd node? ● What username to use? ● Where to mount the sshfs partition? ● https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
  25. Summary ● Traditional Condor is not suitable for user machines ● Keeping Schedd nodes professionally maintained is highly desirable ● To minimize security risks and control job flow ● RemoteCondor allows this operation mode while preserving the local look-and-feel ● Requires minimal local install
  26. Acknowledgements ● This work is partially sponsored by ● the US National Science Foundation under Grants No. OCI-0943725 (STCI) and PHY-0612805 (CMS Maintenance & Operations), and ● the US Department of Energy under Grant No. DE-FC02-06ER41436 subcontract No. 647F290 (OSG).
