Wedding convenience and control with RemoteCondor


This presentation explains why Condor is not suitable for use on user-owned machines, and why RemoteCondor is the best available solution to the problem.


  1. UCSD HEP Group Trainings: Wedding convenience and control with RemoteCondor
     by Igor Sfiligoi
     RemoteCondor co-developed with J. Dost
     UC San Diego, Apr 2012
  2. The Condor Batch System
     ● Condor is a Workload Management System
       ● i.e. a batch system
     ● Strong points
       ● Fault tolerant
       ● Robust feature set
       ● Flexible
     ● Large community base
       ● Both commercial and scientific
     http://research.cs.wisc.edu/condor/
  3. Condor Architecture
     ● Clearly separates
       ● Resource providers: machines (aka worker nodes) with CPUs, memory, IO, ...
       from
       ● Resource consumers: job queues (aka submit nodes) holding jobs submitted by users
     ● Each has a daemon process to represent it
       ● Startd for resource providers
       ● Schedd for resource consumers
     ● A central service connects them all
       ● Managed by a Collector/Negotiator pair
  4. Condor Architecture in a picture
     [Figure: several Schedds and several Startds, all connected through the central Collector/Negotiator pair]
  5. The truth about submit nodes
     ● Corollary
       ● The submit node is a server!
     ● There is no real “Condor client”
       ● The cmdline tools are just a convenience to talk to the daemon process
     [Figure: a submit node running the Schedd together with condor_submit and condor_q, next to the Collector, Negotiator and Startd]
  6. Implications
     ● Being a server has several implications
     ● Security implications
       ● Will have incoming connectivity
       ● All security configuration on the submit node
       ● Submit node controls user authentication and authorization
     ● Unfriendly to non-dedicated hardware
       ● Requires always-on operation
       ● Must be on a public, static IP address
  7. Implications
     ● Being a server has several implications
     ● Security implications
       ● Will have incoming connectivity [High exploit risk]
       ● All security configuration on the submit node
       ● Submit node controls user authentication and authorization [Requires high trust between all nodes in the cluster]
     ● Unfriendly to non-dedicated hardware
       ● Requires always-on operation
       ● Must be on a public, static IP address [Impossible to use on a laptop]
  8. Implications
     ● Being a server has several implications
     ● Security implications
       ● Will have incoming connectivity [High exploit risk]
       ● All security configuration on the submit node
       ● Submit node controls user authentication and authorization [Requires high trust between all nodes in the cluster]
     ● Unfriendly to non-dedicated hardware
       ● Requires always-on operation
       ● Must be on a public, static IP address [Impossible to use on a laptop]
     Conclusion: not suitable for an unmanaged user machine
  9. What are the alternatives?
     ● Out of the box, Condor provides
       ● Remote submission
       ● Condor-C
     ● In the contrib sections, you can find
       ● RemoteCondor
  10. What are the alternatives?
     ● Out of the box, Condor provides
       ● Remote submission
       ● Condor-C
     ● In the contrib sections, you can find
       ● RemoteCondor [This presentation argues that this is the best solution]
  11. What are the alternatives?
     ● Out of the box, Condor provides
       ● Remote submission
       ● Condor-C
       [So what is wrong with these?]
     ● In the contrib sections, you can find
       ● RemoteCondor [This presentation argues that this is the best solution]
  12. Remote submission
     ● Essentially, connecting to a remote Schedd
       ● condor_submit -remote … + condor_transfer_data, and
       ● condor_q -name ..., condor_rm -name ..., …
     ● So no daemon processes on the submit node
       ● A true client solution!
     [Figure: the submit node runs only condor_submit, condor_q and condor_transfer_data, which talk, with authentication, to the Schedd on a separate Schedd node, next to the Collector, Negotiator and Startd]
     http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html
     http://research.cs.wisc.edu/condor/manual/v7.6/condor_transfer_data.html
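As a rough illustration of the remote-submission workflow on slide 12, the sketch below only assembles the command lines named on the slide; it does not run them. The host name `schedd.example.org`, the file `job.sub`, the cluster id, and the helper `remote_submit_commands` are all hypothetical, and the exact arguments should be checked against the manual pages linked on the slide.

```python
import shlex


def remote_submit_commands(schedd_name, submit_file, cluster_id):
    """Build (but do not run) the command lines for a remote-submission session.

    schedd_name, submit_file and cluster_id are illustrative placeholders.
    """
    cluster = str(cluster_id)
    return {
        # Submit to the remote Schedd, spooling input files over the network.
        "submit": ["condor_submit", "-remote", schedd_name, submit_file],
        # With no local user log, progress has to be polled on the remote queue.
        "query": ["condor_q", "-name", schedd_name],
        # Output files stay spooled on the Schedd until fetched explicitly.
        "fetch": ["condor_transfer_data", "-name", schedd_name, cluster],
        # Removal also has to name the remote Schedd.
        "remove": ["condor_rm", "-name", schedd_name, cluster],
    }


if __name__ == "__main__":
    commands = remote_submit_commands("schedd.example.org", "job.sub", 42)
    for step, argv in commands.items():
        print(step, "->", " ".join(shlex.quote(arg) for arg in argv))
```

Every step has to name the Schedd explicitly, which is part of why this mode feels different from a local Condor install.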
  13. So, what's the problem?
     ● No local user log file
       ● Annoying at best
       ● Must use condor_q to monitor progress [High monitoring load]
       ● And it does not work with DAGMan
     ● Fully Condor-based user authentication
       ● While rich, not what users expect (e.g. no user/password)
       ● Hard to tie into campus-wide auth
     ● Staged input data not shared [Could be a problem with large datasets]
  14. Condor-C
     ● Based on the Grid paradigm
       ● Submit locally, then delegate to remote Schedd
     ● Still running a daemon process
       ● But requires no incoming connections [Secure, laptop friendly]
     [Figure: the submit node runs a local Schedd with condor_submit and condor_q; that Schedd talks, with authentication, to the Schedd on a separate Schedd node, next to the Collector, Negotiator and Startd]
     http://research.cs.wisc.edu/condor/manual/v7.6/5_3Grid_Universe.html#sec:Condor-C
  15. What are the drawbacks?
     ● Awkward syntax
       ● At least compared to Vanilla universe
       ● See the Condor manual for examples
       [Can be mitigated with Job Router, but that adds another layer of complexity]
     ● Has scalability problems
       ● Could likely be improved, but this is the current state-of-the-art
     ● Fully Condor-based user authentication [Same as remote submission]
     ● Staged input data not shared [Same as remote submission]
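To make the "awkward syntax" point concrete, here is a minimal sketch that generates a Vanilla universe submit description next to a Condor-C (grid universe) one. It is based on my reading of the Condor-C section of the manual linked on slide 14, not on the slides themselves; the host names, the `job.log` file name, and both helper functions are hypothetical, and the attribute list is deliberately incomplete.

```python
def vanilla_submit(executable):
    """Return a minimal Vanilla universe submit description."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def condor_c_submit(executable, remote_schedd, remote_collector):
    """Return a minimal Condor-C submit description (sketch, not exhaustive)."""
    lines = [
        "universe = grid",
        # The remote Schedd and its pool must be spelled out for every job.
        f"grid_resource = condor {remote_schedd} {remote_collector}",
        f"executable = {executable}",
        "log = job.log",
        # Attributes prefixed with remote_ are applied to the job as it
        # appears in the remote queue; universe 5 is Vanilla.
        "+remote_jobuniverse = 5",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(vanilla_submit("analysis.sh"))
    print(condor_c_submit("analysis.sh", "schedd.example.org", "cm.example.org"))
```

The extra `grid_resource` line and the `remote_` attributes are the per-job overhead that the Job Router can hide, at the cost of one more layer to configure.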
  16. Introducing RemoteCondor
  17. What's the big idea?
     ● Let the users log in to a remote machine
       ● And run the cmdline tools there
     [True client approach]
  18. What's the big idea?
     ● Let the users log in to a remote machine
       ● And run the cmdline tools there
     Advantages:
     ● No exceptions
       ● True local Condor experience
     ● Standard system authentication and authorization
       ● Familiar to users
     ● Minimize security risk
       ● No admin privileges for the users
       ● Trust based on “central” Schedd admin skills
     ● Central handling
       ● Can regulate and transform Condor submissions
  19. What's the big idea?
     ● Let the users log in to a remote machine
       ● And run the cmdline tools there
     Advantages:
     ● No exceptions
       ● True local Condor experience
     ● Standard system authentication and authorization
       ● Familiar to users
     ● Minimize security risk
       ● No admin privileges for the users
       ● Trust based on “central” Schedd admin skills
     ● Central handling
       ● Can regulate and transform Condor submissions
     [Big deal! Where's the news?]
  20. What's the big idea?
     ● Let the users log in to a remote machine
       ● And run the cmdline tools there
       ● … while preserving the local look-and-feel
     ● RemoteCondor provides
       ● Wrappers around major Condor cmdline tools
       ● Integration with sshfs
     https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
  21. RemoteCondor wrappers
     ● Provide wrappers that use ssh under the hood
       ● Users (almost) unaware of the trick
       ● But may be prompted for a password
       ● Works best with public key authentication
     [Figure: condor_submit and condor_q on the submit node are wrappers that connect, with authentication, to sshd on the Schedd node, where the real condor_submit and condor_q run next to the Schedd; Collector, Negotiator and Startd sit behind it]
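The wrapper idea on slide 21 can be sketched as follows. This is not the RemoteCondor implementation, only an illustration of forwarding a Condor command line over ssh; `build_ssh_command`, `run_remote_tool`, the user `alice` and the host `schedd.example.org` are hypothetical names.

```python
import shlex
import subprocess


def build_ssh_command(user, host, tool, args):
    """Return the ssh argv that runs `tool args...` on the Schedd node."""
    # ssh hands the remote command to a shell as a single string, so each
    # argument is quoted to survive that extra round of parsing.
    remote_command = " ".join(shlex.quote(part) for part in [tool, *args])
    return ["ssh", f"{user}@{host}", remote_command]


def run_remote_tool(user, host, tool, args):
    """Run the tool remotely and return its exit code.

    stdin, stdout and stderr are inherited, so a password prompt (when no
    public key is set up) reaches the user's terminal directly.
    """
    return subprocess.call(build_ssh_command(user, host, tool, args))


if __name__ == "__main__":
    # Only print the command here; actually running it needs a reachable host.
    print(build_ssh_command("alice", "schedd.example.org", "condor_q", ["-long", "12.0"]))
```

A local script named `condor_q` that calls `run_remote_tool` with its own arguments would give the "almost unaware" experience described on the slide.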
  22. RemoteCondor and sshfs
     ● But being able to talk to Condor is not enough
       ● Users must be able to create and read data!
     ● Using sshfs solves the problem
       ● Schedd-local disk mounted on submit node [Disk local to Schedd for maximum performance]
       ● Using ssh as a tunnel
       ● All in user space (FUSE)
     ● RemoteCondor will properly convert paths (within certain limits)
     http://fuse.sourceforge.net/sshfs.html
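The path conversion mentioned on slide 22 could look roughly like the sketch below: a path under the local sshfs mount point is mapped to the corresponding path on the Schedd node, and anything outside the mount is rejected. The function name, the mount point and the remote directory are assumptions for illustration; the real RemoteCondor rules and limits may differ.

```python
import posixpath


def to_remote_path(local_path, mount_point, remote_root):
    """Map a path under the sshfs mount point to its path on the Schedd node."""
    local_path = posixpath.normpath(local_path)
    mount_point = posixpath.normpath(mount_point)
    prefix = mount_point.rstrip("/") + "/"
    # Files outside the mount are invisible to the remote tools, which is
    # one plausible reading of "within certain limits".
    if local_path != mount_point and not local_path.startswith(prefix):
        raise ValueError(f"{local_path} is not under the sshfs mount {mount_point}")
    relative = posixpath.relpath(local_path, mount_point)
    return posixpath.normpath(posixpath.join(remote_root, relative))


if __name__ == "__main__":
    print(to_remote_path(
        "/home/alice/condor_mnt/run1/job.sub",
        "/home/alice/condor_mnt",
        "/data/alice",
    ))
```

A wrapper would apply this to file arguments before forwarding the command, so that the remote tool sees paths that exist on the Schedd node.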
  23. RemoteCondor and sshfs
     ● But being able to talk to Condor is not enough
       ● Users must be able to create and read data!
     ● Using sshfs solves the problem
       ● Schedd-local disk mounted on submit node
     [Figure: sshfs on the submit node connects, with authentication, to sshd on the Schedd node, which exposes the real disk next to the Schedd; Collector, Negotiator and Startd sit behind it]
  24. Using RemoteCondor
     ● Distributed in the Condor src tarball
       ● In the Contrib section
     ● Requires a “make install”
       ● To put the proper files in place
     ● Plus minimal configuration
       ● Where is the remote Schedd node?
       ● What username to use?
       ● Where to mount the sshfs partition?
     https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
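The three configuration questions on slide 24 map naturally onto a small settings record. The sketch below is an assumption about how such settings could be held and turned into an sshfs mount command; the class, its field names and the example values are hypothetical and are not RemoteCondor's actual configuration variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteCondorSettings:
    """Hypothetical holder for the three settings asked for on the slide."""

    schedd_host: str      # Where is the remote Schedd node?
    username: str         # What username to use?
    mount_point: str      # Where to mount the sshfs partition?
    remote_dir: str = ""  # Empty means the remote home directory.

    def ssh_target(self):
        """Return the user@host string shared by ssh and sshfs."""
        return f"{self.username}@{self.schedd_host}"

    def sshfs_mount_command(self):
        """Return the argv that mounts the Schedd-side directory locally."""
        return ["sshfs", f"{self.ssh_target()}:{self.remote_dir}", self.mount_point]


if __name__ == "__main__":
    settings = RemoteCondorSettings("schedd.example.org", "alice", "/home/alice/condor_mnt")
    print(" ".join(settings.sshfs_mount_command()))
```

Keeping all three values in one place means the ssh wrappers and the sshfs mount always agree on the same host and user.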
  25. Summary
     ● Traditional Condor not suitable for user machines
     ● Keeping Schedd nodes professionally maintained highly desirable
       ● To minimize security risks and control job flow
     ● RemoteCondor allows this operation mode while preserving the local look-and-feel
       ● Requires minimal local install
  26. Acknowledgements
     This work is partially sponsored by
     ● the US National Science Foundation under Grants No. OCI-0943725 (STCI) and PHY-0612805 (CMS Maintenance & Operations), and
     ● the US Department of Energy under Grant No. DE-FC02-06ER41436 subcontract No. 647F290 (OSG).
