Privilege Separation In Condor
Presentation Transcript

  • Privilege separation in Condor (Bruce Beckles, University of Cambridge Computing Service)
  • What is privilege separation?
    • Isolation of those parts of the code that run at different privilege levels
    • No privilege separation: [Diagram: root, Condor daemons, Condor job]
    • Privilege separation: [Diagram: root, Condor daemons, Condor job]
  • The principle of least privilege
    • Code only ever runs at the minimum level of privilege required for its current task
    • Violation of principle of least privilege: [Diagram: root, Condor daemons]
    • Not in violation of principle of least privilege: [Diagram: root, Condor daemons. Condor asks root to perform a privileged action; upon completion of the action, control returns to Condor.]
  • How do we do this?
    • User context switching: [Diagram: root, context switching daemon, Condor daemons, Condor job]
    • GNU userv allows one process to invoke another (in either the same or a different user context) in a secure fashion when only limited trust exists between them (see http://www.gnu.org/software/userv/).
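As a rough illustration of handing work to a context-switching daemon, the sketch below builds a userv invocation and runs it as a subprocess. It assumes the basic userv command form (`userv service-user service-name [argument ...]`). The account name `condorjob` and the service name `run-job` are hypothetical placeholders and do not come from the talk.

```python
import subprocess


def userv_command(service_user, service_name, *args):
    """Build the argument vector for a userv service call."""
    return ["userv", service_user, service_name, *args]


def call_service(service_user, service_name, *args):
    """Invoke a userv service and capture its output.

    Requires userv to be installed and the service to be configured
    on the machine; neither is checked here.
    """
    return subprocess.run(
        userv_command(service_user, service_name, *args),
        capture_output=True,
        check=False,
    )


if __name__ == "__main__":
    # Hypothetical user and service names, shown for illustration only.
    print(userv_command("condorjob", "run-job", "/bin/true"))
```

Passing the command as a list avoids shell interpretation of the job's arguments, which matters when the caller and the service only partly trust each other.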
  • Who does what?
    • Execute nodes:
      • Change ownership of any file (CAP_CHOWN capability)
      • Switch to any user context (CAP_SETUID capability)
      • Send signals to any process (CAP_KILL capability)
    • Submit nodes:
      • Switch to any user context (CAP_SETUID capability)
      • Send signals to any process (CAP_KILL capability)
    • Central manager:
      • No privilege switching required (unless GSI authentication is being used, in which case it needs to be able to switch to any user context (CAP_SETUID capability))
      • (Note that if you have the ability to switch to any user context you effectively have the ability to send signals to any process.)
  • Details – Central manager
    • Unless using GSI authentication, Condor daemons don’t need to run as root, so…
    • …don’t run them as root…(!)
    • If using GSI:
      • Flawed security paradigm, so…
      • …waste of time trying to make the system more “secure” until underlying paradigm is “fixed”…
  • Details – Execute node [Diagram labels: root, uservd, wrapper script, signal handler, Condor job, condor_startd, Condor cron job, kill -KILL -1, condor_starter, "Rogue" process, stdout, stderr; actions: send signal to Condor job, change file ownership, fork() Condor job]
  • Details – Execute node
    • No Condor daemon or process runs as root
    • Condor passes job to our wrapper script
    • Our script installs a signal handler to intercept Condor’s signals to the job
    • Our script changes the ownership of the job’s files (via userv) as necessary at beginning and end of job
    • Our script fork()s a userv process which runs the job in a different (also non-privileged) user context
    • On receipt of a signal from Condor, our script calls userv to send the signal to the job
    • userv uses pipes to pass the job’s standard output and standard error back to our script (and so to Condor)
    • A cron job (using Condor’s STARTD_CRON mechanism) runs once a minute to make sure that if there is no job executing then there are no processes running in our dedicated “Condor job” user context
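The wrapper behaviour described above (start the job, intercept signals, pass them on, relay standard output and standard error) can be sketched as follows. This is a minimal sketch and not the author's script. The `launcher_prefix` parameter is a hypothetical stand-in for the userv invocation, and the signal is delivered directly to the child process, whereas the setup in the talk asks userv to deliver it because the job runs in another user context.

```python
import signal
import subprocess
import sys
import threading

# Signals the wrapper passes on to the job (an illustrative choice).
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def run_job(job_argv, launcher_prefix=()):
    """Run the job, forward signals to it, return (exit code, stdout, stderr)."""
    child = subprocess.Popen(
        list(launcher_prefix) + list(job_argv),
        stdout=subprocess.PIPE,   # pipes carry the job's output back
        stderr=subprocess.PIPE,
    )

    def forward(signum, _frame):
        # Assumption: direct delivery. A job in a different user context
        # would need the signal sent on the wrapper's behalf (e.g. via userv).
        child.send_signal(signum)

    previous = {}
    # Python only allows signal handlers to be installed from the main thread.
    if threading.current_thread() is threading.main_thread():
        for signum in FORWARDED:
            previous[signum] = signal.signal(signum, forward)
    try:
        out, err = child.communicate()
    finally:
        for signum, handler in previous.items():
            # Restore whatever was installed before (default if unknown).
            signal.signal(signum, signal.SIG_DFL if handler is None else handler)
    return child.returncode, out, err


if __name__ == "__main__":
    code, out, err = run_job([sys.executable, "-c", "print('job output')"])
    sys.stdout.buffer.write(out)
    sys.stderr.buffer.write(err)
    sys.exit(code)
```

Changing file ownership at the start and end of the job is left out because it needs a privileged helper that cannot be shown in a self-contained way.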
  • What does this gain us?
    • If there is a vulnerability in Condor then the entire machine is not compromised as a result.
    • We do not have any Condor processes, whose real user ID is root, listening to the network.
    • Much greater control over the job’s execution environment – could run the job chroot’d if desired.
    • Our wrapper script can examine the job and make more sophisticated decisions about whether or not to run it (restrict to local executables, etc.)
    • We now have a “hook” for executing arbitrary tasks on job completion, a feature “ordinary” Condor lacks.
    • If a Condor job leaves behind any processes running after job completion they will be killed – normally, Condor only does this properly if specially configured (EXECUTE_LOGIN_IS_DEDICATED).
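The leftover-process check can be illustrated with a short sketch. It assumes a Linux-style /proc filesystem and finds processes by owner. The diagram labels in the talk suggest the real setup simply runs `kill -KILL -1` in the job user's context, so this is an alternative shown only to make the logic concrete. The `job_running` flag and the injectable `kill` function are hypothetical.

```python
import os
import signal


def pids_owned_by(uid, proc_root="/proc"):
    """Return the PIDs whose /proc entry is owned by uid (Linux-specific)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            if os.stat(os.path.join(proc_root, entry)).st_uid == uid:
                pids.append(int(entry))
        except FileNotFoundError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def kill_leftovers(uid, job_running, kill=os.kill):
    """If no job is running, SIGKILL every other process owned by uid."""
    if job_running:
        return []
    victims = [pid for pid in pids_owned_by(uid) if pid != os.getpid()]
    for pid in victims:
        try:
            kill(pid, signal.SIGKILL)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or not ours to signal
    return victims
```

An unprivileged process can only signal processes of its own user, so a check like this would have to run in the dedicated job user's context (for instance through userv) to have any effect.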
  • What do we lose?
    • Can no longer suspend jobs (cannot catch SIGSTOP)
    • We now need to handle passing the job environment variables, setting resource limits and scheduling priority, etc., which Condor would normally handle
    • Condor can no longer correctly distinguish between the load Condor processes and Condor jobs are putting on the machine, and the load generated by other processes
    • Information returned by Condor about the job’s CPU utilization, etc. is incorrect
    • Cannot work with GSI authentication
    • Does not yet work with Condor’s Standard universe (although this may not be difficult to fix)
    • Adds of the order of 5 seconds to job execution time
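Taking over the environment, resource limits and scheduling priority might look roughly like the sketch below. The function name, the choice of a CPU-time limit and the parameters are illustrative assumptions. The talk does not say which limits its wrapper sets or how.

```python
import os
import resource
import subprocess
import sys


def launch_with_limits(argv, env, cpu_seconds, niceness):
    """Run argv with an explicit environment, a CPU-time limit and a nice value."""

    def prepare_child():
        # Runs in the child between fork() and exec().
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        limit = cpu_seconds
        if hard != resource.RLIM_INFINITY:
            limit = min(cpu_seconds, hard)  # an unprivileged process cannot raise the hard limit
        resource.setrlimit(resource.RLIMIT_CPU, (limit, limit))
        os.nice(niceness)  # lower the scheduling priority

    return subprocess.run(
        argv,
        env=env,                  # only the variables we choose to pass on
        preexec_fn=prepare_child,
        capture_output=True,
        check=False,
    )


if __name__ == "__main__":
    result = launch_with_limits(
        [sys.executable, "-c", "import os; print(os.environ.get('JOB_ID'))"],
        dict(os.environ, JOB_ID="42"),
        cpu_seconds=60,
        niceness=5,
    )
    sys.stdout.buffer.write(result.stdout)
```

`preexec_fn` is not safe in multi-threaded programs, which is acceptable for a single-threaded wrapper but worth keeping in mind.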
  • Details – Submit node
    • If not using “strong” authentication (Kerberos, etc.), can use a similar technique
    • If using strong authentication then this won’t work:
      • condor_shadow processes need to run as the submitting user, but…
      • … machine credentials need to not be accessible to ordinary users…
      • Damn!
    • So…
  • Fully privilege separated architecture (1) [Diagram labels: root, context switching daemon, Condor daemon, network listener daemon, Network, credential on filesystem, chroot(), dedicated condor user, dedicated different user, IPC (e.g. Unix domain sockets), request privileged services as needed]
  • Fully privilege separated architecture (1) – details
    • Requires extensive changes to Condor code
    • No Condor daemon or process runs as root
    • User context switching daemon provides services to Condor components that would otherwise need a higher level of privilege
    • Long running Condor daemons that communicate with the network are privilege separated “OpenBSD-style”:
      • Split into two components that run in different user contexts: one that exclusively handles network communication, one that does everything else
      • Network component chroot()’d to somewhere “safe”
      • Components communicate via some form of IPC mechanism (e.g. Unix domain sockets)
      • Allows authentication credentials to be protected but still used for authentication (network component acts as a relay between other component and the remote daemon)
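The split between a network-facing component and an internal component can be sketched as a relay over a Unix domain socket pair. This is a toy model under stated assumptions. Threads stand in for the two separate processes in different user contexts, a second socket pair stands in for the real network connection, and the `handled:` reply format is invented for illustration.

```python
import socket
import threading


def relay_once(remote_sock, ipc_sock, bufsize=4096):
    """Network component: pass one request inwards and one reply outwards."""
    request = remote_sock.recv(bufsize)
    ipc_sock.sendall(request)
    reply = ipc_sock.recv(bufsize)
    remote_sock.sendall(reply)


def internal_component(ipc_sock, bufsize=4096):
    """Internal component: never touches the network, only the IPC channel."""
    request = ipc_sock.recv(bufsize)
    ipc_sock.sendall(b"handled:" + request)


def demo():
    """Send one request through the relay and return the reply."""
    remote_end, listener_remote = socket.socketpair()  # stand-in for the network
    listener_ipc, daemon_ipc = socket.socketpair()     # IPC between the components
    worker = threading.Thread(target=internal_component, args=(daemon_ipc,))
    relay = threading.Thread(target=relay_once, args=(listener_remote, listener_ipc))
    worker.start()
    relay.start()
    remote_end.sendall(b"status?")
    answer = remote_end.recv(4096)
    worker.join()
    relay.join()
    for sock in (remote_end, listener_remote, listener_ipc, daemon_ipc):
        sock.close()
    return answer


if __name__ == "__main__":
    print(demo())
```

In a real split the two components would be separate processes, each started under its own user, with the network side confined by chroot() before it accepts any traffic.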
  • Fully privilege separated architecture (2) [Diagram labels: root, context switching daemon, Condor daemon (long running), Condor daemon (short running, e.g. condor_shadow), network listener daemon, Network, credential, chroot(), dedicated condor user, dedicated different user, possibly different user context (e.g. user who submitted job), IPC, "network socket"]
  • Fully privilege separated architecture (2) – details
    • For “short” running Condor daemons (condor_starter, condor_shadow, etc.):
      • Typically instantiated by a long running daemon
      • Parent daemon could set up secure network channel which child then inherits
      • This gives the child a session key
      • When session key is due to expire (or if re-negotiation of secure channel is required):
        • Use session key to “authenticate” to parent daemon
        • Now act as a relay for passing authentication messages between remote daemon and parent daemon to re-negotiate secure channel
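One way a child daemon could use an inherited session key to “authenticate” to its parent is an HMAC challenge-response, sketched below. The talk does not specify the mechanism, so the choice of HMAC-SHA256, the 32-byte challenge and the function names are all assumptions.

```python
import hashlib
import hmac
import os


def new_challenge():
    """Parent side: generate a fresh random challenge."""
    return os.urandom(32)


def respond(session_key, challenge):
    """Child side: prove knowledge of the session key without sending it."""
    return hmac.new(session_key, challenge, hashlib.sha256).digest()


def verify(session_key, challenge, response):
    """Parent side: check the response using a constant-time comparison."""
    return hmac.compare_digest(respond(session_key, challenge), response)


if __name__ == "__main__":
    key = os.urandom(32)          # stands in for the inherited session key
    challenge = new_challenge()
    print(verify(key, challenge, respond(key, challenge)))
```

Using a fresh challenge for each attempt prevents a captured response from being replayed later.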
  • Future work
    • Architecture described on the last two slides is just a proposal
    • Details may (will!) change as design evolves…
    • Some form of privilege separation should appear in the Condor 6.9 development series
  • Further Details
    • See:
    • Implementing privilege separation in the Condor® system (2005):
    • http://www.allhands.org.uk/2005/proceedings/papers/568.pdf