
DTrace in the Non-global Zone



My presentation at the BayLISA SmartOS meetup on August 16th, 2012. More details at

Published in: Technology, Business


  1. DTrace in the Non-global Zone
       Bryan Cantrill, SVP Engineering,
  2. DTrace and zones: Fraternal twins
       • DTrace and zones were developed in parallel during the development of Solaris 10
       • DTrace integrated (September 2003) before zones (early 2004)
       • When zones integrated, the priority was making DTrace in the global zone able to meaningfully instrument non-global zones
       • DTrace in the non-global zone was hard, and a lower priority than other work on both technologies
  3. DTrace and zones: Basic functionality
       • In 2006, Dan Price (with help from Adam Leventhal and Jonathan Adams) added initial support for DTrace in the non-global zone
       • Allowed use of the syscall provider, the pid provider and (in a deranged, broken way) the profile provider
       • This was significant work: it required modifications to both the zones privilege model and the DTrace privilege model
       • For example, it required an implicit predicate on syscall and profile probes
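
One way to picture the implicit predicate from slide 3: before any action runs, the framework checks whether the thread that hit the probe belongs to the zone that created the enabling. The following is a minimal user-space sketch in C, not the illumos implementation. The types `sim_thread_t` and `sim_enabling_t`, the function `probe_should_fire()`, and the constant `SIM_GLOBAL_ZONEID` are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the id used here to stand in for the global zone. */
#define SIM_GLOBAL_ZONEID 0

/* Hypothetical stand-in for the thread that hit the probe. */
typedef struct sim_thread {
    int zoneid;              /* zone the thread is running in */
} sim_thread_t;

/* Hypothetical stand-in for one DTrace enabling. */
typedef struct sim_enabling {
    int owner_zoneid;        /* zone of the consumer that created it */
} sim_enabling_t;

/*
 * Implicit predicate: an enabling created in the global zone sees every
 * thread; an enabling created in a non-global zone only sees threads
 * running in that same zone.
 */
static bool
probe_should_fire(const sim_enabling_t *e, const sim_thread_t *cur)
{
    assert(e != NULL && cur != NULL);

    if (e->owner_zoneid == SIM_GLOBAL_ZONEID)
        return true;

    return e->owner_zoneid == cur->zoneid;
}
```

With this shape, a syscall probe enabled from zone 7 is silently skipped whenever a thread in zone 9 makes the same system call, which is the behavior the user in zone 7 expects.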
  4. DTrace and zones in SmartOS
       • As the world's heaviest user of zones, we at Joyent ran into (and fixed) a number of annoying bugs:
           • USDT probes from the non-global zone were not properly being enabled in the global zone (illumos#908)
           • Tick and profile probes did not properly fire when used in the non-global zone (illumos#1456)
       • Fixing the latter required an extension of the DTrace privilege model: it introduced a notion of restricted operation in which args could not be referenced
  5. DTrace and zones in SmartOS
       • Other (very) annoying issues still lurked:
           • Inability to read “cpu” in the non-global zone
           • Inability to read any fields from “curlwpsinfo” and “curpsinfo”, especially “pr_dmodel”
           • Inability to read the “fds[]” array
       • Failure mode highly obnoxious:

             [my-non-global-zone ~]# dtrace -n 'BEGIN{trace(curpsinfo->pr_psargs)}'
             dtrace: description 'BEGIN' matched 1 probe
             dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): invalid kernel access in action #1 at DIF offset 44
  6. Divide and conquer
       • curlwpsinfo and curpsinfo are translators over the current thread (“kthread_t”) and the current process (“proc_t”), respectively
       • Importantly, the state contained in one's own kthread_t and proc_t:
           • Is safe to read while executing (threads cannot disappear out from under themselves)
           • Does not represent potential privilege escalation
       • This can be fixed by simply allowing the loads where one has privileges to the current process!
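
The fix on slide 6 amounts to a range check on every kernel load: a load is permitted when it falls entirely inside the current thread's own structure or the current process's own structure, and the consumer holds privileges over that process. The sketch below models the idea in user-space C under stated assumptions. The structures `sim_proc_t` and `sim_thread_t` and the functions `in_range()` and `can_load()` are hypothetical and far simpler than the real `proc_t`, `kthread_t`, and in-kernel checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for proc_t and kthread_t. */
typedef struct sim_proc {
    char psargs[80];
    int dmodel;
} sim_proc_t;

typedef struct sim_thread {
    int lwpid;
    sim_proc_t *procp;       /* the process this thread belongs to */
} sim_thread_t;

/* True if [addr, addr + len) lies entirely within [base, base + size). */
static bool
in_range(uintptr_t addr, size_t len, uintptr_t base, size_t size)
{
    /* Ordered so that none of the subtractions can wrap around. */
    return addr >= base && len <= size && addr - base <= size - len;
}

/*
 * Allow a load only if the consumer has privileges over the current
 * process AND the load stays inside the current thread or its process.
 */
static bool
can_load(uintptr_t addr, size_t len, const sim_thread_t *cur,
    bool has_proc_priv)
{
    assert(cur != NULL && cur->procp != NULL);

    if (!has_proc_priv)
        return false;

    return in_range(addr, len, (uintptr_t)cur, sizeof (*cur)) ||
        in_range(addr, len, (uintptr_t)cur->procp, sizeof (*cur->procp));
}
```

Reading `pr_dmodel` through curpsinfo then corresponds to a load that lands inside the current process's structure and is allowed, while a load that targets another process's structure, or that runs past the end of one's own, is still rejected.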
  7. fds[]: A magic bullet?
       • Somehow, I convinced myself that the problem with fds[] was the translator that translates the member accesses into kernel accesses:

             inline fileinfo_t fds[int fd] = xlate <fileinfo_t> (
                 fd >= 0 && fd < curthread->t_procp->p_user.u_finfo.fi_nfiles ?
                 curthread->t_procp->p_user.u_finfo.fi_list[fd].uf_file : NULL);

       • If the problem was the static translators, the solution must be dynamic translators, a(n in)famously unimplemented feature of DTrace!
       • After dtrace.conf(12), I realized that how the translation is expressed is orthogonal to the fact that the in-kernel implementation must not allow privilege escalation
  8. fds[]: No magic bullets
       • Focusing on the implementation allows one to consider the specifics of the fds[] case
       • Helped by the fact that the fi_list implementation uses memory retiring for scalability of file descriptor lookups: the array is only freed upon process exit
       • This assures that one's own fi_list always points to memory that is (or was) an array of uf_entry_t
       • That leaves the file_t itself, which can be freed during probe context (specifically, by another thread in the same process)
  9. Dealing with file_t
       • We can deal with this by forcing everyone out of probe context after a file_t has been removed from the uf_entry_t, but before it is freed
       • This is done by issuing a dtrace_sync(): a synchronous (empty) cross-call to all CPUs
       • This is expensive, and required answering an important question: just how hot is the closef() path, anyway?
       • By instrumenting our guinea pigs (that is, our production cloud), we could answer this concisely: closef() is pretty damned hot (> 5,000/second on some machines!)
  10. Adding getf()
       • To track when fds[] was active in the non-global zone, we added a getf() subroutine (ht: ken)
       • Allows us to issue the sync only when we have a closef() from a non-global zone using fds[]
       • Had to take the final step of cleaning up the path output to strip off the zone path from the file name (as a cleanliness issue, not a security issue)
       • De-mo, de-mo, de-mo!
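
The ordering described on slides 9 and 10 (unpublish the file_t, force everyone out of probe context, then free) and the idea of paying for the sync only when fds[] is actually in use can be sketched as follows. This is a simplified user-space model in C, not the kernel code: `sim_dtrace_sync()` merely counts calls instead of cross-calling CPUs, and the per-zone `getf_used` flag, the `sim_*` types, and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for file_t and uf_entry_t. */
typedef struct sim_file {
    int refcnt;
} sim_file_t;

typedef struct sim_entry {
    sim_file_t *uf_file;
} sim_entry_t;

/* Hypothetical per-zone state: has getf() been used from this zone? */
typedef struct sim_zone {
    bool getf_used;
} sim_zone_t;

/* Counts simulated cross-calls; a real sync would wait on every CPU. */
static int sim_sync_count;

static void
sim_dtrace_sync(void)
{
    sim_sync_count++;
}

/* Probe-context lookup: bounds-check the fd and note that fds[] is live. */
static sim_file_t *
sim_getf(sim_zone_t *z, sim_entry_t *list, int nfiles, int fd)
{
    assert(z != NULL && list != NULL);

    if (fd < 0 || fd >= nfiles)
        return NULL;

    z->getf_used = true;
    return list[fd].uf_file;
}

/* Close path: unpublish, sync only if needed, and only then free. */
static void
sim_closef(sim_zone_t *z, sim_entry_t *e)
{
    sim_file_t *fp = e->uf_file;

    e->uf_file = NULL;           /* 1. no new probe can find the file */
    if (z->getf_used)
        sim_dtrace_sync();       /* 2. wait out probes already running */
    free(fp);                    /* 3. now nothing can still hold fp */
}
```

The point of the flag is that a zone that never touches fds[] closes files at full speed, which matters given how hot the closef() path turned out to be.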
  11. sched and proc providers
       • With fds[] done, focus turned to the only meaningful impediment to DTrace in the non-global zone: enabling the sched and proc providers
       • Recall the restricted operation introduced for the profile provider in the non-global zone...
       • Used this to have limited (non-global) DTrace privileges imply restricted operation for some SDT providers
       • Thanks to the curlwpsinfo/curpsinfo work, these providers can be meaningfully used without access to arguments
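
Restricted operation, as described on slides 4 and 11, can be modeled as an access mask computed when a probe fires: a limited-privilege consumer may still run its actions, but any reference to the probe's arguments is flagged as a fault rather than returning data. The C sketch below is a user-space illustration only. The `SIM_ACCESS_*` bits, `sim_mstate_t`, `sim_access_for()`, and `sim_read_arg()` are hypothetical names, and the rule that a provider must opt in to restricted operation is an assumption drawn from the phrase "some SDT providers".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits for one probe firing. */
#define SIM_ACCESS_KERNEL 0x1u   /* may load arbitrary kernel memory */
#define SIM_ACCESS_PROC   0x2u   /* may load own thread/process state */
#define SIM_ACCESS_ARGS   0x4u   /* may reference arg0..argN */

#define SIM_NARGS 5

/* Hypothetical per-firing machine state. */
typedef struct sim_mstate {
    unsigned access;             /* mask of SIM_ACCESS_* bits */
    bool fault;                  /* set when a forbidden access occurs */
    uint64_t args[SIM_NARGS];    /* the probe's raw arguments */
} sim_mstate_t;

/*
 * Compute the access mask. A result of 0 means the probe should not
 * fire at all for this consumer.
 */
static unsigned
sim_access_for(bool global_zone, bool provider_allows_restricted)
{
    if (global_zone)
        return SIM_ACCESS_KERNEL | SIM_ACCESS_PROC | SIM_ACCESS_ARGS;

    if (provider_allows_restricted)
        return SIM_ACCESS_PROC;  /* restricted operation: no args */

    return 0;
}

/* Read argument n, or flag a fault and return 0 if args are off limits. */
static uint64_t
sim_read_arg(sim_mstate_t *ms, int n)
{
    assert(ms != NULL && n >= 0 && n < SIM_NARGS);

    if (!(ms->access & SIM_ACCESS_ARGS)) {
        ms->fault = true;
        return 0;
    }

    return ms->args[n];
}
```

In this model a non-global sched or proc enabling still fires and can consult curlwpsinfo and curpsinfo through the process-level access bit, which is why those providers remain useful even with their arguments out of reach.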