Dave Probert, Ph.D. - Windows Kernel Architect, Microsoft Windows Division
Copyright Microsoft Corporation
About Me
  - Ph.D. in Computer Engineering (Operating Systems w/o Kernels)
  - Kernel Architect at Microsoft for over 13 years
      - Managed platform-independent kernel development in Win2K/XP
      - Working on multi-core & heterogeneous parallel computing support
      - Architect for UMS in Windows 7 / Windows Server 2008 R2
  - Co-instigator of the Windows Academic Program
      - Providing kernel source and curriculum materials to universities
      - http://microsoft.com/WindowsAcademic or [email_address]
  - Wrote the Windows material for leading OS textbooks
      - Tanenbaum, Silberschatz, Stallings
      - Consulted on others, including a successful OS textbook in China
UNIX vs NT Design Environments
Environment which influenced fundamental design decisions:
  UNIX [1969]                            Windows (NT) [1989]
  16-bit program address space           32-bit program address space
  Kbytes of physical memory              Mbytes of physical memory
  Swapping system with memory mapping    Virtual memory
  Kbytes of disk, fixed disks            Mbytes of disk, removable disks
  Uniprocessor                           Multiprocessor (4-way)
  State-machine based I/O devices        Micro-controller based I/O devices
  Standalone interactive systems         Client/Server distributed computing
  Small number of friendly users         Large, diverse user populations
Effect on OS Design
Although both Windows and Linux have adapted to changes in the environment, the original design environments (i.e. in 1989 and 1969) heavily influenced the design choices:
                         NT vs UNIX
  Unit of concurrency:   Threads vs processes        Addr space, uniproc
  Process creation:      CreateProcess() vs fork()   Addr space, swapping
  I/O:                   Async vs sync               Swapping, I/O devices
  Namespace root:        Virtual vs Filesystem       Removable storage
  Security:              ACLs vs uid/gid             User populations
Today’s Environment [2009]
  - 64-bit addresses
  - GBytes of physical memory
  - TBytes of rotational disk
  - New storage hierarchies (SSDs)
  - Hypervisors, virtual processors
  - Multi-core/Many-core
  - Heterogeneous CPU architectures, fixed-function hardware
  - High-speed internet/intranet, Web Services
  - Media-rich applications
  - Single user, but vulnerable to hackers worldwide
  - Convergence: Smartphone / Netbook / Laptop / Desktop / TV / Web / Cloud
Windows Architecture
[Figure: block diagram of the Windows architecture]
  User mode:
    - System Processes: Service Control Mgr., LSASS, WinLogon, Session Manager
    - Services: SvcHost.Exe, WinMgt.Exe, SpoolSv.Exe, Services.Exe
    - Applications: Task Manager, Explorer, User Application
    - Environment Subsystems: OS/2, POSIX, Windows
    - Subsystem DLLs, Windows DLLs, NTDLL.DLL
  Kernel mode:
    - System Threads, System Service Dispatcher (kernel mode callable interfaces)
    - I/O Mgr, File System Cache, Object Mgr., Plug and Play Mgr., Power Mgr., Security Reference Monitor, Virtual Memory, Processes & Threads, Configuration Mgr (registry), Local Procedure Call
    - Windows USER, GDI; Graphics Drivers
    - Device & File Sys. Drivers
    - Kernel
    - Hardware Abstraction Layer (HAL)
  Hardware interfaces (buses, I/O devices, interrupts, interval timers, DMA, memory cache control, etc.)
Kernel-mode Architecture of Windows
[Figure: layered diagram, top to bottom]
  user mode:
    - NT API stubs (wrap sysenter) -- system library (ntdll.dll)
  kernel mode:
    - NTOS kernel layer: Trap/Exception/Interrupt Dispatch; CPU mgmt: scheduling, synchr, ISRs/DPCs/APCs
    - NTOS executive layer: Caching Mgr, Security, Procs/Threads, Virtual Memory, IPC, glue, I/O, Object Mgr, Registry
    - Drivers: Devices, Filters, Volumes, Networking, Graphics
    - Hardware Abstraction Layer (HAL): BIOS/chipset details
  firmware/hardware:
    - CPU, MMU, APIC, BIOS/ACPI, memory, devices
Kernel/Executive layers
  - Kernel layer (ntos/ke, ~5% of NTOS source)
      - Abstracts the CPU
          - Threads, Asynchronous Procedure Calls (APCs)
          - Interrupt Service Routines (ISRs)
          - Deferred Procedure Calls (DPCs, aka Software Interrupts)
      - Provides low-level synchronization
  - Executive layer
      - OS services running in a multithreaded environment
      - Full virtual memory, heap, handles
  - Extensions to NTOS: drivers, file systems, network, …
NT (Native) API examples
  NtCreateProcess(&ProcHandle, Access, SectionHandle, DebugPort, ExceptionPort, …)
  NtCreateThread(&ThreadHandle, ProcHandle, Access, ThreadContext, bCreateSuspended, …)
  NtAllocateVirtualMemory(ProcHandle, Addr, Size, Type, Protection, …)
  NtMapViewOfSection(SectionHandle, ProcHandle, Addr, Size, Protection, …)
  NtReadVirtualMemory(ProcHandle, Addr, Size, …)
  NtDuplicateObject(srcProcHandle, srcObjHandle, dstProcHandle, dstHandle, Access, Attributes, Options)
Windows Vista Kernel Changes
  - Kernel changes mostly minor improvements
      - Algorithms, scalability, code maintainability
  - CPU timing: uses Time Stamp Counter (TSC)
      - Interrupts not charged to threads
      - Timing and quanta are more accurate
  - Communication
      - ALPC: Advanced Lightweight Procedure Calls
      - Kernel-mode RPC
      - New TCP/IP stack (integrated IPv4 and IPv6)
  - I/O
      - Remove a context switch from I/O Completion Ports
      - I/O cancellation improvements
  - Memory management
      - Address space randomization (DLLs, stacks)
      - Kernel address space dynamically configured
  - Security: BitLocker, DRM, UAC, Integrity Levels
Windows 7 Kernel Changes
  - Miscellaneous kernel changes
      - MinWin
          - Change how Windows is built
          - Lots of DLL refactoring
          - API Sets (virtual DLLs)
      - Working-set management
          - Runaway processes quickly start reusing their own pages
          - Break up kernel working-set into multiple working-sets: system cache, paged pool, pageable system code
      - Security
          - Better UAC, new account types, fewer BitLocker blockers
  - Energy efficiency
      - Trigger-started background services
      - Core Parking
      - Timer-coalescing, tick skipping
  - Major scalability improvements for large server apps
      - Broke apart last two major kernel locks, >64p
  - Kernel support for ConcRT
      - User-Mode Scheduling (UMS)
MinWin
  - MinWin is a first step at creating architectural partitions
      - Can be built, booted and tested separately from the rest of the system
      - Higher layers can evolve independently
      - An engineering process improvement, not a microkernel NT!
  - MinWin was defined as the set of components required to boot and access the network
      - Kernel, file system driver, TCP/IP stack, device drivers, services
      - No servicing, WMI, graphics, audio or shell, etc.
  - MinWin footprint: 150 binaries, 25MB on disk, 40MB in-memory
MinWin Layering
[Figure: two-layer diagram]
  - Upper layers: Shell, Graphics, Multimedia, Layered Services, Applets, etc.
  - MinWin: Kernel, HAL, TCP/IP, File Systems, Drivers, Core System Services
Timer Coalescing
  - Secret of energy efficiency: go idle and stay idle
  - Staying idle requires minimizing timer interrupts
  - Before, periodic timers had independent cycles even when the period was the same
  - New timer APIs permit timer coalescing
      - Application or driver specifies tolerable delay
      - Timer system shifts timer firing
  (MarkRuss)
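The effect of a tolerable delay can be illustrated with a toy model. This is a sketch only, not the Windows timer implementation: the names `coalesced_fire_time`, `count_wakeups`, and the fixed alignment `grid` are assumptions made for illustration. Each timer is an `(offset, period, tolerance)` tuple in milliseconds, and a due time is pushed forward to the next grid boundary only if that stays within the timer's tolerance.

```python
def coalesced_fire_time(due, tolerance, grid):
    """Shift a due time to the next multiple of `grid` if within tolerance."""
    aligned = -(-due // grid) * grid  # round up to the next grid multiple
    return aligned if aligned - due <= tolerance else due


def count_wakeups(timers, horizon, grid=None):
    """Count distinct instants at which the CPU must wake up.

    timers  -- list of (offset, period, tolerance) tuples, all in ms
    horizon -- only due times <= horizon are considered
    grid    -- hypothetical coalescing interval; None disables coalescing
    """
    instants = set()
    for offset, period, tolerance in timers:
        for due in range(offset, horizon + 1, period):
            if grid is None:
                instants.add(due)
            else:
                instants.add(coalesced_fire_time(due, tolerance, grid))
    return len(instants)


# Three 100 ms timers with different phases that tolerate up to 100 ms of
# delay, plus one 500 ms timer that tolerates no delay at all.
timers = [(10, 100, 100), (35, 100, 100), (70, 100, 100), (5, 500, 0)]
print(count_wakeups(timers, 1000))            # 32 separate wake-ups
print(count_wakeups(timers, 1000, grid=100))  # 12 wake-ups after coalescing
```

The three tolerant timers collapse onto shared firing instants, while the zero-tolerance timer keeps its exact schedule. In Win32, a tolerable delay of this kind is passed through calls such as `SetWaitableTimerEx`.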
Broke apart the Dispatcher Lock
  - Scheduler dispatcher lock hottest on server workloads
      - Lock protects all thread state changes (wait, unwait)
      - Very lock at >64x
  - Dispatcher lock broken up in Windows 7 / Server 2008 R2
      - Each object protected by its own lock
      - Many operations are lock-free
Removed PFN Lock
  - Windows tracks the state of pages in physical memory
      - In use: in working sets
      - Not assigned: on paging lists (free, modified, standby, …)
  - Before, all page state changes were protected by a global PFN (Physical Frame Number) lock
  - As of Windows 7 the PFN lock is gone
      - Pages are now locked individually
      - Improves scalability for large memory applications
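The move from one global lock to per-item locks (per page here, per object for the dispatcher lock) can be sketched as follows. This is an illustration of the locking pattern only, not the kernel's PFN database: the `Page` class, the state names, and the `run` helper are hypothetical.

```python
import threading

# Hypothetical page states, loosely following the lists named on the slide.
STATES = ("free", "standby", "modified", "active")


class Page:
    """A page whose state is guarded by its own lock, not a global one."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = "free"
        self.transitions = 0

    def set_state(self, new_state):
        # Only this page is locked; other pages can change state concurrently.
        with self.lock:
            self.state = new_state
            self.transitions += 1


def worker(pages, rounds):
    for i in range(rounds):
        for page in pages:
            page.set_state(STATES[i % len(STATES)])


def run(num_pages=8, num_threads=4, rounds=100):
    pages = [Page() for _ in range(num_pages)]
    threads = [threading.Thread(target=worker, args=(pages, rounds))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pages
```

With a single global lock, every `set_state` call from every thread would serialize. With one lock per page, threads contend only when they touch the same page at the same time.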
The Silicon Power Wall
  - The situation:
      - Power² ∝ Clock frequency
      - Voltage ∝ Power²
      - Clock frequency and Voltage offset each other
      - Clock frequency inversely proportional to logic path length
  - Bad News:
      - Power is about as low as it can go
      - Logic paths between clocked elements are pretty short
  - Good News:
      - Moore’s Law continues (# transistors doubles ~22 months)
      - All that parallel computational theory is going into practice
  - Transistors going into more cores, not faster cores!
  - Software subject to Amdahl’s Law, not Moore’s Law (or Gustafson’s Law, if my wife can find large enough datasets she cares about)
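The two laws named on this slide can be compared numerically. The sketch below uses the standard textbook formulas; the function and parameter names are chosen here for illustration. Amdahl's Law assumes a fixed problem size, so the serial fraction caps the speedup. Gustafson's Law assumes the problem grows with the number of cores, which is why it depends on having large enough datasets.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup for a fixed-size problem when only part of it parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gustafson_speedup(parallel_fraction, cores):
    """Scaled speedup when the parallel part of the work grows with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * cores


for cores in (2, 8, 64, 1024):
    print(cores,
          round(amdahl_speedup(0.9, cores), 2),
          round(gustafson_speedup(0.9, cores), 2))
```

With 90% of the work parallel, the Amdahl speedup can never exceed 10 no matter how many cores are added, while the Gustafson speedup keeps growing roughly linearly with the core count.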
Approaches to HW parallelism
  - Homogeneous
      - More big superscalar cores
          - Extend with private (or shared) SIMD engines (SSE on steroids)
          - (Maybe) not very energy efficient
      - A few more big cores and lots of smaller, slower, cooler cores
          - Use SIMD for performance
          - Shut off idle small cores for energy efficiency (but leakage?)
      - Lots of little fully programmable cores, all the same
          - Nobody has ever gotten this to work (more on this later)
  - Heterogeneous
      - Programmable accelerators (e.g. GPUs)
          - Attach loosely-coupled, specialized (non-x86), energy-efficient cores
      - Fixed-function accelerators
          - Very energy-efficient, device-like computational units for very specific tasks
User Mode Scheduling (UMS)
  - Improve support for efficient cooperative multithreaded scheduling of small tasks (over-decomposition)
      - Want to schedule tasks in user-mode
      - Use NT threads to simulate CPUs, multiplex tasks onto these threads
      - When a task calls into the kernel and blocks, the CPU may get scheduled to a different app
      - If a single NT thread per CPU, when it blocks it blocks
      - Could have extra threads, but then kernel and user-mode are competing to schedule the CPU
  - Tasks run arbitrary Win32 code (but only x64/IA64)
      - Assumes running on an NT thread (TEB, kernel thread)
  - Used by ConcRT (Visual Studio 2010’s Concurrency Run-Time)
Windows 7 User-Mode Scheduling
  - UMS breaks an NT thread into two parts:
      - UT: user-mode portion (TEB, ustack, registers)
      - KT: kernel-mode portion (ETHREAD, kstack, registers)
  - Three key properties:
      - User-mode scheduler switches UTs w/o ring crossing
      - KT switch is lazy: at kernel entry (e.g. syscall, pagefault)
      - CPU returned to user-mode scheduler when KT blocks
  - KT “returns” to user-mode by queuing completion
      - User-mode scheduler schedules corresponding UT
  (similar to scheduler activations, etc.)
Normal NT Threading
[Figure: on an x86 core, UT 0, UT 1, UT 2 in user mode sit above KT 0, KT 1, KT 2 in kernel mode; trap code, NTOS executive and the kernel-mode scheduler are in the kernel]
  - NT Thread is Kernel Thread (KT) and User Thread (UT)
  - UT/KT form a single logical thread representing the NT thread in user or kernel
      - KT: ETHREAD, KSTACK, link to EPROCESS
      - UT: TEB, USTACK
User-Mode Scheduling (UMS)
[Figure: in user mode, the primary thread runs the user-mode scheduler over UT 0 and UT 1 and reads the UT completion list; in kernel mode, KT 0, KT 1, KT 2 sit in thread parking below trap code and the NTOS executive; KT 0 blocks]
  - Only primary thread runs in user-mode
  - Trap code switches to parked KT
  - KT blocks -> primary returns to user-mode
  - KT unblocks & parks -> queue UT completion
UMS
  - Based on NT threads
      - Each NT thread has user & kernel parts (UT & KT)
      - When a thread becomes UMS, KT never returns to UT (well, sort of)
      - Instead, the primary thread calls the USched
  - USched
      - Switches between UTs, all in user-mode
      - When a UT enters kernel and blocks, the primary thread will hand CPU back to the USched declaring UT blocked
      - When UT unblocks, kernel queues notification
      - USched consumes notifications, marks UT runnable
  - Primary Thread
      - Self-identified by entering kernel with wrong TEB
      - So UTs can migrate between threads
      - Affinities of primaries and KTs are orthogonal issues
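The control flow described above can be modelled in a few lines. This is a toy simulation under stated assumptions, not the Windows UMS API: UTs are Python generators, a single loop plays the primary thread running the USched, and "the kernel" is a list of countdowns. The names `user_thread` and `run_usched` are hypothetical. A UT yields `0` to give up the CPU cooperatively, or a positive number of ticks to model a blocking system call.

```python
from collections import deque


def user_thread(name, steps, log):
    """A UT: each step is 0 (compute, then yield) or n > 0 (block n ticks)."""
    for step in steps:
        log.append(f"{name} {'computes' if step == 0 else 'blocks'}")
        yield step
    log.append(f"{name} done")


def run_usched(uts):
    """Primary-thread loop: run ready UTs, never wait on a blocked one."""
    ready = deque(uts)
    blocked = []               # [remaining_ticks, ut] pairs held "in the kernel"
    completion_list = deque()  # notifications queued by "the kernel"
    idle_ticks = 0
    while ready or blocked or completion_list:
        # USched consumes notifications and marks those UTs runnable.
        while completion_list:
            ready.append(completion_list.popleft())
        if ready:
            ut = ready.popleft()
            request = next(ut, None)  # None means the UT has finished
            if request is None:
                pass
            elif request > 0:
                blocked.append([request, ut])  # UT blocked; CPU stays here
            else:
                ready.append(ut)
        else:
            idle_ticks += 1
        # One tick of kernel time: unblocked UTs go onto the completion list.
        still_blocked = []
        for entry in blocked:
            entry[0] -= 1
            if entry[0] <= 0:
                completion_list.append(entry[1])
            else:
                still_blocked.append(entry)
        blocked = still_blocked
    return idle_ticks


log = []
idle = run_usched([user_thread("A", [2, 0], log),
                   user_thread("B", [0, 0], log)])
print(log, idle)
```

When A blocks, the loop immediately runs B instead of stalling. A becomes runnable again only after its completion has been queued and consumed, which mirrors the notification path on the slide.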
UMS Thread Roles
  - Primary threads: represent CPUs. Normal app threads enter the USched world and become primaries; primaries can also be created by UScheds to allow parallel execution.
      - Primaries represent concurrent execution
  - UMS threads (UT/KTs): allow blocking in the kernel without losing the CPU
      - UMS threads represent concurrent blocking in kernel
Thread Scheduling vs UMS
[Figure: two panels, each with Core 1 and Core 2. Thread scheduling: Threads 1-6, with non-running threads queued. UMS: User Threads 1-6 paired with Kernel Threads 1-6.]
  (MarkRuss)
Win32 compat considerations
  - Why not Win32 fibers?
  - TEB issues
      - Contains TLS and Win32-specific fields (incl. LastError)
      - Fibers run on multiple threads, so TEB state doesn’t track
  - Kernel thread issues
      - Visibility to TEB
      - I/O is queued to thread
      - Mutexes record thread owner
      - Impersonation
      - Cross-thread operations expect to find threads and IDs
      - Win32 code has thread and affinity awareness
Futures: Master/Slave UMS?
[Figure: UT 0, UT 1, UT 2 run under a remote scheduler on a remote x86 or accelerator; a syscall request queue and a syscall completion queue connect them to the x86 core, where KT 0, KT 1, KT 2 sit in thread parking with trap code, NTOS executive and the kernel-mode scheduler]
  - UTs (can) run on accelerators or x86s
  - KTs run on x86s, syscalls remoted/batched
  - Pagefaults are just like syscalls
  - Accelerator never “loses the CPU” (implicit primary)
Operating Systems Futures
  - Many-core challenge
      - New driving force in software innovation: Amdahl’s Law overtakes Moore’s Law as high-order bit
      - Heterogeneous cores?
  - OS scalability
      - Loosely-coupled OS: mem + cpu + services?
  - Energy efficiency
      - Shrink-wrap and freeze-dry applications?
  - Hypervisor/Kernel/Runtime relationships
      - Move kernel scheduling (cpu/memory) into run-times?
      - Move kernel resource management into Hypervisor?
Windows Academic Program
  - Windows Kernel Internals
      - Windows kernel in source (Windows Research Kernel, WRK)
      - Windows kernel in PowerPoint (Curriculum Resource Kit, CRK)
  - Based on Windows Server 2003 Service Pack 1
      - Latest kernel at time of release
      - First kernel release with AMD64 support
  - Joint program between Windows Product Group and MS Academic Groups
      - Program directed by Arkady Retik (Need a DVD? Have questions?)
      - Information available at http://microsoft.com/WindowsAcademic or [email_address]
  - Microsoft Academic Contacts in Buenos Aires
      - Miguel Saez ( [email_address] ) or Ezequiel Glinsky ( [email_address] )
Thank you very much

Evolution of the Windows Kernel Architecture, by Dave Probert
