Quickthreads for GNU-2013


Thread management library

Published in: Technology, Education

  1. 1. Quickthreads FROM STRANGER TO MOTHER
  2. 2. About me Name: Frankie Onuonga Contributions: Fedora and openSUSE IRC: frankieonuonga Email: onuonga@gnu.org
  3. 3. Introduction ● Programming is a tricky art of trading flexibility against overhead ● Constructs often trade off portability against overhead ● We therefore aim to program in a way that achieves high portability, flexibility and performance together
  4. 4. Techniques ● There are two main ways to build thread packages: – Build the package around a thread abstract data type that has a machine-independent interface and encapsulates only the most machine-dependent operations: initialization and execution – Expose portable details of the thread package implementation so that higher-level thread operations can be optimized
  5. 5. Let's dive and swim ● Let's go deep into the two approaches and introduce Quickthreads along the way.
  6. 6. Basics ● Quickthreads is not a standalone threads package ● It is a threads package core that is used to build non-preemptive user-space threads packages ● It provides an interface to machine-dependent code that performs initialization and context switches ● It lets the client do the rest (implement synchronization and scheduling)
  7. 7. Goals ● Make it easy to write and port thread packages by providing machine-dependent code for creating, running and stopping threads; the client only has to reconfigure and run it on a different architecture, which allows high portability ● Make thread packages with replaceable components and with performance close to that of hand-coded packages with fixed policies
  8. 8. ● Flexibility can lead to slowness, but because Quickthreads is minimalist its performance is often close to that of hand-coded operations ● The benefits show on machines with many registers, where saving and restoring registers is the dominating cost ● In fact, when experimenting with two thread packages no change was noticed compared with the norm, which is a good start.
  9. 9. Design Decisions ● The trick is to push synchronization to the clients so that the thread core only handles context switching: "Oh no, we have given up some performance for flexibility" ● "Let us see if we can get it back"
  10. 10. Synchronization ● Quickthreads avoids this by pushing all of the synchronization functions to the client ● The goal is to ensure that the thread core neither prohibits nor requires a lock on a context switch ● If this existed then we would have race conditions, which are bad.
  11. 11. Scheduler Threads ● Each blocking operation returns control to a central scheduler thread, of which there is one per processor ● The scheduler thread enqueues a blocked thread only once it is completely blocked and its stack is frozen, which ensures there are no races ● The scheduler thread is never queued with other threads, so there are no races during context switches to and from scheduler threads
  12. 12. Disadvantages with scheduler threads ● Each blocking operation requires a context switch to the scheduler and then a second context switch to the next thread ● On machines without per-processor private memory it is difficult to locate per-processor schedulers cheaply, and shared schedulers must be protected with locks
  13. 13. Trick to locking ● We can avoid the extra context switch to the scheduler if the blocking thread locks the queue until the stack being used is no longer in use ● The unlock can be performed in the middle of a context switch, which requires either hand coding or a carefully provided out-of-line call (which has the problem of longer lock times) ● Relying on separate lock and unlock functions removes atomicity, which is bad practice in this case.
  14. 14. Register States ● Stack sharing problems can also be avoided if all transitional state can live in machine registers ● This means that processor A can run the old thread even while processor B, which blocked it, is starting a new runnable thread
  15. 15. Problem ● Space is limited to the machine registers, which limits the kind of operations that can be performed, e.g. a stackless thread cannot perform procedure calls, and some architectures and operating systems do not allow a thread to be temporarily stackless.
  16. 16. Preswitch technique ● Another method is to block the old thread, switch to the new thread and then run some code on the new thread's stack on behalf of the old thread ● This is good but also has some limitations: – Operations must be transparent – The new thread must have a stack, so lazy allocation is not possible here
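
The preswitch idea can be sketched without Quickthreads itself. The example below is only an illustration under stated assumptions: it uses the POSIX `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`) as a stand-in for the real context switch primitive, and every other name (`preswitch`, `run_pending_helper`, `enqueue_old`, `ready_slot`, `trace`) is hypothetical. It shows the ordering only: the helper that enqueues the old thread runs on the new thread's stack, after the old thread's state has been saved.

```c
#include <stddef.h>
#include <ucontext.h>

/* Hypothetical helper type: runs on the NEW thread's stack for the OLD thread. */
typedef void (*helper_fn)(ucontext_t *old, void *arg);

static helper_fn pending_helper;   /* helper left behind by the switching thread */
static ucontext_t *pending_old;    /* the thread that just blocked */
static void *pending_arg;

/* Run the helper (if any) that the previous thread asked for. */
static void run_pending_helper(void) {
    if (pending_helper != NULL) {
        helper_fn h = pending_helper;
        pending_helper = NULL;
        h(pending_old, pending_arg);
    }
}

/* Block `self`, switch to `next`, and let `next` run `h` on our behalf. */
static void preswitch(ucontext_t *self, ucontext_t *next, helper_fn h, void *arg) {
    pending_helper = h;
    pending_old = self;
    pending_arg = arg;
    swapcontext(self, next);
    /* Resumed later: we are now the "new" thread for whoever switched to us. */
    run_pending_helper();
}

/* ---- tiny demo: a main context and one worker context ---- */
static ucontext_t main_ctx, worker_ctx;
static ucontext_t *ready_slot;          /* one-element stand-in for a run queue */
static int trace[4];
static int trace_len;
static char worker_stack[64 * 1024];

/* Helper: the old thread is fully saved by now, so it is safe to enqueue it. */
static void enqueue_old(ucontext_t *old, void *arg) {
    (void)arg;
    ready_slot = old;
    trace[trace_len++] = 1;             /* 1 = helper ran */
}

static void worker(void) {
    run_pending_helper();               /* a fresh thread finishes the switch here */
    trace[trace_len++] = 2;             /* 2 = worker body ran */
    preswitch(&worker_ctx, ready_slot, NULL, NULL);   /* hand control back */
}

int preswitch_demo(void) {
    trace_len = 0;
    ready_slot = NULL;
    if (getcontext(&worker_ctx) != 0)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);
    preswitch(&main_ctx, &worker_ctx, enqueue_old, NULL);
    trace[trace_len++] = 3;             /* 3 = main resumed */
    return trace_len;
}
```

Calling `preswitch_demo()` records the helper first, then the worker body, then the resumed main context. Note that the new thread needs a stack before the switch, which is exactly the "no lazy allocation" limitation listed above.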
  17. 17. Stateless Schedulers ● Another technique is to create "lightweight" scheduler threads that consist of stack space but no initialized state ● A context switch saves the old thread and then switches to the scheduler stack, but no scheduler state is stored ● The scheduler thread is used as a place to run a function on behalf of the thread that just got blocked ● It follows that when a new thread is started no scheduler state is saved ● This is faster than using "heavy" schedulers because no scheduler state is saved or restored.
  18. 18. Slight Challenge ● It may be hard to locate schedulers cheaply on machines without per-processor private memory ● Lightweight schedulers are slightly slower, but they have the advantage that they can perform function calls ● A stateless scheduler is similar to storing all transitional state in registers in so far as scheduler state exists only during the context switch
  19. 19. Our choice ● Preswitch: it is designed to emulate all of the other models except storing transitional state in registers ● Remember that locking is not part of a Quickthreads operation during a context switch, so threads must perform locking at the end of the context switch
  20. 20. ● We have also seen that hand coding may improve performance, but as previously discussed it is not taken as an option because: – Avoiding embedded synchronization improves portability – It keeps the programming model simple – Synchronization is not always needed – It would be harder to perform synchronization inline in Quickthreads than in hand-coded systems – Besides, an out-of-line call would make the switches slower – Thus we perform switches without any locking.
  21. 21. Flexibility and simplicity ● These can usually be achieved in a number of ways: – Using powerful operations – Using customizable operations – Stripping away operations that limit flexibility
  22. 22. Disadvantages ● Customizable operations are often slow, and most existing languages do not provide fine-grained customization ● Stripping removes operations that are inflexible but serve a useful purpose, forcing the client to re-implement that functionality somehow ● In our design we strip everything, leaving the essentials: initialization and context switch.
  23. 23. Key decisions ● The operations (initialize, start and stop) are simple enough that their cost is close to that of fixed alternatives ● Quickthreads does not depend on other routines: the client does not need to worry about it introducing spurious races or deadlock ● The client handles storage, which is an advantage because the client chooses what is best for itself and its users ● Quickthreads implements no scheduling mechanism. This is left to the client, which chooses what suits it best ● It also lacks semaphores, monitors, non-blocking I/O and so on, so that clients that don't use them do not pay the price for those that do
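
Since scheduling and storage are left to the client, a client package has to bring its own run queue. The following is a minimal sketch of what such client-side code might look like, assuming a simple FIFO (round-robin) policy; the types and functions (`client_thread`, `run_queue`, `rq_put`, `rq_get`) are hypothetical and not part of Quickthreads.

```c
#include <stddef.h>

/* Hypothetical client-side thread record; a threads core would only care about `sp`. */
struct client_thread {
    void *sp;                     /* saved stack pointer, opaque to the queue */
    int id;                       /* client bookkeeping */
    struct client_thread *next;   /* link owned by the client's run queue */
};

/* FIFO run queue chosen by the client; any other policy could be swapped in. */
struct run_queue {
    struct client_thread *head;
    struct client_thread *tail;
};

static void rq_init(struct run_queue *q) {
    q->head = NULL;
    q->tail = NULL;
}

/* Append a runnable thread at the tail. */
static void rq_put(struct run_queue *q, struct client_thread *t) {
    t->next = NULL;
    if (q->tail != NULL)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}

/* Remove and return the oldest runnable thread, or NULL if the queue is empty. */
static struct client_thread *rq_get(struct run_queue *q) {
    struct client_thread *t = q->head;
    if (t != NULL) {
        q->head = t->next;
        if (q->head == NULL)
            q->tail = NULL;
        t->next = NULL;
    }
    return t;
}
```

Because the queue lives entirely in client code, a client that wants priorities or per-processor queues replaces only these functions, and a client that needs no locking pays for none.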
  24. 24. ● Quickthreads provides basic operations to save and restore thread state ● A major bottleneck compared to other packages is that scheduling and locking are provided by the client and executed via indirect procedure calls ● Quickthreads is designed to minimize procedure calls on each context switch to just two ● Hand-coded thread operations are faster because locking and scheduling policies are fixed, simple and inlined to minimize procedure call overhead and lock holding time
  25. 25. Flexibility and Simplicity ● Flexibility can be achieved in several ways: – Using powerful operations – Using customization operations – Removing operations that limit flexibility
  26. 26. Disadvantages ● Powerful operations are often slower ● Customizable operations are slower, and existing languages do not support fine-grained customization ● Stripping could remove parts of the code that serve a useful purpose
  27. 27. ● Quickthreads strips away all operations except context switching and thread initialization ● A design decision is that within the library the start, stop and initialize operations are cheap enough to be close in cost to fixed alternatives
  28. 28. Variant argument lists ● Thread creation primitives often take as their arguments a function pointer and zero or more arguments to the function ● This is difficult because when a thread function is called some memory has to be set aside to store and look up the parameters ● From the point of view of a thread package it is easier to take a single argument that points to a data structure; that structure can then hold as many values as needed.
  29. 29. ● Even though a varargs interface also accepts a single argument, it sadly makes threads slower to initialize ● It is therefore necessary to provide two interfaces, one for varargs and one fast one ● Using two interfaces makes threads harder to understand
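
The difference between the two interfaces can be illustrated with a small sketch. It does not touch a real stack: a hypothetical `start_record` stands in for the memory a thread package would prepare, `init_single` is the fast single-pointer path, and `init_vargs` is the slower path that has to copy every argument. All names here are assumptions made for the example, and arguments on the varargs path are assumed to be passed as `long`.

```c
#include <stdarg.h>
#include <stddef.h>

#define MAX_VARGS 8   /* hypothetical limit for this sketch */

typedef long (*single_fn)(void *arg);
typedef long (*vargs_fn)(int argc, const long *argv);

/* Stand-in for the state a package prepares so a thread can be started later. */
struct start_record {
    single_fn single;
    void *single_arg;
    vargs_fn multi;
    int argc;
    long argv[MAX_VARGS];
};

/* Fast path: two pointer stores, independent of how much data the thread needs. */
static void init_single(struct start_record *r, single_fn fn, void *arg) {
    r->single = fn;
    r->single_arg = arg;
    r->multi = NULL;
    r->argc = 0;
}

/* Varargs path: each argument is copied, so cost grows with the argument count. */
static int init_vargs(struct start_record *r, vargs_fn fn, int argc, ...) {
    va_list ap;
    if (argc < 0 || argc > MAX_VARGS)
        return -1;
    r->single = NULL;
    r->single_arg = NULL;
    r->multi = fn;
    r->argc = argc;
    va_start(ap, argc);
    for (int i = 0; i < argc; i++)
        r->argv[i] = va_arg(ap, long);
    va_end(ap);
    return 0;
}

/* "Start" the thread function with whatever was recorded. */
static long start(const struct start_record *r) {
    return r->single != NULL ? r->single(r->single_arg)
                             : r->multi(r->argc, r->argv);
}

/* Example thread functions for both styles. */
struct pair { long a, b; };

static long sum_pair(void *arg) {
    const struct pair *p = arg;
    return p->a + p->b;
}

static long sum_all(int argc, const long *argv) {
    long total = 0;
    for (int i = 0; i < argc; i++)
        total += argv[i];
    return total;
}
```

With the single-argument style the caller packs its values into a structure such as `struct pair` and passes one pointer; with the varargs style the caller writes `init_vargs(&r, sum_all, 3, 1L, 2L, 3L)` and the package does the copying.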
  30. 30. The cool beans part ● As noted earlier, Quickthreads does not perform any allocation: it relies on the client thread package to allocate stacks, threads, queues and any auxiliary data structures ● This also means it does not implement semaphores, monitors, etc. ● It performs the context switch and lets the client worry about cleanup and queue allocation
  31. 31. ● A thread may be in various states: uninitialized; initialized and ready to run but not started; running on a processor; blocked and waiting to be awakened; or aborted, in which case it is dead and cannot be awakened ● Initialized threads are started the same way blocked threads are started: when the distinction is unimportant they are both considered runnable ● All thread manipulation is done using a thread's stack pointer
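
The states listed on this slide can be written down as a small state machine. This is a sketch only: the enum names and the exact set of allowed transitions are assumptions read off the slide text, not definitions taken from the library.

```c
#include <stdbool.h>

/* Thread states as described on the slide; names are hypothetical. */
enum thread_state {
    T_UNINITIALIZED,
    T_INITIALIZED,   /* ready to run but not yet started */
    T_RUNNING,
    T_BLOCKED,       /* waiting to be awakened */
    T_ABORTED        /* dead, can never be awakened */
};

/* Initialized and blocked threads are started the same way. */
static bool is_runnable(enum thread_state s) {
    return s == T_INITIALIZED || s == T_BLOCKED;
}

/* Assumed transition rules for this sketch. */
static bool can_transition(enum thread_state from, enum thread_state to) {
    switch (from) {
    case T_UNINITIALIZED:
        return to == T_INITIALIZED;
    case T_INITIALIZED:
    case T_BLOCKED:
        return to == T_RUNNING;
    case T_RUNNING:
        return to == T_BLOCKED || to == T_ABORTED;
    case T_ABORTED:
        return false;
    }
    return false;
}
```

A client scheduler would typically only need `is_runnable`, since it does not have to distinguish a never-started thread from a previously blocked one.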
  32. 32. ● A client creates a thread by allocating a stack region; whether the stack grows up or down depends on the machine ● The client then initializes the thread by passing in the address and size of the stack region and getting back the stack pointer of the uninitialized thread ● The client does this by calling a QT initialization primitive, which initializes the stack with the functions and arguments to be used when the thread is started.
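
To make the "address and size in, stack pointer out" step concrete, here is a hedged sketch of how an initial stack pointer might be derived from a client-allocated region. The function `initial_sp`, the `stack_growth` enum and the 16-byte `STACK_ALIGN` are assumptions for illustration; the real library hides these machine-dependent details behind its own primitive.

```c
#include <stddef.h>
#include <stdint.h>

/* Direction of stack growth is a property of the machine. */
enum stack_growth { GROWS_DOWN, GROWS_UP };

#define STACK_ALIGN 16u   /* hypothetical alignment requirement */

/* Return the starting stack pointer for a region [base, base + size). */
static void *initial_sp(void *base, size_t size, enum stack_growth dir) {
    uintptr_t lo = (uintptr_t)base;
    uintptr_t hi = lo + size;
    uintptr_t mask = ~(uintptr_t)(STACK_ALIGN - 1);

    if (dir == GROWS_DOWN)
        return (void *)(hi & mask);                  /* top of region, rounded down */
    return (void *)((lo + STACK_ALIGN - 1) & mask);  /* bottom of region, rounded up */
}
```

The client owns the memory in both cases; only the choice of which end to start from, and how to align it, depends on the architecture.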
  33. 33. ● As noted, there is no difference between an initialized thread and a suspended thread ● The thread's stack pointer is passed to the context switch primitive along with a helper function and some arguments that assist in cleaning up the old thread once the switch is done ● The helper function is a parameter to the context switch primitive and can therefore be changed dynamically ● Threads are usually kept in a run queue, but here we do something different: we allow the use of arbitrary data structures, so a thread can be run while embedded in such a data structure
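
A sketch of what "embedded in an arbitrary data structure" can mean in practice: below, a waiting thread is stored directly inside a hypothetical `mailbox` rather than in a run queue, and the helper passed at switch time decides where the old thread's stack pointer ends up. `fake_switch` only models the helper call, not a real context switch, and all names are invented for the example.

```c
#include <stddef.h>

/* Minimal handle: all the client needs to resume a thread is its stack pointer. */
struct thread_handle {
    void *sp;                      /* NULL means nobody is waiting */
};

/* Client structure with a thread embedded directly in it. */
struct mailbox {
    int message;
    struct thread_handle waiter;
};

/* Recover the enclosing mailbox from a pointer to its embedded handle. */
static struct mailbox *mailbox_of(struct thread_handle *h) {
    return (struct mailbox *)((char *)h - offsetof(struct mailbox, waiter));
}

/* Hypothetical helper shape: called after the switch with the old thread's sp. */
typedef void (*switch_helper)(void *old_sp, void *arg);

/* Helper 1: park the old thread inside the structure it is waiting on. */
static void park_in_mailbox(void *old_sp, void *arg) {
    struct thread_handle *h = arg;
    h->sp = old_sp;
}

/* Helper 2: the old thread is finished, so nothing is kept. */
static void drop_thread(void *old_sp, void *arg) {
    struct thread_handle *h = arg;
    (void)old_sp;
    h->sp = NULL;
}

/* Stand-in for the switch primitive: only the helper call is modelled here. */
static void fake_switch(switch_helper helper, void *old_sp, void *arg) {
    helper(old_sp, arg);
}

/* Store a message and return the stack pointer of the thread to resume, if any. */
static void *deliver(struct mailbox *m, int message) {
    void *sp = m->waiter.sp;
    m->message = message;
    m->waiter.sp = NULL;
    return sp;
}
```

Because the helper is just a parameter, the same switch call can park a thread in a mailbox in one place and discard a finished thread in another, without the core knowing about either structure.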
  34. 34. Programming interface ● Quickthreads is written in C and assembly language ● It must be bundled with the executable ● Include path options are used to tell the compiler where to locate the header ● The basic interface consists of routines (functions or macros) to create, initialize, run and stop threads