2. Outline
1. Introduction to C++AMP
2. Introduction to Tiling
3. tile_static
4. barrier.wait and solutions
a. C++11 thread
b. setjmp/longjmp
c. ucontext
11. Goal
● Implement all C++AMP functions on the CPU
instead of the GPU, without any compiler
modification.
12. tile_static
● C++ syntax limits the implementation to one of
the following choices
○ const, volatile
○ __attribute__(...)
○ static
● Choose static
○ static memory can be shared among all the threads
○ side effect: at most one thread group can
execute at a time.
#define tile_static static
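The effect of the macro can be seen in a minimal sketch. The `kernel` function, its parameters, and the array size below are hypothetical, chosen only for illustration: every invocation sees the same `tile_static` array, which is why the storage is shared by all threads, and also why two thread groups cannot use it at the same time.

```cpp
#include <cassert>

#define tile_static static

// Hypothetical kernel body: each call stands for one thread of a group.
// Because tile_static expands to static, `shared` is one zero-initialized
// array that persists across all calls.
int kernel(int thread_id, int group_size) {
    tile_static int shared[8];
    assert(group_size <= 8);
    shared[thread_id] = thread_id + 1;   // publish this thread's value
    int sum = 0;
    for (int i = 0; i < group_size; ++i)
        sum += shared[i];                // values written by earlier calls are visible
    return sum;
}
```

Calling `kernel` for thread ids 0, 1, 2, 3 in that order returns 1, 3, 6, 10, since each call also sees what the previous calls stored.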
13. Barrier.wait
● Threads in the same thread group block at the
point where “wait” is called until every thread
in the group has reached it.
● Program can
a. perform real barrier action
b. jump out of current execution context
15. C++11 thread
● Launch hundreds of threads at a time.
● Implement a custom barrier using the C++11
mutex library.
→ Extremely slow.
→ The data in static memory will be corrupted.
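A barrier of this kind might look like the sketch below. It is not the code from the talk: the `Barrier` class, the `run_group` helper, and the use of `std::condition_variable` alongside `std::mutex` are assumptions made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier built from C++11 primitives.
class Barrier {
public:
    explicit Barrier(int count) : threshold_(count), waiting_(0), generation_(0) {
        assert(count > 0);
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned gen = generation_;
        if (++waiting_ == threshold_) {
            waiting_ = 0;          // last arrival: reset for the next round
            ++generation_;
            cv_.notify_all();      // release everyone blocked in this round
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int threshold_;
    int waiting_;
    unsigned generation_;
};

// One std::thread per "GPU thread": write a slot, wait, read the neighbour's slot.
std::vector<int> run_group(int n) {
    Barrier barrier(n);
    std::vector<int> in(n), out(n);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&, i] {
            in[i] = i * 10;
            barrier.wait();                // all writes finish before any read
            out[i] = in[(i + 1) % n];
        });
    }
    for (auto& t : threads) t.join();
    return out;
}
```

Every call to `wait` takes the same lock and may put the thread to sleep, which is consistent with the slowdown noted above once hundreds of threads are involved.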
16. setjmp/longjmp
● int setjmp(jmp_buf env);
○ setjmp() saves the stack context/environment in env
for later use by longjmp.
○ The stack context will be invalidated if the function
which called setjmp() returns.
● void longjmp(jmp_buf env, int val);
○ longjmp() restores the environment saved by the last
call of setjmp.
17. setjmp/longjmp example
#include <stdio.h>
#include <setjmp.h>

jmp_buf buf;

void wait(void) {
    printf("wait\n");      // prints
    longjmp(buf, 1);       // jump back to the setjmp call in main
}

void first(void) {
    wait();
    printf("first\n");     // does not print
}

int main() {
    if (!setjmp(buf))
        first();           // on the direct call, setjmp returns 0
    else                   // when longjmp jumps back, setjmp returns 1
        printf("main\n");  // prints
    return 0;
}
25. Problems
● Cannot return
○ return address in the stack is destroyed
● Cannot use too many static variables
○ spilled registers will be lost
→ can be solved by using “alloca”
http://www.codemud.net/~thinker/GinGin_CGI.py/show_id_doc/489
27. ucontext_t
typedef struct ucontext {
    struct ucontext *uc_link;
    sigset_t         uc_sigmask;
    stack_t          uc_stack;
    mcontext_t       uc_mcontext;
    ...
} ucontext_t;
● uc_link
○ points to the context that will be resumed when the current context
terminates
● uc_stack
○ the stack used by this context
● uc_mcontext
○ machine-specific representation of the saved context, that includes the
calling thread's machine registers
28. Functions
● int getcontext(ucontext_t *ucp);
○ initializes the structure pointed to by ucp to the
currently active context.
● int setcontext(const ucontext_t *ucp);
○ restores the user context pointed at by ucp
● int swapcontext(ucontext_t *oucp, const
ucontext_t *ucp);
○ saves the current context in the structure pointed to
by oucp, and then activates the context pointed to by
ucp.
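As a minimal sketch of how getcontext and setcontext interact (assuming a glibc/Linux system; the function name `count_resumes` is hypothetical), the code below saves a context once and then jumps back to it repeatedly. The counter is volatile so that it lives in memory and keeps its value when the saved registers are restored.

```cpp
#include <ucontext.h>

// Returns how many times execution passed the point saved by getcontext.
int count_resumes(int limit) {
    ucontext_t ctx;
    volatile int passes = 0;   // kept in memory, so it survives setcontext

    getcontext(&ctx);          // execution resumes here after every setcontext
    passes = passes + 1;
    if (passes < limit)
        setcontext(&ctx);      // does not return on success; jumps back above

    return passes;
}
```

With `limit` set to 3, the body after getcontext runs three times before the function returns.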
29. makecontext
● void makecontext(ucontext_t *ucp, void
(*func)(), int argc, ...);
○ On x86_64, glibc places the arguments in registers,
as the AMD64 ABI specifies, instead of pushing them
on the stack
○ Each argument passed to makecontext should be
no smaller than sizeof(register)
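One way these calls could be combined to emulate barrier.wait on a single OS thread is sketched below, assuming glibc on Linux. All names (`kernel`, `barrier_wait`, `run_tile`, the stack size, the trace string) are hypothetical and not taken from the talk. Each simulated thread gets its own context and stack; the barrier simply switches back to a scheduler, which resumes the other threads before coming back.

```cpp
#include <ucontext.h>
#include <cstddef>
#include <string>
#include <vector>

constexpr int kThreads = 3;
constexpr std::size_t kStackSize = 64 * 1024;

ucontext_t scheduler_ctx;
ucontext_t worker_ctx[kThreads];
bool finished[kThreads];
int current = 0;          // index of the simulated thread that is running
std::string trace;        // records execution order for inspection

// "Barrier": leave the current context and return to the scheduler.
void barrier_wait() {
    swapcontext(&worker_ctx[current], &scheduler_ctx);
}

// Hypothetical kernel with one barrier between two phases.
void kernel(int id) {
    trace += "a" + std::to_string(id) + " ";
    barrier_wait();
    trace += "b" + std::to_string(id) + " ";
    finished[id] = true;  // on return, uc_link resumes the scheduler
}

std::string run_tile() {
    std::vector<std::vector<char>> stacks(kThreads, std::vector<char>(kStackSize));
    trace.clear();
    for (int i = 0; i < kThreads; ++i) {
        finished[i] = false;
        getcontext(&worker_ctx[i]);
        worker_ctx[i].uc_stack.ss_sp = stacks[i].data();
        worker_ctx[i].uc_stack.ss_size = kStackSize;
        worker_ctx[i].uc_link = &scheduler_ctx;
        makecontext(&worker_ctx[i], reinterpret_cast<void (*)()>(kernel), 1, i);
    }
    bool progressed = true;
    while (progressed) {  // round-robin until every simulated thread has returned
        progressed = false;
        for (current = 0; current < kThreads; ++current) {
            if (finished[current]) continue;
            swapcontext(&scheduler_ctx, &worker_ctx[current]);
            progressed = true;
        }
    }
    return trace;
}
```

In this sketch every simulated thread finishes its first phase before any of them starts the second, which is the ordering a barrier is meant to guarantee.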
33. Problems
1. How to pass a lambda?
○ makecontext(&ctx,
(void (*)(void))&Kernel::operator(), …);
2. How to pass non-int arguments?
○ What if sizeof(Type) > sizeof(int)?
○ What about complex structures and classes?
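One commonly used workaround for both questions, shown here only as a sketch and not as the solution from the talk, is to pass a pointer to a callable through makecontext as two int-sized halves and rebuild it in a plain trampoline function. The names `trampoline` and `run_on_context` are hypothetical, and the code assumes glibc on Linux.

```cpp
#include <ucontext.h>
#include <cstdint>
#include <functional>
#include <vector>

// Plain function with int-sized parameters, so it is a valid makecontext entry point.
void trampoline(unsigned lo, unsigned hi) {
    const std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    auto* fn = reinterpret_cast<std::function<void()>*>(
        static_cast<std::uintptr_t>(bits));
    (*fn)();   // the callable carries any captured state, whatever its size
}

// Runs `fn` on a separate context and returns once it has finished.
void run_on_context(std::function<void()> fn) {
    std::vector<char> stack(64 * 1024);
    ucontext_t caller, callee;

    getcontext(&callee);
    callee.uc_stack.ss_sp = stack.data();
    callee.uc_stack.ss_size = stack.size();
    callee.uc_link = &caller;            // resume the caller when fn returns

    // Split the pointer into two 32-bit halves.
    const std::uint64_t bits = reinterpret_cast<std::uintptr_t>(&fn);
    makecontext(&callee, reinterpret_cast<void (*)()>(trampoline), 2,
                static_cast<unsigned>(bits & 0xffffffffu),
                static_cast<unsigned>(bits >> 32));

    swapcontext(&caller, &callee);
}
```

A lambda can then be run with `run_on_context([&] { ... });`, and structures or class objects travel as captures instead of makecontext arguments.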