# ocelot

Published in: Technology
### Transcript

• 1. PTXOptimizer @ ocelot, by Sean-Chen, 2011/04/15
• 2. Outline
  • PTXOptimizer
  • SubkernelFormationPass
  • RemoveBarrierPass
  • LinearScanRegisterAllocationPass
• 6. SubkernelFormationPass 2
  • Step 1. Assign the top kernel for all CFGs.
  • [Figure: a Module contains a Kernel with DFG0 and CFG0; the graph between entry and exit is divided into CFG0, CFG1, and CFG2.]
• 7. SubkernelFormationPass 3
  • [Figure: Kernel 0 of the Module becomes Kernel 0 and Kernel 1 (kernel_0, kernel_1); the subkernels are assigned to CFG0, CFG1, and CFG2 and then scheduled.]
• 8. SubkernelFormationPass 1
• Algorithm
  1) Start at a kernel entry point that dominates all remaining blocks.
  2) Create a strongly connected subgraph with N instructions and no barriers.
    a) This is a new kernel.
  3) For all edges leaving the graph:
    a) Save all live registers.
    b) Save the target block's id.
    c) Create a new scheduler block that includes an indirect branch to each of the targets.
    d) Redirect each edge to the kernel exit point.
    e) Create a new kernel rooted in the new scheduler block, then go to 1.
Ref: SubkernelFormationPass.cpp
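The steps above can be sketched as a greedy partition of a toy control-flow graph. This is only an illustration under stated assumptions, not the code in SubkernelFormationPass.cpp: the `Block` struct and `formSubkernels` function are hypothetical, the region is grown as a connected (not necessarily strongly connected) subgraph, and the register-saving and scheduler-block steps (3a to 3c) are not modeled. Blocks that do not fit the instruction budget, or that follow a barrier, become entry points of later subkernels.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a basic block; not Ocelot's own IR class.
struct Block {
    int instructions;             // number of instructions in the block
    bool endsWithBarrier;         // true if the block ends in a barrier
    std::vector<int> successors;  // ids of successor blocks
};

// Assigns every block reachable from block 0 to a subkernel id.
// A subkernel never grows past a barrier; a non-entry block is only
// absorbed while it still fits the remaining instruction budget.
std::vector<int> formSubkernels(const std::vector<Block>& cfg,
                                int maxInstructions) {
    std::vector<int> owner(cfg.size(), -1);  // -1 means "not assigned yet"
    std::queue<int> entries;                 // entry blocks of pending subkernels
    if (!cfg.empty()) entries.push(0);
    int nextId = 0;

    while (!entries.empty()) {
        int entry = entries.front();
        entries.pop();
        if (owner[entry] != -1) continue;  // already absorbed elsewhere

        int id = nextId++;
        int budget = maxInstructions;
        std::queue<int> frontier;
        frontier.push(entry);

        while (!frontier.empty()) {
            int b = frontier.front();
            frontier.pop();
            if (owner[b] != -1) continue;
            if (b != entry && cfg[b].instructions > budget) {
                entries.push(b);  // edge leaves the region: new subkernel entry
                continue;
            }
            owner[b] = id;
            budget -= cfg[b].instructions;
            for (int s : cfg[b].successors) {
                if (owner[s] != -1) continue;
                if (cfg[b].endsWithBarrier) entries.push(s);  // cut at barrier
                else frontier.push(s);
            }
        }
    }
    return owner;
}
```

For a straight-line chain of four blocks where the second block ends in a barrier, the sketch places the first two blocks in subkernel 0 and the last two in subkernel 1, regardless of how large the budget is.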
• 19. SubkernelFormationPass 4
• Sample methods (pseudocode)
  • Create a new kernel
    • Kernel = new kernel();
  • Assign a new CFG to the kernel
    • New_kernel->cfg() = new CFG();
    • Org_Kernel->cfg()->update();
  • Update the PTX graph
    • PTX->cfg()->update()
  • Update the module
    • module->update()
    • Re-schedule()
• 23. SubkernelFormationPass 5
• Why do it?
  • P.S. This is a trade-off between “fork”/“join” and kernel communication.
• 26. RemoveBarrierPass 1
Ref: ocelot-pact.pdf
• 27. RemoveBarrierPass 2
• How to do it?
  • Replace the barrier instruction with a function call.
• Definition
  • The call instruction stores the address of the next instruction, so execution can resume at that point after executing a ret instruction. A call is assumed to be divergent unless the .uni suffix is present, indicating that the call is guaranteed to be non-divergent, meaning that all threads in a warp have identical values for the guard predicate and call target.
Ref: PTX ISA 2.1
• 28. RemoveBarrierPass 3
• Example
  • Different threads in a CTA each execute a = a + 1.
    • a = a + 1;
    • sync(); // @ sync mem reg ....
    • b = b + 1;
  • bar.sync()
• 32. RemoveBarrierPass 3
• Sample methods (pseudocode): load/store to a new memory address
  • Find the branch location and replace it with a function call
    • Brn = Kernel->cfg()->terminator()->Branch();
    • Instruction *IT = new Instruction(IR::FunctionCall);
    • Kernel->cfg()->insert(IT);
    • Kernel->cfg()->remove(Brn);
  • Assign the function call type
    • IT->d() = IR::addressType; // dest register
    • IT->a() = IR::addressType; // source register
    • IT->type() = IR::FunctionCall;
    • IT->Predecessor()->update();
    • IT->Successor()->update();
  • Call back to the original pointer
    • new end pointer = org end pointer
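One way to picture the effect of replacing a barrier with a call and return is a kernel that hands control back to a scheduler at the barrier and is later resumed at a saved point. The sketch below is a toy model under stated assumptions, not Ocelot's implementation: `ThreadContext`, `kernelSegment`, and `runCta` are hypothetical names, threads run one after another on a single host thread, and the second statement reads a neighbour's `a` (a small change from the slide's `b = b + 1`) so that the barrier has a visible effect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread state kept across the "barrier".
struct ThreadContext {
    int resumePoint = 0;  // which code segment the thread runs next
    bool done = false;    // true once the thread has finished the kernel
};

// Kernel body split at the barrier. Instead of blocking, the thread records
// where to resume and returns to the scheduler.
void kernelSegment(ThreadContext& ctx, std::size_t tid,
                   std::vector<int>& a, std::vector<int>& b) {
    switch (ctx.resumePoint) {
    case 0:
        a[tid] = a[tid] + 1;
        ctx.resumePoint = 1;  // former barrier: save resume point, return
        return;
    case 1: {
        std::size_t neighbour = (tid + 1) % a.size();
        b[tid] = b[tid] + a[neighbour];  // needs every a[] update to be done
        ctx.done = true;
        return;
    }
    default:
        return;
    }
}

// Scheduler: runs every live thread up to its next return, round by round,
// so no thread passes the former barrier before all threads have reached it.
void runCta(std::size_t threads, std::vector<int>& a, std::vector<int>& b) {
    std::vector<ThreadContext> ctx(threads);
    bool anyLive = true;
    while (anyLive) {
        anyLive = false;
        for (std::size_t tid = 0; tid < threads; ++tid) {
            if (ctx[tid].done) continue;
            kernelSegment(ctx[tid], tid, a, b);
            anyLive = anyLive || !ctx[tid].done;
        }
    }
}
```

With four threads and zero-initialised arrays, every `b[tid]` ends up as 1, which would not hold if each thread ran straight through both statements before the next thread started.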
• 39. RemoveBarrierPass 4
• Why do it?
  • Reduce the time threads spend waiting at each barrier synchronization check.
• 41. LinearScanRegisterAllocationPass 1
  • [Figure: register %r1]
Ref: ocelot-pact.pdf
• 42. LinearScanRegisterAllocationPass 2
• Sample methods (pseudocode)
  • Based on the SSA graph
  • Find PHI nodes
    • kernel->dfg()->hasPHINode()?
  • Replace all live-in values in each PHI node
    • Foreach (kernel->dfg->PHINode()->aliveIn())...
  • Update the graph
    • kernel->cfg()->update()
    • Predecessor
    • Successor
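The slides outline only the PHI-node handling, so as background here is a minimal sketch of the classic linear scan algorithm (Poletto and Sarkar) that the pass is named after. It is not taken from Ocelot: `Interval`, `Assignment`, and `linearScan` are hypothetical names, live intervals are assumed to be precomputed, and virtual register ids are assumed to be 0..n-1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical live interval of one virtual register (inclusive bounds).
struct Interval { int vreg; int start; int end; };

// Result per virtual register: a physical register or a spill.
struct Assignment { int physical = -1; bool spilled = false; };

// Keeps `active` ordered by increasing interval end.
static void insertByEnd(std::vector<Interval>& active, const Interval& iv) {
    auto pos = std::upper_bound(
        active.begin(), active.end(), iv,
        [](const Interval& x, const Interval& y) { return x.end < y.end; });
    active.insert(pos, iv);
}

std::vector<Assignment> linearScan(std::vector<Interval> intervals,
                                   int numRegisters) {
    std::vector<Assignment> result(intervals.size());  // indexed by vreg id
    std::sort(intervals.begin(), intervals.end(),
              [](const Interval& x, const Interval& y) { return x.start < y.start; });

    std::vector<int> freeRegs;  // pop_back() hands out register 0 first
    for (int r = numRegisters - 1; r >= 0; --r) freeRegs.push_back(r);
    std::vector<Interval> active;  // intervals currently holding a register

    for (const Interval& cur : intervals) {
        // Expire intervals that ended before the current one starts.
        while (!active.empty() && active.front().end < cur.start) {
            freeRegs.push_back(result[active.front().vreg].physical);
            active.erase(active.begin());
        }
        if (!freeRegs.empty()) {
            result[cur.vreg].physical = freeRegs.back();
            freeRegs.pop_back();
            insertByEnd(active, cur);
        } else if (!active.empty() && active.back().end > cur.end) {
            // Spill the active interval that ends last and reuse its register.
            Interval victim = active.back();
            active.pop_back();
            result[cur.vreg].physical = result[victim.vreg].physical;
            result[victim.vreg].physical = -1;
            result[victim.vreg].spilled = true;
            insertByEnd(active, cur);
        } else {
            result[cur.vreg].spilled = true;  // current interval ends last
        }
    }
    return result;
}
```

The spill heuristic shown (evict whichever interval ends furthest in the future) is the one from the original linear scan paper; a real allocator would also insert the spill loads and stores.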
• 45. LinearScanRegisterAllocationPass 3
• Why do it?
  • Replace registers with local shared memory.
  • More parallelism in thread access.
  • More data sharing.
• Definition
  • Predicated execution
    • .reg .pred p, q, r;
• Example
  • if (i < n)
  •     j = j + 1;
  • setp.lt.s32 p, i, n; // compare i to n
  • @!p bra L1; // if false, branch over
  • add.s32 j, j, 1;
  • L1: ...
• Sample methods (pseudocode)
  • Find the branch instruction and the dominators
    • Dom = kernel->dominator_tree();
    • Post = kernel->post_dominator_tree();
    • kernel->terminator()->hasBranch()?
  • Replace the branch with a predicated instruction
    • Instruction *IT = new Instruction(IR::Instruction::Pred);
    • kernel->Instruction->Insert(IT);
    • kernel->Instruction->erase(Bn);
  • Update the graph
    • kernel->cfg()->update();
    • kernel->PTX()->update();
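The branch-to-predicate rewrite can be sketched on a toy instruction list. This is an illustration only, not Ocelot's pass: `Instruction`, `negateGuard`, and `ifConvert` are hypothetical, and the sketch handles just the simplest shape, a guarded `bra` that skips exactly one unguarded, unlabeled instruction. Real if-conversion would use the dominator and post-dominator trees mentioned above to find larger regions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; not Ocelot's PTX instruction class.
struct Instruction {
    std::string label;     // optional label on this instruction, e.g. "L1"
    std::string guard;     // "", "p", or "!p"
    std::string opcode;    // e.g. "bra", "add.s32"
    std::string operands;  // e.g. "j, j, 1" or a branch target
};

// "!p" becomes "p", and "p" becomes "!p".
static std::string negateGuard(const std::string& guard) {
    return guard[0] == '!' ? guard.substr(1) : "!" + guard;
}

// Rewrites "@g bra L; insn; L: ..." into "@(not g) insn; L: ...".
std::vector<Instruction> ifConvert(const std::vector<Instruction>& in) {
    std::vector<Instruction> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const Instruction& br = in[i];
        bool skipsOne = br.opcode == "bra" && !br.guard.empty() &&
                        i + 2 < in.size() &&
                        in[i + 2].label == br.operands &&  // lands right after
                        in[i + 1].guard.empty() &&         // body not yet guarded
                        in[i + 1].label.empty() &&         // nothing jumps into it
                        in[i + 1].opcode != "bra";
        if (!skipsOne) {
            out.push_back(br);
            continue;
        }
        Instruction body = in[i + 1];
        body.guard = negateGuard(br.guard);  // run body when branch not taken
        body.label = br.label;               // keep any label the branch had
        out.push_back(body);
        ++i;                                 // the branch itself is dropped
    }
    return out;
}
```

Applied to the slide's example, the `@!p bra L1;` and `add.s32 j, j, 1;` pair collapses into a single `add.s32` guarded by `p`, and the `L1` label stays on the following instruction.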