2. OUTLINE
• So Far
  • Compute Integral Image Using Multiple Blocks
    • 640 x 480
    • 540 x 720
• Current Work
  • Feature Value Extraction and Problems
3. RESULT
• 640 x 480
• Transpose on the GPU (read from registers; write to global memory)
• Uses multiple blocks (columns / 8)
• Serial version: 5.11238 ms
• Parallel version: 1.85987 ms
• Run-time reduction: 63.6% ((5.11238 - 1.85987) / 5.11238)
======== Profiling result:
Time(%)  Time      Calls  Avg       Min       Max       Name
69.64    927.61us  2      463.81us  422.91us  504.70us  computeByColumnTranspose(float*, float*, int, int)
15.69    209.02us  1      209.02us  209.02us  209.02us  [CUDA memcpy HtoD]
14.67    195.36us  1      195.36us  195.36us  195.36us  [CUDA memcpy DtoH]
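A minimal CPU sketch of the column-scan-and-transpose idea, to show why the kernel appears twice in the profile. It assumes (the slides do not confirm this) that one computeByColumnTranspose call keeps a running sum down each column in a register and writes it out transposed; applying that pass twice yields the integral image. The names columnScanTranspose and integralImage are hypothetical, and on the GPU each iteration of the outer loop would correspond to one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU analogue of one computeByColumnTranspose pass (assumed
// behaviour, not the slides' code).
// in:  height x width image, row-major.
// out: width x height image, row-major, i.e. the transpose, where
//      out[c][r] = in[0][c] + in[1][c] + ... + in[r][c].
void columnScanTranspose(const std::vector<float>& in, std::vector<float>& out,
                         int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * height);
    out.assign(in.size(), 0.0f);
    for (int c = 0; c < width; ++c) {      // on the GPU: one thread per column
        float running = 0.0f;              // on the GPU: kept in a register
        for (int r = 0; r < height; ++r) {
            running += in[static_cast<std::size_t>(r) * width + c];
            // Transposed write (to global memory on the GPU).
            out[static_cast<std::size_t>(c) * height + r] = running;
        }
    }
}

// Two passes give the integral image, ii[y][x] = sum of img[0..y][0..x]:
// pass 1 sums down the columns, pass 2 sums down the columns of the transposed
// result (the original rows) and transposes back.
std::vector<float> integralImage(const std::vector<float>& img, int width, int height) {
    std::vector<float> tmp, ii;
    columnScanTranspose(img, tmp, width, height);  // tmp: width x height
    columnScanTranspose(tmp, ii, height, width);   // ii:  height x width
    return ii;
}
```

For example, the 2 x 3 image {1, 2, 3, 4, 5, 6} gives {1, 3, 6, 5, 12, 21}. Because the kernel signature uses float, sums above 2^24 (about 16.7 million) are no longer exact; a 640 x 480 image of 8-bit pixels can reach 640 x 480 x 255 = 78,336,000, so the lower-right entries may be rounded.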
4. RESULT (CONT.)
• 540 x 720
• Transpose on the GPU (read from registers; write to global memory)
• Uses multiple blocks (columns / 8)
• Serial version: 6.4817 ms
• Parallel version: 2.1217 ms
• Run-time reduction: 67.3% ((6.4817 - 2.1217) / 6.4817)
======== Profiling result:
Time(%)  Time      Calls  Avg       Min       Max       Name
67.69    1.06ms    2      531.80us  448.28us  615.32us  computeByColumnTranspose(float*, float*, int, int)
16.68    262.11us  1      262.11us  262.11us  262.11us  [CUDA memcpy HtoD]
15.63    245.66us  1      245.66us  245.66us  245.66us  [CUDA memcpy DtoH]
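The percentages on the two result slides measure the fraction of run time saved, which is not the same quantity as the speedup ratio usually quoted for parallel code. With serial time $T_s$ and parallel time $T_p$, the two metrics for the measured times are:

```latex
\begin{aligned}
\text{time reduction} &= \frac{T_s - T_p}{T_s}, \qquad \text{speedup} = \frac{T_s}{T_p} \\[4pt]
640 \times 480:&\quad \frac{5.11238 - 1.85987}{5.11238} \approx 0.636, \qquad \frac{5.11238}{1.85987} \approx 2.75 \\
540 \times 720:&\quad \frac{6.4817 - 2.1217}{6.4817} \approx 0.673, \qquad \frac{6.4817}{2.1217} \approx 3.05
\end{aligned}
```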
5. HOW TO EXTRACT FEATURE VALUES
1. Count how many features can be extracted per image.
2. Use the GPU to extract the feature values for all 11 feature types.
3. Write the data to a file.
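A minimal sketch of step 1 that counts features without looping over them, assuming each feature type is a rectangle template of base size w x h that is scaled by integer factors independently in x and y and slid with stride 1 (the Viola-Jones style enumeration). The slides do not say which scheme the 11 types use, so these counts need not reproduce the 10,640 figure. Under this assumption the per-axis count is a closed-form sum, so the per-type count is a product. Prefix-summing the per-type counts then gives each type a disjoint slice of a single output buffer, which would also let types run concurrently (for example, one CUDA stream per type) without coordinating writes. The names axisCount, featureCount, featureCountLoop, BaseSize and typeOffsets are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of (position, scale) choices along one axis for a template of base
// length `base` in a window of length `len`, with integer scales a = 1..n,
// n = len / base, and stride 1:  sum_{a=1..n} (len - a*base + 1).
// Closed form: n*(len+1) - base*n*(n+1)/2  (n*(n+1) is always even).
std::int64_t axisCount(std::int64_t len, std::int64_t base) {
    assert(base > 0);
    std::int64_t n = len / base;
    return n * (len + 1) - base * n * (n + 1) / 2;
}

// Features of one type (base size w x h) in a W x H window: the x and y
// choices are independent, so the count factorises.
std::int64_t featureCount(int W, int H, int w, int h) {
    return axisCount(W, w) * axisCount(H, h);
}

// Brute-force count over all scaled sizes, for checking the closed form.
std::int64_t featureCountLoop(int W, int H, int w, int h) {
    std::int64_t count = 0;
    for (int sw = w; sw <= W; sw += w)        // scaled width
        for (int sh = h; sh <= H; sh += h)    // scaled height
            count += static_cast<std::int64_t>(W - sw + 1) * (H - sh + 1);
    return count;
}

struct BaseSize { int w, h; };

// Exclusive prefix sum of per-type counts: type t writes its values to
// output[offsets[t] .. offsets[t+1]); offsets.back() is the total size.
std::vector<std::int64_t> typeOffsets(int W, int H, const std::vector<BaseSize>& types) {
    std::vector<std::int64_t> offsets(types.size() + 1, 0);
    for (std::size_t t = 0; t < types.size(); ++t)
        offsets[t + 1] = offsets[t] + featureCount(W, H, types[t].w, types[t].h);
    return offsets;
}
```

As a sanity check, the five basic Viola-Jones types (2x1, 1x2, 3x1, 1x3, 2x2) in a 24 x 24 window give the commonly quoted total of 162,336.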
6. HOW TO EXTRACT FEATURE VALUES (CONT.)
1. Count how many features can be extracted per image.
   Counting takes as many loop iterations as computing the feature values themselves.
   Is there another way to determine the size of the output data?
2. Use the GPU to extract the feature values for all 11 feature types.
   Multiple types could possibly be executed concurrently.
   Use CUDA streams? (still checking how they are used)
3. Write the data to a file.
   Will the output be too large?
   (For a 16 x 16 image, 10,640 feature values are extracted.)
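For the size question, a rough estimate: if each feature value is stored as a 4-byte float, the 10,640 values of one 16 x 16 image take 10,640 x 4 = 42,560 bytes (about 41.6 KiB) in binary form, and the total grows linearly with the number of images. A text format needs several characters per value plus separators, so it is typically larger. A minimal sketch that appends one image's values as raw binary floats, assuming a simple headerless layout; the helper names outputBytes and writeFeatures and the file layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Bytes needed for `numImages` images with `featuresPerImage` float values each
// (sizeof(float) is 4 on common platforms).
std::size_t outputBytes(std::size_t featuresPerImage, std::size_t numImages) {
    return featuresPerImage * numImages * sizeof(float);
}

// Append one image's feature values to `path` as raw binary floats (no header).
// Returns true if every value was written and the file closed cleanly.
bool writeFeatures(const char* path, const std::vector<float>& values) {
    assert(path != nullptr);
    std::FILE* f = std::fopen(path, "ab");
    if (!f) return false;
    std::size_t written = std::fwrite(values.data(), sizeof(float), values.size(), f);
    bool ok = (written == values.size());
    if (std::fclose(f) != 0) ok = false;
    return ok;
}
```

Appending one image at a time keeps host memory bounded, and a reader can recover image i by seeking to i x outputBytes(featuresPerImage, 1).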