20131114

WEEKLY REPORT
Thur., Nov 14, 2013
Pin Yi Tsai

OUTLINE
• Current Work
• Compute Integral Image – computeByRow
   ◦ Using shared memory
   ◦ Using register
   ◦ Result
• CUDA Memory Architecture

USING SHARED MEMORY

• Scope: block

• Shared memory: store the values of the previous line
• Computing by row: img[*][y], then img[*][y+1]
• Time t: calculate img[*][y] + shared memory[*]
• Then store the result back to shared memory[*]
• Time t+1: calculate img[*][y+1] + shared memory[*]
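The shared-memory scheme above can be sketched as a CUDA kernel. This is a minimal sketch under stated assumptions, not the report's code. It assumes a row-major image, so img[x][y] in the slides becomes img[y * width + x]. Each thread owns one column x and walks down the rows. The names computeByRowShared and runByRowShared are hypothetical. Each thread reads and writes only its own shared-memory slot, so no __syncthreads() is needed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical sketch: process row y, then row y + 1, ..., keeping the
// accumulated values of the previous line in shared memory (one slot per thread).
__global__ void computeByRowShared(float *img, int width, int height)
{
    extern __shared__ float prev[];            // sized at launch: blockDim.x floats
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= width) return;

    prev[threadIdx.x] = 0.0f;                  // nothing above the first row
    for (int y = 0; y < height; ++y) {
        int i = y * width + x;
        float v = img[i] + prev[threadIdx.x];  // time t: img[*][y] + shared memory[*]
        img[i] = v;
        prev[threadIdx.x] = v;                 // store back for row y + 1 (time t + 1)
    }
}

// Hypothetical host helper; CUDA error checking omitted for brevity.
std::vector<float> runByRowShared(const std::vector<float> &host, int width, int height)
{
    size_t bytes = host.size() * sizeof(float);
    float *dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (width + threads - 1) / threads;  // cover every column
    computeByRowShared<<<blocks, threads, threads * sizeof(float)>>>(dev, width, height);

    std::vector<float> out(host.size());
    cudaMemcpy(out.data(), dev, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dev);
    return out;
}
```

Adjacent threads touch adjacent addresses within each row. As a result, the global-memory reads and writes of a warp can be coalesced.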

USING REGISTER

• Scope: thread
• One line per thread
   ◦ Why not one pixel per thread? It would require __syncthreads()

• Using register: store the values of the previous pixel
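The register variant can be sketched as follows. This assumes a row-major image (img[y * width + x]) and takes a "line" to be a column x. The names computeByRowRegister and runByRowRegister are hypothetical. The accumulated value of the previous pixel is a local variable, which the compiler normally keeps in a register.

A one-pixel-per-thread version would instead need a barrier after every row. Because __syncthreads() only synchronizes threads within one block, that design works only if all pixels of a line fall in the same block.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical sketch: one thread per line (a column x of a row-major image).
// The previous pixel's accumulated value stays in a local variable (register).
__global__ void computeByRowRegister(float *img, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= width) return;

    float prev = 0.0f;                         // accumulated value of the previous pixel
    for (int y = 0; y < height; ++y) {
        int i = y * width + x;
        prev += img[i];                        // previous pixel + current pixel
        img[i] = prev;                         // write the running sum back
    }
}

// Hypothetical host helper; CUDA error checking omitted for brevity.
std::vector<float> runByRowRegister(const std::vector<float> &host, int width, int height)
{
    size_t bytes = host.size() * sizeof(float);
    float *dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (width + threads - 1) / threads;  // one thread per line
    computeByRowRegister<<<blocks, threads>>>(dev, width, height);

    std::vector<float> out(host.size());
    cudaMemcpy(out.data(), dev, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dev);
    return out;
}
```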

RESULT
• 16x16

• Serial version: 0.006336 ms
• Parallel version: 5.88559e-39 ms
======== Profiling result:
Time(%)      Time   Calls       Avg       Min       Max  Name
  55.69   18.91us       1   18.91us   18.91us   18.91us  computeByRow(float*, int, int)
  25.84    8.78us       1    8.78us    8.78us    8.78us  computeByColumn(float*, int, int)
  12.91    4.38us       2    2.19us    2.18us    2.21us  [CUDA memcpy DtoH]
   5.56    1.89us       2     944ns     928ns     960ns  [CUDA memcpy HtoD]
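The serial version timed above is not shown in the report. Below is a minimal host-side sketch of a serial integral image. It assumes the row-major layout img[y * width + x] and the usual definition, where I(x, y) is the sum of img(x', y') over all x' ≤ x and y' ≤ y. The name integralImageSerial is hypothetical. The function can also serve to check the GPU result of the computeByRow and computeByColumn passes.

```cuda
#include <vector>

// Hypothetical serial reference (plain host code): integral image in two passes.
void integralImageSerial(std::vector<float> &img, int width, int height)
{
    // Pass 1: accumulate down each column, one row at a time.
    for (int y = 1; y < height; ++y)
        for (int x = 0; x < width; ++x)
            img[y * width + x] += img[(y - 1) * width + x];

    // Pass 2: accumulate along each row.
    for (int y = 0; y < height; ++y)
        for (int x = 1; x < width; ++x)
            img[y * width + x] += img[y * width + x - 1];
}
```

For the 3x2 image {1, 2, 3, 4, 5, 6}, the result is {1, 3, 6, 5, 12, 21}. The last element is the sum of every pixel.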

RESULT (CONT.)
• 640x480

• Serial version: 5.1607 ms
• Parallel version: 4.40496 ms
======== Profiling result:
Time(%)      Time   Calls       Avg       Min       Max  Name
  66.37    2.19ms       1    2.19ms    2.19ms    2.19ms  computeByRow(float*, int, int)
  12.75  419.74us       2  209.87us  209.28us  210.46us  [CUDA memcpy HtoD]
  11.74  386.43us       2  193.22us  191.04us  195.39us  [CUDA memcpy DtoH]
   9.15  301.24us       1  301.24us  301.24us  301.24us  computeByColumn(float*, int, int)
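The 640x480 numbers above give the end-to-end speedup, the relative cost of the two kernels, and the copy overhead. This is plain arithmetic on the reported values:

```latex
\begin{aligned}
S &= \frac{T_{\text{serial}}}{T_{\text{parallel}}}
   = \frac{5.1607\,\text{ms}}{4.40496\,\text{ms}} \approx 1.17 \\
\frac{T_{\text{computeByRow}}}{T_{\text{computeByColumn}}}
  &= \frac{2.19\,\text{ms}}{0.30124\,\text{ms}} \approx 7.27 \\
T_{\text{memcpy}} &= 419.74\,\mu\text{s} + 386.43\,\mu\text{s} = 806.17\,\mu\text{s}
  \quad (12.75\% + 11.74\% = 24.49\%\ \text{of profiled GPU time})
\end{aligned}
```

In this configuration, computeByRow takes roughly seven times as long as computeByColumn. It accounts for about two thirds of the profiled GPU time.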
