CE-4030: OPTIMIZING PHOTO EDITING APPLICATION
FOR AMD HETEROGENEOUS SYSTEM ARCHITECTURE
CYBERLINK MARKETING MANAGER
STANLEY LAM
AGENDA

Why Photo Editing Application – PhotoDirector?
Photo Editing Pipelines (RAW processing)
How AMD HSA helps in Photo Editing?
Proof of Concept: HSA Performance Showcase
Key Takeaways
2 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
Why Photo Editing Software
– PhotoDirector?
WHY PHOTO EDITING SOFTWARE?
THE RIGHT APPLICATION FOR HSA

CyberLink Multimedia Software
‒ Media Playback: PowerDVD
‒ Video Editing: PowerDirector
‒ Photo Editing: PhotoDirector

Why Photo Editing Software?
‒ Many editing tasks can be parallelized
‒ Processing / decoding RAW files is time-consuming
‒ RAW image editing can be both computationally and memory intensive

How AMD HSA helps in Photo Editing?
‒ Utilize GPU compute units to speed up performance
‒ Eliminate overheads and memory copy bottlenecks between HOST and DEVICE memories

Model                   Resolution (M)  Width  Height  MEM Space (bytes)
Nikon D3S               24              6034   4012    193,667,264
Nikon D4                24              6048   4032    195,084,288
Nikon D70S              24              6034   4028    194,439,616
Nikon D800E             36              7378   4924    290,634,176
Nikon D90               36              7360   4912    289,218,560
Canon EOS 600D          21              5616   3744    168,210,432
Canon EOS 5D Mark III   21              5616   3744    168,210,432
Canon EOS 20D           22              5760   3840    176,947,200
Canon EOS 7D            20              5472   3648    159,694,848
Samsung NX11            20              5472   3648    159,694,848
Samsung DSLR-A700       20              5472   3648    159,694,848
Sony SLT-A77V           24              6000   4000    192,000,000
Sony DSLR-A850          24              6000   4000    192,000,000
Sony DSLR-A900          24              6048   4032    195,084,288
Sony NEX-5N             24              6048   4032    195,084,288
Sony DSC-RX100          24              6000   4000    192,000,000
Sony DSC-RX1            20              5472   3648    159,694,848
Sony DSC-F828           24              6000   4000    192,000,000
Pentax K-5 II           40              7264   5440    316,129,280
Phase One P 20          22              4096   5456    178,782,208
Phase One P 30          22              4096   5456    178,782,208
Phase One P40+          32              6526   4904    256,028,032
Phase One P 45+         39              7246   5444    315,577,792
Phase One P65+          39              7246   5444    315,577,792
Phase One DSLR-A100     60              8984   6732    483,842,304
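
The MEM Space figures are consistent with 8 bytes per pixel, that is Width × Height × 8. The slide does not state the pixel format, so the 8-byte figure is inferred from the numbers rather than confirmed. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
# Assumption: 8 bytes per pixel, inferred because MEM Space / (Width * Height)
# equals 8 for the rows checked below. The slide does not name the pixel format.
BYTES_PER_PIXEL = 8


def frame_buffer_bytes(width: int, height: int,
                       bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Return the size in bytes of one uncompressed full-scale frame buffer."""
    return width * height * bytes_per_pixel


# (width, height, MEM Space) rows copied from the table.
ROWS = [
    (6034, 4012, 193_667_264),
    (7378, 4924, 290_634_176),
    (8984, 6732, 483_842_304),
]

for width, height, mem_space in ROWS:
    # Each computed size matches the MEM Space value listed on the slide.
    assert frame_buffer_bytes(width, height) == mem_space
```

Under this reading, a single 60-megapixel frame needs roughly 484 MB before any editing buffers are added.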
Photo Editing Pipeline
PHOTO EDITING PIPELINE
RAW PROCESSING
[Figure: pipeline diagram. Labels: IMG_0077.CR2, RAW Decoder, Photo Retouch (Preview Size), Photo Retouch (Full Scale Size), JPEG Encoder, NEW.JPG]

KEY Area for potential performance improvement

Camera Model          RAW Decode time (single photo)
Canon 1D-X            7.347 seconds
Canon 1Ds MK3         8.400 seconds
Panasonic DMC FZ100   7.916 seconds
Phase One P25         10.475 seconds
Phase One P30         12.495 seconds
Phase One P45         13.049 seconds
Samsung NX10          6.263 seconds
Samsung NX100         5.280 seconds
Sony A700             5.522 seconds
Sony F828             6.996 seconds

Test Tool
PhotoDirector 5

‒ RAW Decoder
  ‒ Decoder elapsed time is long for complex RAW formats

RAW decode is necessary during all stages in the editing pipeline
‒ When generating FULL SCALE preview
‒ When entering Retouch module for the first time
‒ When resuming from previous editing
‒ When exporting to JPG/TIFF files
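
To give a feel for how the per-photo decode times add up across these stages, here is a small sketch. It assumes that each stage triggers one full decode of the same photo, which the slide implies but does not measure. The function name is hypothetical.

```python
# Decode times (seconds, single photo) copied from the slide's table.
DECODE_SECONDS = {
    "Canon 1D-X": 7.347,
    "Phase One P45": 13.049,
    "Samsung NX100": 5.280,
}

# The four pipeline stages the slide lists as requiring a RAW decode.
STAGES = [
    "generate full scale preview",
    "enter Retouch module for the first time",
    "resume from previous editing",
    "export to JPG/TIFF",
]


def total_decode_seconds(model: str, decodes: int) -> float:
    """Decode time accumulated over `decodes` full decodes of one photo.

    Assumption: every stage costs one full decode at the measured time.
    """
    return DECODE_SECONDS[model] * decodes


# One photo that passes through all four stages once.
for model in DECODE_SECONDS:
    print(f"{model}: {total_decode_seconds(model, len(STAGES)):.3f} s")
```

Under that assumption, the decoder alone accounts for close to a minute on the slowest format in the table.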


Test Platform
CPU: AMD A10-4655M
RAM: 4GB
OS: Windows 7 32-bit
PHOTO EDITING PIPELINE
OPENCL AND MEMORY MANAGEMENT

[Figure: pipeline diagram with OpenCL. Labels: IMG_0077.CR2, RAW Decoder (GPU), Photo Retouch (CPU & GPU), JPEG Encoder (CPU), NEW.JPG; Frame Buffers in HOST Memory and DEVICE Memory, linked by MAP and UN-MAP]

Performance can be improved by utilizing GPU compute
power (OpenCL 1.x)
‒ Improve RAW decode performance
‒ Improve EDITING (Retouch) performance
‒ OpenCL 1.x is great, however…
MEMORY SPACE AND PERFORMANCE
RELATIVE KERNEL VS. BUFFER PERFORMANCE ANALYSIS

OpenCL 1.x can speed up performance substantially, yet it creates new challenges
‒ Buffering between HOST and DEVICE creates overheads
  ‒ Sometimes the overheads take up a large portion of execution time
‒ DEVICE memory space is limited
  ‒ 512MB can hold only one 36MP photo, or two 24MP photos
  ‒ Creates more reads and writes between HOST and DEVICE memories
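
The capacity claim can be checked with a short calculation. This is a sketch under two assumptions: "512MB" means 512 MiB, and a frame buffer uses 8 bytes per pixel, a figure inferred from the MEM Space table earlier in the deck. The helper names and the `resident_buffers` parameter are hypothetical, and the tile estimate ignores any overlap between neighbouring tiles.

```python
DEVICE_BYTES = 512 * 1024 * 1024  # assumption: "512MB" read as 512 MiB
BYTES_PER_PIXEL = 8               # assumption: inferred from the MEM Space table


def photos_that_fit(width: int, height: int,
                    device_bytes: int = DEVICE_BYTES) -> int:
    """How many full-scale frame buffers fit in DEVICE memory at once."""
    return device_bytes // (width * height * BYTES_PER_PIXEL)


def tiles_needed(width: int, height: int, resident_buffers: int,
                 device_bytes: int = DEVICE_BYTES) -> int:
    """Minimum number of equal tiles so that `resident_buffers` tile-sized
    buffers (for example input and output) fit in DEVICE memory together."""
    total = width * height * BYTES_PER_PIXEL * resident_buffers
    return -(-total // device_bytes)  # integer ceiling division


print(photos_that_fit(7378, 4924))                   # 36MP photo
print(photos_that_fit(6048, 4032))                   # 24MP photo
print(tiles_needed(7378, 4924, resident_buffers=2))  # 36MP, input + output
```

Once an operation needs more buffers than fit at full size, the image has to be split into tiles, and each tile adds its own round of HOST and DEVICE transfers.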
[Figure: a 36MP photo is tiled to fit a 512MB Frame Buffer, causing More Reads and More Writes]

How AMD HSA helps in
Photo Editing?
OPTIMIZING PERFORMANCE WITH AMD HSA
THE ADVANTAGE OF ADOPTING HSA WITH OPENCL

[Figure: pipeline diagram with HSA. Labels: IMG_0077.CR2, RAW Decoder, Photo Retouch, JPEG Encoder, NEW.JPG; Frame Buffers, HOST Memory, DEVICE Memory]
Using AMD HSA to improve performance over OpenCL 1.x
‒ Shared virtual memory breaks down the border between CPU and GPU
‒ Reduces the overhead of moving data
‒ Uses the AMD APU platform to achieve true Heterogeneous Computing

3 LEVELS OF SHARED VIRTUAL MEMORY
CHOOSING SHARED VIRTUAL MEMORY

3 levels of Shared Virtual Memory support (can be configured during initialization)
‒ Coarse Grain Buffer
  ‒ Ability to share virtual pointers between HOST and DEVICE
‒ Fine Grain Buffer
  ‒ Ability to share buffer space between HOST and DEVICE
‒ Fine Grain System Buffer
  ‒ Ability to allow DEVICE to access the entire HOST address space
  ‒ Eliminates the need to specify explicit SVM pointers

Coding Complexity
‒ Complexity: Coarse Grain > Fine Grain > Fine Grain System
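
The complexity ordering can be illustrated with a toy model. It assumes the three levels correspond to the OpenCL 2.0 SVM types, where coarse-grain buffers need map and unmap calls around host access and fine-grain buffers do not. The classes below are hypothetical and written in plain Python. They are not the OpenCL or HSA API, and they only mimic the synchronization discipline.

```python
class CoarseGrainBuffer:
    """Toy model: the host must map() before touching the data and
    unmap() before the device runs a kernel on it."""

    def __init__(self, size: int) -> None:
        self._data = bytearray(size)
        self._mapped = False

    def map(self) -> None:
        self._mapped = True

    def unmap(self) -> None:
        self._mapped = False

    def host_write(self, index: int, value: int) -> None:
        if not self._mapped:
            raise RuntimeError("coarse grain: call map() before host access")
        self._data[index] = value

    def host_read(self, index: int) -> int:
        if not self._mapped:
            raise RuntimeError("coarse grain: call map() before host access")
        return self._data[index]

    def run_kernel(self, fn) -> None:
        if self._mapped:
            raise RuntimeError("coarse grain: call unmap() before device use")
        for i in range(len(self._data)):
            self._data[i] = fn(self._data[i])


class FineGrainBuffer:
    """Toy model: host and device share the buffer without map/unmap."""

    def __init__(self, size: int) -> None:
        self._data = bytearray(size)

    def host_write(self, index: int, value: int) -> None:
        self._data[index] = value

    def host_read(self, index: int) -> int:
        return self._data[index]

    def run_kernel(self, fn) -> None:
        for i in range(len(self._data)):
            self._data[i] = fn(self._data[i])


def brighten(value: int) -> int:
    """Stand-in for a per-pixel device kernel."""
    return min(value + 10, 255)


# Coarse grain: every host access is bracketed by map()/unmap().
coarse = CoarseGrainBuffer(4)
coarse.map()
coarse.host_write(0, 100)
coarse.unmap()
coarse.run_kernel(brighten)
coarse.map()
coarse_result = coarse.host_read(0)
coarse.unmap()

# Fine grain: the same work with no mapping calls.
fine = FineGrainBuffer(4)
fine.host_write(0, 100)
fine.run_kernel(brighten)
fine_result = fine.host_read(0)
```

Both paths produce the same pixel value. The coarse-grain path simply carries more bookkeeping calls, which is the extra coding complexity the slide refers to.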

COARSE GRAIN SHARED BUFFER
OPENCL BUFFER VS. HSA BUFFER

Because PhotoDirector’s existing code base does not contain excessive pointers, we are able to choose the buffer type that gives the best performance.

[Figure: two panels, Standard OCL Buffers and HSA Coarse Grain Buffers, each showing Buffer 1 and Buffer 2 on HOST and DEVICE]
Proof of concept:
HSA Performance Showcase
AMD HSA BUFFER TYPES
RELATIVE PERFORMANCE COMPARISON
Performance Index of Applying Hue Change to RAW Photo

Our proof-of-concept code showed a potential performance difference
‒ Good potential performance when using Coarse Grain Buffers
‒ Results show roughly a 2x difference between the Coarse Grain and Fine Grain implementations

Test Tool
PhotoDirector 5 Testbed

Test Platform
CPU: AMD KAVERI
RAM: 4GB
OS: Windows 7 64-bit

[Chart: Performance Index of Applying Hue Change to RAW Photo, with bars for Coarse Grain and Fine Grain]
Key Takeaways
KEY TAKEAWAY
AMD HSA SHOWS GREAT POTENTIAL

AMD HSA shows great potential for photo editing applications – CyberLink PhotoDirector
‒ Many more photo editing tasks can leverage the performance advantage of AMD HSA platforms
‒ It’s important to experiment and work with the most suitable HSA buffer type
‒ Potential performance improvements for parallelizable and memory-intensive applications

DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software
changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD
reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of
such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices,
Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names
are for informational purposes only and may be trademarks of their respective owners.


CE-4030, Optimizing Photo Editing Application with HSA Technology, by Stanley Lam
