MM-4105: REALTIME 4K HDR DECODING WITH GPU ACES

GARY DEMOS
IMAGE ESSENCE LLC
  
 	
  	
4k Realtime (24fps 2D) Image Bandwidth

• EXR half-float (e.g. ACES/OCES) or 16-bit unsigned short integers:
  - 2 bytes/color x 3 colors (RGB) x 4096 x 2160 x 24fps = 1.27 GBytes/sec = 10.2 Gbps
• 32-bit floats (used inside OpenCL in the GPU and within most CPU decoding steps):
  - 4 bytes/color x 3 colors (RGB) x 4096 x 2160 x 24fps = 2.55 GBytes/sec = 20.4 Gbps
• 10-bit DPX-packed pixels:
  - 4 bytes/3 colors x 3 colors (RGB) x 4096 x 2160 x 24fps = 0.85 GBytes/sec = 6.8 Gbps
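Every figure on this slide is the same product: bytes per pixel x width x height x frame rate, multiplied by 8 to convert bytes to bits. Below is a minimal C sketch of that arithmetic. The function names are illustrative only. It assumes decimal units (1 GByte = 10^9 bytes), which are the units the slide's numbers match.

```c
/* Uncompressed image bandwidth in bytes per second.
 * bytes_per_pixel covers all channels of one pixel:
 *   half-float or 16-bit RGB: 3 x 2 = 6 bytes
 *   32-bit float RGB:         3 x 4 = 12 bytes
 *   10-bit DPX-packed RGB:    4 bytes (one 32-bit word per pixel) */
double image_bytes_per_sec(double bytes_per_pixel, long width, long height, double fps)
{
    return bytes_per_pixel * (double)width * (double)height * fps;
}

/* Bytes/sec to gigabits/sec, decimal units (1 Gbit = 1e9 bits). */
double to_gbps(double bytes_per_sec)
{
    return bytes_per_sec * 8.0 / 1e9;
}
```

For example, image_bytes_per_sec(12, 4096, 2160, 24) gives 2,548,039,680 bytes/sec. That is 2.55 GBytes/sec, or 20.4 Gbps.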
  
Future Frontiers

• 2 bytes/color x 3 colors (RGB) x 4096 x 2160 x 60fps = 3.19 GBytes/sec = 25.5 Gbps
• 2 bytes/color x 3 colors (RGB) x 4096 x 2160 x 120fps = 6.37 GBytes/sec = 51.0 Gbps
• 2 bytes/color x 3 colors (RGB) x 8192 x 4320 x 24fps = 5.10 GBytes/sec = 40.8 Gbps
• 2 bytes/color x 3 colors (RGB) x 8192 x 4320 x 120fps = 25.48 GBytes/sec = 203.8 Gbps
• 3D: any of the above x2

• DisplayPort 1.2 goes up to about 20 Gbps (21.6 Gbps raw link rate, 17.28 Gbps of payload after 8b/10b coding)
• A FirePro W9000 has six DisplayPort 1.2 outputs
• The demonstration system has four W9000s
• That's 24 DisplayPort 1.2 outputs!
• Total available pixel output is 24 x 20 Gbps = 480 Gbps
• That's more than:
  - 2x (3D) 2 bytes/color x 3 colors (RGB) x 8192 x 4320 x 120fps = 51.0 GBytes/sec = 407.7 Gbps!
  - Could work up to this in an array of displays
• Still a few issues (at least for this author):
  - Locking playback speed with pixels from CL
  - Synchronizing audio
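The output-side comparison on this slide reduces to one step: divide the required rate by the per-link rate and round up. Below is a small C sketch. The per-link rate is left as a parameter because the slide's 20 Gbps is a round figure. The function name is hypothetical.

```c
/* Number of display links needed for a required pixel rate, rounding up.
 * per_link_gbps is supplied by the caller, e.g. the slide's round 20 Gbps
 * for DisplayPort 1.2, or 17.28 Gbps for its payload after 8b/10b coding. */
int links_needed(double required_gbps, double per_link_gbps)
{
    int n = (int)(required_gbps / per_link_gbps);
    if (n * per_link_gbps < required_gbps)
        n++;                               /* a partially used link still counts */
    return n;
}
```

links_needed(407.7, 20.0) gives 21 of the 24 outputs. With the 17.28 Gbps payload figure it gives 24, so the 8k 120fps 3D case would need all 24 outputs.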
  
	
  
Realtime Floating Point ACES Decoding
Including Realtime Interactive Adjustment and RRT/ODT in the GPU
  	
  

[Diagram: 2x Intel E5-2690 CPUs; compressed bitfiles (SATA FlashRAM); floating point decoding; ACES; packed pixels ready for display; DVS Atomix with fifo of frames for smooth playout; 4k realtime 10/12-bit RGB]
  

4x FirePro W9000s: GPU Processing in OpenCL
• Sharpen/soften spatial filter
• Transform to P3 colorspace
• ASC CDL adjustments
• Transform back to ACES
• RRT and ODT in 3D LUT
• Convert to fixed point and pack pixels
  
 	
  	
  	
  	
  	
  	
CPU Partitioning

• Running Scientific Linux 6.4
• Relies on a fifo-of-frames in the DVS Atomix, using the FIFO API, to smooth out the non-realtime attributes of Linux
• Multiple decoder processes are forked at startup
• Compressed bitfiles are retrieved by each process from SATA FlashRAM/SSD
• The number of decoder processes is selected at runtime startup (tuned for performance and available memory)
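The startup sequence above can be sketched with POSIX fork and wait: choose a decoder count at runtime, fork the decoders, and let the parent continue as the display process. The sizing rule in choose_decoder_count is hypothetical, since the deck only says the count is tuned for performance and available memory. The function names are illustrative.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sizing rule: one decoder per online CPU, capped by how many
 * decoders fit in the memory budget, and never fewer than one. */
int choose_decoder_count(long online_cpus, long mem_budget_mb, long mb_per_decoder)
{
    long by_memory = mem_budget_mb / mb_per_decoder;
    long n = online_cpus < by_memory ? online_cpus : by_memory;
    return n < 1 ? 1 : (int)n;
}

/* Fork n decoder processes. Each child would run its decode loop (retrieve
 * compressed bitfiles, decode frames); here it just exits with status 0.
 * The parent continues as the display process; in this sketch it reaps the
 * children immediately and returns how many exited cleanly. */
int fork_decoders(int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* decoder i: decode loop would go here */
            _exit(0);
        }
        if (pid > 0)
            started++;
    }
    int clean = 0, status;
    for (int i = 0; i < started; i++)
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    return clean;
}
```

At startup, online_cpus could come from sysconf(_SC_NPROCESSORS_ONLN). That query is widely available on Linux, though it is not mandated by POSIX.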
  
CPU Partitioning (cont.)

• The parent process becomes the display process
• The display process creates shared memory and uses semaphores to signal the decoder processes that buffers are available
• Each decoder process creates a frame or range of frames
• The display process manages shared memory and DMA to/from the GPUs and the DVS Atomix
• The display process tells the decoder processes when buffers become available again
  
 	
  	
GPU Partitioning:

• The OpenCL device query (clGetDeviceIDs) provides the number of GPUs available, numDevices
• The vertical screen height is partitioned into numDevices bands
• Four FirePro W9000 GPUs in this demonstration system
• All GPUs share a common "context" and associated "kernels" (one CL interpret)
• Each of the four GPUs is given a "command_queue" and separate "cl_mem" buffers
  
	
  
 	
  	
  	
  	
GPU Partitioning (cont.)

• Kernel args for each cl_mem are updated for each of the four GPUs before invoking the kernel with that GPU's command_queue
• Each GPU is given 1/4 of the screen height via enqueued writes of half-float ACES
• Each GPU's packed pixels are retrieved into the appropriate quarter of the screen height via enqueued reads of packed pixels
• Double-buffered DMA (getbuffer/putbuffer) to the DVS Atomix using the FIFO API (the fifo of frames helps smooth out the non-realtime aspects of Linux, yielding realtime)
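The per-GPU split of the screen height, and the matching byte ranges for the enqueued writes and reads, come down to a little integer arithmetic. Below is a C sketch. It assumes that leftover rows (when the height is not divisible by the device count) go to the first bands, and the function names are hypothetical. The computed offset and size would serve as the host-side offset into the frame (the ptr argument) and the size argument of clEnqueueWriteBuffer or clEnqueueReadBuffer. With a per-device cl_mem holding only that band, the device-side offset would be 0.

```c
#include <stddef.h>

/* Rows [*first_row, *first_row + *num_rows) handled by one device when the
 * screen height is split into num_devices horizontal bands. */
void device_band(int height, int num_devices, int device,
                 int *first_row, int *num_rows)
{
    int base = height / num_devices;
    int extra = height % num_devices;      /* leftover rows, one each to the first bands */
    *num_rows = base + (device < extra ? 1 : 0);
    *first_row = device * base + (device < extra ? device : extra);
}

/* Byte offset and size of a band in a frame with the given row pitch:
 * width * 3 * 2 bytes for half-float ACES RGB, width * 4 for 10-bit packed. */
void band_bytes(int first_row, int num_rows, size_t row_pitch_bytes,
                size_t *offset, size_t *size)
{
    *offset = (size_t)first_row * row_pitch_bytes;
    *size = (size_t)num_rows * row_pitch_bytes;
}
```

For a 4096 x 2160 half-float frame on four GPUs, each band is 540 rows, or 13,271,040 bytes.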
  
	
  
 	
  	
  	
  	
OpenCL Code:
• Macros are used for all math
• For CPU code, ".h" files are included and the macros invoked
• For GPU code, the CL source includes the same ".h" files, and the macros are invoked within each CL kernel
• Macros are separated into various types:
  - Interaction processing: ACES to/from P3, with ASC CDL applied in P3
  - RRT (Reference Rendering Transform) processing, using either a LUT (faster but less accurate, realtime at 4k) or direct computation (slower but highly accurate, realtime at 2k)
  - ODT (Output Device Transform) processing, for the type of ODT selected
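For the LUT path of the RRT/ODT, a common way to read a 3D LUT is trilinear interpolation. The C sketch below builds it from a single LERP arithmetic macro of the kind that could live in a shared ".h" file. The functions themselves use plain pointers and would need address-space qualifiers in OpenCL C. Several details are assumptions, because the deck does not state them: the LUT size, the memory layout, the interpolation method (tetrahedral is another common choice), and the shaper that maps ACES values into [0, 1]. The names are hypothetical.

```c
/* Plain arithmetic macro: valid in both host C and OpenCL C. */
#define LERP(a, b, t) ((a) + ((b) - (a)) * (t))

/* Assumed layout: n*n*n RGB entries, red index varying fastest. */
static float lut_at(const float *lut, int n, int r, int g, int b, int c)
{
    return lut[3 * ((b * n + g) * n + r) + c];
}

/* Trilinear lookup; inputs are clamped to [0, 1]; requires n >= 2. */
void lut3d_sample(const float *lut, int n, float r, float g, float b, float out[3])
{
    const float in[3] = {r, g, b};
    int i0[3];
    float f[3];
    for (int k = 0; k < 3; k++) {
        float x = in[k] < 0.0f ? 0.0f : (in[k] > 1.0f ? 1.0f : in[k]);
        x *= (float)(n - 1);
        int i = (int)x;                    /* x >= 0, so truncation == floor */
        if (i > n - 2)
            i = n - 2;                     /* keep i + 1 inside the table */
        i0[k] = i;
        f[k] = x - (float)i;
    }
    const int r0 = i0[0], g0 = i0[1], b0 = i0[2];
    for (int c = 0; c < 3; c++) {
        /* interpolate along red, then green, then blue */
        float c00 = LERP(lut_at(lut, n, r0, g0,     b0,     c), lut_at(lut, n, r0 + 1, g0,     b0,     c), f[0]);
        float c10 = LERP(lut_at(lut, n, r0, g0 + 1, b0,     c), lut_at(lut, n, r0 + 1, g0 + 1, b0,     c), f[0]);
        float c01 = LERP(lut_at(lut, n, r0, g0,     b0 + 1, c), lut_at(lut, n, r0 + 1, g0,     b0 + 1, c), f[0]);
        float c11 = LERP(lut_at(lut, n, r0, g0 + 1, b0 + 1, c), lut_at(lut, n, r0 + 1, g0 + 1, b0 + 1, c), f[0]);
        out[c] = LERP(LERP(c00, c10, f[1]), LERP(c01, c11, f[1]), f[2]);
    }
}

/* Identity table, handy for checking the lookup. */
void make_identity_lut(float *lut, int n)
{
    for (int b = 0; b < n; b++)
        for (int g = 0; g < n; g++)
            for (int r = 0; r < n; r++) {
                float *e = &lut[3 * ((b * n + g) * n + r)];
                e[0] = (float)r / (float)(n - 1);
                e[1] = (float)g / (float)(n - 1);
                e[2] = (float)b / (float)(n - 1);
            }
}
```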
  
 	
OpenCL Code (cont.)

• The final step in CL is conversion from 32-bit floats to fixed point, and RGB packing (either 10 bits or 16 bits), adding ±1/2 LSB noise dither
• OpenCL does not include a random number intrinsic, so random numbers for dithering are DMA'd up to the GPU for use in the noise dither, generated by a randomizing function of frame number and scanline
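Below is a C sketch of the final fix-and-pack step with ±1/2 LSB dither, plus a CPU-side noise row. Two parts are assumptions. The hash used as the "randomizing function of frame number and scanline" is a stand-in integer mixer. The 10-bit layout (R, G, B in the top 30 bits of a 32-bit word, 2 padding bits at the bottom) is an assumed DPX-style packing. All names are hypothetical.

```c
#include <stdint.h>

/* Stand-in integer mixing hash (not the deck's function). */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16; h *= 0x7feb352dU;
    h ^= h >> 15; h *= 0x846ca68bU;
    h ^= h >> 16;
    return h;
}

/* One scanline of dither values in [-0.5, 0.5) LSB, computed on the CPU
 * from frame number, scanline and column, then uploaded to the GPU. */
void fill_dither_row(float *noise, int width, uint32_t frame, uint32_t scanline)
{
    for (int x = 0; x < width; x++) {
        uint32_t h = mix32(frame * 0x9e3779b9U ^ mix32(scanline ^ mix32((uint32_t)x)));
        noise[x] = (float)(h >> 8) * (1.0f / 16777216.0f) - 0.5f;  /* 24 bits -> [0,1) */
    }
}

/* Normalized float to unsigned code: floor(v * max + 0.5 + noise),
 * clamped to [0, max]; noise in [-0.5, 0.5) gives +-1/2 LSB dither. */
uint32_t quantize_dithered(float v, int bits, float noise)
{
    const uint32_t max_code = (1u << bits) - 1u;
    float x = v * (float)max_code + 0.5f + noise;
    if (!(x > 0.0f))
        return 0;                          /* also catches NaN */
    if (x >= (float)max_code)
        return max_code;
    return (uint32_t)x;                    /* x > 0, so truncation == floor */
}

/* Assumed DPX-style 10-bit packing: R bits 31..22, G 21..12, B 11..2. */
uint32_t pack_rgb10(uint32_t r, uint32_t g, uint32_t b)
{
    return ((r & 0x3ffu) << 22) | ((g & 0x3ffu) << 12) | ((b & 0x3ffu) << 2);
}
```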
  
 	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
Reasons for liking OpenCL:

• Support for CL_DEVICE_TYPE_CPU as well as CL_DEVICE_TYPE_GPU
• Vendor independence
• Portability
• Easily extended to automatically utilize multiple GPUs by setting up multiple command queues based upon the number of devices detected at runtime
• Runtime compilation ("interpret") of kernels is often convenient
• Excellent description of expected precision for math intrinsic functions
• Strong support for both 32-bit and 64-bit floating point
  
 	
  	
Reasons for liking OpenCL (cont.)

• Well-thought-out device and system query capabilities
• get_global_id provides an excellent mechanism for parallelism without requiring further consideration of lower-level hardware organization
• Easy specification of global, constant, and local datatypes
• Pipelining control via blocking and non-blocking read and write queues, and via clFinish and kernel barriers
• First-class support of half-float using vload_half and vstore_half
  
 	
  	
Weaknesses of OpenCL (aka "wish list"):

• Difficult to obtain visibility during debugging (although print statements are available on some systems with CL_DEVICE_TYPE_CPU)
• No detail provided by the "out of resources" error (e.g. which resources are we out of?)
  
	
  
 	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
Weaknesses of OpenCL (aka "wish list", cont.):

• Lack of visibility during performance tuning
  - How much time is being spent in read/write queues to/from the CPU?
  - How full are global and constant memory?
  - How much global memory bandwidth is being utilized?
  - How full are the registers?
  - If caches are present, how effective are they on a given kernel?
  - Are there unnecessary waits that could be overlapped asynchronously?
• The 4-, 8-, and 16-wide CL SIMD types are not mirrored in CPU SSE/AVX/F16 intrinsics
  - Were they identical, they could be used in macros that are included in common between CL kernels and CPU threads
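GCC and Clang offer a partial workaround for this mismatch. Their vector extensions let C code declare a float4 type and apply ordinary arithmetic operators to it, so an operator-only macro can be shared with an OpenCL kernel. This is a sketch under that assumption. It does not bridge to SSE/AVX intrinsics and does not cover half-float (F16) storage. The macro and function names are hypothetical.

```c
/* In OpenCL C float4 is built in; on the host, use the GCC/Clang vector
 * extension (not standard C) to declare a matching 4-wide type. */
#ifndef __OPENCL_VERSION__
typedef float float4 __attribute__((vector_size(4 * sizeof(float))));
#endif

/* Operators only, so the same text works in a CL kernel and in host C.
 * The scalars s and o are broadcast across all four lanes. */
#define SCALE_OFFSET(v, s, o) ((v) * (s) + (o))

float4 scale_offset4(float4 v, float s, float o)
{
    return SCALE_OFFSET(v, s, o);
}
```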
  
	
  
System Performance:

• Limited by memory and bus bandwidth
• DirectGMA will improve this
• Plenty of GPU power still available for realtime 4k processing when using 3D LUT RRT/ODT
• CPU power sufficient for wavelet-only floating point decoding at 4k
• CPU power sufficient for the full motion-compensated flowfield sinc-and-wavelet configuration at 2k. At 4k, speed is about 1/3 realtime.
• With threads and forked processes, will be able to take advantage of the anticipated major increase in computational cores
  
CL/GL Interop Exploration:

• Using X11 on Linux (no GLUT support)
• Get 10-bit depth at setup from X11 (as a configuration, using glXChooseFBConfig)
• Uses GL, GLX, and a CL/GL context (some of this is recent, as of CL 1.2)
• Improves (reduces) the amount of memory transfer required, via direct output from the GPU
• Can take over the screen (using X11 XChangeProperty)
• Relies on a "FrameBufferObject" and on "Acquire" and "Release" by CL (release by CL implies re-acquire by GL; must call clFinish and glFinish correspondingly)
• Can support 4k at 10 bits via DisplayPort 1.2 (and HDMI 1.4a via a DP-to-HDMI dongle)
• Reportedly can be used with Mac OS X and Windows (with X11-style constructs)
  
 	
CL/GL Interop Weaknesses:
• Limited to a single GPU for CL when using a CL/GL FBO
  - Would be nice to have separate FBO quadrant output from each of the four GPUs
• Timing is not smooth if the GPU is running near capacity
• No locked audio sync
• No "fifo-of-frames" to smooth out the uneven, non-realtime Linux behavior
  - Working on using cl_gl event sync to simulate this
  
Attributes of the floating point codec

• Layered, with 5 layers up to the base layer at 1k, using wavelets
• Two more layers, from 1k to 2k and from 2k to 4k, built with sinc filters, using wavelet stacks to code the up-res deltas
• Base and up-res layers can be motion compensated (the sinc filter is phase-neutral, with sub-pixel displacement precision to 1/100 pixel)
  
 	
  	
  	
  	
Floating Point Codec (cont.)

• A flowfield is used at low resolution for motion displacement, also coded as a wavelet stack. It is upsized for each layer when applied.
• Floating point coding automatically adapts to gamma, since a floating point quantization scale is used for each image region, based on the average and minimum brightness
• YUV encoding takes advantage of the codec's unlimited range and negative number reproduction to support the full ACES gamut and dynamic range
  
	
  
Frontiers for using the available GPU power:

• Spectral color processing to improve upon CIE 1931 limitations
• More ODTs to take advantage of new HD and UHD displays, and of new projectors and projection light sources as they increase dynamic range and gamut
• More processing in the pipeline
  - more elaborate sharpening
  - dynamic range regional contrast adaptation
  - additional interactive controls
  - adaptation to viewing surround (if not a dark surround)
• Additional work on the RRT, and on existing ODT types (in conjunction with the RRT algorithmic modifications)
  
	
  
Many thanks to the AMD/ATI FirePro Professional Graphics Group for their support
Many thanks to the AMD/APU team for providing the 4k display
Thanks also to R&S/DVS
  
	
  
ACES Overview:
http://www.oscars.org/science-technology/council/projects/pdf/ACESOverview.pdf

Reference papers by Gary Demos:
• The Unfolding Merger of Television and Movie Technology, SMPTE Conference, Oct 2012
• File and Folder Interactive Decoding, SMPTE Conference, Oct 2011, including YouTube video: http://www.youtube.com/watch?v=Ggt_8qseGtw
• Layered Motion Compensation, SMPTE Journal, Jan 2009
  
DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
  
24 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
  

BKK16-103 OpenCSD - Open for Business!Linaro
 
PowerDRC/LVS 2.0.1 released by POLYTEDA
PowerDRC/LVS 2.0.1 released by POLYTEDAPowerDRC/LVS 2.0.1 released by POLYTEDA
PowerDRC/LVS 2.0.1 released by POLYTEDAAlexander Grudanov
 
Current and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on LinuxCurrent and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on Linuxmountpoint.io
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdfhellobank1
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Danielle Womboldt
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Community
 
IDF'16 San Francisco - Overclocking Session
IDF'16 San Francisco - Overclocking SessionIDF'16 San Francisco - Overclocking Session
IDF'16 San Francisco - Overclocking SessionHWBOT
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...Edge AI and Vision Alliance
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
PerfUG 3 - perfs système
PerfUG 3 - perfs systèmePerfUG 3 - perfs système
PerfUG 3 - perfs systèmeLudovic Piot
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalTommy Lee
 
Porting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVEPorting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVELinaro
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...PROIDEA
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioOPNFV
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Andriy Berestovskyy
 

Similar to MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos (20)

VMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep DiveVMworld 2016: vSphere 6.x Host Resource Deep Dive
VMworld 2016: vSphere 6.x Host Resource Deep Dive
 
Theta and the Future of Accelerator Programming
Theta and the Future of Accelerator ProgrammingTheta and the Future of Accelerator Programming
Theta and the Future of Accelerator Programming
 
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
 
BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!BKK16-103 OpenCSD - Open for Business!
BKK16-103 OpenCSD - Open for Business!
 
PowerDRC/LVS 2.0.1 released by POLYTEDA
PowerDRC/LVS 2.0.1 released by POLYTEDAPowerDRC/LVS 2.0.1 released by POLYTEDA
PowerDRC/LVS 2.0.1 released by POLYTEDA
 
Current and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on LinuxCurrent and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on Linux
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf3.INTEL.Optane_on_ceph_v2.pdf
3.INTEL.Optane_on_ceph_v2.pdf
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph performance by leveraging Intel Optane and...
 
IDF'16 San Francisco - Overclocking Session
IDF'16 San Francisco - Overclocking SessionIDF'16 San Francisco - Overclocking Session
IDF'16 San Francisco - Overclocking Session
 
pps Matters
pps Matterspps Matters
pps Matters
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
PerfUG 3 - perfs système
PerfUG 3 - perfs systèmePerfUG 3 - perfs système
PerfUG 3 - perfs système
 
Shak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-finalShak larry-jeder-perf-and-tuning-summit14-part1-final
Shak larry-jeder-perf-and-tuning-summit14-part1-final
 
Porting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVEPorting and Optimization of Numerical Libraries for ARM SVE
Porting and Optimization of Numerical Libraries for ARM SVE
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 

More from AMD Developer Central

Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...AMD Developer Central
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14AMD Developer Central
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14AMD Developer Central
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 

More from AMD Developer Central (20)

Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...
 
Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14Mantle - Introducing a new API for Graphics - AMD at GDC14
Mantle - Introducing a new API for Graphics - AMD at GDC14
 
Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14Direct3D and the Future of Graphics APIs - AMD at GDC14
Direct3D and the Future of Graphics APIs - AMD at GDC14
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 

Recently uploaded

Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

MM-4105, Realtime 4K HDR Decoding with GPU ACES, by Gary Demos

  • 1. REALTIME 4K HDR DECODING WITH GPU ACES
    Gary Demos, Image Essence LLC
  • 2. 4K Realtime (24fps 2D) Image Bandwidth
    • EXR half-float (e.g. ACES/OCES) or 16-bit unsigned short integers (col = color component):
      - 2 Bytes/col x 3 cols (RGB) x 4096 x 2160 x 24fps = 1.27 GBytes/sec = 10.2 Gbps
    • 32-bit floats (used inside OpenCL in the GPU and within most CPU decoding steps):
      - 4 Bytes/col x 3 cols (RGB) x 4096 x 2160 x 24fps = 2.55 GBytes/sec = 20.4 Gbps
    • 10-bit DPX-packed pixels:
      - 4 Bytes/3 cols x 3 cols (RGB) x 4096 x 2160 x 24fps = 0.85 GBytes/sec = 6.8 Gbps
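The bandwidth arithmetic above can be reproduced with a few lines of C. This is a minimal sketch; the function and constant names are illustrative, not from the talk, and it uses decimal units (1 GByte = 10^9 bytes, 1 Gbps = 10^9 bits/s), which is what the slide's figures match.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per RGB pixel for the three storage formats on the slide. */
enum {
    BPP_HALF_OR_UINT16 = 2 * 3, /* 2 bytes per component x 3 components */
    BPP_FLOAT32        = 4 * 3, /* 4 bytes per component x 3 components */
    BPP_DPX10_PACKED   = 4      /* three 10-bit components in one 32-bit word */
};

/* Uncompressed image bandwidth in bytes per second (integer, so exact). */
uint64_t stream_bytes_per_sec(uint64_t bytes_per_pixel, uint64_t width,
                              uint64_t height, uint64_t fps) {
    return bytes_per_pixel * width * height * fps;
}

/* Convert bytes/sec to decimal gigabits per second. */
double to_gbps(uint64_t bytes_per_sec) {
    return (double)bytes_per_sec * 8.0 / 1e9;
}
```

For example, `stream_bytes_per_sec(BPP_FLOAT32, 4096, 2160, 24)` is 2,548,039,680 bytes/sec, i.e. about 2.55 GBytes/sec or 20.4 Gbps. Keeping the byte count in `uint64_t` avoids overflow for the 8K/120fps cases.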
  • 3. Future Frontiers
    • 2 Bytes/col x 3 cols (RGB) x 4096 x 2160 x 60fps = 3.19 GBytes/sec = 25.5 Gbps
    • 2 Bytes/col x 3 cols (RGB) x 4096 x 2160 x 120fps = 6.37 GBytes/sec = 51.0 Gbps
    • 2 Bytes/col x 3 cols (RGB) x 8192 x 4320 x 24fps = 5.10 GBytes/sec = 40.8 Gbps
    • 2 Bytes/col x 3 cols (RGB) x 8192 x 4320 x 120fps = 25.48 GBytes/sec = 203.8 Gbps
    • 3D: any of the above x2
  • 4.
    • DisplayPort 1.2 goes up to about 20 Gbps (21.6 Gbps raw link rate over four HBR2 lanes, 17.28 Gbps of payload after 8b/10b coding)
    • A FirePro W9000 has six DisplayPort 1.2 outputs
    • The demonstration system has four W9000s
    • That's 24 DisplayPort 1.2 outputs!
    • Total available pixel output is 24 x 20 Gbps = 480 Gbps
  • 5.
    • That's more than:
      - 2x (3D) 2 Bytes/col x 3 cols (RGB) x 8192 x 4320 x 120fps = 51.0 GBytes/sec = 407.7 Gbps!
      - Could work up to this in an array of displays
    • Still a few issues (at least for this author):
      - Locking playback speed with pixels from CL
      - Synchronizing audio
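A quick check of the "more than" claim, as a hedged C sketch (names are illustrative). It compares the aggregate DisplayPort capacity against the stereo 8K/120fps half-float stream, using both the slide's round 20 Gbps per output and the 17.28 Gbps DisplayPort 1.2 payload rate. It does not model blanking intervals or any other link overhead.

```c
#include <assert.h>

#define DP12_SLIDE_GBPS   20.0   /* round figure used on the slide */
#define DP12_PAYLOAD_GBPS 17.28  /* HBR2 x4 lanes after 8b/10b coding */

/* Aggregate pixel output of all DisplayPort connectors in the system. */
double output_capacity_gbps(int num_gpus, int outputs_per_gpu, double gbps_per_output) {
    return (double)num_gpus * outputs_per_gpu * gbps_per_output;
}

/* Uncompressed stream rate in decimal Gbps; views = 2 for stereo 3D.
   The leading (double) cast keeps the whole product in floating point. */
double stream_gbps(int views, int bytes_per_pixel, int width, int height, int fps) {
    return (double)views * bytes_per_pixel * width * height * fps * 8.0 / 1e9;
}
```

With four W9000s at six outputs each, the slide's figure gives 480 Gbps, and even the payload-only figure gives about 414.7 Gbps, which still exceeds the roughly 407.7 Gbps of the stereo 8K/120fps stream.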
  • 6. Realtime Floating Point ACES Decoding, Including Realtime Interactive Adjustment and RRT/ODT in the GPU
    Block diagram (data flows top to bottom):
    • Compressed bitfiles (SATA FlashRAM)
    • 2x Intel E5-2690 CPUs: floating point decoding to ACES
    • 4x FirePro W9000s: GPU processing in OpenCL
      - Sharpen/soften spatial filter
      - Transform to P3 colorspace
      - ASC CDL adjustments
      - Transform back to ACES
      - RRT and ODT in 3D LUT
      - Fix and pack pixels
    • Packed pixels ready for display
    • DVS Atomix: FIFO of frames for smooth playout
    • 4K realtime 10/12-bit RGB output
  • 7. CPU Partitioning
    • Running Scientific Linux 6.4
    • Relying on a FIFO of frames in the DVS Atomix, using the FIFO API, to smooth out the non-realtime attributes of Linux
    • Multiple decoder processes forked at startup
    • Each process retrieves its compressed bitfiles from SATA FlashRAM/SSD
    • The number of decoder processes is selected at runtime startup (tuned for performance and available memory)
  • 8. CPU Partitioning (cont.)
    • The parent process becomes the display process
    • The display process creates shared memory and uses semaphores to tell the decoder processes that buffers are available
    • Each decoder process creates a frame or range of frames
    • The display process manages shared memory and DMA to/from the GPUs and the DVS Atomix
    • The display process tells the decoder processes when buffers become available again
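The fork/shared-memory/semaphore handshake described on slides 7 and 8 can be sketched in C. This is a minimal sketch under several assumptions: Linux (process-shared unnamed POSIX semaphores in an anonymous `MAP_SHARED` mapping), one buffer per decoder, and round-robin frame assignment so the display process can consume frames in order. `decode_frame`, `run_playout`, and the tiny frame size are hypothetical stand-ins, not the talk's code. The real display process would DMA each buffer to the GPUs and then to the output card.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <semaphore.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_DECODERS 8
#define PIXELS_PER_FRAME 16 /* tiny stand-in for a real 4K frame buffer */

/* One shared buffer per decoder, handed back and forth with two semaphores. */
typedef struct {
    sem_t buffer_free;  /* posted by the display process: buffer may be refilled */
    sem_t frame_ready;  /* posted by a decoder: buffer holds a decoded frame */
    int frame_number;
    float pixels[PIXELS_PER_FRAME];
} shared_slot;

/* Hypothetical placeholder for decoding one compressed bitfile. */
static void decode_frame(int frame, float *pixels) {
    for (int i = 0; i < PIXELS_PER_FRAME; i++)
        pixels[i] = (float)frame;
}

/* Decoder d handles frames d, d+N, d+2N, ... then exits the child process. */
static void decoder_main(shared_slot *slot, int first, int step, int num_frames) {
    for (int f = first; f < num_frames; f += step) {
        sem_wait(&slot->buffer_free);
        slot->frame_number = f;
        decode_frame(f, slot->pixels);
        sem_post(&slot->frame_ready);
    }
    _exit(0); /* _exit: don't flush stdio buffers inherited from the parent */
}

/* The parent becomes the display process. Returns the number of frames that
   arrived in order with the expected contents, or -1 on bad arguments. */
int run_playout(int num_decoders, int num_frames) {
    if (num_decoders < 1 || num_decoders > MAX_DECODERS || num_frames < 0)
        return -1;
    size_t size = sizeof(shared_slot) * (size_t)num_decoders;
    shared_slot *slots = mmap(NULL, size, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (slots == MAP_FAILED)
        return -1;
    for (int d = 0; d < num_decoders; d++) {
        sem_init(&slots[d].buffer_free, 1, 1); /* pshared=1: shared across fork */
        sem_init(&slots[d].frame_ready, 1, 0);
    }
    pid_t pids[MAX_DECODERS];
    for (int d = 0; d < num_decoders; d++) {
        pids[d] = fork();
        if (pids[d] < 0)
            abort(); /* sketch: no recovery if fork fails */
        if (pids[d] == 0)
            decoder_main(&slots[d], d, num_decoders, num_frames);
    }
    int shown = 0;
    for (int f = 0; f < num_frames; f++) {
        shared_slot *s = &slots[f % num_decoders];
        sem_wait(&s->frame_ready);
        /* Real system: DMA this buffer to the GPUs, then to the output FIFO. */
        if (s->frame_number == f && s->pixels[0] == (float)f)
            shown++;
        sem_post(&s->buffer_free); /* tell the decoder its buffer is free again */
    }
    for (int d = 0; d < num_decoders; d++)
        waitpid(pids[d], NULL, 0);
    for (int d = 0; d < num_decoders; d++) {
        sem_destroy(&slots[d].buffer_free);
        sem_destroy(&slots[d].frame_ready);
    }
    munmap(slots, size);
    return shown;
}
```

With a single buffer per decoder, a decoder can run at most one frame ahead of the display. A real system would give each decoder several buffers, and the slides rely on the output card's FIFO for further smoothing.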
  • 9. GPU Partitioning
    • The OpenCL device query (clGetDeviceIDs) provides numDevices, the number of GPUs available
    • The vertical screen height is partitioned into numDevices horizontal bands
    • Four FirePro W9000 GPUs in this demonstration system
    • All GPUs share a common "context" and associated "kernels" (a single runtime-compiled CL program)
    • Each of the four GPUs is given its own "command_queue" and separate "cl_mem" buffers
  • 10. GPU Partitioning (cont.)
    • Kernel args for each cl_mem are updated for each of the four GPUs before invoking the kernel with that GPU's command_queue
    • Each GPU is given 1/4 of the screen height through enqueued writes (clEnqueueWriteBuffer) of half-float ACES
    • Each GPU's packed pixels are retrieved into the appropriate quarter of the screen height through enqueued reads (clEnqueueReadBuffer)
    • Double-buffered DMA (getbuffer/putbuffer) to the DVS Atomix using the FIFO API (the FIFO of frames smooths out the non-realtime aspects of Linux, yielding realtime playout)
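The band bookkeeping behind slides 9 and 10 reduces to integer arithmetic. The sketch below computes, for one device, its first row, row count, and the byte offsets into a half-float RGB ACES frame (6 bytes/pixel) and a 10-bit packed output frame (4 bytes/pixel). Offsets like these are what the per-GPU write and read calls would use. `gpu_band` and `band_for_device` are hypothetical names. The rounding also covers heights that do not divide evenly.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int first_row;
    int num_rows;
    size_t aces_offset_bytes;   /* into a half-float RGB ACES frame */
    size_t packed_offset_bytes; /* into a 10-bit packed (4 bytes/pixel) frame */
} gpu_band;

/* Band boundaries at floor(i * height / n) cover every row exactly once. */
gpu_band band_for_device(int device, int num_devices, int width, int height) {
    gpu_band b;
    int start = (int)((long long)device * height / num_devices);
    int end = (int)((long long)(device + 1) * height / num_devices);
    b.first_row = start;
    b.num_rows = end - start;
    b.aces_offset_bytes = (size_t)start * (size_t)width * 3 * 2;
    b.packed_offset_bytes = (size_t)start * (size_t)width * 4;
    return b;
}
```

For 4096x2160 on four GPUs, each band is 540 rows. With an odd height such as 2161, the last device picks up the extra row.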
  • 11. OpenCL Code
    • Macros are used for all math
    • For CPU code, ".h" files are included and the macros invoked
    • For GPU code, the CL source includes the same ".h" files, and the macros are invoked within each CL kernel
    • Macros are separated into various types:
      - Interaction processing: ACES to/from P3, with ASC CDL applied in P3
      - RRT (Reference Rendering Transform) processing, using either a LUT (faster but less accurate; realtime at 4K) or direct computation (slower but highly accurate; realtime at 2K)
      - ODT (Output Device Transform) processing, for the type of ODT selected
• 12. OpenCL Code (cont.)
  • The final step in CL converts 32-bit floats to fixed point and packs RGB (either 10 or 16 bits per component), adding ±1/2 LSB noise dither
  • OpenCL does not include a random number intrinsic, so random numbers for dithering are DMA'd up to the GPU for use in noise dither, using a randomizing function of frame number and scanline
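The dither-and-pack step can be sketched as plain C that would run the same way inside a kernel. Several details are assumptions, not taken from the slides: the noise table size, the xorshift32 generator used to fill it on the host, the hash used as the "randomizing function of frame number and scanline", and the DPX-style bit layout (R in bits 31..22, G in 21..12, B in 11..2, two pad bits). All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NOISE_TABLE_SIZE 4096   /* hypothetical size of the uploaded table */

/* Host side: fill a table with uniform noise in [-0.5, +0.5) LSB using
 * xorshift32; the table would be DMA'd to the GPU once. */
static void fill_noise_table(float *table, int n, uint32_t seed) {
    uint32_t s = seed ? seed : 1u;            /* xorshift state must be nonzero */
    for (int i = 0; i < n; i++) {
        s ^= s << 13; s ^= s >> 17; s ^= s << 5;
        table[i] = (float)(s >> 8) * (1.0f / 16777216.0f) - 0.5f;
    }
}

/* Hypothetical randomizing function of frame number and scanline: chooses
 * where in the table this scanline's noise starts, so the pattern moves. */
static uint32_t noise_start(uint32_t frame, uint32_t scanline) {
    uint32_t h = (frame * 2654435761u) ^ (scanline * 40503u);
    return h % NOISE_TABLE_SIZE;
}

/* Float in [0,1] -> 10-bit code, with dither noise (in LSBs) added before rounding. */
static uint32_t quantize10(float x, float noise) {
    float v = x * 1023.0f + noise + 0.5f;
    if (v < 0.0f) v = 0.0f;
    if (v > 1023.0f) v = 1023.0f;
    return (uint32_t)v;                       /* truncation == floor for v >= 0 */
}

/* DPX-style 10-bit RGB packing into one 32-bit word. */
static uint32_t pack_rgb10(uint32_t r, uint32_t g, uint32_t b) {
    return (r << 22) | (g << 12) | (b << 2);
}

/* One pixel at column x of a scanline: dither each channel, then pack. */
static uint32_t dither_pack(const float rgb[3], const float *table,
                            uint32_t frame, uint32_t scanline, uint32_t x) {
    uint32_t base = noise_start(frame, scanline) + 3u * x;
    return pack_rgb10(quantize10(rgb[0], table[base % NOISE_TABLE_SIZE]),
                      quantize10(rgb[1], table[(base + 1) % NOISE_TABLE_SIZE]),
                      quantize10(rgb[2], table[(base + 2) % NOISE_TABLE_SIZE]));
}
```

For instance, `pack_rgb10(quantize10(1.0f, 0.0f), quantize10(0.0f, 0.0f), quantize10(0.5f, 0.0f))` yields `0xFFC00800`. Offsetting into a fixed table per frame and scanline keeps the GPU side free of any generator state.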
• 13. Reasons for liking OpenCL:
  • Support for DEVICE_TYPE_CPU as well as DEVICE_TYPE_GPU
  • Vendor independence
  • Portability
  • Easily extended to automatically utilize multiple GPUs by setting up multiple command queues based upon the number of devices detected at runtime
  • Runtime compilation of CL source is often convenient
  • Excellent description of expected precision for math intrinsic functions
  • Strong support for both 32-bit and 64-bit floating point
• 14. Reasons for liking OpenCL (cont.)
  • Well-thought-out device and system query capabilities
  • get_global_id provides an excellent mechanism for parallelism without requiring further consideration of lower-level hardware organization
  • Easy specification of global, constant, and local address spaces for data
  • Pipelining control via blocking and non-blocking read and write queues, and via clFinish and kernel barriers
  • First-class support of half-float using vload_half and vstore_half
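On the GPU, `vload_half` converts stored half floats to 32-bit floats; CPU-side code handling the same half-float ACES frames needs the equivalent. On x86 CPUs with F16C, `_mm_cvtph_ps` does this in hardware; the portable, bit-exact sketch below (function name hypothetical) decodes IEEE 754 binary16 directly, including subnormals, infinities, and NaNs.

```c
#include <stdint.h>
#include <string.h>

/* IEEE 754 binary16 -> binary32, bit-exact. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t expo = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (expo == 0x1Fu) {                     /* Inf or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (expo != 0) {                  /* normal: rebias 15 -> 127 */
        bits = sign | ((expo + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {                  /* signed zero */
        bits = sign;
    } else {                                 /* subnormal: normalize the mantissa */
        expo = 113u;                         /* 127 - 15 + 1 */
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            expo--;
        }
        bits = sign | (expo << 23) | ((mant & 0x3FFu) << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);             /* type-pun without aliasing UB */
    return f;
}
```

For example, `half_to_float(0x3C00)` is 1.0 and `half_to_float(0x7BFF)` is 65504, the largest finite half float; every half value is exactly representable as a float, so no rounding is involved.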
• 15. Weaknesses of OpenCL (aka "wish list"):
  • Difficult to obtain visibility during debugging (although print statements are available on some systems with DEVICE_TYPE_CPU)
  • No detail provided by the "out of resources" error (e.g. which resources are we out of?)
• 16. Weaknesses of OpenCL (aka "wish list", cont.):
  • Lack of visibility during performance tuning
    - How much time is being spent in read/write queues to/from the CPU?
    - How full are global and constant memory?
    - How much global memory bandwidth is being utilized?
    - How full are the registers?
    - If caches are present, how effective are they on a given kernel?
    - Are there unnecessary waits that could be overlapped asynchronously?
  • The 4-, 8-, and 16-wide CL SIMD types are not mirrored in CPU SSE/AVX/F16 intrinsics
    - Were they identical, they could be used in macros included in common between CL kernels and CPU threads
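GCC and Clang offer a partial answer to the SIMD-mirroring wish: their vector extension lets CPU code declare a `float4` that supports element-wise operators like OpenCL's, and compiles to SSE on x86. A minimal sketch, assuming GCC or Clang; the function name is hypothetical. The mirror is only partial: component access differs (`.x` or `.s0` in CL, `v[0]` in GNU C), and CL built-ins such as `mad()` and `dot()` have no direct counterpart, so shared macros must stick to plain operators.

```c
/* CPU side only: GCC/Clang vector extension, 4 floats in one 16-byte vector. */
#ifndef __OPENCL_VERSION__
typedef float float4 __attribute__((vector_size(16)));
#endif

/* Shared macro: valid OpenCL C and valid GNU C, element-wise a * b + c. */
#define MAD4(a, b, c) ((a) * (b) + (c))

/* The same macro on both sides; here a per-channel gain and lift on RGBA. */
static float4 gain_lift4(float4 px, float4 gain, float4 lift) {
    return MAD4(px, gain, lift);
}
```

For example, `gain_lift4((float4){0.5f, 1.0f, 2.0f, 1.0f}, (float4){2.0f, 2.0f, 2.0f, 1.0f}, (float4){0.0f, 0.5f, 0.0f, 0.0f})` gives {1.0, 2.5, 4.0, 1.0}. Wider CL types (float8, float16) can be mirrored the same way with `vector_size(32)` and `vector_size(64)`, though whether they map to single AVX registers depends on the target.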
• 17. System Performance:
  • Limited by memory and bus bandwidth issues
    - DirectGMA will improve this
  • Plenty of GPU power is still available for realtime 4k processing when using 3D LUT RRT/ODT
  • CPU power is sufficient for wavelet-only floating point decoding at 4k
  • CPU power is sufficient for the full motion-compensated flowfield sinc-and-wavelet configuration at 2k; speed is about 1/3 realtime at 4k
  • With threads and forked processes, the system will be able to take advantage of the anticipated major increase in computational cores
• 18. CL/GL Interop Exploration:
  • Using X11 on Linux (no GLUT support)
  • Gets 10-bit depth at setup from X11 (as a configuration chosen with glXChooseFBConfig)
  • Uses GL, GLX, and a CL/GL context (some of this is recent, as of CL 1.2)
  • Reduces the amount of memory transfer required, via direct output from the GPU
  • Can take over the screen (using X11 XChangeProperty)
  • Relies on a "FrameBufferObject" and on "Acquire" and "Release" by CL (release by CL implies re-acquire by GL; clFinish and glFinish must be called correspondingly)
  • Can support 4k at 10 bits via DisplayPort 1.2 (and HDMI 1.4a via a DP-to-HDMI dongle)
  • Reportedly can be used with Mac OS X and Windows (with X11-style constructs)
• 19. CL/GL Interop Weaknesses:
  • Limited to a single GPU for CL when using a CL/GL FBO
    - Would be nice to have separate FBO quadrant output from each of the four GPUs
  • Timing is not smooth if the GPU is running near capacity
  • No locked audio sync
  • No "fifo-of-frames" to smooth out the non-smooth, non-realtime Linux behavior
    - Working on using cl_gl event sync to simulate this
• 20. Attributes of the floating point codec
  • Layered, with 5 wavelet layers up to a base layer at 1k
  • Two more layers, from 1k to 2k and from 2k to 4k, built with sinc filters, using wavelet stacks to code the up-res deltas
  • Base and up-res layers can be motion compensated (the sinc filter is phase-neutral, with sub-pixel displacement precision to 1/100 pixel)
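The layering principle (a lower-resolution layer plus coded detail exactly rebuilds the next layer up) can be illustrated with one level of a 1D Haar transform in lifting form. The Haar wavelet is chosen here only for brevity; the slides do not say which wavelet the codec uses, and its up-res layers use sinc filters rather than Haar averaging. Function names are hypothetical.

```c
#include <assert.h>

/* One level, lifting form: n inputs -> n/2 averages (the lower layer) plus
 * n/2 differences (the detail that rebuilds the upper layer). n must be even. */
static void haar_forward(const float *x, int n, float *avg, float *diff) {
    assert(n % 2 == 0);
    for (int i = 0; i < n / 2; i++) {
        diff[i] = x[2 * i + 1] - x[2 * i];     /* predict step */
        avg[i] = x[2 * i] + 0.5f * diff[i];    /* update step: pair average */
    }
}

/* Inverse: undo the update, then the predict, in reverse order. */
static void haar_inverse(const float *avg, const float *diff, int n, float *x) {
    assert(n % 2 == 0);
    for (int i = 0; i < n / 2; i++) {
        x[2 * i] = avg[i] - 0.5f * diff[i];
        x[2 * i + 1] = x[2 * i] + diff[i];
    }
}
```

Applying `haar_forward` again to `avg` produces the next coarser layer, so repeated application builds a wavelet stack; decoding runs the inverses from the base layer upward. For the input {1, 3, 2, 2, 5, 7, 0, 4}, the averages are {2, 2, 6, 2} and the differences {2, 0, 2, 4}.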
• 21. Floating Point Codec (cont.)
  • A flowfield is used at low resolution for motion displacement, also coded as a wavelet stack; it is upsized for each layer when applied
  • Floating point coding is automatically adaptive to gamma, since a floating point quantization scale is used for each image region, based on the average and minimum brightness
  • YUV encoding takes advantage of the codec's unlimited range and negative number reproduction to support the full ACES gamut and dynamic range
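Upsizing a flowfield for the next layer has one step that is easy to miss: the vectors themselves must be scaled, since a displacement of d pixels at one layer is 2d pixels at twice the resolution. A minimal sketch for the 2x steps (1k to 2k, 2k to 4k); the slides do not say how the field is resampled, so nearest-neighbour replication here is an assumption chosen for brevity (the codec may well use its sinc filters). Names are hypothetical.

```c
#include <assert.h>

/* One (dx, dy) displacement, in pixels of its own layer. */
typedef struct { float dx, dy; } flow_vec;

/* Upsize a w x h flowfield to 2w x 2h: replicate each sample (nearest
 * neighbour) and double each vector to express it in the finer layer's pixels. */
static void upsize_flow2(const flow_vec *src, int w, int h, flow_vec *dst) {
    assert(w > 0 && h > 0);
    for (int y = 0; y < 2 * h; y++)
        for (int x = 0; x < 2 * w; x++) {
            flow_vec v = src[(y / 2) * w + (x / 2)];
            dst[y * (2 * w) + x].dx = 2.0f * v.dx;
            dst[y * (2 * w) + x].dy = 2.0f * v.dy;
        }
}
```

A 1/100-pixel displacement at the base layer thus becomes 1/50 pixel at the next layer and 1/25 pixel at the one after, which is why sub-pixel precision at the coarse layers matters.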
• 22. Frontiers for using the available GPU power:
  • Spectral color processing to improve upon CIE 1931 limitations
  • More ODTs to take advantage of new HD and UHD displays, and of new projectors and projection light sources as they increase dynamic range and gamut
  • More processing in the pipeline
    - more elaborate sharpening
    - dynamic range regional contrast adaptation
    - additional interactive controls
    - adaptation to viewing surround (if not dark surround)
  • Additional work on the RRT, and on existing ODT types (in conjunction with the RRT algorithmic modifications)
• 23. Many Thanks to the AMD/ATI FirePro Professional Graphics Group for Their Support
Many thanks to the AMD/APU team for providing the 4k display
Thanks also to R&S/DVS
ACES Overview: http://www.oscars.org/science-technology/council/projects/pdf/ACESOverview.pdf
Reference papers for Gary Demos:
  • The Unfolding Merger of Television and Movie Technology, SMPTE Conference, Oct 2012
  • File and Folder Interactive Decoding, SMPTE Conference, Oct 2011, including YouTube video: http://www.youtube.com/watch?v=Ggt_8qseGtw
  • Layered Motion Compensation, SMPTE Journal, Jan 2009
• 24. DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL