MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabilities, by Srikanth Gollapudi
Presentation MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabilities, by Srikanth Gollapudi at the AMD Developer Summit (APU13) November 11-13, 2013.

    Presentation Transcript

    • OPTIMIZING FFMPEG AND HANDBRAKE USING OPENCL
      SRIKANTH GOLLAPUDI & MICHAEL WOOTTON
    • FFMPEG INTRODUCTION
        • FFMPEG is a very popular open source multimedia software library used to record, convert and stream audio and video.
        • Used by popular open source projects such as Handbrake, VLC player and Chrome.
        • One-stop solution for:
            ‒ Decoding different codec formats (audio and video)
            ‒ Handling various container formats (mp4, wmv, avi, m2ts, m2ps, etc.)
            ‒ Encoding to popular video and audio codec formats (H.264, VC-1, MPEG-2, etc.)
            ‒ Different video filtering algorithms (Deshake, Scale, Unsharp, etc.)
            ‒ Managing different pixel formats (NV12, RGB, YV12, etc.)
            ‒ Cross-platform support (Windows and Linux)
        2 | PRESENTATION TITLE | November 19, 2013 | CONFIDENTIAL
    • FFMPEG – TYPICAL USAGE SCENARIO AND PROCESSING INVOLVED
      Imagine a video edit using FFMPEG:
        Video Decode -> Video shake removal -> Sharp/Blur -> Scale -> Video Encode
    • FFMPEG – TYPICAL USAGE SCENARIO AND PROCESSING INVOLVED
      Imagine a video edit using FFMPEG:
        Video Decode -> Video shake removal -> Sharp/Blur -> Scale -> Video Encode
      Diagram labels: AMD APU (heterogeneous solution) with HW Decoder, GPU, CPU and HW Encoder.
    • FFMPEG – SCOPE FOR ACCELERATION
      Leverage heterogeneous compute
        • Accelerate video decode and encode using HW accelerators
            ‒ The decode and encode load is taken off the CPU
            ‒ Power savings => longer battery life
        • Accelerate video processing filters using the GPU
            ‒ Increased performance compared to the CPU implementation
            ‒ Application runs at higher fps
            ‒ Possible to apply more filters to achieve better video quality
        • Use the CPU for serial processing and control
            ‒ Efficient usage of resources
    • FFMPEG – OUR WORK
        • AMD and MulticoreWare Inc. worked on accelerating FFMPEG
        • Enable usage of the hardware decoder
            ‒ To support decoding of H.264, VC-1, MPEG-2 and MPEG-4 Part 2 codecs
            ‒ Windows
                ‒ Integration of the DXVA2 API into ffmpeg.exe
                ‒ DXVA2 functionality was already available in ffmpeg's libavcodec library
                ‒ Extremely difficult for application developers to make use of the DXVA2 API in libavcodec
                    ‒ Needs a deep understanding of the DXVA2 API and codec-specific knowledge
                ‒ Coded up all the steps needed to use the HW decoder via DXVA2 in the ffmpeg.exe app
                ‒ Created a command line option for ffmpeg.exe to enable HW-assisted decode
        • Make use of the DirectX® 9 to OpenCL™ interop APIs available in OpenCL™ 1.2
            ‒ This ensures the decoded frame is retained in GPU memory and passed on to the OpenCL™ filter
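The slide does not name the command line option that was added. As a rough illustration only, the sketch below assembles an ffmpeg invocation using `-hwaccel dxva2`, which is the spelling used by later upstream FFmpeg releases and may differ from the patch described here. The helper name `build_transcode_command`, the file names and the choice of `libx264` as encoder are assumptions made for the example.

```python
def build_transcode_command(src, dst, width, height, hw_decode=True):
    """Return an ffmpeg argument list for a decode -> scale -> encode run."""
    cmd = ["ffmpeg"]
    if hw_decode:
        # Assumption: "-hwaccel dxva2" is the option name in later upstream
        # FFmpeg; the slides do not say what their option was called.
        # Input options must appear before the "-i" they apply to.
        cmd += ["-hwaccel", "dxva2"]
    cmd += ["-i", src]
    # Resize with the standard scale filter (width:height).
    cmd += ["-vf", f"scale={width}:{height}"]
    # Software H.264 encode; the hardware encoder is listed as future work.
    cmd += ["-c:v", "libx264", dst]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_transcode_command("input.mp4", "output.mp4", 1280, 720)))
```

The list form can be passed directly to `subprocess.run` on a machine where ffmpeg is installed and DXVA2 is available (Windows only).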
    • FFMPEG – OUR WORK
        • Introduced OpenCL™ in ffmpeg
            ‒ Created OpenCL™ infrastructure in libavutil to enable usage of OpenCL™ in ffmpeg
        • Acceleration of video processing filters on the GPU using OpenCL™
            ‒ Added OpenCL™ implementations for the following filters in libavfilter
                ‒ Deshake: helps remove camera shake from hand-holding a camera, moving on a vehicle, etc.
                ‒ Unsharp: sharpen or blur the input video
                ‒ Scale: scale (resize) the input video
                ‒ Denoise: high precision/quality 3D denoise filter; aims to reduce image noise, producing smooth images
                ‒ Yadif: deinterlace the input video
                ‒ Tinterlace: temporal field interlacing
                ‒ Gradfun: fix the banding artifacts introduced by truncation to 8-bit color depth
        • Optimization of the ffmpeg pipeline to run decode, filters and encode in parallel
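The last bullet, running decode, filters and encode in parallel, can be pictured as a chain of stages connected by bounded queues. The following is a minimal sketch of that structure, not ffmpeg's actual implementation. `run_pipeline`, `stage` and the toy stage functions are hypothetical names, and Python threads are used only to show the shape of the overlap (in the real pipeline the stages map to the hardware decoder, the GPU and the encoder, which work concurrently).

```python
import queue
import threading

SENTINEL = object()  # unique end-of-stream marker


def stage(work, inbox, outbox):
    """Apply `work` to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream downstream
            return
        outbox.put(work(item))


def run_pipeline(packets, decode, apply_filter, encode, depth=4):
    """Run decode -> filter -> encode with each stage on its own thread."""
    # Bounded queues limit how many frames are in flight between stages.
    q_pkt, q_raw, q_flt, q_out = (queue.Queue(maxsize=depth) for _ in range(4))

    def feed():
        for packet in packets:
            q_pkt.put(packet)
        q_pkt.put(SENTINEL)

    threads = [
        threading.Thread(target=feed),
        threading.Thread(target=stage, args=(decode, q_pkt, q_raw)),
        threading.Thread(target=stage, args=(apply_filter, q_raw, q_flt)),
        threading.Thread(target=stage, args=(encode, q_flt, q_out)),
    ]
    for t in threads:
        t.start()

    # The caller drains the final queue, so no stage can block forever.
    results = []
    while True:
        item = q_out.get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(range(5), lambda p: p * 2, lambda f: f + 1, str)
    print(out)
```

Because each stage is a single thread reading a FIFO queue, frame order is preserved; while frame N is being encoded, frame N+1 can be filtered and frame N+2 decoded.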
    • FFMPEG – PERFORMANCE
        • Performance numbers of the transcode pipeline using ffmpeg on an A10-6800K APU
        [Bar chart: FPS (y-axis, 0 to 60) for "Accelerated ffmpeg" vs. "Original ffmpeg (CPU)". Bar labels as extracted: 55, 57, 29, 22, 1.3, 23, 16, 1.2; the category names and the bar each label belongs to did not survive extraction.]
    • FFMPEG – STATUS
        • FFmpeg 2.0 contains the OpenCL work
            ‒ OpenCL framework in libavutil
            ‒ Deshake and unsharp OpenCL implementations in libavfilter
        • DXVA2 patch is under review
        • Further optimizations and tuning in progress for other filters
    • FFMPEG – CHALLENGES
        • Introducing OpenCL into ffmpeg
            ‒ Reviewers were not well versed in OpenCL
        • Retaining data in GPU memory in the pipeline
            ‒ Requires architectural changes to the ffmpeg software
        • Recompilation of kernels on every run
            ‒ ffmpeg does not allow saving compiled binary files on the local machine
        • The ffmpeg software needs pipeline-level optimizations to benefit from a heterogeneous platform
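To make the recompilation cost concrete: an application that is allowed to write to local storage can avoid it by caching the compiled kernel binary, keyed by everything that affects the build. The sketch below shows that general idea only; it is not something the slides say was done in ffmpeg. `load_or_build`, `cache_key` and `fake_compile` are hypothetical names, and `fake_compile` stands in for a real OpenCL build (in OpenCL the binary would be retrieved with `clGetProgramInfo(CL_PROGRAM_BINARIES)` and reloaded with `clCreateProgramWithBinary`).

```python
import hashlib
import os
import tempfile


def cache_key(source, device_name, build_options):
    """Hash every input that can change the compiled binary."""
    h = hashlib.sha256()
    for part in (source, device_name, build_options):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


def load_or_build(source, device_name, build_options, compile_fn, cache_dir):
    """Return (binary, was_cached); compile only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    name = cache_key(source, device_name, build_options) + ".bin"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read(), True
    binary = compile_fn(source, build_options)
    with open(path, "wb") as f:
        f.write(binary)
    return binary, False


def fake_compile(source, build_options):
    """Stand-in for a real kernel build (assumption for this sketch)."""
    return ("compiled:" + build_options + ":" + source).encode("utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        src = "__kernel void noop(void) {}"
        for run in (1, 2):
            _, cached = load_or_build(src, "example-device", "-O2",
                                      fake_compile, cache_dir)
            print(f"run {run}: cached={cached}")
```

Including the device name and build options in the key matters because a binary built for one device or option set is generally not valid for another.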
    • FFMPEG – FUTURE WORK
        • Add support for HW-assisted encode (H.264)
            ‒ AMD is going to release a C++ API, called AMF, to access the HW encoder
            ‒ More details available in tomorrow's talk, "Innovating with AMD Multimedia Technologies" (MM-4095)
        • Optimize the OpenCL implementation of filters for better performance
        • Explore using HSA features to boost performance
        • Optimize memory transfers
            ‒ Retain buffers in device memory across the Decode, Filter and Encode modules
    • Handbrake Improvements
    • WHAT IS HANDBRAKE?
        • Open source video transcoder
        • Converts videos from most popular formats
        • Selectable output format and bitrates
        • Video resizing
        • Video filters
            ‒ Deinterlacing
            ‒ Decomb
            ‒ Deblock
            ‒ Grayscale
            ‒ Cropping
    • CURRENT ENHANCEMENTS
        • Hardware video decode
            ‒ Input video decoded via DXVA2
            ‒ Utilizes UVD on AMD GPUs and APUs
        • OpenCL™ accelerated video resolution changes
            ‒ Video frames are resized using OpenCL kernels
            ‒ Example: 1080p converted to 720p
    • IMPROVING OPENCL SCALING
        • The OpenCL scaling enhancement was under-performing
        • Identified issues:
            ‒ Image format conversion
            ‒ Buffer staging
            ‒ Separable scaling using two kernels
    • OPENCL SCALING IMPROVEMENTS
      Reduce memory copies:
        • Modify the existing HandBrake buffer system
        • Identify which buffers will contain video data (vs. audio, captions, etc.)
        • Video buffers are allocated out of pinned host memory
        • Non-OpenCL-aware code writes data to the correct place
        • Kernels can directly read/write the buffers via zero copy
    • OPENCL SCALING IMPROVEMENTS
      Switch to a single kernel:
        • Eliminate the two-kernel approach
        • Process blocks of data rather than lines
        • Support HandBrake native image packing
        • Use LDS to further reduce global memory accesses
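The difference between the two-kernel (separable) approach and a single kernel can be shown with a small CPU reference. The sketch below assumes bilinear interpolation on a single-channel image stored as a list of rows; the slides do not say which scaling filter HandBrake uses, and all function names are hypothetical. The two-pass version writes and re-reads a full intermediate image, which is the extra memory traffic the single-pass version avoids. LDS usage is a GPU detail that plain Python cannot express.

```python
def taps(dst_index, src_size, dst_size):
    """Return (i0, i1, w) so that sample = (1 - w) * src[i0] + w * src[i1]."""
    # Map the destination pixel centre back into source coordinates.
    pos = (dst_index + 0.5) * src_size / dst_size - 0.5
    pos = min(max(pos, 0.0), src_size - 1.0)  # clamp at the image border
    i0 = int(pos)
    i1 = min(i0 + 1, src_size - 1)
    return i0, i1, pos - i0


def scale_two_pass(img, out_w, out_h):
    """Separable scaling: a horizontal pass, then a vertical pass."""
    in_h, in_w = len(img), len(img[0])
    # Pass 1 ("kernel 1"): resize every row, producing in_h x out_w values.
    tmp = []
    for row in img:
        new_row = []
        for x in range(out_w):
            x0, x1, wx = taps(x, in_w, out_w)
            new_row.append((1 - wx) * row[x0] + wx * row[x1])
        tmp.append(new_row)
    # Pass 2 ("kernel 2"): read the intermediate image back and resize columns.
    out = []
    for y in range(out_h):
        y0, y1, wy = taps(y, in_h, out_h)
        out.append([(1 - wy) * tmp[y0][x] + wy * tmp[y1][x]
                    for x in range(out_w)])
    return out


def scale_single_pass(img, out_w, out_h):
    """One pass: each output pixel reads its 2x2 source block directly."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        y0, y1, wy = taps(y, in_h, out_h)
        row = []
        for x in range(out_w):
            x0, x1, wx = taps(x, in_w, out_w)
            top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
            bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


if __name__ == "__main__":
    # 9x6 test image scaled by the same 1.5:1 ratio as 1080p -> 720p.
    image = [[float(x + 10 * y) for x in range(9)] for y in range(6)]
    a = scale_two_pass(image, 6, 4)
    b = scale_single_pass(image, 6, 4)
    print(max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)))
```

Both functions compute the same result; the single-pass form simply never materializes the intermediate image. On a GPU, a work-group processing a block of output pixels could stage the corresponding source block in LDS once and reuse it.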
    • RESULTS
        • The single kernel completes quickly
        • No extra memory copies are required
        • Kernel execution time to scale one frame (1080p -> 720p)*
            ‒ AMD A10-6800K: 2.4 ms
            ‒ AMD HD7750: 1.0 ms
        • Application performance on A10-6800K

            Feature          Performance (FPS)   Improvement over SW
            Software         36.08               0.0%
            Scaling          39.64               9.9%
            UVD              40.53               12.3%
            Scaling + UVD    44.95               23.9%

        * All times measured on a development system
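The "Improvement over SW" column can be recomputed from the FPS column as (fps / software fps - 1) x 100. The short sketch below does that using the figures from the table; the function names are assumptions for the example. The Scaling and UVD rows reproduce the slide's percentages, while the Scaling + UVD row recomputes to about 24.6% against the 23.9% shown on the slide; the slides do not explain the difference. The sketch also converts the per-frame kernel times into the frame rate the scaling kernel alone would allow, which shows that scaling is a small part of the overall transcode time.

```python
BASELINE_FPS = 36.08  # "Software" row of the table

# feature -> (measured FPS, improvement reported on the slide in percent)
REPORTED = {
    "Scaling": (39.64, 9.9),
    "UVD": (40.53, 12.3),
    "Scaling + UVD": (44.95, 23.9),
}


def improvement_percent(fps, baseline=BASELINE_FPS):
    """Relative speed-up over the software-only run, in percent."""
    return (fps / baseline - 1.0) * 100.0


def kernel_fps_ceiling(kernel_ms):
    """Frames per second if the scaling kernel were the only cost."""
    return 1000.0 / kernel_ms


if __name__ == "__main__":
    for name, (fps, slide_pct) in REPORTED.items():
        print(f"{name}: recomputed {improvement_percent(fps):.1f}%, "
              f"slide {slide_pct}%")
    for device, ms in (("A10-6800K", 2.4), ("HD7750", 1.0)):
        print(f"{device}: {kernel_fps_ceiling(ms):.0f} frames/s from scaling alone")
```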
    • THANK YOU
      Questions
    • DISCLAIMER & ATTRIBUTION
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

      The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

      AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

      AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

      ATTRIBUTION
      © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.