MM-4096, x265: Open Source H.265/HEVC Video Encoder, by Steve Borho

Presentation MM-4096 by Steve Borho at the AMD Developer Summit (APU13) November 11-13, 2013.

Published in: Technology, Education
Transcript of "MM-4096, x265: Open Source H.265/HEVC Video Encoder, by Steve Borho"

  1. X265 OPEN SOURCE H.265 ENCODER: OPTIMIZATION DETAILS
  2. X265 OPEN SOURCE H.265 ENCODER: OPTIMIZATION DETAILS
  3. HEVC wrinkles
  4. H.265/HEVC FINALIZED JANUARY 25, 2013
     NOTABLE CHANGES FROM H.264
     • H.264’s 16x16 macroblocks replaced with 64x64 CUs and QuadTrees
       ‒ Coding QuadTree can be recursively split down to 8x8 blocks
       ‒ At all levels, the coding blocks can choose inter or intra prediction
       ‒ The final coding blocks can be further split
       ‒ The residual is signaled in a second QuadTree, which can have more depth than the coding QT
     • Inter prediction has more accuracy
       ‒ HPEL filter has 8 taps, QPEL has 7 taps (H.264 has 6-tap HPEL and averaged QPEL)
       ‒ Merge candidates replace the direct and skip modes of H.264
       ‒ AMVP allows motion prediction to be selected from a list; in H.264 it was entirely implicit
     4 | PRESENTATION TITLE | NOVEMBER 19, 2013 | CONFIDENTIAL
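The recursive split described on slide 4 can be sketched in a few lines of Python. This is only an illustration, not x265 code: the function name `split_cu`, the `should_split` callback, and the toy decision rule are all assumptions made for the example. It walks one 64x64 CU down to at most 8x8 leaves and visits the four children in Z-scan order (top-left, top-right, bottom-left, bottom-right).

```python
def split_cu(x, y, size, should_split, min_size=8):
    """Yield (x, y, size) leaf coding blocks of one CU in Z-scan order.

    should_split is a hypothetical callback standing in for the encoder's
    real split decision, which this sketch does not model.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        # Z-scan order: top-left, top-right, bottom-left, bottom-right
        for dy in (0, half):
            for dx in (0, half):
                yield from split_cu(x + dx, y + dy, half, should_split, min_size)
    else:
        yield (x, y, size)


def toy_rule(x, y, size):
    """Toy rule (assumption): keep splitting only the block at the origin."""
    return x == 0 and y == 0


leaves = list(split_cu(0, 0, 64, toy_rule))
for leaf in leaves:
    print(leaf)
```

Whatever rule is plugged in, the leaves always tile the full 64x64 area, which is a useful sanity check when experimenting with different split decisions.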
  5. H.265/HEVC FINALIZED JANUARY 25, 2013
     NOTABLE CHANGES FROM H.264
     • More intra predictions
       ‒ DC and planar modes, similar to H.264
       ‒ 33 angular predictions with emphasis on near-vertical and near-horizontal angles
       ‒ 35 predictions in total (for all block sizes from 32x32 to 4x4) but few special cases
     • Sample Adaptive Offset loop filter for reduced compression artifacts
  6. H.265/HEVC PARALLELIZATION CONSIDERATIONS
     NOTABLE CHANGES FROM H.264
     • WaveFront Parallel Processing
       ‒ Each row of largest CU blocks can be encoded in parallel, with a two block lag to row above
       ‒ The CABAC state of block 2 is communicated to block 0 of row below
       ‒ <1% loss of compression efficiency, much more efficient than slices or tiles
     • Tiles: split each frame into regular rectangular parts, encode each in parallel
     • Deblocking only on 8x8 boundaries, and better ordering of operations
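The two-block lag of wavefront processing on slide 6 can be illustrated with a small scheduling model. This is a sketch under simplifying assumptions that are not in the slides: every block takes exactly one time step, and there are always enough threads. The function name `wpp_schedule` is made up for the example.

```python
def wpp_schedule(rows, cols, lag=2):
    """Earliest start step for each block, assuming one step per block.

    A block may start once its left neighbour is finished and the row
    above is `lag` blocks ahead (clamped at the end of the row).
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(start[r][c - 1] + 1)  # left neighbour done
            if r > 0:
                above = min(c + lag - 1, cols - 1)  # block that must be done above
                deps.append(start[r - 1][above] + 1)
            start[r][c] = max(deps, default=0)
    return start


# 1080p with 64x64 blocks: 30 columns by 17 rows (1920/64 and ceil(1080/64))
sched = wpp_schedule(17, 30)
total_steps = sched[-1][-1] + 1
print("serial steps:", 17 * 30, "wavefront steps:", total_steps)
```

In this idealized model, each row starts two steps after the row above. Real encode times vary from block to block, so actual wavefronts stall more than the model suggests.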
  7. H.265/HEVC PARALLELIZATION CONSIDERATIONS
     THE FINE PRINT
     • Larger block sizes reduce the effectiveness of frame parallelism
       ‒ Only a quarter as many block rows as H.264 for the same video resolution
       ‒ After accounting for deblocking and SAO, there is a three row (192 line) lag between references
       ‒ Wavefront analysis or tiles must be used in conjunction with frame parallelism to make up for this
       ‒ A high percentage of B frames to P frames alleviates this bottleneck
     • Large blocks increase serial operations and add longer data dependencies
       ‒ Each CU in the quad-tree must be analyzed in Z-scan order
       ‒ Since each CU can choose intra, all prior blocks must generate recon pixels, with no shortcuts
       ‒ Variations in CU encode times reduce the effectiveness of wavefront analysis by causing stalls
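The "quarter of the block rows" and "192 line" figures on slide 7 follow from simple arithmetic, shown here for a 1080-line frame. The helper name `block_rows` and the choice of 1080p as the example resolution are assumptions for illustration.

```python
import math


def block_rows(height, block_size):
    """Number of block rows needed to cover a frame of the given height."""
    return math.ceil(height / block_size)


h264_rows = block_rows(1080, 16)  # 16x16 macroblock rows
hevc_rows = block_rows(1080, 64)  # 64x64 CU rows
lag_lines = 3 * 64                # three-row reference lag, in pixel lines

print(h264_rows, hevc_rows, lag_lines)
```

With only a quarter as many rows available, a three-row lag consumes a much larger share of each frame than the equivalent lag would in H.264.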
  8. Introducing x265
  9. X265: A SHORT HISTORY
     • x265 Consortium founded in April of 2013
       ‒ Dual commercial and GPLv2+ license
       ‒ Development primarily centered in Chennai, India with contributions from China and US
       ‒ Started from the HEVC reference encoder (HM), less than half of HM source remains today
       ‒ Achieved 1080p 15fps in June
       ‒ Public announcement and first open source release in July
     • Optimizations
       ‒ WPP wavefront CTU analysis and frame parallelism
       ‒ Compiler intrinsic SIMD based performance primitives
       ‒ Hand-written assembly performance primitives
       ‒ Data flow improvements, early outs, RDO reductions
     • Today
       ‒ 1080p@30fps or 720p@200fps on 16-core SandyBridge Xeon
  10. X265: A SHORT HISTORY
     • Ecosystem
       ‒ Licensed to reuse x264 source code and algorithms
       ‒ Open development on mailing list and IRC
       ‒ Public repositories on Bitbucket and VideoLan.org
       ‒ Integration into VLC, libav, ffmpeg, and Handbrake in various stages of completion
     • x264 feature adoption
       ‒ Lookahead / slicetype decision and scene cut detection
       ‒ Motion estimation and bitcost functions
       ‒ CLI and public C interface
       ‒ Assembly primitives for SAD, SATD, SSD, etc.
       ‒ ABR and CRF rate control; VBV adoption in progress by an open source contributor
     • It took eight years for x264 to dominate the H.264 encoding market
       ‒ We would like to achieve dominance in the HEVC market sooner
  11. Encoding and GPUs
  12. GPU CONSIDERATIONS
     A SAD HISTORY
     • Historically, GPUs have been poor for video encoding
       ‒ Intra prediction requires blocks above and to the left to be fully encoded and decoded
       ‒ Inter prediction requires blocks above and to the left to be fully analyzed
       ‒ Rate distortion optimizations require all blocks to be encoded in scan order
       ‒ Together, these dependencies severely limit the amount of parallelism that can be exposed to the GPU
     • Encoder data dependencies are complex
       ‒ Copying data to and from GPU device memory generally outweighs any performance improvements
       ‒ Even zero copy memory is insufficient; the CPU and GPU must share structures at full speed
     • Previous attempts at GPU encoding take short cuts
       ‒ One can ignore some of these dependencies at the cost of compression efficiency and quality
       ‒ In x264, we only used the GPU for lookahead analysis that has no intra and RDO dependencies
  13. APU CONSIDERATIONS
     A WELL BALANCED COMPUTE PROCESSOR
     • Heterogeneous architecture
       ‒ GPU compute units can perform high bandwidth operations and highly parallel operations
       ‒ CPU performs necessary serial and logistical operations
       ‒ CPU and GPU can see each other’s memory
     • x265 opportunity
       ‒ Via WPP and frame parallelism we can expose two dozen parallel CU blocks to be encoded
       ‒ Each parallel CU block requires recursive analysis
       ‒ Control must transfer between the CPU and GPU many times to complete analysis
       ‒ GPU performs all cost estimates for intra and inter compression, loop filters, and pixel weighting
       ‒ CPU handles QT split and encode decisions, entropy encoding, and dependency tracking
       ‒ Many CUs can be busy on the GPU at once; only four may use the CPU cores at a time
       ‒ Making use of the GPU compute units with minimal CPU overhead is the key
  14. DISCLAIMER & ATTRIBUTION
     The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
     The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
     AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
     AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
     ATTRIBUTION
     © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.