Optimizing FFMPEG and Handbrake for Hardware Acceleration

OPTIMIZING FFMPEG AND HANDBRAKE USING OPENCL

SRIKANTH GOLLAPUDI & MICHAEL WOOTTON

FFMPEG

INTRODUCTION

•  FFMPEG is a very popular open source multimedia software library used to record, convert and stream audio and video.
•  Used by popular open source projects such as HandBrake, VLC player, Chrome, etc.
•  Single-stop solution for
   ‒  Decoding different codec formats (audio and video)
   ‒  Handling various container formats (mp4, wmv, avi, m2ts, m2ps, etc.)
   ‒  Encoding to popular video and audio codec formats (H.264, VC-1, MPEG2, etc.)
   ‒  Different video filtering algorithms (Deshake, Scale, Unsharp, etc.)
   ‒  Managing different pixel formats (NV12, RGB, YV12, etc.)
   ‒  Cross-platform support (Windows and Linux)

2 | PRESENTATION TITLE | November 19, 2013 | CONFIDENTIAL

FFMPEG – TYPICAL USAGE SCENARIO AND PROCESSING INVOLVED

Imagine a video edit using FFMPEG:

Video Decode → Video shake removal → Sharp/Blur → Scale → Video Encode

FFMPEG – TYPICAL USAGE SCENARIO AND PROCESSING INVOLVED

Imagine a video edit using FFMPEG, run on an AMD APU as a heterogeneous solution:

Video Decode → Video shake removal → Sharp/Blur → Scale → Video Encode

[Diagram: the pipeline stages mapped onto the AMD APU; hardware blocks labeled HW Decoder, GPU, CPU and HW Encoder.]

FFMPEG – SCOPE FOR ACCELERATION

Leverage heterogeneous compute

•  Accelerate video decode and encode using HW accelerators
   ‒  Decode and encode load is taken off the CPU
   ‒  Power savings => longer battery life
•  Accelerate video processing filters using the GPU
   ‒  Increased performance compared to the CPU implementation
   ‒  Application runs at higher fps
   ‒  Possible to apply more filters to achieve better video quality
•  Use the CPU for serial processing and control
   ‒  Efficient usage of resources


FFMPEG – OUR WORK

•  AMD and MulticoreWare Inc. worked on accelerating FFMPEG
•  Enable usage of the hardware decoder
   ‒  To support decoding of H.264, VC-1, MPEG2 and MPEG4 Part 2 codecs
   ‒  Windows
      ‒  Integration of the DXVA2 API into ffmpeg.exe
      ‒  DXVA2 functionality was already available in ffmpeg's libavcodec library
      ‒  Extremely difficult for application developers to make use of the DXVA2 API in libavcodec
         ‒  Needs deep understanding of the DXVA2 API and codec-specific knowledge
      ‒  Coded up all the steps needed to use the HW decoder via DXVA2 in the ffmpeg.exe app
      ‒  Created a command line option for ffmpeg.exe to enable HW-assisted decode
•  Make use of the DirectX(R) 9 to OpenCL™ interop APIs available in OpenCL™ 1.2
   ‒  This ensures the decoded frame is retained in GPU memory and passed on to the OpenCL™ filter


FFMPEG – OUR WORK

•  Introduced OpenCL™ in ffmpeg
   ‒  Created OpenCL™ infrastructure in libavutil to enable usage of OpenCL™ in ffmpeg
•  Acceleration of video processing filters on the GPU using OpenCL™
   ‒  Added OpenCL™ implementations for the following filters in libavfilter
      ‒  Deshake - Helps remove camera shake from hand-holding a camera, moving on a vehicle, etc.
      ‒  Unsharp - Sharpen or blur the input video
      ‒  Scale - Scale (resize) the input video
      ‒  Denoise - High precision/quality 3D denoise filter; aims to reduce image noise, producing smooth images
      ‒  Yadif - Deinterlace the input video
      ‒  Tinterlace - Temporal field interlacing
      ‒  Gradfun - Fix the banding artifacts introduced by truncation to 8-bit color depth
•  Optimization of the ffmpeg pipeline to run decode, filters and encode in parallel
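
Running decode, filters and encode in parallel can be pictured as three stages connected by bounded queues, so that one stage works on frame N while the next handles frame N-1. The following is a minimal sketch of that idea in Python, not ffmpeg's actual implementation; the names `run_pipeline`, `decode`, `filt` and `encode` are hypothetical stand-ins for the real stages.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def _stage(func, inbox, outbox):
    """Apply func to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(packets, decode, filt, encode, depth=4):
    """Run decode -> filter -> encode as three concurrent stages.

    Bounded queues (maxsize=depth) keep the stages overlapped without
    letting a fast stage buffer an unbounded number of frames.
    """
    q_in, q_dec, q_flt, q_out = (queue.Queue(maxsize=depth) for _ in range(4))

    def feed():
        for packet in packets:
            q_in.put(packet)
        q_in.put(_DONE)

    threads = [
        threading.Thread(target=feed),
        threading.Thread(target=_stage, args=(decode, q_in, q_dec)),
        threading.Thread(target=_stage, args=(filt, q_dec, q_flt)),
        threading.Thread(target=_stage, args=(encode, q_flt, q_out)),
    ]
    for t in threads:
        t.start()

    results = []
    while True:  # drain the last queue on the calling thread
        item = q_out.get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    frames = run_pipeline(range(5),
                          decode=lambda p: "frame%d" % p,
                          filt=lambda f: f + "+filtered",
                          encode=lambda f: f + "+encoded")
    print(frames)
```

Each stage is a single worker reading a FIFO queue, so frame order is preserved. In a heterogeneous setup the three workers would correspond to the hardware decoder, the GPU filters and the encoder.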


FFMPEG – PERFORMANCE

•  Performance numbers of the transcode pipeline using ffmpeg on the A10-6800K APU

[Figure: bar chart, FPS axis from 0 to 60, comparing Accelerated ffmpeg with Original ffmpeg (CPU). Bar labels: 55, 57, 29, 22, 1.3, 23, 16, 1.2.]

FFMPEG – STATUS

•  FFmpeg 2.0 contains the OpenCL work
   ‒  OpenCL framework in libavutil
   ‒  Deshake and unsharp OpenCL implementations in libavfilter
•  DXVA2 patch is under review
•  Further optimizations and tuning in progress for other filters
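
As an illustration of how the OpenCL filter paths were selected in the FFmpeg 2.x series, the sketch below assembles a transcode command line in Python. It assumes an FFmpeg 2.x build configured with `--enable-opencl` and with libx264 available; in those releases the `deshake` and `unsharp` filters accepted an `opencl=1` option. Later FFmpeg releases reworked the OpenCL filters, so these options may not exist in a current build. The helper name `build_ffmpeg_command` and the file names are hypothetical.

```python
def build_ffmpeg_command(src, dst, use_opencl=True, width=1280, height=720):
    """Return an argv list for a deshake -> unsharp -> scale transcode.

    use_opencl appends the FFmpeg 2.x style "opencl=1" option to the two
    filters that had OpenCL implementations in that release series.
    """
    flag = "=opencl=1" if use_opencl else ""
    filters = [
        "deshake" + flag,
        "unsharp" + flag,
        "scale={}:{}".format(width, height),  # CPU scaler in both cases
    ]
    return ["ffmpeg", "-i", src, "-vf", ",".join(filters),
            "-c:v", "libx264", dst]


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("input.mp4", "output.mp4")))
```

Passing the list to `subprocess.run` would execute the transcode. Building the list separately makes it easy to compare the CPU and OpenCL filter paths on the same input.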


FFMPEG – CHALLENGES

•  Introducing OpenCL into ffmpeg
   ‒  Reviewers were not well versed in OpenCL
•  Retaining data in GPU memory in the pipeline
   ‒  Requires architectural changes to the ffmpeg software
•  Recompilation of kernels on every run
   ‒  ffmpeg does not allow saving compiled binary files on the local machine
•  ffmpeg needs pipeline-level optimizations to benefit from a heterogeneous platform
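
For context on the recompilation challenge: outside of ffmpeg, a common way to avoid rebuilding kernels on every run is to cache the compiled binary on disk, keyed by everything that affects the build. The sketch below shows that general technique in Python. It is not something ffmpeg does, since the slide notes that saving binaries was not permitted. `compile_fn` is a hypothetical stand-in for the real OpenCL build step, and the cache layout is an assumption.

```python
import hashlib
import os
import tempfile


def cache_key(source, device_name, build_options):
    """Hash every input that can change the compiled binary."""
    h = hashlib.sha256()
    for part in (source, device_name, build_options):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


def load_or_compile(source, device_name, build_options, compile_fn, cache_dir):
    """Return a kernel binary, invoking compile_fn only on a cache miss.

    compile_fn(source, build_options) -> bytes is a hypothetical stand-in
    for building the program and reading back its binary.
    """
    os.makedirs(cache_dir, exist_ok=True)
    key = cache_key(source, device_name, build_options)
    path = os.path.join(cache_dir, key + ".bin")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    binary = compile_fn(source, build_options)
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(binary)
    os.replace(tmp_path, path)  # rename so readers never see a partial file
    return binary


if __name__ == "__main__":
    def fake_compile(src, opts):
        print("compiling...")
        return b"BINARY:" + src.encode("utf-8")

    cache = tempfile.mkdtemp()
    load_or_compile("__kernel void k() {}", "ExampleDevice", "", fake_compile, cache)
    load_or_compile("__kernel void k() {}", "ExampleDevice", "", fake_compile, cache)
```

In the usage example, `fake_compile` runs only for the first call; the second call is served from the cache directory. Including the device name in the key matters because a binary built for one GPU is generally not valid on another.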


FFMPEG – FUTURE WORK

•  Add support for HW-assisted encode (H.264)
   ‒  AMD is going to release a C++ API, called AMF, to access the HW encoder
   ‒  More details available in the talk tomorrow: Innovating with AMD Multimedia Technologies (MM-4095)
•  Optimize the OpenCL implementation of filters for better performance
•  Explore using HSA features to boost performance
•  Optimize memory transfers
   ‒  Retain buffers in device memory across the Decode, Filter and Encode modules


WHAT IS HANDBRAKE?

•  Open source video transcoder
•  Converts videos from most popular formats
•  Selectable output format and bitrates
•  Video resizing
•  Video filters
   ‒  Deinterlacing
   ‒  Decomb
   ‒  Deblock
   ‒  Grayscale
   ‒  Cropping


CURRENT ENHANCEMENTS

•  Hardware video decode
   ‒  Input video decoded via DXVA2
   ‒  Utilizes UVD on AMD GPUs and APUs
•  OpenCL™ accelerated video resolution changes
   ‒  Video frames are resized using OpenCL kernels
   ‒  Example: 1080p converted to 720p


IMPROVING OPENCL SCALING

•  The OpenCL scaling enhancement was under-performing
•  Identified issues:
   ‒  Image format conversion
   ‒  Buffer staging
   ‒  Separable scaling using two kernels
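
Separable scaling resizes the image horizontally in one pass and vertically in a second, which needs an intermediate image between the two kernels. The sketch below uses plain Python rather than OpenCL to show that a two-pass resize and a single-pass resize produce the same pixels, so merging the passes removes the intermediate buffer without changing the output. It assumes bilinear filtering with pixel-center alignment and edge clamping; the slides do not state which filter HandBrake's kernels use, and all function names are hypothetical.

```python
import math


def _taps(dst_index, src_size, dst_size):
    """Return two (source index, weight) pairs for one output coordinate."""
    pos = (dst_index + 0.5) * src_size / dst_size - 0.5
    lo = math.floor(pos)
    frac = pos - lo

    def clamp(i):
        return min(max(i, 0), src_size - 1)

    return (clamp(lo), 1.0 - frac), (clamp(lo + 1), frac)


def resize_two_pass(img, out_w, out_h):
    """Horizontal pass into an intermediate image, then a vertical pass."""
    in_h, in_w = len(img), len(img[0])
    tmp = [[sum(w * row[i] for i, w in _taps(x, in_w, out_w))
            for x in range(out_w)] for row in img]
    return [[sum(w * tmp[i][x] for i, w in _taps(y, in_h, out_h))
             for x in range(out_w)] for y in range(out_h)]


def resize_single_pass(img, out_w, out_h):
    """Compute each output pixel directly from its 2x2 source neighborhood."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        ytaps = _taps(y, in_h, out_h)
        out.append([sum(wy * wx * img[iy][ix]
                        for iy, wy in ytaps
                        for ix, wx in _taps(x, in_w, out_w))
                    for x in range(out_w)])
    return out


if __name__ == "__main__":
    image = [[(7 * x + 13 * y) % 17 for x in range(6)] for y in range(6)]
    a = resize_two_pass(image, 4, 4)
    b = resize_single_pass(image, 4, 4)
    worst = max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    print("largest difference:", worst)
```

The two results agree up to floating-point rounding. The practical difference is memory traffic: the two-pass version writes and re-reads `tmp`, which on a GPU means an extra round trip through global memory between kernels.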


OPENCL SCALING IMPROVEMENTS

Reduce Memory Copies:

•  Modify the existing HandBrake buffer system
•  Identify which buffers will contain video data (vs. audio, captions, etc.)
•  Video buffers are allocated out of pinned host memory
•  Non-OpenCL-aware code writes data to the correct place
•  Kernels can directly read/write the buffers via zero copy


OPENCL SCALING IMPROVEMENTS

Switch to a Single Kernel:

•  Eliminate the two-kernel approach
•  Process blocks of data rather than lines
•  Support HandBrake native image packing
•  Use LDS to further reduce global memory accesses
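
Processing blocks rather than lines lets a work-group load the source pixels it needs into LDS once and reuse them for every output pixel in the block. The sketch below estimates, for a 1080p to 720p resize, how many distinct source pixels one output tile touches compared with the number of reads if every output pixel fetched its own taps. The 16x16 tile size and the 2x2 bilinear footprint are assumptions for illustration; the slides do not give the kernel's actual block size or filter, and the function names are hypothetical.

```python
import math


def source_span(tile_start, tile_size, src_size, dst_size):
    """Inclusive range of source indices read by one tile along one axis."""
    scale = src_size / dst_size
    first = (tile_start + 0.5) * scale - 0.5
    last = (tile_start + tile_size - 1 + 0.5) * scale - 0.5
    lo = max(math.floor(first), 0)
    hi = min(math.floor(last) + 1, src_size - 1)  # +1 for the second tap
    return lo, hi


def tile_read_counts(tile_x, tile_y, tile=16,
                     src=(1920, 1080), dst=(1280, 720)):
    """Return (unique source pixels, per-pixel tap reads) for one output tile."""
    x0, x1 = source_span(tile_x, tile, src[0], dst[0])
    y0, y1 = source_span(tile_y, tile, src[1], dst[1])
    staged = (x1 - x0 + 1) * (y1 - y0 + 1)  # loaded once into local memory
    naive = tile * tile * 4                 # 4 bilinear taps per output pixel
    return staged, naive


if __name__ == "__main__":
    staged, naive = tile_read_counts(16, 16)
    print("staged loads:", staged, "naive reads:", naive)
```

Under these assumptions an interior tile needs a 24x24 source region (576 loads) instead of 1024 individual tap reads. The naive count ignores hardware caches, so the real saving depends on the device, and filters with a wider footprint than bilinear would share more pixels between neighbors.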


RESULTS

•  The single kernel completes quickly
•  No extra memory copies are required
•  Kernel execution time to scale one frame (1080p -> 720p)*
   ‒  AMD A10-6800K – 2.4 ms
   ‒  AMD HD7750 – 1.0 ms
•  Application performance on A10-6800K

Feature          Performance (FPS)   Improvement over SW
Software         36.08               0.0
Scaling          39.64               9.9%
UVD              40.53               12.3%
Scaling + UVD    44.95               23.9%

* All times measured on a development system
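
The "Improvement over SW" column can be recomputed from the FPS figures as (fps / software fps - 1) x 100, and the kernel times can be converted into an upper bound on frame rate. The short check below uses only the numbers on this slide; the helper names are hypothetical.

```python
def improvement_pct(fps, baseline_fps):
    """Relative speedup over the baseline, in percent."""
    return (fps / baseline_fps - 1.0) * 100.0


def max_fps_from_kernel_time(ms_per_frame):
    """Frame-rate ceiling if only the scaling kernel limited throughput."""
    return 1000.0 / ms_per_frame


SOFTWARE_FPS = 36.08
for name, fps in [("Scaling", 39.64), ("UVD", 40.53), ("Scaling + UVD", 44.95)]:
    print("{}: {:.1f}%".format(name, improvement_pct(fps, SOFTWARE_FPS)))
print("A10-6800K kernel ceiling: {:.0f} fps".format(max_fps_from_kernel_time(2.4)))
```

Recomputing this way reproduces the 9.9% and 12.3% entries. For Scaling + UVD the listed FPS values give about 24.6% rather than the 23.9% shown, so one of those two figures may have been rounded or transcribed differently. At 2.4 ms per frame the scaling kernel alone could sustain roughly 417 frames per second, far above the 36 to 45 FPS application rate, which suggests the remaining time is spent elsewhere in the pipeline.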


THANK YOU

Questions


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.

