2D Composition Engine
Agenda:
• Architecture Overview
BB_2DHWA Feature Summary
• Block Copy/Draw Operations
• Rotation (90/180/270 degrees) and Mirror/Flip operations
• Scaling (1/16x ~ 16x)
• Color Space and format Conversion
• Chroma Up/Down sampling
• ROP2/3 operations
• Alpha Blending/Compositing (Porter Duff Compositing)
• Destination Clipping
• Source Pattern Repeat
Image Attributes
• Source base address
• SrcWidth, SrcHeight
• (SrcXoffset, SrcYoffset)
• SurfWidth, SurfHeight
• Stride_Size
• Pattern: SrcPatWidth, SrcPatHeight
Data Types
• LUT/MONO-1/2/4/8
• YUV (420_2, 422, 444)
• RGB (aRGB16/24, 32)
• Component Ordering
• Pre-multiplied
• Embedded Alpha
DMA Attributes
• Base Address
• Width/Height
• Stride
• Offsets
Operation Commands (SRC)
• CSC/CHRUS
• VC-remapping
• Color Expand
• Scaling
• Rotation
• Pattern Repeat
Operation Commands (DST)
• Blending/Compositing
• ROP2/3
• Clipping
• Color Fill
• CSC/CHRDS
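The DMA attributes listed above (base address, stride, offsets, width/height) are typically combined into byte addresses as in the sketch below. This is an illustration only: the function names are hypothetical, the stride is assumed to be in bytes, and the pixel format is assumed to be packed with a whole number of bytes per pixel (so it does not cover the sub-byte LUT/MONO formats).

```python
def pixel_address(base, stride, x_off, y_off, x, y, bytes_per_pixel):
    """Byte address of pixel (x, y) inside a window placed at (x_off, y_off).

    Assumes a packed format and a stride given in bytes (both assumptions).
    """
    return base + (y_off + y) * stride + (x_off + x) * bytes_per_pixel


def row_requests(base, stride, x_off, y_off, width, height, bytes_per_pixel):
    """One (start_address, length_in_bytes) pair per row of the window."""
    row_bytes = width * bytes_per_pixel
    return [
        (pixel_address(base, stride, x_off, y_off, 0, y, bytes_per_pixel), row_bytes)
        for y in range(height)
    ]
```

For a 32-bit aRGB surface with a 256-byte stride, `row_requests(0x1000, 256, 4, 2, 8, 2, 4)` describes two 32-byte rows whose start addresses are one stride apart.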
BB_2DHWA Operation Overview
[Figure: operation data flow. Inputs: SRC-1 image data, SRC-2 image data, and Alpha (Remote/Mask). Each source path shows Color Space Conv, Color Expand, Scaling, and Rotate stages and ends in a "SRC-dst Image". The destination path shows Blending (Compositing), ROP-2/3, ColorFill, Clipping, and Color Space Conv, and ends in the DST image data.]
BB_2DHWA SRC → DST types
Any source data type → non-sub-byte, non-LUT destination data types
Architecture Block Diagram
[Figure: block diagram of the bb_2dhwa top level.]
• bb_2dhwa_dp_cntl: L4 I/F (ocp2mmr), cfg
• bb_2dhwa_vpdma: ListMgr, L3 I/F, buf (two instances), BIMCDMA, ARB, pmem, SCR, vbusp_m, vbusp_s; read clients (R-client) for packed data, uv read, and alpha; write clients (W-client) for packed data and uv
• bb_2dhwa_dp_core:
  • bb_2dhwa_dp_src (two instances): VC1Range, 420to422, YC_aligner, 422to444, Color Exp, CSC, V Scaler, H Scaler, ROT; memories cmem, Lmem, SLmem, SAmem, rmem
  • bb_2dhwa_dp_dst: ROP/Blend (inputs src1, src2, alpha), CSC, Color Red & Dithering, 444 to 422/420; memory smem
• bb_2dhwa_clkc_int: INTC, CLK/RST, vbusp_s
Legend: L3, L4, VPDMA, DP_SRC, DP_DST, DP_CORE
BB_2DHWA Architecture Block Diagram
[Figure: the same block diagram as on the previous slide, annotated with the steps listed below.]
• VPDMA FW Initialization
• List Start
• Descriptor Download
• Descriptor Copied
• Client Configuration
• DMA Read Req
• Src Data Processing
• Dest Data Generation
• DMA Write Req
• (List) Cmd Done IRQ
BB_2DHWA External Interfaces
[Figure: the BB_2DHWA block and its external connections.]
Interconnects: MMR, HP, Interrupt, Clock/Reset, DFT, Memory BIST
Ports: _mmr_slv, _vpdma_mst, intr, l3_clk/clkdiv, l4_clk/clkdiv, rst_main_arst_n, dft, gpi, gpo
Core Processing Unit (dp_src)
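[Figure: dp_src datapath.]
Inputs: vpi_in (two ports)
Blocks: clut_loader, cmem_mux, Color Exp, VC-1 rangemap, 420to422, YC_aligner, 422 to 444, CSC, V Scaler, H Scaler, rmem_mux, Rotate Engine
Memories: Cmem, Lmem, SLmem, SAmem, Rmem
Data path labels: u/v; y, yuv; (a)rgb, bm; uv_2x; argb
Bus widths shown: 8 and 32 bits (clut 32)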
Core Processing Unit (dp_dst)
[Figure: dp_dst datapath.]
Blocks: dp_dst_src_gen, src1_pipe_fifo, src2_pipe_fifo, alpha_pipe_fifo, blend_pd, rop_engine, Color_Red, csc, Clip_Cntl (dst_col_fill), chr_ds
Inputs: argb 32 (two sources), alpha-1/8/32
Outputs: vpi_out_y, vpi_out_uv
Output data labels: rgb; yuv444, yuv422, y(420), mono-8; u/v(420)
Bus widths shown: 32 and 8 bits
Terminologies
• Tile Mode
• Vslice Mode
• Chroma Expansion
Tile Mode (Rotation)
[Figure: tile-mode example for "90d rotate + scale". Labels: vpdma (src), reverse blocked / reverse raster order, vpi i/f, 90d rotate with mirror-y, scale to 32x32 blk, vpdma (dst).]
Tile Mode (Rotation Modes)
[Figure: four tile-mode cases, each drawn as vpdma (src) → 90d rotate with mirror-y → scale → vpdma (dst).]
• scale + 90d rotate
• scale + 90d rotate + mirror-y
• scale + 90d rotate + mirror-x
• scale + 270 rotate
Other labels: vpi i/f (to core); normal blocked, reverse ROW raster order; TB-RL tile read; TB-RL row ordering
Scan Order Determination FlowChart
[Figure: flowchart that starts from any source and selects the scan order from Rot_mir_mode and, for overlapped copies, from the copy direction.]
Decisions: 90/270 rotated? 180 rot? src flip/mirror (x or y axis)? overlapped copy?
Rot_mir_mode values:
• mode 0: 0
• mode 1: 90
• mode 2: 180
• mode 3: 270
• modes 4 & 5: flip only (x-axis or y-axis)
• mode 6: 90 + mx
• mode 7: 90 + my
Scan orders: LtUp, LtDn, RtUp, RtDn, plus tile variants (LtUp, LtDn, RtUp, RtDn, UpRt, UpLt, DnRt, DnLt)
Overlapped copy directions: U | UR | UL | L, D | DR | DL, R
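The overlapped-copy branch of the flowchart exists because a copy whose source and destination rectangles overlap must be scanned from the correct corner, or it overwrites source pixels before reading them. The sketch below shows the general idea in software. It is not the hardware's decision table: the function names are hypothetical, and reading names such as LtUp as "start at the left/upper corner" is an assumption.

```python
def pick_scan_order(dx, dy):
    """Choose a start corner for a copy shifted by (dx, dy), with y growing downward.

    Assumption: 'LtUp' means start at the top-left pixel, 'RtDn' at the bottom-right.
    """
    vert = "Dn" if dy > 0 else "Up"                   # moving down: start at the bottom row
    horiz = "Rt" if (dy == 0 and dx > 0) else "Lt"    # same row, moving right: start at the right
    return horiz + vert


def overlapped_copy(img, sx, sy, w, h, dx, dy):
    """Copy the w x h block at (sx, sy) to (sx + dx, sy + dy) inside one image, in place."""
    order = pick_scan_order(dx, dy)
    rows = range(h - 1, -1, -1) if order.endswith("Dn") else range(h)
    cols = range(w - 1, -1, -1) if order.startswith("Rt") else range(w)
    for y in rows:
        for x in cols:
            img[sy + dy + y][sx + dx + x] = img[sy + y][sx + x]
    return order
```

With row-by-row processing, only the vertical direction matters when the rows differ; the horizontal direction matters only for a pure left or right shift.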
Vslice Mode
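Used for YUV420 source data, or any data with scaling enabled (scale_en), that is more than 1020 pixels wide.
[Figure: a wide image divided into vertical slices. Labels: src2, Vslice_tar_w, Src2_in_w, Src1_in_w.]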
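As a rough illustration of slicing, the sketch below cuts a wide image into vertical slices of at most 1020 pixels. Only the 1020-pixel figure comes from the slide; the function name, the choice of near-equal slice widths, and the even-width rule (so that subsampled chroma pairs are not split) are assumptions, not the documented hardware behavior.

```python
MAX_SLICE_W = 1020  # width threshold quoted on the slide


def split_into_vslices(width, max_w=MAX_SLICE_W):
    """Return a list of (x_start, slice_width) pairs covering [0, width).

    Assumes max_w is even, so rounding a slice up to an even width never exceeds it.
    """
    if width <= max_w:
        return [(0, width)]
    count = -(-width // max_w)      # ceiling division: number of slices needed
    base = -(-width // count)       # near-equal slice width
    base += base & 1                # round up to an even width
    slices = []
    x = 0
    while x < width:
        w = min(base, width - x)
        slices.append((x, w))
        x += w
    return slices
```

For a 1920-pixel-wide source this yields two 960-pixel slices.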
Chroma Expansion
Over-fetching extra chroma pixels and/or lines to perform proper 420→422 and/or 422→444 chroma upsampling across tile/vslice boundaries
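A small sketch of why the over-fetch is needed, using 422→444 horizontal upsampling. The interpolation filter here (a two-tap average) is an assumption chosen for illustration; the slides do not state which filter the hardware uses, and the function names are hypothetical.

```python
def upsample_422_to_444(chroma, right_neighbor=None):
    """Double the number of chroma samples in a row by two-tap averaging.

    right_neighbor is the over-fetched first sample of the next slice, if any.
    """
    out = []
    for i, c in enumerate(chroma):
        if i + 1 < len(chroma):
            nxt = chroma[i + 1]
        elif right_neighbor is not None:
            nxt = right_neighbor        # sample borrowed from the next slice
        else:
            nxt = c                     # true image edge: replicate
        out += [c, (c + nxt + 1) >> 1]
    return out


def upsample_in_slices(chroma, slice_w, overfetch=True):
    """Upsample slice by slice, optionally over-fetching one sample per boundary."""
    out = []
    for start in range(0, len(chroma), slice_w):
        part = chroma[start:start + slice_w]
        end = start + len(part)
        neighbor = chroma[end] if (overfetch and end < len(chroma)) else None
        out += upsample_422_to_444(part, neighbor)
    return out
```

With the over-fetch, the sliced result matches upsampling the whole row at once; without it, the last interpolated sample of every slice is wrong, which shows up as a visible seam.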
Key Functional Processing Units
• Scaler
• Rotation Engine
• Porter-Duff Compositing Engine
• ROP engine
Scaler
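[Figure: generic scaler datapath performing weighted blending.]
• Input buffer: L_buf (mem) for the vertical scaler (vs), or P_buf (reg)
• Accumulator: accum line buf (mem) for the vertical scaler, accum pix buf (reg) for the horizontal scaler (hs)
• Arithmetic: two multipliers and an adder, driven by scale_f, phase_in, and phase_out
• Interfaces: in, out, cntl, cfg, rdy/req handshakes on both sides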
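One plausible software reading of a scaler built from an accumulator and weighted blending is area-weighted resampling, where each output pixel accumulates the input pixels it covers, weighted by the overlap. The sketch below is that generic technique in one dimension, using exact fractions instead of the hardware's fixed-point phases. It is not taken from the slides, and the function name is hypothetical.

```python
from fractions import Fraction


def area_scale_1d(src, out_len):
    """Resample a row of pixels to out_len samples by area-weighted averaging."""
    step = Fraction(len(src), out_len)      # input pixels covered by one output pixel
    out = []
    for j in range(out_len):
        lo, hi = j * step, (j + 1) * step   # input interval covered by output j
        acc = Fraction(0)
        i = int(lo)
        while i < hi:
            overlap = min(hi, i + 1) - max(lo, i)   # weight of input pixel i
            acc += overlap * src[i]
            i += 1
        out.append(int(acc / step + Fraction(1, 2)))  # normalize and round to nearest
    return out
```

Applying the same routine first along columns and then along rows gives a separable 2D scaler, which matches the V Scaler / H Scaler split in the block diagram.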
Scaler (Vertical Scaler)
[Figure: vertical scaler datapath. A line buffer, L_buf (mem), and an accumulator memory, accum (mem), feed a multiply-add path driven by scale_f and Inv_Scale_f, followed by RND/SAT and TRUNC stages.]
Signals: in, Out, fin, fout, src_row, src_row+1, src_row inc, tar_row inc, frag_in, frag_out, frag_in_c, frag_out_c, frag_delta_v, frag_delta_thresh, scale_factor_c, intensity, first_row_pix, last_row_pix, upscaling, init, zero, out_valid
Compare block (cmp): inputs a and b; outputs a-b, b-a, abs(a-b), one
Fixed-point formats annotated on the buses (integer.fraction bits): 8.0, 8.0s, 8.4, 12.4, 0.24, 1.24, 5.24, 1.13, 5.13, 9.13s
out_valid = (a-b) > 0 or last_row_pix & (frag_delta_v < frag_delta_thresh)
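The bus annotations such as 8.0, 1.24, and 8.4 read as fixed-point formats (integer bits, fraction bits). As an illustration of what a multiply followed by a RND/SAT stage does, the sketch below multiplies an 8.0 pixel by a 1.24 weight and rounds the product to 8.4. Which formats actually meet at which multiplier is an assumption here, as are the function names.

```python
def to_fixed(value, frac_bits):
    """Convert a non-negative real number to an unsigned fixed-point integer."""
    return int(round(value * (1 << frac_bits)))


def mul_round_sat(pixel_8_0, weight_1_24, out_int_bits=8, out_frac_bits=4):
    """Multiply an 8.0 pixel by a 1.24 weight, then round (RND) and saturate (SAT)."""
    product = pixel_8_0 * weight_1_24                    # 24 fraction bits
    shift = 24 - out_frac_bits                           # fraction bits to drop
    rounded = (product + (1 << (shift - 1))) >> shift    # round half up
    max_val = (1 << (out_int_bits + out_frac_bits)) - 1
    return min(max(rounded, 0), max_val)                 # clamp to the output range
```

A pixel value of 200 weighted by 0.5 gives 1600 in 8.4 format, which is 100.0 after dividing by 16.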
Rotation Engine
[Figure: input tiles 1 to 4 are written into r_mem and read out as rotated output tiles. The slide shows the diagram twice.]
Two r_mem access patterns are shown:
• Write in un-rotated order (addr + 1 pixel location), then read out in rotated order (addr + 32 pixel location)
• Write in rotated order (addr + 32 pixel location), then read the already rotated data (addr + 1 pixel location)
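The first access pattern (linear write, strided read) can be sketched in software as follows. The 32-pixel tile size and the +1 / +32 address steps come from the slides; the rotation direction (counter-clockwise here) and the function name are assumptions made for illustration.

```python
TILE = 32  # tile edge in pixels, matching the "addr + 32" step on the slide


def rotate90_ccw_via_rmem(tile):
    """Rotate a TILE x TILE tile by writing linearly and reading with a stride of TILE."""
    rmem = [0] * (TILE * TILE)
    addr = 0
    for row in tile:                  # write in un-rotated order: addr + 1 per pixel
        for pixel in row:
            rmem[addr] = pixel
            addr += 1
    out = []
    for k in range(TILE):             # output row k is input column TILE-1-k
        start = TILE - 1 - k
        out.append([rmem[start + r * TILE] for r in range(TILE)])  # addr + 32 per pixel
    return out
```

Visiting the columns in the opposite order with the same stride gives a transpose, that is, a 90-degree rotation combined with a mirror.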
Porter-Duff Compositing
Simple Blending: Out = a*S + (1-a)*D
PorterDuff_Rule Selection
0x0 : CLEAR
0x1 : SRC
0x2 : DST
0x3 : SRC_OVER
0x4 : DST_OVER
0x5 : SRC_IN
0x6 : DST_IN
0x7 : SRC_OUT
0x8 : DST_OUT
0x9 : SRC_ATOP
0xA : DST_ATOP
0xB : XOR
0xC : PLUS
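The rule codes above can be paired with the standard Porter-Duff blend factors, as in the sketch below. The codes are the ones listed on the slide; the factor formulas are the textbook definitions rather than anything read from the hardware, colors are assumed premultiplied and normalized to [0, 1], and the names `RULES` and `composite` are hypothetical.

```python
# Rule code -> (name, function returning (Fsrc, Fdst) from source/destination alpha).
RULES = {
    0x0: ("CLEAR",    lambda a_s, a_d: (0.0, 0.0)),
    0x1: ("SRC",      lambda a_s, a_d: (1.0, 0.0)),
    0x2: ("DST",      lambda a_s, a_d: (0.0, 1.0)),
    0x3: ("SRC_OVER", lambda a_s, a_d: (1.0, 1.0 - a_s)),
    0x4: ("DST_OVER", lambda a_s, a_d: (1.0 - a_d, 1.0)),
    0x5: ("SRC_IN",   lambda a_s, a_d: (a_d, 0.0)),
    0x6: ("DST_IN",   lambda a_s, a_d: (0.0, a_s)),
    0x7: ("SRC_OUT",  lambda a_s, a_d: (1.0 - a_d, 0.0)),
    0x8: ("DST_OUT",  lambda a_s, a_d: (0.0, 1.0 - a_s)),
    0x9: ("SRC_ATOP", lambda a_s, a_d: (a_d, 1.0 - a_s)),
    0xA: ("DST_ATOP", lambda a_s, a_d: (1.0 - a_d, a_s)),
    0xB: ("XOR",      lambda a_s, a_d: (1.0 - a_d, 1.0 - a_s)),
    0xC: ("PLUS",     lambda a_s, a_d: (1.0, 1.0)),
}


def composite(rule, src, dst):
    """Blend premultiplied (r, g, b, a) tuples: out = Fsrc * src + Fdst * dst."""
    _, factors = RULES[rule]
    f_src, f_dst = factors(src[3], dst[3])
    # The clamp only matters for PLUS, where the sum can exceed 1.0.
    return tuple(min(1.0, f_src * s + f_dst * d) for s, d in zip(src, dst))
```

SRC_OVER onto an opaque destination reduces to the simple blending formula, because a premultiplied source color already equals a*S.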
Porter-Duff Compositing Engine
[Figure: Porter-Duff compositing engine datapath.]
Inputs: Csrc, Asrc, Fsrc (SRC1) and Cdst, Adst, Fdst (SRC2); each source is either non-premultiplied or premultiplied
Outputs: Cout (premultiplied or non-premultiplied) and αout, each passed through a clip stage
Datapath: 8x8 multipliers with 16-bit products (Asrc*Csrc, Adst*Cdst, Fsrc', Fdst'), 24-bit adders, 1/255 scaling stages, x 255 stages, and a divider on the non-premultiplied destination path
1/255 estimation: ((x<<8) + x + 256) >> 16
Alpha modulation: Pre_ModAsrc or 0xFF selects Asrc_mod, Pre_ModAdst or 255 selects Adst_mod
Mux controls: pd_clear (forces 8'h0), cout_simple_src, cout_simple_no_div, cout_simple_p2p, cout_simple_p2np, src_alpha_modulated & ~cout_simple, dst_alpha_modulated & ~cout_simple
Output select:
• 0: pd_CLEAR
• 1: cout_simple_no_div or cout_simple_p2p
• 2: dst_pre_mult
• 3: else
Aout selection is based on the data pipeline delay.
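The "1 / 255 estimation" expression replaces a divider with a shift and two adds. The sketch below checks it against integer division over every product of two 8-bit values, the range that matters for an alpha-times-color multiply. The function name is hypothetical.

```python
def div255_estimate(x):
    """Divide-by-255 approximation from the slide: ((x << 8) + x + 256) >> 16."""
    return ((x << 8) + x + 256) >> 16


def first_mismatch(limit):
    """Return the first x in [0, limit] where the estimate differs from x // 255, else None."""
    for x in range(limit + 1):
        if div255_estimate(x) != x // 255:
            return x
    return None
```

Over 0 to 255*255 the estimate equals the truncating division x // 255, so the result is truncated rather than rounded to nearest. `first_mismatch(255 * 255)` returns `None`.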
BB_2DHWA ROP
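No detail for this slide survives in the transcript. As background, a ROP3 is commonly encoded as an 8-bit truth table over the pattern, source, and destination bits, with ROP2 as the two-operand case. The sketch below follows that common convention (the one used by Windows GDI raster-operation codes); whether BB_2DHWA uses the same bit ordering is an assumption, and the function name is hypothetical.

```python
def rop3(code, pattern, src, dst, width=8):
    """Apply an 8-bit ROP3 truth table to each bit of pattern, src, and dst."""
    out = 0
    for bit in range(width):
        p = (pattern >> bit) & 1
        s = (src >> bit) & 1
        d = (dst >> bit) & 1
        index = (p << 2) | (s << 1) | d          # row of the truth table
        out |= ((code >> index) & 1) << bit      # look up the result bit
    return out
```

Under this convention 0xCC copies the source, 0x88 is source AND destination, 0x66 is source XOR destination, and 0xF0 copies the pattern.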
