A Multi-core Software/Hardware Co-debug Platform with ARM CoreSight™, On-chip Test Architecture and AXI/AHB Bus Monitor
Alan P. Su, Jiff Kuo, †Kuen-Jong Lee, ‡Ing-Jer Huang, §Guo-An Jian, §Cheng-An Chien, §Jiun-In Guo, and ‡Chien-Hung Chen
Global Unichip Corp., †EE Dept., National Cheng Kung University, ‡CSE Dept., National Sun Yat-Sen University, §CS Dept., National Chung Cheng University
{alan.su, jiff.kuo}@globalunichip.com, †kjlee@mail.ncku.edu.tw, ‡ijhuang@cse.nsysu.edu.tw, §jiguo@cs.ccu.edu.tw
Abstract
Multi-core systems are becoming the next-generation embedded design platform. Heterogeneous and homogeneous processor cores integrated into Multiple Instruction Multiple Data (MIMD) System-on-a-Chip (SoC) designs to provide complex services, e.g. smart phones, are on the horizon. However, distributed programming is a difficult problem on such systems. Today, only very few MIMD SoC designs offer comprehensive multi-core software/hardware co-debug capability that can stop at not only software but also hardware breakpoints to inspect data and system status for identifying bugs. In this work we have integrated various debug mechanisms so that the entire multi-core SoC can iterate through an unlimited number of software and hardware breaks, inspect data and status at each break, and step forward to resume execution until the next breakpoint. This debug mechanism is realized with a chip containing four ARM1176 cores and the ARM CoreSight™ on-chip debug and trace system, a Field Programmable Gate Array (FPGA) loaded with an on-chip test architecture and bus monitor, and a software debug platform that downloads system trace and processor core data for inspection and debug control.
The key contributions of this work are (1) the development of a multi-clock, multi-core software/hardware co-debug platform and (2) a multi-core program debugging exercise that visualizes the physical behavior of race conditions.
1. Multi-core Programming and Debugging
Multi-core systems are becoming the next-generation embedded design platform. Heterogeneous or homogeneous processor cores integrated into a System-on-a-Chip (SoC) to build small form factor platforms and provide complex services, e.g. smart phones, are on the horizon. Smart phones provide applications from various domains in a distributed-computing fashion, and thus the multi-core architecture is generally a Multiple Instruction Multiple Data (MIMD) design [1] that delivers different software with a wide range of resource and performance requirements. However, unlike parallel programming on homogeneous, Single Instruction Multiple Data (SIMD) architectures [1], where the same program runs on multiple processor cores to process different sets of data, distributed programming is an extremely difficult problem on MIMD architectures. Today, only in very few MIMD SoC designs may we find comprehensive multi-core software/hardware co-debug capability [2]. Ideally, the architecture needs to support not only software but also hardware breaks and visibility.
Figure 1 gives a simple example of an SoC described in [3] to illustrate the software complexity faced in multi-core MIMD designs. Figure 1(a) is the target system specification described as a task graph. AP1 fetches encoded image data from an input source, pre-processes the data, delivers it to AP2 for post-processing, and then sends the decoded image for display. Figure 1(b) is an MIMD dual-core implementation of the given task graph. By design, AP1 runs on an ARM core and fetches the encoded image data stored on a USB flash memory through the USB port. After the pre-processing, AP1 stores the data to a shared SRAM, then notifies the DSP core with a "data write complete" message to execute AP2 by issuing an interrupt using an OS system call. The DSP core receives the interrupt, which triggers the Interrupt Service Routine (ISR) to initiate AP2 to read the data from the shared SRAM, post-process it, and send it to the frame buffer of the LCD display.
(a) Task Graph
(b) A Dual Core Implementation
Figure 1, An example of a dual core system implementation
However, let us consider a race condition scenario. Assume the data passing between AP1 and AP2 is not controlled by a mutual exclusion mechanism that guarantees the AP1-write-before-AP2-read order. The scenario goes as follows: AP1 writes the last block of data to the shared SRAM and immediately issues the "data write complete" interrupt to the DSP core. Let us also assume that bus AHB0 has a lower priority than bus AHB1 on the Inter Connection Module (ICM) arbiter, and that the data write from AP1 is blocked by other AHB1 requests issued by the DSP core. The interrupt request is served at the highest priority by the DSP core, and thus the "data write complete" interrupt triggers AP2 earlier than the last AP1 data being stored in the shared SRAM. Since there is no mutual exclusion mechanism in place to prevent AP2 from reading data before it is ready, a race condition occurs. For the programmer to debug this problem, s/he needs debug controllability and visibility into the ARM and DSP cores to track program executions, visibility into the ICM to see how it serves AHB0 and AHB1, and a timing view to learn that the interrupt happens before the AP1 data is written to the shared memory, in order to identify the root cause of the race condition.
To support the debug capability needed to identify the race condition described above, the multi-core SoC in question needs a way to set a breakpoint at the end of the AP1 data write and break the complete SoC. The debug mechanism then allows the designer to
1. inspect the AP1 and AP2 programs,
2. view ARM and DSP core status and data,
3. check the status and data of the AHB0, AHB1, ICM and shared SRAM components, and
4. step the SoC through the execution to see the interactions among programs, cores, components and busses, to visualize the race condition.
In this work we integrated various sub-systems into a complete multi-core software/hardware co-debug platform that delivers the features described above. We realized the platform by implementing a quad-ARM1176 SoC with CoreSight™ [4], hooked up with an on-chip test mechanism and an AHB/AXI bus monitor. To validate the implemented platform, a multi-core programming exercise was also conducted to develop a 3D image application on this co-debug platform.
In Section 2 we discuss ARM CoreSight™. Section 3 describes a multi-clock on-chip test architecture with the capabilities to set hardware breakpoints and break, view functional unit control register data, cycle step and resume. Section 4 illustrates an AHB/AXI bus monitor, a Verification Intellectual Property (VIP) capable of flagging erroneous AHB/AXI transactions and conducting trace dumps. Section 5 shows the integration of the multi-core software/hardware co-debug platform with ARM CoreSight™, the multi-clock on-chip test architecture and the AHB/AXI bus monitor. Section 6 illustrates the exercise in multi-core 3D image application programming and debugging using the co-debug platform developed, and Section 7 closes this work with conclusions and future research.
2. ARM CoreSight™
Figure 2, ARM CoreSight™ debugging environment
ARM CoreSight™ is an on-chip component developed by ARM to support multi-core cross triggering, which allows a core hitting a breakpoint to break all other cores. This is done by a global Cross Trigger Matrix (CTM) and an individual Cross Trigger Interface (CTI) on each core. ARM has developed CTIs for the ARM9, ARM11 and Cortex families. The CTI is used for debug control and for viewing ARM core status and registers.
CoreSight™ also supports trace dump. Each core dumps its trace through its own Embedded Trace Macrocell (ETM) onto the Advanced Microcontroller Bus Architecture (AMBA) Trace Bus (ATB), and to the trace port through the Trace Port Interface Unit (TPIU). The trace dump can further provide complete core information for debug purposes.
DS-5, the ARM debugging tool for CoreSight™ and beyond, controls program debug and trace dump through DSTREAM, the In-Circuit Emulator (ICE) for CoreSight™, via the Joint Test Action Group (JTAG) port, the Debug Access Port (DAP) and the Debug APB, into the ETM and CTI.
CoreSight™ does not restrict its support to the ARM core families. By following the ETM and CTI protocols, one can also develop an ETM and CTI for other cores, like the DSP in Figure 2. Such an external core is then controlled by DS-5 and integrated into the debug environment. This is how we hook the on-chip test architecture introduced in the next section into CoreSight™.
3. On-Chip Debug Architecture
Following the Moore’s Law, the integrated circuit (IC)
technology doubles its gate density every eighteen months. At
28nm technology the gate density has reached 4.2M gates per
mm2
. With such a high capacity we can start to consider
putting self testing ability onto the chip. The development of
on-chip test architecture is studied in [5, 6]. We leverage this
on-chip test architecture also for debug purposes in this work.
The side band test bus and test port can be used for component
core register inspection [7]. By adding multiple clock gating
and stepping mechanism, we can implement hardware break,
component register data viewing and cycle stepping to support
hardware debug capability.
3.1 Overall On-Chip Debug Architecture
Figure 3, Overall architecture of SoC debug platform
Figure 3 shows the overall architecture of the on-chip debug platform, which consists of both software and hardware components. The embedded processor (ARM1176) is employed to execute the software program through the JTAG port and ICE, with the debugger tool on the PC host. The instruction memory stores the instructions to be executed, while the CUD (Core Under Debug) data memory stores the data required for the Intellectual Property (IP) application and the operational results of the IP.
The IP cores are wrapped with IEEE 1500 wrappers [8] that support core-level testing with parallel scan capability. The Test Access Mechanism Controller (TAM Controller, or TAMC) generates debug control signals to control the debug procedure for the IP cores. It also buffers the traced data and stores it to a local memory. A dedicated test bus connects the wrapped CUDs with the TAMC for the transfer of the control signals and the traced data.
To integrate the debug platform with the ARM CoreSight on-chip debug and trace architecture, a customized CTI module and an AHB-APB bridge are added to this platform. The CTI module can deliver a debug request signal (DBGRQ) to put the TAM Controller into debug mode. In debug mode, the TAM Controller stops the CUD when it hits the breakpoint and dumps the contents of the CUD to the local memory. It can also compare the obtained data with golden data retrieved from external or embedded memory. These operations are controlled by a debug tool called DASTEP running on the PC host. The user can thus examine the test results immediately. After finishing the debug function, the TAM Controller delivers an acknowledge signal (DBGACK) to the CTI module. The bus bridge is needed because the CTI module is compliant with the AMBA APB protocol.
The software is composed of a user-provided application program and a debug program. The application program executes the functional operation of the system. The debug program contains the setup data to initialize the TAM Controller and is also generated by DASTEP.
3.2 Multiple Clock Gating and Stepping
The main issue in gating and stepping multiple clocks is clock synchronization. For example, with two cores clocked at 100MHz and 125MHz respectively, we can find synchronous positive edges every 0.5 milliseconds. If we gate at the first synchronous positive edge, every synchronous step has to be 0.5 milliseconds apart. Too many events can happen in this period of time, and the resolution is too low for meaningful hardware/software debug. We have investigated this problem by carefully examining the relationship between the clock rates of the interacting cores and are now able to identify many more cycles at which the clocks can be "safely" stopped and resumed. Thus, instead of breaking at synchronous positive edges and stepping by the least common multiple of the clock periods, we gate clocks at the identified safe instances, which are usually the same as, or just a few cycles away from, the breakpoints. With this breaking mechanism, even though the clocks may have different phase shifts relative to their last positive edges, they can be resumed synchronously and continue correctly without any glitches.
3.4 DASTEP and Debug Procedure
DASTEP is the control Graphical User Interface (GUI) of the on-chip debug architecture. It can set hardware breakpoints on components, view control register values, and cycle step the system. DASTEP can also dump and view component trace data.
To run a debugging procedure with DASTEP, we first determine which CUD in which clock domain is to be observed. Then we set the cycle-based hardware breakpoint and wait for the traced data. The traced data is transferred to the PC host through the UART mechanism when the breakpoint matches. The traced data can then be displayed and compared to the golden data. In the following, the debug procedure using the GUI is described in detail.
Figure 4 shows an overview of the graphical user interface of DASTEP. Clicking the MCD Hardware Breakpoint item under the Debug Platform entry opens the setup window shown in Figure 5, which allows the user to select the cores to be debugged (1), set the first breakpoint (2), and select the master clock domain used as the reference for the breakpoint cycle (3). The user can then click "Apply" to enter the debug information, "Run" to start the debug session, or "Cancel" to cancel the setup information.
Figure 4, Overview of the graphical user interface
Figure 5, Setup window for MCD hardware breakpoint insertion
After the debug session starts, the PC host waits to receive the traced data. Once the breakpoint occurs, the traced data is displayed in a control and trace window, as shown in Figure 6. The user can now examine the traced data in the window (1). Four operations are supported here (3): "Browse & Save" to store the traced results, "Select Register" to select the registers to be displayed, "Terminate Debug Mode" to continue the functional operation of the CUD, and "Cancel" to close the control and display window.
After examining the information at the current breakpoint, the user can continue to the next breakpoint setup (2) by entering the next breakpoint cycle in the "Next Breakpoint Cycle" column, clicking "Run" to resume the system and let it stop at the next breakpoint, clicking "Receive" to wait for and receive the traced data, or clicking "Single Step" to continue the debug session cycle by cycle.
It is worth mentioning that an open source tool, GTKWave, is employed to show the trace data in a waveform-based display, as shown in Figure 7.
Figure 6, Control and display window
Figure 7, Waveform-based displays of traced data
4. AHB/AXI Bus Monitor
The bus monitor consists of a protocol checker, a bus tracer and a trace memory. Figure 8 shows the bus monitor architecture as the red blocks. The protocol checker detects bus protocol errors or inefficiencies in real time. The bus tracer captures on-chip bus signals at several levels of abstraction and performs real-time compression. The trace memory stores the compressed traces. The protocol checker and the bus tracer can collaborate with each other: for example, when the protocol checker detects a bus protocol error, it triggers the bus tracer to start or stop monitoring and store the trace data into the trace memory.
An AHB bus monitor was developed in [9]. As the technology evolved, the AXI bus was also supported. These two works developed a hardware VIP to help verify components with AHB and/or AXI interfaces. Figure 9 shows the AXI trace verification method. In the simulation environment, the AXI VIP produces cycle-accurate AXI interconnect behavior. The AXI tracer passively captures signals from the VIP and compresses the trace data stored in the trace memory. The bus analyzer decompresses the trace result. We compared the directly dumped simulation trace data with the decompressed trace result to verify the AXI tracer. Similarly, the AXI protocol checker verification is also based on the AXI VIP [10]. This monitor performs AXI rule checking and reports an error message when an AXI violation is found. A circular buffer then dumps its data, a bus trace of around 1000 cycles preceding the violation.
The bus monitor is available for both AHB and AXI buses [9, 10]. Figure 8 shows the AXI monitor integrated into the SoC debug platform. The AXI bus is the center of the SoC, while the AHB bus is used as a debug bus. The PCI interface is a communication channel between the debug bus of the SoC and the debug software running on a PC. The debug commands from the debug software are translated into AHB master commands by a PCI2AHB transactor to configure and access the AXI monitor components. Once an error has been signaled, the debug architecture notifies CoreSight™ and breaks the whole SoC. The user can then cycle step and view program, processor core, component and bus data and status to identify the problem.
Figure 8, Integration of the bus monitor into the SoC debug platform
Figure 9 shows the bus monitor analyzer software running on the debug PC to configure and access the bus monitor. There are four windows: (1) a multi-resolution waveform viewer, (2) an access control signal analyzer, (3) an address/data timing distribution analyzer, and (4) a bus state transition analyzer.
Figure 9, Bus monitor analyzer
5. Multi-core Software/Hardware Co-debug
Platform
The multi-core software/hardware co-debug platform described in the previous sections covers only the hardware side of the system; integration with the software debugger is also required. Our first step was to connect to CoreSight™ using DS-5 and DSTREAM. We then set breakpoints in the ARM1176 programs and broke all four ARM1176 cores and the on-chip test architecture once a breakpoint was hit. With DS-5 we can inspect program data, with DASTEP we can view component control register data, and with the AHB/AXI bus monitor we can dump and view bus trace data. The final step is to use DASTEP to set hardware breakpoints and break the four ARM1176 program executions for debug purposes as well. A user-friendly multi-core software/hardware co-debug platform is thus completed.
6. Experiment
We use a 3D depth map generation project to verify the co-debug platform. The front end of the 3D depth map generation is a high-profile H.264 decoder [11, 12]. The 3D depth map generation [13, 14] transforms 2D video images into a 3D view with a depth map for 3D video viewing. The development started with single-threaded C++ code and moved to MIMD programs. Through the help of the multi-core co-debug platform, we realized the need for a hardware-implemented H.264 decoder to allow 3D depth map generation to use all four ARM1176 cores to play a high-profile video in real time.
6.1 Algorithm of 3D Depth Map Generation
Figure 10 shows the 3D depth map generation algorithm. It generates depth maps of good quality for most 2D images. In addition, the processing steps in the proposed algorithm have been optimized to reduce complexity while preserving good quality. The encapsulated low complexity techniques are introduced below.
Figure 10, Proposed depth map generating algorithm
We first use a Sobel mask to extract the edge information of the input image for detecting vanishing lines in the next step. We optimize the Sobel mask formula to reduce the computational complexity by about 65% while preserving quality. Then, we use a 5×5 Hough transform to detect vanishing lines. After the Hough transform, we classify the input images into three types: Normal (with a vanishing point), Scenery (with sky/mountain), and Close-up. With the proposed classification method, we use different methods to generate a depth map of good quality for each type.
For the Normal type, we calculate the intersection points of the vanishing lines. After calculating all intersection points, we use an 8×8 region to group the nearest points in the image; this region is called the vanishing region (VR). According to the position of the VR, we generate the Gradient Depth Map (GDM) from the distance between every pixel and the VR. For the Scenery type, we define the top of the image as the VR to generate the GDM, and assign a static GDM, since the sky or mountain is always at the top of the image. For the Close-up type, we only adopt block-based contrast filtering to classify the background and foreground objects.
Finally, Joint Bilateral Filtering (JBF) is used to post-process the merged depth map by strengthening the edge information of the objects relative to the original image. We optimize each step in the proposed 3D depth map generation algorithm and achieve about 90% complexity reduction in terms of execution time as compared to the original one.
6.2 Parallelization of 3D Depth Map Generation
To realize the 3D depth map generation on the multi-core platform, we propose a parallel 3D video playing system as shown in Figure 11. In this system, we use one thread to perform H.264 video decoding, three threads to perform 3D depth map generation, and one thread to collect the depth maps from the 3D depth map generators. At the front end of the proposed system, the H.264 decoder decodes the bit-stream and produces video. The decoded video is then delivered to the 3D depth map generators frame by frame. Finally, the pseudo display collects all the depth maps in order.
Figure 11, Proposed parallel 3D video playing system
To ensure the correctness of the execution of the proposed parallel 3D video playing system, we establish a synchronization mechanism among the threads. For this reason, we use a synchronized FIFO to connect any two threads whenever one of them has to deliver data to the other. As shown in Figure 12, the proposed synchronized FIFO is essentially a circular FIFO realized with a producer-consumer mechanism. The front end of the synchronized FIFO is connected to the thread that plays the producer, while the rear end is connected to the thread that plays the consumer. At the producer end, data can be written to the FIFO at any time except when the FIFO is full. Similarly, data can be read from the FIFO at the consumer end at any time except when the FIFO is empty. Whenever a thread is not permitted to access the FIFO, it has to wait until it gets permission. With such a synchronization mechanism, we can easily realize the synchronization in the proposed parallel 3D video playing system.
In the following, we describe how the proposed synchronized FIFO achieves synchronization between two threads. Figure 13 shows the pseudo code for the synchronization at the producer end of the synchronized FIFO. First, the thread at the producer end checks whether the FIFO is full. When the FIFO is full, the thread has to wait until the FIFO is no longer full. After getting the access permission, the thread starts to write its data to the FIFO. Finally, the thread calls a confirmation function that updates the information recorded in the FIFO and issues a notification signal to the thread at the consumer end of the FIFO. Similarly, Figure 14 shows the pseudo code for the synchronization at the consumer end of the synchronized FIFO. The thread at the consumer end performs almost the same steps as for the producer end, except that it checks whether the FIFO is empty rather than full.
Figure 12, Proposed synchronized FIFO
Figure 13, Pseudo code for the synchronization at the producer end of the synchronized FIFO
Figure 14, Pseudo code for the synchronization at the consumer end of the synchronized FIFO
6.3 Performance Evaluation
In this section, we discuss the performance improvement of the proposed parallel 3D video playing system. Figure 15 shows the performance under different configurations. The test video we use is in CIF resolution and contains 300 frames in total. Under the configuration of one thread for H.264 decoding and three threads for 3D depth map generation, the proposed system achieves 27.75 fps 3D video display in CIF resolution.
Figure 15, Performance evaluation of the proposed parallel 3D video playing system
7. Conclusion and Future Work
In this work we have built a generic multi-core software/hardware co-debug platform, a framework for designing future multi-core SoCs with MIMD programming and debug support. The hardware system designed in this work is a prototype, not a stand-alone multi-core SoC. We would like to deploy the platform as an IP on a commercial multi-core SoC. With this platform we would also like to discover never-before-seen physical multi-core programming issues such as race conditions and deadlocks. The outcome of this research can be applied to validate many distributed computation theories and to enhance algorithms to solve many more problems.
References
[1] M. Flynn, "Some Computer Organizations and Their Effectiveness," IEEE Trans. Computers, vol. C-21, no. 9, pp. 948~960, 1972
[2] A. Mayer, H. Siebert and C. Lipsky, "Multicore Debug Solution IP," IPextreme white paper, 2007, http://www.ip-extreme.com/downloads/MCDS_whitepaper_070523.pdf
[3] A. Su, “Application of ESL Synthesis on GSM Edge
algorithm for base station,” Proc. ASP-DAC’10, January
2010, pp. 732~737
[4] CoreSight™ Components Technical Reference Manual, http://infocenter.arm.com/help/topic/com.arm.doc.ddi0314h/DDI0314H_coresight_components_trm.pdf
[5] K-J Lee, C-Y Chu and Y-T Hong, “An Embedded
Processor Based SOC Test Platform,” Proc. International
Symposium on Circuits and Systems, pp. 2983~2986,
2005
[6] W-C Huang, C-Y Chang and K-J Lee, “Toward
Automatic Synthesis of SoC Test Platform,” Proc. VLSI
Design, Automation and Test, pp. 1~4, 2007
[7] K-J Lee, S-Y Liang and A. Su, "A Low-Cost SoC Debug Platform Based on On-Chip Test Architecture," Proc. SOC Conference, pp. 161~164, 2009
[8] IEEE, “1500-2005 IEEE Standard Testability Method for
Embedded Core-Based Integrated Circuits,” E-ISBN
0-7381-4694-3, print ISBN 0-7381-4693-5, IEEE 2005.
[9] Y-T Lin, W-C Shiue and I-J Huang, “A Multi-resolution
AHB Bus Tracer for Real-time Compression of
Forward/Backward Traces in a Circular Buffer,” Proc.
DAC'08, pp. 862~865, 2008
[10] C-H Chen, J-C Ju, and I-J Huang, “A Synthesizable AXI
Protocol Checker for SoC Integration,” IEEE
International SoC Design Conference (ISOCC'10),
Incheon, Korea, Nov. 2010.
[11] Y-C Yang, and J-I Guo, “A High Throughput H.264/AVC
High Profile CABAC Decoder for HDTV Applications,”
IEEE Transactions on Circuits and Systems for Video
Technology, vol. 19, no. 9, pp. 1395-1399, September
2009
[12] K Xu, T-M Liu, J-I Guo, and C-S Choy, “Methods for
Power/Throughput/Area Optimization of H.264/AVC
Decoding,” Journal of Signal Processing Systems, Vol. 60,
No. 1, pp. 131-145, July 2010
[13] C-A Chien, C-Y Chang, J-S Lee, J-H Chang, and J-I Guo,
“Low Complexity 3D Depth Map Generation for Stereo
Applications,” Proc. 2010 VLSI Design/CAD
Symposium, Kaohsiung, Taiwan, August 3-6, 2010.
[14] C-A Chien, C-Y Chang, J-S Lee, J-H Chang and J-I
Guo, ”Low Complexity 3D Depth Map Generation for
Stereo Applications,” Proc. ICCE’11, Jan. 9-12, Las
Vegas, USA, 2011