3. What is DSP
What is a DSP? In brief, DSPs are processors or microcomputers
whose hardware, software, and instruction sets are optimized for
high-speed numeric processing applications, an essential requirement
for processing digital data that represent analog signals in real time.
What a DSP does is straightforward. When acting as a digital filter,
for example, the DSP receives digital values based on samples of a
signal, calculates the results of a filter function operating on these
values, and provides digital values that represent the filter output. The
DSP's high-speed arithmetic and logic hardware is programmed to
rapidly execute algorithms that model the filter transformation.
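The filter description above can be made concrete with a minimal C sketch of a finite impulse response (FIR) filter that accepts one input sample and returns one output sample. The type `fir_filter`, the function `fir_step`, the 4-tap length, and the use of `double` are illustrative assumptions; a real DSP implementation would typically use fixed-point data and a circular buffer instead of shifting the delay line.

```c
#include <stddef.h>

/* Hypothetical 4-tap FIR filter: what a DSP does for every input sample.
   Tap count, coefficients and the use of double are illustrative. */
#define NTAPS 4

typedef struct {
    double coeff[NTAPS];   /* filter coefficients h[0..NTAPS-1] */
    double delay[NTAPS];   /* recent input samples, delay[0] = newest */
} fir_filter;

/* Accept one input sample x[n] and return y[n] = sum_k h[k] * x[n-k]. */
double fir_step(fir_filter *f, double x)
{
    /* Shift the delay line to make room for the new sample. */
    for (size_t k = NTAPS - 1; k > 0; --k)
        f->delay[k] = f->delay[k - 1];
    f->delay[0] = x;

    double acc = 0.0;
    for (size_t k = 0; k < NTAPS; ++k)
        acc += f->coeff[k] * f->delay[k];   /* multiply-accumulate */
    return acc;
}
```

With four coefficients of 0.25 (a moving average), feeding the constant input 4.0 yields 1.0, 2.0, 3.0, and then 4.0 as the delay line fills.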
5. Key Difference
The combination of design elements (arithmetic operators, memory
handling, instruction set, parallelism, and data addressing) that
provides this ability forms the key difference between DSPs and
other kinds of processors.
The real-time signal comes to the DSP as a train of individual samples
from an analog-to-digital converter (ADC). To filter in real time,
the DSP must complete all the calculations and operations required
for each sample (usually updating a process involving many previous
samples) before the next sample arrives. High-order filtering of
real-world signals with significant frequency content therefore
calls for very fast processors.
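A rough way to quantify "before the next sample arrives" is to divide the clock rate by the sample rate. The sketch below assumes one instruction per cycle, ignores interrupt and I/O overhead, and uses a hypothetical 48 kHz audio sample rate together with the 600 MHz clock listed in the Blackfin features; the helper names are illustrative.

```c
/* Cycles available between two consecutive samples (idealized:
   one instruction per cycle, no interrupt or I/O overhead). */
unsigned long cycles_per_sample(unsigned long clock_hz,
                                unsigned long sample_rate_hz)
{
    return clock_hz / sample_rate_hz;
}

/* Upper bound on FIR length if the processor completes macs_per_cycle
   multiply-accumulates every cycle and does nothing else (an assumption). */
unsigned long max_fir_taps(unsigned long clock_hz,
                           unsigned long sample_rate_hz,
                           unsigned long macs_per_cycle)
{
    return cycles_per_sample(clock_hz, sample_rate_hz) * macs_per_cycle;
}
```

At 600 MHz and 48 kHz this gives 12,500 cycles per sample and, with two MACs per cycle, an idealized ceiling of 25,000 FIR taps before any other per-sample work is counted.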
6. General features of DSP
Efficient ALU and MAC Units (Multiple)
Harvard or Super Harvard Architecture
Extended Precision in Computational Units
Hardware Looping
Efficient and fast peripherals
Circular Buffering
High Speeds of operation
7. Continued…
Fast Multipliers
Multiple Execution Units
Efficient Memory Access
- Harvard and Super Harvard architectures
Data Format
- Fixed-point and floating-point
Zero Overhead Looping
Streaming I/O
Specialized Instruction Sets
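Circular buffering, listed above, can be illustrated in portable C: new samples overwrite the oldest ones and the index wraps around instead of data being shifted. The struct `circ_buf`, the functions `cbuf_put` and `cbuf_get`, and the 8-entry length are hypothetical.

```c
#include <stddef.h>

/* Software ring buffer; a DSP's address generators can do the wrap in
   hardware, while portable C has to do it explicitly. */
#define CBUF_LEN 8

typedef struct {
    int data[CBUF_LEN];
    size_t head;            /* index where the next sample is written */
} circ_buf;

void cbuf_put(circ_buf *b, int sample)
{
    b->data[b->head] = sample;
    b->head = (b->head + 1) % CBUF_LEN;   /* wrap around, no data shifting */
}

/* Sample written 'age' writes ago (age 0 = most recent, age < CBUF_LEN). */
int cbuf_get(const circ_buf *b, size_t age)
{
    return b->data[(b->head + CBUF_LEN - 1 - age) % CBUF_LEN];
}
```

Because `CBUF_LEN` is a power of two, the `%` could also be written as `& (CBUF_LEN - 1)`; with hardware circular addressing, this explicit wrap arithmetic disappears from the inner loop.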
8. Introduction
Blackfin processors embody a new type of 16/32-bit
embedded processor designed specifically to meet the
computational demands and power constraints of today’s
embedded audio, video, automotive,
industrial/instrumentation, and communications
applications.
Blackfin processors combine a 32-bit RISC instruction set,
dual 16-bit multiply accumulate (MAC) digital signal
processing functionality, and 8-bit video processing
performance.
15. Features
High-performance processor running at up to 600 MHz
Two 16-bit MACs, two 40-bit ALUs, four 8-bit video ALUs, and a 40-bit
shifter
0.85 V to 1.30 V core VDD with on-chip voltage regulation
1.8 V, 2.5 V, and 3.3 V compliant I/O
Up to 148K bytes of on-chip memory, with both code and data banks,
which can be used as cache or SRAM
External memory controller with glueless support for SDRAM,
SRAM, flash, and ROM
Multiple boot options, including SPI and parallel flash
16. Peripherals and Units
Dynamic Power Management Unit
Direct Memory Access
SPI interface
Parallel Port Interface
Serial Port Controllers
UART
Programmable Flags
Timers and RTC
EBIU (External Bus Interface Unit)
17. Core
The Blackfin processor core contains two 16-bit
multipliers, two 40-bit accumulators, two 40-bit ALUs,
four video ALUs, and a 40-bit shifter. The computation
units process 8-bit, 16-bit, or 32-bit data from the register
file.
The compute register file contains eight 32-bit registers.
When performing compute operations on 16-bit operand
data, the register file operates as 16 independent 16-bit
registers.
The ALUs perform a traditional set of arithmetic and
logical operations on 16-bit or 32-bit data.
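The idea of one 32-bit register acting as two independent 16-bit registers can be modeled in portable C. This is an illustration only, not Blackfin register syntax; `pack_halves`, `high_half`, `low_half`, and `add_halves` are hypothetical helpers, and the half-by-half add wraps on overflow rather than saturating.

```c
#include <stdint.h>

/* One 32-bit "register" viewed as a high and a low 16-bit half. */
uint32_t pack_halves(int16_t high, int16_t low)
{
    return ((uint32_t)(uint16_t)high << 16) | (uint16_t)low;
}

int16_t high_half(uint32_t r) { return (int16_t)(r >> 16); }
int16_t low_half(uint32_t r)  { return (int16_t)(r & 0xFFFFu); }

/* Two independent 16-bit additions on one pair of 32-bit registers,
   like a dual 16-bit ALU operation (wraps on common compilers). */
uint32_t add_halves(uint32_t a, uint32_t b)
{
    return pack_halves((int16_t)(high_half(a) + high_half(b)),
                       (int16_t)(low_half(a) + low_half(b)));
}
```

For example, packing -2 and 300 gives 0xFFFE012C, and each half can still be read back and operated on separately.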
18. Each MAC can perform a 16-bit by 16-bit multiply in each cycle,
accumulating the results into the 40-bit accumulators. Signed
and unsigned formats, rounding, and saturation are supported.
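The MAC description can be modeled in C to show why the accumulator is 40 bits wide: a 16 x 16-bit product needs at most 32 bits, so the 8 extra guard bits let at least 256 worst-case products accumulate before saturation is needed. The sketch holds the accumulator in an `int64_t` and models only signed saturation (not rounding or unsigned formats); `mac40` and the `ACC40_*` macros are hypothetical names.

```c
#include <stdint.h>

/* Signed 40-bit accumulator range, held in a 64-bit integer. */
#define ACC40_MAX ((int64_t)549755813887)   /*  2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)          /* -2^39     */

/* One MAC step: 16 x 16-bit signed multiply added into a 40-bit
   accumulator that saturates instead of wrapping. */
int64_t mac40(int64_t acc, int16_t x, int16_t y)
{
    int64_t sum = acc + (int64_t)x * y;   /* product fits in 32 bits */
    if (sum > ACC40_MAX) return ACC40_MAX;
    if (sum < ACC40_MIN) return ACC40_MIN;
    return sum;
}
```

Accumulating 32767 × 32767 256 times gives 274,861,129,984, which still fits in 40 bits without saturating.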
The 40-bit shifter can perform shifts and rotates and is used to
support normalization, field extract, and field deposit
instructions.
The program sequencer controls the flow of instruction execution,
including instruction alignment and decoding. For program flow
control, the sequencer supports PC-relative and indirect conditional
jumps (with static branch prediction) and subroutine calls.
19. Hardware is provided to support zero-overhead
looping. The architecture is fully interlocked,
meaning that the programmer need not manage the
pipeline when executing instructions with data
dependencies.
20. Operating Modes
The architecture provides three modes of operation:
user mode,
supervisor mode, and
emulation mode.
User mode has restricted access to certain system
resources, thus providing a protected software
environment, while supervisor mode has unrestricted
access to the system and core resources.
Emulation mode is used for testing purposes only.
23. Booting
Booting is the process by which the processor loads its internal
memories from external memory on its own.
The processor has two boot pins, BMODE0 and BMODE1, so it
supports four boot modes, which are shown in the side window.
24. Dynamic Power Management Unit
The processor has three power domains:
VDDEXT (peripherals)
VDDINT (core)
VDDRTC (RTC)
It also has two clock domains: the peripherals run on the system
clock (SCLK) and the core runs on the core clock (CCLK).
The processor has an internal PLL, so multiple operating
frequencies can be obtained just by changing register values.
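The PLL relationships can be sketched as simple arithmetic. The formulas below are our reading of the ADSP-BF533 hardware reference (VCO = CLKIN × MSEL, with CLKIN optionally halved first by the input divider DF; CCLK = VCO / CSEL with CSEL of 1, 2, 4, or 8; SCLK = VCO / SSEL with SSEL from 1 to 15) and should be checked against that manual. The function `pll_clocks`, the struct `pll_out`, the 27 MHz input, and the chosen multipliers are illustrative assumptions; register encodings and frequency limits are not modeled.

```c
#include <stdint.h>

/* Assumed PLL relationships (verify against the hardware reference):
     VCO  = (DF ? CLKIN / 2 : CLKIN) * MSEL
     CCLK = VCO / CSEL     (CSEL = 1, 2, 4 or 8)
     SCLK = VCO / SSEL     (SSEL = 1..15)                       */
typedef struct {
    uint64_t cclk_hz;   /* core clock */
    uint64_t sclk_hz;   /* system (peripheral) clock */
} pll_out;

pll_out pll_clocks(uint64_t clkin_hz, int df, unsigned msel,
                   unsigned csel, unsigned ssel)
{
    uint64_t vco = (df ? clkin_hz / 2 : clkin_hz) * msel;
    pll_out out = { vco / csel, vco / ssel };
    return out;
}
```

With a 27 MHz input, MSEL = 22, CSEL = 1, and SSEL = 5, this gives CCLK = 594 MHz and SCLK = 118.8 MHz; only the multiplier and divider values change between operating points.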
25. The dynamic power management feature of the
ADSP-BF531/ADSP-BF532/ADSP-BF533 processors allows both the
processor's input voltage (VDDINT) and clock frequency
(fCCLK) to be dynamically controlled.
Different applications require different clock speeds. When the
clock speed is lowered, VDDINT can also be reduced, which lowers
overall power dissipation.
Because of this feature, Blackfin processors are used in low-power
applications.
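The benefit of lowering VDDINT along with the clock follows from the standard approximation for CMOS switching power, P ≈ C · V² · f: power scales linearly with frequency but with the square of voltage. The sketch computes the ratio of new to old dynamic power for the same capacitance; leakage is ignored, and the 1.2 V and 0.9 V operating points in the example are hypothetical values inside the 0.85 V to 1.30 V core range, not datasheet voltage-frequency pairs.

```c
/* Relative dynamic power P_new / P_old using P ~ C * V^2 * f
   (same switched capacitance C; leakage power ignored). */
double relative_dynamic_power(double v_old, double f_old,
                              double v_new, double f_new)
{
    return (v_new * v_new * f_new) / (v_old * v_old * f_old);
}
```

Halving the clock from 600 MHz to 300 MHz alone halves dynamic power; also dropping VDDINT from 1.2 V to 0.9 V brings it to about 28% of the original (0.28125).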
26. Different Modes Available
Different applications require different modes, and these power
modes offer different levels of power savings. Hibernate mode
provides the highest power savings, whereas Full-On mode provides
the least power savings but the most performance.
44. Building and Running the Project
Build the project by performing one of these actions:
• Click the Build Project button, or
• From the Project menu, choose Build Project.
Alternatively, click the Rebuild All button to build the project.
The C source file opens in an editor window, and execution
halts at main().
At the end, you should see "Build completed successfully."
Press F5 to run the project.
47. Loader file Settings
Choose:
boot mode: Flash/PROM,
boot format: ASCII, and
output width: 16 bit.
Choose a folder for the output file.
After changing the options, Rebuild All again.
49. C Language
Advantages:
C is much cheaper to develop (encourages experimentation).
C is much cheaper to maintain.
C is comparatively portable.
Disadvantages:
ANSI C is not designed for DSP.
DSP processor designs usually expect assembly in key areas.
DSP applications continue to evolve (faster than ANSI Standard C).
50. Missing operations are provided by software emulation
(floating point!)
C is more machine-dependent than you might think:
for example, is a "short" 16 or 32 bits?
C can be a poor match for DSP: accumulators? SIMD?
Fractions?
It is not really mathematically focused; it is a systems
programming language.
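One way to sidestep the "is a short 16 or 32 bits?" question is to use the exact-width types from `<stdint.h>` (C99). The typedef names `sample_t` and `product_t` and the helper `mul_samples` are illustrative.

```c
#include <stdint.h>

/* 'short', 'int' and 'long' widths are implementation-defined; exact-width
   types pin down the 16-bit and 32-bit arithmetic DSP code relies on. */
typedef int16_t sample_t;     /* one 16-bit ADC/audio sample */
typedef int32_t product_t;    /* holds a full 16 x 16-bit product */

product_t mul_samples(sample_t a, sample_t b)
{
    return (product_t)a * b;   /* widen before multiplying: cannot overflow */
}
```

Even the extreme case, -32768 × -32768 = 1,073,741,824, fits in `product_t` regardless of how wide `short` is on the target.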
51. Increasing C Performance
Performance tuning is the process of specializing the program
for the particular hardware.
Work at the higher level first:
Improve the algorithm.
Make sure the algorithm suits the architecture.
Look at the machine's capabilities:
It may have specialized instructions.
52. Linear Profiling tools
Using compiler optimization (automatic compiler optimization)
Optimizing the algorithm for the hardware
Using the pipeline viewer
Using the compiler libraries, which provide routines already
optimized for the hardware
Fractional builtins
Fract types fract16 and fract32
ETSI (European Telecommunications Standards Institute) fract
functions
Fractional arithmetic can be on the order of 100 times faster than
floating-point arithmetic, which is emulated in software on this
fixed-point processor
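The fract16 type uses 1.15 fractional format, in which a 16-bit integer represents a value in [-1.0, 1.0). A portable C sketch of fractional multiply and add with saturation follows; the names `q15_t`, `q15_mul`, and `q15_add` are hypothetical stand-ins, not the Blackfin compiler's fract builtins, and the code assumes an arithmetic right shift for negative values, as common compilers provide.

```c
#include <stdint.h>

/* 1.15 fractional value: int16_t v represents v / 32768. */
typedef int16_t q15_t;

static q15_t q15_saturate(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (q15_t)v;
}

/* Fractional multiply, rounded to nearest; -1.0 * -1.0 saturates
   to the largest value just below +1.0. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * b;                 /* 2.30 product */
    return q15_saturate((p + (1 << 14)) >> 15); /* back to 1.15 */
}

/* Fractional add that saturates instead of wrapping. */
q15_t q15_add(q15_t a, q15_t b)
{
    return q15_saturate((int32_t)a + b);
}
```

For example, 0.5 × 0.5 (16384 × 16384) gives 8192, which represents 0.25, using only integer operations.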
53. Arrays and Pointers
Arrays are easier to analyse.
void va_ind(int a[], int b[], int out[], int n) {
int i;
for (i = 0; i < n; ++i)
out[i] = a[i] + b[i];
}
Pointers are closer to the hardware.
void va_ptr(int a[], int b[], int out[], int n) {
int i;
for (i = 0; i < n; ++i)
*out++ = *a++ + *b++;
}
Which produces the fastest code?
In most cases there is no difference. Start with arrays; if
performance is not sufficient, try pointers.
54. Avoid Loop Carried dependencies
Bad: Scalar dependency.
for (i = 0; i < n; ++i) x = a[i] - x;
x is used from the previous iteration, so iterations cannot
be overlapped.
Bad: Array dependency.
for (i = 0; i < n; ++i) a[i] = b[i] * a[c[i]];
a[c[i]] may hold a value written by a previous iteration, so
iterations cannot be overlapped.
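A dot product carries a dependency of its own through the accumulator. A common rewrite splits the sum across two independent accumulators, which maps naturally onto a processor with two MAC units. The function names are illustrative, the second version assumes an even `n` and that the sums fit in 32 bits, and an optimizing compiler may already perform this transformation by itself.

```c
#include <stdint.h>

/* One accumulator: every iteration waits for 'sum' from the previous one. */
int32_t dot_one_acc(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; ++i)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Two accumulators (even and odd elements): the two multiply-accumulates
   in each iteration are independent. Assumes n is even; integer addition
   makes the reordering exact. */
int32_t dot_two_acc(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum0 = 0, sum1 = 0;
    for (int i = 0; i < n; i += 2) {
        sum0 += (int32_t)a[i] * b[i];
        sum1 += (int32_t)a[i + 1] * b[i + 1];
    }
    return sum0 + sum1;
}
```

Both functions return the same value; only the dependency chain inside the loop is shorter in the second.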
55. Avoid Loop Carried dependencies
Use hardware loops
Word-align your data
32-bit loads help keep the compute units busy
32-bit references must be on 4-byte boundaries
Top-level arrays are allocated on 4-byte boundaries
Pass only the address of the first element of an array
Write loops that process input arrays one element at a
time
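Word alignment can be requested and checked in standard C11. The buffer name `samples`, its length, and the helper `is_word_aligned` are illustrative; the check simply tests whether an address is a multiple of 4.

```c
#include <stdint.h>

#define NSAMP 8

/* 16-bit sample buffer forced onto a 4-byte boundary (C11 _Alignas),
   so each pair of samples can be fetched with one aligned 32-bit load. */
static _Alignas(4) int16_t samples[NSAMP];

/* Nonzero if p lies on a 4-byte boundary. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % 4u) == 0;
}
```

Elements 0 and 2 of the `int16_t` buffer start on 4-byte boundaries, so each pair can be fetched as one 32-bit word, while element 1 does not.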
56. Use of the keywords “volatile” and “const”
Volatile:
Volatile is essential for hardware- or interrupt-related
data: data that is not changed by the program itself but by
the hardware, and then used by the program
Const:
Prevents accidental writes to memory whose contents should
not change
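Both keywords can be shown in a small C sketch: a flag and a sample written by an interrupt handler are declared `volatile`, and a coefficient table is declared `const`. The handler `adc_isr`, its connection to an ADC interrupt, and the other names are hypothetical; `volatile` only stops the compiler from caching the values in registers and does not by itself make multi-word updates atomic.

```c
#include <stdint.h>

/* Written by the (hypothetical) ADC interrupt handler, read by the main
   loop: volatile forces a fresh memory read on every access. */
static volatile int sample_ready = 0;
static volatile int16_t latest_sample;

void adc_isr(int16_t value)          /* would be invoked by the interrupt */
{
    latest_sample = value;
    sample_ready = 1;
}

int16_t take_sample(void)
{
    while (!sample_ready)
        ;                            /* re-reads sample_ready each pass */
    sample_ready = 0;
    return latest_sample;
}

/* Read-only coefficients: direct writes are rejected by the compiler,
   and the table may be placed in read-only memory. */
static const int16_t coeffs[4] = { 8192, 8192, 8192, 8192 };

int16_t coeff_at(int k) { return coeffs[k]; }
```

Without `volatile`, the compiler could legally read `sample_ready` once and spin forever on a stale register copy.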
57. Use of Circular addressing
Use of the keyword “asm” (inline assembly)
Replace conditionals with min, max, and abs
Avoid jump statements
Avoid division: replace division by a power of two with a right shift
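The min/max and shift tips look like this in C. The helpers `min32`, `max32`, `clip`, and `div4` are illustrative; whether the compiler turns them into single MIN/MAX instructions rather than compare-and-jump sequences depends on the compiler and optimization level. The shift is shown on an unsigned value because, for negative signed values, `>>` rounds toward minus infinity while `/` rounds toward zero.

```c
#include <stdint.h>

/* Branch-free style helpers that a DSP compiler can map to MIN/MAX. */
static inline int32_t min32(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t max32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Clip a sample to [lo, hi] without an if/else chain. */
int32_t clip(int32_t x, int32_t lo, int32_t hi)
{
    return min32(max32(x, lo), hi);
}

/* Divide by 4 with a shift by 2 instead of a divide instruction
   (safe for unsigned values). */
uint32_t div4(uint32_t x)
{
    return x >> 2;
}
```

For instance, `clip(500, -100, 100)` returns 100 and `div4(17)` returns 4, the same as `17 / 4`.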