This document discusses DSP architectures and their suitability for digital signal processing. It describes the basic components of a processor and how DSP processors are optimized for common DSP operations like multiplication, addition, delays, and array handling. It explains features like parallel multiply-add units, specialized register structures, and efficient memory addressing modes. Finally, it covers different memory architectures for DSPs, including the Harvard architecture and modified von Neumann architecture, which allow multiple simultaneous memory accesses needed for DSP algorithms.
2. Basic Processor Structure
• Here we see a very simple processor structure - such as
might be found in a small 8-bit microprocessor.
12 DEC 01 ECSE 6620 - Jason Stripinis (jasonstripinis@eng
3. Basic Processor Functions
• ALU
– Arithmetic Logic Unit - this circuit takes two operands on the
inputs (labeled A and B) and produces a result on the output
(labeled Y).
– The operations will usually include, as a minimum:
• add, subtract
• and, or, not
• shift right, shift left
• ALUs in more complex processors will execute many more
instructions.
4. Basic Processor Functions
• Register File
– A set of storage locations (registers) for storing temporary results.
Early machines had just one register (the accumulator). Modern RISC
processors typically have 16 to 32 general-purpose registers.
• Instruction Register
– The instruction currently being executed by the processor is stored
here.
• Control Unit
– The control unit decodes the instruction in the instruction register
and sets signals which control the operation of most other units of
the processor. For example, the operation code (opcode) in the
instruction is used to set the ALU control signals, which select the
operation it performs (add, subtract, AND, OR, NOT, shift, etc.).
5. Basic Processor Functions
• Clock
– The vast majority of processors are synchronous, that is, they use a
clock signal to determine when to capture the next data word and
perform an operation on it. In a globally synchronous processor, a
common clock needs to be routed (connected) to every unit in the
processor.
• Program counter
– The program counter holds the memory address of the next
instruction to be executed. It is updated every instruction cycle to
point to the next instruction in the program. Branch instructions
change the program counter by other than a simple increment.
6. Basic Processor Functions
• Memory Address Register
– This register is loaded with the address of the next data word to be
fetched from or stored into main memory.
• Address Bus
– Transfers addresses to memory and memory-mapped peripherals.
It is driven by the processor acting as a bus master.
• Data Bus
– Carries data to and from the processor, memory and peripherals. It
will be driven by the data source, i.e. processor, memory, etc.
• Multiplexed Bus
– To limit device pin counts and bus complexity, some processors
multiplex (MUX) address and data onto the same bus, with an adverse
effect on performance.
7. DSP Implementations
• DSP Algorithm
– Series of mathematical operations that are applied to process a
sequence of digital signals sampled from the real (analog) world
• Application examples
– Filtering
– FFT
– Noise cancellation
– Spectral Processing
8. Why is a special architecture good for
digital signal processing?
• DSPs are tailored to run DSP algorithms efficiently.
• Special functions to handle DSP algorithm demands:
– Unique data access patterns
• Streams of data requiring high bandwidth
• Low data repetition but high code repetition
– Math operation focus (“number cruncher”)
– Real-time constraints
– Power and size constraints
– Cost requirement
– Attention to numeric effects (limited fixed point error)
9. DSP Functional Characteristics
• Typically require a few specific operations
• Consider a FIR filter, which requires:
– additions and multiplications
– delays
– array handling
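The three requirements above can be seen in a minimal direct-form FIR sketch. The slide's own filter diagram is not reproduced in this transcript, so the function name `fir_filter` and the list-based delay line are illustrative assumptions, not the author's code.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k]."""
    # Delay line holds x[n], x[n-1], ..., x[n-N+1]; starts at zero.
    delay_line = [0.0] * len(coeffs)
    output = []
    for x in samples:
        # Delays: the newest sample enters, the oldest one falls off the end.
        delay_line = [x] + delay_line[:-1]
        # Array handling plus multiply-add: one multiply and one add per tap.
        acc = 0.0
        for h, d in zip(coeffs, delay_line):
            acc += h * d
        output.append(acc)
    return output
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is a quick way to check the tap ordering.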
10. DSP Typical Operations
• Additions & Multiplications
– fetch two operands
– perform the addition or multiplication (or both)
– store the result
• Delays
– store the result for later use
• Array Handling
– fetch values from consecutive memory locations
– copy data from register to register
11. DSP Typical Operations
• To perform these basic operations, most DSPs have:
– a parallel multiply and add
– multiple memory accesses (to fetch two operands and store the
result)
– sufficient registers to hold data temporarily
– efficient address generation for array handling
– special features such as delays or circular addressing
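Circular addressing can be sketched in software as a fixed buffer whose pointer wraps around, so a delay line never has to be shifted. The class name `CircularDelayLine` and its methods are hypothetical; a DSP does the wrap-around in its address-generation hardware rather than with a modulo operation.

```python
class CircularDelayLine:
    """Fixed-length delay line using wrap-around (modulo) addressing."""

    def __init__(self, length):
        self.buf = [0.0] * length
        self.pos = 0  # index of the most recently written sample

    def push(self, x):
        # Advance the pointer with wrap-around and overwrite the oldest sample.
        self.pos = (self.pos + 1) % len(self.buf)
        self.buf[self.pos] = x

    def tap(self, k):
        # Return the sample written k pushes ago (k = 0 is the newest).
        return self.buf[(self.pos - k) % len(self.buf)]
```

Each `push` touches one memory location, whatever the delay-line length; a shift-based delay line would move every stored sample.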
12. DSP Arithmetic Logic Unit
• Most DSP operations require additions and multiplications
together. So DSP processors usually have parallel
hardware adders and multipliers which can be used with a
single multiply-accumulate (MAC) instruction.
13. Register Structure
• Delays require that intermediate values be held for later
use.
• For example, when keeping a running total, the total can
be kept within the processor to avoid wasting time on repeated
reads from and writes to memory.
• For this reason DSP processors have lots of registers which
can be used to hold intermediate values.
• Registers may be fixed-point or floating-point.
14. Memory Addressing
• Array handling requires that data can be fetched efficiently
from consecutive memory locations.
• For this reason DSP processors have address registers
which are used to hold addresses and can be used to
generate the next needed address efficiently.
• Usually, the next needed address can be generated during
the data fetch or store operation, and with no overhead.
15. Memory Addressing
• Example DSP address generation operations:
Instruction   Name                    Description
*rP           register indirect       read the data pointed to by the address in register rP
*rP++         postincrement           having read the data, postincrement the address pointer to point to the next value in the array
*rP--         postdecrement           having read the data, postdecrement the address pointer to point to the previous value in the array
*rP++rI       register postincrement  having read the data, postincrement the address pointer by the amount held in register rI to point to rI values further down the array
*rP++rIr      bit reversed            having read the data, postincrement the address pointer to point to the next value in the array, as if the address bits were in bit reversed order
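The bit-reversed mode is the least intuitive of these, so here is a small sketch of the index sequence it produces. The helper names `bit_reverse` and `bit_reversed_order` are illustrative; on a DSP the address generator produces this sequence at no cost in cycles.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit across
        index >>= 1
    return result


def bit_reversed_order(n):
    """Order in which an n-point radix-2 FFT visits its data (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]
```

For an 8-point transform the order is 0, 4, 2, 6, 1, 5, 3, 7, which is the reordering a radix-2 FFT needs on its input or output.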
16. Memory Architectures for DSP
• For arithmetic the DSP needs to fetch two operands in a
single instruction cycle.
• Since we also need to store the result and to read the
instruction itself, more than two memory accesses per
instruction cycle are needed.
• Even the simplest DSP operation - an addition involving
two operands and a store of the result to memory - requires
four memory accesses (three to fetch the two operands and
the instruction, plus a fourth to write the result)
17. Memory Architectures for DSP
• DSP processors usually support multiple memory accesses
in the same instruction cycle.
• It is not possible to access two different memory addresses
simultaneously over a single memory bus.
• There are two common methods to achieve multiple
memory accesses per instruction cycle:
• Harvard architecture
• modified von Neumann architecture
18. Memory Architectures for DSP
(Harvard Architecture)
• The Harvard architecture has two separate physical
memory buses, allowing two simultaneous memory
accesses.
• The true Harvard architecture dedicates one bus for
fetching instructions, with the other available to fetch
operands.
• This is inadequate for DSP operations, which usually
involve at least two operands. So DSP Harvard
architectures usually permit the 'program' bus to be used
also for access of operands.
19. Memory Architectures for DSP
(Harvard Architecture)
• Note that it is often necessary to fetch three things - the
instruction plus two operands - and the Harvard
architecture is inadequate to support this.
• So DSP Harvard architectures often also include a cache
memory which can be used to store instructions which will
be reused, leaving both Harvard buses free for fetching
operands.
• A Harvard architecture plus cache is sometimes called
an extended Harvard architecture or Super Harvard
ARChitecture (SHARC).
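The effect of the second bus and the instruction cache can be illustrated with a deliberately simplified counting model. It assumes each bus completes one access per cycle, that one multiply-add needs two operand fetches plus one instruction fetch, and that the running total stays in a register so no store is counted. The function `cycles_per_mac` is hypothetical and does not describe any particular device.

```python
import math


def cycles_per_mac(buses, instruction_cached):
    """Memory-limited cycles per multiply-add under the stated assumptions."""
    operand_fetches = 2
    # A cached instruction does not need to use a memory bus.
    instruction_fetches = 0 if instruction_cached else 1
    accesses = operand_fetches + instruction_fetches
    # Each bus handles one access per cycle.
    return math.ceil(accesses / buses)
```

In this model a single bus needs 3 cycles, two buses need 2, and two buses with a cached instruction need 1, which is the motivation for adding the cache.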
20. Memory Architectures for DSP
(Harvard Architecture)
• The Harvard architecture requires two memory buses. This
makes it expensive to bring off the chip - for example a
DSP using 32 bit words and with a 32 bit address space
requires at least 64 pins for each memory bus - a total of
128 pins if the Harvard architecture is brought off the chip.
This results in very large chips, which are difficult to
design into a circuit.
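The pin arithmetic on this slide can be written out as a one-line calculation. The function `bus_pins` is an illustrative helper; it counts only address and data lines and ignores control, power and ground pins.

```python
def bus_pins(data_bits, address_bits, buses):
    """Address plus data pins needed to bring `buses` memory buses off-chip."""
    return (data_bits + address_bits) * buses
```

With 32-bit words and a 32-bit address space this gives 64 pins for one bus and 128 for the two buses of a Harvard architecture.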
21. Memory Architectures for DSP
(von Neumann Architecture)
• The von Neumann architecture uses only a single memory
bus. This is relatively cheap, requiring fewer pins than the
Harvard architecture, and simple to use because the
programmer can place instructions or data anywhere
throughout the available memory.
• But it does not permit multiple simultaneous memory accesses.
• The modified von Neumann architecture allows multiple
memory accesses per instruction cycle by running the
memory clock faster than the instruction cycle.
22. Memory Architectures for DSP
(von Neumann Architecture)
• Each instruction cycle is divided into multiple 'machine
states' and a memory access can be made in each machine
state, permitting multiple memory accesses per
instruction cycle.
• The modified von Neumann architecture permits all the
memory accesses needed to support addition or
multiplication: fetch of the instruction; fetch of the two
operands; and storage of the result.
23. Why use a special architecture for
digital signal processing?
The Answers
Unique data access patterns                    Bit reversed addressing (FFT)
Streams of data requiring high bandwidth       Multiple access memory architecture
Low data repetition but high code repetition   Eliminate data cache (save $$)
Math operation focus                           MAC instruction; vector processing unit
Real-time constraints                          Zero-overhead loops
Power and size constraints                     Limited addition function units (unlike GPP)
Cost requirement                               On-board peripherals (SOC)
Attention to numeric effects                   ALU with 16-bit operands and 32-bit result
(limited fixed point error)
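The last row, 16-bit operands with a 32-bit result, can be illustrated with a short fixed-point sketch. It assumes the common Q15 format (16-bit signed values representing fractions in [-1, 1)), which the slides do not name; the function names are illustrative.

```python
Q15_ONE = 1 << 15  # scale factor: 1.0 is represented as 32768


def to_q15(x):
    """Convert a float in [-1, 1) to a 16-bit Q15 integer, saturating at the limits."""
    return max(-Q15_ONE, min(Q15_ONE - 1, round(x * Q15_ONE)))


def q15_multiply_full(a, b):
    """16-bit x 16-bit product kept at full width (Q30, fits in 32 bits)."""
    return a * b


def q15_multiply_truncated(a, b):
    """Same product cut back to 16-bit Q15; the low 15 bits are discarded."""
    return (a * b) >> 15


# Accumulate four small products both ways.
a = b = 181
full_sum = sum(q15_multiply_full(a, b) for _ in range(4)) >> 15
truncated_sum = sum(q15_multiply_truncated(a, b) for _ in range(4))
```

Keeping the full-width products and truncating once at the end gives 3, while truncating each product first gives 0. This is the kind of fixed-point error a wide accumulator limits.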
24. DSP Generations
• 1st Generation (1979-1982)
– Transition from experimental signal processors
• 2nd Generation (1985-1986)
– Move from co-processor to stand-alone processor
• 3rd Generation (1987-1989)
– Major hardware improvements to speed
• 4th Generation (1990-1996)
– More on-chip integration (ADC, DAC, memory, multi-processor)
• 5th Generation (1997-)
25. DSP Generations
1st Generation (1979-1982)
• Primarily targeted at digital filtering
• Specialized co-processor for signal processing
• NMOS (n-Channel Metal Oxide Semi) fabrication
• 16-bit fixed point
• fast multiplier (and adder)
• Harvard architecture
• Specialized Instruction set
26. DSP Generations
1st Generation (1979-1982)
• Example = Texas Instruments TMS32010
– 16-bit fixed point
– Harvard architecture
– two Address registers
– one A register (adder)
– one P register (multiplier)
– one T register (data shift on delay line)
– No zero-overhead loop
– Specialized Instruction set
– MAC Time 400 ns (<100 ns today)
– 50 ms per 1024-FFT
27. DSP Generations
1st Generation (1979-1982)
• Example = Texas Instruments TMS32010
28. DSP Generations
2nd Generation (1985-1986)
• Move from co-processor to stand-alone processor
• CMOS (Complementary Metal Oxide Semi) fabrication
• Double the speed of first generation
• Advances in memory architecture (more internal RAM)
• better pipelining of functional units
• address generators (bit-reversing)
• Zero-overhead loop HW
• Limited floating point in SW
29. DSP Generations
2nd Generation (1985-1986)
• Example = Texas Instruments TMS32020 (1985)
– 16-bit fixed point
– Harvard architecture
– Improved TMS32010
– RPTS allows a pipelined instruction to be performed in a single cycle
– Specialized Instruction set
– MAC Time 200 ns
– 10 ms per 1024-FFT
30. DSP Generations
3rd Generation (1987-1989)
• Increased floating point support
– 32-bit floating point hardware DSPs released
– Floating point emulation on fixed point processors
– IEEE754 support
• Hardware enhancements (large speed increase)
– dense CMOS fabrication
– on chip DMA
– instruction caches
– increased clock rates (first cores above 10 MHz)
• Increased complexity of SW
31. DSP Generations
3rd Generation (1987-1989)
• Example = Motorola DSP56001 (1988)
– 24-bit data, instructions
– 24-bit fixed point
– 3 memory spaces (P, X, Y)
– parallel moves
– circular addressing
– MAC Time 75 ns (21 ns today)
– ~3 ms per 1024-FFT
• Other Examples:
– AT&T DSP16A
– Analog Devices ADSP-2100
– TI TMS320C50
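Circular addressing, listed above for the DSP56001, lets a delay line wrap around in place instead of physically shifting every sample. A minimal software sketch of the idea follows; the class and method names are hypothetical, and the modulo operation stands in for what the hardware address unit does for free.

```python
class CircularBuffer:
    """Fixed-size delay line addressed modulo its length."""

    def __init__(self, size):
        self.data = [0] * size
        self.pos = 0  # index where the next sample will be written

    def push(self, sample):
        # Overwrite the oldest sample, then advance the pointer with wraparound.
        self.data[self.pos] = sample
        self.pos = (self.pos + 1) % len(self.data)

    def newest_first(self):
        # Walk backwards from the most recent write, wrapping at index 0.
        n = len(self.data)
        return [self.data[(self.pos - 1 - k) % n] for k in range(n)]


buf = CircularBuffer(3)
for s in (1, 2, 3, 4):
    buf.push(s)
history = buf.newest_first()  # [4, 3, 2]
```

Only one memory location changes per new sample, which is why this addressing mode pairs well with single-cycle MAC loops.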
32. DSP Generations
4th Generation (1990-1996)
• Hardware integration
– ADC
– DAC
– more memory
– multiple DSPs on one chip
• Decreasing power consumption
– 5.0 VDC → 3.3 VDC → 3.0 VDC → 2.7 VDC
• GPPs start to get DSP functions
– SIMD
– Leads to Intel introducing MMX (MultiMedia eXtensions) for x86
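The voltage steps listed above matter because dynamic power in CMOS logic scales roughly with the square of the supply voltage (P ≈ C·V²·f). The sketch below works out the relative dynamic power at each step, assuming the same switched capacitance and clock frequency throughout; that assumption is for illustration only and ignores leakage.

```python
# Supply voltages from the list above, in volts.
supply_voltages = [5.0, 3.3, 3.0, 2.7]


def relative_dynamic_power(v, v_ref=5.0):
    """Dynamic power relative to the reference voltage, assuming fixed C and f."""
    return (v / v_ref) ** 2


relative = {v: relative_dynamic_power(v) for v in supply_voltages}
# 3.3 V gives about 44% of the 5.0 V dynamic power; 2.7 V gives about 29%.
```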
33. DSP Generations
4th Generation (1990-1996)
• Example = TI TMS320C541 (1995)
– Enhanced architecture
– Low voltage (3.3 VDC)
– More on-chip memory
– Application specific functional units
– MAC Time 20 ns (10 ns today)
– ~1 ms per 1024-FFT
• Example = TI TMS320C80
– multiple processors per chip
34. The GPP Option
• High-performance general-purpose processors for PCs and workstations are increasingly suitable for some DSP applications.
• E.g., Intel MMX Pentium, Motorola/IBM PowerPC 604e
• These processors achieve excellent to outstanding floating- and/or fixed-point DSP performance via:
– Very high clock rates (200-500 MHz)
– Superscalar architectures
– Single-cycle multiplication and arithmetic operations
– Good memory bandwidth
– Branch prediction
– In some cases, single-instruction, multiple-data (SIMD) ops
35. DSP Generations
5th Generation (1997-)
• Not the classic DSP architectures
– SIMD (Single Instruction Multiple Data stream) instructions
– VLIW (Very Long Instruction Words) allows RISC processing
• High parallelism
• Increased clock speeds
• No longer application specific functional units (no MAC FU)
• Low voltage (2.5 VDC or less, even 1.2 VDC cores)
• MAC Time 3 ns (but can be power hungry)
• GPPs start to get DSP functions
– Intel introduces MMX (MultiMedia eXtensions) for x86 in 1997
• Increased integration
– MCU and DSP cores on same chip
– MCU functions/ports added to DSPs
36. DSP Generations
5th Generation (1997-)
• SIMD (Single Instruction Multiple Data) instructions
– Enhance throughput by allowing parallelism
– Requires multiple functional units and wider buses
– May support multiple data widths (different functional groups)
– Example = DSP16000 (SIMD)
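As a rough illustration of the SIMD idea above, the sketch below treats one 32-bit word as two independent 16-bit lanes and adds both lanes in a single call. This is a software emulation for explanation only; `packed_add16` is a hypothetical name, and real SIMD hardware does the lane-wise add in one instruction using multiple functional units.

```python
def packed_add16(a, b):
    """Add two 32-bit words as two independent 16-bit lanes (wraparound per lane)."""
    lo = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF                  # low lane, carry discarded
    hi = (((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF)) & 0xFFFF  # high lane
    return (hi << 16) | lo


a = 0x0001FFFF  # lanes: hi = 1, lo = 0xFFFF
b = 0x00010001  # lanes: hi = 1, lo = 1
simd_sum = packed_add16(a, b)      # lanes: hi = 2, lo = 0
plain_sum = (a + b) & 0xFFFFFFFF   # an ordinary 32-bit add lets the carry leak into the high lane
```

The difference between `simd_sum` and `plain_sum` is the point: a lane-wise add keeps the carry from the low lane out of the high lane.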
37. DSP Generations
5th Generation (1997-)
• VLIW (Very Long Instruction Word)
– Instruction-level parallelism (ILP) can be a major performance gain
• A superscalar implementation requires a larger die and more power to schedule instructions dynamically
– VLIW can be used to schedule instructions statically at compile time (or even by hand!)
– VLIW instruction words have fixed "slots" for instructions that map to the available functional units.
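The fixed-slot idea above can be sketched as a toy static scheduler: each operation names the functional unit it needs and the operations it depends on, and the packer places it in the earliest instruction word where that slot is free and all dependencies have already issued. Everything here is hypothetical (the slot names, the one-cycle latency, the greedy policy, the operation names); real VLIW compilers are far more elaborate.

```python
SLOTS = ("L", "S", "M", "D")  # assumed slot names, one per functional unit


def pack_vliw(ops, slots=SLOTS):
    """Greedily pack ops into instruction words.

    ops is a list of (name, unit, deps) tuples in dependency order.
    Assumes every operation completes in one cycle.
    Returns a list of dicts mapping slot name to operation name.
    """
    issued_in = {}  # operation name -> index of the word it was placed in
    words = []
    for name, unit, deps in ops:
        if unit not in slots:
            raise ValueError(f"unknown functional unit: {unit}")
        # Must issue strictly after every operation it depends on.
        w = max((issued_in[d] + 1 for d in deps), default=0)
        while True:
            if w == len(words):
                words.append({})
            if unit not in words[w]:
                break
            w += 1
        words[w][unit] = name
        issued_in[name] = w
    return words


def render(words, slots=SLOTS):
    """Show each word with explicit NOPs in its unused slots."""
    return [tuple(word.get(s, "nop") for s in slots) for word in words]


ops = [
    ("load_x", "D", []),
    ("load_h", "D", []),
    ("mul", "M", ["load_x", "load_h"]),
    ("add", "L", ["mul"]),
    ("load_x2", "D", []),
]
words = pack_vliw(ops)
listing = render(words)
```

In the rendered listing most slots hold a NOP, which is the code-size and utilization cost that comes with fixed slots when the schedule cannot be filled.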
38. DSP Generations
5th Generation (1997-)
• VLIW Advantages
– huge theoretical pay off
• less than 1 ns per MAC!
• Less than 75 µs per 1024-FFT
• VLIW Drawbacks
– Can be very difficult to program and debug
– High power consumption if VLIW is not filled
– Code size dramatically increases requiring more program memory
39. DSP Generations
5th Generation (1997-)
• VLIW Example = TI TMS320C6201
– 32-bit functional units
• Lx = ALU
• Sx = Branching and shifting
• Mx = Multiplier
• Dx = Data store
40. DSP Generational Development
• DSP processor performance has increased by a factor of
about 400x over the past 20 years
[Chart: MAC time (ns), vertical axis 0 to 400, plotted for the 1st through 5th generations]
• DSP architectures will be increasingly specialized for applications, especially communications applications
• General-purpose processors will become viable for many DSP applications
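The "about 400x" figure can be checked against the MAC times quoted for each generation on the earlier slides. The sketch below only restates those numbers; the 1 ns value is the upper bound of the "less than 1 ns per MAC" VLIW claim, so the last ratio is a lower bound.

```python
# MAC times in nanoseconds, as quoted on the generation slides.
mac_time_ns = {"1st": 400, "2nd": 200, "3rd": 75, "4th": 20, "5th": 3}

# Speedup of each generation relative to the first.
speedup = {gen: mac_time_ns["1st"] / t for gen, t in mac_time_ns.items()}

# The VLIW slide claims less than 1 ns per MAC, so the ratio is at least this.
vliw_speedup_lower_bound = mac_time_ns["1st"] / 1
```

The 3 ns figure gives roughly 133x, and the sub-nanosecond VLIW claim is what brings the total to about 400x.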