Outline
 What is a “Soft” Processor
 What is the NIOS II?
 Architecture for NIOS II, what are the
implications
• TigerSHARC VS. NIOS II
• Pipeline Issues
• Issues related to FIR
 Hardware acceleration, using FPGA
logic
What is a “Soft”
Processor?
 Processor implemented in VHDL, Verilog,
etc., and downloaded onto FPGA hardware
 Can implement many parallel processors
on one FPGA
 Can use additional FPGA resources on the
same chip that are not part of the processor
core.
 NIOS II is a “Soft” Processor
Why “Soft” Processor?
 Higher level of design reuse
 Reduced obsolescence risk
 Simplified design update or change
 Increased design implementation
options
 Lower latency between processor and
FPGA components
What is NIOS II?
 Software-defined processor
 The processor core is loaded onto
FPGA
 Programmed using ‘normal’
programming tools (C, asm), not
hardware description languages
 Can use the rest of the FPGA hardware
for accelerating parts of the code
How Is NIOS II
Implemented
 The custom FPGA logic that interacts
with the processor is implemented in
Altera Quartus II
 The Avalon Interface bus (common
instruction/data bus) is implemented in
Quartus II
 The architecture is generated in Quartus
II and used for programming in Eclipse
IDE
NIOS II IDE
 Coding is implemented in Eclipse rather than
VisualDSP.
The Different NIOS II Cores
 There are 3 cores available from Altera
 NIOSII/e: Economical Core
 NIOSII/s: Standard Core
 NIOSII/f: Fast Core
What’s the Difference between
the Cores?
An LE is equivalent to an 8-to-1 NAND gate + 1 D flip-flop
An ALM is equivalent to 2 LEs
Comparison of TigerSHARC and
NIOS II architecture
TigerSHARC Architecture
NIOS II Architecture
-thirty-two 32-bit general-purpose registers, six 32-bit control registers
-variable cache size, depending on how much FPGA space is available
-32-bit ALU: two inputs, one output; performs shifts, logic and arithmetic. The shifter is
not a separate unit as it is in TigerSHARC
Avalon Interface
-separate address, data and control lines
-data width of up to 1024 bits per transfer; can be set to any width (need not be a power of 2)
-one transfer per clock cycle
NIOS II/f pipeline
 Six stages
 One instruction can be dispatched and/or
retired per cycle
 Dynamic branch prediction: 2-bit branch
history table (no BTB, unlike TigerSHARC)
NIOS II/f pipeline
The pipeline stalls for:
• Multi-cycle instructions
• Cache misses
• Data dependencies (2 cycles between
calculating and using result)
Mispredicted branch penalty: 3 cycles
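How a 2-bit branch history behaves on a loop-closing branch can be sketched in C. This is a minimal model, assuming a single 2-bit saturating counter that starts in the strongly-not-taken state. The names predictor2, predict_taken, train and loop_mispredictions are hypothetical, and the real NIOS II/f predictor may differ in detail.

```c
#include <assert.h>

/* 2-bit saturating counter: states 0 and 1 predict "not taken",
 * states 2 and 3 predict "taken". */
typedef struct {
    unsigned state;
} predictor2;

static int predict_taken(const predictor2 *p)
{
    return p->state >= 2;
}

/* Move the counter one step toward the actual outcome, saturating at 0 and 3. */
static void train(predictor2 *p, int taken)
{
    if (taken) {
        if (p->state < 3) p->state++;
    } else {
        if (p->state > 0) p->state--;
    }
}

/* Count mispredictions of a loop-closing branch that is taken
 * (iterations - 1) times and then falls through once. */
int loop_mispredictions(int iterations, unsigned initial_state)
{
    assert(iterations >= 1 && initial_state <= 3);
    predictor2 p = { initial_state };
    int misses = 0;
    for (int n = 0; n < iterations; n++) {
        int taken = (n < iterations - 1);
        if (predict_taken(&p) != taken) misses++;
        train(&p, taken);
    }
    return misses;
}
```

Under these assumptions, loop_mispredictions(8, 0) returns 3: two misses while the counter warms up and one on loop exit. At a 3-cycle penalty each, that is 9 wasted cycles per loop, independent of the iteration count.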
Hardware multiply
 Can use different options for multiplier
(at the processor design stage)
 No h/w multiply (saves FPGA gates)
○ Speed depends on algorithm
 Use embedded multipliers (if FPGA has
those)
○ 1-5 cycles (depends on FPGA)
 Implement multipliers on FPGA gates
○ 11 cycles
 Division: 4-66 cycles in hardware
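When the core is built without a hardware multiplier, multiplication has to be done in software, which is why its speed depends on the algorithm. One classic option is shift-and-add, sketched below. The names soft_mul32 and soft_mul32_selftest are hypothetical, and this is not necessarily the routine the NIOS II toolchain actually links in.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply: for every set bit in b, add the matching
 * shifted copy of a. The result is the low 32 bits of the product. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u) {
            product += a;
        }
        a <<= 1;   /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return product;
}

/* Small self-check against known products. */
void soft_mul32_selftest(void)
{
    assert(soft_mul32(6u, 7u) == 42u);
    assert(soft_mul32(0u, 5u) == 0u);
    assert(soft_mul32(0xFFFFFFFFu, 2u) == 0xFFFFFFFEu);
}
```

The loop runs once per bit up to the highest set bit of b, so the run time varies with the operand, unlike the fixed cycle counts of the hardware options.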
Compare to TigerSHARC
 No support for parallel instructions
 No support for SIMD operations
 Multicycle instructions stall the pipeline
All the above limitations can be overcome
by using FPGA space unoccupied by the
processor itself
Comparison of NIOS II and
TigerSHARC on an FIR Algorithm
Integer FIR algorithm
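#include <stdio.h>

int main(void)
{
    int coeff[] = {1, 2, 3, 4, 5, 6, 7, 8};
    int data1[] = {1, 0, 0, 0, 0, 0, 0, 0};
    int output[8];
    int i, j, k;

    for (k = 0; k < 8; k++) output[k] = 0;   /* clear the accumulators */

    for (j = 0; j < 8; j++) {                /* one pass per output sample */
        for (i = 0; i < 8; i++) {            /* multiply-accumulate over the 8 taps */
            /* data1 is not shifted by j, so every output[j] gets the same sum */
            output[j] += data1[i] * coeff[7 - i];
        }
    }
    for (k = 0; k < 8; k++) printf("%d ", output[k]);
    return 0;
}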
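In the loop above, data1 is indexed only by i, so each output[j] receives the same sum. For comparison, a conventional direct-form FIR shifts the data window for each output sample. The sketch below is an illustration, not the code that was timed. The name fir_int and its parameters are hypothetical, and input samples before the start of the buffer are assumed to be zero.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n - k],
 * treating x[m] as 0 for m < 0. */
void fir_int(const int *x, const int *h, int *y, size_t samples, size_t taps)
{
    assert(x != NULL && h != NULL && y != NULL);
    for (size_t n = 0; n < samples; n++) {
        int acc = 0;
        /* k <= n keeps the index n - k inside the input buffer */
        for (size_t k = 0; k < taps && k <= n; k++) {
            acc += h[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

With the unit-impulse input used on the slide, this version returns the coefficients themselves, which is the filter's impulse response. The inner loop is still one load pair, one multiply and one accumulate per tap, so the per-tap cycle analysis is unaffected.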
Speed analysis
0        movi r4,8               i = 8
1  Loop: ldw  r2,0(r6)           load data
2        ldw  r3,0(r7)           load coefficient
3        addi r4,r4,-1           i--
4        addi r6,r6,4            dataPt++
5        mul  r2,r2,r3           data = data * coeff
6        addi r7,r7,-4           coeffPt--
7        stall                   data stall: waiting for multiplication result
8        add  r5,r5,r2           output += data
9        bne  r4,zero,0x10002a0  will mispredict 2 times at the beginning and 1 time at the end of the loop (wastes 3 cycles each time)
Speed analysis
 9 cycles per iteration, except the first two
(branch predicted not taken) and the last
(branch predicted taken) – those take
9+3=12 cycles
 The 1 data stall can be removed by moving the
instruction from line 4 to line 7
 Speed: 8 cycles * (N-3) + 11 cycles * 3 =
8*(N-3)+33 cycles
 For 1024-tap FIR: 8201 cycles
 NIOS II clock cycle is 3 times longer (200MHz vs
600MHz for TigerSHARC)
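The arithmetic above can be checked with a few lines of C. This is a sketch of the slide's estimate only, with the stall already removed (8 cycles per iteration, 11 for the three mispredicted ones). The function names are hypothetical.

```c
#include <assert.h>

/* Cycle estimate from the slide: N-3 iterations at 8 cycles each, plus
 * 3 mispredicted iterations at 8 + 3 = 11 cycles each. */
long nios2_fir_cycles(long taps)
{
    assert(taps >= 3);
    return 8 * (taps - 3) + 11 * 3;
}

/* Express NIOS II cycles as the number of TigerSHARC cycles that take
 * the same wall-clock time, given both clock rates in MHz. */
long tigersharc_equivalent_cycles(long nios2_cycles, long nios2_mhz, long tigersharc_mhz)
{
    assert(nios2_mhz > 0);
    return nios2_cycles * tigersharc_mhz / nios2_mhz;
}
```

For example, nios2_fir_cycles(1024) gives 8201, and tigersharc_equivalent_cycles(8201, 200, 600) gives 24603.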
Speed comparison
• 8201 NIOS II cycles equivalent to 24603
TigerSHARC cycles
• Lab3 timing:
– 56000 cycles Debug mode
– 13000 unoptimized ASM
– 4000 Optimized ASM
Worse than unoptimized assembly, but no
hardware acceleration was used, so this is not
that bad
Hardware Acceleration
 Profiling tool in Eclipse can show how
long each function takes
 If function takes too long, it can be sped
up by
 Custom instructions
 Hardware Acceleration
 Hardware acceleration means taking the
function and transforming it into FPGA
circuitry
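From C, a custom instruction is typically reached through a compiler builtin. The sketch below assumes GCC for NIOS II, which provides __builtin_custom_inii (two int operands, int result). The opcode index MAC_CI_N, the macro HW_MUL and the function dot_product_ci are hypothetical, and the custom instruction is assumed to be a plain multiplier. On any other compiler the sketch falls back to an ordinary multiply so that it can still be built and tested.

```c
#include <assert.h>

/* Hypothetical opcode index (0..255) assigned to the custom instruction
 * when the processor is generated; the real value comes from the system. */
#define MAC_CI_N 0

#if defined(__nios2__) || defined(__NIOS2__)
/* NIOS II build: route the multiply through the custom instruction. */
#define HW_MUL(a, b) __builtin_custom_inii(MAC_CI_N, (a), (b))
#else
/* Host fallback: plain multiply, so the sketch runs anywhere. */
#define HW_MUL(a, b) ((a) * (b))
#endif

/* Dot product shaped like the FIR inner loop, with each multiply sent
 * through the (assumed) custom instruction. */
int dot_product_ci(const int *data, const int *coeff, int taps)
{
    assert(data != 0 && coeff != 0 && taps >= 0);
    int acc = 0;
    for (int i = 0; i < taps; i++) {
        acc += HW_MUL(data[i], coeff[taps - 1 - i]);
    }
    return acc;
}
```

Keeping the builtin behind one macro means the same source can be profiled on the soft core and unit-tested on a workstation.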
Hardware Acceleration
 Can be done using C2H compiler from Altera
 Trades off Logic Size for Speed up.
Table 1. User Application Results Example
Algorithm                     Speed Increase       System fMAX   System Resource
                              (vs. Nios II CPU)    (MHz)         Increase (1)
Autocorrelation               41.0x                115           124%
Bit Allocation                42.3x                110           152%
Convolution Encoder           13.3x                95            133%
Fast Fourier Transform (FFT)  15.0x                85            208%
High Pass Filter              42.9x                110           181%
Matrix Rotate                 73.6x                95            106%
RGB to CMYK                   41.5x                120           84%
RGB to YIQ                    39.9x                110           158%
Conclusion
 “Soft” processors such as the NIOS II
offer another alternative in the
embedded system scene.
 The NIOS II offers the advantage of
added configurability and customization,
which blur the line between FPGAs and
DSPs

NIOS II Processor.ppt


Editor's Notes

  1. Intro: Traditionally we have a DSP, and it interacts with other modules, usually other ASICs. Then we have SoCs, which integrate other logic to improve latency. Now we have FPGAs, which add reconfigurability. Well, we want to integrate that too. SOPCs: system on a programmable chip. This is what the NIOS II is supposed to do. What happens when we want to integrate a DSP on an SOPC system? (We have a thing called a hard processor.)
  2. Yay outline! Basically, the concept, and what it looks like in software.
  3. Similar to how a Verilog circuit can be put on an FPGA to allow for high configurability, a soft processor is a processor implemented on an FPGA. This is different from a hard processor, which is a processor implemented in hardware. A soft processor is a logical schematic (software) that can be loaded onto any FPGA. So a soft processor isn’t really a processor, but just a schematic (or code, like software). This gives it all the advantages of software, such as easy updates and an improved development cycle. Well, why do you want to do this? Isn’t an FPGA slower clocked, with higher power consumption…
  4. No, it is not more power hungry, because it can be better customized for the application. Slower clocked doesn’t mean slower; it means more has to be done in a cycle, and an FPGA allows the developer to customize the design so that instructions finish in one cycle. Plus you get all the other advantages.
  5. It is a special schematic designed by Altera that interacts very well with other Altera IP mega blocks.
  6. Well, if the processor is in software, how do you write programs for it? So are you basically writing software for software? Doesn’t this seem somewhat redundant? Yes, exactly, it does seem a bit redundant. But it is the current model for soft processors; perhaps there will be a better programming environment for it later. What you need to do is write the processor (bus and FPGA logic) in software first using Quartus, make an emulation file, and use that to write your DSP program in Eclipse. (There is no hardware optimizer, like an assembler optimizer.)
  7. Here is what it looks like in Quartus. You need to define the schematic. At the top you have your clock source. The middle is your Avalon interface, and the bottom is your FPGA logic.
  8. Here is your NIOS II IDE environment. Now you take your emulated file and program for it like in VDSP. So if the processor is in software, does that mean you can only do simulation analysis, and not run on hardware like in the labs? No… you can run the generated processor on an FPGA and have the IDE connect to the FPGA when it runs.
  9. So exactly what does Altera give you as the basic architecture for you to customize? 3 cores with different features. Here are the specs…
  10. Notice it is very similar to the MIPS processor we learned about in other classes.
  11. Print off sheet to list the architecture features
  12. Print sheet listing the architecture features. All the ports on the right actually share one bus, the Avalon architecture.
  13. -separate address, data and control lines. No need to decode data for address. -up to 1024-bit data width transfer, can be set to any width (need not be a power of 2) -synchronous operation -dynamic bus sizing: this means no special design consideration when addressing items that have different bus widths. -one transfer per clock cycle. -The Avalon Interface is basically an interface that creates a common interface from the different interfaces of all the memory and peripheral components of the system. Are there bus issues because it’s one common interface? No… it’s a special interface, with dedicated memory ports.
  14. Cost vs. performance: the NIOS II package is $495 for a year + $150 for a Cyclone II FPGA; C2H is $3000/computer. TigerSHARC: VDSP is $3500/computer + $750 for a TigerSHARC evaluation board.