<ul><li>An Introduction to 3L Diamond on Sundance Hardware </li></ul>Some slides have extra information as notes.
What is 3L Diamond? <ul><li>Diamond is a set of tools and other components that work together with the TI C compiler and linker to support applications using multiprocessor hardware. </li></ul><ul><li>Sundance hardware is well suited to Diamond’s way of dealing with multiprocessors, and the combination provides the most rapid way to get your application running efficiently. </li></ul>
Why Diamond? <ul><li>The first response of many people when offered Diamond is: </li></ul><ul><li>“We do not need any extra software. </li></ul><ul><li>Code Composer Studio provides everything we need to write multiprocessor applications.” </li></ul><ul><li>Is this really true? </li></ul>
The Hardware <ul><li>The structure of Sundance hardware is a good place to start. </li></ul><ul><li>Sundance provides modular hardware that allows you to build complex multiprocessor systems. </li></ul><ul><li>Modules include an FPGA that implements the interprocessor links, such as comports and SDBs, that allow pairs of processors to communicate. </li></ul>
A Sundance Module
Typical Hardware
Scaling <ul><li>Sundance hardware scales: </li></ul><ul><ul><li>There are no shared resources </li></ul></ul><ul><ul><li>Adding processors adds communication links </li></ul></ul><ul><ul><li>No contention for shared memory or busses </li></ul></ul>
How to Develop Applications <ul><li>Given hardware like this, the first thought will be that Code Composer from TI is ideal for developing applications. </li></ul><ul><li>We shall now investigate this thought. </li></ul>
Code Composer Studio <ul><li>A good platform for single-processor work. </li></ul><ul><li>No real support for multiprocessors. </li></ul><ul><ul><li>CCS is really a single-processor system </li></ul></ul><ul><ul><li>You have to treat each processor separately. </li></ul></ul><ul><li>You build separate programs for each processor as follows: </li></ul>
Building with CCS
Problem: Specification <ul><li>You have to divide your application into separate programs for each processor. </li></ul><ul><ul><li>Modularity should be driven by the program structure. </li></ul></ul><ul><ul><li>You should not use the hardware structure. </li></ul></ul><ul><li>Difficult to split the work among several developers: </li></ul><ul><ul><li>only one program for each processor </li></ul></ul><ul><li>Difficult to test components </li></ul><ul><ul><li>hard to make each processor work in isolation </li></ul></ul>
How do you load the application? <ul><li>You have to load using JTAG </li></ul><ul><li>JTAG is very slow (0.2 MB/s) </li></ul><ul><li>You have all the parts of your application as separate .out files, one for each processor. </li></ul><ul><li>You have to load these one at a time. </li></ul><ul><ul><li>it is very easy to load the wrong processor </li></ul></ul><ul><ul><li>it is very easy to forget to load a processor </li></ul></ul><ul><ul><li>instructions for your users are complicated </li></ul></ul>
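A rough illustration shows why the quoted JTAG rate matters. Load time is the image size divided by the link rate, and the slide says the images are loaded one at a time. The 2 MB image size and the count of four processors below are hypothetical, chosen only to make the arithmetic concrete:

```latex
t_{\text{load}} = \frac{S}{R}, \qquad R = 0.2\ \text{MB/s}

% Hypothetical example: a 2 MB .out image on each of four processors
t_{\text{one}} = \frac{2\ \text{MB}}{0.2\ \text{MB/s}} = 10\ \text{s},
\qquad
t_{\text{total}} = 4 \times 10\ \text{s} = 40\ \text{s}
```

These figures cover transfer time only. They leave out the time spent selecting and loading each processor by hand.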
Problem: Loading <ul><li>Customers need CCS (or load from ROM) </li></ul><ul><li>Difficult to develop your own host program </li></ul><ul><li>You can’t use JTAG from a program. </li></ul><ul><li>You must use a separate mechanism to allow processors to communicate. </li></ul><ul><ul><li>This means you have to maintain two unrelated networks: </li></ul></ul><ul><ul><ul><li>JTAG chain for loading </li></ul></ul></ul><ul><ul><ul><li>I/O network for communication </li></ul></ul></ul>
Problem: Host integration <ul><li>Host communication is with JTAG. </li></ul><ul><ul><li>very slow </li></ul></ul><ul><ul><li>very difficult to add your own host code </li></ul></ul><ul><li>Need to use other devices </li></ul><ul><ul><li>need to write host driver code </li></ul></ul><ul><ul><li>how to start the host code & DSP code? </li></ul></ul>
Problem: Communication <ul><li>How do the processors communicate? </li></ul><ul><ul><li>No support for Sundance peripherals </li></ul></ul><ul><ul><li>Need to write device drivers </li></ul></ul><ul><ul><ul><li>Learn device details </li></ul></ul></ul><ul><ul><ul><li>Manage EDMA </li></ul></ul></ul><ul><ul><ul><li>Deal with EDMA coherency problems </li></ul></ul></ul><ul><ul><ul><li>Manage interrupts </li></ul></ul></ul><ul><ul><ul><li>Learn the tricks to make them run fast </li></ul></ul></ul>
Problem: Message routing <ul><li>If two processors want to exchange data but there is no direct connection between them, the data will have to be routed through intermediate nodes. </li></ul><ul><li>How do you do this? </li></ul><ul><li>How do you construct routing tables? </li></ul><ul><ul><li>by hand? </li></ul></ul><ul><ul><li>build in knowledge of the processor network? </li></ul></ul>
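One generic way to build routing tables without writing them by hand is a breadth-first search over the link graph. The search gives each processor a next-hop entry for every destination along a shortest path. The sketch below is a textbook technique, not Diamond's router. The adjacency-matrix representation and the `build_routes` name are assumptions made for illustration.

```c
/* Hypothetical maximum network size for this sketch. */
#define MAX_PROCS 8
#define NO_ROUTE (-1)

/*
 * Build a next-hop routing table for an undirected processor network.
 * adj[i][j] != 0 means processors i and j share a direct link.
 * next_hop[src][dst] receives the neighbour of src on a shortest path
 * to dst (src itself when src == dst, NO_ROUTE if dst is unreachable).
 */
void build_routes(int n, int adj[MAX_PROCS][MAX_PROCS],
                  int next_hop[MAX_PROCS][MAX_PROCS])
{
    for (int src = 0; src < n; src++) {
        int queue[MAX_PROCS];   /* each node is enqueued at most once */
        int first[MAX_PROCS];   /* first hop used to reach each node */
        int head = 0, tail = 0;

        for (int i = 0; i < n; i++)
            first[i] = NO_ROUTE;
        first[src] = src;
        queue[tail++] = src;

        /* BFS: the first time a node is reached is along a shortest
           path, so remember which neighbour of src led there. */
        while (head < tail) {
            int u = queue[head++];
            for (int v = 0; v < n; v++) {
                if (adj[u][v] && first[v] == NO_ROUTE) {
                    first[v] = (u == src) ? v : first[u];
                    queue[tail++] = v;
                }
            }
        }

        for (int dst = 0; dst < n; dst++)
            next_hop[src][dst] = first[dst];
    }
}
```

Shortest-path tables answer "where next?" but say nothing about deadlock. That is the separate problem taken up in the next slides.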
Problem: Deadlock <ul><li>A problem common to all message-routing systems is deadlock. </li></ul><ul><li>Deadlock occurs when a transfer between one pair of processors has to wait for a transfer between another pair, but that second transfer is itself waiting for the first to complete! </li></ul>
Deadlock prevention options <ul><li>Use a proven deadlock-free system. </li></ul><ul><li>Make the user stop the program and change parameters each time a deadlock happens. </li></ul><ul><li>Hope it never happens. </li></ul><ul><li>The most common technique is: </li></ul><ul><li>Be completely unaware that deadlock can happen. </li></ul>
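The circular wait described above can be modelled as a wait-for graph. An edge from transfer A to transfer B means A cannot complete until B does, and a deadlock is a cycle in this graph. The sketch below detects such a cycle with a depth-first search. It is a generic illustration: the matrix representation and the `has_deadlock` name are hypothetical and are not part of Diamond.

```c
#include <stdbool.h>

/* Hypothetical maximum number of in-flight transfers. */
#define MAX_XFERS 8

/* DFS colours: unvisited, on the current search path, finished. */
enum { WHITE, GREY, BLACK };

static bool visit(int u, int n, int waits_for[MAX_XFERS][MAX_XFERS],
                  int colour[MAX_XFERS])
{
    colour[u] = GREY;
    for (int v = 0; v < n; v++) {
        if (!waits_for[u][v])
            continue;
        if (colour[v] == GREY)          /* back edge: circular wait */
            return true;
        if (colour[v] == WHITE && visit(v, n, waits_for, colour))
            return true;
    }
    colour[u] = BLACK;
    return false;
}

/* True if waits_for[a][b] != 0 edges among n transfers form a cycle. */
bool has_deadlock(int n, int waits_for[MAX_XFERS][MAX_XFERS])
{
    int colour[MAX_XFERS];
    for (int i = 0; i < n; i++)
        colour[i] = WHITE;
    for (int i = 0; i < n; i++)
        if (colour[i] == WHITE && visit(i, n, waits_for, colour))
            return true;
    return false;
}
```

Detecting a cycle after it has formed corresponds to the "stop the program" option. A deadlock-free routing system instead avoids creating the cycle in the first place.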
Problem: The Cache <ul><li>There are problems with cache coherency </li></ul><ul><ul><li>The cache cannot maintain coherence between: </li></ul></ul><ul><ul><ul><li>external memory </li></ul></ul></ul><ul><ul><ul><li>EDMA transfers </li></ul></ul></ul><ul><li>Transfers must handle cache coherency </li></ul><ul><ul><li>you cannot turn the cache off </li></ul></ul><ul><ul><li>cache errors are very hard to find </li></ul></ul><ul><li>You have to sort out all these problems. </li></ul>
Why loading may fail <ul><li>JTAG loading assumes the cache is clear. </li></ul><ul><li>This is not true with Sundance hardware. After reset, a bootloader is loaded from ROM and executed. This initialises the processor and configures the FPGA to implement the inter-processor communication links. </li></ul><ul><li>The code for the bootloader gets into the cache. JTAG loads behind the cache, leading to inconsistencies that prevent programs from running. </li></ul>
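A toy model makes the failure mode concrete. A cache holds a copy of a memory word, and then a DMA engine (or a JTAG load) writes the memory directly, behind the cache. Until the CPU invalidates its copy, it keeps reading the old value. The struct and function names are hypothetical. Real code would use the device's own cache-control operations, which this sketch does not model.

```c
#include <stdbool.h>

/* A one-line "cache" in front of a single word of external memory. */
struct toy_system {
    int  memory;   /* the external memory word */
    int  cached;   /* the CPU's cached copy */
    bool valid;    /* whether the cache currently holds a copy */
};

/* CPU read: served from the cache whenever a valid copy exists. */
int cpu_read(struct toy_system *s)
{
    if (!s->valid) {
        s->cached = s->memory;   /* miss: fill the line from memory */
        s->valid = true;
    }
    return s->cached;
}

/* DMA or JTAG write: updates memory directly and bypasses the cache. */
void dma_write(struct toy_system *s, int value)
{
    s->memory = value;
}

/* Invalidate: discard the cached copy so the next read refetches. */
void cache_invalidate(struct toy_system *s)
{
    s->valid = false;
}
```

In this model, a read after `dma_write` still returns the old value until `cache_invalidate` is called. The outgoing direction has the mirror-image problem. Data the CPU has written may still be held only in the cache, so it must be written back before a DMA transfer reads that memory.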
Problem: Making changes <ul><li>How do you change the network? </li></ul><ul><ul><li>Rewrite sections of your code </li></ul></ul><ul><ul><li>Are there enough EDMA channels? </li></ul></ul><ul><ul><ul><li>only 4 external interrupt lines for synchronisation </li></ul></ul></ul><ul><ul><ul><li>what if you use more than 4 devices? </li></ul></ul></ul><ul><ul><ul><ul><li>host comport (2 devices) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>comport to another processor (2 devices) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>SDB to another processor (2 devices) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>that is already 6 devices </li></ul></ul></ul></ul>
Problem: Changing Devices <ul><li>How do you change processors? </li></ul><ul><ul><li>different device addresses </li></ul></ul><ul><ul><li>different memory sizes </li></ul></ul><ul><ul><li>different memory addresses </li></ul></ul><ul><ul><li>different initialisation requirements </li></ul></ul><ul><li>With CCS: rewrite sections of your code. </li></ul>
Problem: Choosing devices <ul><ul><li>Comports </li></ul></ul><ul><ul><li>Sundance Digital Bus (SDB) </li></ul></ul><ul><ul><li>Rocket I/O </li></ul></ul><ul><li>You need to learn how to use them. </li></ul><ul><li>You need to write & maintain device drivers. </li></ul><ul><li>You need to change your code to use them. </li></ul>
Before you start coding… <ul><li>Be certain you know how to partition the problem. </li></ul><ul><li>Be certain you know how much memory you need. </li></ul><ul><li>Be certain you know which modules you need. </li></ul><ul><li>Be certain of the system topology. </li></ul><ul><li>… because it will be very hard to change. </li></ul>
The advantage of CCS <ul><li>You have complete control of everything… </li></ul><ul><li>… because you have to do everything yourself </li></ul><ul><li>… and this takes a lot of time and experience. </li></ul>
CCS: Summary <ul><li>CCS works well with single processors </li></ul><ul><li>It was not designed for multiple processors </li></ul><ul><li>You have to do all the hard work </li></ul><ul><li>Knowledge gets built into the application: </li></ul><ul><ul><li>processor types </li></ul></ul><ul><ul><li>memory layout </li></ul></ul><ul><ul><li>I/O devices being used </li></ul></ul><ul><ul><li>connections between processors </li></ul></ul><ul><li>It is very hard to make significant changes. </li></ul>
Diamond <ul><li>Originally designed in 1987 </li></ul><ul><ul><li>tried and tested </li></ul></ul><ul><ul><li>proven model </li></ul></ul><ul><li>Designed for multiprocessor systems </li></ul><ul><li>Designed for simplicity </li></ul><ul><li>Designed for efficiency </li></ul><ul><ul><li>during development </li></ul></ul><ul><ul><li>during execution </li></ul></ul>
Some advantages of Diamond <ul><li>Easy to use </li></ul><ul><li>Gives you flexibility: late binding </li></ul><ul><ul><li>easy to change topology </li></ul></ul><ul><ul><li>easy to change modules </li></ul></ul><ul><li>Reduces housekeeping </li></ul><ul><ul><li>memory usually allocated for you </li></ul></ul><ul><ul><li>interrupts handled for you </li></ul></ul><ul><ul><li>loading managed for you </li></ul></ul><ul><ul><li>communication details managed for you </li></ul></ul><ul><ul><li>processor issues handled for you </li></ul></ul>
What Diamond is not <ul><li>Diamond is not a compiler </li></ul><ul><ul><li>we use the standard TI compiler and linker </li></ul></ul><ul><li>Diamond is not a simulator or an interpreter </li></ul><ul><ul><li>real, optimised code is generated </li></ul></ul><ul><li>Diamond is not DSP/BIOS </li></ul><ul><ul><li>it has its own optimised kernel, designed for multiprocessor operation </li></ul></ul><ul><ul><li>it does not have or need a large API </li></ul></ul>
Building with Diamond <ul><li>You partition the application into tasks: </li></ul><ul><ul><li>modularity determined by the needs of the application; you ignore processors here. </li></ul></ul><ul><li>Diamond adds an extra configuration step. </li></ul><ul><li>The configurer: </li></ul><ul><ul><li>can see the whole application </li></ul></ul><ul><ul><li>can optimise communication and device access. </li></ul></ul><ul><ul><li>builds a single output file; nothing can get lost. </li></ul></ul><ul><ul><li>arranges to load from this single file. </li></ul></ul>
Building with CCS
Building with Diamond
With Diamond… <ul><li>The application is in a single file. </li></ul><ul><ul><li>Nothing can get lost. </li></ul></ul><ul><ul><li>You cannot get loading wrong. </li></ul></ul><ul><ul><li>Loading is easy </li></ul></ul><ul><ul><ul><li>load from the host </li></ul></ul></ul><ul><ul><ul><li>no need for ROM during development </li></ul></ul></ul><ul><ul><ul><li>development is fast </li></ul></ul></ul>
Diamond… <ul><li>is designed for multiprocessor systems. </li></ul><ul><li>has its own small, efficient microkernel. </li></ul><ul><li>has a small but effective API. </li></ul><ul><li>is optimised for target hardware: </li></ul><ul><ul><li>it knows about different modules </li></ul></ul><ul><ul><li>it automatically inserts optimised device drivers </li></ul></ul><ul><ul><li>it handles interrupts </li></ul></ul><ul><ul><li>it handles memory and the cache </li></ul></ul><ul><li>is very good at communication </li></ul><ul><li>leaves you free to concentrate on your code. </li></ul>
Sundance TIMs
Dual-Processor Module Identical to two separate modules; there are no shared resources.
The Diamond Model <ul><li>Diamond builds applications from independent tasks that send data to other tasks using channels. </li></ul><ul><li>This model is based upon CSP: Communicating Sequential Processes. </li></ul>
CSP Communicating Sequential Processes Forget about processors
A Diamond application is… <ul><li>Tasks </li></ul><ul><ul><li>complete C programs </li></ul></ul><ul><ul><li>start at a main function </li></ul></ul><ul><ul><li>fully linked (but relocatable) </li></ul></ul><ul><ul><li>input & output ports for connecting channels </li></ul></ul><ul><ul><ul><li>unlimited number of ports </li></ul></ul></ul><ul><ul><li>Multi-threaded </li></ul></ul><ul><li>Channels </li></ul><ul><ul><li>data transfer mechanisms </li></ul></ul><ul><ul><li>transfer data from one task to one other </li></ul></ul><ul><ul><li>blocking: both ends wait for completion </li></ul></ul>
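The blocking behaviour ("both ends wait for completion") can be sketched with POSIX threads. The one-word rendezvous channel below is hypothetical. It is not Diamond's implementation or API, whose own calls include `chan_in_word` and `chan_out_word`. The writer blocks until the reader has taken its word, and the reader blocks until a word arrives. The sketch assumes exactly one writer and one reader, mirroring a channel's one-to-one connection.

```c
#include <pthread.h>
#include <stdbool.h>

/* A hypothetical one-word rendezvous channel (not Diamond's API). */
struct word_chan {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int  value;
    bool full;    /* a word is waiting to be read */
    bool taken;   /* the reader has consumed the current word */
};

void word_chan_init(struct word_chan *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->changed, NULL);
    c->value = 0;
    c->full = false;
    c->taken = false;
}

/* Writer: deposit a word, then block until the reader has taken it. */
void word_chan_out(struct word_chan *c, int value)
{
    pthread_mutex_lock(&c->lock);
    while (c->full)
        pthread_cond_wait(&c->changed, &c->lock);
    c->value = value;
    c->full = true;
    c->taken = false;
    pthread_cond_broadcast(&c->changed);
    while (!c->taken)
        pthread_cond_wait(&c->changed, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Reader: block until a word arrives, take it, release the writer. */
int word_chan_in(struct word_chan *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->full)
        pthread_cond_wait(&c->changed, &c->lock);
    int value = c->value;
    c->full = false;
    c->taken = true;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
    return value;
}

/* Demo: a producer thread sends 1..count; the caller sums what it reads. */
struct producer_args { struct word_chan *chan; int count; };

static void *producer(void *arg)
{
    struct producer_args *p = arg;
    for (int i = 1; i <= p->count; i++)
        word_chan_out(p->chan, i);
    return NULL;
}

int sum_through_channel(int count)
{
    struct word_chan c;
    struct producer_args args = { &c, count };
    pthread_t tid;
    int sum = 0;

    word_chan_init(&c);
    pthread_create(&tid, NULL, producer, &args);
    for (int i = 0; i < count; i++)
        sum += word_chan_in(&c);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&c.changed);
    pthread_mutex_destroy(&c.lock);
    return sum;
}
```

Because neither side can run ahead of the other, a transfer doubles as a synchronisation point between the two tasks.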
Channels <ul><li>Many possible implementations </li></ul><ul><ul><li>memcpy – between tasks on one processor </li></ul></ul><ul><ul><li>I/O – between adjacent processors </li></ul></ul><ul><ul><ul><li>comports </li></ul></ul></ul><ul><ul><ul><li>SDBs </li></ul></ul></ul><ul><ul><ul><li>Rapid IO links </li></ul></ul></ul><ul><ul><li>Routed I/O – between remote processors </li></ul></ul><ul><ul><ul><li>software routing </li></ul></ul></ul><ul><ul><ul><li>guaranteed deadlock-free </li></ul></ul></ul><ul><ul><ul><li>any task can communicate with any other task </li></ul></ul></ul><ul><li>Diamond will choose the best implementation. </li></ul>
The Hardware
A Sundance Network
Ideal Hardware <ul><li>No shared resources </li></ul><ul><ul><li>Simplifies hardware </li></ul></ul><ul><ul><li>Simplifies software </li></ul></ul><ul><ul><li>Scales: more processors = more power </li></ul></ul><ul><li>Connected by communication links </li></ul><ul><ul><li>Add processors = add bandwidth </li></ul></ul><ul><li>Designing multiprocessor hardware: </li></ul><ul><ul><li>Speak to 3L first. </li></ul></ul>
Tasks & Channels
Map onto hardware
A simple task
A simple task <ul><li>#include &lt;chan.h&gt; </li></ul><ul><li>INPUT_PORT(0, DATA_IN) </li></ul><ul><li>OUTPUT_PORT(0, DATA_OUT) </li></ul><ul><li>int main(void) </li></ul><ul><li>{ </li></ul><ul><li>int n; </li></ul><ul><li>for (;;) { </li></ul><ul><li>chan_in_word(&n, &DATA_IN); /* wait for a word on input port 0 */ </li></ul><ul><li>chan_out_word(n+1, &DATA_OUT); /* send it, plus one, on output port 0 */ </li></ul><ul><li>} </li></ul><ul><li>} </li></ul>
Team Working <ul><li>Tasks are self-contained </li></ul><ul><li>They can be developed separately </li></ul><ul><li>Communication between tasks: </li></ul><ul><ul><li>is a contract </li></ul></ul><ul><ul><li>allows test systems to be built </li></ul></ul><ul><li>Ideal for team working </li></ul>
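A task depends only on its channels, so its logic can be exercised on a host by substituting stand-ins for the channel calls. The sketch below assumes hypothetical mocks, `mock_chan_in_word` and `mock_chan_out_word`, which replay inputs from an array and record outputs. It also moves the simple task's loop body into `simple_task_step` so a test can run it a fixed number of times. None of these names are Diamond functions, and the mocks take plain `int` values instead of Diamond channel objects.

```c
#include <stddef.h>

#define MOCK_CAPACITY 16

/* Hypothetical stand-ins for the task's two channels during a host test. */
static const int *mock_input;                 /* words fed to the task */
static size_t     mock_in_pos;
static int        mock_output[MOCK_CAPACITY]; /* words the task sent */
static size_t     mock_out_pos;

static void mock_chan_in_word(int *n)
{
    *n = mock_input[mock_in_pos++];
}

static void mock_chan_out_word(int n)
{
    mock_output[mock_out_pos++] = n;
}

/* One pass of the simple task's loop body, using the mock channels. */
static void simple_task_step(void)
{
    int n;
    mock_chan_in_word(&n);
    mock_chan_out_word(n + 1);
}

/* Drive the task with count inputs; returns how many words it sent. */
size_t run_simple_task(const int *inputs, size_t count)
{
    mock_input = inputs;
    mock_in_pos = 0;
    mock_out_pos = 0;
    for (size_t i = 0; i < count && i < MOCK_CAPACITY; i++)
        simple_task_step();
    return mock_out_pos;
}

/* Read back the i-th word the task sent. */
int simple_task_output(size_t i)
{
    return mock_output[i];
}
```

Feeding in `{0, 41, -1}` should produce `{1, 42, 0}`. The channel contract ("one word in, that word plus one out") is exactly what such a test checks, independently of where the task eventually runs.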
Design Flow <ul><li>Network </li></ul><ul><ul><li>Tasks </li></ul></ul><ul><ul><li>Channels </li></ul></ul>
Design Flow <ul><li>Network </li></ul><ul><li>Code tasks </li></ul>[Diagram: sources]
Design Flow <ul><li>Network </li></ul><ul><li>Code tasks </li></ul><ul><li>Compile & Link </li></ul>[Diagram: tasks]
Design Flow <ul><li>Network </li></ul><ul><li>Code tasks </li></ul><ul><li>Compile & Link </li></ul><ul><li>Configuration File </li></ul>[Diagram: configuration file]
Design Flow <ul><li>Network </li></ul><ul><li>Code tasks </li></ul><ul><li>Compile & Link </li></ul><ul><li>Configuration File </li></ul><ul><li>Configure </li></ul>[Diagram: application file]
Design Flow <ul><li>Network </li></ul><ul><li>Code tasks </li></ul><ul><li>Compile & Link </li></ul><ul><li>Configuration File </li></ul><ul><li>Configure </li></ul><ul><li>Load & Run </li></ul>[Diagram: application file, processor network]
Running an application
Demonstration Hardware SMT365 SMT370 SMT374 SMT361 Only the SMT365 and the SMT361 will be used in the examples.
A Correlator Example
Code Each Task <ul><li>OUTPUT_PORT(2, COR_DATA) </li></ul><ul><li>INPUT_PORT(1, COR_RESULT) </li></ul><ul><li>. . . </li></ul><ul><li>int main(void) </li></ul><ul><li>{ </li></ul><ul><li>printf(&quot;3L Diamond Example &quot;); </li></ul><ul><li>for (;;) { </li></ul><ul><li>. . . </li></ul><ul><li>chan_out_message(BYTES, Data, &COR_DATA); /* send a block to the correlator */ </li></ul><ul><li>chan_in_message(BYTES, Result, &COR_RESULT); /* wait for the correlator's result */ </li></ul><ul><li>. . . </li></ul><ul><li>} </li></ul><ul><li>} </li></ul>
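The slides do not show what the correlator task itself computes. As a hypothetical stand-in, its loop could compute a direct cross-correlation of each received block against a reference pattern. The `correlate` name, the `float` sample type and the "full overlap only" lag range are assumptions for illustration, not the example's actual code.

```c
#include <stddef.h>

/*
 * Hypothetical correlator kernel: direct cross-correlation of a data
 * block against a reference pattern, for every lag at which the
 * pattern fits entirely inside the block:
 *   result[lag] = sum over k of data[lag + k] * ref[k],  0 <= lag <= n - m
 * Returns the number of lags written (0 if the pattern does not fit).
 */
size_t correlate(const float *data, size_t n,
                 const float *ref, size_t m,
                 float *result)
{
    if (m == 0 || m > n)
        return 0;

    size_t lags = n - m + 1;
    for (size_t lag = 0; lag < lags; lag++) {
        float acc = 0.0f;
        for (size_t k = 0; k < m; k++)
            acc += data[lag + k] * ref[k];
        result[lag] = acc;
    }
    return lags;
}
```

For data `{0, 1, 2, 1, 0}` and reference `{1, 2, 1}`, the three lags give `{4, 6, 4}`, peaking where the pattern lines up with the data. This direct form costs O(n·m) per block. That cost is the kind of workload the later slides move to a second processor and then to the FPGA.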
Configuration <ul><li>Write a configuration file to: </li></ul><ul><li>Describe the hardware </li></ul><ul><ul><li>processors </li></ul></ul><ul><ul><li>connections between processors </li></ul></ul><ul><li>Describe the software </li></ul><ul><ul><li>tasks </li></ul></ul><ul><ul><li>channels connecting tasks </li></ul></ul><ul><li>Map the software onto the hardware </li></ul><ul><ul><li>place tasks on processors </li></ul></ul>
Task names <ul><li>TASK example2 </li></ul><ul><li>TASK mainctrl </li></ul><ul><li>TASK disp_raw </li></ul><ul><li>TASK disp_cor </li></ul><ul><li>TASK UI </li></ul><ul><li>TASK correlator </li></ul>
Task ports <ul><li>TASK example2 INS=3 OUTS=7 </li></ul><ul><li>TASK mainctrl INS=1 OUTS=1 </li></ul><ul><li>TASK disp_raw INS=2 OUTS=0 </li></ul><ul><li>TASK disp_cor INS=2 OUTS=0 </li></ul><ul><li>TASK UI INS=1 OUTS=1 </li></ul><ul><li>TASK correlator INS=1 OUTS=1 </li></ul>
Task stack & heap <ul><li>TASK example2 INS=3 OUTS=7 DATA=500K </li></ul><ul><li>TASK mainctrl INS=1 OUTS=1 DATA=200K </li></ul><ul><li>TASK disp_raw INS=2 OUTS=0 DATA=200K </li></ul><ul><li>TASK disp_cor INS=2 OUTS=0 DATA=200K </li></ul><ul><li>TASK UI INS=1 OUTS=1 DATA=200K </li></ul><ul><li>TASK correlator INS=1 OUTS=1 DATA=32K </li></ul>
Task starting priorities <ul><li>TASK example2 urgent INS=3 OUTS=7 DATA=500K </li></ul><ul><li>TASK mainctrl INS=1 OUTS=1 DATA=200K </li></ul><ul><li>TASK disp_raw INS=2 OUTS=0 DATA=200K </li></ul><ul><li>TASK disp_cor INS=2 OUTS=0 DATA=200K </li></ul><ul><li>TASK UI urgent INS=1 OUTS=1 DATA=200K </li></ul><ul><li>TASK correlator priority=2 INS=1 OUTS=1 DATA=32K </li></ul><ul><li>! The starting priority is 1 unless explicitly stated. </li></ul>
Channel creation <ul><li>! channel output port input port </li></ul><ul><li>! ======= =========== ========== </li></ul><ul><li>CONNECT C1 UI[0] example2[0] </li></ul><ul><li>CONNECT C2 example2[5] mainctrl[0] </li></ul><ul><li>CONNECT C3 mainctrl[0] example2[2] </li></ul><ul><li>CONNECT C4 example2[0] disp_raw[0] </li></ul><ul><li>CONNECT C5 example2[1] disp_raw[1] </li></ul><ul><li>CONNECT C6 example2[2] correlator[0] </li></ul><ul><li>CONNECT C7 correlator[0] example2[1] </li></ul><ul><li>CONNECT C8 example2[3] disp_cor[0] </li></ul><ul><li>CONNECT C9 example2[4] disp_cor[1] </li></ul><ul><li>CONNECT C10 example2[6] UI[0] </li></ul>
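Each channel joins one output port to one input port, so the TASK and CONNECT lines must agree on two rules. Every port index must be below the task's OUTS or INS count, and no port may carry two channels. Diamond's configurer presumably performs its own checks. The sketch below only illustrates those two rules on a hand-transcribed copy of the tables. The struct layouts and the `check_connections` name are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical in-memory copies of TASK and CONNECT lines. */
struct task_decl { const char *name; int ins; int outs; };
struct connect   { const char *from; int out_port; const char *to; int in_port; };

#define MAX_TASKS 16
#define MAX_PORTS 16

static int find_task(const struct task_decl *t, int nt, const char *name)
{
    for (int i = 0; i < nt; i++)
        if (strcmp(t[i].name, name) == 0)
            return i;
    return -1;
}

/* True if every CONNECT names known tasks and uses in-range, unused ports. */
bool check_connections(const struct task_decl *t, int nt,
                       const struct connect *c, int nc)
{
    bool out_used[MAX_TASKS][MAX_PORTS] = {{false}};
    bool in_used[MAX_TASKS][MAX_PORTS] = {{false}};

    if (nt > MAX_TASKS)
        return false;

    for (int i = 0; i < nc; i++) {
        int src = find_task(t, nt, c[i].from);
        int dst = find_task(t, nt, c[i].to);
        int op = c[i].out_port, ip = c[i].in_port;

        if (src < 0 || dst < 0)
            return false;                    /* unknown task name */
        if (op < 0 || op >= t[src].outs || op >= MAX_PORTS ||
            ip < 0 || ip >= t[dst].ins  || ip >= MAX_PORTS)
            return false;                    /* port index out of range */
        if (out_used[src][op] || in_used[dst][ip])
            return false;                    /* port already connected */

        out_used[src][op] = true;
        in_used[dst][ip] = true;
    }
    return true;
}
```

With the six TASK lines and ten CONNECT lines above transcribed, the check passes. Changing C10 to use `example2[7]` makes it fail, because `OUTS=7` provides output ports 0 to 6 only.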
The processor & placement <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator Root </li></ul>
Processor types <ul><li>Diamond supports all of the Sundance TIMs. The ProcType utility will display them all. </li></ul>
A note about memory <ul><li>With CCS you need to: </li></ul><ul><ul><li>specify memory explicitly. </li></ul></ul><ul><ul><li>know which “sections” are used by the compiler </li></ul></ul><ul><ul><li>allocate memory explicitly at the start </li></ul></ul><ul><li>Diamond can do all memory allocation </li></ul><ul><ul><li>available memory determined automatically </li></ul></ul><ul><ul><li>no linker command files </li></ul></ul><ul><ul><li>but you can tell Diamond how to use memory </li></ul></ul><ul><ul><li>this is an optimisation once the code is working. </li></ul></ul><ul><ul><li>ignore it until the program’s needs are understood. </li></ul></ul>
Building & Running <ul><li>Compile each task with the command: 3L C </li></ul><ul><li>Link each task with the command: 3L T </li></ul><ul><li>Configure with the command: 3L A </li></ul><ul><li>Execute with the command: 3L X </li></ul>
Making it run faster
Use a second processor We shall use TIM1 (SMT365) and TIM4 (SMT361) connected by comports 0 & 3 respectively.
Demonstration Hardware SMT365 SMT370 SMT374 SMT361
Use a second processor <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator Root </li></ul>
Use a second processor <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>PROCESSOR Node SMT361 </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator Root </li></ul>
Use a second processor <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>PROCESSOR Node SMT361 </li></ul><ul><li>WIRE W1 Root[CP:0] Node[CP:3] </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator Root </li></ul>
Use a second processor <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>PROCESSOR Node SMT361 </li></ul><ul><li>WIRE W1 Root[CP:0] Node[CP:3] </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator Node </li></ul>
Notes <ul><li>The two tasks have not changed in any way. </li></ul><ul><li>Their connections have not changed. </li></ul><ul><li>No need to recompile them or relink them. </li></ul><ul><li>The only thing we changed to move the correlator task onto a second processor was the configuration file. </li></ul><ul><li>We built a new application simply by running the configuration command again (3L A). </li></ul><ul><li>Loading the two processors is automatic. </li></ul>
Making it go even faster
Use the FPGA on the SMT365 <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>PROCESSOR F FPGA </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator Root </li></ul>
The FPGA is already being used <ul><li>The FPGA is also used to support functions on the SMT365 DSP. </li></ul><ul><li>Attaching the FPGA to its processor allows the configurer to include all the logic needed to support those functions. </li></ul>
Use the FPGA <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>PROCESSOR F FPGA ATTACH=Root </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator Root </li></ul>
Use the FPGA <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>PROCESSOR F FPGA ATTACH=Root </li></ul><ul><li>WIRE W1 Root[SDB:0] F[SDB_DEVICE:0] </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator Root </li></ul>
Use the FPGA <ul><li>PROCESSOR Root SMT365_8_1 </li></ul><ul><li>PROCESSOR F FPGA ATTACH=Root </li></ul><ul><li>WIRE W1 Root[SDB:0] F[SDB_DEVICE:0] </li></ul><ul><li>… </li></ul><ul><li>PLACE mainctrl Root </li></ul><ul><li>PLACE example2 Root </li></ul><ul><li>PLACE disp_raw Root </li></ul><ul><li>PLACE disp_cor Root </li></ul><ul><li>PLACE UI Root </li></ul><ul><li>PLACE correlator F </li></ul>
FPGA Tasks <ul><li>Placing a task on an FPGA instructs the configurer to look for an FPGA version of the task. </li></ul><ul><li>This can be written using: </li></ul><ul><ul><li>VHDL </li></ul></ul><ul><ul><li>Xilinx System Generator </li></ul></ul><ul><ul><li>Handel-C (Celoxica) </li></ul></ul><ul><ul><li>Any other method you like. </li></ul></ul>
Building with FPGA <ul><li>The configurer will construct a Xilinx project for the FPGA </li></ul><ul><li>It will call the Xilinx tools to build a complete bitstream. </li></ul><ul><li>The bitstream will be included in the single application file. </li></ul><ul><li>The FPGA will be configured automatically as the application is loaded. </li></ul>
Conclusion <ul><li>Diamond does a lot of the work for you. </li></ul><ul><li>Diamond allows you to change your mind and alter processors and topology. </li></ul><ul><li>Diamond gives a structured model for developing efficient applications. </li></ul><ul><li>The Diamond model is the same for any number and any combination of processors: DSP or FPGA. </li></ul><ul><li>Diamond simplifies developing multiprocessor applications. </li></ul>