HIGH-LEVEL OPTIMIZATIONS IN CODE
1. Floating-point to fixed-point conversion
2. Simple loop transformations
3. Loop tiling/blocking
4. Loop splitting
5. Array folding
CODE OPTIMIZATION
1) Floating-point to Fixed-point Conversion:
• Reduces cycle count by 75% and energy consumption by 76% for an MPEG-2 video compression algorithm.
• Involves a trade-off between the cost of the implementation and the quality of the algorithm.
• Done using Fixed-C data types.
• E.g. a=fixed(5,4,s,wt,*b) fixed a,*b,c
2) Array Folding:
• Options for reducing the storage requirements of large arrays must be explored, since memory space is limited in embedded systems.
• Inter-array folding shares memory space among arrays that are not needed during overlapping time intervals.
• Intra-array folding exploits the fact that only a subset of an array's elements is needed at any one time, so storage can be limited to that subset.
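The idea behind the floating-point to fixed-point conversion can be sketched in plain C. This is not the Fixed-C notation mentioned on the slide. It is a minimal illustration that assumes a 16-bit Q15 format (1 sign bit, 15 fractional bits), and the names q15_t, q15_from_float and q15_mul are hypothetical.

```c
#include <stdint.h>

/* Q15: a value v in [-1, 1) is stored as the integer v * 2^15.
 * The type and function names are illustrative, not from Fixed-C. */
typedef int16_t q15_t;

/* Convert a float to Q15 with rounding and saturation. */
static inline q15_t q15_from_float(float x)
{
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f)  return INT16_MAX;   /* saturate high */
    if (scaled <= -32768.0f) return INT16_MIN;   /* saturate low  */
    return (q15_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Multiply two Q15 values using only integer arithmetic. */
static inline q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30 intermediate result   */
    p += 1 << 14;                         /* round to nearest          */
    p >>= 15;                             /* back to Q15; assumes an
                                             arithmetic right shift for
                                             negative values           */
    if (p > INT16_MAX) p = INT16_MAX;     /* only (-1) * (-1) overflows */
    return (q15_t)p;
}
```

For example, q15_mul(q15_from_float(0.5f), q15_from_float(0.5f)) yields 8192, the Q15 encoding of 0.25. The quality trade-off on the slide shows up as the rounding and saturation steps: a narrower word is cheaper but loses precision and range.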
CODE OPTIMIZATION
3) Loop tiling/blocking:
• It is essential to reuse the contents of "small" memories, including caches and scratch-pad memories.
• Blocked or tiled algorithms improve locality of reference.
• The innermost loop becomes restricted, as it accesses fewer array elements.
• If a proper blocking factor is selected, the elements are still in the cache when the next iteration of the innermost loop starts.
• Improves performance for matrix multiplication by reducing the number of memory references through a reuse factor.
4) Loop Splitting:
• The efficiency of an algorithm improves if loops are split so that one loop body handles the regular cases and a second one handles the exceptions.
• Splitting nested loops can save cycles across various applications and target processors.
• Cycle count can be reduced by 75%.
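Loop tiling for matrix multiplication can be sketched as follows. This is a minimal illustration, not a tuned kernel: the function names, the row-major flat array layout and the tile parameter (the blocking factor) are assumptions made for the example. A naive version is included for comparison.

```c
#include <stddef.h>

/* Naive n x n matrix multiplication, c = a * b (row-major, flat arrays). */
void matmul_naive(size_t n, const int *a, const int *b, int *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            int sum = 0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Tiled version: the innermost loops touch only a tile x tile block of b,
 * so that block can stay in the cache while it is reused for every row i.
 * tile must be at least 1. */
void matmul_tiled(size_t n, size_t tile, const int *a, const int *b, int *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0;

    for (size_t kk = 0; kk < n; kk += tile) {
        size_t k_end = (kk + tile < n) ? kk + tile : n;
        for (size_t jj = 0; jj < n; jj += tile) {
            size_t j_end = (jj + tile < n) ? jj + tile : n;
            for (size_t i = 0; i < n; i++)
                for (size_t k = kk; k < k_end; k++) {
                    int r = a[i * n + k];       /* reused across the j loop */
                    for (size_t j = jj; j < j_end; j++)
                        c[i * n + j] += r * b[k * n + j];
                }
        }
    }
}
```

Both functions compute the same result. Only the order of memory accesses changes, which is why the blocking factor has to be chosen to match the size of the cache or scratch-pad memory.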
CODE OPTIMIZATION
Simple Loop Transformations
Loop Permutation:
• Helps in the reuse of array elements in the cache, as the next iteration of the loop body will access an adjacent location in memory.
Loop Fusion, Loop Fission:
• Two loops are merged into a single loop: loop fusion.
• A single loop is split into two loops: loop fission.
Loop Unrolling:
• The number of copies of the loop body is called the unrolling factor (2 or more).
• Reduces loop overhead (fewer branches per executed instruction) and improves speed, but increases code size.
• Restricted to loops with a constant number of iterations.
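Loop fusion and loop unrolling can be illustrated with a short sketch. The function names and the constant N are assumptions for the example. N is a compile-time constant divisible by the unrolling factor, in line with the restriction to loops with a constant number of iterations.

```c
#include <stddef.h>

#define N 16  /* constant trip count, divisible by the unrolling factor 4 */

/* Two separate loops (the form that loop fission would produce). */
void scale_then_offset(int x[N], int y[N])
{
    for (size_t i = 0; i < N; i++)
        x[i] *= 2;
    for (size_t i = 0; i < N; i++)
        y[i] = x[i] + 1;
}

/* Loop fusion: one loop, so x[i] is reused while it is still
 * in a register or in the cache. */
void scale_and_offset_fused(int x[N], int y[N])
{
    for (size_t i = 0; i < N; i++) {
        x[i] *= 2;
        y[i] = x[i] + 1;
    }
}

/* Loop unrolling with factor 4: one branch per four additions,
 * at the cost of a larger loop body. */
long sum_unrolled4(const int x[N])
{
    long sum = 0;
    for (size_t i = 0; i < N; i += 4) {
        sum += x[i];
        sum += x[i + 1];
        sum += x[i + 2];
        sum += x[i + 3];
    }
    return sum;
}
```

The fused and unfused versions produce identical arrays. Which form is faster depends on the target, and many compilers apply these transformations on their own at higher optimization levels.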
EMBEDDED C FOR HIGH-PERFORMANCE DSP PROGRAMMING
• Performance is the key to digital signal processing because it translates into application-based end-user systems.
• Changes in technological and economic requirements make it more and more expensive to continue programming DSP processors in assembly language.
• DSP architectures are not easy to program optimally due to their non-orthogonality.
• Stronger error correction and encryption algorithms must be added to match the increased complexity in DSP.
• Communication protocols have become more sophisticated and require much more code to implement.
• Multiple protocol stacks have been implemented to be compatible with multiple service providers.
• In addition, backward compatibility with older protocols is needed to stay synchronized with provider networks that are slowly being upgraded.
ENTERING WITH EMBEDDED C
• Embedded C is designed to bridge the performance mismatch between signal processing algorithms, standard C and the architecture.
• It is an extension of the C language with the primitives that are needed by signal processing applications and that are commonly provided by DSP processors.
• Maintainability and portability of code are the key winners in this process.
REQUIREMENTS FOR I/O HARDWARE ADDRESSING INTERFACE
1. The device driver source code must be portable.
2. The interface must not prevent implementations from producing machine code that is as efficient as other methods.
3. The design should permit encapsulation of the system-dependent access method.
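One way to meet these three requirements is to hide the access method behind small inline functions. The sketch below is only an illustration of that idea and is not an interface defined by any standard. All names in it (ioreg8_t, io_read8, io_write8, uart_try_send, UART_TX_READY) are hypothetical, and it assumes memory-mapped 8-bit registers.

```c
#include <stdint.h>

/* System-dependent part: how a register is reached.
 * Here a register is a volatile memory-mapped byte (an assumption);
 * a port-mapped target would change only these few lines. */
typedef volatile uint8_t *ioreg8_t;

static inline uint8_t io_read8(ioreg8_t reg)
{
    return *reg;
}

static inline void io_write8(ioreg8_t reg, uint8_t value)
{
    *reg = value;
}

/* Portable driver part: written only in terms of the functions above.
 * UART_TX_READY is a made-up status bit for illustration. */
#define UART_TX_READY 0x01u

/* Returns 1 if the byte was written, 0 if the device was not ready. */
int uart_try_send(ioreg8_t status, ioreg8_t data, uint8_t byte)
{
    if ((io_read8(status) & UART_TX_READY) == 0)
        return 0;
    io_write8(data, byte);
    return 1;
}
```

Because the access functions are inline, a compiler can typically reduce each call to a single load or store, so the encapsulation does not have to cost efficiency. On a host machine the driver can be exercised by passing the addresses of ordinary variables in place of real registers.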
MEMORY MANAGEMENT IN AEROSPACE EMBEDDED CODE
• Dynamic allocation eases development by providing system memory to application processes as needed at runtime and reclaiming the memory when it is no longer needed.
• C's runtime library function malloc() can exhibit wildly unpredictable performance and become a bottleneck in multithreaded programs on multicore systems.
• Hence, dynamic memory allocation is forbidden in safety-critical embedded avionics code.
WHY NOT DYNAMIC MEMORY ALLOCATION IN AVIONICS?
• Dynamic memory is a poor choice for mission-critical code, as it is based on list allocator algorithms that organize memory pools into contiguous locations in a singly linked list.
• These list allocators allocate memory using malloc() and de-allocate it for reuse using free(). This places a burden on the programmer to balance each call to malloc() with a corresponding call to free().
THEN WHAT IS THE SOLUTION?
• Customized memory allocation functions that more closely match specific allocation scenarios are used, such as:
1. Stack-based allocator
2. Thread-local allocator
3. In-Memory Database Systems (IMDS)
• The performance, stability and predictability of safety-critical code increase when the above custom allocators are used.
STACK-BASED ALLOCATOR
• In this algorithm, each allocation returns the address of the current position of the stack pointer and advances the pointer by the amount of the request.
• When memory is no longer needed, the stack pointer is rewound.
• Processing overhead is reduced because there is no chain of pointers to manage, nor are there any allocation sizes or contiguous locations to track.
• A memory leak cannot be accidentally introduced through improper de-allocation, because the application does not have to track specific allocations.
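The stack-based scheme described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not a production allocator: the names (stack_alloc_t, stack_init, stack_alloc, stack_mark, stack_rewind) are hypothetical, the caller supplies a suitably aligned buffer, and requests are rounded up to 8 bytes.

```c
#include <stddef.h>

#define STACK_ALIGN ((size_t)8)  /* assumed alignment granularity */

typedef struct {
    unsigned char *base;      /* start of the caller-supplied buffer   */
    size_t         capacity;  /* size of the buffer in bytes           */
    size_t         top;       /* current stack pointer, as an offset   */
} stack_alloc_t;

void stack_init(stack_alloc_t *s, void *buffer, size_t capacity)
{
    s->base = buffer;
    s->capacity = capacity;
    s->top = 0;
}

/* Return the current stack position and advance it by the request. */
void *stack_alloc(stack_alloc_t *s, size_t size)
{
    size_t aligned = (size + STACK_ALIGN - 1) & ~(STACK_ALIGN - 1);
    if (aligned < size || aligned > s->capacity - s->top)
        return NULL;                      /* overflow or out of space */
    void *p = s->base + s->top;
    s->top += aligned;
    return p;
}

/* Remember the current position so it can be restored later. */
size_t stack_mark(const stack_alloc_t *s)
{
    return s->top;
}

/* Rewind: releases everything allocated since the mark in one step. */
void stack_rewind(stack_alloc_t *s, size_t mark)
{
    if (mark <= s->top)
        s->top = mark;
}
```

Allocation is a bounds check and an addition, so its cost is constant and predictable. Individual blocks are never freed. A whole phase of work is released by rewinding to a mark taken before it started.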
THREAD-LOCAL ALLOCATOR
• A custom thread-local allocator avoids conflicts by assigning a specific memory pool to each thread.
• The thread's allocations are served from this pool without interference from other threads' requests, thus enhancing performance and predictability.
• It uses a Pending Request List (PRL) for each thread to coordinate the release of memory blocks that are freed by a thread other than the one that performed the original allocation.
• Memory that is allocated and de-allocated by the same thread requires no coordination, and therefore no lock conflicts occur.
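The interplay between a per-thread pool and its pending request list can be sketched with C11 atomics. This is a simplified illustration and not the design of any particular product: it assumes fixed-size blocks, and the names (thread_pool_t, pool_init, pool_alloc, pool_free_local, pool_free_remote) and sizes are hypothetical. The pool is passed explicitly here. In a real program each thread would own one, for example through a _Thread_local variable.

```c
#include <stdatomic.h>
#include <stddef.h>

#define BLOCK_SIZE  64  /* assumed fixed block size in bytes */
#define BLOCK_COUNT 8   /* assumed number of blocks per pool */

typedef union block {
    union block  *next;                 /* link while the block is free */
    max_align_t   align;                /* keeps blocks well aligned    */
    unsigned char payload[BLOCK_SIZE];
} block_t;

typedef struct {
    block_t            storage[BLOCK_COUNT];
    block_t           *free_list;  /* touched only by the owning thread  */
    _Atomic(block_t *) pending;    /* PRL: blocks freed by other threads */
} thread_pool_t;

void pool_init(thread_pool_t *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
    atomic_init(&p->pending, (block_t *)NULL);
}

/* Owner thread only. Takes over the PRL when the local list is empty. */
void *pool_alloc(thread_pool_t *p)
{
    if (p->free_list == NULL)
        p->free_list = atomic_exchange(&p->pending, (block_t *)NULL);
    block_t *b = p->free_list;
    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Owner thread only: same-thread free needs no synchronization. */
void pool_free_local(thread_pool_t *p, void *ptr)
{
    block_t *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}

/* Any other thread: push the block onto the owner's PRL. */
void pool_free_remote(thread_pool_t *p, void *ptr)
{
    block_t *b = ptr;
    block_t *head = atomic_load(&p->pending);
    do {
        b->next = head;
    } while (!atomic_compare_exchange_weak(&p->pending, &head, b));
}
```

The common path, where a thread allocates and frees its own blocks, touches only free_list and never contends with other threads. Only a cross-thread free uses an atomic operation, and the owner collects all pending blocks with a single exchange.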
IN-MEMORY DATABASE SYSTEMS (IMDS)
• The benefits of custom memory allocators can also be harnessed by integrating third-party software such as an IMDS.
• An IMDS manages application objects in RAM.
• Memory allocation and de-allocation of application objects is also done using malloc() and free().
• With an IMDS, concurrency among multiple threads is maintained automatically via transactions.
APPLICATIONS IN MILITARY
• A sensor object could represent optical sensors for tracking missile targets, biosensors for defense in chemical warfare, or motion sensors to aid in navigating an aircraft.
• This sensor object occupies memory from the memory pool. When the code completes, free() returns the memory to the heap and the space is relinquished for reuse.
• malloc() is responsible for memory fragmentation and for deciding the allocator type.
EMBEDDED C IN FPGA SWITCHING TECHNOLOGY
• C algorithms can be applied to programmable and flexible FPGAs to achieve ultra-low latency.
• Parallelism involves unrolling a software process into multiple parallel hardware processes.
• Recently applied on Wall Street.
• Has potential uses for military purposes.