URF Poster

Texas Instruments Math Library Optimization
Tony Zhang¹, Sean Murphy¹, Jianzhong Xu²
¹University of Maryland, College Park; ²Texas Instruments, Inc.
SPRING 2016
URF PROGRAM
Introduction
The goal of this project was to optimize MATHLIB, the standard math library for the Texas Instruments C6600 family of digital signal processors (DSPs). The library includes both real-valued and complex-valued functions, and all of its programs are written in C.
Process
The programs were executed on a software simulator rather than on real hardware. Each function exists in two versions: a scalar (single-input) version, which takes one value, and a vector version, which takes an array of values. The scalar version contains the polynomial implementation of the function. The vector version builds on the scalar version and optimizes it so that the calculations run more efficiently and use fewer CPU cycles. To support these optimizations, the TI C compiler provides language extensions (for example, intrinsic functions and #pragma directives) that let the programmer have computations, loads, and stores performed in parallel.
Optimization Techniques
Optimization is defined here as:
• the process of refining code so that it executes faster and takes fewer cycles.
Overall, we had to rewrite the C code so that the compiler could optimize it better. The compiler tries to schedule up to 8 instructions in parallel in each cycle, and writing compiler-friendly C code lets the compiler achieve that objective more effectively.
One specific technique we used was removing conditional statements from loops, because the branch operations they introduce can prevent software pipelining. For example, the left-hand code segment in Figure 2 contains an if-else statement. By removing the else branch, we helped the compiler generate more efficient code without changing the meaning of the program.
Conclusion
Overall, although we did not reduce the cycle counts of the vector versions of the math library functions by more than 10%, we were able to optimize some functions thoroughly. More importantly, we learned numerous concepts related to computer hardware, programming, and signal processing.
Software Pipelining
Figure 1: Non-pipelining vs. Pipelining
Software pipelining uses the parallel architecture to reduce the number of cycles a piece of code takes, by overlapping the execution of successive loop iterations. The TI C6000 architecture has 8 functional units that can operate in parallel to execute up to 8 instructions per cycle, and software pipelining aims to keep as many of them busy as possible. The 8 functional units are two of each type: .D units for loading and storing data, .S units that handle branches (along with shifts and other arithmetic), .M units for multiplication, and .L units for arithmetic and logic operations such as addition. In practice, software pipelining is performed by the compiler when the optimization flag ‘-o3’ is used.
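As a rough worked illustration of Figure 1, assume (hypothetically) a loop whose body needs $L$ cycles from its first load to its last store and that, once software pipelined, starts a new iteration every $\mathit{II}$ cycles (the initiation interval). Ignoring loop-setup overhead, $N$ iterations then take approximately:

```latex
\begin{aligned}
T_{\text{non-pipelined}} &= N \cdot L \\
T_{\text{pipelined}}     &\approx (N - 1)\cdot \mathit{II} + L
\end{aligned}
```

With the hypothetical values $N = 100$, $L = 10$, and $\mathit{II} = 2$, this gives $100 \cdot 10 = 1000$ cycles without pipelining versus about $99 \cdot 2 + 10 = 208$ cycles with it. $\mathit{II}$ cannot be lower than the number of cycles the most heavily used type of functional unit needs per iteration, which is why spreading work across all 8 units matters.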
Figure 2: Non-optimized vs. Optimized Code
Source
http://software-dl.ti.com/trainingTTO/trainingTTO_public_sw/op6000/op6000_v1.51/op6000_student_guide_v1.51.pdf
