Texas Instruments Math Library Optimization
Tony Zhang¹, Sean Murphy¹, Jianzhong Xu²
¹University of Maryland, College Park; ²Texas Instruments, Inc.
SPRING 2016
URF PROGRAM
Introduction
The goal was to optimize the
MATHLIB standard math library for
the Texas Instruments C6600 family of
digital signal processors. The
functions in the standard library
include real and complex valued
functions, and all the programs are
written in the C language.
Process
The programs were executed on a software
simulator rather than on real hardware. Each
function exists in a single-input (scalar)
version, which takes one value, and a vector
version, which takes an array of values. The
scalar version contains the polynomial
implementation. The vector version applies
the same computation across the array and is
optimized to reduce the number of CPU
cycles spent per result. To support these
optimizations, the TI C compiler provides
extensions that let the programmer arrange
for computations, loads, and stores to
execute in parallel.
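As an illustration of the scalar and vector forms described above, the following is a minimal sketch in standard C. The function names poly_sp and poly_sp_v and the coefficients in COEF are hypothetical placeholders, not taken from MATHLIB, and the TI-specific compiler extensions are not shown. Only the standard C99 restrict qualifier is used, which tells the compiler that the input and output arrays do not overlap.

```c
#include <stddef.h>

/* Placeholder coefficients for a degree-3 polynomial (hypothetical,
 * not MATHLIB's actual values). */
static const float COEF[4] = { 1.0f, 1.0f, 0.5f, 0.16666667f };

/* Scalar form: one input value, evaluated with Horner's rule. */
float poly_sp(float x)
{
    float y = COEF[3];
    y = y * x + COEF[2];
    y = y * x + COEF[1];
    y = y * x + COEF[0];
    return y;
}

/* Vector form: the same polynomial applied to n input values.
 * `restrict` promises that `in` and `out` do not alias, so the
 * compiler is free to overlap the loads, arithmetic, and stores of
 * different iterations. */
void poly_sp_v(const float *restrict in, float *restrict out, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        float x = in[i];
        float y = COEF[3];
        y = y * x + COEF[2];
        y = y * x + COEF[1];
        y = y * x + COEF[0];
        out[i] = y;
    }
}
```

The loop body has no function calls and no branches, which is the shape a scheduling compiler generally handles best.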
Optimization Techniques
Optimization is defined as:
• the process of refining code so that it
executes faster and takes fewer cycles.
Overall, we had to rewrite the C code so that
the compiler could optimize it better. The
compiler tries to schedule up to 8
instructions in parallel, and compiler-friendly
C code helps it reach that objective.
One specific technique we used is to remove
conditional statements from loops, because
the resulting branch operations prevent
software pipelining. For example, in Figure 2,
the left-hand code segment contains an
if-else statement. By removing the else
branch, we help the compiler generate more
efficient code without changing the meaning
of the program.
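The code from Figure 2 is not reproduced in this text, so the following is only a hedged sketch of the kind of rewrite described. The function names and the clamp-and-scale computation are hypothetical. The else branch is replaced by a default value that the condition may override, which leaves a single unconditional store per iteration.

```c
/* Before: the if-else gives the loop body two control-flow paths. */
void scale_pos_branchy(const float *restrict in, float *restrict out, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (in[i] > 0.0f) {
            out[i] = in[i] * 2.0f;
        } else {
            out[i] = 0.0f;
        }
    }
}

/* After: no else branch. The default value stands in for the former
 * else path, and the condition only overrides it. Compilers can often
 * map this shape to a predicated or select-style instruction. */
void scale_pos_flat(const float *restrict in, float *restrict out, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        float x = in[i];
        float y = 0.0f;      /* default: what the else branch produced */
        if (x > 0.0f) {
            y = x * 2.0f;
        }
        out[i] = y;          /* single, unconditional store */
    }
}
```

Both functions produce the same output for every input. Whether the second form actually yields a branch-free loop depends on the compiler and its options.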
Conclusion
Although we did not reduce the vector cycle
count of the math library functions by more
than 10% overall, we were able to optimize
some functions thoroughly. More
importantly, we learned numerous concepts
related to computer hardware, programming,
and signal processing.
Software Pipelining
Figure 1: Non-pipelining vs. Pipelining
Software pipelining exploits the parallel
architecture to reduce the cycle count of a
loop by overlapping successive iterations.
The TI C6000 architecture has 8 functional
units that can operate in parallel to execute
up to 8 instructions per cycle, and software
pipelining aims to keep them busy. The 8
functional units are 2 data units (.D) for
loads and stores, 2 multiply units (.M),
2 arithmetic and logic units (.L), and 2 units
(.S) that handle shifts and branches. In
practice, the compiler performs software
pipelining when the code is compiled with
the ‘-o3’ flag.
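The benefit of overlapping iterations can be estimated with a simple cycle-count model. The sketch below is an idealized illustration, not a measurement from the simulator. It assumes every iteration passes through a fixed number of one-cycle stages, and that a pipelined loop starts a new iteration every ii cycles (the initiation interval). The function names are hypothetical.

```c
/* Without pipelining, each iteration starts only after the previous
 * one has finished all of its stages. */
long cycles_sequential(long iterations, long stages)
{
    if (iterations <= 0) {
        return 0;
    }
    return iterations * stages;
}

/* With software pipelining, the first iteration takes `stages` cycles
 * and each later iteration completes `ii` cycles after the one before. */
long cycles_pipelined(long iterations, long stages, long ii)
{
    if (iterations <= 0) {
        return 0;
    }
    return stages + (iterations - 1) * ii;
}
```

For 100 iterations of a 4-stage loop body, this model gives 400 cycles without pipelining and 103 cycles with an initiation interval of 1. Real loops are limited by resource conflicts and data dependencies, so the achievable interval is often larger than 1.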
Figure 2: Non-optimized vs. Optimized Code
Source
http://software-dl.ti.com/trainingTTO/trainingTTO_public_sw/op6000/op6000_v1.51/op6000_student_guide_v1.51.pdf
