This document discusses several techniques for optimizing C code:
1) Code motion moves code that is executed on every iteration of a loop, such as a function call, outside of the loop when its result remains constant for the duration of the loop.
2) Loop unrolling repeats the code within loops multiple times to reduce the total number of iterations and associated overhead.
3) Inlining replaces function calls with copies of the function code to avoid call overhead for simple functions.
CS 33 Intro Computer Systems Doeppner
Optimization Techniques in C
Fall, 2014
1 Code Motion
Code motion involves identifying bits of code that occur within loops, but need only be executed
once during that particular loop. A good example of such an operation is including a function call
in a for or while loop header: assuming that the header function’s return value will be constant
for the duration of the loop, the header function will unnecessarily execute during each iteration of
the loop.
For example, the following code shifts each character of a string.
void shift_char(char *str){
    int i;
    for(i=0;i<strlen(str);i++){
        str[i]++;
    }
}
However, there is no reason to calculate the length of the string on every iteration of the loop.
Thus, we can optimize the function by moving the function call strlen out of the loop:
void shift_char(char *str){
    int i;
    int len=strlen(str);   /* computed once, before the loop */
    for(i=0;i<len;i++){
        str[i]++;
    }
}
The new version of the function needs to calculate the length of the string only once, saving time
in its execution.
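As a minimal sketch of the transformation above, the two versions can be placed side by side and checked against each other. The names shift_char_naive and shift_char_hoisted are hypothetical, chosen here for illustration, and size_t is used for the length because that is the type strlen returns. One caveat: hoisting assumes the loop body never changes the length of the string, which would not hold if an increment happened to turn a character into '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Original form: strlen(str) is re-evaluated before every iteration. */
void shift_char_naive(char *str) {
    size_t i;
    for (i = 0; i < strlen(str); i++) {
        str[i]++;
    }
}

/* After code motion: the length is computed once, outside the loop. */
void shift_char_hoisted(char *str) {
    size_t len = strlen(str);
    size_t i;
    for (i = 0; i < len; i++) {
        str[i]++;
    }
}
```

For a writable buffer holding "abc", both versions leave "bcd" in the buffer.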
2 Loop Unrolling
Another common optimization technique is called loop unrolling — this entails repeating the code
that will be executed multiple times in a loop, so that fewer loop iterations need to be made. The
concept is simple if counterintuitive, since programmers are used to using loops to avoid exactly this
sort of repetition (which is why this type of optimization is usually done by the compiler, rather
than by humans).
Here is a simple example. In the unoptimized program, a while loop is used to call a certain
function some finite, but large, number of times. Loops like this one are very common, and you
have surely written numerous similar code snippets over the course of your CS career:
int i = 0;
while (i < num) {
    a_certain_function(i);
    i++;
}
There is a problem here, however: loops incur a certain amount of overhead, especially when
each iteration does only a small amount of work (such as calling a single function). The conditional
branch at the top of the loop, and the unconditional return jump at the bottom, are both operations
that take a non-trivial number of cycles to complete, and thus slow the program down, albeit by a
small amount. Thus, assuming that num is a multiple of 4, the following optimized code might be
preferable to the original:
int i = 0;
while (i < num) {
    a_certain_function(i);
    a_certain_function(i+1);
    a_certain_function(i+2);
    a_certain_function(i+3);
    i += 4;
}
In this version, although the actual code visible to the user (and the binary) is longer, the program
will take less time to run, because it need only loop num/4 times to compute the same results as
the original version.
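The unrolled loop above relies on num being a multiple of 4. One common way to drop that assumption, sketched here, is to follow the unrolled loop with a short cleanup loop for the leftover iterations. The names call_unrolled, call_count and arg_sum are hypothetical, and a_certain_function is given a stand-in body that only records its calls so that the behavior can be checked.

```c
/* Hypothetical stand-in for a_certain_function: it only records its calls. */
static int call_count = 0;
static long arg_sum = 0;

static void a_certain_function(int i) {
    call_count++;
    arg_sum += i;
}

/* Unrolled by 4, with a cleanup loop for the remaining 0 to 3 iterations. */
void call_unrolled(int num) {
    int i = 0;

    /* Main body: runs only while at least 4 iterations remain. */
    for (; i + 3 < num; i += 4) {
        a_certain_function(i);
        a_certain_function(i + 1);
        a_certain_function(i + 2);
        a_certain_function(i + 3);
    }

    /* Cleanup: handles the iterations left over when num % 4 != 0. */
    for (; i < num; i++) {
        a_certain_function(i);
    }
}
```

With num equal to 10, the unrolled body runs twice (covering 0 through 7) and the cleanup loop handles 8 and 9, so the function is still called exactly 10 times.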
Loop unrolling does have one major drawback: it makes programs significantly longer (in terms
of lines of code and size of the compiled binary) than they would be otherwise. As in many cases
in computer science, the decision of whether to unroll or not to unroll is a tradeoff between time
and program size. For some applications, the added binary length might be worth the decreased
runtime; however, there are also applications for which a smaller binary is more important than
a shorter runtime (for example, if one were coding for a machine with very little RAM, or were
writing a program that was already very large).
3 Inlining
Inlining is the process by which the contents of a function are “inlined” — basically, copied and
pasted — instead of a traditional call to that function. This form of optimization avoids the
overhead of function calls, by eliminating the need to jump, create a new stack frame, and reverse
this process at the end of the function.
For example, consider the following piece of code:
#include <stdio.h>

int get_random_number(void) {
    return 4; // chosen by fair die roll
              // guaranteed to be random
}

int main(int argc, char **argv) {
    int a = get_random_number();
    printf("%d\n", a);
    return 0;
}
Making a whole function call just to retrieve the number 4 seems a bit inefficient, and gcc would
agree with you on that. To optimize this program, the code inside of get_random_number() (which
happens to just be the integer 4) would be substituted for the function call in the body of main().
Thus, after optimization, the program would look like this:
int main(int argc, char **argv) {
    int a = 4;
    printf("%d\n", a);
    return 0;
}
A smart compiler might even eliminate the temporary variable a, although for the sake of this
example let us assume that it does not. The resultant program no longer calls get_random_number();
however, it produces the same result as the original version. This program will
run faster than the original, given that it doesn’t have to jump, create a new stack frame,
and undo all of that time-consuming work just to get the number 4.
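In source code, a programmer can suggest inlining with the inline keyword (standard since C99). The sketch below is an illustration under that assumption, with the hypothetical names square and sum_of_squares; the compiler remains free to ignore the hint, or to inline functions that are not marked at all.

```c
/* `static inline` asks the compiler to consider substituting the body
   at each call site; it is a hint, not a guarantee. */
static inline int square(int x) {
    return x * x;
}

/* Sums 1*1 + 2*2 + ... + n*n. If square is inlined, the loop body
   becomes a plain multiplication with no call overhead. */
int sum_of_squares(int n) {
    int total = 0;
    int i;
    for (i = 1; i <= n; i++) {
        total += square(i);
    }
    return total;
}
```

Whether or not the call is inlined, sum_of_squares(3) evaluates to 1 + 4 + 9 = 14; inlining changes only how the result is computed.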
Once inlining is brought into the picture, the distinction between “functions” and “macros” may
seem a bit fuzzier. However, there remain important semantic differences between the two. Most
importantly, a function always defines its own scope — that is, local variables defined within a
function may only be accessed from within that function (in C, this is true of any block of code
surrounded by curly braces). When inlining more complex functions with lots of local variables,
things get much more complex. We won’t get into the specifics here (this is a systems course, not
a semantics course), but let it suffice to say that the process is a bit more complicated than simply
copying and pasting the contents of one function into another.
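One way to see the scoping difference is to compare a macro and a function that both mention a variable named factor. This is a sketch with hypothetical names (SCALE_MACRO, scale_fn, demo_macro, demo_function): the macro is expanded textually, so it picks up whatever factor is visible where it is used, while the function sees only its own local.

```c
/* Textual expansion: `factor` is looked up at the expansion site. */
#define SCALE_MACRO(x) ((x) * factor)

/* A function has its own scope: this `factor` is private to scale_fn. */
static inline int scale_fn(int x) {
    int factor = 2;
    return x * factor;
}

int demo_macro(void) {
    int factor = 10;          /* the macro below silently uses this variable */
    return SCALE_MACRO(3);    /* expands to ((3) * factor) */
}

int demo_function(void) {
    int factor = 10;          /* not visible inside scale_fn */
    (void)factor;             /* silences the unused-variable warning */
    return scale_fn(3);
}
```

Here demo_macro() returns 30 and demo_function() returns 6, and the latter stays 6 even if the compiler inlines scale_fn, which is why inlining has to preserve each function's scope rather than paste text.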
4 Writing Cache-Friendly Code
Another common optimization is writing cache-friendly code: that is, code whose
organization and design utilizes the machine’s cache in the most efficient way possible.
Caches, as you have seen in lecture, store lines containing bytes that are located consecutively in
main memory. This design is based on the principle of spatial locality — that is, those memory
regions that are physically closer together are more likely to be accessed within a short time of one
another than those spread more thinly in RAM. In order to write cache-friendly code, a programmer
must respect this principle, by writing programs that primarily access closely-grouped memory
locations sequentially or simultaneously.
For example, the following code will visit every entry in a two-dimensional array, and add 1 to each:
for (j = 0; j < NUM_COLS; j++) {
    for (i = 0; i < NUM_ROWS; i++) {
        array[i][j] += 1;
    }
}
However, there is a problem here: two-dimensional arrays in C are actually stored as one-dimensional
arrays, with rows intact and listed one after the other, such that the underlying index of an
element can be computed by row * rowlen + col (in technical terminology, this is called row-major
order). When we iterate through the array in the above example, however, we iterate in
column-major order [1]. Unless our cache lines are very large, or the array is very small, this poses a problem,
because sequential memory accesses will each be separated from one another by NUM_COLS elements. Thus,
if NUM_COLS is greater than the length of a cache line, and NUM_ROWS is greater than the number of
lines in the cache, we run into the following problem:
On each of the first n iterations (where n is the number of lines in the cache), the program suffers a
cache miss, since the next byte needed is not in any of the lines currently held in the cache.
This is not a huge problem, because we started with a cold cache and some number of misses were
bound to be required in order to warm it up. However, on the next iteration we again suffer a
miss, and since the cache is now full, one of the current cache lines must be evicted to make room
for the new line. Since each access requests an address that is on a different line, and there are not
enough lines to hold the entire array in the cache, a miss occurs on every iteration, slowing the
program down considerably. This is unacceptable: if we are to have a miss each time we access
memory, we might as well not have the cache at all. Luckily, there is a simple fix: reverse the order
in which the rows and columns of the array are traversed:
for (i = 0; i < NUM_ROWS; i++) {
    for (j = 0; j < NUM_COLS; j++) {
        array[i][j] += 1;
    }
}
Now we iterate over the columns in the inner loop and over the rows in the outer loop. Turning
our attention back to the cache, we see a different pattern: the first memory access is a miss, since
the cache is cold. However, the next several accesses are hits (as many as there are remaining array
elements in that cache line), since the requisite values were loaded into the cache along with the
first as part of the same line. Only then do we encounter another miss.
This pattern continues, and misses become much less frequent than before.
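The difference between the two traversal orders can be sketched with a small counting model. This is a crude proxy rather than a cache simulation: it only counts how often an access lands on a different cache line than the access before it. The 64-byte line, the 4-byte element size, the 64 x 64 array, and the names flat_index and count_line_changes are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ROWS   64
#define NUM_COLS   64
#define ELEM_BYTES 4     /* assumed element size (a 4-byte int) */
#define LINE_BYTES 64    /* assumed cache line size */

/* Row-major index of array[i][j]: row * rowlen + col. */
size_t flat_index(size_t i, size_t j) {
    assert(i < NUM_ROWS && j < NUM_COLS);
    return i * NUM_COLS + j;
}

/* Visits every element once and counts the accesses that touch a different
   cache line than the previous access. A nonzero column_major selects the
   column-by-column order of the first loop nest; zero selects row by row. */
long count_line_changes(int column_major) {
    long changes = 0;
    size_t prev_line = (size_t)-1;           /* sentinel: no line seen yet */
    size_t outer_n = column_major ? NUM_COLS : NUM_ROWS;
    size_t inner_n = column_major ? NUM_ROWS : NUM_COLS;

    for (size_t outer = 0; outer < outer_n; outer++) {
        for (size_t inner = 0; inner < inner_n; inner++) {
            size_t i = column_major ? inner : outer;
            size_t j = column_major ? outer : inner;
            size_t line = (flat_index(i, j) * ELEM_BYTES) / LINE_BYTES;
            if (line != prev_line) {
                changes++;
            }
            prev_line = line;
        }
    }
    return changes;
}
```

Under these assumptions, count_line_changes(0) reports one line change per 16 accesses, while count_line_changes(1) reports a change on every access. Whether each change is an actual miss depends on the size and organization of the real cache.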
5 Cache Blocking
What happens, though, when each iteration accesses two elements that are not spatially related? For
example, the product of two matrices accesses the rows of the first matrix and the columns of the
second matrix. An example of this function is as follows:
[1] Column-major order is just like row-major order, except that the columns are intact and listed one after another.
void matrixproduct(double *a, double *b, double *c, int n){
    int i,j,k;
    for(i=0; i<n; i++){
        for(j=0; j<n; j++){
            for(k=0; k<n; k++){
                c[i*n+j] += a[i*n+k]*b[k*n+j];
            }
        }
    }
}
Like the previous example, this will generate a lot of cache misses. If we assume the cache block
to be 8 doubles in size, then one iteration of the j loop (a complete pass of the inner k loop) incurs
n/8 + n cache misses. The n/8 term comes from the row of the first matrix: each cache miss loads a
single block of 8 consecutive doubles, so traversing a row of n doubles costs n/8 misses. If we were
to look at the cache afterwards, the last 8 doubles read from the first matrix would be contained.
The n term comes from the fact that an entire column of the second matrix is traversed: the cache
would contain 8 doubles from each of the last C rows visited, where C is the size of the cache, and
since not every row of the column would be contained in the cache, there are n cache misses.
With $n^2$ such iterations, there are on the order of $n^3$ cache misses! It would be nice if we could
take advantage of the rows of the second matrix despite traversing its columns.
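As a rough worked instance of this count, assume n = 1024 (a value chosen here only for illustration) and 8 doubles per cache block:

```latex
\[
\underbrace{\frac{n}{8}}_{\text{row of } a} + \underbrace{n}_{\text{column of } b}
  = \frac{9n}{8} = 1152 \ \text{misses per iteration},
\qquad
n^2 \cdot \frac{9n}{8} = \frac{9n^3}{8} \approx 1.2 \times 10^{9} \ \text{misses in total}.
\]
```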
Luckily, the matrix product algorithm lends itself to a blocking solution. That is, instead of dealing
with rows of the first matrix and columns of the second matrix, we deal with a block of the first
matrix and a block of the second matrix. Since addition is commutative and associative, the order in
which the partial products are accumulated doesn't matter. An example of a blocked matrix product
is as follows:
void matrixproduct(double *a, double *b, double *c, int n){
    int i,j,k;
    int i2,j2,k2;
    //B is the block size, assumed here to be a constant that evenly divides n
    for(i=0; i<n; i+=B){
        for(j=0; j<n; j+=B){
            for(k=0; k<n; k+=B){
                //Dealing with the product of the block
                for(i2=i; i2<i+B; i2++){
                    for(j2=j; j2<j+B; j2++){
                        for(k2=k; k2<k+B; k2++){
                            c[i2*n+j2] += a[i2*n+k2]*b[k2*n+j2];
                        }
                    }
                }
            }
        }
    }
}
We will make the same cache assumptions as before and assume that about 3 blocks fit into the cache.
Since each block is of size B x B, if we are computing the product of 2 blocks we need $3B^2$ space in
the cache to hold the multiplicand, the multiplier, and the product, with $3B^2 < C$. (This assumes
that B and C are measured in the same units; the counts below take that unit to be one double.)
Since a cache line is 64 bytes, each line holds 8 doubles. Thus the first access to a
cache-line-aligned block results in a miss but brings in 8 doubles, so 1/8 of the accesses to matrix
elements (each of which is a double) result in a miss, for $B^2/8$ misses per block.
In each iteration, $2n/B$ blocks are examined ($n/B$ for the multiplicand and $n/B$ for the
multiplier). Since each block results in $B^2/8$ misses, there are $(2n/B)(B^2/8) = nB/4$ misses per
iteration.
Another improvement is that we now have fewer iterations, since we are dealing with a whole block
of data at once. We have $(n/B)^2$ iterations, leading to a total of
$(n/B)^2 (nB/4) = n^3/(4B)$ cache misses. By using the largest block size that fits in our cache, we
can see that we get far fewer cache misses by blocking.
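To put numbers on the comparison, assume n = 1024 and B = 32 (illustrative values only, and assuming the cache can hold the $3B^2 = 3072$ doubles that three such blocks need). The unblocked total is the n/8 + n misses per iteration from the earlier count, multiplied by the $n^2$ iterations:

```latex
\[
\text{blocked: } \left(\frac{n}{B}\right)^{2} \cdot \frac{nB}{4} = \frac{n^{3}}{4B}
  = \frac{2^{30}}{2^{7}} = 8{,}388{,}608,
\qquad
\text{unblocked: } \frac{9n^{3}}{8} = 1{,}207{,}959{,}552,
\qquad
\frac{9n^{3}/8}{n^{3}/(4B)} = \frac{9B}{2} = 144 .
\]
```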
Blocked solutions are not always easy to see or create, but when used they can make code much faster.
Thorough knowledge of the algorithm being used is needed in order to write the most efficient code.
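One practical detail is what to do when the block size does not evenly divide n. The sketch below is one possible way to handle it, not the handout's own code: it passes the block size as a run-time parameter (bsize, a hypothetical name) and clips each block at the matrix edge. The helper names matrixproduct_naive, matrixproduct_blocked, min_int, and products_agree are likewise assumptions made for this illustration. The inputs are small integers so that both versions produce exactly the same doubles regardless of summation order.

```c
#include <assert.h>

/* Reference triple loop: c += a * b for n x n row-major matrices. */
void matrixproduct_naive(const double *a, const double *b, double *c, int n) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}

/* Smaller of two ints; used to clip a block at the matrix edge. */
int min_int(int x, int y) { return x < y ? x : y; }

/* Blocked variant: bsize plays the role of B and need not divide n. */
void matrixproduct_blocked(const double *a, const double *b, double *c,
                           int n, int bsize) {
    assert(bsize > 0);
    for (int i = 0; i < n; i += bsize) {
        for (int j = 0; j < n; j += bsize) {
            for (int k = 0; k < n; k += bsize) {
                int i_end = min_int(i + bsize, n);
                int j_end = min_int(j + bsize, n);
                int k_end = min_int(k + bsize, n);
                /* Product of one (possibly clipped) pair of blocks. */
                for (int i2 = i; i2 < i_end; i2++)
                    for (int j2 = j; j2 < j_end; j2++)
                        for (int k2 = k; k2 < k_end; k2++)
                            c[i2*n + j2] += a[i2*n + k2] * b[k2*n + j2];
            }
        }
    }
}

/* Fills two n x n inputs with small integers, runs both versions, and
   returns 1 if every entry of the two results matches exactly. */
int products_agree(int n, int bsize) {
    enum { MAX_N = 16 };
    double a[MAX_N * MAX_N];
    double b[MAX_N * MAX_N];
    double c1[MAX_N * MAX_N] = {0};
    double c2[MAX_N * MAX_N] = {0};

    assert(n > 0 && n <= MAX_N);
    for (int idx = 0; idx < n * n; idx++) {
        a[idx] = idx % 7;
        b[idx] = (idx % 5) - 2;
    }
    matrixproduct_naive(a, b, c1, n);
    matrixproduct_blocked(a, b, c2, n, bsize);
    for (int idx = 0; idx < n * n; idx++) {
        if (c1[idx] != c2[idx]) {
            return 0;
        }
    }
    return 1;
}
```

For example, products_agree(10, 4) exercises the clipped edge blocks, since 4 does not divide 10. This checks correctness only; any speedup would have to be measured on the target machine.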
6 Writing Pipeline-Friendly Code
Finally, optimized programs must take into account how the machine’s pipeline will affect their
performance. As you have seen in lecture, the nature of a pipeline makes it such that, after some
instructions (such as conditional branches), it is impossible to predict with perfect accuracy which
instruction will be executed next. However, since knowing the following instruction, even some
of the time, helps with efficiency, processor designers make every effort to anticipate the next
instruction as often as possible in these cases.
Branch prediction, however, only gets it right so often, so it’s up to the programmer to make
sure that it is easy for the processor to predict branches. There are a number of different branch
prediction schemes, which are detailed in your book and in lecture.
However, as was mentioned in the section on loop unrolling, the best solution is to avoid branching
as much as possible, since mispredicted branches are fairly expensive, and it is difficult to know the
branch prediction scheme of the machine on which your code will be run. In many processors, the
branch predictor is good enough that there is very little the programmer can do to help. Thus,
while you shouldn't go to too great lengths to avoid branches, you should be cognizant of where
they appear in your code and ask yourself whether they are necessary.
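As a small illustration of asking whether a branch is necessary, the sketch below counts the array elements above a threshold in two ways. The function names and the scenario are hypothetical. The second version relies on the fact that a comparison in C evaluates to 0 or 1, so the data-dependent if disappears from the loop body. Whether this is actually faster depends on the compiler and the processor, and an optimizing compiler may well emit the same code for both.

```c
#include <assert.h>
#include <stddef.h>

/* Branching version: the if inside the loop depends on the data, so the
   predictor must guess its outcome for every element. */
long count_above_branchy(const int *v, size_t len, int threshold) {
    long count = 0;
    assert(v != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (v[i] > threshold) {
            count++;
        }
    }
    return count;
}

/* Branch-free loop body: the result of the comparison (0 or 1) is added
   directly, leaving only the loop's own, highly predictable branch. */
long count_above_branchless(const int *v, size_t len, int threshold) {
    long count = 0;
    assert(v != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        count += (v[i] > threshold);
    }
    return count;
}
```

Both functions return the same count for any input; only the shape of the loop body differs.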