Buffer Overflows: Common Security Threats Under 40 Characters
1. CHAPTER 6
Buffer Overflows
Slides adapted from "Foundations of Security: What Every Programmer
Needs To Know" by Neil Daswani, Christoph Kern, and Anita Kesavan
(ISBN 1590597842; http://www.foundationsofsecurity.com). Except as
otherwise noted, the content of this presentation is licensed under the
Creative Commons 3.0 License.
2. Agenda
Buffer overflows: attacker hijacks machine
Attacker injects malicious code into program
Preventable, but common
(50% CERT advisories
decade after Morris Worm)
Fixes: Safe string libraries,
StackGuard, Static Analysis
Other Types of Overflows: Heap, Integer, …
3. 6.1. Anatomy of a Buffer
Overflow
Buffer: memory used to
store user input, has
fixed maximum size
Buffer overflow: when
user input exceeds max
buffer size
Extra input goes into
unexpected memory
locations
4. 6.1.1. A Small Example
Malicious user enters >
1024 chars, but buf can
only store 1024 chars;
extra chars overflow
buffer
void get_input() {
char buf[1024];
gets(buf);
}
void main(int argc, char*argv[]){
get_input();
}
6. 6.1.2. checkPassword() Bugs
Execution stack: maintains current function state
and address of return function
Stack frame: holds vars and data for function
Extra user input (> 16 chars) overwrites return
address
Attackstring: 17-20th chars can specify address of
openVault() to bypass check
Address can be found with source code or binary
7. 6.1.2. Non-Executable Stacks
Don’t Solve It All
Attack could overwrite return address to point to
newly injected code
NX stacks can prevent this, but not the vault
example (jumping to an existing function)
Return-into-libc attack: jump to library functions
e.g.
/bin/sh or cmd.exe to gain access to a
command shell (shellcode) and complete control
8. 6.1.3. The safe_gets() Function
#define EOLN 'n'
void safe_gets (char *input, int max_chars) {
if ((input == NULL) || (max_chars < 1))) return;
if (max_chars == 1) { input[0] = 0; return; }
int count = 0;
char next_char;
do {
next_char = getchar(); // one character at a time
if (next_char != EOLN)
input[count++] = next_char;
} while ((count < max_chars-1) && // leave space for null
(next_char != EOLN));
input[count]=0;
}
Unlike gets(), takes parameter specifying max chars to
insert in buffer
Use in checkPassword() instead of gets() to
eliminate buffer overflow vulnerability
5 safe_gets(pass, 16);
9. 6.2. Safe String Libraries
C – Avoid (no bounds checks): strcpy(),
strcat(), sprintf(), scanf()
Use safer versions (with bounds checking):
strncpy(), strncat(), fgets()
Microsoft’s StrSafe, Messier and Viega’s
SafeStr do bounds checks, null termination
Must pass the right buffer size to functions!
C++: STL string class handles allocation
Unlike compiled languages (C/C++),
interpreted ones (Java/C#) enforce type
safety, raise exceptions for buffer overflow
10. 6.3. Additional Approaches
Rewriting old string manipulation code is
expensive, any other solutions?
StackGuard/canaries (Crispin Cowan)
Static checking (e.g. Coverity)
Non-executable stacks
Interpreted languages (e.g., Java, C#)
11. 6.3.1. StackGuard
Canary: random value, unpredictable to attacker
Compiler technique: inserts canary before return
address on stack
Corrupt Canary: code halts
program to thwart a
possible attack
Not comprehensive
protection
Source: C. Cowan et. al., StackGuard,
12. 6.3.2. Static Analysis Tools
Static Analysis: analyzing programs without
running them
Meta-level compilation
Findsecurity, synchronization, and memory bugs
Detect frequent code patterns/idioms and flag code
anomalies that don’t fit
Ex: Coverity, Fortify, Ounce Labs, Klockwork
Coverity found bugs in Linux device drivers
13. 6.4. Performance
Mitigating buffer overflow attacks incurs little
performance cost
Safe str functions take slightly longer to execute
StackGuard canary adds small overhead
But performance hit is negligible while security
payoff is immense
14. 6.5. Heap-Based Overflows
Ex: malloc() in C provides a fix chunk of
memory on the heap
Unless realloc() called, attacker could also
overflow heap buffer (fixed size), overwrite
adjacent data to modify control path of program
Same fixes: bounds-checking on input
15. 6.6. Other Memory
Corruption Vulnerabilities
Memory corruption vulnerability: Attacker
exploits programmer memory management error
Other Examples
Format String Vulnerabilities
Integer Overflows
Used to launch many attacks including buffer overflow
Can crash program, take full control
16. 6.6.1. Format String
Vulnerabilities
Format String in C directs how text is
formatted for output: e.g. %d, %s
Can contain info on # chars (e.g. %10s)
void format_warning (char *buffer,
char *username, char *message) {
sprintf (buffer, "Warning: %10s -- %8s",
message, username);
}
If message or username greater than 10 or
8 chars, buffer overflows
attackercan input a username string to insert
shellcode or desired return address
17. 6.6.2. Integer Overflows (1)
Exploits range of value integers can store
Ex:signed two-byte int stores between -232 and 232-1
Cause unexpected wrap-around scenarios
Attacker passes int greater than max (positive)
-> value wraps around to the min (negative!)
Can cause unexpected program behavior, possible
buffer overflow exploits
18. 6.6.2. Integer Overflows (2)
/* Writes str to buffer with offset
characters of blank spaces
preceding str. */
Attacker sets
3 void formatStr(char *buffer, int buflen, offset = 232,
4 int offset, char *str, int slen) { wraps around to
5 char message[slen+offset]; negative values!
6 int i; write outside
7
8 /* Write blank spaces */ bounds of
9 for (i = 0; i < offset; i++) message
10 message[i] = ' '; write arbitrary
11
12 strncpy(message+offset, str, slen); addresses
on heap!
// offset = 232!?
13 strncpy(buffer, message, buflen);
14 message[buflen-1] = 0;
/*Null terminate*/
15 }
19. Summary
Buffer overflows most common security threat!
Used in many worms such as Morris Worm
Affects both stacks and heaps
Attacker can run desired code, hijack program
Prevent by bounds-checking all buffers
And/or use StackGuard, Static Analysis…
Type of Memory Corruption:
Format String Vulnerabilities, Integer Overflow, etc…
Editor's Notes
Welcome to SEC103 on Secure Programming Techniques. In this course, I assume that you have some background in computer security, but now you want to put that background to use. For example, in the Computer Security Principles and Introduction To Cryptography courses, we cover topics such concerning trust and encryption. In this course, we put these principles into to practice, and I’ll show you have to write secure code that builds security into your applications from the ground up.
Buffer overflow: Dejavu all over again Up to 50 percent of today's widely exploited vulnerabilities are buffer overflows, according to an analysis by David Wagner, Jeffrey Foster, Eric Brewer, and Alexander Aiken in a paper presented at this year's Network and Distributed Systems Security conference (NDSS2000). Furthermore, the analysis suggests that the ratio is increasing over time. The data are extremely discouraging since the buffer overflow problem has been widely known in security circles for years. For some reason, developers have not readily moved to eliminate buffer overflows as the leading pitfall in software security. Number of vulnerabilities resulting in CERT/CC advisories for the last eleven years In chart above, the number of vulnerabilities that can be directly attributed to buffer overflows is displayed. As the data show, the problem is not getting any better. In fact, buffer overflows are becoming more common.
So before we define what a buffer overflow vulnerability is, let’s first clarify what we mean by a “buffer.” A buffer is simply a memory location that can be used to store user input. Very often, buffers have a fixed maximum size. If the user provides more input than can fit in the buffer, the extra input might end up in places in memory that we do not expect, and the buffer is said to overflow. Let’s look at a simple example of a buffer, and how a buffer overflow can occur. (Insert picture of spilled milk.)
Consider the example of the program on this slide. We will see that this program is vulnerable to a buffer overflow attack. In this slide, we have shown a very simple program written in the C programming language. In the main function of this small program, the get_input() function is called to accept input from the user. The get_input() function has a variable called “buf” that it uses to store up to 1024 bytes of input from the user. The gets() function simply makes a call to the operating system to tell it to accept input from the user until the user hits the carriage return and to store that input in the “buf” variable. The “buf” variable effectively stores a “buffer” that holds the user input. What makes this program vulnerable to a buffer overflow attack is that, while most users will not enter input that exceeds 1024 characters, a malicious user might enter more than 1024 characters before hitting the carriage return. The problem is that the “buf” buffer variable has only been allocated 1024 bytes of memory. So, what happen to any extra input that a malicious user might enter? Well, in a perfect world, that extra input might just get ignored, or the gets() function might return with an error, or perhaps even the program would be halted by the operating system. Unfortunately, in the real world, because of the way that the gets() function is written in the standard C library, something much worse can happen. However, the simple example program on this slide does not do much to begin with, so lets look at a program whose functionality is a bit more significant.
In this slide, we have a slightly more complicated example. The program on the left contains three functions: main, openVault, and checkPassword. The program’s purpose in life is to allow a user to enter a password, and if the user enters a correct password, the program calls the openVault function. In a real program, you can imagine that the openVault function might issue a command that really opens a bank vault, or dispenses cash out of an ATM machine, or does some other security critical operation based on the password check. The way the program works is that the main() function calls the checkPassword function to allow the user to enter a password. The checkPassword function returns the value 1 if the user enters the correct password, which, in this case, is the string “opensesame.” However, if the user does not enter a correct password, the checkPassword function returns 0. Looking back at the main() function if the checkPassword function returns a non-zero value (like 1), it calls the openVault function. While the code looks reasonable, it is vulnerable to a buffer overflow attack. Because of this vulnerability, it is possible for an attacker to have the openVault function called without entering the correct password. To understand why, first notice that a buffer of 16 bytes called “pass” has been allocated to allow the user to enter the password, and the same gets() function that we saw in the previous slide is being used to fill that buffer. Second, let’s think about what goes on at the machine level when one function calls another. Microprocessors use the concept of an execution stack to keep track of what they are doing at any particular time. The way a normal execution stack would look for our program on the left is illustrated in the middle box of the slide. When the program starts running, the address of the main() function is “pushed” onto the top of an execution stack. Then, we the main program calls the checkPassword() function, the checkPassword function is pushed onto the top of the stack. The microprocessor does this so that when it finishes executing the checkPassword function, it knows that it needs to jump back to executing the main() function. We have denoted the address of the main function by the “&main.” Since the checkPassword function also uses the “pass” buffer, space for that buffer is also allocated on the execution stack, and the space for the pass buffer is allocated right on top of the return address for the main function – the &main. If the user enters a password that is less than 16 characters, then all is well and good. However, if the user is malicious, and enters a password that is more than 16 characters, then the extra characters end up overwriting the return address for the main function. How bad is that? Very bad,a ctually. What this allows a malicious user to do is enter a password that is more than 16 characters in such a way that the return address for main could be overwritten with another return address! The malicious user does not care what are the first 16 characters he enters is, but he does care about what the 17 through 20 th character is. Assuming that the return address is 4 bytes, the malicious user can construct an input that has is 17 throgh 20 th characters that corresponds to the address of another function of the program. If the malicious user can determine what the address of the openValut function is, he can enter a “password” that has the address of the openVault function in it, and he doesn’t ever need to know the real password to get the program to jump to it. Once the user enters the malicious input that is a string that contains the return address, what will happen is that the string comparison against “opensesame” will fail, and the check password function will return 0 BUT the program will NOT return to the main function– it will return to the openValut function!!! A diagram of a compromised stack, in which the malicious user has entered a password that contains the return address of the openVault function is shown. It turns out that figuring out the address of the openValut function is relatively easy given a binary of the program. In the class exercise associated with this lecture, you will be given the opportunity to actually construct a buffer overflow attack, but for right now, it is most important to understand that by overflowing a buffer, a malicious user can cause a program to jump to a function of his or her choice instead of having the program continue its “normal” execution flow. So the next question is how do we fix the problem? We could just make the buffer larger. But then a malicious user could simply enter more input. A better solution would be to check the size of the input. Unfortunately, the gets() C function that is used to get the user’s input doesn’t really allow us to check the size of the input, or provide a maximum limit. So, we can replace the call to gets() with a call to another function called “safe_gets” that does check the size of the input, and will accept no more than 16 characters of input. // , and you will attack a program similar to this one in the class exercises.
At this point, you might be asking yourself how to prevent your programs from being vulnerable to buffer overflows. In the last slide, part of the problem was that the gets() function wrote into the buffer without checking the size of the buffer. As a result, a malicious user is able to use gets() to write addresses into the program’s execution stack. The gets() function assumes that the programmer knows exactly what she is doing and that the programmer is doing a check to make sure that the boundaries of the buffer are not overwritten. Unfortunately, gets() as well as a lot of other C functions assume that these types of checks are already being done, when very often they are not. One solution to make sure that your programs are not vulnerable to buffer overflow attacks is to not use functions like gets(). There are many such C functions that do not check the bounds of the buffers they write. Included in this class of C functions are some very common ones such as string-copy, sting-concatenate, sprintf, scanf, and many others. So, while re-coding your C programs to avoid the use of such functions is one option, doing so may take a lot of work, and it might be hard to do correctly. Later on, we’ll talk about some other approaches to dealing with the issue of buffer overflows. Also, you may be asking about whether or not programs that you write in other languages such as C++ or Java might be vulnerable also. The short answer is that C++ programs are highly susciptle while Java programs are not as vulnerable. We’ll talk about why later on and provide some further tips for how to deal with potential buffer overflows regardless of what programming language you use.