SlideShare a Scribd company logo
1 of 106
Download to read offline
Hacking in C
The C programming language
Radboud University, Nijmegen, The Netherlands
Spring 2018
The C programming language
I Invented by Dennis Ritchie in the early 70s
I First “Hello World” program written in C
Source: Wikipedia
2
The C programming language
I Invented by Dennis Ritchie in the early 70s
I First “Hello World” program written in C
I UNIX (and Linux) is written in C
I Still one of the top-5 most used
programming languages
I Compilers for almost all platforms
Source: Wikipedia
2
The C programming language
I Invented by Dennis Ritchie in the early 70s
I First “Hello World” program written in C
I UNIX (and Linux) is written in C
I Still one of the top-5 most used
programming languages
I Compilers for almost all platforms
I Many “interesting” security issues
Source: Wikipedia
2
C standards and “standards”
I First definition in Kernighan&Ritchie: “The C Programming
Language”
I Also known as K&R C, book appared in 1978
3
C standards and “standards”
I First definition in Kernighan&Ritchie: “The C Programming
Language”
I Also known as K&R C, book appared in 1978
I Standardized by ANSI in 1989 (C89) and ISO (C90)
I Second edition of K&R book used “ANSI C”, i.e., C89
3
C standards and “standards”
I First definition in Kernighan&Ritchie: “The C Programming
Language”
I Also known as K&R C, book appared in 1978
I Standardized by ANSI in 1989 (C89) and ISO (C90)
I Second edition of K&R book used “ANSI C”, i.e., C89
I In 1995, ANSI published an amendment to the C standard (“C95”)
3
C standards and “standards”
I First definition in Kernighan&Ritchie: “The C Programming
Language”
I Also known as K&R C, book appared in 1978
I Standardized by ANSI in 1989 (C89) and ISO (C90)
I Second edition of K&R book used “ANSI C”, i.e., C89
I In 1995, ANSI published an amendment to the C standard (“C95”)
I In 1999, ISO standardized updated C, ANSI adopted (C99)
3
C standards and “standards”
I First definition in Kernighan&Ritchie: “The C Programming
Language”
I Also known as K&R C, book appared in 1978
I Standardized by ANSI in 1989 (C89) and ISO (C90)
I Second edition of K&R book used “ANSI C”, i.e., C89
I In 1995, ANSI published an amendment to the C standard (“C95”)
I In 1999, ISO standardized updated C, ANSI adopted (C99)
I Current standard is C11, standardized (ANSI and ISO) in 2011
I Standard draft online:
https://port70.net/~nsz/c/c11/n1570.html
3
C standards and “standards”
I First definition in Kernighan&Ritchie: “The C Programming
Language”
I Also known as K&R C, book appared in 1978
I Standardized by ANSI in 1989 (C89) and ISO (C90)
I Second edition of K&R book used “ANSI C”, i.e., C89
I In 1995, ANSI published an amendment to the C standard (“C95”)
I In 1999, ISO standardized updated C, ANSI adopted (C99)
I Current standard is C11, standardized (ANSI and ISO) in 2011
I Standard draft online:
https://port70.net/~nsz/c/c11/n1570.html
I Compilers like gcc or clang also support GNU extensions
I Default for gcc: C11 plus GNU extensions (aka gnu11)
3
C standards and “standards”
I First definition in Kernighan&Ritchie: “The C Programming
Language”
I Also known as K&R C, book appared in 1978
I Standardized by ANSI in 1989 (C89) and ISO (C90)
I Second edition of K&R book used “ANSI C”, i.e., C89
I In 1995, ANSI published an amendment to the C standard (“C95”)
I In 1999, ISO standardized updated C, ANSI adopted (C99)
I Current standard is C11, standardized (ANSI and ISO) in 2011
I Standard draft online:
https://port70.net/~nsz/c/c11/n1570.html
I Compilers like gcc or clang also support GNU extensions
I Default for gcc: C11 plus GNU extensions (aka gnu11)
I You can switch gcc to other C standards using, e.g., -std=c89
I Use -pedantic flag to issue warnings if your code doesn’t conform
to the standard
3
C vs. C++
I C is the basis for C++, Objective-C, and many other languages
I C is not a subset of C++, e.g.,
int *x = malloc(sizeof(int) * 10);
is valid (and perfectly reasonable) C, but not valid C++!
4
C vs. C++
I C is the basis for C++, Objective-C, and many other languages
I C is not a subset of C++, e.g.,
int *x = malloc(sizeof(int) * 10);
is valid (and perfectly reasonable) C, but not valid C++!
I You can “mix” C and C++ code, but you have to be very careful
I In C++, declare C functions as extern "C", for example:
extern "C" int mycfunction(int);
I Now you can call mycfunction from your C++ code
I Use compiler by the same vendor to compile
4
C vs. C++
I C is the basis for C++, Objective-C, and many other languages
I C is not a subset of C++, e.g.,
int *x = malloc(sizeof(int) * 10);
is valid (and perfectly reasonable) C, but not valid C++!
I You can “mix” C and C++ code, but you have to be very careful
I In C++, declare C functions as extern "C", for example:
extern "C" int mycfunction(int);
I Now you can call mycfunction from your C++ code
I Use compiler by the same vendor to compile
I Lets you use, e.g., highly optimized C libraries
I Common scenario:
I Write high-speed code in C (and assembly)
I Write so-called wrappers around this for easy access in C++
4
A “portable assembler”
C has been characterized (both admiringly and invidiously) as a portable
assembly language
—Dennis Ritchie
I Idea of assembly:
I Programmer has full control over the program
I Choice of instructions, register allocation etc. left to programmer
I Programmer has “raw access” to memory
5
A “portable assembler”
C has been characterized (both admiringly and invidiously) as a portable
assembly language
—Dennis Ritchie
I Idea of assembly:
I Programmer has full control over the program
I Choice of instructions, register allocation etc. left to programmer
I Programmer has “raw access” to memory
I Need to rewrite programs for each architecture
I Need to re-optimize for each microarchitecture
5
A “portable assembler”
C has been characterized (both admiringly and invidiously) as a portable
assembly language
—Dennis Ritchie
I Idea of assembly:
I Programmer has full control over the program
I Choice of instructions, register allocation etc. left to programmer
I Programmer has “raw access” to memory
I Need to rewrite programs for each architecture
I Need to re-optimize for each microarchitecture
I Idea of C:
I Take away some bits of control from the programmer
I Stay as close as possible to assembly, but stay portable
I In particular: give programmer raw access to memory
5
A “portable assembler”
C has been characterized (both admiringly and invidiously) as a portable
assembly language
—Dennis Ritchie
I Idea of assembly:
I Programmer has full control over the program
I Choice of instructions, register allocation etc. left to programmer
I Programmer has “raw access” to memory
I Need to rewrite programs for each architecture
I Need to re-optimize for each microarchitecture
I Idea of C:
I Take away some bits of control from the programmer
I Stay as close as possible to assembly, but stay portable
I In particular: give programmer raw access to memory
I Use compiler to generate code for different architectures
I Use compiler to optimize for different microarchitectures
5
“If programming languages were. . . ”
I . . . vehicles
http:
//crashworks.org/if_programming_languages_were_vehicles/
I . . . countries
https://www.quora.com/
If-programming-languages-were-countries-which-country-would-each-l
I . . . GoT characters
https://techbeacon.com/
if-programming-languages-were-game-thrones-characters
I . . . beer
https:
//www.topcoder.com/blog/if-programming-languages-were-beer/
6
“If programming languages were. . . ”
I . . . vehicles
http:
//crashworks.org/if_programming_languages_were_vehicles/
I . . . countries
https://www.quora.com/
If-programming-languages-were-countries-which-country-would-each-l
I . . . GoT characters
https://techbeacon.com/
if-programming-languages-were-game-thrones-characters
I . . . beer
https:
//www.topcoder.com/blog/if-programming-languages-were-beer/
I . . . boats
http://compsci.ca/blog/if-a-programming-language-was-a-boat/
6
“If programming languages were. . . ”
“C is a nuclear submarine. The instructions are probably in a foreign
language, but all of the hardware itself is optimized for performance.
6
Syntax and semantics
Syntax of a programming language
I Spelling and grammar rules
I Defines the language of valid programs
I Syntax errors are caught by the compiler
I Classical example: forget a ; at the end of a line
7
Syntax and semantics
Syntax of a programming language
I Spelling and grammar rules
I Defines the language of valid programs
I Syntax errors are caught by the compiler
I Classical example: forget a ; at the end of a line
Semantics of a programming language
I Defines the meaning of a valid program
I In many languages semantics are fully specified
I Runtime errors (exceptions) are part of the semantics
I C is not fully specified!
7
Unspecified behavior
I Unspecified behavior is “implementation-specific”
I Semantics not defined by the standard, but have to specified by the
compiler
I Reason: allow better optimization
8
Unspecified behavior
I Unspecified behavior is “implementation-specific”
I Semantics not defined by the standard, but have to specified by the
compiler
I Reason: allow better optimization
I Examples:
I Shifting negative values to the right (e.g., int a = (-42) >> 3)
8
Unspecified behavior
I Unspecified behavior is “implementation-specific”
I Semantics not defined by the standard, but have to specified by the
compiler
I Reason: allow better optimization
I Examples:
I Shifting negative values to the right (e.g., int a = (-42) >> 3)
I Order of subexpression evaluation (e.g., f(g(), h()))
8
Unspecified behavior
I Unspecified behavior is “implementation-specific”
I Semantics not defined by the standard, but have to specified by the
compiler
I Reason: allow better optimization
I Examples:
I Shifting negative values to the right (e.g., int a = (-42) >> 3)
I Order of subexpression evaluation (e.g., f(g(), h()))
I Sizes of of various types (more later)
I Representation of data types (more later)
8
Unspecified behavior
I Unspecified behavior is “implementation-specific”
I Semantics not defined by the standard, but have to specified by the
compiler
I Reason: allow better optimization
I Examples:
I Shifting negative values to the right (e.g., int a = (-42) >> 3)
I Order of subexpression evaluation (e.g., f(g(), h()))
I Sizes of of various types (more later)
I Representation of data types (more later)
I Number of bits in one byte
8
Unspecified behavior
I Unspecified behavior is “implementation-specific”
I Semantics not defined by the standard, but have to specified by the
compiler
I Reason: allow better optimization
I Examples:
I Shifting negative values to the right (e.g., int a = (-42) >> 3)
I Order of subexpression evaluation (e.g., f(g(), h()))
I Sizes of of various types (more later)
I Representation of data types (more later)
I Number of bits in one byte
I Fairly hard to write fully specified C programs
I For this course: if not otherwise stated assume gcc (version 6.x or
7.x) compiling for AMD64.
8
Undefined behavior
I Different from unspecified behavior: undefined behavior
I Program reaches a state in which it may do anything
I It may crash with arbitrary error code
I It may silently corrupt data
I It may give the right result
I The behavior may be “randomly” different in independent runs
9
Undefined behavior
I Different from unspecified behavior: undefined behavior
I Program reaches a state in which it may do anything
I It may crash with arbitrary error code
I It may silently corrupt data
I It may give the right result
I The behavior may be “randomly” different in independent runs
I Undefined behavior means that the whole program has no
meaning anymore!
I This is essentially always a bug, often security critical
9
Undefined behavior
I Different from unspecified behavior: undefined behavior
I Program reaches a state in which it may do anything
I It may crash with arbitrary error code
I It may silently corrupt data
I It may give the right result
I The behavior may be “randomly” different in independent runs
I Undefined behavior means that the whole program has no
meaning anymore!
I This is essentially always a bug, often security critical
I Examples:
I Access an array outside the bounds
I More generally: access memory at “illegal” position
9
Undefined behavior
I Different from unspecified behavior: undefined behavior
I Program reaches a state in which it may do anything
I It may crash with arbitrary error code
I It may silently corrupt data
I It may give the right result
I The behavior may be “randomly” different in independent runs
I Undefined behavior means that the whole program has no
meaning anymore!
I This is essentially always a bug, often security critical
I Examples:
I Access an array outside the bounds
I More generally: access memory at “illegal” position
I Overflowing a signed integer ((INT_MAX+1))
9
Undefined behavior
I Different from unspecified behavior: undefined behavior
I Program reaches a state in which it may do anything
I It may crash with arbitrary error code
I It may silently corrupt data
I It may give the right result
I The behavior may be “randomly” different in independent runs
I Undefined behavior means that the whole program has no
meaning anymore!
I This is essentially always a bug, often security critical
I Examples:
I Access an array outside the bounds
I More generally: access memory at “illegal” position
I Overflowing a signed integer ((INT_MAX+1))
I Left-shifting a signed integer ((-42) << 3)
9
Undefined behavior
I Different from unspecified behavior: undefined behavior
I Program reaches a state in which it may do anything
I It may crash with arbitrary error code
I It may silently corrupt data
I It may give the right result
I The behavior may be “randomly” different in independent runs
I Undefined behavior means that the whole program has no
meaning anymore!
I This is essentially always a bug, often security critical
I Examples:
I Access an array outside the bounds
I More generally: access memory at “illegal” position
I Overflowing a signed integer ((INT_MAX+1))
I Left-shifting a signed integer ((-42) << 3)
I It is totally acceptable for a program to delete all your data when
running into undefined behavior
9
Undefined behavior
I Different from unspecified behavior: undefined behavior
I Program reaches a state in which it may do anything
I It may crash with arbitrary error code
I It may silently corrupt data
I It may give the right result
I The behavior may be “randomly” different in independent runs
I Undefined behavior means that the whole program has no
meaning anymore!
I This is essentially always a bug, often security critical
I Examples:
I Access an array outside the bounds
I More generally: access memory at “illegal” position
I Overflowing a signed integer ((INT_MAX+1))
I Left-shifting a signed integer ((-42) << 3)
I It is totally acceptable for a program to delete all your data when
running into undefined behavior
I Sometimes we can make a program do this (or something similar)
I Most attacks in the course: exploit undefined behavior
9
C compilation
I Four steps involved in compilation, can stop at any of those
I First step: Run the preprocessor (gcc -E)
I Include code from #include directives
I Expand macros from #define directives
I Expand compile-time (static) conditionals #if
I The C preprocessor is almost Turing complete
I See https://github.com/orangeduck/CPP_COMPLETE for a
Brainfuck interpreter written in the C preprocessor
10
C compilation
I Four steps involved in compilation, can stop at any of those
I First step: Run the preprocessor (gcc -E)
I Include code from #include directives
I Expand macros from #define directives
I Expand compile-time (static) conditionals #if
I The C preprocessor is almost Turing complete
I See https://github.com/orangeduck/CPP_COMPLETE for a
Brainfuck interpreter written in the C preprocessor
I Second step: Run compilation proper (gcc -S)
I Go from C to assembly level
I This is where you get syntax errors
10
C compilation
I Four steps involved in compilation, can stop at any of those
I First step: Run the preprocessor (gcc -E)
I Include code from #include directives
I Expand macros from #define directives
I Expand compile-time (static) conditionals #if
I The C preprocessor is almost Turing complete
I See https://github.com/orangeduck/CPP_COMPLETE for a
Brainfuck interpreter written in the C preprocessor
I Second step: Run compilation proper (gcc -S)
I Go from C to assembly level
I This is where you get syntax errors
I Third step: Generate machine code (gcc -c)
I Generates so-called object files
10
C compilation
I Four steps involved in compilation, can stop at any of those
I First step: Run the preprocessor (gcc -E)
I Include code from #include directives
I Expand macros from #define directives
I Expand compile-time (static) conditionals #if
I The C preprocessor is almost Turing complete
I See https://github.com/orangeduck/CPP_COMPLETE for a
Brainfuck interpreter written in the C preprocessor
I Second step: Run compilation proper (gcc -S)
I Go from C to assembly level
I This is where you get syntax errors
I Third step: Generate machine code (gcc -c)
I Generates so-called object files
I Fourth step: Linking (simply run gcc, this is default)
I Put object files together to a binary
I Linker errors include missing functions or function duplicates
I Also include external libraries here (e.g., -lm)
I Caution: order of arguments can matter!
10
Memory abstraction 1: where data is stored
I Programmers typically don’t know where data is stored
I For example, a variable can sit in
I a register of the CPU
I in any of the caches of the CPU
I in RAM
I on the hard drive (in so-called swap space)
11
Memory abstraction 1: where data is stored
I Programmers typically don’t know where data is stored
I For example, a variable can sit in
I a register of the CPU
I in any of the caches of the CPU
I in RAM
I on the hard drive (in so-called swap space)
I Compiler makes decisions about register allocation
I Compiler has some bit of influence on caching
I Other decisions are made by the OS (and the CPU)
11
Memory abstraction 1: where data is stored
I Programmers typically don’t know where data is stored
I For example, a variable can sit in
I a register of the CPU
I in any of the caches of the CPU
I in RAM
I on the hard drive (in so-called swap space)
I Compiler makes decisions about register allocation
I Compiler has some bit of influence on caching
I Other decisions are made by the OS (and the CPU)
I Sometimes important: always read the variable from memory
I C has keyword volatile to enforce this
I Disables certain optimization
11
Where is data allocated?
I C has the & operator that returns the address of a variable
I Example:
I Let’s say we have a variable int x = 12
I Now &x is the address where x is stored, aka a pointer to x
12
Where is data allocated?
I C has the & operator that returns the address of a variable
I Example:
I Let’s say we have a variable int x = 12
I Now &x is the address where x is stored, aka a pointer to x
I Much more on pointers later, for the moment let’s print them:
char x; int i; short s; char y;
printf("The address of x is %pn", &x);
printf("The address of i is %pn", &i);
printf("The address of s is %pn", &s);
printf("The address of y is %pn", &y);
I Note the %p format specifier for pointers
12
Where is data allocated?
I C has the & operator that returns the address of a variable
I Example:
I Let’s say we have a variable int x = 12
I Now &x is the address where x is stored, aka a pointer to x
I Much more on pointers later, for the moment let’s print them:
char x; int i; short s; char y;
printf("The address of x is %pn", &x);
printf("The address of i is %pn", &i);
printf("The address of s is %pn", &s);
printf("The address of y is %pn", &y);
I Note the %p format specifier for pointers
I The “inverse” of & is *, i.e., *(&x) gives the value of x
12
register
I Important task for the compiler: register allocation
I Map live variables (whose values are still needed) to registers
I Typical goal: minimize amount of “register spills”
13
register
I Important task for the compiler: register allocation
I Map live variables (whose values are still needed) to registers
I Typical goal: minimize amount of “register spills”
I C lets programmers “help” the compiler with keyword register
13
register
I Important task for the compiler: register allocation
I Map live variables (whose values are still needed) to registers
I Typical goal: minimize amount of “register spills”
I C lets programmers “help” the compiler with keyword register
I Quote from Erik’s slides:
“you should never ever use this! Compilers are much better than you
are at figuring out which data is best stored in CPU registers.”
13
register
I Important task for the compiler: register allocation
I Map live variables (whose values are still needed) to registers
I Typical goal: minimize amount of “register spills”
I C lets programmers “help” the compiler with keyword register
I Quote from Erik’s slides:
“you should never ever use this! Compilers are much better than you
are at figuring out which data is best stored in CPU registers.”
I I agree that I never (?) use register
13
register
I Important task for the compiler: register allocation
I Map live variables (whose values are still needed) to registers
I Typical goal: minimize amount of “register spills”
I C lets programmers “help” the compiler with keyword register
I Quote from Erik’s slides:
“you should never ever use this! Compilers are much better than you
are at figuring out which data is best stored in CPU registers.”
I I agree that I never (?) use register
I Reason: I am (often) better than the compiler at figuring out which
data is best stored in CPU registers. . .
13
register
I Important task for the compiler: register allocation
I Map live variables (whose values are still needed) to registers
I Typical goal: minimize amount of “register spills”
I C lets programmers “help” the compiler with keyword register
I Quote from Erik’s slides:
“you should never ever use this! Compilers are much better than you
are at figuring out which data is best stored in CPU registers.”
I I agree that I never (?) use register
I Reason: I am (often) better than the compiler at figuring out which
data is best stored in CPU registers. . .
I . . . and then I write in assembly and avoid the compiler alltogether
13
register
I Important task for the compiler: register allocation
I Map live variables (whose values are still needed) to registers
I Typical goal: minimize amount of “register spills”
I C lets programmers “help” the compiler with keyword register
I Quote from Erik’s slides:
“you should never ever use this! Compilers are much better than you
are at figuring out which data is best stored in CPU registers.”
I I agree that I never (?) use register
I Reason: I am (often) better than the compiler at figuring out which
data is best stored in CPU registers. . .
I . . . and then I write in assembly and avoid the compiler alltogether
I Problem with register: no guarantee that the value isn’t spilled
I Requesting the address of a register variable is invalid!
13
Memory abstraction 2: how data is stored
I You can think of memory as an array of bytes
I For this course: a byte consists of 8 bits
14
Memory abstraction 2: how data is stored
I You can think of memory as an array of bytes
I For this course: a byte consists of 8 bits
I Computer programs work with different data types
I Important step of compilation: map other types to bytes
14
Memory abstraction 2: how data is stored
I You can think of memory as an array of bytes
I For this course: a byte consists of 8 bits
I Computer programs work with different data types
I Important step of compilation: map other types to bytes
I Idea of C: you can program without needing to understand this
mapping
I Idea of this course: you can have more fun with C if you do!
14
Memory abstraction 2: how data is stored
I You can think of memory as an array of bytes
I For this course: a byte consists of 8 bits
I Computer programs work with different data types
I Important step of compilation: map other types to bytes
I Idea of C: you can program without needing to understand this
mapping
I Idea of this course: you can have more fun with C if you do!
I The CPU likes to see the memory as an array of words
I Words typically consist of several bytes (e.g., 4 or 8 bytes)
I (Most) registers have the size of machine words
I Often loads and stores are more efficient when aligned to a word
boundary
14
Memory abstraction 2: how data is stored
I You can think of memory as an array of bytes
I For this course: a byte consists of 8 bits
I Computer programs work with different data types
I Important step of compilation: map other types to bytes
I Idea of C: you can program without needing to understand this
mapping
I Idea of this course: you can have more fun with C if you do!
I The CPU likes to see the memory as an array of words
I Words typically consist of several bytes (e.g., 4 or 8 bytes)
I (Most) registers have the size of machine words
I Often loads and stores are more efficient when aligned to a word
boundary
I von Neumann architecture: also programs are just bytes in memory
I Only difference between data and program: what you do with it
14
char
I Most basic data type: char
I From the C11 standard:
“An object declared as type char is large enough to store any
member of the basic execution character set.”
I More useful definition: a char is a byte, i.e., the smallest
addressable unit of memory
I In all relevant scenarios: a char is an 8-bit integer
15
char
I Most basic data type: char
I From the C11 standard:
“An object declared as type char is large enough to store any
member of the basic execution character set.”
I More useful definition: a char is a byte, i.e., the smallest
addressable unit of memory
I In all relevant scenarios: a char is an 8-bit integer
I Traditionally a char is used to represent ASCII characters, yields
two common ways to initialize a char:
char a = ’2’;
char b = 2;
char c = 50;
I Which of those values are equal?
15
char
I Most basic data type: char
I From the C11 standard:
“An object declared as type char is large enough to store any
member of the basic execution character set.”
I More useful definition: a char is a byte, i.e., the smallest
addressable unit of memory
I In all relevant scenarios: a char is an 8-bit integer
I Traditionally a char is used to represent ASCII characters, yields
two common ways to initialize a char:
char a = ’2’;
char b = 2;
char c = 50;
I Which of those values are equal?
15
char
I Most basic data type: char
I From the C11 standard:
“An object declared as type char is large enough to store any
member of the basic execution character set.”
I More useful definition: a char is a byte, i.e., the smallest
addressable unit of memory
I In all relevant scenarios: a char is an 8-bit integer
I Traditionally a char is used to represent ASCII characters, yields
two common ways to initialize a char:
char a = ’2’;
char b = 2;
char c = 50;
I Which of those values are equal?
I It’s a and c, because ’2’ has ASCII value 50.
15
Another quick question. . .
I What does the following code do?:
char i;
for(i=42;i>=0;i--)
{
printf("Crypto stands for cryptographyn");
}
16
Another quick question. . .
I What does the following code do?:
char i;
for(i=42;i>=0;i--)
{
printf("Crypto stands for cryptographyn");
}
16
Another quick question. . .
I What does the following code do?:
char i;
for(i=42;i>=0;i--)
{
printf("Crypto stands for cryptographyn");
}
I Answer: it depends (and it really does!)
I C standard does not define whether char is signed or unsigned
I Make explicit by using signed char or unsigned char
16
Other integral types
I C11 provides 4 more integral types (each signed and unsigned):
I short: at least 2 bytes
I int: typically 4 (but sometimes 2) bytes
I long: typically 4 or 8 bytes
I long long: at least 8 bytes (in practice: exactly 8 bytes)
17
Other integral types
I C11 provides 4 more integral types (each signed and unsigned):
I short: at least 2 bytes
I int: typically 4 (but sometimes 2) bytes
I long: typically 4 or 8 bytes
I long long: at least 8 bytes (in practice: exactly 8 bytes)
I GNU extension: __int128 for architectures that support it
17
Other integral types
I C11 provides 4 more integral types (each signed and unsigned):
I short: at least 2 bytes
I int: typically 4 (but sometimes 2) bytes
I long: typically 4 or 8 bytes
I long long: at least 8 bytes (in practice: exactly 8 bytes)
I GNU extension: __int128 for architectures that support it
I Common misconception: long is as long as a machine word
I Think about how this would work on an 8-bit microcontroller. . .
17
Other integral types
I C11 provides 4 more integral types (each signed and unsigned):
I short: at least 2 bytes
I int: typically 4 (but sometimes 2) bytes
I long: typically 4 or 8 bytes
I long long: at least 8 bytes (in practice: exactly 8 bytes)
I GNU extension: __int128 for architectures that support it
I Common misconception: long is as long as a machine word
I Think about how this would work on an 8-bit microcontroller. . .
I Find size of any type in bytes using sizeof, e.g.:
int a;
printf("%zd", sizeof(a));
printf("%zd", sizeof(long));
17
Other integral types
I C11 provides 4 more integral types (each signed and unsigned):
I short: at least 2 bytes
I int: typically 4 (but sometimes 2) bytes
I long: typically 4 or 8 bytes
I long long: at least 8 bytes (in practice: exactly 8 bytes)
I GNU extension: __int128 for architectures that support it
I Common misconception: long is as long as a machine word
I Think about how this would work on an 8-bit microcontroller. . .
I Find size of any type in bytes using sizeof, e.g.:
int a;
printf("%zd", sizeof(a));
printf("%zd", sizeof(long));
I Integral constants can be written in
I Decimal, e.g., 255
I Hexadecimal, using 0x, e.g., 0xff
I Octal, using 0, e.g., 0377
17
Floating-point and complex values
I C also defines 3 “real” types:
I float: usually 32-bit IEEE 754 “single-precision” floats
I double: usually 64-bit IEEE 754 “double-precision” floats
I long double:: usually 80-bit “extended precision” floats
18
Floating-point and complex values
I C also defines 3 “real” types:
I float: usually 32-bit IEEE 754 “single-precision” floats
I double: usually 64-bit IEEE 754 “double-precision” floats
I long double:: usually 80-bit “extended precision” floats
I Corresponding “complex” types (need to include complex.h)
18
Floating-point and complex values
I C also defines 3 “real” types:
I float: usually 32-bit IEEE 754 “single-precision” floats
I double: usually 64-bit IEEE 754 “double-precision” floats
I long double:: usually 80-bit “extended precision” floats
I Corresponding “complex” types (need to include complex.h)
I This lecture: not much float hacking
I However, this is fun, see “What every computer scientist should
know about floating point arithmetic”
www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf
18
Floating-point and complex values
I C also defines 3 “real” types:
I float: usually 32-bit IEEE 754 “single-precision” floats
I double: usually 64-bit IEEE 754 “double-precision” floats
I long double:: usually 80-bit “extended precision” floats
I Corresponding “complex” types (need to include complex.h)
I This lecture: not much float hacking
I However, this is fun, see “What every computer scientist should
know about floating point arithmetic”
www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf
I Small example:
double a; /* assume IEEE 754 standard */
...
a += 6755399441055744;
a -= 6755399441055744;
I What does this code do to a?
18
Floating-point and complex values
I C also defines 3 “real” types:
I float: usually 32-bit IEEE 754 “single-precision” floats
I double: usually 64-bit IEEE 754 “double-precision” floats
I long double:: usually 80-bit “extended precision” floats
I Corresponding “complex” types (need to include complex.h)
I This lecture: not much float hacking
I However, this is fun, see “What every computer scientist should
know about floating point arithmetic”
www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf
I Small example:
double a; /* assume IEEE 754 standard */
...
a += 6755399441055744;
a -= 6755399441055744;
I What does this code do to a?
18
Floating-point and complex values
I C also defines 3 “real” types:
I float: usually 32-bit IEEE 754 “single-precision” floats
I double: usually 64-bit IEEE 754 “double-precision” floats
I long double:: usually 80-bit “extended precision” floats
I Corresponding “complex” types (need to include complex.h)
I This lecture: not much float hacking
I However, this is fun, see “What every computer scientist should
know about floating point arithmetic”
www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf
I Small example:
double a; /* assume IEEE 754 standard */
...
a += 6755399441055744;
a -= 6755399441055744;
I What does this code do to a?
I Answer: it rounds a according to the currently set rounding mode
18
Printing values
Have already seen various examples of format strings, let’s summarize:
printf("%d", a); /* prints signed integers in decimal */
printf("%u", b); /* prints unsigned integers in decimal */
printf("%x", c); /* prints integers in hexadecimal */
printf("%o", c); /* prints integers in octal */
printf("%lu", d); /* prints long unsigned integer in decimal */
printf("%llu", d); /* prints long long unsigned integer in decimal */
printf("%p", &d); /* prints pointers (in hexadecimal) */
printf("%f", e); /* prints single-precision floats */
printf("%lf", e); /* prints double-precision floats */
printf("%llf", e); /* prints extended-precision floats */
There’s quite a few more, but these get you fairly far.
19
stdint.h
I Often we need to know how large an integer is
I Example: crypto primitives are optimized to work on, e.g., 32-bit
words
I Solution: Fixed-size integer types defined in stdint.h
I uint8_t is an 8-bit unsigned integer
I int8_t is an 8-bit signed integer
I uint16_t is a 16-bit unsigned integer
I . . .
I int64_t is a 64-bit signed integer
20
stdint.h
I Often we need to know how large an integer is
I Example: crypto primitives are optimized to work on, e.g., 32-bit
words
I Solution: Fixed-size integer types defined in stdint.h
I uint8_t is an 8-bit unsigned integer
I int8_t is an 8-bit signed integer
I uint16_t is a 16-bit unsigned integer
I . . .
I int64_t is a 64-bit signed integer
I Problem: how do we print them in a portable way?
I printf("%llun", a); for uint64_t a may produce warnings
20
stdint.h
I Often we need to know how large an integer is
I Example: crypto primitives are optimized to work on, e.g., 32-bit
words
I Solution: Fixed-size integer types defined in stdint.h
I uint8_t is an 8-bit unsigned integer
I int8_t is an 8-bit signed integer
I uint16_t is a 16-bit unsigned integer
I . . .
I int64_t is a 64-bit signed integer
I Problem: how do we print them in a portable way?
I printf("%llun", a); for uint64_t a may produce warnings
I Solution: printf("%" PRIu64 "n", a)
I For signed values, e.g., PRId64
I Printing in hexadecimal: PRIx64
20
Implicit type conversion
I Sometimes we want to evaluate expressions involving different types
I Example:
float pi, r, circ;
a = 3.14159265;
circ = 2*pi*r;
21
Implicit type conversion
I Sometimes we want to evaluate expressions involving different types
I Example:
float pi, r, circ;
a = 3.14159265;
circ = 2*pi*r;
I C uses complex rules to implicitly convert types
I Often these rules are perfectly intuitive:
I Convert “less precise” type to more precise type, preserve values
I Compute modulo 216
, when casting from uint32_t to uint16_t
21
Implicit type conversion
I Sometimes we want to evaluate expressions involving different types
I Example:
float pi, r, circ;
a = 3.14159265;
circ = 2*pi*r;
I C uses complex rules to implicitly convert types
I Often these rules are perfectly intuitive:
I Convert “less precise” type to more precise type, preserve values
I Compute modulo 216
, when casting from uint32_t to uint16_t
I However, these rules can be rather counterintuitive:
unsigned int a = 1;
int b = -1;
if(b < a) printf("all goodn");
else printf("WTF?n");
21
Explicit casts
I Sometimes we need to convert explicitly
I Example: multiply two (32-bit) integers:
unsigned int a,b;
...
unsigned long long r = a*b;
22
Explicit casts
I Sometimes we need to convert explicitly
I Example: multiply two (32-bit) integers:
unsigned int a,b;
...
unsigned long long r = a*b;
I By “default”, result of a*b has 32-bits; upper 32 bits are “lost”
I Fix by casting one (or both) factors:
unsigned long long r = (unsigned long long)a*b;
22
Explicit casts
I Sometimes we need to convert explicitly
I Example: multiply two (32-bit) integers:
unsigned int a,b;
...
unsigned long long r = a*b;
I By “default”, result of a*b has 32-bits; upper 32 bits are “lost”
I Fix by casting one (or both) factors:
unsigned long long r = (unsigned long long)a*b;
I Can also use this to, e.g., truncate floats:
float a = 3.14159265;
float c = (int) a;
printf("%fn", trunc(a));
printf("%fn", c);
22
Explicit casts
I Sometimes we need to convert explicitly
I Example: multiply two (32-bit) integers:
unsigned int a,b;
...
unsigned long long r = a*b;
I By “default”, result of a*b has 32-bits; upper 32 bits are “lost”
I Fix by casting one (or both) factors:
unsigned long long r = (unsigned long long)a*b;
I Can also use this to, e.g., truncate floats:
float a = 3.14159265;
float c = (int) a;
printf("%fn", trunc(a));
printf("%fn", c);
I Careful, this does not generally work (undefined behavior ahead)!
22
A small quiz
What do you think this program will print?
unsigned char x = 128;
signed char y = x;
printf("The value of y is %dn", y);
23
A small quiz
What do you think this program will print?
unsigned char x = 128;
signed char y = x;
printf("The value of y is %dn", y);
(Obviously, the answer is “unspecified behavior” – it’s C after all)
23
Two’s complement
I Can represent a signed integer as “sign + absolute value”
I Disadvantage: zero has two representations (0 and -0)
24
Two’s complement
I Can represent a signed integer as “sign + absolute value”
I Disadvantage: zero has two representations (0 and -0)
I Other idea: flip all bits in a to obtain -a
I This is known as “ones complement”
I Still: zero has two representations
24
Two’s complement
I Can represent a signed integer as “sign + absolute value”
I Disadvantage: zero has two representations (0 and -0)
I Other idea: flip all bits in a to obtain -a
I This is known as “ones complement”
I Still: zero has two representations
I Much more common: two’s complement
I flip all bits in a
I add 1
24
Two’s complement
I Can represent a signed integer as “sign + absolute value”
I Disadvantage: zero has two representations (0 and -0)
I Other idea: flip all bits in a to obtain -a
I This is known as “ones complement”
I Still: zero has two representations
I Much more common: two’s complement
I flip all bits in a
I add 1
I Sanity test: a = -(-a)
24
Two’s complement
I Can represent a signed integer as “sign + absolute value”
I Disadvantage: zero has two representations (0 and -0)
I Other idea: flip all bits in a to obtain -a
I This is known as “ones complement”
I Still: zero has two representations
I Much more common: two’s complement
I flip all bits in a
I add 1
I Sanity test: a = -(-a)
I Range of k-bit signed integer: {−2k−1
, . . . , 2k−1
− 1}
I Example: signed (8-bit) byte: {−128, . . . , 127}
24
Two’s complement
I Can represent a signed integer as “sign + absolute value”
I Disadvantage: zero has two representations (0 and -0)
I Other idea: flip all bits in a to obtain -a
I This is known as “ones complement”
I Still: zero has two representations
I Much more common: two’s complement
I flip all bits in a
I add 1
I Sanity test: a = -(-a)
I Range of k-bit signed integer: {−2k−1
, . . . , 2k−1
− 1}
I Example: signed (8-bit) byte: {−128, . . . , 127}
I Can use the same hardware for signed and unsigned addition
24
Endianess
I Let’s consider the 32-bit integer 287454020 =0x11223344
I How would you put it into memory. . . ,like this?:
| 11 | 22 | 33 | 44 |
0x0...0 0x0...1 0x0...2 0x0...3
25
Endianess
I Let’s consider the 32-bit integer 287454020 =0x11223344
I How would you put it into memory. . . ,like this?:
| 11 | 22 | 33 | 44 |
0x0...0 0x0...1 0x0...2 0x0...3
I How about like this?
| 44 | 33 | 22 | 11 |
0x0...0 0x0...1 0x0...2 0x0...3
25
Endianess
I Let’s consider the 32-bit integer 287454020 =0x11223344
I How would you put it into memory. . . ,like this?:
| 11 | 22 | 33 | 44 |
0x0...0 0x0...1 0x0...2 0x0...3
I How about like this?
| 44 | 33 | 22 | 11 |
0x0...0 0x0...1 0x0...2 0x0...3
I A quick poll: What do you find more intuitive?
25
Endianess, let’s try again
I Take 4-byte integer a =
P3
i=0 ai28i
I The ai are the bytes of a
26
Endianess, let’s try again
I Take 4-byte integer a =
P3
i=0 ai28i
I The ai are the bytes of a
I How would you put it into memory. . . ,like this?:
| a0 | a1 | a2 | a3 |
0x0...0 0x0...1 0x0...2 0x0...3
26
Endianess, let’s try again
I Take 4-byte integer a =
P3
i=0 ai28i
I The ai are the bytes of a
I How would you put it into memory. . . ,like this?:
| a0 | a1 | a2 | a3 |
0x0...0 0x0...1 0x0...2 0x0...3
I Or would you rather have this?
| a3 | a2 | a1 | a0 |
0x0...0 0x0...1 0x0...2 0x0...3
I Again a quick poll: What do you find more intuitive?
26
Endianess, the conclusion
I Least significant bytes at low addresses: little endian
I Most significant bytes at low addresses: big endian
27
Endianess, the conclusion
I Least significant bytes at low addresses: little endian
I Most significant bytes at low addresses: big endian
I This is short for “little/big endian byte first”
27
Endianess, the conclusion
I Least significant bytes at low addresses: little endian
I Most significant bytes at low addresses: big endian
I This is short for “little/big endian byte first”
I Most CPUs today use little endian
27
Endianess, the conclusion
I Least significant bytes at low addresses: little endian
I Most significant bytes at low addresses: big endian
I This is short for “little/big endian byte first”
I Most CPUs today use little endian
I Examples for big-endian CPUs:
I PowerPC
I UltraSPARC
I ARM can switch endianess (is “bi-endian”)
27
Endianess, the conclusion
I Least significant bytes at low addresses: little endian
I Most significant bytes at low addresses: big endian
I This is short for “little/big endian byte first”
I Most CPUs today use little endian
I Examples for big-endian CPUs:
I PowerPC
I UltraSPARC
I ARM can switch endianess (is “bi-endian”)
I The problem with little-endian intuition is just that we write
left-to-right (but use Arabic numbers)
27

More Related Content

Similar to c.pdf

Similar to c.pdf (20)

Introduction to C Programming
Introduction to C ProgrammingIntroduction to C Programming
Introduction to C Programming
 
lect02--memory.pdf
lect02--memory.pdflect02--memory.pdf
lect02--memory.pdf
 
C language tutorial
C language tutorialC language tutorial
C language tutorial
 
C language
C languageC language
C language
 
67404923-C-Programming-Tutorials-Doc.pdf
67404923-C-Programming-Tutorials-Doc.pdf67404923-C-Programming-Tutorials-Doc.pdf
67404923-C-Programming-Tutorials-Doc.pdf
 
Overview of c
Overview of cOverview of c
Overview of c
 
Introduction to c language
Introduction to c language Introduction to c language
Introduction to c language
 
Basic c
Basic cBasic c
Basic c
 
C Fundamental.docx
C Fundamental.docxC Fundamental.docx
C Fundamental.docx
 
C lecture notes new
C lecture notes newC lecture notes new
C lecture notes new
 
The smartpath information systems c pro
The smartpath information systems c proThe smartpath information systems c pro
The smartpath information systems c pro
 
Intro to cprogramming
Intro to cprogrammingIntro to cprogramming
Intro to cprogramming
 
C programming orientation
C programming orientationC programming orientation
C programming orientation
 
C Programming language - introduction
C Programming  language - introduction  C Programming  language - introduction
C Programming language - introduction
 
C LANGUAGE UNIT-1 PREPARED BY MVB REDDY
C LANGUAGE UNIT-1 PREPARED BY MVB REDDYC LANGUAGE UNIT-1 PREPARED BY MVB REDDY
C LANGUAGE UNIT-1 PREPARED BY MVB REDDY
 
C Programming UNIT 1.pptx
C Programming  UNIT 1.pptxC Programming  UNIT 1.pptx
C Programming UNIT 1.pptx
 
00 C hello world.pptx
00 C hello world.pptx00 C hello world.pptx
00 C hello world.pptx
 
C PROGRAMMING
C PROGRAMMINGC PROGRAMMING
C PROGRAMMING
 
C and C++ Industrial Training Jalandhar
C and C++ Industrial Training JalandharC and C++ Industrial Training Jalandhar
C and C++ Industrial Training Jalandhar
 
Introduction to c
Introduction to cIntroduction to c
Introduction to c
 

Recently uploaded

Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdftimtebeek1
 
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypseTomasz Kowalczewski
 
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with GraphGraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with GraphNeo4j
 
Community is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea GouletCommunity is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea GouletAndrea Goulet
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Andreas Granig
 
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024SimonedeGijt
 
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxNeo4j
 
Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Maxim Salnikov
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio, Inc.
 
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...OnePlan Solutions
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AIAGATSoftware
 
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...Abortion Clinic
 
Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
Workshop -  Architecting Innovative Graph Applications- GraphSummit MilanWorkshop -  Architecting Innovative Graph Applications- GraphSummit Milan
Workshop - Architecting Innovative Graph Applications- GraphSummit MilanNeo4j
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Eraconfluent
 
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...naitiksharma1124
 
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jGraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jNeo4j
 
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAShane Coughlan
 

Recently uploaded (20)

Weeding your micro service landscape.pdf
Weeding your micro service landscape.pdfWeeding your micro service landscape.pdf
Weeding your micro service landscape.pdf
 
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse
 
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with GraphGraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
GraphSummit Milan & Stockholm - Neo4j: The Art of the Possible with Graph
 
Community is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea GouletCommunity is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea Goulet
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
 
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
 
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
 
Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?Prompt Engineering - an Art, a Science, or your next Job Title?
Prompt Engineering - an Art, a Science, or your next Job Title?
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
 
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AI
 
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
 
Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
Workshop -  Architecting Innovative Graph Applications- GraphSummit MilanWorkshop -  Architecting Innovative Graph Applications- GraphSummit Milan
Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
 
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jGraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
 
Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...
 
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
 
Abortion Pill Prices Germiston ](+27832195400*)[ 🏥 Women's Abortion Clinic in...
Abortion Pill Prices Germiston ](+27832195400*)[ 🏥 Women's Abortion Clinic in...Abortion Pill Prices Germiston ](+27832195400*)[ 🏥 Women's Abortion Clinic in...
Abortion Pill Prices Germiston ](+27832195400*)[ 🏥 Women's Abortion Clinic in...
 

c.pdf

  • 1. Hacking in C The C programming language Radboud University, Nijmegen, The Netherlands Spring 2018
  • 2. The C programming language I Invented by Dennis Ritchie in the early 70s I First “Hello World” program written in C Source: Wikipedia 2
  • 3. The C programming language I Invented by Dennis Ritchie in the early 70s I First “Hello World” program written in C I UNIX (and Linux) is written in C I Still one of the top-5 most used programming languages I Compilers for almost all platforms Source: Wikipedia 2
  • 4. The C programming language I Invented by Dennis Ritchie in the early 70s I First “Hello World” program written in C I UNIX (and Linux) is written in C I Still one of the top-5 most used programming languages I Compilers for almost all platforms I Many “interesting” security issues Source: Wikipedia 2
  • 5. C standards and “standards” I First definition in Kernighan&Ritchie: “The C Programming Language” I Also known as K&R C, book appared in 1978 3
  • 6. C standards and “standards” I First definition in Kernighan&Ritchie: “The C Programming Language” I Also known as K&R C, book appared in 1978 I Standardized by ANSI in 1989 (C89) and ISO (C90) I Second edition of K&R book used “ANSI C”, i.e., C89 3
  • 7. C standards and “standards” I First definition in Kernighan&Ritchie: “The C Programming Language” I Also known as K&R C, book appared in 1978 I Standardized by ANSI in 1989 (C89) and ISO (C90) I Second edition of K&R book used “ANSI C”, i.e., C89 I In 1995, ANSI published an amendment to the C standard (“C95”) 3
  • 8. C standards and “standards” I First definition in Kernighan&Ritchie: “The C Programming Language” I Also known as K&R C, book appared in 1978 I Standardized by ANSI in 1989 (C89) and ISO (C90) I Second edition of K&R book used “ANSI C”, i.e., C89 I In 1995, ANSI published an amendment to the C standard (“C95”) I In 1999, ISO standardized updated C, ANSI adopted (C99) 3
  • 9. C standards and “standards” I First definition in Kernighan&Ritchie: “The C Programming Language” I Also known as K&R C, book appared in 1978 I Standardized by ANSI in 1989 (C89) and ISO (C90) I Second edition of K&R book used “ANSI C”, i.e., C89 I In 1995, ANSI published an amendment to the C standard (“C95”) I In 1999, ISO standardized updated C, ANSI adopted (C99) I Current standard is C11, standardized (ANSI and ISO) in 2011 I Standard draft online: https://port70.net/~nsz/c/c11/n1570.html 3
  • 10. C standards and “standards” I First definition in Kernighan&Ritchie: “The C Programming Language” I Also known as K&R C, book appared in 1978 I Standardized by ANSI in 1989 (C89) and ISO (C90) I Second edition of K&R book used “ANSI C”, i.e., C89 I In 1995, ANSI published an amendment to the C standard (“C95”) I In 1999, ISO standardized updated C, ANSI adopted (C99) I Current standard is C11, standardized (ANSI and ISO) in 2011 I Standard draft online: https://port70.net/~nsz/c/c11/n1570.html I Compilers like gcc or clang also support GNU extensions I Default for gcc: C11 plus GNU extensions (aka gnu11) 3
  • 11. C standards and “standards” I First definition in Kernighan&Ritchie: “The C Programming Language” I Also known as K&R C, book appared in 1978 I Standardized by ANSI in 1989 (C89) and ISO (C90) I Second edition of K&R book used “ANSI C”, i.e., C89 I In 1995, ANSI published an amendment to the C standard (“C95”) I In 1999, ISO standardized updated C, ANSI adopted (C99) I Current standard is C11, standardized (ANSI and ISO) in 2011 I Standard draft online: https://port70.net/~nsz/c/c11/n1570.html I Compilers like gcc or clang also support GNU extensions I Default for gcc: C11 plus GNU extensions (aka gnu11) I You can switch gcc to other C standards using, e.g., -std=c89 I Use -pedantic flag to issue warnings if your code doesn’t conform to the standard 3
  • 12. C vs. C++ I C is the basis for C++, Objective-C, and many other languages I C is not a subset of C++, e.g., int *x = malloc(sizeof(int) * 10); is valid (and perfectly reasonable) C, but not valid C++! 4
  • 13. C vs. C++ I C is the basis for C++, Objective-C, and many other languages I C is not a subset of C++, e.g., int *x = malloc(sizeof(int) * 10); is valid (and perfectly reasonable) C, but not valid C++! I You can “mix” C and C++ code, but you have to be very careful I In C++, declare C functions as extern "C", for example: extern "C" int mycfunction(int); I Now you can call mycfunction from your C++ code I Use compiler by the same vendor to compile 4
  • 14. C vs. C++ I C is the basis for C++, Objective-C, and many other languages I C is not a subset of C++, e.g., int *x = malloc(sizeof(int) * 10); is valid (and perfectly reasonable) C, but not valid C++! I You can “mix” C and C++ code, but you have to be very careful I In C++, declare C functions as extern "C", for example: extern "C" int mycfunction(int); I Now you can call mycfunction from your C++ code I Use compiler by the same vendor to compile I Lets you use, e.g., highly optimized C libraries I Common scenario: I Write high-speed code in C (and assembly) I Write so-called wrappers around this for easy access in C++ 4
  • 15. A “portable assembler” C has been characterized (both admiringly and invidiously) as a portable assembly language —Dennis Ritchie I Idea of assembly: I Programmer has full control over the program I Choice of instructions, register allocation etc. left to programmer I Programmer has “raw access” to memory 5
  • 16. A “portable assembler” C has been characterized (both admiringly and invidiously) as a portable assembly language —Dennis Ritchie I Idea of assembly: I Programmer has full control over the program I Choice of instructions, register allocation etc. left to programmer I Programmer has “raw access” to memory I Need to rewrite programs for each architecture I Need to re-optimize for each microarchitecture 5
  • 17. A “portable assembler” C has been characterized (both admiringly and invidiously) as a portable assembly language —Dennis Ritchie I Idea of assembly: I Programmer has full control over the program I Choice of instructions, register allocation etc. left to programmer I Programmer has “raw access” to memory I Need to rewrite programs for each architecture I Need to re-optimize for each microarchitecture I Idea of C: I Take away some bits of control from the programmer I Stay as close as possible to assembly, but stay portable I In particular: give programmer raw access to memory 5
  • 18. A “portable assembler” C has been characterized (both admiringly and invidiously) as a portable assembly language —Dennis Ritchie I Idea of assembly: I Programmer has full control over the program I Choice of instructions, register allocation etc. left to programmer I Programmer has “raw access” to memory I Need to rewrite programs for each architecture I Need to re-optimize for each microarchitecture I Idea of C: I Take away some bits of control from the programmer I Stay as close as possible to assembly, but stay portable I In particular: give programmer raw access to memory I Use compiler to generate code for different architectures I Use compiler to optimize for different microarchitectures 5
  • 19. “If programming languages were. . . ” I . . . vehicles http: //crashworks.org/if_programming_languages_were_vehicles/ I . . . countries https://www.quora.com/ If-programming-languages-were-countries-which-country-would-each-l I . . . GoT characters https://techbeacon.com/ if-programming-languages-were-game-thrones-characters I . . . beer https: //www.topcoder.com/blog/if-programming-languages-were-beer/ 6
  • 20. “If programming languages were. . . ” I . . . vehicles http: //crashworks.org/if_programming_languages_were_vehicles/ I . . . countries https://www.quora.com/ If-programming-languages-were-countries-which-country-would-each-l I . . . GoT characters https://techbeacon.com/ if-programming-languages-were-game-thrones-characters I . . . beer https: //www.topcoder.com/blog/if-programming-languages-were-beer/ I . . . boats http://compsci.ca/blog/if-a-programming-language-was-a-boat/ 6
  • 21. “If programming languages were. . . ” “C is a nuclear submarine. The instructions are probably in a foreign language, but all of the hardware itself is optimized for performance. 6
  • 22. Syntax and semantics Syntax of a programming language I Spelling and grammar rules I Defines the language of valid programs I Syntax errors are caught by the compiler I Classical example: forget a ; at the end of a line 7
  • 23. Syntax and semantics Syntax of a programming language I Spelling and grammar rules I Defines the language of valid programs I Syntax errors are caught by the compiler I Classical example: forget a ; at the end of a line Semantics of a programming language I Defines the meaning of a valid program I In many languages semantics are fully specified I Runtime errors (exceptions) are part of the semantics I C is not fully specified! 7
  • 24. Unspecified behavior I Unspecified behavior is “implementation-specific” I Semantics not defined by the standard, but have to specified by the compiler I Reason: allow better optimization 8
  • 25. Unspecified behavior I Unspecified behavior is “implementation-specific” I Semantics not defined by the standard, but have to specified by the compiler I Reason: allow better optimization I Examples: I Shifting negative values to the right (e.g., int a = (-42) >> 3) 8
  • 26. Unspecified behavior I Unspecified behavior is “implementation-specific” I Semantics not defined by the standard, but have to specified by the compiler I Reason: allow better optimization I Examples: I Shifting negative values to the right (e.g., int a = (-42) >> 3) I Order of subexpression evaluation (e.g., f(g(), h())) 8
  • 27. Unspecified behavior I Unspecified behavior is “implementation-specific” I Semantics not defined by the standard, but have to specified by the compiler I Reason: allow better optimization I Examples: I Shifting negative values to the right (e.g., int a = (-42) >> 3) I Order of subexpression evaluation (e.g., f(g(), h())) I Sizes of of various types (more later) I Representation of data types (more later) 8
  • 28. Unspecified behavior I Unspecified behavior is “implementation-specific” I Semantics not defined by the standard, but have to specified by the compiler I Reason: allow better optimization I Examples: I Shifting negative values to the right (e.g., int a = (-42) >> 3) I Order of subexpression evaluation (e.g., f(g(), h())) I Sizes of of various types (more later) I Representation of data types (more later) I Number of bits in one byte 8
  • 29. Unspecified behavior I Unspecified behavior is “implementation-specific” I Semantics not defined by the standard, but have to specified by the compiler I Reason: allow better optimization I Examples: I Shifting negative values to the right (e.g., int a = (-42) >> 3) I Order of subexpression evaluation (e.g., f(g(), h())) I Sizes of of various types (more later) I Representation of data types (more later) I Number of bits in one byte I Fairly hard to write fully specified C programs I For this course: if not otherwise stated assume gcc (version 6.x or 7.x) compiling for AMD64. 8
  • 30. Undefined behavior I Different from unspecified behavior: undefined behavior I Program reaches a state in which it may do anything I It may crash with arbitrary error code I It may silently corrupt data I It may give the right result I The behavior may be “randomly” different in independent runs 9
  • 31. Undefined behavior I Different from unspecified behavior: undefined behavior I Program reaches a state in which it may do anything I It may crash with arbitrary error code I It may silently corrupt data I It may give the right result I The behavior may be “randomly” different in independent runs I Undefined behavior means that the whole program has no meaning anymore! I This is essentially always a bug, often security critical 9
  • 32. Undefined behavior I Different from unspecified behavior: undefined behavior I Program reaches a state in which it may do anything I It may crash with arbitrary error code I It may silently corrupt data I It may give the right result I The behavior may be “randomly” different in independent runs I Undefined behavior means that the whole program has no meaning anymore! I This is essentially always a bug, often security critical I Examples: I Access an array outside the bounds I More generally: access memory at “illegal” position 9
  • 33. Undefined behavior I Different from unspecified behavior: undefined behavior I Program reaches a state in which it may do anything I It may crash with arbitrary error code I It may silently corrupt data I It may give the right result I The behavior may be “randomly” different in independent runs I Undefined behavior means that the whole program has no meaning anymore! I This is essentially always a bug, often security critical I Examples: I Access an array outside the bounds I More generally: access memory at “illegal” position I Overflowing a signed integer ((INT_MAX+1)) 9
  • 34. Undefined behavior I Different from unspecified behavior: undefined behavior I Program reaches a state in which it may do anything I It may crash with arbitrary error code I It may silently corrupt data I It may give the right result I The behavior may be “randomly” different in independent runs I Undefined behavior means that the whole program has no meaning anymore! I This is essentially always a bug, often security critical I Examples: I Access an array outside the bounds I More generally: access memory at “illegal” position I Overflowing a signed integer ((INT_MAX+1)) I Left-shifting a signed integer ((-42) << 3) 9
  • 35. Undefined behavior I Different from unspecified behavior: undefined behavior I Program reaches a state in which it may do anything I It may crash with arbitrary error code I It may silently corrupt data I It may give the right result I The behavior may be “randomly” different in independent runs I Undefined behavior means that the whole program has no meaning anymore! I This is essentially always a bug, often security critical I Examples: I Access an array outside the bounds I More generally: access memory at “illegal” position I Overflowing a signed integer ((INT_MAX+1)) I Left-shifting a signed integer ((-42) << 3) I It is totally acceptable for a program to delete all your data when running into undefined behavior 9
  • 36. Undefined behavior I Different from unspecified behavior: undefined behavior I Program reaches a state in which it may do anything I It may crash with arbitrary error code I It may silently corrupt data I It may give the right result I The behavior may be “randomly” different in independent runs I Undefined behavior means that the whole program has no meaning anymore! I This is essentially always a bug, often security critical I Examples: I Access an array outside the bounds I More generally: access memory at “illegal” position I Overflowing a signed integer ((INT_MAX+1)) I Left-shifting a signed integer ((-42) << 3) I It is totally acceptable for a program to delete all your data when running into undefined behavior I Sometimes we can make a program do this (or something similar) I Most attacks in the course: exploit undefined behavior 9
  • 37. C compilation I Four steps involved in compilation, can stop at any of those I First step: Run the preprocessor (gcc -E) I Include code from #include directives I Expand macros from #define directives I Expand compile-time (static) conditionals #if I The C preprocessor is almost Turing complete I See https://github.com/orangeduck/CPP_COMPLETE for a Brainfuck interpreter written in the C preprocessor 10
  • 38. C compilation I Four steps involved in compilation, can stop at any of those I First step: Run the preprocessor (gcc -E) I Include code from #include directives I Expand macros from #define directives I Expand compile-time (static) conditionals #if I The C preprocessor is almost Turing complete I See https://github.com/orangeduck/CPP_COMPLETE for a Brainfuck interpreter written in the C preprocessor I Second step: Run compilation proper (gcc -S) I Go from C to assembly level I This is where you get syntax errors 10
  • 39. C compilation I Four steps involved in compilation, can stop at any of those I First step: Run the preprocessor (gcc -E) I Include code from #include directives I Expand macros from #define directives I Expand compile-time (static) conditionals #if I The C preprocessor is almost Turing complete I See https://github.com/orangeduck/CPP_COMPLETE for a Brainfuck interpreter written in the C preprocessor I Second step: Run compilation proper (gcc -S) I Go from C to assembly level I This is where you get syntax errors I Third step: Generate machine code (gcc -c) I Generates so-called object files 10
  • 40. C compilation I Four steps involved in compilation, can stop at any of those I First step: Run the preprocessor (gcc -E) I Include code from #include directives I Expand macros from #define directives I Expand compile-time (static) conditionals #if I The C preprocessor is almost Turing complete I See https://github.com/orangeduck/CPP_COMPLETE for a Brainfuck interpreter written in the C preprocessor I Second step: Run compilation proper (gcc -S) I Go from C to assembly level I This is where you get syntax errors I Third step: Generate machine code (gcc -c) I Generates so-called object files I Fourth step: Linking (simply run gcc, this is default) I Put object files together to a binary I Linker errors include missing functions or function duplicates I Also include external libraries here (e.g., -lm) I Caution: order of arguments can matter! 10
  • 41. Memory abstraction 1: where data is stored I Programmers typically don’t know where data is stored I For example, a variable can sit in I a register of the CPU I in any of the caches of the CPU I in RAM I on the hard drive (in so-called swap space) 11
  • 42. Memory abstraction 1: where data is stored I Programmers typically don’t know where data is stored I For example, a variable can sit in I a register of the CPU I in any of the caches of the CPU I in RAM I on the hard drive (in so-called swap space) I Compiler makes decisions about register allocation I Compiler has some bit of influence on caching I Other decisions are made by the OS (and the CPU) 11
  • 43. Memory abstraction 1: where data is stored I Programmers typically don’t know where data is stored I For example, a variable can sit in I a register of the CPU I in any of the caches of the CPU I in RAM I on the hard drive (in so-called swap space) I Compiler makes decisions about register allocation I Compiler has some bit of influence on caching I Other decisions are made by the OS (and the CPU) I Sometimes important: always read the variable from memory I C has keyword volatile to enforce this I Disables certain optimization 11
  • 44. Where is data allocated? I C has the & operator that returns the address of a variable I Example: I Let’s say we have a variable int x = 12 I Now &x is the address where x is stored, aka a pointer to x 12
  • 45. Where is data allocated? I C has the & operator that returns the address of a variable I Example: I Let’s say we have a variable int x = 12 I Now &x is the address where x is stored, aka a pointer to x I Much more on pointers later, for the moment let’s print them: char x; int i; short s; char y; printf("The address of x is %pn", &x); printf("The address of i is %pn", &i); printf("The address of s is %pn", &s); printf("The address of y is %pn", &y); I Note the %p format specifier for pointers 12
  • 46. Where is data allocated? I C has the & operator that returns the address of a variable I Example: I Let’s say we have a variable int x = 12 I Now &x is the address where x is stored, aka a pointer to x I Much more on pointers later, for the moment let’s print them: char x; int i; short s; char y; printf("The address of x is %pn", &x); printf("The address of i is %pn", &i); printf("The address of s is %pn", &s); printf("The address of y is %pn", &y); I Note the %p format specifier for pointers I The “inverse” of & is *, i.e., *(&x) gives the value of x 12
  • 47. register I Important task for the compiler: register allocation I Map live variables (whose values are still needed) to registers I Typical goal: minimize amount of “register spills” 13
  • 48. register I Important task for the compiler: register allocation I Map live variables (whose values are still needed) to registers I Typical goal: minimize amount of “register spills” I C lets programmers “help” the compiler with keyword register 13
  • 49. register I Important task for the compiler: register allocation I Map live variables (whose values are still needed) to registers I Typical goal: minimize amount of “register spills” I C lets programmers “help” the compiler with keyword register I Quote from Erik’s slides: “you should never ever use this! Compilers are much better than you are at figuring out which data is best stored in CPU registers.” 13
  • 50. register I Important task for the compiler: register allocation I Map live variables (whose values are still needed) to registers I Typical goal: minimize amount of “register spills” I C lets programmers “help” the compiler with keyword register I Quote from Erik’s slides: “you should never ever use this! Compilers are much better than you are at figuring out which data is best stored in CPU registers.” I I agree that I never (?) use register 13
  • 51. register I Important task for the compiler: register allocation I Map live variables (whose values are still needed) to registers I Typical goal: minimize amount of “register spills” I C lets programmers “help” the compiler with keyword register I Quote from Erik’s slides: “you should never ever use this! Compilers are much better than you are at figuring out which data is best stored in CPU registers.” I I agree that I never (?) use register I Reason: I am (often) better than the compiler at figuring out which data is best stored in CPU registers. . . 13
  • 52. register I Important task for the compiler: register allocation I Map live variables (whose values are still needed) to registers I Typical goal: minimize amount of “register spills” I C lets programmers “help” the compiler with keyword register I Quote from Erik’s slides: “you should never ever use this! Compilers are much better than you are at figuring out which data is best stored in CPU registers.” I I agree that I never (?) use register I Reason: I am (often) better than the compiler at figuring out which data is best stored in CPU registers. . . I . . . and then I write in assembly and avoid the compiler alltogether 13
  • 53. register I Important task for the compiler: register allocation I Map live variables (whose values are still needed) to registers I Typical goal: minimize amount of “register spills” I C lets programmers “help” the compiler with keyword register I Quote from Erik’s slides: “you should never ever use this! Compilers are much better than you are at figuring out which data is best stored in CPU registers.” I I agree that I never (?) use register I Reason: I am (often) better than the compiler at figuring out which data is best stored in CPU registers. . . I . . . and then I write in assembly and avoid the compiler alltogether I Problem with register: no guarantee that the value isn’t spilled I Requesting the address of a register variable is invalid! 13
  • 54. Memory abstraction 2: how data is stored I You can think of memory as an array of bytes I For this course: a byte consists of 8 bits 14
  • 55. Memory abstraction 2: how data is stored I You can think of memory as an array of bytes I For this course: a byte consists of 8 bits I Computer programs work with different data types I Important step of compilation: map other types to bytes 14
  • 56. Memory abstraction 2: how data is stored I You can think of memory as an array of bytes I For this course: a byte consists of 8 bits I Computer programs work with different data types I Important step of compilation: map other types to bytes I Idea of C: you can program without needing to understand this mapping I Idea of this course: you can have more fun with C if you do! 14
  • 57. Memory abstraction 2: how data is stored I You can think of memory as an array of bytes I For this course: a byte consists of 8 bits I Computer programs work with different data types I Important step of compilation: map other types to bytes I Idea of C: you can program without needing to understand this mapping I Idea of this course: you can have more fun with C if you do! I The CPU likes to see the memory as an array of words I Words typically consist of several bytes (e.g., 4 or 8 bytes) I (Most) registers have the size of machine words I Often loads and stores are more efficient when aligned to a word boundary 14
  • 58. Memory abstraction 2: how data is stored I You can think of memory as an array of bytes I For this course: a byte consists of 8 bits I Computer programs work with different data types I Important step of compilation: map other types to bytes I Idea of C: you can program without needing to understand this mapping I Idea of this course: you can have more fun with C if you do! I The CPU likes to see the memory as an array of words I Words typically consist of several bytes (e.g., 4 or 8 bytes) I (Most) registers have the size of machine words I Often loads and stores are more efficient when aligned to a word boundary I von Neumann architecture: also programs are just bytes in memory I Only difference between data and program: what you do with it 14
  • 59. char I Most basic data type: char I From the C11 standard: “An object declared as type char is large enough to store any member of the basic execution character set.” I More useful definition: a char is a byte, i.e., the smallest addressable unit of memory I In all relevant scenarios: a char is an 8-bit integer 15
  • 60. char I Most basic data type: char I From the C11 standard: “An object declared as type char is large enough to store any member of the basic execution character set.” I More useful definition: a char is a byte, i.e., the smallest addressable unit of memory I In all relevant scenarios: a char is an 8-bit integer I Traditionally a char is used to represent ASCII characters, yields two common ways to initialize a char: char a = ’2’; char b = 2; char c = 50; I Which of those values are equal? 15
  • 61. char I Most basic data type: char I From the C11 standard: “An object declared as type char is large enough to store any member of the basic execution character set.” I More useful definition: a char is a byte, i.e., the smallest addressable unit of memory I In all relevant scenarios: a char is an 8-bit integer I Traditionally a char is used to represent ASCII characters, yields two common ways to initialize a char: char a = ’2’; char b = 2; char c = 50; I Which of those values are equal? 15
  • 62. char I Most basic data type: char I From the C11 standard: “An object declared as type char is large enough to store any member of the basic execution character set.” I More useful definition: a char is a byte, i.e., the smallest addressable unit of memory I In all relevant scenarios: a char is an 8-bit integer I Traditionally a char is used to represent ASCII characters, yields two common ways to initialize a char: char a = ’2’; char b = 2; char c = 50; I Which of those values are equal? I It’s a and c, because ’2’ has ASCII value 50. 15
  • 63. Another quick question. . . I What does the following code do?: char i; for(i=42;i>=0;i--) { printf("Crypto stands for cryptographyn"); } 16
  • 64. Another quick question. . . I What does the following code do?: char i; for(i=42;i>=0;i--) { printf("Crypto stands for cryptographyn"); } 16
  • 65. Another quick question. . . I What does the following code do?: char i; for(i=42;i>=0;i--) { printf("Crypto stands for cryptographyn"); } I Answer: it depends (and it really does!) I C standard does not define whether char is signed or unsigned I Make explicit by using signed char or unsigned char 16
  • 66. Other integral types I C11 provides 4 more integral types (each signed and unsigned): I short: at least 2 bytes I int: typically 4 (but sometimes 2) bytes I long: typically 4 or 8 bytes I long long: at least 8 bytes (in practice: exactly 8 bytes) 17
  • 67. Other integral types I C11 provides 4 more integral types (each signed and unsigned): I short: at least 2 bytes I int: typically 4 (but sometimes 2) bytes I long: typically 4 or 8 bytes I long long: at least 8 bytes (in practice: exactly 8 bytes) I GNU extension: __int128 for architectures that support it 17
  • 68. Other integral types I C11 provides 4 more integral types (each signed and unsigned): I short: at least 2 bytes I int: typically 4 (but sometimes 2) bytes I long: typically 4 or 8 bytes I long long: at least 8 bytes (in practice: exactly 8 bytes) I GNU extension: __int128 for architectures that support it I Common misconception: long is as long as a machine word I Think about how this would work on an 8-bit microcontroller. . . 17
  • 69. Other integral types I C11 provides 4 more integral types (each signed and unsigned): I short: at least 2 bytes I int: typically 4 (but sometimes 2) bytes I long: typically 4 or 8 bytes I long long: at least 8 bytes (in practice: exactly 8 bytes) I GNU extension: __int128 for architectures that support it I Common misconception: long is as long as a machine word I Think about how this would work on an 8-bit microcontroller. . . I Find size of any type in bytes using sizeof, e.g.: int a; printf("%zd", sizeof(a)); printf("%zd", sizeof(long)); 17
  • 70. Other integral types I C11 provides 4 more integral types (each signed and unsigned): I short: at least 2 bytes I int: typically 4 (but sometimes 2) bytes I long: typically 4 or 8 bytes I long long: at least 8 bytes (in practice: exactly 8 bytes) I GNU extension: __int128 for architectures that support it I Common misconception: long is as long as a machine word I Think about how this would work on an 8-bit microcontroller. . . I Find size of any type in bytes using sizeof, e.g.: int a; printf("%zd", sizeof(a)); printf("%zd", sizeof(long)); I Integral constants can be written in I Decimal, e.g., 255 I Hexadecimal, using 0x, e.g., 0xff I Octal, using 0, e.g., 0377 17
  • 71. Floating-point and complex values I C also defines 3 “real” types: I float: usually 32-bit IEEE 754 “single-precision” floats I double: usually 64-bit IEEE 754 “double-precision” floats I long double:: usually 80-bit “extended precision” floats 18
  • 72. Floating-point and complex values I C also defines 3 “real” types: I float: usually 32-bit IEEE 754 “single-precision” floats I double: usually 64-bit IEEE 754 “double-precision” floats I long double:: usually 80-bit “extended precision” floats I Corresponding “complex” types (need to include complex.h) 18
  • 73. Floating-point and complex values I C also defines 3 “real” types: I float: usually 32-bit IEEE 754 “single-precision” floats I double: usually 64-bit IEEE 754 “double-precision” floats I long double:: usually 80-bit “extended precision” floats I Corresponding “complex” types (need to include complex.h) I This lecture: not much float hacking I However, this is fun, see “What every computer scientist should know about floating point arithmetic” www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf 18
  • 74. Floating-point and complex values I C also defines 3 “real” types: I float: usually 32-bit IEEE 754 “single-precision” floats I double: usually 64-bit IEEE 754 “double-precision” floats I long double:: usually 80-bit “extended precision” floats I Corresponding “complex” types (need to include complex.h) I This lecture: not much float hacking I However, this is fun, see “What every computer scientist should know about floating point arithmetic” www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf I Small example: double a; /* assume IEEE 754 standard */ ... a += 6755399441055744; a -= 6755399441055744; I What does this code do to a? 18
  • 75. Floating-point and complex values I C also defines 3 “real” types: I float: usually 32-bit IEEE 754 “single-precision” floats I double: usually 64-bit IEEE 754 “double-precision” floats I long double:: usually 80-bit “extended precision” floats I Corresponding “complex” types (need to include complex.h) I This lecture: not much float hacking I However, this is fun, see “What every computer scientist should know about floating point arithmetic” www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf I Small example: double a; /* assume IEEE 754 standard */ ... a += 6755399441055744; a -= 6755399441055744; I What does this code do to a? 18
  • 76. Floating-point and complex values I C also defines 3 “real” types: I float: usually 32-bit IEEE 754 “single-precision” floats I double: usually 64-bit IEEE 754 “double-precision” floats I long double:: usually 80-bit “extended precision” floats I Corresponding “complex” types (need to include complex.h) I This lecture: not much float hacking I However, this is fun, see “What every computer scientist should know about floating point arithmetic” www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf I Small example: double a; /* assume IEEE 754 standard */ ... a += 6755399441055744; a -= 6755399441055744; I What does this code do to a? I Answer: it rounds a according to the currently set rounding mode 18
  • 77. Printing values Have already seen various examples of format strings, let’s summarize: printf("%d", a); /* prints signed integers in decimal */ printf("%u", b); /* prints unsigned integers in decimal */ printf("%x", c); /* prints integers in hexadecimal */ printf("%o", c); /* prints integers in octal */ printf("%lu", d); /* prints long unsigned integer in decimal */ printf("%llu", d); /* prints long long unsigned integer in decimal */ printf("%p", &d); /* prints pointers (in hexadecimal) */ printf("%f", e); /* prints single-precision floats */ printf("%lf", e); /* prints double-precision floats */ printf("%llf", e); /* prints extended-precision floats */ There’s quite a few more, but these get you fairly far. 19
  • 78. stdint.h I Often we need to know how large an integer is I Example: crypto primitives are optimized to work on, e.g., 32-bit words I Solution: Fixed-size integer types defined in stdint.h I uint8_t is an 8-bit unsigned integer I int8_t is an 8-bit signed integer I uint16_t is a 16-bit unsigned integer I . . . I int64_t is a 64-bit signed integer 20
  • 79. stdint.h I Often we need to know how large an integer is I Example: crypto primitives are optimized to work on, e.g., 32-bit words I Solution: Fixed-size integer types defined in stdint.h I uint8_t is an 8-bit unsigned integer I int8_t is an 8-bit signed integer I uint16_t is a 16-bit unsigned integer I . . . I int64_t is a 64-bit signed integer I Problem: how do we print them in a portable way? I printf("%llun", a); for uint64_t a may produce warnings 20
  • 80. stdint.h I Often we need to know how large an integer is I Example: crypto primitives are optimized to work on, e.g., 32-bit words I Solution: Fixed-size integer types defined in stdint.h I uint8_t is an 8-bit unsigned integer I int8_t is an 8-bit signed integer I uint16_t is a 16-bit unsigned integer I . . . I int64_t is a 64-bit signed integer I Problem: how do we print them in a portable way? I printf("%llun", a); for uint64_t a may produce warnings I Solution: printf("%" PRIu64 "n", a) I For signed values, e.g., PRId64 I Printing in hexadecimal: PRIx64 20
  • 81. Implicit type conversion I Sometimes we want to evaluate expressions involving different types I Example: float pi, r, circ; a = 3.14159265; circ = 2*pi*r; 21
  • 82. Implicit type conversion I Sometimes we want to evaluate expressions involving different types I Example: float pi, r, circ; a = 3.14159265; circ = 2*pi*r; I C uses complex rules to implicitly convert types I Often these rules are perfectly intuitive: I Convert “less precise” type to more precise type, preserve values I Compute modulo 216 , when casting from uint32_t to uint16_t 21
  • 83. Implicit type conversion I Sometimes we want to evaluate expressions involving different types I Example: float pi, r, circ; a = 3.14159265; circ = 2*pi*r; I C uses complex rules to implicitly convert types I Often these rules are perfectly intuitive: I Convert “less precise” type to more precise type, preserve values I Compute modulo 216 , when casting from uint32_t to uint16_t I However, these rules can be rather counterintuitive: unsigned int a = 1; int b = -1; if(b < a) printf("all goodn"); else printf("WTF?n"); 21
  • 84. Explicit casts I Sometimes we need to convert explicitly I Example: multiply two (32-bit) integers: unsigned int a,b; ... unsigned long long r = a*b; 22
  • 85. Explicit casts I Sometimes we need to convert explicitly I Example: multiply two (32-bit) integers: unsigned int a,b; ... unsigned long long r = a*b; I By “default”, result of a*b has 32-bits; upper 32 bits are “lost” I Fix by casting one (or both) factors: unsigned long long r = (unsigned long long)a*b; 22
  • 86. Explicit casts I Sometimes we need to convert explicitly I Example: multiply two (32-bit) integers: unsigned int a,b; ... unsigned long long r = a*b; I By “default”, result of a*b has 32-bits; upper 32 bits are “lost” I Fix by casting one (or both) factors: unsigned long long r = (unsigned long long)a*b; I Can also use this to, e.g., truncate floats: float a = 3.14159265; float c = (int) a; printf("%fn", trunc(a)); printf("%fn", c); 22
  • 87. Explicit casts I Sometimes we need to convert explicitly I Example: multiply two (32-bit) integers: unsigned int a,b; ... unsigned long long r = a*b; I By “default”, result of a*b has 32-bits; upper 32 bits are “lost” I Fix by casting one (or both) factors: unsigned long long r = (unsigned long long)a*b; I Can also use this to, e.g., truncate floats: float a = 3.14159265; float c = (int) a; printf("%fn", trunc(a)); printf("%fn", c); I Careful, this does not generally work (undefined behavior ahead)! 22
  • 88. A small quiz What do you think this program will print? unsigned char x = 128; signed char y = x; printf("The value of y is %dn", y); 23
  • 89. A small quiz What do you think this program will print? unsigned char x = 128; signed char y = x; printf("The value of y is %dn", y); (Obviously, the answer is “unspecified behavior” – it’s C after all) 23
  • 90. Two’s complement I Can represent a signed integer as “sign + absolute value” I Disadvantage: zero has two representations (0 and -0) 24
  • 91. Two’s complement I Can represent a signed integer as “sign + absolute value” I Disadvantage: zero has two representations (0 and -0) I Other idea: flip all bits in a to obtain -a I This is known as “ones complement” I Still: zero has two representations 24
  • 92. Two’s complement I Can represent a signed integer as “sign + absolute value” I Disadvantage: zero has two representations (0 and -0) I Other idea: flip all bits in a to obtain -a I This is known as “ones complement” I Still: zero has two representations I Much more common: two’s complement I flip all bits in a I add 1 24
  • 93. Two’s complement I Can represent a signed integer as “sign + absolute value” I Disadvantage: zero has two representations (0 and -0) I Other idea: flip all bits in a to obtain -a I This is known as “ones complement” I Still: zero has two representations I Much more common: two’s complement I flip all bits in a I add 1 I Sanity test: a = -(-a) 24
  • 94. Two’s complement I Can represent a signed integer as “sign + absolute value” I Disadvantage: zero has two representations (0 and -0) I Other idea: flip all bits in a to obtain -a I This is known as “ones complement” I Still: zero has two representations I Much more common: two’s complement I flip all bits in a I add 1 I Sanity test: a = -(-a) I Range of k-bit signed integer: {−2k−1 , . . . , 2k−1 − 1} I Example: signed (8-bit) byte: {−128, . . . , 127} 24
  • 95. Two’s complement I Can represent a signed integer as “sign + absolute value” I Disadvantage: zero has two representations (0 and -0) I Other idea: flip all bits in a to obtain -a I This is known as “ones complement” I Still: zero has two representations I Much more common: two’s complement I flip all bits in a I add 1 I Sanity test: a = -(-a) I Range of k-bit signed integer: {−2k−1 , . . . , 2k−1 − 1} I Example: signed (8-bit) byte: {−128, . . . , 127} I Can use the same hardware for signed and unsigned addition 24
  • 96. Endianess I Let’s consider the 32-bit integer 287454020 =0x11223344 I How would you put it into memory. . . ,like this?: | 11 | 22 | 33 | 44 | 0x0...0 0x0...1 0x0...2 0x0...3 25
  • 97. Endianess I Let’s consider the 32-bit integer 287454020 =0x11223344 I How would you put it into memory. . . ,like this?: | 11 | 22 | 33 | 44 | 0x0...0 0x0...1 0x0...2 0x0...3 I How about like this? | 44 | 33 | 22 | 11 | 0x0...0 0x0...1 0x0...2 0x0...3 25
  • 98. Endianess I Let’s consider the 32-bit integer 287454020 =0x11223344 I How would you put it into memory. . . ,like this?: | 11 | 22 | 33 | 44 | 0x0...0 0x0...1 0x0...2 0x0...3 I How about like this? | 44 | 33 | 22 | 11 | 0x0...0 0x0...1 0x0...2 0x0...3 I A quick poll: What do you find more intuitive? 25
  • 99. Endianess, let’s try again I Take 4-byte integer a = P3 i=0 ai28i I The ai are the bytes of a 26
  • 100. Endianess, let’s try again I Take 4-byte integer a = P3 i=0 ai28i I The ai are the bytes of a I How would you put it into memory. . . ,like this?: | a0 | a1 | a2 | a3 | 0x0...0 0x0...1 0x0...2 0x0...3 26
  • 101. Endianess, let’s try again I Take 4-byte integer a = P3 i=0 ai28i I The ai are the bytes of a I How would you put it into memory. . . ,like this?: | a0 | a1 | a2 | a3 | 0x0...0 0x0...1 0x0...2 0x0...3 I Or would you rather have this? | a3 | a2 | a1 | a0 | 0x0...0 0x0...1 0x0...2 0x0...3 I Again a quick poll: What do you find more intuitive? 26
  • 102. Endianess, the conclusion I Least significant bytes at low addresses: little endian I Most significant bytes at low addresses: big endian 27
  • 103. Endianess, the conclusion I Least significant bytes at low addresses: little endian I Most significant bytes at low addresses: big endian I This is short for “little/big endian byte first” 27
  • 104. Endianess, the conclusion I Least significant bytes at low addresses: little endian I Most significant bytes at low addresses: big endian I This is short for “little/big endian byte first” I Most CPUs today use little endian 27
  • 105. Endianess, the conclusion I Least significant bytes at low addresses: little endian I Most significant bytes at low addresses: big endian I This is short for “little/big endian byte first” I Most CPUs today use little endian I Examples for big-endian CPUs: I PowerPC I UltraSPARC I ARM can switch endianess (is “bi-endian”) 27
  • 106. Endianess, the conclusion I Least significant bytes at low addresses: little endian I Most significant bytes at low addresses: big endian I This is short for “little/big endian byte first” I Most CPUs today use little endian I Examples for big-endian CPUs: I PowerPC I UltraSPARC I ARM can switch endianess (is “bi-endian”) I The problem with little-endian intuition is just that we write left-to-right (but use Arabic numbers) 27