Loading…

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

Like this presentation? Why not share!

C Tutorials

on

  • 2,665 views

 

Statistics

Views

Total Views
2,665
Views on SlideShare
2,665
Embed Views
0

Actions

Likes
1
Downloads
59
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • In my first summer internship my task was to convert a numeric program from FORTRAN to C, knowing nothing about either language – I only knew Pascal and BASIC. The result was that I sort of learned C, but in a sort of voodoo way, never really being sure what was going on, but getting some things to work, at least some of the time. Since then I’ve gradually worked out how things really work, and really make sense, and become proficient…. But this was a slow process. Today I hope to help shortcut some of that process. How this is going to proceed. I have limited time here so I will go kind of fast. I may mention some things in passing that I won’t fully explain. However, afterwards you will have the .ppt file and the notes. All the stuff I mention in passing is in the slides and the notes. This way, even if you don’t “get” all the details the first time, you can go back and refer to the slides and notes.
  • Gcc compiler options: -Wall tells the compiler to generate ALL “warnings”. These warnings will often identify stupid mistakes. -g tells the compiler to generate debugging information If you don’t supply a –o option to set an output filename, it will create an executable called “a.out” ./? To run a program in the current directory use ./program ‘.’ is the current directory. Otherwise the shell will only run program that are in your “PATH” variable, which is usually “standard” paths like /bin and /usr/bin What if it doesn’t work? If your program compiles but doesn’t work, there are many debugging tools at your disposal. “ strace” traces all system calls made by your program. This will tell you when you call into the OS, e.g. open(), read(), write(), etc. “ ltrace” traces all standard library calls made by your program. “ gdb” is a source-level debugger. With this tool you can analyse core dumps, and step through your program line by line as it runs. “ valgrind” is an x86 virtual machine that is useful for catching memory errors such as use of freed memory, bad pointers, etc. Another common way to trace a program is “printf debugging”, where you put printf statements throughout your code to see what’s going on.
  • What do the <> mean? Include directives use (brackets) to indicate that the compiler should look in a “standard” place, such as /usr/include/… They use “x.h” (double quotes) to indicate that it should look first in the current directory. Can your program have more than one .c file? A ‘.c’ file is called a “module”. Many programs are composed of several ‘.c’ files and libraries that are ‘linked’ together during the compile process.
  • Why preprocess? Having two phases to a process with different properties often helps to make a very flexible yet robust system. The underlying compiler is very simple and its behavior is easily understood. But because of that simplicity it can be hard to use. The preprocessor lets you “customize” the way your code “looks and feels” and avoid redunancy using a few simple facilities, without increasing the complexity of the underlying compiler. Macros can be used to make your code look clean and easy to understand. They can also be used for “evil”. What are continued lines? Continued lines are lines that need to be very long. When the preprocessor encounters a ‘\\’ (that is not inside a string), it will ignore the next character. If that character is a newline, then this “eats” the newline and joins the two lines into one. This is often used when defining macros with #define, because a macro must be all on one line. Be careful! If you have a space after a ‘\\’, it will eat the space and will NOT join the lines.
  • Include directives use (brackets) to indicate that the compiler should look in a “standard” place, such as /usr/include/… They use “x.h” (double quotes) to indicate that it should look first in the current directory.
  • What’s 72? ASCII is the coding scheme that maps bytes to characters. Type ‘man ascii’ at the shell to get a listing of the mapping. Not always… Types such as int vary in size depending on the architecture. On a 32 bit or 64 bit platform, ints are 4 bytes (32 bits). On an 8 or 16 bit platform, an int is 2 bytes (16 bits). To be safe, it’s best to specify types exactly e.g. int32_t, int16_t., int8_t. Defined in #include Signed? Signedness is a common source of problems in C. It’s always clearest to specify signness explicitly (and size) using the [u]int[8,16,32]_t types. Signedness can introduce surprises when casting types because of “sign extension”. For example, char is signed (although it’s normally used as unsigned 0-255). If you cast char to int, it will be a signed integer and the sign will be extended. If you then cast that to unsigned, it turns a small negative number into an enormous int value (high bit set). That is, (uint32_t)(int)(char)(128) == 4294967168 (not 128!) One easy solution to this is to always use uint8_t rather than char for byte streams.
  • Symbol table: The “Symbol Table” is the mapping from symbol to address. You can extract this from a compiled program (if debugging is enabled aqt compile time with –g) using the nm utility. Declaration and definition mean slightly different things in C. Declaration is mapping a name to a type, but not to a memory location. Definition maps a name and type to a memory location (which could be the body of a function). Definitions can assign an initial value. Declarations can point to something that never actually gets defined, or is defined in a library external to your program. What names are legal? Symbols in C (function and variable names) must start with an alphabetic character or ‘_’. They may contain numeric digits as well but cannot start with a digit. Common naming strategies are ‘CamelCase’, ‘lowerUpper’, and ‘under_bar’. Some people use complicated schemes but I prefer under_bar. extern, static, and const are modifiers of variable declarations. ‘extern’ means the variable being declared is defined elsewhere in some other program module (.c file). ‘static’ means that the variable cannot be accessed outside of the current scope. ‘const’ means the variable’s value cannot be changed (directly).
  • Returns nothing.. “void” is used to denote that nothing is returned, or to indicate untyped data. What if you define a new “char b”? What happens to the old one? If you defile a variable in your local scope that has the same name as a variable in an enclosing scope, the local version takes precedence. Are definitions allowed in the middle of a block? Older versions of C and some derivatives like NesC only allow definitions at the beginning of a scope; d would be illegal. Modern C (gcc 3.0+) allows this, which can make the code easier to read as the definitions are closer to use. However, note that definitions are not always allowed in certain places, such as after a jump target, e.g. { goto target; /* … */ target: int z; /* error */ /* … */ }
  • Despite being syntactically correct, if (x=y) is generally to be avoided because it is confusing. Doing complicated things with ++ and – are also not generally recommended. It is almost never important from an efficiency perspective and it just makes the code harder to understand.
  • Need braces? Braces are only needed if your block contains a single statement. However, braces, even when not necessary, can increase the clarity of your code, especially when nesting ‘if’ statements. X ? Y : Z.. There is a shortcut to if then else that can be done in an expression: the ‘?:’ syntax. (cond ? alt1 : alt2) is an expression returning alt1 if cond, and alt2 if not cond. There is no truly equivalent form to this construct. Short circuit evaluation. Sometimes when evaluating an if conditional, the result is known before the evaluation completes. For example, consider if (X || Y || Z). If X is true, then Y and Z don’t matter. C will not evaluate Y and Z if X is true. This only matters in cases where there are side effects of evaluating the rest of the expression, for example if it contains function calls, or invalid operations. The converse evaluation rules apply to &&; if (X && Y && Z) will only evaluate Y and Z if X is true. Note, this is why it’s OK to write: char *x; if ((x != NULL) && (*x == ‘Z’)) … because if x is NULL (and therefore an invlalid pointer) the second clause dereferencing x will not be run. Note that this means that it’s not always possible to replace an if() with a function call because all args to a function are evaluated before calling the function. Detecting brace errors . Many text editors such as emacs include features that help you detect brace errors. Emacs will remind you of the opening brace whenever you close a brace, and will auto-indent your code according to braces – wrong indenting gives you a hint that something is wrong. In emacs, will autoindent the line you are on.. Keeping your code properly indented is a good way to detect a lot of different kinds of error.
  • Variables declared ‘static’ inside of a function are allocated only once, rather than being allocated in the stack frame when the function is called. Static variables therefore retain their value from one call to the next, but may cause problems for recursive calls, or if two threads call the function concurrently. Java users: Just to be aware: recursion doesn’t always work in Java because objects are always passed by reference (i.e they are not copied). In java, this recursive function would only work if it was a primitive type like int being passed.
  • What about other languages? Some languages such as scheme implement optimized “tail recursion”, which reuses the current stack frame when the recursive call constitutes the last use of the current stack frame. In these cases, recursion costs no more than iteration.
  • Java In java, primitive types like int are passed by value, but all objects are passed by reference (basically by pointer, but without any explicit awareness of the address). In C++, things operate like in C except that there is a syntax to pass arguments by reference, declaring the argument to be a “reference” type. For example, int f(int &x); C does not support this; you have to do it “manually” with pointers. But it’s not really fundamentally different anyway.
  • How should pointers be initialized? Always initialize pointers to 0 (or NULL). NULL is never a valid pointer value, but it is known to be invalid and means “no pointer set”.
  • Packing? The layout of a structure in memory is actually architecture-dependent. The order of the fields will always be preserved; however, some architectures will pad fields so that they are word-aligned. Word alignment is important for some processor architectures because they do not support unaligned memory access, e.g. accessing a 4 byte int that is not aligned to a 4 byte boundary. It is also usually more efficient to process values on word boundaries. So in general, structs are architecture-specific. For instance, on x86 platforms, all fields are word-aligned, so struct { uint16_t x; uint32_t y; }; Will have a hidden two bytes of padding between x and y. However, on an atmel atmega128 this padding will not be present. One way to address this is to predict it, and explicity add the padding fields. Another solution is to add __attribute__ ((“packed”)) after the structure declaration to force the compiler to generate code to handle unaligned accesses. This code will operate slower but this is often the easiest solution to dealing with cross-platform compatibility. The other common problem with cross-platform compatibility is endianness. X86 machines are little-endian; many modern processors such as the PXA (xscale) can select their endianness. The functions ntohl, htonl, etc. can be used to convert to a standard network byte order (which is big endian). =========== Why a zero length array? A zero length array is a typed “pointer target” that begins at the byte after the end of the struct. The zero length array takes up no space and does not change the result of sizeof(). This is very useful for referring to data following the struct in memory, for example if you have a packet header and a packet payload following the header: struct hdr { int src; int dst; uint8_t data[0]; }; If you have a buffer of data containing the packet and the payload: uint8_t buf[200]; int buf_len; /* cast the buffer to the header type (if it’s long enough!) */ if (buf_len >= sizeof(struct hdr)) { struct hdr *h = (struct hdr *)buf; /* now, h->data[0] is the first byte of the payload */ }
  • char x[] and char *x are subtlely different. First, what’s the same: both will evaluate to an address that is type char *. In both cases, x[n] == *(x+n) and x == &(x[0]). The difference lies in the fact that char x[10] allocates memory for that data and both &x and x always evaulates to the base address of the array, whereas char *x allocates only space for a pointer, and x simply evaulates to the value of that pointer. Thus, in the case of char x[10], &x == x by definition. However this is almost never true for char *x because x contains the address of the memory where the characters are stored, &(x[0]), and &x is the address of that pointer itself!
  • %m: When %m is included in a printf format string, it prints a string naming the most recent error. A global variable ‘errno’ always contains the most recent error value to have occurred, and the function strerr() converts that to a string. %m is shorthand for printf(“%s”, strerr(errno)) Emstar tips : Emstar includes glib which includes various useful functions and macros. One of these is g_new0(). g_new0(typename, count)  (typename *)calloc(count, sizeof(typename)); Emstar also provides elog() and elog_raw(), logging functions that integrate with emstar’s logging facilities. elog() works like printf, but takes one additional loglevel argument. You do not need to include a ‘\\n’ in elog() messages as it is implied. elog_raw() takes a buffer and a length and dumps the bytes out in a prettyprinted fashion. Emstar includes the useful “buf_t” library that manages automatically growing buffers. This is often the easiest way to allocate memory.
  • Reference counting is one approach to tracking allocated memory. However, there is no perfect scheme. The main problem with reference counting is that correctly handling cyclically linked structures is very difficult, and likely to result in mis-counting in one direction or the other. Garbage collection is another solution; this is used by Java and other languages. However, this often causes performance problems and requires a very strict control over typing in the language, which C can’t easily provide by its nature of having direct access to the low levels. Emstar uses a static reference model, where there is exactly one canonical reference to each object, and that reference will be automatically cleared when the object is destroyed. The only requirements are: the reference must be in stable allocation (static or dynamic, but you can’t move the memory) and you must always access the object via that single reference. We have found this scheme to be quite workable, as long as you are cognizant of these requirements.
  • What to do when malloc fails? Except in certain cases, it’s going to be very difficult to recover from this state because lots of functions will start failing and the correct way to handle these errors is not obvious. If you continue without your memory, it is likely that your program will get into some very seriously broken state. The main case where recovery IS possible is the case that the malloc() that failed is part of a single too-large transaction that can fail independently of the rest of the system. For example if the user asks your program to allocate a zillion bytes and you can’t, just abort that request and ask for the next request. But if your system is just generally out of memory, there’s not much you can do. In emstar, processes automatically respawn and the system is generally designed to handle module failures, so a slow memory leak that causes you to exit and restart may allow a fairly graceful recovery. Memmove memmove() is written so that it will copy in the correct order to be able to shift data in a buffer, that is, if the source and destination buffers overlap. This is not true of the memcpy() function. Use pointers as implied in-use flags! One clever way to reduce the number of state variables is to avoid the use of “in use” flags when there is also a pointer that can be NULL. That is, if the item is not allocated (and therefore a NULL pointer), it’s not in use. If there’s a pointer there, then you assume that it is valid, initialized, etc. By avoiding this redundancy of state varaibles, you avoid the various cases where the two variables are out of sync.. Pointer there but not in use.. Or in use but no pointer.
  • The difference between a macro and a static inline . These two constructs have some properties in common: they must be included in any module that uses them, they can’t be linked to, and they are generally inlined into the program. The key difference between them is that the macro is processed by the preprocessor whereas the static inline conforms to all of the semantics of any other function. The reason it must be included in each file is that it’s declared static, and the reason it’s more efficient is that it is inlined by default. Static inlines cause their arguments to be evaluated before being applied whereas macros are a pure text substitution with no evaluation. Arguments to static inlines are type-checked whereas arguments to macros can be any type and are substituted as raw text. Static inline invocations can invoke statements AND return a value, whereas a macro can EITHER return a value (if it’s an expression) OR invoke statements. A good example of the differences is the implementation of sqr(): #define SQR(a) ((a)*(a)) static inline float sqr(float a) { return a*a; } SQR will evaulate its argument twice, i.e. SQR(x++) will compute x*(x+1) sqr will evaliate its argument once, i.e. sqr(x++) will compute x*x SQR can compute the square of any numberic quantity, whereas sqr can only handle floats. However, sqr will return a more helpful error message if you attempt to square a struct. Since macros can do text manipulation, one nice way to use them is to use macros to generate static inlines. More on C constants . Check K&R for the details on the modifiers that allow you to correctly type constants. For example, 45L is a long int. Watch out for preceding 0’s, this means it will interpret it as octal. These issues most often crop up when you are trying to specify a constant that is larger than the default int type. Enums: For defining sets of integer values, enums are preferable. Enums enforce uniqueness within an enum and can automatically number your values sequentially. But macros are preferred for constants that define tuning parameters (that don’t want uniqueness or sequential values) and other types like floats and strings. Why expressions in parens : If you leave off the parens then your expression may introduce precedence surprises when combined in other expressions. For example: #define VALUE 2+45 int c = VALUE*3; What was meant was (2+45)*3 .. But what we got was 2+(45*3) It’s also a good idea to put parens around args of a macro, to avoid similar problems if one of the args is an expression: #define SQR(x) ((x)*(x)) Why use do{}while(0)? Multi-statement macros should be enclosed in do{}while(0) to avoid surprises when the macro is called as one statement in an if (): if (fail) DBG(“help\\n”); would cause only the first statement to be conditional.
  • Why return -1? Return values in C are usually 0 or positive for success and negative to signal failure. The errno codes (man errno) list some meanings that can be used for error return codes. When a pointer is returned, NULL signifies a failure. Errno can be set to indicate a reason for the failure (by just assigning to errno).
  • The recursive solution uses more stack space, but only a limited amount (max. 32 recursions), is easier to read and understand. Readability is valuable and not worth sacrificing to optimality unless there is a good reason to optimize. So I vote for recursion in this case.

C Tutorials C Tutorials Presentation Transcript

  • A Quick Introduction to C Programming
  • or , What I wish I had known about C during my first summer internship With extra info in the NOTES
  • High Level Question: Why is Software Hard?
    • Answer(s):
    • Complexity : Every conditional (“if”) doubles number of paths through your code, every bit of state doubles possible states
      • Solution: reuse code with functions, avoid duplicate state variables
    • Mutability : Software is easy to change.. Great for rapid fixes  .. And rapid breakage  .. always one character away from a bug
      • Solution: tidy, readable code, easy to understand by inspection.
      • Avoid code duplication; physically the same  logically the same
    • Flexibility : Programming problems can be solved in many different ways. Few hard constraints  plenty of “rope”.
      • Solution: discipline and idioms; don’t use all the rope
  • Writing and Running Programs #include <stdio.h> /* The simplest C Program */ int main(int argc, char **argv) { printf(“Hello Worldn”); return 0; } 1. Write text of program (source code) using an editor such as emacs, save as file e.g. my_program.c 2. Run the compiler to convert program from source to an “executable” or “binary”: $ gcc –Wall –g my_program.c –o my_program my_program $ gcc -Wall –g my_program.c –o my_program tt.c: In function `main': tt.c:6: parse error before `x' tt.c:5: parm types given both in parmlist and separately tt.c:8: `x' undeclared (first use in this function) tt.c:8: (Each undeclared identifier is reported only once tt.c:8: for each function it appears in.) tt.c:10: warning: control reaches end of non-void function tt.c: At top level: tt.c:11: parse error before `return' 3-N. Compiler gives errors and warnings; edit source file, fix it, and re-compile N. Run it and see if it works  $ ./my_program Hello World $ ▌ -Wall –g ? . / ? What if it doesn’t work?
  • C Syntax and Hello World #include <stdio.h> /* The simplest C Program */ int main(int argc, char **argv) { printf(“Hello Worldn”); return 0; } The main() function is always where your program starts running. #include inserts another file. “.h” files are called “header” files. They contain stuff needed to interface to libraries and code in other “.c” files. This is a comment. The compiler ignores this. Blocks of code (“lexical scopes”) are marked by { … } Print out a message. ‘n’ means “new line”. Return ‘0’ from this function What do the < > mean? Can your program have more than one .c file?
  • A Quick Digression About the Compiler #include <stdio.h> /* The simplest C Program */ int main(int argc, char **argv) { printf(“Hello Worldn”); return 0; } my_program __extension__ typedef unsigned long long int __dev_t; __extension__ typedef unsigned int __uid_t; __extension__ typedef unsigned int __gid_t; __extension__ typedef unsigned long int __ino_t; __extension__ typedef unsigned long long int __ino64_t; __extension__ typedef unsigned int __nlink_t; __extension__ typedef long int __off_t; __extension__ typedef long long int __off64_t; extern void flockfile (FILE *__stream) ; extern int ftrylockfile (FILE *__stream) ; extern void funlockfile (FILE *__stream) ; int main(int argc, char **argv) { printf(“Hello Worldn”); return 0; } Compilation occurs in two steps: “ Preprocessing” and “Compiling”
    • In Preprocessing, source code is “expanded” into a larger form that is simpler for the compiler to understand. Any line that starts with ‘#’ is a line that is interpreted by the Preprocessor.
    • Include files are “pasted in” (#include)
    • Macros are “expanded” (#define)
    • Comments are stripped out ( /* */ , // )
    • Continued lines are joined ( )
    Preprocess Compile The compiler then converts the resulting text into binary code the CPU can run directly. ? Why ?
  • OK, We’re Back.. What is a Function? #include <stdio.h> /* The simplest C Program */ int main(int argc, char **argv) { printf(“Hello Worldn”); return 0; } Function Arguments Return type, or void Calling a Function: “printf()” is just another function, like main(). It’s defined for you in a “library”, a collection of functions you can call from your program. A Function is a series of instructions to run. You pass Arguments to a function and it returns a Value . “ main()” is a Function. It’s only special because it always gets called first when you run your program. Returning a value
  • What is “Memory”? Memory is like a big table of numbered slots where bytes can be stored. The number of a slot is its Address. One byte Value can be stored in each slot. Some “logical” data values span more than one slot, like the character string “Hellon” 72? A Type names a logical meaning to a span of memory. Some simple types are: char char [10] int float int64_t a single character (1 slot) an array of 10 characters signed 4 byte integer 4 byte floating point signed 8 byte integer not always… Signed?… Addr Value 0 1 2 3 4 ‘ H’ (72) 5 ‘ e’ (101) 6 ‘ l’ (108) 7 ‘ l’ (108) 8 ‘ o’ (111) 9 ‘ n’ (10) 10 ‘ 0’ (0) 11 12
  • What is a Variable? char x; char y=‘e’; A Variable names a place in memory where you store a Value of a certain Type . You first Define a variable by giving it a name and specifying the type, and optionally an initial value declare vs define? The compiler puts them somewhere in memory. symbol table? Symbol Addr Value 0 1 2 3 x 4 ? y 5 ‘ e’ (101) 6 7 8 9 10 11 12 Type is single character (char) extern? static? const? Name What names are legal? Initial value Initial value of x is undefined
  • Multi-byte Variables char x; char y=‘e’; int z = 0x01020304; Different types consume different amounts of memory. Most architectures store data on “word boundaries”, or even multiples of the size of a primitive data type (int, char) 0x means the constant is written in hex An int consumes 4 bytes padding Symbol Addr Value 0 1 2 3 x 4 ? y 5 ‘ e’ (101) 6 7 z 8 4 9 3 10 2 11 1 12
  • Lexical Scoping Every Variable is Defined within some scope. A Variable cannot be referenced by name (a.k.a. Symbol ) from outside of that scope. The scope of Function Arguments is the complete body of the function. void p(char x) { /* p,x */ char y; /* p,x,y */ char z; /* p,x,y,z */ } /* p */ char z; /* p,z */ void q(char a) { char b; /* p,z,q,a,b */ { char c; /* p,z,q,a,b,c */ } char d; /* p,z,q,a,b,d (not c) */ } /* p,z,q */ (Returns nothing) The scope of Variables defined inside a function starts at the definition and ends at the closing brace of the containing block Lexical scopes are defined with curly braces { }. The scope of Variables defined outside a function starts at the definition and ends at the end of the file. Called “ Global ” Vars. legal? char b?
  • Expressions and Evaluation Expressions combine Values using Operators, according to precedence. 1 + 2 * 2  1 + 4  5 (1 + 2) * 2  3 * 2  6 Symbols are evaluated to their Values before being combined. int x=1; int y=2; x + y * y  x + 2 * 2  x + 4  1 + 4  5 Comparison operators are used to compare values. In C, 0 means “false”, and any other value means “true”. int x=4; (x < 5)  (4 < 5)  <true> (x < 4)  (4 < 4)  0 ((x < 5) || (x < 4))  (<true> || (x < 4))  <true> Not evaluated because first clause was true
  • Comparison and Mathematical Operators == equal to < less than <= less than or equal > greater than >= greater than or equal != not equal && logical and || logical or ! logical not
    • + plus
    • minus
    • * mult
    • / divide
    • % modulo
    The rules of precedence are clearly defined but often difficult to remember or non-intuitive. When in doubt, add parentheses to make it explicit. For oft-confused cases, the compiler will give you a warning “Suggest parens around …” – do it!
    • Beware division:
    • If second argument is integer, the
    • result will be integer (rounded):
    • 5 / 10  0 whereas 5 / 10.0  0.5
    • Division by 0 will cause a FPE
    & bitwise and | bitwise or ^ bitwise xor ~ bitwise not << shift left >> shift right Don’t confuse & and &&.. 1 & 2  0 whereas 1 && 2  <true>
  • Assignment Operators x = y assign y to x x++ post-increment x ++x pre-increment x x-- post-decrement x --x pre-decrement x Note the difference between ++x and x++: Don’t confuse = and ==! The compiler will warn “suggest parens”. int x=5; int y; y = ++x; /* x == 6, y == 6 */ int x=5; int y; y = x++; /* x == 6, y == 5 */ int x=5; if (x=6) /* always true */ { /* x is now 6 */ } /* ... */ int x=5; if (x==6) /* false */ { /* ... */ } /* x is still 5 */ x += y assign (x+y) to x x -= y assign (x-y) to x x *= y assign (x*y) to x x /= y assign (x/y) to x x %= y assign (x%y) to x recommendation
  • A More Complex Program: pow #include <stdio.h> #include <inttypes.h> float pow(float x, uint32_t exp) { /* base case */ if (exp == 0) { return 1.0; } /* “recursive” case */ return x*pow(x, exp – 1); } int main(int argc, char **argv) { float p; p = pow(10.0, 5); printf(“p = %fn”, p); return 0; } Challenge: write pow() so it requires log(exp) iterations
    • Tracing “pow()”:
    • What does pow(5,0) do?
    • What about pow(5,1)?
    • “ Induction”
    “ if” statement /* if evaluated expression is not 0 */ if ( expression ) { /* then execute this block */ } else { /* otherwise execute this block */ } Need braces? detecting brace errors Short-circuit eval? X ? Y : Z
  • The “Stack” Recall lexical scoping. If a variable is valid “within the scope of a function”, what happens when you call that function recursively? Is there more than one “exp”? #include <stdio.h> #include <inttypes.h> float pow(float x, uint32_t exp) { /* base case */ if (exp == 0) { return 1.0; } /* “recursive” case */ return x*pow(x, exp – 1); } int main(int argc, char **argv) { float p; p = pow(5.0, 1); printf(“p = %fn”, p); return 0; } Yes. Each function call allocates a “stack frame” where Variables within that function’s scope will reside. Return 1.0 Return 5.0 static Java? float x 5.0 uint32_t exp 1 float x 5.0 uint32_t exp 0 int argc 1 char **argv 0x2342 float p undefined int argc 1 char **argv 0x2342 float p 5.0 Grows
  • Iterative pow(): the “while” loop Problem: “recursion” eats stack space (in C). Each loop must allocate space for arguments and local variables, because each new call creates a new “scope”. float pow(float x, uint exp) { int i=0; float result=1.0; while (i < exp) { result = result * x; i++; } return result; } int main(int argc, char **argv) { float p; p = pow(10.0, 5); printf(“p = %fn”, p); return 0; } Other languages? Solution: “while” loop. loop: if ( condition ) { statements ; goto loop; } while ( condition ) { statements ; }
  • The “for” loop float pow(float x, uint exp) { float result=1.0; int i; for (i=0; (i < exp); i++) { result = result * x; } return result; } int main(int argc, char **argv) { float p; p = pow(10.0, 5); printf(“p = %fn”, p); return 0; } float pow(float x, uint exp) { float result=1.0; int i; i=0; while (i < exp) { result = result * x; i++; } return result; } int main(int argc, char **argv) { float p; p = pow(10.0, 5); printf(“p = %fn”, p); return 0; } The “for” loop is just shorthand for this “while” loop structure.
  • Referencing Data from Other Scopes So far, all of our examples all of the data values we have used have been defined in our lexical scope float pow(float x, uint exp) { float result=1.0; int i; for (i=0; (i < exp); i++) { result = result * x; } return result; } int main(int argc, char **argv) { float p; p = pow(10.0, 5); printf(“p = %fn”, p); return 0; } Nothing in this scope Uses any of these variables
  • Can a function modify its arguments? What if we wanted to implement a function pow_assign() that modified its argument, so that these are equivalent: float p = 2.0; /* p is 2.0 here */ pow_assign(p, 5); /* p is 32.0 here */ float p = 2.0; /* p is 2.0 here */ p = pow(p, 5); /* p is 32.0 here */ void pow_assign(float x, uint exp) { float result=1.0; int i; for (i=0; (i < exp); i++) { result = result * x; } x = result; } Would this work?
  • NO! In C, all arguments are passed as values But, what if the argument is the address of a variable? Java/C++? void pow_assign(float x, uint exp) { float result=1.0; int i; for (i=0; (i < exp); i++) { result = result * x; } x = result; } { float p=2.0; pow_assign(p, 5); } Remember the stack! float x 2.0 uint32_t exp 5 float result 1.0 float p 2.0 Grows float x 2.0 uint32_t exp 5 float result 32.0 float x 32.0 uint32_t exp 5 float result 32.0
  • Passing Addresses Recall our model for variables stored in memory What if we had a way to find out the address of a symbol, and a way to reference that memory location by address? address_of(y) == 5 memory_at[5] == 101 void f(address_of_char p) { memory_at[p] = memory_at[p] - 32; } char y = 101; /* y is 101 */ f(address_of(y)); /* i.e. f(5) */ /* y is now 101-32 = 69 */ Symbol Addr Value 0 1 2 3 char x 4 ‘ H’ (72) char y 5 ‘ e’ (101) 6 7 8 9 10 11 12
  • “ Pointers”
    • Pointers are used in C for many other purposes:
    • Passing large objects without copying them
    • Accessing dynamically allocated memory
    • Referring to functions
    This is exactly how “pointers” work. “ address of” or reference operator: & “ memory_at” or dereference operator: * void f(char * p) { *p = *p - 32; } char y = 101; /* y is 101 */ f(&y); /* i.e. f(5) */ /* y is now 101-32 = 69 */ void f(address_of_char p) { memory_at[p] = memory_at[p] - 32; } char y = 101; /* y is 101 */ f(address_of(y)); /* i.e. f(5) */ /* y is now 101-32 = 69 */ A “pointer type”: pointer to char
  • Pointer Validity A Valid pointer is one that points to memory that your program controls. Using invalid pointers will cause non-deterministic behavior, and will often cause Linux to kill your process (SEGV or Segmentation Fault).
    • There are two general causes for these errors:
    • Program errors that set the pointer value to a strange number
    • Use of a pointer that was at one time valid, but later became invalid
    How should pointers be initialized? char * get_pointer() { char x=0; return &x; } { char * ptr = get_pointer(); *ptr = 12; /* valid? */ } Will ptr be valid or invalid?
  • Answer: Invalid! A pointer to a variable allocated on the stack becomes invalid when that variable goes out of scope and the stack frame is “popped”. The pointer will point to an area of the memory that may later get reused and rewritten. char * get_pointer() { char x=0; return &x; } { char * ptr = get_pointer(); *ptr = 12; /* valid? */ other_function(); } But now, ptr points to a location that’s no longer in use, and will be reused the next time a function is called! Return 101 100 char * ptr ? Grows 101 char x 0 100 char * ptr 101 101 char x 0 101 char x 12 101 int average 456603
  • More on Types We’ve seen a few types at this point: char, int, float, char *
    • Types are important because:
    • They allow your program to impose logical structure on memory
    • They help the compiler tell when you’re making a mistake
    • In the next slides we will discuss:
    • How to create logical layouts of different types (structs)
    • How to use arrays
    • How to parse C type names (there is a logic to it!)
    • How to create new types using typedef
  • Structures struct: a way to compose existing types into a structure #include <sys/time.h> /* declare the struct */ struct my_struct { int counter; float average; struct timeval timestamp; uint in_use:1; uint8_t data[0]; }; /* define an instance of my_struct */ struct my_struct x = { in_use: 1, timestamp: { tv_sec: 200 } }; x.counter = 1; x.average = sum / (float)(x.counter); struct my_struct * ptr = &x; ptr->counter = 2; (*ptr).counter = 3; /* equiv. */ struct timeval is defined in this header structs can contain other structs fields can specify specific bit widths A newly-defined structure is initialized using this syntax. All unset fields are 0. structs define a layout of typed fields Fields are accessed using ‘.’ notation. A pointer to a struct. Fields are accessed using ‘->’ notation, or (*ptr).counter Packing? Why?
  • Arrays Arrays in C are composed of a particular type, laid out in memory in a repeating pattern. Array elements are accessed by stepping forward in memory from the base of the array by a multiple of the element size. /* define an array of 10 chars */ char x[5] = {‘t’,’e’,’s’,’t’,’0’}; /* accessing element 0 */ x[0] = ‘T’; /* pointer arithmetic to get elt 3 */ char elt3 = *(x+3); /* x[3] */ /* x[0] evaluates to the first element; * x evaluates to the address of the * first element, or &(x[0]) */ /* 0-indexed for loop idiom */ #define COUNT 10 char y[COUNT]; int i; for (i=0; i<COUNT; i++) { /* process y[i] */ printf(“%cn”, y[i]); } Brackets specify the count of elements. Initial values optionally set in braces. Arrays in C are 0-indexed (here, 0..9) x[3] == *(x+3) == ‘t’ ( NOT ‘s’!) What’s the difference between char x[] and char *x? For loop that iterates from 0 to COUNT-1. Memorize it! Symbol Addr Value char x [0] 100 ‘ t’ char x [1] 101 ‘ e’ char x [2] 102 ‘ s’ char x [3] 103 ‘ t’ char x [4] 104 ‘ 0’
  • How to Parse and Define C Types At this point we have seen a few basic types, arrays, pointer types, and structures. So far we’ve glossed over how types are named. int x; /* int; */ typedef int T; int *x; /* pointer to int; */ typedef int *T; int x[10]; /* array of ints; */ typedef int T[10]; int *x[10]; /* array of pointers to int; */ typedef int *T[10]; int (*x)[10]; /* pointer to array of ints; */ typedef int (*T)[10]; C type names are parsed by starting at the type name and working outwards according to the rules of precedence: Arrays are the primary source of confusion. When in doubt, use extra parens to clarify the expression. typedef defines a new type int (*x)[10]; x is a pointer to an array of int int *x[10]; x is an array of pointers to int
  • Function Types The other confusing form is the function type. For example, qsort: (a sort function in the standard library) void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *)); For more details: $ man qsort /* function matching this type: */ int cmp_function(const void *x, const void *y); /* typedef defining this type: */ typedef int (*cmp_type) (const void *, const void *); /* rewrite qsort prototype using our typedef */ void qsort(void *base, size_t nmemb, size_t size, cmp_type compar); The last argument is a comparison function const means the function is not allowed to modify memory via this pointer. void * is a pointer to memory of unknown type. size_t is an unsigned int
  • Dynamic Memory Allocation So far all of our examples have allocated variables statically by defining them in our program. This allocates them in the stack. But, what if we want to allocate variables based on user input or other dynamic inputs, at run-time? This requires dynamic allocation. int * alloc_ints(size_t requested_count) { int * big_array; big_array = (int *)calloc(requested_count, sizeof(int)); if (big_array == NULL) { printf(“can’t allocate %d ints: %mn”, requested_count); return NULL; } /* now big_array[0] .. big_array[requested_count-1] are * valid and zeroed. */ return big_array; } calloc() allocates memory for N elements of size k Returns NULL if can’t alloc For details: $ man calloc %m ? Emstar tips sizeof() reports the size of a type in bytes It’s OK to return this pointer. It will remain valid until it is freed with free()
  • Caveats with Dynamic Memory Dynamic memory is useful. But it has several caveats: Whereas the compiler enforces that reclaimed stack space can no longer be reached, it is easy to accidentally keep a pointer to dynamic memory that has been freed. Whenever you free memory you must be certain that you will not try to use it again. It is safest to erase any pointers to it. Whereas the stack is automatically reclaimed, dynamic allocations must be tracked and free()’d when they are no longer needed. With every allocation, be sure to plan how that memory will get freed. Losing track of memory is called a “memory leak”. Reference counting Because dynamic memory always uses pointers, there is generally no way for the compiler to statically verify usage of dynamic memory. This means that errors that are detectable with static allocation are not with dynamic
  • Some Common Errors and Hints /* allocating a struct with malloc() */ struct my_struct *s = NULL; s = (struct my_struct *)malloc(sizeof(*s)); /* NOT sizeof(s)!! */ if (s == NULL) { printf(stderr, “no memory!”); exit(1); } memset(s, 0, sizeof(*s)); /* another way to initialize an alloc’d structure: */ struct my_struct init = { counter: 1, average: 2.5, in_use: 1 }; /* memmove(dst, src, size) (note, arg order like assignment) */ memmove(s, &init, sizeof(init)); /* when you are done with it, free it! */ free(s); s = NULL; sizeof() can take a variable reference in place of a type name. This gurantees the right allocation, but don’t accidentally allocate the sizeof() the pointer instead of the object ! malloc() does not zero the memory, so you should memset() it to 0. Always check for NULL.. Even if you just exit(1). malloc() allocates n bytes Why? memmove is preferred because it is safe for shifting buffers Why? Use pointers as implied in-use flags!
  • Macros Macros can be a useful way to customize your interface to C and make your code easier to read and less redundant. However, when possible, use a static inline function instead. Macros and static inline functions must be included in any file that uses them, usually via a header file. Common uses for macros: What’s the difference between a macro and a static inline function? /* Macros are used to define constants */ #define FUDGE_FACTOR 45.6 #define MSEC_PER_SEC 1000 #define INPUT_FILENAME “my_input_file” /* Macros are used to do constant arithmetic */ #define TIMER_VAL (2*MSEC_PER_SEC) /* Macros are used to capture information from the compiler */ #define DBG(args...) do { fprintf(stderr, “%s:%s:%d: “, __FUNCTION__, __FILE__, __LINENO__); fprintf(stderr, args...); } while (0) /* ex. DBG(“error: %d”, errno); */ Float constants must have a decimal point, else they are type int Put expressions in parens. Why? Multi-line macros need args… grabs rest of args Enclose multi-statement macros in do{}while(0) Why? More on C constants? enums
  • Macros and Readability Sometimes macros can be used to improve code readability… but make sure what’s going on is obvious. /* often best to define these types of macro right where they are used */ #define CASE(str) if (strncasecmp(arg, str, strlen(str)) == 0) void parse_command(char *arg) { CASE(“help”) { /* print help */ } CASE(“quit”) { exit(0); } } /* and un-define them after use */ #undef CASE Macros can be used to generate static inline functions. This is like a C version of a C++ template. See emstar/libmisc/include/queue.h for an example of this technique. void parse_command(char *arg) { if (strncasecmp(arg, “help”, strlen(“help”)) { /* print help */ } if (strncasecmp(arg, “quit”, strlen(“quit”)) { exit(0); } }
  • Using “goto” Some schools of thought frown upon goto, but goto has its place. A good philosophy is, always write code in the most expressive and clear way possible. If that involves using goto, then goto is not bad. An example is jumping to an error case from inside complex logic. The alternative is deeply nested and confusing “if” statements, which are hard to read, maintain, and verify. Often additional logic and state variables must be added, just to avoid goto. goto try_again; goto fail;
  • Unrolling a Failed Initialization using goto state_t *initialize() { /* allocate state struct */ state_t *s = g_new0(state_t, 1); if (s) { /* allocate sub-structure */ s->sub = g_new0(sub_t, 1); if (s->sub) { /* open file */ s->sub->fd = open(“/dev/null”, O_RDONLY); if (s->sub->fd >= 0) { /* success! */ } else { free(s->sub); free(s); s = NULL; } } else { /* failed! */ free(s); s = NULL; } } return s; } state_t *initialize() { /* allocate state struct */ state_t *s = g_new0(state_t, 1); if (s == NULL) goto free0; /* allocate sub-structure */ s->sub = g_new0(sub_t, 1); if (s->sub == NULL) goto free1; /* open file */ s->sub->fd = open(“/dev/null”, O_RDONLY); if (s->sub->fd < 0) goto free2; /* success! */ return s; free2: free(s->sub); free1: free(s); free0: return NULL; }
  • High Level Question: Why is Software Hard?
    • Answer(s):
    • Complexity : Every conditional (“if”) doubles number of paths through your code, every bit of state doubles possible states
      • Solution: reuse code paths, avoid duplicate state variables
    • Mutability : Software is easy to change.. Great for rapid fixes  .. And rapid breakage  .. always one character away from a bug
      • Solution: tidy, readable code, easy to understand by inspection.
      • Avoid code duplication; physically the same  logically the same
    • Flexibility : Programming problems can be solved in many different ways. Few hard constraints  plenty of “rope”.
      • Solution: discipline and idioms; don’t use all the rope
  • Addressing Complexity
    • Complexity : Every conditional (“if”) doubles number of paths through your code, every bit of state doubles possible states
      • Solution: reuse code paths, avoid duplicate state variables
    On receive_packet: if queue full, drop packet else push packet, call run_queue On transmit_complete: state=idle, call run_queue Run_queue: if state==idle && !queue empty pop packet off queue start transmit, state = busy reuse code paths On input, change our state as needed, and call Run_queue. In all cases, Run_queue handles taking the next step…
  • Addressing Complexity
    • Complexity : Every conditional (“if”) doubles number of paths through your code, every bit of state doubles possible states
      • Solution: reuse code paths, avoid duplicate state variables
    avoid duplicate state variables int transmit_busy; msg_t *packet_on_deck; int start_transmit(msg_t *packet) { if (transmit_busy) return -1; /* start transmit */ packet_on_deck = packet; transmit_busy = 1; /* ... */ return 0; } msg_t *packet_on_deck; int start_transmit(msg_t *packet) { if (packet_on_deck != NULL) return -1; /* start transmit */ packet_on_deck = packet; /* ... */ return 0; } Why return -1?
  • Addressing Mutability
    • Mutability : Software is easy to change.. Great for rapid fixes  .. And rapid breakage  .. always one character away from a bug
      • Solution: tidy, readable code, easy to understand by inspection.
      • Avoid code duplication; physically the same  logically the same
    Tidy code.. Indenting, good formatting, comments, meaningful variable and function names. Version control.. Learn how to use CVS Avoid duplication of anything that’s logically identical. struct pkt_hdr { int source; int dest; int length; }; struct pkt { int source; int dest; int length; uint8_t payload[100]; }; struct pkt_hdr { int source; int dest; int length; }; struct pkt { struct pkt_hdr hdr; uint8_t payload[100]; }; Otherwise when one changes, you have to find and fix all the other places
  • Solutions to the pow() challenge question Which is better? Why? Recursive float pow(float x, uint exp) { float result; /* base case */ if (exp == 0) return 1.0; /* x^(2*a) == x^a * x^a */ result = pow(x, exp >> 1); result = result * result; /* x^(2*a+1) == x^(2*a) * x */ if (exp & 1) result = result * x; return result; } float pow(float x, uint exp) { float result = 1.0; int bit; for (bit = sizeof(exp)*8-1; bit >= 0; bit--) { result *= result; if (exp & (1 << bit)) result *= x; } return result; } Iterative