Design of Algorithms
C Review
C Review
Table of Contents
- Data types
- Function declarations
main
Function- Compilation
- Preprocessor directives
- Library functions
- Pointers
- Arrays
- Structs
- Dynamic Memory Allocation
- Header Files
- Import guards
- Makefiles
- Linking with external libraries
- Debug
- Function pointers
- Polymorphism
static
const
Data types
Integer
int
: 2 or 4 bytes (platform dependent)char
: 1 byteshort
: 2 bytes-
long
: 4 bytes - corresponding
unsigned
types for non-negative numbers - e.g.
int
may store -32768 to 32767unsigned int
stores integers from 0 to 65535
Floating point numbers
float
double
char
s and strings
char
stores a single ASCII character- Strings: arrays of chars terminated by a null byte (
'\0'
)- e.g. “Hello world!” is stored as the array of characters:
['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\0']
- e.g. “Hello world!” is stored as the array of characters:
Boolean values
- no built-in boolean type, integers can be used
- non-zero values: true
-
0: false
- C99 with
stdbool.h
providesbool
data type withtrue
andfalse
Function declarations
- place function prototype declarations at top of file as good practice so you don’t need to worry about ordering of functions in file
// prototype (at top of file)
return_type function_name(arg_type arg_name);
// function implementation
return_type function_name(arg_type arg_name) {
return ret_value;
}
main
Function
- when a C program is run from command line,
main
function is executed argc
: argument counter; number of arguments suppliedargv
: argument vector; array of argument strings- return value: indicates success (0) or failure (non-zero) of program
Program to print the number of arguments and what they are:
int main(int argc, char **argv) {
int i;
printf("Number of arguments: %d\n", argc);
for (i = 0; i < argc; i++) {
printf("%s\n", argv[i]);
}
return 0;
}
Compilation
To compile hello.c
$ gcc -Wall -pedantic -o hello hello.c
-Wall
: warnings all; highest level compiler warnings turned on-pedantic
: enables another set of compiler errors-o <file_name>
: output program should be called<file_name>
<source>.c
: source file- for debugging, compile with
-g
to access source code/variable names/function names from inside debuggers e.g.gdb
,lldb
Preprocessor directives
- keywords that start with
#
e.g.#define, #include
- these are evaluated prior to compilation by the preprocessor, which effectively copy and pastes the definition/included function definition into the code
Library functions
Standard library header files imported using #include
preprocessor directive
#include <assert.h> // contains assert, frequently used to verify malloc
#include <math.h> // math functions e.g. cos, sin, log, sqrt, ceil, floor
#include <stdio.h> // input/output e.g. printf, scanf
#include <stdlib.h> // contains NULL, memory allocation e.g. malloc, free
int main(int argc, char **argv) {
/* ... */
return 0;
}
Pointers
- pointers are memory addresses
- we can have types which hold memory addresses to integers and floats using an asterisk
int *my_ptr
: contains address of an intint **
: pointer to a pointer; address of an address to an integer&foo
: memory address/pointer tofoo
; “address of foo”*bar
: access data stored at pointerbar
; “data stored at bar”- pointer arithmetic: pointer type knows which data type it points to, and
therefore knows the size. If
int *my_ptr
is a pointer to the start of an array of integers, you can jump forward the size of anint
withmy_ptr+1
Arrays
- creating a static array:
int my_array[100];
to create an array with room for 100 integers my_array[7]
to access the 8th element of the array- arrays in C are simply pointers to the first element of the array, so:
my_array[10]
$\iff$*(my_array + 10)
&my_array[10]
$\iff$my_array + 10
-
explicit definition of static array:
int arr[] = {1, 2, 3, 4, 5};
- tip: always use pointer notation for data types (in function definitions etc.) i.e.
// preferred int get_length(int *array) { /* ... */ return length; } // not recommended int get_length(int array[]) { /* ... */ return length; }
Structs
- encapsulate multiple pieces of data e.g. student record
typedef struct student Student; struct student { char *first_name; char *last_name; int id; float mark; }
- here we created a struct
student
which can be referred to withstruct student
- syntactic sugar:
typedef
this toStudent
, such thatStudent
is an alias forstruct student
- an alternative that avoids the intermediate name is:
typedef struct { char *first_name; char *last_name; int id; float mark; } Student;
- this doesn’t allow you to reference the struct within the definition e.g. nodes
for a linked list/graph:
typedef struct node Node; struct node { int data; Node *next; }
Accessing fields
Student matthew;
// dot notation
matthew.student_number = 123456;
Student *james = malloc(sizeof(*james));
assert(james);
// arrow notation
james->student = 654321;
free(james);
james = NULL;
foo.bar
$\iff$(&foo)->bar
foo->bar
$\iff$(*foo).bar
Dynamic Memory Allocation
- variables declared inside a function are usually stored on the stack
- function’s local variables and function parameters exist in a stack frame
specific to the function
- stack frame only lasts as long as the function is running
- once the function returns the local variables/function parameters are de-allocated
- size of variables needs to be known at compile time
malloc
requests specific amount of memory on the heap which exists until we explicitlyfree
it- memory allocated at runtime, and may fail e.g. program already has used full allowance of memory OS has reserved for it
- use
assert
to check the pointer is not NULL i.e. has been successfully allocated malloc
returns a void pointer
void *malloc(size_t size) // size: size of memory block [bytes]
Example: allocating memory for an int
int *my_int = malloc(sizeof(*my_int)); // cast to (int *)
assert(my_int); // check pointer is not null, i.e. malloc succeeded
/* do stuff */
free(my_int); // free the memory
my_int = NULL; // ensure that we don't inadvertently access freed memory
Variable-sized array
- arrays are pointers to first element in the array, so you can use
malloc
to allocate a variable sized array. Forn
items you can allocate a block with enough space forn
adjacent items:int n = 10000; double *array = malloc(sizeof(*array) * n); /* magic happens here */ free(array); array = NULL;
Header Files
- modules are used to separate out code into related groups. Consists of:
module.h
: consists of a header file, containing:- info on how to use the module,
- function prototypes
- type definitions
module.c
: file containing implementations
#include "module.h"
is then used to access the definitions
Import guards
- C doesn’t allow you to declare things more than once
- good practice: use if guards to prevent a
.h
file being included more than once - define a macro per header file, and only declare anything if it hasn’t been defined yet
e.g. to write a hello world module
hello.h
:
// import guard
#ifndef HELLO_H
#define HELLO_H
// print "hello, {name}!" on a line
void hello(char *name);
#endif
hello.c
:
#include <stdio.h>
#include "hello.h"
// print "hello, {name}!" on a line
void hello(char *name) {
printf("Hello, %s!\n", name);
}
main.c
#include "hello.h"
int main(int argc, char **argv) {
char *name = "Barney";
hello(name);
return 0;
}
To compile a program with multiple `.c` files:
```console
$ gcc -o <executable name> <list of .c files>
For this example
$ gcc -o main main.c hello.c
Makefiles
make
keeps track of changes across various files, only compiles what needs to be
recompiled when something changes
- example Makefile for compiling C programs
# # # # # # #
# Sample Makefile for compiling a simple multi-module C program
#
# created for COMP20007 Design of Algorithms 2017
# by Matt Farrugia <matt.farrugia@unimelb.edu.au>
#
# Welcome to this sample Makefile. If you're new to make and makefiles, have a
# read through with the comments and follow their instructions.
# VARIABLES - change the values here to match your project setup
# specifying the C Compiler and Compiler Flags for make to use
CC = gcc
CFLAGS = -Wall
# exe name and a list of object files that make up the program
EXE = main-2
OBJ = main-2.o list.o stack.o queue.o
# RULES - these tell make when and how to recompile parts of the project
# the first rule runs by default when you run 'make' ('make rule' for others)
# in our case, we probably want to build the whole project by default, so we
# make our first rule have the executable as its target
# |
# v
$(EXE): $(OBJ) # <-- the target is followed by a list of prerequisites
$(CC) $(CFLAGS) -o $(EXE) $(OBJ)
# ^
# and a TAB character, then a shell command (or possibly multiple, 1 line each)
# (it's very important to use a TAB here because that's what make is expecting)
# the way it works is: if any of the prerequisites are missing or need to be
# recompiled, make will sort that out and then run the shell command to refresh
# this target too
# so our first rule says that the executable depends on all of the object files,
# and if any of the object files need to be updated (or created), we should do
# that and then link the executable using the command given
# okay here's another rule, this time to help make create object files
list.o: list.c list.h
$(CC) $(CFLAGS) -c list.c
# this time the target is list.o. its prerequisites are list.c and list.h, and
# the command (its 'recipe') is the command for compiling (but not linking)
# a .c file
# list.c and list.h don't get their own rules, so make will just check if the
# files of those names have been updated since list.o was last modified, and
# re-run the command if they have been changed.
# actually, we don't need to provide all that detail! make knows how to compile
# .c files into .o files, and it also knows that .o files depend on their .c
# files. so, it assumes these rules implicitly (unless we overwrite them as
# above).
# so for the rest of the rules, we can just focus on the prerequisites!
# for example stack.o needs to be rebuilt if our list module changes, and
# also if stack.h changes (stack.c is an assumed prerequisite, but not stack.h)
stack.o: stack.h list.h
# note: we only depend on list.h, not also list.c. if something changes inside
# list.c, but list.h remains the same, then stack.o doesn't need to be rebuilt,
# because the way that list.o and stack.o are to be linked together will remain
# the same (as per list.h)
# likewise, queue.o depends on queue.h and the list module
queue.o: queue.h list.h
# so in the future we could save a lot of space and just write these rules:
# $(EXE): $(OBJ)
# $(CC) $(CFLAGS) -o $(EXE) $(OBJ)
# list.o: list.h
# stack.o: stack.h list.h
# queue.o: queue.h list.h
# finally, this last rule is a common convention, and a real nice-to-have
# it's a special target that doesn't represent a file (a 'phony' target) and
# just serves as an easy way to clean up the directory by removing all .o files
# and the executable, for a fresh start
# it can be accessed by specifying this target directly: 'make clean'
clean:
rm -f $(OBJ) $(EXE)
Linking with external libraries
e.g. to access math functions sqrt, log
etc. in math.h
, C source code:
calc.c
#include <math.h>
- static libraries: stored in archive files (
.a
)- created with GNU archiver tool
ar
- created with GNU archiver tool
- library search path: where
gcc
looks for library files- default: standard libraries found searched for in:
/usr/local/lib
/usr/lib
- search for file is from top to bottom, with first file found taking precedence
- math library:
/usr/lib/libm.a
- standard library:
/usr/lib/libc.a
- default: standard libraries found searched for in:
- include path: where
gcc
looks for header files- corresponding headers in
/usr/include
- math header:
/usr/include/math.h
- corresponding headers in
-l<name>
Link the math
library with full path:
$ gcc -Wall calc.c /usr/lib/libm.a -o calc
More succinctly: compile with -lm
flag to link math library
$ gcc -Wall calc.c -lm -o calc
- linkers typically search for functions from left to right in libraries specified
- if
data.c
uses librarylibglpk.a
which useslibm.a
, compile as:$ gcc -Wall data.c -lglpk -lm
-I/path/to/dir
and -L/path/to/dir
-I
: specify include path-L
: specify library path- e.g.
dbmain.c
: makes uses of headergdbm.h
and library `libgdbm.a’#include <gdbm>
- GDBM v1.8.3 package installed under `/opt/gdbm-1.8.3’:
- header file:
/opt/gdbm-1.8.3/include/gdbm.h
- library:
/opt/gdbm-1.8.3/lib/libgdm.a
- header file:
- compile and link
dbmain.c
with$ gcc -Wall -I/opt/gdbm-1.8.3/include -L/opt/gdbm-1.8.3/lib dbmain.c -lgdbm
Environment variables
- by specifying environment variables, this can be simplified:
$ C_INCLUDE_PATH=/opt/gdbm-1.8.3/include $ export C_INCLUDE_PATH $ LIBRARY_PATH=/opt/gdbm-1.8.3/lib $ export LIBRARY_PATH $ gcc -Wall dbmain.c -lgdbm
- extended search paths:
DIR1:DIR2:DIR3:...
- e.g. include current directory and /opt/gdbm-1.8.3/include
$ C_INCLUDE_PATH=.:/opt/gdbm-1.8.3/include
- compiler searches directories in order:
- command-line: -I, -L, left-to-right
- environment variables
- default system directories
Shared libraries
- static library
.a
- shared libraries:
.so
(shared object)- uses more advanced linking, reducing size of executable
- library can be updated without recompiling dependent programs
- dynamic linking: before executable starts running, machine code for external functions
is copied from shared library file
- executable linked against shared library contains only a small table of functions it needs, rather than complete machine code from object files for external functions
- reduces executable size: only one copy of a library needed for multiple programs
- most OSs provide virtual memory so that one copy of a shared library in physical memory can be used by all running programs
gcc
compiles to use shared libraries by default- if
.so
file found in link path, this is used in preference to.a
(static library)
- if
- when executable file is started, loader must find shared library to load into memory
- by default loader searches in default system directories
/usr/local/lib, /usr/lib
- by default loader searches in default system directories
- to set load path:
$ LD_LIBRARY_PATH=/opt/gdbm-1.8.3/lib $ export LD_LIBRARY_PATH $ ./a.out # runs successfully
- environment variables can be set in your bash/shell profile
Forced static linking
-static
avoids use of shared libraries ```console $ gcc -Wall -static -I/opt/gdbm-1.8.3/include/ -L/opt/gdbm-1.8.3/lib/ dbmain.c -lgdbm $ ./a.outruns successfully
``
Warnings
-Wall
shows a variety of warnings- To help find problems:
$ gcc -ansi -pedantic -Wall -W -Wconversion -Wshadow -Wcast-qual -Wwrite-strings
Debug
- conditional compilation ```c #define DEBUG
#ifdef DEBUG // stuff here only compiles when DEBUG is defined #endif
- `gcc` has built in debug support with the `-DDEBUG` flag, without you needing
to define `DEBUG`
```console
$ gcc -Wall -DDEBUG -o program program.c
Function pointers
- e.g. Moffatt 10.4
double (*F) (double); F= sqrt("x=%.4f, F(x)=%.4f\n", x, F(x)); // prints "x=2.0000, F(x)=1.4142"
- allow you to pass in an arbitrary function as an argument to another function
Polymorphism
- polymorphic library: allows software modules to be abstracted and reused
- additional design effort but much more versatile
- make use of
void *
for generic data - implementation specific functions are passed by function pointer (e.g. to execute comparison between instances)
e.g. Moffatt 10.5
// treeops.h
typedef struct node node_t;
struct node {
void *data; // pointer to stored structure
node_t *left; // left subtree of node
node_t *right; // right subtree of node
};
typedef struct {
node_t *root; // root node of tree
int (*cmp)(void*, void*); // function pointer
} tree_t;
// create an empty tree, pass in a comparison function to be used subsequently
tree_t *make_empty_tree(int func(void*, void*));
int is_empty_tree(tree_t *tree);
void *search_tree(tree_t *tree, void *key);
tree_t *insert_in_order(tree_t, *tree, void *value);
// traverse the tree, with pointer to action function to take
void traverse_tree(tree_t *tree, void action(void*));
void free_tree(tree_t *tree);
static
static
variable: allows functions to maintain state between calls- variable cannot be accessed outside the function
- do not use with recursion
static
function: cannot be accessed outside the source file in which it is defined; way to ensure private routines are only accessible within a module
const
- storage class
const
can be used to tag variables that do not change in the execution of the program, allowing the compiler to handle more efficiently