Wednesday, May 2

Directly Executing Chunks of Memory: Function Pointers In C

In the first part of this series, we covered the basics of pointers in C, and went on to more complex arrangements and pointer arithmetic in the second part. Both times, we focused solely on pointers representing data in memory.

But data isn’t the only thing residing in memory. All the program code is accessible through either the RAM or some other executable type of memory, giving each function a specific address inside that memory as its entry point. Once again, pointers are simply memory addresses, and to fully utilize this similarity, C provides the concept of function pointers. Function pointers provide us with ways to make conditional code execution faster, implement callbacks to make code more modular, and even provide a foothold into the running machine code itself for reverse engineering or exploitation. So read on!

Function Pointers

In general, function pointers aren’t any more mysterious than data pointers: the main difference is that one references variables and the other references functions. If you recall from last time how arrays decay into pointers to their first element, a function likewise decays into a pointer to its entry point, with the () operator executing whatever is at that address. As a result, we can declare a function pointer variable fptr and assign a function func() to it: fptr = func;. Calling fptr(); will then resolve to the entry point of function func() and execute it.

Admittedly, the idea of turning a function into a variable may seem strange at first and might require some getting used to, but it gets easier with time and it can be a very useful idiom. The same is true for the function pointer syntax, which can be intimidating and confusing in the beginning. But let’s have a look at that ourselves.

Function Pointer Syntax

If we break down a standard function declaration, we have a return type, a name, and an optional list of parameters: returntype name(parameters). Defining a function pointer is analogous. The function pointer must have a return type and parameters that match the function it is referencing. And just as with data pointers, we use the asterisk * to declare the pointer.

There’s one catch. If we have a function int func(void), writing int *fptr(void) won’t give us a function pointer. Instead, it declares another function returning an integer pointer int *, because the parameter list binds more tightly to the name than the * does. Parentheses turn the statement into a function pointer declaration by grouping the asterisk with the pointer name instead of the return type: int (*fptr)(void).

So the general pattern is: returntype (*name)(parameters). Let’s have a look at a couple of different function pointer declarations.

// function without parameters returning int, the one we just had
int (*fptr1)(void); // --> int function1(void) { ... }

// function with one int parameter returning int
int (*fptr2)(int);  // --> int function2(int param) { ... }

// function with void * parameters returning char *
char *(*fptr3)(void *); // --> char *function3(void *param) { ... }

// function with fptr1 type function pointer as parameter returning void
void (*fptr4)(int (*)(void)); // --> void function4(int (*param)(void)) { ... }

// function with int parameter returning pointer to a fptr1 type function
int (*(*fptr5)(int))(void); // --> int (*function5(int))(void);
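
To see some of these declarations at work, here is a minimal sketch that assigns matching functions and calls them through the pointers. The function bodies, the helper seven(), the variable last_result, and demo_declarations() are made up for illustration; only the pointer types come from the list above.

```c
/* Hypothetical functions matching the declarations above. */
static int last_result = 0;

int function1(void) { return 42; }
int function2(int param) { return param * 2; }
int seven(void) { return 7; }

/* Takes an fptr1 type function pointer and stores what it returns. */
void function4(int (*param)(void)) { last_result = param(); }

/* Takes an int and returns an fptr1 type function pointer. */
int (*function5(int select))(void) { return select ? function1 : seven; }

int demo_declarations(void) {
    int (*fptr1)(void) = function1;
    int (*fptr2)(int) = function2;
    void (*fptr4)(int (*)(void)) = function4;
    int (*(*fptr5)(int))(void) = function5;

    fptr4(fptr1);               /* calls function1, last_result is now 42 */
    int doubled = fptr2(10);    /* 20 */
    int chained = fptr5(0)();   /* function5(0) returns seven, so 7 */

    return last_result + doubled + chained; /* 42 + 20 + 7 = 69 */
}
```

Note how fptr5(0)() first calls function5() through the pointer and then immediately calls the function pointer it returned.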

Evidently, the syntax can become rather messy, and you may waste your time trying to make sense of it. If you ever encounter a pointer construct in the wild and have difficulties figuring out what goes where, the cdecl command line tool can be of great help, and it is also available as an online version.

cdecl> explain int (*fptr1)(void)
declare fptr1 as pointer to function (void) returning int
cdecl> explain int (*(*fptr5)(int))(void)
declare fptr5 as pointer to function (int) returning pointer to function (void) returning int
cdecl>

On the other hand, if you find yourself in a situation where you need to write a pointer construct that may take multiple attempts to get right, it’s probably a good idea to make use of C’s typedef keyword.

// define new type "mytype" as int
typedef int mytype;
mytype x = 123;

// define new type "fptr_t" as pointer to a function without parameters returning int
typedef int (*fptr_t)(void);
fptr_t fptr = function1;
fptr();

// use fptr_t as parameter and return type
void function4(fptr_t param) { ... }
fptr_t function5(int);

You may have noticed that none of the examples used the ampersand & when referencing functions, or the asterisk * when dereferencing pointers to execute their underlying functions. Since functions implicitly decay to pointers, there is no need to use them, but we still can. Note that the function call operator has higher precedence than dereferencing, so we need to use parentheses accordingly.

// explicitly referencing with ampersand
fptr_t fptr = &function1;
// explicitly dereferencing with asterisk
(*fptr)();
// not *fptr();
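
As a quick check that both notations really are interchangeable, the following sketch compares them directly. The body of function1() and the name demo_equivalent_calls() are assumptions for illustration.

```c
typedef int (*fptr_t)(void);

/* Hypothetical function to point at. */
int function1(void) { return 123; }

/* Returns 1 if both notations refer to and call the same function. */
int demo_equivalent_calls(void) {
    fptr_t a = function1;   /* implicit decay to a pointer */
    fptr_t b = &function1;  /* explicit address-of */

    int same_target = (a == b);
    int same_result = (a() == (*b)()) && ((*a)() == 123);

    /* *a() would not compile here: it calls a() first and then
       tries to dereference the int it returned. */
    return same_target && same_result;
}
```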

Whether to use the ampersand or asterisk is in the end a matter of taste. If you are at liberty to choose for yourself, go with whichever feels more natural and easier to read and understand to you. For our examples here, we will omit them both in the code, but we will add the alternative notation as comments.

Assigning And Using Function Pointers

Now that we’ve seen how to declare function pointers, it’s time to actually use them. In the true spirit of pointers, our first use case is a reference to a previous article about implementing a state machine with function pointers. It’s worth a read on its own, especially if you want to know more about state machines, so we don’t want to give away too much here. In short, instead of using a switch statement to determine and handle the current state, the state handler functions themselves assign the next state’s handler to a function pointer variable. The program itself then periodically executes whichever function is stored in the variable.

void (*handler)(void);

void handle_some_state(void) {
    if (condition) {
        handler = handle_some_other_state;
     // handler = &handle_some_other_state;
    }
}

int main(void) {
    while (1) {
        handler();
     // (*handler)();
        ...
    }
    return 0;
}
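
For something that can be compiled and stepped through, here is a minimal sketch of the same idea with two made-up states that simply alternate. The state names and run_state_machine() are assumptions, and the transitions are unconditional to keep the example short.

```c
/* Hypothetical two-state machine: each handler picks its successor. */
static void handle_idle(void);
static void handle_busy(void);

static void (*handler)(void) = handle_idle;

static void handle_idle(void) { handler = handle_busy; }
static void handle_busy(void) { handler = handle_idle; }

/* Runs the given number of steps, starting from the idle state.
   Returns 1 if the machine ends up in idle, 0 if it ends up in busy. */
int run_state_machine(int steps) {
    handler = handle_idle;
    for (int i = 0; i < steps; i++) {
        handler();  /* same as (*handler)(); */
    }
    return handler == handle_idle;
}
```

The loop never needs to know which state is active; it only calls whatever handler currently holds.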

Arrays of Function Pointers

We can apply a similar approach when we don’t have pre-defined or linear state transitions, but instead want to handle, for example, arbitrary user input, with each input calling its own handler function.

void handle_input(int input) {
    switch (input) {
        case 0:
            do_something();
            break;
        case 1:
            do_something_else();
            break;
        ...
    }
}

// function that is periodically called from main()
void handler_loop(void) {
    int status = read_status_from_somewhere();
    handle_input(status);
}

Since function pointers are just variables, we can pack them into an array. Just like with other arrays, the brackets [] are attached directly to the name.

// declare array of pointers to functions of type void func(void)
void(*function_array[])(void) = {
    do_something,      // &do_something,
    do_something_else, // &do_something_else,
    ...
};

We can now replace the previous switch statement by using input as array index:

void handle_input(int input) {
    // execute whichever function is at array index "input"
    function_array[input]();
 // (*function_array[input])();
}

In theory, this changes the complexity from O(n) to O(1) and makes the execution time more predictable, but in practice, there are too many other factors weighing in, such as architecture, branch prediction, and compiler optimization. Many compilers already turn a dense switch statement into a jump table on their own. On a simple microcontroller, function pointers are often faster than if/then or case statements. Still, function pointers don’t come for free. Each pointer needs a place in memory, and calling through a pointer means loading the address first, adding some extra CPU cycles.

A more convincing argument to replace the switch statement with function pointers is the added flexibility. We can now set and replace the handling function at runtime, which is especially useful when we are writing a library or some plugin framework where we provide the main logic, and let the user decide what to actually do on each state/input/event/etc. (Insert comment about sanitizing user input here.) Naturally, using an array makes most sense if the states or input values we handle are integers in consecutive order. In other cases, we might need a different solution.
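
Since the input is used directly as an array index, it should be checked before the call. The sketch below is one possible way to do that, reusing the names from the snippets above. The return value, the NULL check, HANDLER_COUNT, and the last_action variable are assumptions added for illustration.

```c
#include <stddef.h>

/* Hypothetical handlers that just record which one ran. */
static int last_action = 0;
static void do_something(void) { last_action = 1; }
static void do_something_else(void) { last_action = 2; }

/* Not const, so entries can be replaced at runtime. */
static void (*function_array[])(void) = {
    do_something,
    do_something_else,
};

#define HANDLER_COUNT (sizeof function_array / sizeof function_array[0])

/* Returns 0 if a handler was called, -1 if the input has no handler. */
int handle_input(int input) {
    if (input < 0 || (size_t)input >= HANDLER_COUNT) {
        return -1;  /* out of range, never index the array with it */
    }
    if (function_array[input] == NULL) {
        return -1;  /* slot exists but no handler is set */
    }
    function_array[input]();
    return 0;
}
```

Replacing a handler at runtime is then a plain assignment, for example function_array[0] = do_something_else;.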

Function Pointers as Function Parameters

Say we created a library that sends HTTP requests to a web server, and user-defined handlers deal with its response based on the HTTP status code. Taking the status code as array index will give us a huge, mostly empty array, wasting a lot of memory. The old switch statement would have been the better choice here. Well, instead of adjusting the body of the handle_input() function, how about we just turn the whole thing into a function pointer? You’ve just invented the callback.

// function pointer to the handle function
void (*handle_input_callback)(int);

// function to set our own handler
void set_input_handler(void (*callback)(int)) {
    handle_input_callback = callback;
}

// same handler, just using the function pointer now
void handler_loop(void) {
    int status = read_status_from_somewhere();
    if (handle_input_callback != NULL) {
        // execute only if a handler was set
        handle_input_callback(status);
     // (*handle_input_callback)(status);
    }
}

On the user-defined side of the code we would then declare our own handler function and pass it on.

void my_input_handler(int status) {
    switch (status) {
        case 200: // OK
            ...
            break;
        case 404: // Not Found
            ...
    }
}

int main(void) {
    set_input_handler(my_input_handler);
 // set_input_handler(&my_input_handler);
    while (1) {
        handler_loop();
        ...
    }
}

A real-world system that heavily uses callbacks for user-defined behavior is the non-OS C SDK for ESP8266. An example where a function pointer parameter is used to define the function’s own behavior can be found in the C standard library’s sorting function, qsort() (named after Quick Sort, although the standard doesn’t prescribe the algorithm), which takes a compare function as parameter to determine the order of any given data type.
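
As a short illustration of the qsort() case, the following sketch sorts the same int array in two different orders just by passing a different compare function. The names compare_ascending(), compare_descending(), and sort_ints() are made up for this example; qsort() and its compare signature are the standard ones from stdlib.h.

```c
#include <stdlib.h>

/* qsort() hands us pointers to two elements; the return value must be
   negative, zero, or positive depending on their order. */
int compare_ascending(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);  /* avoids overflow of x - y */
}

int compare_descending(const void *a, const void *b) {
    return compare_ascending(b, a);
}

/* Thin wrapper that passes the compare callback on to qsort(). */
void sort_ints(int *values, size_t count,
               int (*compare)(const void *, const void *)) {
    qsort(values, count, sizeof values[0], compare);
}
```

Calling sort_ints(values, 3, compare_descending) on {3, 1, 2} leaves the array as {3, 2, 1}, while compare_ascending gives {1, 2, 3}.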

Function Pointers as struct Members

After all we’ve seen about pointers by now, it shouldn’t come as a surprise that function pointers can be struct members, and we declare and use them just like any other member.

struct something {
    int regular_member;
    void (*some_handler)(void);
    int (*another_handler)(char *, int);
} foo;

...
    foo.some_handler = some_function;
 // foo.some_handler = &some_function;
    foo.another_handler(buf, size);
 // (*foo.another_handler)(buf, size);
...

structs with an assortment of function pointers as members are commonly found in plugin systems, or where hardware dependent code is separated from the common, hardware independent logic. Linux device drivers are a classic example, but it doesn’t have to be that extreme for starters. Remember that overly complex LED toggle example from the first part where we defined the GPIO port as pointer? If we used the same concept for a push button input, and added some handler functions, we’d end up with a completely generic, hardware independent button handler framework.

Note that we can also assign a NULL pointer to function pointers. Calling through such a pointer is undefined behavior, and just like dereferencing any other NULL pointer, it will typically end in a segmentation fault. But if we expect NULL as a possible value and check for it before execution, we can use it to disable the execution and make the handling itself optional.
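
To give a rough idea of what such a button handler could look like, here is a minimal sketch that combines function pointer members with the NULL check. Everything in it is hypothetical: struct button, its members, button_poll(), and the fake pin functions that stand in for real hardware access.

```c
#include <stddef.h>

struct button {
    int (*read_pin)(void);   /* hardware specific: returns 1 while pressed */
    void (*on_press)(void);  /* optional: NULL disables the callback */
};

/* Stand-ins for real hardware access and user code. */
static int fake_pin_state = 0;
static int press_count = 0;
int fake_read_pin(void) { return fake_pin_state; }
void count_press(void) { press_count++; }

/* Polls the button once; returns 1 if a press was handled, else 0. */
int button_poll(const struct button *btn) {
    if (btn->read_pin == NULL || !btn->read_pin()) {
        return 0;  /* no way to read the pin, or not pressed */
    }
    if (btn->on_press == NULL) {
        return 0;  /* pressed, but handling is disabled */
    }
    btn->on_press();
    return 1;
}
```

The polling logic knows nothing about the actual hardware; porting it would only mean pointing read_pin at a different function.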

Casting Function Pointers

If function pointers behave no differently than any other pointer, we should be able to cast them to other pointers as we please. And yes, it is technically possible, and the compiler won’t stop us, but from the language standard’s point of view, calling a function through a pointer of a mismatching type results in undefined behavior. Take the following example:

int function(int a, int b) { ... }

void foo(void) {
    // cast function to type "int func(int)"
    int (*fptr)(int) = (int (*)(int)) function;
    fptr(10); // -> function(10, ???);
}

While technically a valid cast, we end up calling function() without a value for parameter b. How this is handled depends on the underlying system and its calling convention. Unless you have good reasons, you probably don’t want to cast a function to a mismatching function pointer. Curiosity is always a good reason, though.
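
The cast by itself is not the problem, though. The C standard guarantees that a function pointer converted to another function pointer type and back again compares equal to the original, so storing a pointer under a generic type is fine as long as we cast it back before the call. A minimal sketch, where generic_fptr, add(), and demo_round_trip() are names made up for illustration:

```c
/* Generic function pointer type used only for storage, never for calls. */
typedef void (*generic_fptr)(void);

int add(int a, int b) { return a + b; }

int demo_round_trip(void) {
    /* Store the pointer under a mismatching type... */
    generic_fptr stored = (generic_fptr)add;

    /* ...and convert it back to the real type before calling it. */
    int (*restored)(int, int) = (int (*)(int, int))stored;

    return restored(2, 3);
}
```

Calling stored() directly would be the undefined case from the example above; calling restored(2, 3) is an ordinary call to add().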

Casting Between Function Pointers and Data Pointers

Can you turn functions into data or data into functions? Sure! And doing so is one of the cornerstones of hacking. As we recall, we need an accessible address and enough allocated memory at that address to successfully handle data pointers. With a function we have both: the function code as allocated memory, and the function itself as address. If we cast a function pointer to, say, an unsigned char pointer, dereferencing the pointer will give us the compiled machine code for that function, ready for reverse engineering. Strictly speaking, the C standard doesn’t define conversions between function pointers and data pointers, but most common platforms support them.
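
A small sketch of peeking at a function’s machine code might look like the following. It is deliberately non-portable: it assumes a platform where code memory is readable and where a function pointer holds the address of the code itself, which is the case on typical desktop and microcontroller targets but not guaranteed by the language. The names answer() and peek_code() are made up, and the cast goes through uintptr_t because a direct function-to-data pointer cast is not defined by ISO C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function whose machine code we want to look at. */
int answer(void) { return 42; }

/* Copies count bytes starting at the function's entry point into out.
   Assumption: code memory is readable and fn holds the code address.
   Returns the number of bytes copied. */
size_t peek_code(int (*fn)(void), unsigned char *out, size_t count) {
    const unsigned char *code = (const unsigned char *)(uintptr_t)fn;
    memcpy(out, code, count);
    return count;
}
```

What ends up in the buffer depends entirely on the compiler, its settings, and the target architecture, so the bytes are best inspected with a disassembler rather than compared against fixed values.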

Going the other way around, any data pointer that we cast to a function pointer can be run if it points to valid machine code in executable memory. Assembling machine code yourself, stashing it in a variable, recasting a pointer to that variable, and finally running it surely seems like a lot of hassle for everyday use. But that’s the basic recipe for exploiting security vulnerabilities by injecting shellcode, and generally a fun way to experiment with machine code, and maybe worth its own article some time in the future.

The End

While pointers can create headaches and frustration, they also offer us freedom and possible ways of working that are rarely matched by any other language construct out there. And while the world may be a safer place thanks to modern language designs that “solve” the problems that can arise with pointers if they are not handled with care, pointers will always have their place in both making, and breaking, software.
