Friday, October 9

Code Craft: Subtle Interrupt Problems Stack Up

Elliot Williams’ column, Embed with Elliot, just did a great series on interrupts. It came in three parts, illustrating the Good, the Bad, and the Ugly of using interrupts on embedded systems. More than a few memories floated by while reading it, some of them pretty painful, because debugging interrupt problems can be a nightmare.

One of the things I’ve learned to watch out for over the years is a subtlety of stack-based languages like C/C++ that can ensnare the unwary. This problem involves an array held on the stack being corrupted when an interrupt occurs. The fix for this problem points to another one that black hats often exploit to gain access to systems.

Almost all processors popular with hackers today use a stack. To visualize a stack, think of a stack of plates. You can take the top plate off the stack. You can add plates back on the top of the stack. What you can’t do is add or remove plates in the middle or at the bottom. The stack in a processor works the same way. You push data onto the stack to store it and pop data to take it off. Just like the plates, you can’t remove something from the middle of the stack.

Basics of the Stack

For some strange historical reason, stacks in diagrams are usually drawn growing downward, so the top is at the bottom. I think some hardware guy did that. Actually, it’s because stacks start at higher memory addresses and grow toward lower memory addresses.
[Figure: abcd stack diagram]

The diagram should help explain how stacks work. The actions on the stack are shown on the top line, and each column is the stack as it changes. For instance, the first action is to push D and E onto the stack, and the second is to pop one item off the stack.

A CPU has multiple registers. The two that matter here are the stack pointer (SP) and the program counter (PC). The SP always points to the top of the stack. When something is pushed onto the stack, the SP is decreased (remember, the stack grows toward lower memory addresses). When an item is popped, the SP is increased. The amount the SP changes depends on the size of the data pushed or popped.

The PC points to the code being executed. As instructions execute, the PC steps from one to the next. When a function is called, the call instruction pushes the return address (the address of the instruction following the call) onto the stack. It then loads the PC with the address of the first instruction of the called function. The first code sequence executed does some housekeeping and adjusts the SP to make room for the called function’s local variables.

At the return instruction, the housekeeping is undone and the SP is moved back to point at where the return address was stored. The return address is popped into the PC, and execution continues in the calling function.

Consider two functions: one that puts data into a buffer array, and another that calls the first function but will itself be interrupted (the interrupt point is shown as a comment in this example).

char* putInBuffer(...params...) {
   char local_buf[25];   // automatic variable: allocated on the stack
   ... code to put something in local_buf ...
   return local_buf;     // returns the address of that stack space
}

void callingFunc(...params...) {
   ...some code...
   char* buffer = putInBuffer(...params...);
   // note: an interrupt is going to occur right here
   ... code that does something with buffer ...
}

Here are the details:

  1. When callingFunc calls putInBuffer, the return address is taken from the PC and pushed onto the stack. The SP is adjusted.
  2. The space for local_buf is allocated on the stack, i.e., 25 bytes are allocated. The SP is adjusted to allow for that space.
  3. The putInBuffer code puts something into local_buf.
  4. At the return, the pointer to local_buf is passed back to callingFunc. The details don’t matter here.
  5. The SP is adjusted to point at the return address.
  6. The return address is popped from the stack into the PC.
  7. Execution continues in callingFunc.
  8. Somewhat later callingFunc uses the data pointed to by buffer.

Consider where the SP is pointing. It is now above the address where the return address was stored. This means that the pointer in buffer points to an address beyond the top of the stack, in memory the stack treats as free.

When an interrupt occurs:

  1. The PC is pushed onto the stack.
  2. Many of the other registers in the CPU are pushed onto the stack.
  3. The PC is changed to point at the code for the interrupt.
  4. The interrupt code executes and returns.
  5. The CPU registers are restored.
  6. The PC is restored.
  7. The SP is back where it started and processing continues in callingFunc.

Assume the interrupt occurred where I marked it in callingFunc. 

What happened to the data that was in local_buf?

An Interrupt Ate My Data

The data in local_buf got clobbered by the interrupt’s stack manipulation: pushing the PC and registers wrote right over the memory local_buf used to occupy. This happened before callingFunc could use it. As I said, subtle. I’ve seen inexperienced and experienced developers alike fall into this pitfall a number of times.

It is especially tricky to find this error because the same pattern is perfectly safe when a function returns a non-pointer value. If you return an integer or floating point value, the value itself is copied back to callingFunc (typically in a register) before the called function’s stack space is released, so it doesn’t matter what happens to the stack afterward. What makes it really nasty is that the code will work correctly most of the time: the data is only clobbered when an interrupt happens to land between the return and the use of buffer. An intermittent bug like this takes painstaking analysis to find. I’ve seen a developer rewrite a function with this problem five times trying to fix the bug. He never found the problem until I, after looking at it more times than I’d like to admit, was finally able to point it out.

The function putInBuffer can be saved, maybe, by making local_buf a static variable. A static local variable is not kept on the stack but in static storage, alongside the global variables (there’s a post on the static keyword if you want to learn more). That also means the data from the previous call is still in the variable. Sometimes that is an acceptable approach. It looks like this:

char* putInBuffer(...params...) {
   static char local_buf[25];   // static storage: not on the stack
   ... code to put something in local_buf ...
   return local_buf;            // still valid after the return
}

void callingFunc(...params...) {
   ...some code...
   char* buffer = putInBuffer(...params...);
   // note: an interrupt is going to occur right here
   ... code that does something with buffer ...
}

Reentrant Functions

I said putInBuffer can maybe be fixed using static because doing so makes it no longer reentrant. A reentrant function is one that can safely be called again, simultaneously or recursively, before an earlier call has finished. Simultaneous calls come from a multitasking operating system with multiple threads, or from an interrupt handler that calls the same function. With threads, thread R could call the function, be swapped out, and then thread S also calls the function. One of those two threads is going to get the wrong data because both calls share the same static buffer.

What about recursion? And what is it? The old programming joke is “recursion: see recursion.” Recursion is when a routine calls itself. There are classes of problems where a recursive solution is easier. In this case, if a recursive function called putInBuffer and then called itself before using the data, the inner call would overwrite the data in the shared buffer.

The real solution is to change putInBuffer so the calling program provides the buffer and the size of the buffer. The buffer size is critical to avoid another problem: a buffer overrun exploit, a black hat hack used to compromise systems. A buffer passed into a function has a fixed length, and the function must make sure no data runs past the end of the buffer. In the best case an overrun just causes your system to crash. In the worst, it provides a way for black hats to introduce malicious code. Here’s the best way to write this routine:

#include <stddef.h>   // size_t, NULL

char* putInBuffer(char* buffer, const size_t buf_size, ...params...) {
   if (buffer == NULL || buf_size < ...whatever the output size is going to be...) {
      return NULL;   // or another value callers will understand
   }
   // an interrupt here doesn't matter: buffer lives in the caller's stack frame
   ... code to put something in buffer, never writing past buf_size bytes ...
   return buffer;
}

void callingFunc(...params...) {
   ...some code...
   char buffer[20];
   if (putInBuffer(buffer, sizeof(buffer), ...params...) == NULL) {
      ... handle the error: buffer was too small ...
   }
   // note: an interrupt is going to occur right here
   ... code that does something with buffer ...
}

It’s a convenience to the calling routine to return the pointer to the buffer. This allows the called routine to be used as an argument in other function calls. One typical situation is converting a numeric value to text.

Stack-related problems live in one of those dirty little corners of C/C++, and of any stack-based language, that can cause hours of hair-pulling frustration. You can see why I’m bald.


Filed under: Hackaday Columns, Microcontrollers, Software Development
