A little while back, we were talking about utilizing compiler warnings as a first step to make our C code less error-prone and increase its general stability and quality. We know now that the C compiler itself can help us here, but we also saw that there’s a limit to it. While it warns us about the most obvious mistakes and suspicious code constructs, it will leave us hanging when things get a bit more complex.
But once again, that doesn’t mean compiler warnings are useless; we simply need to see them for what they are: a first step. So today we are going to take the next step, and have a look at some other common static code analysis tools that can give us more insight into our code.
You may think that voluntarily choosing C as primary language in this day and age is nostalgic or anachronistic, but preach and oxidate all you want: C won’t be going anywhere. So let’s make use of the tools we have available that help us write better code and defy the pitfalls C is infamous for. And the general concept of static code analysis is universal. After all, many times a bug or other issue isn’t necessarily caused by the language, but rather by some general flaw in the code’s logic.
Compiler Warnings Recap
But let’s first take a step back again to compiler warnings. If we recall the nonnull attribute, which indicates that a function’s parameter can’t and therefore won’t be NULL, we saw that the compiler’s perspective on it is extremely shortsighted:
extern void foo(char *) __attribute__((nonnull));

void bar(void)
{
    char *ptr = NULL;
    foo(NULL); // warning
    foo(ptr);  // no warning here
}
The compiler will warn about the foo(NULL) call, as it is an obvious violation of the nonnull declaration, but it won’t realize that the second call will eventually also pass NULL as parameter. To be fair, though, why should it understand that? Its primary job is to generate a machine-executable binary from our source code.
Now, this example is a rather clear case, and while the compiler may not warn about it, it is still easy to spot. If you have decent code review practices in place, it should be straightforward to detect the mishap. But sometimes it’s just us by ourselves, no other developer to review our code, and due to tiredness or other reasons, it might simply slip by our eyes. Other times, the potential issue hiding underneath is a lot less obvious, and it might take a whole series of unfortunate events for it to become an actual problem. We’d have to go mentally through every possible execution path to be sure it’s all good.
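As a minimal sketch of such a less obvious case, consider the following. All names here (lookup_unit, label_length, unit_label_length) are hypothetical and not from the earlier article. The NULL no longer appears next to the call, it only emerges from a helper for some inputs, so depending on compiler and optimization level the warning may well stay silent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup helper: returns NULL when the sensor id is unknown. */
static const char *lookup_unit(int sensor_id)
{
    switch (sensor_id) {
    case 0:
        return "C";
    case 1:
        return "hPa";
    default:
        return NULL; /* easy to forget about at the call site */
    }
}

/* Declared nonnull, just like foo() in the recap above. */
size_t label_length(const char *label) __attribute__((nonnull));

size_t label_length(const char *label)
{
    return strlen(label);
}

/* The problematic path: for an unknown id, NULL reaches a nonnull parameter. */
size_t unit_label_length(int sensor_id)
{
    const char *unit = lookup_unit(sensor_id);

    return label_length(unit); /* unit is never checked here */
}
```

For the known ids 0 and 1 everything works, which is exactly why this kind of issue can survive for a long time before some unexpected id shows up.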
Either way, it sounds like a waste of time to use manual labor for something that practically screams for automation. So let’s have a look at a few common tools made just for that. Note that we’ll be merely scratching the surface here, so consider this more a brief overview of what tools are available.
Static Code Analysis Tools
Static code analysis involves inspecting our program just by analyzing its source code, without ever executing it. For example, it won’t consider the actual data that is processed in a set of functions, but instead make sure that data is passed along and handled in a safe and logical way. This is certainly a subject where throwing money at the problem will get you bigger and shinier tools, and while they have their place in the professional world, we’ll focus on the everyday hacker tinkering on their free time projects, and see what the open source community has to offer.
While the initial example was good to recall the shortcomings of compiler warnings, demonstrating the full strength of the other tools cannot be done with a simple scenario. The best way is to see for yourself by using them on your own code, on other tools and programs you frequently compile or use, or on some random projects you find on GitHub and the likes.
clang
Yes, let’s start with clang. But before you start to groan and think “drop the compiler warnings already and move on”, there’s more to clang than its compiler infrastructure, such as its own static code analyzer. It supports the same targets clang does, and can be invoked by preceding your usual build command with the scan-build command.
$ scan-build clang -o foo foo.c
The analyzer doesn’t necessarily require clang as compiler, so this will work as well:
$ scan-build gcc -o foo foo.c
Or you just run make:
$ scan-build make
...
scan-build: n bugs found
scan-build: Run 'scan-view /tmp/scan-build-xyz' to examine bug reports.
$
While you can’t simply pass a list of source files to scan-build, but rather need to perform an actual build, it has the advantage that the compilation and analysis are done at the same time. This makes the analysis part of the build process itself, instead of some tedious extra task you always have to remember. After all, it’s up to us to actually use and act on what the tools can provide us. The less they interfere with our flow, the less reluctant we might be to eventually use them and see what they have to say.
Speaking of seeing what they have to say, if you take another look at the last output line scan-build displays, you will find a command to display the results of the analysis. Behind the scan-view command is a simple Python script that starts a local web server and opens the report overview page in your browser. You’ll get more or less the same if you just open file:///tmp/scan-build-xyz/index.html in your browser, and in case you despise anything that doesn’t run in a terminal, this works well enough in your common text mode browsers.
When running scan-build, it might for example output that in a specific place NULL might be passed somewhere it shouldn’t be, but it won’t tell you under which circumstances. The great thing about the browser-based report here is that you can navigate through the code and follow step by step, for each loop and condition branch, how a potential issue might turn into a bug. Keep in mind that the program is never actually run, so you might encounter some false positives that are never a valid or possible scenario in reality. Conversely, each tool has a different focus, so some issues might not even be considered.
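To get a feeling for what following a path through loops and branches means, here is a small sketch with a hypothetical function, average_non_negative, that is not taken from any real project. The problem only exists on the paths where the loop body never counts a value, which is the kind of path a path-sensitive analyzer is designed to walk. Whether a particular tool actually reports it is something to try out rather than assume.

```c
#include <stddef.h>

/* Hypothetical helper: averages all non-negative entries of an array. */
int average_non_negative(const int *values, size_t n)
{
    int sum = 0;
    int used = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (values[i] >= 0) {
            sum += values[i];
            used++;
        }
    }

    /*
     * If n is 0, or every entry is negative, used is still 0 here and
     * the division below is undefined behavior. Nothing on this line
     * looks wrong by itself, the issue depends on the path taken above.
     */
    return sum / used;
}
```

With at least one non-negative entry the function behaves as intended, so a quick manual test is unlikely to reveal anything.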
Static code analysis is by no means a one-size-fits-all job, so it won’t hurt to use more than a single tool for it. Well, let’s move on to the next one then.
(sp)lint
Probably the best-known tool for static code analysis is lint, which has become something of a synonym for static code analysis itself. In your average Linux distribution, you should find splint as one implementation of it. Unlike clang’s static analyzer, splint takes the source files and analyzes them without running any compilation.
$ splint foo.c
...
Finished checking --- 3 code warnings
$
splint is quite a complex tool with plenty of flags to enable and disable checks and to control its behavior. It also comes with its own source code annotations, written as specially formatted comments of the form /*@annotation@*/, that influence what is analyzed and reported. Whether you like this sort of (debatable) noise in your code is of course up to you.
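As a rough illustration of what such an annotation looks like in practice, the sketch below marks the return value of a hypothetical duplicate function with splint’s null annotation, declaring that the result may be NULL. The function names are made up for this example, and a real splint run would likely ask for further annotations (for instance about memory ownership), so take this as a starting point rather than a clean result.

```c
#include <stdlib.h>
#include <string.h>

/* The annotation declares that the returned pointer may be NULL. */
/*@null@*/ char *duplicate(const char *src)
{
    size_t len = strlen(src) + 1;
    char *copy = malloc(len);

    if (copy != NULL) {
        memcpy(copy, src, len);
    }
    return copy;
}

/* Hypothetical caller that honors the annotation. */
size_t duplicated_length(const char *src)
{
    char *copy = duplicate(src);
    size_t len = 0;

    /* Using copy without this check is what the annotation lets a tool catch. */
    if (copy != NULL) {
        len = strlen(copy);
        free(copy);
    }
    return len;
}
```

Since the annotation is an ordinary comment as far as the compiler is concerned, the code still builds with any C compiler.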
You should probably be aware though that the latest release of splint is from 2007. Of course, that doesn’t mean it’s outdated; plenty of potential issues are timeless and have been around for longer than the last 11 years. Theoretically, you should also be able to use splint for code targeting, for example, AVR microcontrollers, but with some emphasis on the “theoretically” part. It will generally take a lot of tweaking and digging through the output to get the most out of it. If you are curious and persistent enough, the splint manual is probably a good place to start.
flawfinder
As mentioned before, every tool usually has a different focus area. In the case of flawfinder, that focus is security vulnerabilities, Common Weakness Enumerations (CWE) in particular. While this offers a generally good overview of insecure C functions and practices, it mainly warns whenever a dangerous construction is detected. It doesn’t seem to check if there is an actual problem in the code, just that there might be in case you end up using it wrong.
Nevertheless, there is a reason for the word common in CWE, so even though you made sure everything is okay with your current implementation, it doesn’t hurt to be reminded every once in a while about those common weaknesses, without proactively digging through every man page. And on a side note, the author of flawfinder has also written a book about secure programming and released it under the GNU Free Documentation License, in case you want to read up some more on that topic.
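To illustrate the difference between a dangerous construction and an actual problem, here is a small sketch with a hypothetical set_name function and a made-up NAME_CAPACITY constant. The strcpy call is guarded by a length check and therefore cannot overflow the buffer, yet a tool that matches on risky function names would be expected to point at it anyway.

```c
#include <string.h>

#define NAME_CAPACITY 16 /* hypothetical buffer size, including the '\0' */

/* Copies src into dest, or returns -1 if it would not fit. */
int set_name(char dest[NAME_CAPACITY], const char *src)
{
    if (strlen(src) >= NAME_CAPACITY) {
        return -1; /* refuse instead of overflowing */
    }

    /* Safe because of the check above, but still a call to strcpy. */
    strcpy(dest, src);
    return 0;
}
```

Such a report is not wrong as a reminder: remove or break the length check during a later refactoring, and the very same line turns into a classic buffer overflow.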
cppcheck
The last tool we’ll be mentioning is cppcheck, which, despite its misleading name, covers both C++ and C, and focuses on undefined behavior. If you can afford or already possess the MISRA rule texts, you can include them as well. Some of them are also covered out of the box, and of course, it’s still a fully functional code analyzer even without purchasing the rule texts.
cppcheck also lets you write your own rules, reports its findings either as custom formattable text or as XML, and offers integration with most common IDEs. And in case you want to click something every once in a while, or are otherwise somewhat put off by wading through walls of console text, it also comes with a graphical user interface as an alternative to the command line, which will show the reported issues along with the matching source code.
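For a sense of what undefined behavior looks like at the source level, the sketch below uses a hypothetical sum_samples function and a made-up SAMPLE_COUNT constant. The code as written stays within bounds. The comment marks the one-character slip that would turn it into an out-of-bounds read, which is the category of mistake such an analyzer looks for.

```c
#include <stddef.h>

#define SAMPLE_COUNT 8 /* hypothetical array size */

/* Fills a local array with 0..7 and returns the sum of its entries. */
int sum_samples(void)
{
    int samples[SAMPLE_COUNT];
    int sum = 0;
    size_t i;

    for (i = 0; i < SAMPLE_COUNT; i++) {
        samples[i] = (int)i;
    }

    /*
     * Writing "i <= SAMPLE_COUNT" here would read samples[8], one past
     * the end of the array. That is undefined behavior, even though the
     * program might appear to work on a given machine.
     */
    for (i = 0; i < SAMPLE_COUNT; i++) {
        sum += samples[i];
    }
    return sum;
}
```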
Honorable Mention
One more tool that sounds promising and might be worth looking into is frama-c.
Limitations
Clearly, no single tool can analyze and detect every possible flaw, otherwise the list would have been a lot shorter. And just as some tools will miss some issues, they can also overcompensate by enthusiastically reporting what turn out to be false positives. As mentioned before, you need to decide for yourself which warnings you consider valid and something you need to address. This may seem tedious and a waste of time, which is exactly what the tools were supposed to help you avoid. And maybe it often is, but it will also help you to better understand your own code, and see some of its implications from an angle you may never have considered. And when it does find a rare bug, it’ll pay off.
After some initial fiddling with the tools, you will also notice that some of them will require a lot of tweaking to get the most out of them, as was already mentioned for splint. So it’s again up to you to weigh whether investing that time will be worth it in the long run. Unlike compiler warnings, getting rid of each and every warning from code analysis tools might not be the most rewarding process, especially when so many are false positives. Coder’s discretion is advised.
Of course, static code analysis has by design the limitation that actual data and its meaning are neither considered nor checked. An int is an int, and as long as we don’t cause an overflow or perform other operations that violate the language specification or end in undefined territory, we’ll most likely be good to go from a static code analysis point of view. It won’t detect or care if the int’s value must be in a certain range in order to make sense and cause no harm in the rest of the program’s context, for instance. We’d have to actually execute our code to know what’s happening there. So with that being said, next time we will talk about assertions, and why it’s often better to go out with a bang early on.
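To make the point about an int and its meaning a bit more tangible, here is a minimal sketch using a hypothetical duty_to_ticks helper, as one might find in PWM code. Nothing in it violates the language specification, so there is little for a static analyzer to object to, yet a percentage above 100 produces a result that makes no sense in the program’s context.

```c
/*
 * Hypothetical helper: converts a duty cycle in percent into timer ticks.
 * The value of percent is only meaningful in the range 0 to 100, but the
 * type int cannot express that.
 */
int duty_to_ticks(int percent, int period_ticks)
{
    return (percent * period_ticks) / 100;
}
```

Calling duty_to_ticks(50, 1000) gives the expected 500 ticks, while duty_to_ticks(150, 1000) happily returns 1500, more ticks than the whole period has. Both calls are perfectly valid C.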