#include <rules>
15 Nov 2010We’re stuck with C++, at least for another console generation. C++ has many quirks that I wish were not there, but there is no real alternative as of today. While modern languages tend to adopt the bulk compilation and/or smart linkers and so can have a proper module system and eat the cake too, C++ is stuck with header files (on the other hand, C++ builds are incremental and almost embarrassingly parallel). While the strategy of dealing with header files and staying sane seems more or less obvious, I’m amazed as to how many people still get this wrong. I hope that this post helps to clear the mud somewhat. The post applies to C as well, but is useless for people who are blessed to work with other languages.
The problem with include files is that the preprocessor is usually quite dumb - you tell it to include the file, it includes the entire contents of the file, recursively. If you don’t tell it to include the file but try to use the symbol from that file - you get a compilation error. If you tell it to include too many files, it includes all of them, and the compilation time suffers.
In general, the more a header is included in other files (including transitive inclusion, i.e. A includes B includes C means that A indirectly includes C), the more files you’ll need to recompile once the header changes. Iteration time is very important - which is a topic for another time - so we’d like to minimize the amount of header inclusion. This brings us to the first important rule: Each file should include the minimum amount of files. The rule helps ensure that your code builds fast.
Now, let’s suppose that the header file contains a class declaration. By the nature of C++, a class declaration won’t compile without some other declarations - for example if a class A inherits from a class B and contains a field of type C, then you have to give the compiler declarations of both B and C in the same translation unit (i.e. in the cpp file that you’re compiling - after preprocessor has done its work) - before A’s declaration. Now, there are two options here - you can either include the relevant header files in the header with A’s declaration, or force the user to always include B and C headers manually before A. The problem is that sometimes the user does not know about these dependencies (i.e. the field of type B can be private), sometimes the dependencies change, so every time you’re adding some declaration dependencies to your types you’re breaking user’s code, and, since declaration dependencies are transitive, often to include a single header you’ll need a dozen or more seemingly unrelated ones. For this reasons, it’s important for all headers to be self-contained - anybody should be able to include any header in any cpp file without compilation errors. Which brings us to the second important rule - each file should include all dependent headers, i.e. for each declaration that’s required by the compiler there should be a corresponding include. This rule helps ensure that the programmers stay sane.
These two rules together define the algorithm for proper header file authoring: for each required declaration, include a corresponding header in your header file; don’t include more headers than that. In order to guarantee that you did not forget the necessary headers, make sure that your header file is the first #include in the corresponding source file, except the common header, if your codebase has one.
Do not include a header for a dependency declaration where a forward declaration will suffice; use forward declarations when possible (if you’re not familiar with forward declarations, google it). Sometimes it pays off to go to extra lengths to remove header dependencies, using techniques like pimpl - this depends on the exact situation, but avoid including heavy platform files, like windows.h or d3d9.h, to popular headers (I’ve written about a way to make a slim version of d3d9.h in a blog post, scroll down to the last section).
With the rules above, there is only one thing left - since we can include a header twice accidentally (i.e. A depends on B and C, and B depends on C, so C is included twice into A), we’ll need some protection against that. So each file should include the guards against multiple inclusion. There are two methods for this - either use #pragma once or use header guards. #pragma once is a non-standard technique, that tells the preprocessor explicitly “don’t include this file more than once in a single translation unit”. Header guards can emulate the behavior using preprocessor defines:
#ifndef FILE_NAME_H
#define FILE_NAME_H
...
#endif
Many people don’t know this, but #pragma once is widely supported in modern compilers. It’s superior to header guards in two ways: it can be faster than header guards (i.e. MSVC does not read the file with #pragma once more than once, but does read the file with header guards several times), and it’s foolproof - you don’t have to invent the identifier for a header so you can’t screw it. So use #pragma once if you can, use header guards if you must. If some compilers that you use don’t support #pragma once and you can’t convince the vendors to add the feature, make sure that the header guards are unique using a deterministic generation algorithm. For example, you can use something like “take the list consisting of the name of the project, and all components of the relative file path; convert all elements to upper case and join with underscore”, resulting with identifiers like THEGAME_RENDER_LIGHTING_POINTLIGHT_H. Do not use short file names alone, they are not unique! (unless your coding standard requires that). Oh, and if you don’t use an autogenerating macro, don’t put a comment after the #endif (i.e. #endif // THEGAME_RENDER_LIGHTING_POINTLIGHT_H) - such comments are only useful as a copy-paste history.
While using header guards allows you to have the same file included several times in a single translation unit, it also allows you to test whether the file was already included, i.e. #ifdef THEGAME_RENDER_LIGHTING_POINTLIGHT_H. You should never conditionally exclude a section of a header file based on whether some file was included! Doing this introduces the inclusion order dependency which is unnatural, and hard to debug without a preprocessor output. If you’re thinking about something like “oh, if the renderer interface was included, I should probably provide a light renderer class, but otherwise it would just add unnecessary clutter”, you should split your header file in two parts, and the second part should explicitly include the renderer interface, since it depends on it.
At least in game development, the language is frequently extended with some generally useful primitives that are used throughout the whole codebase. The most used one is probably an assertion macro (since the standard one sucks, you should have your own), but there are other examples - logging facilities, fixed-size types, min/max functions, various platform/configuration defines (“are we on a big-endian platform?”), memory management-related macros. It’s common practice to put all of those in a single common header file; you should control the size of this file (where by ‘size’ I mean the cumulative size of all headers it includes, of course), and you should make sure that each source file includes the common header before everything else - otherwise you’ll get into trouble (sometimes you’ll spend several hours looking for the reasons - i.e. if you include a header that checks platforms endianness before the common file, you’re in the world of hurt).
Well, I think that’s all about header files; there are also the include paths though. In order to include the file, you have to specify the path to it - either a “relative to the current file” path, or “relative to one of the include directories” path. There are two important goals here:
-
If you’re writing a library - a relatively small one, i.e. not a platform like Unreal Engine - the header files should require minimal configuration, so ideally the user does not have to add include directories to compile or use your library. For such projects, consider making all include paths current file-relative.
-
Otherwise, include paths should be easily greppable - the path to the same file should ideally be the same in all other files. So make all include paths include directory-relative; moreover, try to make sure that include paths are unambiguous - i.e. that you don’t have two different representations for the same file path, like and inside render project.
-
Whatever rule you use, try to make sure it’s consistent between different projects, as much as necessary. Ideally even the include directories should be the same, i.e. include directories for the engine project should be a strict subset of include directories for the game project.
And as a final advice - learn to use the preprocessor output (cl /E, gcc -E), learn to use the include output (cl /showIncludes, gcc -M), gather the codebase statistics (average size after preprocessing, most included header files, header files with largest payload, etc.) and optimize your codebase by eliminating dependencies and spreading the word. Nothing beats a sub-second iteration time.
Oh, did I mention that good header dependencies decrease the linking time?
« Lua callstack with C++ debugger | Z7: Everything old is new again » |