Cxref And The Linux Kernel

Cxref is designed to process any C source code. To test this, I have used the Linux kernel source code as an example. It is a large program that contains many different C language constructs and GNU C extensions. The tests have been performed using cxref versions 1.3, 1.4 and 1.5a.

How I performed the tests using cxref version 1.3

I used version 2.0.27 of the Linux kernel and version 1.3 of cxref for this test.

Create a consistent set of source files.: This is done by running make configure and selecting options that give the maximum amount of code. This means selecting all of the optional drivers, experimental features, and not using modules.
Build the kernel.: Use make zImage > make.out for this step. The make output will contain a list of all of the files that are used in the kernel as well as the extra options that are used by the compiler.
Create a script that runs cxref on each of the C source files.: This script is created from the make.out file from the previous stage by using perl make2cxref.pl < make.out > cxref-c.sh; chmod +x cxref-c.sh.
Run cxref on all of the C source files: Run the script cxref-c.sh.
Create a script that runs cxref on each of the C header files.: This script is created from the database created by cxref in the previous stage by using perl includes2cxref.pl < cxref/linux.include > cxref-h.sh; chmod +x cxref-h.sh.
Run cxref on all of the C header files.: Run the script cxref-h.sh.
Create the complete set of cross-references by running cxref on all of the files again.: Run the scripts cxref-c.sh and cxref-h.sh again.
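
The make2cxref.pl script itself is not reproduced here. As a rough illustration of the idea only, the following Python sketch turns compiler lines from make.out into cxref command lines. It assumes that make prints "Entering directory" and "Leaving directory" lines, that each compile line starts with the word gcc, and that only the -I, -D and -U options written as a single word matter to cxref. The function name make_to_cxref and its cxref_opts parameter are hypothetical.

```python
import re
import shlex

# GNU make reports directory changes; older versions quote the path as `dir'.
ENTER = re.compile(r"Entering directory [`'](.+)'")
LEAVE = re.compile(r"Leaving directory")


def make_to_cxref(lines, cxref_opts=()):
    """Return one shell command per C file compiled in the make output.

    cxref_opts is a hypothetical hook for whatever cxref options you want
    on every command; nothing is added by default.
    """
    dirs = []       # stack of directories that make has entered
    commands = []
    for line in lines:
        entered = ENTER.search(line)
        if entered:
            dirs.append(entered.group(1))
            continue
        if LEAVE.search(line):
            if dirs:
                dirs.pop()
            continue
        try:
            words = shlex.split(line)
        except ValueError:
            continue  # not a well-formed command line, ignore it
        if not words or words[0] != "gcc":
            continue
        # Keep only the preprocessor options; drop -O2, -Wall, -o file, ...
        keep = [w for w in words[1:] if w[:2] in ("-I", "-D", "-U")]
        sources = [w for w in words[1:] if w.endswith(".c")]
        for src in sources:
            cmd = ["cxref", *cxref_opts, *keep, src]
            prefix = f"cd {shlex.quote(dirs[-1])} && " if dirs else ""
            commands.append(prefix + " ".join(shlex.quote(w) for w in cmd))
    return commands
```

Writing the returned lines to a file and making it executable would give something comparable to cxref-c.sh. A real script would also need to handle options written as two words (for example -I followed by a separate directory argument), which this sketch ignores.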

Results

There are 556 .c files and ~148 .h files used in this test kernel.

1 .c file is created 'on the fly' by the Makefile so is missing (drivers/scsi/fake.c).
3 .c files fail due to a known cxref bug that I do not intend to fix yet. These are drivers/cdrom/sbpcd[234].c, and they work by including sbpcd.c, confusing cxref about which file the functions come from.
3 .c files are only ever included into other files and not compiled.
550 .c files are cross referenced OK.
~60 .h files fail due to unknown data types (See Question 2.4 in the FAQ). (These can be fixed easily.)
~4 .h files contain too much context depending on the file they are included from to be cross referenced.
~80 .h files are cross referenced OK.

Conclusions

The cxref program is now able to cope with the vast majority of C source code. This includes a large number of GNU extensions that are used in the kernel. Where cxref fails on the source code, it is not because of errors in cxref parsing the code. Instead it is because the files are used in ways that do not allow them to be cross referenced easily. Including one source code file into several other files does not fit in with the cxref idea of individual files. The included file no longer has any meaning outside the context of the files it is included into.

How I performed the tests using cxref version 1.4

With version 1.4 of cxref, there are a number of features that make it easier to use cxref to document an existing set of source code. The main one of these is a drop-in replacement for gcc that compiles the code and then cross-references and documents it. It is actually a shell script that calls the compiler and then calls cxref, but the effect is that existing Makefiles only need to have the CC variable changed and documentation of the C files will be produced.
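
The compile-then-cross-reference idea can be sketched in a few lines. This is not the cxref-cc script that ships with cxref (that one is a shell script, and it also reads the .cxref options file); it is a hypothetical Python illustration. It assumes that the real compiler is named by the CXREFCC environment variable with gcc as a fallback, and that only the -D, -I and -U options are passed on to cxref. The function names plan and main are made up for this example.

```python
import os
import subprocess


def plan(argv, environ):
    """Work out the compiler command and the cxref commands for one call.

    argv holds the arguments the Makefile passed to the wrapper.
    """
    compiler = environ.get("CXREFCC", "gcc")   # assumed fallback: gcc
    compile_cmd = [compiler, *argv]
    # Assumption: cxref only needs the preprocessor options.
    cpp_opts = [a for a in argv if a[:2] in ("-D", "-I", "-U")]
    sources = [a for a in argv if a.endswith(".c")]
    # Only cross-reference when a file is being compiled, not when linking.
    cxref_cmds = [["cxref", *cpp_opts, s] for s in sources] if "-c" in argv else []
    return compile_cmd, cxref_cmds


def main(argv):
    """Run the compiler first, then cxref if the compilation succeeded."""
    compile_cmd, cxref_cmds = plan(argv, os.environ)
    status = subprocess.call(compile_cmd)
    if status != 0:
        return status          # report the compiler failure to make
    for cmd in cxref_cmds:
        subprocess.call(cmd)
    return 0
```

A wrapper script built on this would end with something like sys.exit(main(sys.argv[1:])), so that make still sees the exit status of the compiler.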

I used version 2.0.30 of the Linux kernel for this test.

Create a consistent set of source files.: This is done by running make configure and selecting options that give the maximum amount of code. This means selecting all of the optional drivers, experimental features, and not using modules.
Prepare to use cxref: Edit the Makefile to replace gcc with cxref-cc. Set the CXREFCC environment variable to point to gcc. Put a .cxref file in each directory with the options required.
Compile the kernel: run make dep ; make clean ; make.
Create a script that runs cxref on each of the C header files.: This script is created from the database created by cxref in the previous stage by using perl includes2cxref.pl < cxref/linux.include > cxref-h.sh; chmod +x cxref-h.sh.
Run cxref on all of the C header files.: Run the script cxref-h.sh. This will need to be edited to cope with missing definitions (See Question 2.4 in the FAQ). The easiest way to do this is to use the -include switch to cpp to include the file that provides the definitions before processing the source file. This switch can be given to cxref, which is one of the new features of this version.
Create the complete set of cross-references by running cxref on all of the files again.: Compile the kernel again and run the script cxref-h.sh again.
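
Editing cxref-h.sh by hand amounts to adding -include options to the lines for the headers that fail. A minimal sketch of building such a line is shown below. The function name header_command, its parameters and the header include/linux/example.h are hypothetical, and the only cxref option used is the -include switch described above, in the same two-word form that cpp accepts.

```python
import shlex


def header_command(header, prerequisites=(), base_opts=()):
    """Build a cxref command line for one header file.

    prerequisites lists the headers that define the types this header
    relies on; each one is pulled in first with -include.
    base_opts is a hypothetical hook for any other options you use.
    """
    cmd = ["cxref", *base_opts]
    for pre in prerequisites:
        cmd += ["-include", pre]
    cmd.append(header)
    return " ".join(shlex.quote(word) for word in cmd)


# Hypothetical header that uses types from linux/types.h without including it.
line = header_command("include/linux/example.h", ["include/linux/types.h"])
```

Here line holds a cxref command that includes include/linux/types.h before include/linux/example.h is processed.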

Results

There are 538 .c files and 136 .h files used in this test kernel.

All .c files are cross referenced OK.
~60 .h files fail due to unknown data types (See Question 2.4 in the FAQ). (These can be fixed easily.)
~4 .h files fail because I could not find what they needed to include and gave up.
~72 .h files are cross referenced OK.

Conclusions

The new version of cxref copes much better with the Linux kernel. There is no longer the problem with including one C source file into another, and all of the C source files could be cross-referenced and documented. The header files are more difficult, but if you understand how the program works then it would be easy enough to make it work.

How I performed the tests using cxref version 1.5a

For this test I used version 2.2.9 of the Linux kernel. The test method was the same as with version 1.4 of cxref with one small change. I changed the includes2cxref.pl script so that it directly calls cxref rather than creating a script as its output. This new script will be included in the next release of cxref.

Results

There are 1102 .c files and 312 .h files used in this test kernel.

1101 .c files are cross referenced OK.
1 .c file contains a syntax error (gcc ignores it though).
~156 .h files are cross referenced OK.
~156 .h files fail due to expected problems (See Question 2.4 in the FAQ).

One very noticeable feature of this version of cxref is that it takes a long time to perform the cross referencing. There are a large number of global variables that are visible in all of the source files since they are declared extern in a header file. This leads to a large amount of cross referencing being required and hence a long time to produce the output. For the first pass of the program where there are no outputs being produced the cross referencing is not required. Noting this in the source code and not doing the cross referencing produces a large increase in speed. This is not possible for the second pass where the outputs are being produced.
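
The effect of skipping the cross referencing in the first pass can be shown with a toy model. This is not how cxref stores its data. The function process_file and the dictionary layout are invented for the illustration, and the file and symbol names in the example are only placeholders.

```python
def process_file(name, defines, uses, index, produce_output):
    """Record what a file defines and uses; cross-reference only on demand."""
    index[name] = {"defines": set(defines), "uses": set(uses)}
    if not produce_output:
        # First pass: the index is still incomplete, so searching it for
        # users of each symbol would be wasted work.
        return None
    # Second pass: for every symbol defined here, search the whole index
    # for the other files that use it. This is the slow part.
    return {
        sym: sorted(
            other for other, info in index.items()
            if other != name and sym in info["uses"]
        )
        for sym in sorted(index[name]["defines"])
    }


index = {}
files = [("a.c", {"jiffies"}, set()), ("b.c", set(), {"jiffies"})]

# Pass 1: build the index only.
for name, defines, uses in files:
    process_file(name, defines, uses, index, produce_output=False)

# Pass 2: the index is complete, so the cross references are too.
xrefs = {name: process_file(name, d, u, index, True) for name, d, u in files}
```

In this model the search in the second pass grows with the number of symbols times the number of files, which matches the observation that many global variables visible everywhere make the output stage slow. It also shows why a single run is not enough: when a.c is first processed, b.c is not yet in the index.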

Another feature of this version of cxref is that it recognises all of the include files in the include directory as local include files rather than system ones. This happens even when they are included into the C source files using the #include <...> format. This means that the HTML (or other format) output contains long listings of which include files include each other. The reason for this is that GNU CPP recognises the files as local instead of global. The programmer's choice should override this, and changes have been made to do so.
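
The rule that the programmer's choice decides can be illustrated as follows. This is only a sketch of the rule: cxref works from the preprocessor output and not from a regular expression over the source, and the function name classify_includes is hypothetical.

```python
import re

# Matches '#include <file>' and '#include "file"', allowing spaces around '#'.
INCLUDE = re.compile(r'^\s*#\s*include\s*([<"])([^>"]+)[>"]')


def classify_includes(source):
    """Return (filename, kind) pairs, where kind follows the delimiter used."""
    result = []
    for line in source.splitlines():
        match = INCLUDE.match(line)
        if match:
            kind = "system" if match.group(1) == "<" else "local"
            result.append((match.group(2), kind))
    return result


example = '#include <linux/sched.h>\n#include "sbpcd.h"\nint x;\n'
kinds = classify_includes(example)
```

With this rule a header included as <linux/sched.h> is treated as a system file wherever the preprocessor happens to find it on disk.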

Conclusions

The current version of cxref becomes very slow when the index files that it creates become very large. A solution has been implemented that will remove most of this penalty when creating the indexes in the first pass, but in the second pass when creating the output files (HTML etc) the cross referencing must be done and will be slow.